* Re: xen master: xl create hangs
       [not found] ` <S4N2Wfl4ELkLaDHWDd44TotbqXvtrCzjQ5_gKmiThQxKPdsssI93Hy-et5a4CIULJylIynUpvIRPTLL7Zkm4-4Nw6cNDfR9_Y5NWzIDsy6s=@protonmail.com>
@ 2022-07-20 14:31   ` Anthony PERARD
  2022-07-20 15:04     ` Mathieu Tarral
  0 siblings, 1 reply; 5+ messages in thread
From: Anthony PERARD @ 2022-07-20 14:31 UTC (permalink / raw)
  To: Mathieu Tarral
  Cc: Xen-users, George Dunlap, George Dunlap, Juergen Gross, xen-devel

CCing Juergen and xen-devel.

On Mon, Jul 18, 2022 at 06:25:54PM +0000, Mathieu Tarral wrote:
> Using gdb to debug the xl process, I get the following stacktrace:
> 
> (gdb) bt
> #0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=8652, futex_word=0x7f6debd22a50) at ./nptl/futex-internal.c:57
> #1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=8652, futex_word=0x7f6debd22a50) at ./nptl/futex-internal.c:87
> #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f6debd22a50, expected=8652, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
>     private=private@entry=128) at ./nptl/futex-internal.c:139
> #3  0x00007f6deba736a4 in __pthread_clockjoin_ex (threadid=140110084581248, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0,
>     abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:105
> #4  0x00007f6deba73543 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
> #5  0x00007f6deb9a144b in xs_daemon_close (h=0x561db3bc5bc0) at xs.c:366
> #6  0x00007f6deb9a145f in xs_close (xsh=<optimized out>) at xs.c:386
> #7  0x00007f6debc43a36 in libxl_ctx_free (ctx=0x561db3bc52e0) at libxl.c:173
> #8  0x0000561db33bf5a3 in xl_ctx_free () at xl.c:370
> #9  0x00007f6deba22495 in __run_exit_handlers (status=0, listp=0x7f6debbf6838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
>     run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
> #10 0x00007f6deba22610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
> #11 0x00007f6deba06d97 in __libc_start_call_main (main=main@entry=0x561db33c0425 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffeb2f263d8)
>     at ../sysdeps/nptl/libc_start_call_main.h:74
> #12 0x00007f6deba06e40 in __libc_start_main_impl (main=0x561db33c0425 <main>, argc=4, argv=0x7ffeb2f263d8, init=<optimized out>, fini=<optimized out>,
>     rtld_fini=<optimized out>, stack_end=0x7ffeb2f263c8) at ../csu/libc-start.c:392
> #13 0x0000561db33bf425 in _start ()
> 
> Colorized version in a Github Gist:
> https://gist.github.com/Wenzel/4da1e0a025954fac13a0ee57147cc44f
> 
> So looks like xs_daemon_close is waiting on a thread to join:
> https://github.com/xen-project/xen/blob/a5fb66f4513c2c2d222dcc3753163b15690bd003/tools/libs/store/xs.c#L366

On Wed, Jul 20, 2022 at 12:53:29PM +0000, Mathieu Tarral wrote:
> > Verify that things work properly at that commit, then use that as the “good” starting point.
> 
> Turns out that this commit (74a11c43fd7e074b1f77631b446dd2115eacb9e8) was also bad.
> So I used git bisect again, but this time to find the commit which introduced the bug fix
> between 74a11c43fd7e074b1f77631b446dd2115eacb9e8 and RELEASE-4.16.1.
> 
> After a few steps, git bisect identified this commit:
> https://github.com/xen-project/xen/commit/59505f48fabed2e6fa5ad992edaabeb4a1441599
> "Turn off debug by default"
> Surprisingly simple.
> 
> And I confirm that it's the one that fixes the issue of xl create hanging.
> 
> I cherry-picked this commit on master:
> https://user-images.githubusercontent.com/964610/179986382-a774c91a-7b68-416b-9dbe-226b8aca0673.png
> 
> recompiled and tested again, my master branch now works as expected, tested with the small config file I already had and the XTF test-pv64-example.
> 
> So it works, but I don't know why this commit fixed it.

$(debug) controls the optimisation level used for the build, to make
debugging easier.

So, with debug=y, libxenstore has an issue killing its reading
thread? :-(
Maybe that reading thread is doing something that can't be stopped, or
maybe it's waiting for a lock. Could you print a backtrace of that
thread (or even of all threads in `xl`)? ("thread apply all bt full" in
gdb)

Thanks,

-- 
Anthony PERARD



* Re: xen master: xl create hangs
  2022-07-20 14:31   ` xen master: xl create hangs Anthony PERARD
@ 2022-07-20 15:04     ` Mathieu Tarral
  2022-07-20 16:24       ` Anthony PERARD
  0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Tarral @ 2022-07-20 15:04 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Xen-users, George Dunlap, George Dunlap, Juergen Gross, xen-devel

Hi Anthony


> $(debug) controls the optimisation level used for the build, to make
> debugging easier.
>
> So, with debug=y, libxenstore has an issue killing its reading
> thread? :-(
> Maybe that reading thread is doing something that can't be stopped, or
> maybe it's waiting for a lock. Could you print a backtrace of that
> thread (or even of all threads in `xl`)? ("thread apply all bt full" in
> gdb)

I recompiled the buggy master, and this is the full GDB stacktrace when xl create hangs:
https://gist.github.com/Wenzel/969d5c06982246cd6cb2eb8cdf252a18

I don't see the same stacktrace as before; maybe I was on a different commit?

I hope this helps.

Mathieu




* Re: xen master: xl create hangs
  2022-07-20 15:04     ` Mathieu Tarral
@ 2022-07-20 16:24       ` Anthony PERARD
  2022-07-20 17:30         ` Mathieu Tarral
  2022-07-20 17:42         ` Elliott Mitchell
  0 siblings, 2 replies; 5+ messages in thread
From: Anthony PERARD @ 2022-07-20 16:24 UTC (permalink / raw)
  To: Mathieu Tarral
  Cc: Xen-users, George Dunlap, George Dunlap, Juergen Gross, xen-devel

On Wed, Jul 20, 2022 at 03:04:22PM +0000, Mathieu Tarral wrote:
> Hi Anthony
> 
> 
> > $(debug) controls the optimisation level used for the build, to make
> > debugging easier.
> >
> > So, with debug=y, libxenstore has an issue killing its reading
> > thread? :-(
> > Maybe that reading thread is doing something that can't be stopped, or
> > maybe it's waiting for a lock. Could you print a backtrace of that
> > thread (or even of all threads in `xl`)? ("thread apply all bt full" in
> > gdb)
> 
> I recompiled the buggy master, and this is the full GDB stacktrace when xl create hangs:
> https://gist.github.com/Wenzel/969d5c06982246cd6cb2eb8cdf252a18
> 
> I don't see the same stacktrace as before, maybe I was on a different commit ?

I think this `xl` process is just waiting for the domain to shut down
or die. When we run `xl create`, before exiting there's a fork/exec of
xl which handles a few domain events, so this stack trace looks
expected (and looks like the one I get). So it doesn't look like this
is the xl process that hangs.

-- 
Anthony PERARD



* Re: xen master: xl create hangs
  2022-07-20 16:24       ` Anthony PERARD
@ 2022-07-20 17:30         ` Mathieu Tarral
  2022-07-20 17:42         ` Elliott Mitchell
  1 sibling, 0 replies; 5+ messages in thread
From: Mathieu Tarral @ 2022-07-20 17:30 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Xen-users, George Dunlap, George Dunlap, Juergen Gross, xen-devel

> I think this `xl` process is just waiting for the domain to shut down
> or die. When we run `xl create`, before exiting there's a fork/exec of
> xl which handles a few domain events, so this stack trace looks
> expected (and looks like the one I get). So it doesn't look like this
> is the xl process that hangs.

I tested again but this time with XTF test-pv64-example:
https://user-images.githubusercontent.com/964610/180044164-74d12f63-d901-4e33-93be-073c7ed8d7dc.png

This is the new xl stacktrace:
https://gist.github.com/Wenzel/969d5c06982246cd6cb2eb8cdf252a18#file-gdb2-xs-daemon-close-c

It now shows the first thread waiting on the reading thread to join, as we expected:
https://github.com/xen-project/xen/blob/0e60f1d9d1970cae49ee9d03f5759f44afc1fdee/tools/libs/store/xs.c#L366

And the second one waiting in read_message:
https://github.com/xen-project/xen/blob/0e60f1d9d1970cae49ee9d03f5759f44afc1fdee/tools/libs/store/xs.c#L1265
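
To make the two stacks above concrete, here is a minimal sketch of the
general pattern (not the actual xs.c code): one thread blocked in read()
and another that stops it and then joins it. It assumes the reader is
stopped with pthread_cancel() before pthread_join(); that is an
assumption about the design, and reader_thread and cancel_and_join_demo
are made-up names, with a pipe standing in for the xenstore connection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for the reading thread: blocks in read(). */
static void *reader_thread(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[16];

    /* read() is a POSIX cancellation point, so a deferred cancel
     * request is acted upon while the thread is blocked here. */
    for (;;) {
        if (read(fd, buf, sizeof(buf)) <= 0)
            break;
    }
    return NULL;
}

/* Returns 0 if the reader was cancelled and joined, -1 otherwise. */
int cancel_and_join_demo(void)
{
    int pipefd[2];
    pthread_t tid;
    void *ret = NULL;
    int rc = -1;

    if (pipe(pipefd) != 0)
        return -1;

    if (pthread_create(&tid, NULL, reader_thread,
                       (void *)(intptr_t)pipefd[0]) == 0) {
        /* Nothing is ever written to the pipe, so the reader stays
         * blocked in read() until the cancel request arrives. */
        if (pthread_cancel(tid) == 0 &&
            pthread_join(tid, &ret) == 0 &&
            ret == PTHREAD_CANCELED)
            rc = 0;
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

In this sketch the join returns promptly because the blocked read() acts
on the cancel request. It does not explain why the join never returns in
the debug=y build; it only shows the shape of the two threads. Build
with -pthread.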

Mathieu



* Re: xen master: xl create hangs
  2022-07-20 16:24       ` Anthony PERARD
  2022-07-20 17:30         ` Mathieu Tarral
@ 2022-07-20 17:42         ` Elliott Mitchell
  1 sibling, 0 replies; 5+ messages in thread
From: Elliott Mitchell @ 2022-07-20 17:42 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Mathieu Tarral, Xen-users, George Dunlap, George Dunlap,
	Juergen Gross, xen-devel

On Wed, Jul 20, 2022 at 05:24:22PM +0100, Anthony PERARD wrote:
> 
> I think this `xl` process is just waiting for the domain to shut down
> or die. When we run `xl create`, before exiting there's a fork/exec of
> xl which handles a few domain events, so this stack trace looks
> expected (and looks like the one I get). So it doesn't look like this
> is the xl process that hangs.

I've got a patch that uses `setproctitle()` to change what shows as the
process name for this process.  Unfortunately `setproctitle()` is
*BSD-only; "libbsd" for Linux implements similar functionality, and I've
been meaning to figure out how it works.

I definitely think this should be done, just haven't gotten around to
finding a proper way to do it.
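
For reference, one Linux-only approximation is prctl(PR_SET_NAME). This
is a different technique from `setproctitle()` and is not the patch
mentioned above: it only changes the short "comm" name of the calling
thread, not the argv-based command line. The helper names and the
"xl-monitor" string used in the note below are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Linux-only: set the calling thread's "comm" name. The kernel keeps
 * at most 15 bytes plus the terminating NUL; longer names are
 * truncated. The command line shown by `ps -f` is left unchanged. */
int set_comm_name(const char *name)
{
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/* Read the current comm name into buf, which must hold 16 bytes. */
int get_comm_name(char buf[16])
{
    return prctl(PR_GET_NAME, (unsigned long)buf, 0, 0, 0);
}
```

Calling set_comm_name("xl-monitor") early in the forked process would
make it show up under that name in `ps -o comm` and `top`, which may or
may not be enough for telling the two xl processes apart.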


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




