All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] libxl: Fix deadlock with pygrub
@ 2014-02-24 14:19 Ian Jackson
  2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Campbell, M A Young

Thanks to Michael Young for reporting this, and my apologies for
introducing this bug.

  1/3 libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2/3 libxl: Hold the atfork lock while closing carefd
  3/3 libxl: Fix carefd lock leak in save callout

Patch 1 is the actual bugfix.  The bug is IMO clearly a blocker for
4.4 so the patch should go in if it is correct.  I have checked that
it seems to fix the problem for me.

Patches 2 and 3 are other theoretical locking bugs I discovered while
looking for the cause of the pygrub deadlock.  I haven't tried to
construct test cases that might exercise these bugs; doing so would be
quite difficult.

The atfork carefd race (patch 2) might be relevant to callers which
are multithreaded and also call libxl_postfork_child_noexec.  libvirt
does not make any such call.  xl is single-threaded.  I haven't
investigated other toolstacks, but the race is theoretical rather than
observed.

The carefd lock leak (patch 3) is obvious but theoretical except in
deeply pathological situations.  I don't think it's worth adding risk
to 4.4 to fix it.

So I think patch 1 should go into 4.4.0 after review.
2-3 should wait for 4.4.1 (coming via unstable in the usual way).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
@ 2014-02-24 14:19 ` Ian Jackson
  2014-02-24 14:45   ` Ian Campbell
  2014-02-24 15:17   ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc George Dunlap
  2014-02-24 14:19 ` [PATCH 2/3] libxl: Hold the atfork lock while closing carefd Ian Jackson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Jackson, Ian Campbell, M A Young

libxl_postfork_child_noexec would nestedly reaquire the non-recursive
"no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
The result on Linux is that the process always deadlocks before
returning from this function.

This is used by xl's console child.  So, the ultimate effect is that
xl with pygrub does not manage to connect to the pygrub console.
This beahviour was reported by Michael Young in Xen 4.4.0 RC5.

Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
documented to suffice if called only on one ctx.  So deregistering the
ctx it's called on is not sufficient.  Instead, we need a new approach
which discards the whole sigchld_user list and unconditionally removes
our SIGCHLD handler if we had one.

Prompted by this, clarify the semantics of
libxl_postfork_child_noexec.  Specifically, expand on the meaning of
"quickly" by explaining what operations are not permitted; and
document the fact that the function doesn't reclaim the resources in
the ctxs.

And add a comment in libxl_postfork_child_noexec explaining the
internal concurrency situation.

This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: M A Young <m.a.young@durham.ac.uk>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 tools/libxl/libxl_event.h |   16 ++++++++++++++++
 tools/libxl/libxl_fork.c  |   44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index ca43cb9..b5db83c 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -601,6 +601,22 @@ void libxl_childproc_sigchld_occurred(libxl_ctx *ctx)
  * this all previously existing libxl_ctx's are invalidated and
  * must not be used - or even freed.  It is harmless to call this
  * postfork function and then exec anyway.
+ *
+ * Until libxl_postfork_child_noexec has returned:
+ *  - No other libxl calls may be made.
+ *  - If any libxl ctx was configured handle the process's SIGCHLD,
+ *    the child may not create further (grand)child processes, nor
+ *    manipulate SIGCHLD.
+ *
+ * libxl_postfork_child_noexec may not reclaim all the resources
+ * associated with the libxl ctx.  This includes but is not limited
+ * to: ordinary memory; files on disk and in /var/run; file
+ * descriptors; memory mapped into the process from domains being
+ * managed (grant maps); Xen event channels.  Use of libxl in
+ * processes which fork long-lived children is not recommended for
+ * this reason.  libxl_postfork_child_noexec is provided so that
+ * an application can make further libxl calls in a child which
+ * is going to exec or exit soon.
  */
 void libxl_postfork_child_noexec(libxl_ctx *ctx);
 
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 1d0017b..8421296 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -60,6 +60,9 @@ static void sigchld_removehandler_core(void); /* idempotent */
 static void sigchld_user_remove(libxl_ctx *ctx); /* idempotent */
 static void sigchld_sethandler_raw(void (*handler)(int), struct sigaction *old);
 
+static void defer_sigchld(void);
+static void release_sigchld(void);
+
 static void atfork_lock(void)
 {
     int r = pthread_mutex_lock(&no_forking);
@@ -117,6 +120,19 @@ libxl__carefd *libxl__carefd_opened(libxl_ctx *ctx, int fd)
 
 void libxl_postfork_child_noexec(libxl_ctx *ctx)
 {
+    /*
+     * Anything running without the no_forking lock (atfork_lock)
+     * might be interrupted by fork.  But libxl functions other than
+     * this one are then forbidden to the child.
+     *
+     * Conversely, this function might interrupt any other libxl
+     * operation (even though that other operation has the libxl ctx
+     * lock).  We don't take the lock ourselves, since we are running
+     * in the child and if the lock is held the thread that took it no
+     * longer exists.  To prevent us being interrupted by another call
+     * to ourselves (whether in another thread or by virtue of another
+     * fork) we take the atfork lock ourselves.
+     */
     libxl__carefd *cf, *cf_tmp;
     int r;
 
@@ -134,7 +150,33 @@ void libxl_postfork_child_noexec(libxl_ctx *ctx)
     }
     LIBXL_LIST_INIT(&carefds);
 
-    sigchld_user_remove(ctx);
+    if (sigchld_installed) {
+        /* We are in theory not at risk of concurrent execution of the
+         * SIGCHLD handler, because the application should call
+         * libxl_postfork_child_noexec before the child forks again.
+         * (If the SIGCHLD was in flight in the parent at the time of
+         * the fork, the thread it was delivered on exists only in the
+         * parent so is not our concern.)
+         *
+         * But in case the application violated this rule (and did so
+         * while multithreaded in the child), we use our deferral
+         * machinery.  The result is that the SIGCHLD may then be lost
+         * (i.e. signaled to the now-defunct libxl ctx(s)).  But at
+         * least we won't execute undefined behaviour (by examining
+         * the list in the signal handler concurrently with clearing
+         * it here), and since we won't actually reap the new children
+         * things will in fact go OK if the application doesn't try to
+         * use SIGCHLD, but instead just waits for the child(ren). */
+        defer_sigchld();
+
+        LIBXL_LIST_INIT(&sigchld_users);
+        /* After this the ->sigchld_user_registered entries in the
+         * now-obsolete contexts may be lies.  But that's OK because
+         * no-one will look at them. */
+
+        release_sigchld();
+        sigchld_removehandler_core();
+    }
 
     atfork_unlock();
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] libxl: Hold the atfork lock while closing carefd
  2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
  2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
@ 2014-02-24 14:19 ` Ian Jackson
  2014-02-24 14:47   ` Ian Campbell
  2014-02-24 14:19 ` [PATCH 3/3] libxl: Fix carefd lock leak in save callout Ian Jackson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Jackson, Ian Campbell, M A Young

This avoids the process being forked while a carefd is recorded in the
list but the actual fd has been closed.  If that happened, a
subsequent libxl_postfork_child_noexec would attempt to close the fd
again.  If we are lucky that results in a harmless warning; but if we
are unlucky the fd number has been reused and we close an unrelated
fd.

This race has not been observed anywhere as far as we are aware.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 tools/libxl/libxl_fork.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 8421296..fa15095 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -184,9 +184,9 @@ void libxl_postfork_child_noexec(libxl_ctx *ctx)
 int libxl__carefd_close(libxl__carefd *cf)
 {
     if (!cf) return 0;
+    atfork_lock();
     int r = cf->fd < 0 ? 0 : close(cf->fd);
     int esave = errno;
-    atfork_lock();
     LIBXL_LIST_REMOVE(cf, entry);
     atfork_unlock();
     free(cf);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] libxl: Fix carefd lock leak in save callout
  2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
  2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
  2014-02-24 14:19 ` [PATCH 2/3] libxl: Hold the atfork lock while closing carefd Ian Jackson
@ 2014-02-24 14:19 ` Ian Jackson
  2014-02-24 14:48   ` Ian Campbell
  2014-02-24 14:49 ` [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Campbell
  2014-03-13 13:59 ` Ian Campbell
  4 siblings, 1 reply; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Jackson, Ian Campbell, M A Young

If libxl_pipe fails we leave the carefd locked, which translates to
the atfork lock remaining held.  This would probably cause the process
to deadlock shortly afterwards.

Of course libxl_pipe is very unlikely to fail unless things are
already going very badly.  This bug has not been observed anywhere as
far as we are aware.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 tools/libxl/libxl_save_callout.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 6e45b2f..e3bda8f 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -185,7 +185,11 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     for (childfd=0; childfd<2; childfd++) {
         /* Setting up the pipe for the child's fd childfd */
         int fds[2];
-        if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; }
+        if (libxl_pipe(CTX,fds)) {
+            rc = ERROR_FAIL;
+            libxl__carefd_unlock();
+            goto out;
+        }
         int childs_end = childfd==0 ? 0 /*read*/  : 1 /*write*/;
         int our_end    = childfd==0 ? 1 /*write*/ : 0 /*read*/;
         childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
@ 2014-02-24 14:45   ` Ian Campbell
  2014-02-24 14:53     ` M A Young
  2014-02-24 15:17   ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc George Dunlap
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 14:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
> The result on Linux is that the process always deadlocks before
> returning from this function.
> 
> This is used by xl's console child.  So, the ultimate effect is that
> xl with pygrub does not manage to connect to the pygrub console.
> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.

"behaviour".

Michael reported this earlier on -rc2 as well but it fell through the
cracks because I failed to properly appreciate the severity. Sorry.

> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
> documented to suffice if called only on one ctx.  So deregistering the
> ctx it's called on is not sufficient.  Instead, we need a new approach
> which discards the whole sigchld_user list and unconditionally removes
> our SIGCHLD handler if we had one.
> 
> Prompted by this, clarify the semantics of
> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
> "quickly" by explaining what operations are not permitted; and
> document the fact that the function doesn't reclaim the resources in
> the ctxs.
> 
> And add a comment in libxl_postfork_child_noexec explaining the
> internal concurrency situation.
> 
> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Reported-by: M A Young <m.a.young@durham.ac.uk>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

> CC: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  tools/libxl/libxl_event.h |   16 ++++++++++++++++
>  tools/libxl/libxl_fork.c  |   44 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 59 insertions(+), 1 deletion(-)

Impressive considering the real meat is -1/+6 ;-)

Not that I'm going to complain about lots of docs!

> 
> @@ -134,7 +150,33 @@ void libxl_postfork_child_noexec(libxl_ctx *ctx)
>      }
>      LIBXL_LIST_INIT(&carefds);
>  
> -    sigchld_user_remove(ctx);
> +    if (sigchld_installed) {
> +        defer_sigchld();
> +
> +        LIBXL_LIST_INIT(&sigchld_users);
> +        /* After this the ->sigchld_user_registered entries in the
> +         * now-obsolete contexts may be lies.  But that's OK because
> +         * no-one will look at them. */
> +
> +        release_sigchld();
> +        sigchld_removehandler_core();
> +    }
>  
>      atfork_unlock();
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] libxl: Hold the atfork lock while closing carefd
  2014-02-24 14:19 ` [PATCH 2/3] libxl: Hold the atfork lock while closing carefd Ian Jackson
@ 2014-02-24 14:47   ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 14:47 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> This avoids the process being forked while a carefd is recorded in the
> list but the actual fd has been closed.  If that happened, a
> subsequent libxl_postfork_child_noexec would attempt to close the fd
> again.  If we are lucky that results in a harmless warning; but if we
> are unlucky the fd number has been reused and we close an unrelated
> fd.
> 
> This race has not been observed anywhere as far as we are aware.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

> CC: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  tools/libxl/libxl_fork.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
> index 8421296..fa15095 100644
> --- a/tools/libxl/libxl_fork.c
> +++ b/tools/libxl/libxl_fork.c
> @@ -184,9 +184,9 @@ void libxl_postfork_child_noexec(libxl_ctx *ctx)
>  int libxl__carefd_close(libxl__carefd *cf)
>  {
>      if (!cf) return 0;
> +    atfork_lock();
>      int r = cf->fd < 0 ? 0 : close(cf->fd);
>      int esave = errno;
> -    atfork_lock();
>      LIBXL_LIST_REMOVE(cf, entry);
>      atfork_unlock();
>      free(cf);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] libxl: Fix carefd lock leak in save callout
  2014-02-24 14:19 ` [PATCH 3/3] libxl: Fix carefd lock leak in save callout Ian Jackson
@ 2014-02-24 14:48   ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 14:48 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> If libxl_pipe fails we leave the carefd locked, which translates to
> the atfork lock remaining held.  This would probably cause the process
> to deadlock shortly afterwards.
> 
> Of course libxl_pipe is very unlikely to fail unless things are
> already going very badly.  This bug has not been observed anywhere as
> far as we are aware.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

> CC: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  tools/libxl/libxl_save_callout.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
> index 6e45b2f..e3bda8f 100644
> --- a/tools/libxl/libxl_save_callout.c
> +++ b/tools/libxl/libxl_save_callout.c
> @@ -185,7 +185,11 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
>      for (childfd=0; childfd<2; childfd++) {
>          /* Setting up the pipe for the child's fd childfd */
>          int fds[2];
> -        if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; }
> +        if (libxl_pipe(CTX,fds)) {
> +            rc = ERROR_FAIL;
> +            libxl__carefd_unlock();
> +            goto out;
> +        }
>          int childs_end = childfd==0 ? 0 /*read*/  : 1 /*write*/;
>          int our_end    = childfd==0 ? 1 /*write*/ : 0 /*read*/;
>          childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] libxl: Fix deadlock with pygrub
  2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
                   ` (2 preceding siblings ...)
  2014-02-24 14:19 ` [PATCH 3/3] libxl: Fix carefd lock leak in save callout Ian Jackson
@ 2014-02-24 14:49 ` Ian Campbell
  2014-04-04 15:06   ` Ian Jackson
  2014-03-13 13:59 ` Ian Campbell
  4 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 14:49 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> So I think patch 1 should go into 4.4.0 after review.
> 2-3 should wait for 4.4.1 (coming via unstable in the usual way).

I agree with your logic and your conclusion.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 14:45   ` Ian Campbell
@ 2014-02-24 14:53     ` M A Young
  2014-02-24 14:55       ` Ian Campbell
  2014-02-24 15:26       ` [PATCH] tools/console: reset tty when xenconsole fails Ian Jackson
  0 siblings, 2 replies; 26+ messages in thread
From: M A Young @ 2014-02-24 14:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel, Ian Jackson

On Mon, 24 Feb 2014, Ian Campbell wrote:

> On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
>> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>> The result on Linux is that the process always deadlocks before
>> returning from this function.
>>
>> This is used by xl's console child.  So, the ultimate effect is that
>> xl with pygrub does not manage to connect to the pygrub console.
>> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
>
> "behaviour".
>
> Michael reported this earlier on -rc2 as well but it fell through the
> cracks because I failed to properly appreciate the severity. Sorry.

No that was a different problem (you get some warnings and a messed 
up terminal sesssion on dom0 after using pygrub). This is a new problem, 
probably introduced in rc4 (which I didn't get around to testing).

 	Michael Young

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 14:53     ` M A Young
@ 2014-02-24 14:55       ` Ian Campbell
  2014-02-24 15:26       ` [PATCH] tools/console: reset tty when xenconsole fails Ian Jackson
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 14:55 UTC (permalink / raw)
  To: M A Young; +Cc: George Dunlap, xen-devel, Ian Jackson

On Mon, 2014-02-24 at 14:53 +0000, M A Young wrote:
> On Mon, 24 Feb 2014, Ian Campbell wrote:
> 
> > On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> >> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
> >> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
> >> The result on Linux is that the process always deadlocks before
> >> returning from this function.
> >>
> >> This is used by xl's console child.  So, the ultimate effect is that
> >> xl with pygrub does not manage to connect to the pygrub console.
> >> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
> >
> > "behaviour".
> >
> > Michael reported this earlier on -rc2 as well but it fell through the
> > cracks because I failed to properly appreciate the severity. Sorry.
> 
> No that was a different problem (you get some warnings and a messed 
> up terminal sesssion on dom0 after using pygrub). This is a new problem, 
> probably introduced in rc4 (which I didn't get around to testing).

OK, thanks for clarifying!

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
  2014-02-24 14:45   ` Ian Campbell
@ 2014-02-24 15:17   ` George Dunlap
  2014-02-24 15:47     ` George Dunlap
  2014-02-24 15:56     ` Ian Jackson
  1 sibling, 2 replies; 26+ messages in thread
From: George Dunlap @ 2014-02-24 15:17 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Ian Campbell, M A Young

On 02/24/2014 02:19 PM, Ian Jackson wrote:
> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
> The result on Linux is that the process always deadlocks before
> returning from this function.
>
> This is used by xl's console child.  So, the ultimate effect is that
> xl with pygrub does not manage to connect to the pygrub console.
> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
>
> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
> documented to suffice if called only on one ctx.  So deregistering the
> ctx it's called on is not sufficient.  Instead, we need a new approach
> which discards the whole sigchld_user list and unconditionally removes
> our SIGCHLD handler if we had one.
>
> Prompted by this, clarify the semantics of
> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
> "quickly" by explaining what operations are not permitted; and
> document the fact that the function doesn't reclaim the resources in
> the ctxs.
>
> And add a comment in libxl_postfork_child_noexec explaining the
> internal concurrency situation.
>
> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Reported-by: M A Young <m.a.young@durham.ac.uk>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>

So it looks like this path gets called from a number of other places in xl:

libxl_postfork_child_noexec() is called by xl.c:postfork().

postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(), 
autoconnect_console(), and do_daemonize().

do_daemonize() is called during "xl create", and "xl devd".

Was this deadlock not triggered for those, or was it triggered and 
nobody noticed?

  -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] tools/console: reset tty when xenconsole fails
  2014-02-24 14:53     ` M A Young
  2014-02-24 14:55       ` Ian Campbell
@ 2014-02-24 15:26       ` Ian Jackson
  2014-02-24 15:42         ` Ian Campbell
  2014-02-24 16:17         ` George Dunlap
  1 sibling, 2 replies; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 15:26 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Jackson, Ian Campbell, M A Young

If xenconsole (the client program) fails, it calls err.  This would
previously neglect to reset the user's terminal to sanity.  Use atexit
to do so.

This routinely happens in Xen 4.4 RC5 with pygrub because something
writes the value "" to the tty xenstore key when using xenconsole.
The cause of this is not yet known, but after this patch it just
results in a harmless error message.

Reported-by: M A Young <m.a.young@durham.ac.uk>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 tools/console/client/main.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index 3242008..add3313 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -258,6 +258,13 @@ typedef enum {
        CONSOLE_SERIAL,
 } console_type;
 
+static struct termios stdin_old_attr;
+
+static void restore_term_stdin(void)
+{
+	restore_term(STDIN_FILENO, &stdin_old_attr);
+}
+
 int main(int argc, char **argv)
 {
 	struct termios attr;
@@ -384,9 +391,9 @@ int main(int argc, char **argv)
 	}
 
 	init_term(spty, &attr);
-	init_term(STDIN_FILENO, &attr);
+	init_term(STDIN_FILENO, &stdin_old_attr);
+        atexit(restore_term_stdin); /* if this fails, oh dear */
 	console_loop(spty, xs, path);
-	restore_term(STDIN_FILENO, &attr);
 
 	free(path);
 	free(dom_path);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] tools/console: reset tty when xenconsole fails
  2014-02-24 15:26       ` [PATCH] tools/console: reset tty when xenconsole fails Ian Jackson
@ 2014-02-24 15:42         ` Ian Campbell
  2014-02-24 16:17         ` George Dunlap
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-02-24 15:42 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 15:26 +0000, Ian Jackson wrote:
> If xenconsole (the client program) fails, it calls err.  This would
> previously neglect to reset the user's terminal to sanity.  Use atexit
> to do so.
> 
> This routinely happens in Xen 4.4 RC5 with pygrub because something
> writes the value "" to the tty xenstore key when using xenconsole.
> The cause of this is not yet known, but after this patch it just
> results in a harmless error message.
> 
> Reported-by: M A Young <m.a.young@durham.ac.uk>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  tools/console/client/main.c |   11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/console/client/main.c b/tools/console/client/main.c
> index 3242008..add3313 100644
> --- a/tools/console/client/main.c
> +++ b/tools/console/client/main.c
> @@ -258,6 +258,13 @@ typedef enum {
>         CONSOLE_SERIAL,
>  } console_type;
>  
> +static struct termios stdin_old_attr;
> +
> +static void restore_term_stdin(void)
> +{
> +	restore_term(STDIN_FILENO, &stdin_old_attr);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	struct termios attr;
> @@ -384,9 +391,9 @@ int main(int argc, char **argv)
>  	}
>  
>  	init_term(spty, &attr);
> -	init_term(STDIN_FILENO, &attr);
> +	init_term(STDIN_FILENO, &stdin_old_attr);
> +        atexit(restore_term_stdin); /* if this fails, oh dear */

Looks like some sort of tab/space damage, but otherwise:
Acked-by: Ian Campbell <ian.campbell@citrix.com>

>  	console_loop(spty, xs, path);
> -	restore_term(STDIN_FILENO, &attr);
>  
>  	free(path);
>  	free(dom_path);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 15:17   ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc George Dunlap
@ 2014-02-24 15:47     ` George Dunlap
  2014-02-24 15:49       ` George Dunlap
  2014-02-24 15:56     ` Ian Jackson
  1 sibling, 1 reply; 26+ messages in thread
From: George Dunlap @ 2014-02-24 15:47 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Ian Campbell, M A Young

On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
<george.dunlap@eu.citrix.com> wrote:
> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>
>> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>> The result on Linux is that the process always deadlocks before
>> returning from this function.
>>
>> This is used by xl's console child.  So, the ultimate effect is that
>> xl with pygrub does not manage to connect to the pygrub console.
>> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
>>
>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
>> documented to suffice if called only on one ctx.  So deregistering the
>> ctx it's called on is not sufficient.  Instead, we need a new approach
>> which discards the whole sigchld_user list and unconditionally removes
>> our SIGCHLD handler if we had one.
>>
>> Prompted by this, clarify the semantics of
>> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
>> "quickly" by explaining what operations are not permitted; and
>> document the fact that the function doesn't reclaim the resources in
>> the ctxs.
>>
>> And add a comment in libxl_postfork_child_noexec explaining the
>> internal concurrency situation.
>>
>> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>>
>> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> Reported-by: M A Young <m.a.young@durham.ac.uk>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>
>
> So it looks like this path gets called from a number of other places in xl:
>
> libxl_postfork_child_noexec() is called by xl.c:postfork().
>
> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
> autoconnect_console(), and do_daemonize().
>
> do_daemonize() is called during "xl create", and "xl devd".
>
> Was this deadlock not triggered for those, or was it triggered and nobody
> noticed?

In any case, I do think we need to fix this; the main question is, do
we need to delay the release a bit further to make sure it gets
sufficient testing?

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 15:47     ` George Dunlap
@ 2014-02-24 15:49       ` George Dunlap
  2014-02-25  5:17         ` Jim Fehlig
  0 siblings, 1 reply; 26+ messages in thread
From: George Dunlap @ 2014-02-24 15:49 UTC (permalink / raw)
  To: Ian Jackson, xen-devel
  Cc: Dario Faggioli, Jim Fehlig, Ian Campbell, M A Young

On Mon, Feb 24, 2014 at 3:47 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
> <george.dunlap@eu.citrix.com> wrote:
>> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>>
>>> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
>>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>>> The result on Linux is that the process always deadlocks before
>>> returning from this function.
>>>
>>> This is used by xl's console child.  So, the ultimate effect is that
>>> xl with pygrub does not manage to connect to the pygrub console.
>>> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
>>>
>>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>>> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
>>> documented to suffice if called only on one ctx.  So deregistering the
>>> ctx it's called on is not sufficient.  Instead, we need a new approach
>>> which discards the whole sigchld_user list and unconditionally removes
>>> our SIGCHLD handler if we had one.
>>>
>>> Prompted by this, clarify the semantics of
>>> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
>>> "quickly" by explaining what operations are not permitted; and
>>> document the fact that the function doesn't reclaim the resources in
>>> the ctxs.
>>>
>>> And add a comment in libxl_postfork_child_noexec explaining the
>>> internal concurrency situation.
>>>
>>> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>>>
>>> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>> Reported-by: M A Young <m.a.young@durham.ac.uk>
>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>>
>>
>> So it looks like this path gets called from a number of other places in xl:
>>
>> libxl_postfork_child_noexec() is called by xl.c:postfork().
>>
>> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
>> autoconnect_console(), and do_daemonize().
>>
>> do_daemonize() is called during "xl create", and "xl devd".
>>
>> Was this deadlock not triggered for those, or was it triggered and nobody
>> noticed?
>
> In any case, I do think we need to fix this; the main question is, do
> we need to delay the release a bit further to make sure it gets
> sufficient testing?

Also,  it would be nice to get a Tested-by: from someone using it with
libvirt (before the release at least, if not before the check-in).

Jim / Dario?

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 15:17   ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc George Dunlap
  2014-02-24 15:47     ` George Dunlap
@ 2014-02-24 15:56     ` Ian Jackson
  2014-02-24 16:28       ` George Dunlap
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 15:56 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Ian Campbell, M A Young

George Dunlap writes ("Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc."):
> So it looks like this path gets called from a number of other places in xl:
> 
> libxl_postfork_child_noexec() is called by xl.c:postfork().

In the old code the deadlock only happens when
ctx->sigchld_user_registered (for the ctx passed to
libxl_postfork_child_noexec).

Because xl uses .chldowner = libxl_sigchld_owner_libxl rather than
libxl_sigchld_owner_libxl_always, sigchld_user_registered is only true
when libxl actually has a child process.

In the single-threaded synchronous model used by xl for its
long_running operations (ie always passing ao_how==0), libxl only has
children _during_ the libxl call, when libxl calls back to the
application with a progress callback.

That's what happens in the pygrub case: xl needs to restart the
console viewer.

> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(), 
> autoconnect_console(), and do_daemonize().

osstest doesn't use `xl create -c' so they weren't tested anyway.
But it also works just fine in the non-pygrub case.

Likewise `xl create -V' (autoconnect_vncviewer) works too.

> do_daemonize() is called during "xl create", and "xl devd".

These work without the bugfix.

> Was this deadlock not triggered for those, or was it triggered and 
> nobody noticed?

Mostly, the former.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] tools/console: reset tty when xenconsole fails
  2014-02-24 15:26       ` [PATCH] tools/console: reset tty when xenconsole fails Ian Jackson
  2014-02-24 15:42         ` Ian Campbell
@ 2014-02-24 16:17         ` George Dunlap
  1 sibling, 0 replies; 26+ messages in thread
From: George Dunlap @ 2014-02-24 16:17 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Ian Campbell, M A Young

On 02/24/2014 03:26 PM, Ian Jackson wrote:
> If xenconsole (the client program) fails, it calls err.  This would
> previously neglect to reset the user's terminal to sanity.  Use atexit
> to do so.
>
> This routinely happens in Xen 4.4 RC5 with pygrub because something
> writes the value "" to the tty xenstore key when using xenconsole.
> The cause of this is not yet known, but after this patch it just
> results in a harmless error message.
>
> Reported-by: M A Young <m.a.young@durham.ac.uk>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

I'm pretty sure this has been here for quite a while -- possibly since 
4.3.  I've been working around it for some time, at any rate. :-)  So I 
think at this point we should save this for 4.4.1.

  -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 15:56     ` Ian Jackson
@ 2014-02-24 16:28       ` George Dunlap
  2014-02-24 17:09         ` Ian Jackson
  0 siblings, 1 reply; 26+ messages in thread
From: George Dunlap @ 2014-02-24 16:28 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell, M A Young

On Mon, Feb 24, 2014 at 3:56 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc."):
>> So it looks like this path gets called from a number of other places in xl:
>>
>> libxl_postfork_child_noexec() is called by xl.c:postfork().
>
> In the old code the deadlock only happens when
> ctx->sigchld_user_registered (for the ctx passed to
> libxl_postfork_child_noexec).
>
> Because xl uses .chldowner = libxl_sigchld_owner_libxl rather than
> libxl_sigchld_owner_libxl_always, sigchld_user_registered is only true
> when libxl actually has a child process.
>
> In the single-threaded synchronous model used by xl for its
> long_running operations (ie always passing ao_how==0), libxl only has
> children _during_ the libxl call, when libxl calls back to the
> application with a progress callback.
>
> That's what happens in the pygrub case: xl needs to restart the
> console viewer.

Right.  Having chatted f2f with Ian, and looked through the old and
new code, I'm reasonably convinced that only the broken case can
actually be affected.

So:

Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>

We're also going to do an RC6 -- please do a smoke-test if you get a
chance.  (I'll send out a separate e-mail to the list as well.)

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 16:28       ` George Dunlap
@ 2014-02-24 17:09         ` Ian Jackson
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2014-02-24 17:09 UTC (permalink / raw)
  To: stefano.stabellini; +Cc: George Dunlap, xen-devel

George Dunlap writes ("Re: [Xen-devel] [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc."):
> We're also going to do an RC6 -- please do a smoke-test if you get a
> chance.  (I'll send out a separate e-mail to the list as well.)

Stefano, I wrote on IRC:

16:35 <Diziet> stefano_s: Can you make a tag for 4.4.0-rc6 at the same
               point as -rc5 in qemu-xen-upstream-4.4-testing.git (or
               whatever it's called) please ?
16:36 <Diziet> (and push the tag to both the master and staging/ of that tree)

It seems you're busy on a conference call so I'm going to do this
myself.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-24 15:49       ` George Dunlap
@ 2014-02-25  5:17         ` Jim Fehlig
  2014-02-25 13:32           ` Ian Jackson
  2014-02-27 17:05           ` Jim Fehlig
  0 siblings, 2 replies; 26+ messages in thread
From: Jim Fehlig @ 2014-02-25  5:17 UTC (permalink / raw)
  To: George Dunlap
  Cc: Dario Faggioli, xen-devel, Ian Jackson, Ian Campbell, M A Young

George Dunlap wrote:
> On Mon, Feb 24, 2014 at 3:47 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>   
>> On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
>> <george.dunlap@eu.citrix.com> wrote:
>>     
>>> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>>       
>>>> libxl_postfork_child_noexec would nestedly reaquire the non-recursive
>>>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>>>> The result on Linux is that the process always deadlocks before
>>>> returning from this function.
>>>>
>>>> This is used by xl's console child.  So, the ultimate effect is that
>>>> xl with pygrub does not manage to connect to the pygrub console.
>>>> This beahviour was reported by Michael Young in Xen 4.4.0 RC5.
>>>>
>>>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>>>> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
>>>> documented to suffice if called only on one ctx.  So deregistering the
>>>> ctx it's called on is not sufficient.  Instead, we need a new approach
>>>> which discards the whole sigchld_user list and unconditionally removes
>>>> our SIGCHLD handler if we had one.
>>>>
>>>> Prompted by this, clarify the semantics of
>>>> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
>>>> "quickly" by explaining what operations are not permitted; and
>>>> document the fact that the function doesn't reclaim the resources in
>>>> the ctxs.
>>>>
>>>> And add a comment in libxl_postfork_child_noexec explaining the
>>>> internal concurrency situation.
>>>>
>>>> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>>>>
>>>> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>> Reported-by: M A Young <m.a.young@durham.ac.uk>
>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>>>>         
>>> So it looks like this path gets called from a number of other places in xl:
>>>
>>> libxl_postfork_child_noexec() is called by xl.c:postfork().
>>>
>>> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
>>> autoconnect_console(), and do_daemonize().
>>>
>>> do_daemonize() is called during "xl create", and "xl devd".
>>>
>>> Was this deadlock not triggered for those, or was it triggered and nobody
>>> noticed?
>>>       
>> In any case, I do think we need to fix this; the main question is, do
>> we need to delay the release a bit further to make sure it gets
>> sufficient testing?
>>     
>
> Also,  it would be nice to get a Tested-by: from someone using it with
> libvirt (before the release at least, if not before the check-in).
>
> Jim / Dario?
>   

I'll update my test system to rc6 tomorrow and restart my tests.

FYI, the tests were running over the weekend on rc5 + libvirt 1.2.2
rc1.  Over 25,000 domains started, shutdown, created, saved, restored,
etc. with no problems noted.

Regards,
Jim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-25  5:17         ` Jim Fehlig
@ 2014-02-25 13:32           ` Ian Jackson
  2014-02-27 17:05           ` Jim Fehlig
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2014-02-25 13:32 UTC (permalink / raw)
  To: Jim Fehlig
  Cc: George Dunlap, Dario Faggioli, xen-devel, Ian Campbell, M A Young

Jim Fehlig writes ("Re: [Xen-devel] [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc."):
> I'll update my test system to rc6 tomorrow and restart my tests.

Thanks.  To be honest, I'm very confident that the change between rc5
and rc6 won't break libvirt because the changes to libxl are entirely
confined to libxl_postfork_child_noexec which is not called by
libvirt.

> FYI, the tests were running over the weekend on rc5 + libvirt 1.2.2
> rc1.  Over 25,000 domains started, shutdown, created, saved, restored,
> etc. with no problems noted.

Great.  Thanks a lot for your testing and bug reports (and sorry for
the bugs).

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-25  5:17         ` Jim Fehlig
  2014-02-25 13:32           ` Ian Jackson
@ 2014-02-27 17:05           ` Jim Fehlig
  2014-02-27 17:08             ` George Dunlap
  1 sibling, 1 reply; 26+ messages in thread
From: Jim Fehlig @ 2014-02-27 17:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Dario Faggioli, xen-devel, Ian Jackson, Ian Campbell, M A Young

Jim Fehlig wrote:
> George Dunlap wrote:
>   
>> Also,  it would be nice to get a Tested-by: from someone using it with
>> libvirt (before the release at least, if not before the check-in).
>>
>> Jim / Dario?
>>   
>>     
>
> I'll update my test system to rc6 tomorrow and restart my tests.
>   

FYI, the libvirt-based tests have been running for 18 hours with no
problems.  I need to stop them now to use the machine for other
purposes, but will start a weekend run tomorrow.  I know this patch is
already committed, but nonetheless

    Tested-by: Jim Fehlig <jfehlig@suse.com>

Regards,
Jim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
  2014-02-27 17:05           ` Jim Fehlig
@ 2014-02-27 17:08             ` George Dunlap
  0 siblings, 0 replies; 26+ messages in thread
From: George Dunlap @ 2014-02-27 17:08 UTC (permalink / raw)
  To: Jim Fehlig
  Cc: Dario Faggioli, xen-devel, Ian Jackson, Ian Campbell, M A Young

On 02/27/2014 05:05 PM, Jim Fehlig wrote:
> Jim Fehlig wrote:
>> George Dunlap wrote:
>>    
>>> Also,  it would be nice to get a Tested-by: from someone using it with
>>> libvirt (before the release at least, if not before the check-in).
>>>
>>> Jim / Dario?
>>>    
>>>      
>> I'll update my test system to rc6 tomorrow and restart my tests.
>>    
> FYI, the libvirt-based tests have been running for 18 hours with no
> problems.  I need to stop them now to use the machine for other
> purposes, but will start a weekend run tomorrow.  I know this patch is
> already committed, but nonetheless
>
>      Tested-by: Jim Fehlig <jfehlig@suse.com>

Awesome, thanks, Jim.

  -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] libxl: Fix deadlock with pygrub
  2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
                   ` (3 preceding siblings ...)
  2014-02-24 14:49 ` [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Campbell
@ 2014-03-13 13:59 ` Ian Campbell
  2014-03-13 18:09   ` Ian Jackson
  4 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2014-03-13 13:59 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, xen-devel, M A Young

On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> 2-3 should wait for 4.4.1 (coming via unstable in the usual way).

Applied to staging.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] libxl: Fix deadlock with pygrub
  2014-03-13 13:59 ` Ian Campbell
@ 2014-03-13 18:09   ` Ian Jackson
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2014-03-13 18:09 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel, M A Young

Ian Campbell writes ("Re: [PATCH 0/3] libxl: Fix deadlock with pygrub"):
> On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> > 2-3 should wait for 4.4.1 (coming via unstable in the usual way).
> 
> Applied to staging.

Thanks.  I have added them to my backport list.

Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] libxl: Fix deadlock with pygrub
  2014-02-24 14:49 ` [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Campbell
@ 2014-04-04 15:06   ` Ian Jackson
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Jackson @ 2014-04-04 15:06 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, xen-devel, M A Young

Ian Campbell writes ("Re: [PATCH 0/3] libxl: Fix deadlock with pygrub"):
> On Mon, 2014-02-24 at 14:19 +0000, Ian Jackson wrote:
> > So I think patch 1 should go into 4.4.0 after review.
> > 2-3 should wait for 4.4.1 (coming via unstable in the usual way).
> 
> I agree with your logic and your conclusion.

I have now backported these two changes
  libxl: Hold the atfork lock while closing carefd
  libxl: Fix carefd lock leak in save callout
to 4.4, 4.3 and 4.2.  (A trivial cherry pick in each case.)

They are not applicable to 4.1.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-04-04 15:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-24 14:19 [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Jackson
2014-02-24 14:19 ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc Ian Jackson
2014-02-24 14:45   ` Ian Campbell
2014-02-24 14:53     ` M A Young
2014-02-24 14:55       ` Ian Campbell
2014-02-24 15:26       ` [PATCH] tools/console: reset tty when xenconsole fails Ian Jackson
2014-02-24 15:42         ` Ian Campbell
2014-02-24 16:17         ` George Dunlap
2014-02-24 15:17   ` [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc George Dunlap
2014-02-24 15:47     ` George Dunlap
2014-02-24 15:49       ` George Dunlap
2014-02-25  5:17         ` Jim Fehlig
2014-02-25 13:32           ` Ian Jackson
2014-02-27 17:05           ` Jim Fehlig
2014-02-27 17:08             ` George Dunlap
2014-02-24 15:56     ` Ian Jackson
2014-02-24 16:28       ` George Dunlap
2014-02-24 17:09         ` Ian Jackson
2014-02-24 14:19 ` [PATCH 2/3] libxl: Hold the atfork lock while closing carefd Ian Jackson
2014-02-24 14:47   ` Ian Campbell
2014-02-24 14:19 ` [PATCH 3/3] libxl: Fix carefd lock leak in save callout Ian Jackson
2014-02-24 14:48   ` Ian Campbell
2014-02-24 14:49 ` [PATCH 0/3] libxl: Fix deadlock with pygrub Ian Campbell
2014-04-04 15:06   ` Ian Jackson
2014-03-13 13:59 ` Ian Campbell
2014-03-13 18:09   ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.