xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications
@ 2020-01-13 17:08 Ian Jackson
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered Ian Jackson
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

The meat here, including a description of the bug, is in:
  libxl: event: Fix hang when mixing blocking and eventy calls

Re v1 I wrote:
  I suggest we try to convince ourselves of its correctness
  via a second round of code review.

I put this into practice by writing an informal proof of correctness.
This found a bug, the fixing of which was not entirely trivial.

George tells me he tested v1 of this series.  As with v1, I have
compiled this v2 but not executed it.

Ian Jackson (10):
  libxl: event: Rename poller.fds_changed to .fds_deregistered
  libxl: event: Rename ctx.pollers_fd_changed to .pollers_active
  libxl: event: Introduce CTX_UNLOCK_EGC_FREE
  libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc
  libxl: event: Make libxl__poller_wakeup take a gc, not an egc
  libxl: event: Fix hang when mixing blocking and eventy calls
  libxl: event: poller pipe optimisation
  libxl: event: Break out baton_wake
  libxl: event: Fix possible hang with libxl_osevent_beforepoll
  libxl: event: Move poller pipe emptying to the end of afterpoll

 tools/libxl/libxl.c          |   4 +-
 tools/libxl/libxl_aoutils.c  |   2 +-
 tools/libxl/libxl_disk.c     |   4 +-
 tools/libxl/libxl_domain.c   |   2 +-
 tools/libxl/libxl_event.c    | 286 +++++++++++++++++++++++++++++++++++--------
 tools/libxl/libxl_fork.c     |  17 ++-
 tools/libxl/libxl_internal.h |  54 +++++---
 7 files changed, 290 insertions(+), 79 deletions(-)

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 11:07   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active Ian Jackson
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

This is only for deregistration.  We are going to add another variable
for new events, with different semantics, and this overly-general name
will become confusing.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    | 8 ++++----
 tools/libxl/libxl_internal.h | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index aa8b7d1945..1210c1bfb3 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -239,7 +239,7 @@ void libxl__ev_fd_deregister(libxl__gc *gc, libxl__ev_fd *ev)
     ev->fd = -1;
 
     LIBXL_LIST_FOREACH(poller, &CTX->pollers_fds_changed, fds_changed_entry)
-        poller->fds_changed = 1;
+        poller->fds_deregistered = 1;
 
  out:
     CTX_UNLOCK;
@@ -1120,7 +1120,7 @@ static int beforepoll_internal(libxl__gc *gc, libxl__poller *poller,
 
     *nfds_io = used;
 
-    poller->fds_changed = 0;
+    poller->fds_deregistered = 0;
 
     libxl__ev_time *etime = LIBXL_TAILQ_FIRST(&CTX->etimes);
     if (etime) {
@@ -1186,7 +1186,7 @@ static int afterpoll_check_fd(libxl__poller *poller,
             /* again, stale slot entry */
             continue;
 
-        assert(poller->fds_changed || !(fds[slot].revents & POLLNVAL));
+        assert(poller->fds_deregistered || !(fds[slot].revents & POLLNVAL));
 
         /* we mask in case requested events have changed */
         int slot_revents = fds[slot].revents & events;
@@ -1626,7 +1626,7 @@ int libxl__poller_init(libxl__gc *gc, libxl__poller *p)
     int rc;
     p->fd_polls = 0;
     p->fd_rindices = 0;
-    p->fds_changed = 0;
+    p->fds_deregistered = 0;
 
     rc = libxl__pipe_nonblock(CTX, p->wakeup_pipe);
     if (rc) goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ba8c9b41ab..c5b71d15f0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -629,14 +629,14 @@ struct libxl__poller {
     /*
      * We also use the poller to record whether any fds have been
      * deregistered since we entered poll.  Each poller which is not
-     * idle is on the list pollers_fds_changed.  fds_changed is
+     * idle is on the list pollers_fds_changed.  fds_deregistered is
      * cleared by beforepoll, and tested by afterpoll.  Whenever an fd
-     * event is deregistered, we set the fds_changed of all non-idle
+     * event is deregistered, we set the fds_deregistered of all non-idle
      * pollers.  So afterpoll can tell whether any POLLNVAL is
      * plausibly due to an fd being closed and reopened.
      */
     LIBXL_LIST_ENTRY(libxl__poller) fds_changed_entry;
-    bool fds_changed;
+    bool fds_deregistered;
 };
 
 struct libxl__gc {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 11:08   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE Ian Jackson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

We are going to use this a bit more widely.  Make the name more
general.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl.c          | 4 ++--
 tools/libxl/libxl_event.c    | 8 ++++----
 tools/libxl/libxl_internal.h | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index a0d84281d0..f60fd3e4fd 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -48,7 +48,7 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
     ctx->poller_app = 0;
     LIBXL_LIST_INIT(&ctx->pollers_event);
     LIBXL_LIST_INIT(&ctx->pollers_idle);
-    LIBXL_LIST_INIT(&ctx->pollers_fds_changed);
+    LIBXL_LIST_INIT(&ctx->pollers_active);
 
     LIBXL_LIST_INIT(&ctx->efds);
     LIBXL_TAILQ_INIT(&ctx->etimes);
@@ -177,7 +177,7 @@ int libxl_ctx_free(libxl_ctx *ctx)
     libxl__poller_put(ctx, ctx->poller_app);
     ctx->poller_app = NULL;
     assert(LIBXL_LIST_EMPTY(&ctx->pollers_event));
-    assert(LIBXL_LIST_EMPTY(&ctx->pollers_fds_changed));
+    assert(LIBXL_LIST_EMPTY(&ctx->pollers_active));
     libxl__poller *poller, *poller_tmp;
     LIBXL_LIST_FOREACH_SAFE(poller, &ctx->pollers_idle, entry, poller_tmp) {
         libxl__poller_dispose(poller);
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 1210c1bfb3..5b12a45e70 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -238,7 +238,7 @@ void libxl__ev_fd_deregister(libxl__gc *gc, libxl__ev_fd *ev)
     LIBXL_LIST_REMOVE(ev, entry);
     ev->fd = -1;
 
-    LIBXL_LIST_FOREACH(poller, &CTX->pollers_fds_changed, fds_changed_entry)
+    LIBXL_LIST_FOREACH(poller, &CTX->pollers_active, active_entry)
         poller->fds_deregistered = 1;
 
  out:
@@ -1663,15 +1663,15 @@ libxl__poller *libxl__poller_get(libxl__gc *gc)
         }
     }
 
-    LIBXL_LIST_INSERT_HEAD(&CTX->pollers_fds_changed, p,
-                           fds_changed_entry);
+    LIBXL_LIST_INSERT_HEAD(&CTX->pollers_active, p,
+                           active_entry);
     return p;
 }
 
 void libxl__poller_put(libxl_ctx *ctx, libxl__poller *p)
 {
     if (!p) return;
-    LIBXL_LIST_REMOVE(p, fds_changed_entry);
+    LIBXL_LIST_REMOVE(p, active_entry);
     LIBXL_LIST_INSERT_HEAD(&ctx->pollers_idle, p, entry);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c5b71d15f0..581d64b99c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -629,13 +629,13 @@ struct libxl__poller {
     /*
      * We also use the poller to record whether any fds have been
      * deregistered since we entered poll.  Each poller which is not
-     * idle is on the list pollers_fds_changed.  fds_deregistered is
+     * idle is on the list pollers_active.  fds_deregistered is
      * cleared by beforepoll, and tested by afterpoll.  Whenever an fd
      * event is deregistered, we set the fds_deregistered of all non-idle
      * pollers.  So afterpoll can tell whether any POLLNVAL is
      * plausibly due to an fd being closed and reopened.
      */
-    LIBXL_LIST_ENTRY(libxl__poller) fds_changed_entry;
+    LIBXL_LIST_ENTRY(libxl__poller) active_entry;
     bool fds_deregistered;
 };
 
@@ -678,7 +678,7 @@ struct libxl__ctx {
 
     libxl__poller *poller_app; /* libxl_osevent_beforepoll and _afterpoll */
     LIBXL_LIST_HEAD(, libxl__poller) pollers_event, pollers_idle;
-    LIBXL_LIST_HEAD(, libxl__poller) pollers_fds_changed;
+    LIBXL_LIST_HEAD(, libxl__poller) pollers_active;
 
     LIBXL_SLIST_HEAD(libxl__osevent_hook_nexi, libxl__osevent_hook_nexus)
         hook_fd_nexi_idle, hook_timeout_nexi_idle;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered Ian Jackson
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 11:08   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 04/10] libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc Ian Jackson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

This is a very common exit pattern.  We are going to want to change
this pattern.  So we should make it into a macro of its own.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    | 18 ++++++------------
 tools/libxl/libxl_fork.c     |  6 ++----
 tools/libxl/libxl_internal.h |  2 ++
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 5b12a45e70..be37e12bb0 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1152,8 +1152,7 @@ int libxl_osevent_beforepoll(libxl_ctx *ctx, int *nfds_io,
     CTX_LOCK;
     int rc = beforepoll_internal(gc, ctx->poller_app,
                                  nfds_io, fds, timeout_upd, now);
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
     return rc;
 }
 
@@ -1305,8 +1304,7 @@ void libxl_osevent_afterpoll(libxl_ctx *ctx, int nfds, const struct pollfd *fds,
     EGC_INIT(ctx);
     CTX_LOCK;
     afterpoll_internal(egc, ctx->poller_app, nfds, fds, now);
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
 }
 
 /*
@@ -1342,8 +1340,7 @@ void libxl_osevent_occurred_fd(libxl_ctx *ctx, void *for_libxl,
     fd_occurs(egc, ev, revents_ign);
 
  out:
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
 }
 
 void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
@@ -1365,8 +1362,7 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
     time_occurs(egc, ev, ERROR_TIMEDOUT);
 
  out:
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
 }
 
 void libxl__event_disaster(libxl__egc *egc, const char *msg, int errnoval,
@@ -1546,8 +1542,7 @@ int libxl_event_check(libxl_ctx *ctx, libxl_event **event_r,
     EGC_INIT(ctx);
     CTX_LOCK;
     int rc = event_check_internal(egc, event_r, typemask, pred, pred_user);
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
     return rc;
 }
 
@@ -1772,8 +1767,7 @@ int libxl_event_wait(libxl_ctx *ctx, libxl_event **event_r,
  out:
     libxl__poller_put(ctx, poller);
 
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
     return rc;
 }
 
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 0f1b6b518c..cf170b9085 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -483,8 +483,7 @@ int libxl_childproc_reaped(libxl_ctx *ctx, pid_t pid, int status)
     assert(CTX->childproc_hooks->chldowner
            == libxl_sigchld_owner_mainloop);
     int rc = childproc_reaped(egc, pid, status);
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
     return rc;
 }
 
@@ -529,8 +528,7 @@ void libxl_childproc_sigchld_occurred(libxl_ctx *ctx)
     assert(CTX->childproc_hooks->chldowner
            == libxl_sigchld_owner_mainloop);
     childproc_checkall(egc);
-    CTX_UNLOCK;
-    EGC_FREE;
+    CTX_UNLOCK_EGC_FREE;
 }
 
 static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 581d64b99c..983fffac7a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2363,6 +2363,8 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
 
 #define EGC_FREE           libxl__egc_cleanup(egc)
 
+#define CTX_UNLOCK_EGC_FREE  do{ CTX_UNLOCK; EGC_FREE; }while(0)
+
 
 /*
  * Machinery for asynchronous operations ("ao")
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 04/10] libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (2 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup " Ian Jackson
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

We are going to want to change libxl__poller_wakeup to take a gc.

In theory there is a risk here that it would be called inappropriately
in a future patch but this seems unlikely.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
---
 tools/libxl/libxl_aoutils.c  |  2 +-
 tools/libxl/libxl_disk.c     |  4 ++--
 tools/libxl/libxl_domain.c   |  2 +-
 tools/libxl/libxl_event.c    | 21 ++++++++++-----------
 tools/libxl/libxl_fork.c     | 11 ++++++-----
 tools/libxl/libxl_internal.h | 10 +++++-----
 6 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index e24e4eed53..1be858c93c 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -282,7 +282,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
                 hupchk.revents = 0;
                 r = poll(&hupchk, 1, 0);
                 if (r < 0)
-                    LIBXL__EVENT_DISASTER(egc,
+                    LIBXL__EVENT_DISASTER(gc,
      "unexpected failure polling fd for datacopier eof hup check",
                                   errno, 0);
                 if (datacopier_pollhup_handled(egc, dc, fd, hupchk.revents, 0))
diff --git a/tools/libxl/libxl_disk.c b/tools/libxl/libxl_disk.c
index 64a6691424..a463334130 100644
--- a/tools/libxl/libxl_disk.c
+++ b/tools/libxl/libxl_disk.c
@@ -33,7 +33,7 @@ static void disk_eject_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w,
         return;
 
     if (libxl__xs_printf(gc, XBT_NULL, wpath, "")) {
-        LIBXL__EVENT_DISASTER(egc, "xs_write failed acknowledging eject",
+        LIBXL__EVENT_DISASTER(gc, "xs_write failed acknowledging eject",
                               errno, LIBXL_EVENT_TYPE_DISK_EJECT);
         return;
     }
@@ -43,7 +43,7 @@ static void disk_eject_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w,
 
     rc = libxl__xs_read_checked(gc, XBT_NULL, evg->be_ptr_path, &backend);
     if (rc) {
-        LIBXL__EVENT_DISASTER(egc, "xs_read failed reading be_ptr_path",
+        LIBXL__EVENT_DISASTER(gc, "xs_read failed reading be_ptr_path",
                               errno, LIBXL_EVENT_TYPE_DISK_EJECT);
         return;
     }
diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index 5714501778..b59cc65750 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -892,7 +892,7 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w,
 
         rc = xc_domain_getinfolist(CTX->xch, evg->domid, nentries, domaininfos);
         if (rc == -1) {
-            LIBXL__EVENT_DISASTER(egc, "xc_domain_getinfolist failed while"
+            LIBXL__EVENT_DISASTER(gc, "xc_domain_getinfolist failed while"
                                   " processing @releaseDomain watch event",
                                   errno, 0);
             goto out;
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index be37e12bb0..16e6786889 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -261,7 +261,7 @@ short libxl__fd_poll_recheck(libxl__egc *egc, int fd, short events) {
             break;
         assert(r<0);
         if (errno != EINTR) {
-            LIBXL__EVENT_DISASTER(egc, "failed poll to check for fd", errno, 0);
+            LIBXL__EVENT_DISASTER(gc, "failed poll to check for fd", errno, 0);
             return 0;
         }
     }
@@ -509,14 +509,14 @@ static void watchfd_callback(libxl__egc *egc, libxl__ev_fd *ev,
     EGC_GC;
 
     if (revents & (POLLERR|POLLHUP))
-        LIBXL__EVENT_DISASTER(egc, "unexpected poll event on watch fd", 0, 0);
+        LIBXL__EVENT_DISASTER(gc, "unexpected poll event on watch fd", 0, 0);
 
     for (;;) {
         char **event = xs_check_watch(CTX->xsh);
         if (!event) {
             if (errno == EAGAIN) break;
             if (errno == EINTR) continue;
-            LIBXL__EVENT_DISASTER(egc, "cannot check/read watches", errno, 0);
+            LIBXL__EVENT_DISASTER(gc, "cannot check/read watches", errno, 0);
             return;
         }
 
@@ -705,7 +705,7 @@ static int evtchn_revents_check(libxl__egc *egc, int revents)
 
     if (revents & ~POLLIN) {
         LOG(ERROR, "unexpected poll event on event channel fd: %x", revents);
-        LIBXL__EVENT_DISASTER(egc,
+        LIBXL__EVENT_DISASTER(gc,
                    "unexpected poll event on event channel fd", 0, 0);
         libxl__ev_fd_deregister(gc, &CTX->evtchn_efd);
         return ERROR_FAIL;
@@ -746,7 +746,7 @@ static void evtchn_fd_callback(libxl__egc *egc, libxl__ev_fd *ev,
         if (port < 0) {
             if (errno == EAGAIN)
                 break;
-            LIBXL__EVENT_DISASTER(egc,
+            LIBXL__EVENT_DISASTER(gc,
      "unexpected failure fetching occurring event port number from evtchn",
                                   errno, 0);
             return;
@@ -966,7 +966,7 @@ static void domaindeathcheck_callback(libxl__egc *egc, libxl__ev_xswatch *w,
     libxl__domaindeathcheck_stop(gc,dc);
 
     if (errno!=ENOENT) {
-        LIBXL__EVENT_DISASTER(egc,"failed to read xenstore"
+        LIBXL__EVENT_DISASTER(gc,"failed to read xenstore"
                               " for domain detach check", errno, 0);
         return;
     }
@@ -1279,7 +1279,7 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller,
 
     if (afterpoll_check_fd(poller,fds,nfds, poller->wakeup_pipe[0],POLLIN)) {
         int e = libxl__self_pipe_eatall(poller->wakeup_pipe[0]);
-        if (e) LIBXL__EVENT_DISASTER(egc, "read wakeup", e, 0);
+        if (e) LIBXL__EVENT_DISASTER(gc, "read wakeup", e, 0);
     }
 
     for (;;) {
@@ -1365,12 +1365,10 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
     CTX_UNLOCK_EGC_FREE;
 }
 
-void libxl__event_disaster(libxl__egc *egc, const char *msg, int errnoval,
+void libxl__event_disaster(libxl__gc *gc, const char *msg, int errnoval,
                            libxl_event_type type /* may be 0 */,
                            const char *file, int line, const char *func)
 {
-    EGC_GC;
-
     libxl__log(CTX, XTL_CRITICAL, errnoval, file, line, func, INVALID_DOMID,
                "DISASTER in event loop: %s%s%s%s",
                msg,
@@ -1672,8 +1670,9 @@ void libxl__poller_put(libxl_ctx *ctx, libxl__poller *p)
 
 void libxl__poller_wakeup(libxl__egc *egc, libxl__poller *p)
 {
+    EGC_GC;
     int e = libxl__self_pipe_wakeup(p->wakeup_pipe[1]);
-    if (e) LIBXL__EVENT_DISASTER(egc, "cannot poke watch pipe", e, 0);
+    if (e) LIBXL__EVENT_DISASTER(gc, "cannot poke watch pipe", e, 0);
 }
 
 /*
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index cf170b9085..9a4709b9a4 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -211,6 +211,7 @@ int libxl__carefd_fd(const libxl__carefd *cf)
 /* Like waitpid(,,WNOHANG) but handles all errors except ECHILD. */
 static pid_t checked_waitpid(libxl__egc *egc, pid_t want, int *status)
 {
+    EGC_GC;
     for (;;) {
         pid_t got = waitpid(want, status, WNOHANG);
         if (got != -1)
@@ -219,7 +220,7 @@ static pid_t checked_waitpid(libxl__egc *egc, pid_t want, int *status)
             return got;
         if (errno == EINTR)
             continue;
-        LIBXL__EVENT_DISASTER(egc, "waitpid() failed", errno, 0);
+        LIBXL__EVENT_DISASTER(gc, "waitpid() failed", errno, 0);
         return 0;
     }
 }
@@ -507,7 +508,7 @@ static void childproc_checkall(libxl__egc *egc)
     found:
         if (got == -1) {
             LIBXL__EVENT_DISASTER
-                (egc, "waitpid() gave ECHILD but we have a child",
+                (gc, "waitpid() gave ECHILD but we have a child",
                  ECHILD, 0);
             /* it must have finished but we don't know its status */
             status = 255<<8; /* no wait.h macro for this! */
@@ -545,14 +546,14 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
 
     if (revents & ~POLLIN) {
         LOG(ERROR, "unexpected poll event 0x%x on SIGCHLD self pipe", revents);
-        LIBXL__EVENT_DISASTER(egc,
+        LIBXL__EVENT_DISASTER(gc,
                               "unexpected poll event on SIGCHLD self pipe",
                               0, 0);
     }
     assert(revents & POLLIN);
 
     int e = libxl__self_pipe_eatall(selfpipe);
-    if (e) LIBXL__EVENT_DISASTER(egc, "read sigchld pipe", e, 0);
+    if (e) LIBXL__EVENT_DISASTER(gc, "read sigchld pipe", e, 0);
 
     if (CTX->childproc_hooks->chldowner
         == libxl_sigchld_owner_libxl_always_selective_reap) {
@@ -581,7 +582,7 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
                              " libxl_childproc_hooks->reaped_callback"
                              " (for pid=%lu, status=%d; error code %d)",
                              (unsigned long)pid, status, rc);
-                    LIBXL__EVENT_DISASTER(egc, disasterbuf, 0, 0);
+                    LIBXL__EVENT_DISASTER(gc, disasterbuf, 0, 0);
                     return;
                 }
             } else {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 983fffac7a..328ecf3e1e 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1281,8 +1281,8 @@ _hidden libxl_event *libxl__event_new(libxl__egc*, libxl_event_type,
 /*
  * In general, call this via the macro LIBXL__EVENT_DISASTER.
  *
- * Event-generating functions may call this if they might have wanted
- * to generate an event (either an internal one ie a
+ * Event-generating functions, or ao machinery, may call this if they
+ * might have wanted to generate an event (either an internal one ie a
  * libxl__ev_FOO_callback or an application event), but are prevented
  * from doing so due to eg lack of memory.
  *
@@ -1290,12 +1290,12 @@ _hidden libxl_event *libxl__event_new(libxl__egc*, libxl_event_type,
  * then crash, although it may fail (and henceforth leave things in a
  * state where many or all calls fail).
  */
-_hidden void libxl__event_disaster(libxl__egc*, const char *msg, int errnoval,
+_hidden void libxl__event_disaster(libxl__gc*, const char *msg, int errnoval,
                                    libxl_event_type type /* may be 0 */,
                                    const char *file, int line,
                                    const char *func);
-#define LIBXL__EVENT_DISASTER(egc, msg, errnoval, type) \
-    libxl__event_disaster(egc, msg, errnoval, type, __FILE__,__LINE__,__func__)
+#define LIBXL__EVENT_DISASTER(gc, msg, errnoval, type) \
+    libxl__event_disaster(gc, msg, errnoval, type, __FILE__,__LINE__,__func__)
 
 
 /* Fills in, or disposes of, the resources held by, a poller whose
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (3 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 04/10] libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 12:30   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls Ian Jackson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

We are going to want to call this in the following situation:

 * We have just set up an ao, which is to call back - so a
   non-synchronous one.  It ought not to call the application
   back right away, so no egc.

 * There is a libxl thread blocking somewhere but it is using
   using an out of date fd or timeout set, which does not take into
   account the ao we have just started.

 * We try to wake that thread up, but libxl__poller_wakeup fails.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
---
 tools/libxl/libxl_event.c    | 7 +++----
 tools/libxl/libxl_internal.h | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 16e6786889..268a5da120 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1477,7 +1477,7 @@ void libxl__event_occurred(libxl__egc *egc, libxl_event *event)
         libxl__poller *poller;
         LIBXL_TAILQ_INSERT_TAIL(&CTX->occurred, event, link);
         LIBXL_LIST_FOREACH(poller, &CTX->pollers_event, entry)
-            libxl__poller_wakeup(egc, poller);
+            libxl__poller_wakeup(gc, poller);
     }
 }
 
@@ -1668,9 +1668,8 @@ void libxl__poller_put(libxl_ctx *ctx, libxl__poller *p)
     LIBXL_LIST_INSERT_HEAD(&ctx->pollers_idle, p, entry);
 }
 
-void libxl__poller_wakeup(libxl__egc *egc, libxl__poller *p)
+void libxl__poller_wakeup(libxl__gc *gc, libxl__poller *p)
 {
-    EGC_GC;
     int e = libxl__self_pipe_wakeup(p->wakeup_pipe[1]);
     if (e) LIBXL__EVENT_DISASTER(gc, "cannot poke watch pipe", e, 0);
 }
@@ -1924,7 +1923,7 @@ void libxl__ao_complete_check_progress_reports(libxl__egc *egc, libxl__ao *ao)
         assert(ao->in_initiator);
         if (!ao->constructing)
             /* don't bother with this if we're not in the event loop */
-            libxl__poller_wakeup(egc, ao->poller);
+            libxl__poller_wakeup(gc, ao->poller);
     } else if (ao->how.callback) {
         LOG(DEBUG, "ao %p: complete for callback", ao);
         LIBXL_TAILQ_INSERT_TAIL(&egc->aos_for_callback, ao, entry_for_callback);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 328ecf3e1e..b68ab218b6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1311,7 +1311,7 @@ _hidden void libxl__poller_put(libxl_ctx*, libxl__poller *p /* may be NULL */);
 
 /* Notifies whoever is polling using p that they should wake up.
  * ctx must be locked. */
-_hidden void libxl__poller_wakeup(libxl__egc *egc, libxl__poller *p);
+_hidden void libxl__poller_wakeup(libxl__gc *egc, libxl__poller *p);
 
 /* Internal to fork and child reaping machinery */
 extern const libxl_childproc_hooks libxl__childproc_default_hooks;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (4 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup " Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 13:38   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation Ian Jackson
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

If the application calls libxl with ao_how==0 and also makes calls
like _occurred, libxl will sometimes get stuck.

The bug happens as follows (for example):

  Thread A
       libxl_do_thing(,ao_how==0)
       libxl_do_thing starts, sets up some callbacks
       libxl_do_thing exit path calls AO_INPROGRESS
       libxl__ao_inprogress goes into event loop
       eventloop_iteration sleeps on:
          - do_thing's current fd set
          - sigchld pipe if applicable
          - its poller

  Thread B
       libxl_something_occurred
       the something is to do with do_thing, above
       do_thing_next_callback does some more work
       do_thing_next_callback becomes interested in fd N
       thread B returns to application

Note that nothing wakes up thread A.  A is not listening on fd N.  So
do_thing_* will not spot when fd N signals.  do_thing will not make
further timely progress.  If there is no timeout thread A will never
wake up.

The problem here occurs because thread A is waiting on an out of date
osevent set.

There is also the possibility that a thread might block waiting for
libxl osevents but outside libxl, eg if the application used
libxl_osevent_beforepoll.  We will deal with that in a moment.

See the big comment in libxl_event.c for a fairly formal correctness
argument.

This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
an egc or ao is disposed of.  Firstly egcs: in this patch we rename
libxl__egc_cleanup, which means we catch all the disposal sites.
Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
(ii) ao__inprogress and (iii) an event which completes the ao later.
(i) and (ii) we handle by adding the call to _baton.  In the case of
(iii) any such function must be an event-generating function so it has
an egc too, so it will pass on the baton when the egc is disposed.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Call libxl__egc_ao_cleanup_1_baton (renamed from __egc_cleanup) on
    all exits from ao_inprogress, even requests for async processing.
    Fixes a remaining instance of this bug (!)
    This involves disposing of ao->poller somewhat earlier.

v2: New correctness arguments in libxl_event.c comment and
    in commit message.
---
 tools/libxl/libxl_event.c    | 178 ++++++++++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_internal.h |  33 ++++++--
 2 files changed, 194 insertions(+), 17 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 268a5da120..b50d4e5074 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -37,6 +37,140 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
 
 
 /*
+ * osevent update baton handling
+ *
+ * We need the following property (the "unstale liveness property"):
+ *
+ * Whenever any thread is blocking in the libxl event loop[1], at
+ * least one thread must be using an up to date osevent set.  It is OK
+ * for all but one threads to have stale event sets, because so long
+ * as one waiting thread has the right event set, any actually
+ * interesting event will, if nothing else, wake that "right" thread
+ * up.  It will then make some progress and/or, if it exits, ensure
+ * that some other thread becomes the "right" thread.
+ *
+ * [1] TODO: Right now we are considering only the libxl event loop.
+ * We need to consider application event loop outside libxl too.
+ *
+ * Argument that our approach is sound:
+ *
+ * The issue we are concerned about is libxl sleeping on an out of
+ * date fd set, or too long a timeout, so that it doesn't make
+ * progress.  If the property above is satisfied, then if any thread
+ * is waiting in libxl at least one such thread will be waiting on a
+ * sufficient osevent set, so any relevant osevent will wake up a
+ * libxl thread which will either handle the event, or arrange that at
+ * least one other libxl thread has the right set.
+ *
+ * There are two calls to poll in libxl: one is the fd recheck, which
+ * is not blocking.  There is only the one blocking call, in
+ * eventloop_iteration.  poll runs with the ctx unlocked, so osevents
+ * might be added after it unlocks the ctx - that is what we are
+ * worried about.
+ *
+ * To demonstrate that the unstale liveness property is satisfied:
+ *
+ * We define a baton holder as follows: a libxl thread is a baton
+ * holder if
+ *   (a) it has an egc or an ao and holds the ctx lock, or
+ *   (b) it has an active non-app poller and no osevents have been
+ *       added since it released the lock, or
+ *   (c) it has an active non-app poller which has been woken
+ *       (by writing to its pipe), so it will not sleep
+ * We will maintain the invariant (the "baton invariant") that
+ * whenever there is any active poller, there is at least
+ * one baton holder.  ("non-app" means simply "not poller_app".)
+ *
+ * No thread outside libxl can have an active non-app poller: pollers
+ * are put on the active list by poller_get which is called in three
+ * places: libxl_event_wait, which puts it before returning;
+ * libxl__ao_create but only in the synchronous case, in which case
+ * the poller is put before returning; and the poller_app, during
+ * initialisation.
+ *
+ * So any time when all libxl threads are blocking (and therefore do
+ * not have the ctx lock), the non-app active pollers belong to those
+ * threads.  If at least one is a baton holder (the invariant), that
+ * thread has a good enough event set.
+ *
+ * Now we will demonstrate that the "baton invariant" is maintained:
+ *
+ * The rule is that any thread which might be the baton holder is
+ * responsible for checking that there continues to be a baton holder
+ * as needed.
+ *
+ * Firstly, consider the case when the baton holders (b) cease to be
+ * baton holders because osevents are added.
+ *
+ * There are only two kinds of osevents: timeouts and fds.  Every
+ * other internal event source reduces to one of these eventually.
+ * Both of these cases are handled (in the case of fd events, add and
+ * modify, separately), calling pollers_note_osevent_added.
+ *
+ * This walks the poller_active list, marking the active pollers
+ * osevents_added=1.  Such a poller cannot be the baton holder.  But
+ * pollers_note_osevent_added is called only from ev_* functions,
+ * which are only called from event-chain libxl code: ie, code with an
+ * ao or an egc.  So at this point we are a baton holder, and there is
+ * still a baton holder.
+ *
+ * Secondly, consider the case where baton holders (a) cease to be
+ * batton holders because they dispose of their egc or ao.  We call
+ * libxl__egc_ao_cleanup_1_baton on every exit path.  We arrange that
+ * everything that disposes of an egc or an ao checks that there is a
+ * new baton holder by calling libxl__egc_ao_cleanup_1_baton.
+ *
+ * This function handles the invariant explicitly: if we have any
+ * non-app active pollers it looks for one which is up to date (baton
+ * holder category (b)), and failing that it picks a victim to turn
+ * into the baton holder category (c) by waking it up.  (Correctness
+ * depends on this function not spotting its own thread as the
+ * baton-holder, since it is on its way to not being the baton-holder,
+ * so it must be called after the poller has been put back.)
+ *
+ * Thirdly, we must consider the case (c).  A thread in category (c)
+ * will reenter libxl when it gains the lock and necessarily then
+ * becomes a baton holder in category (a).
+ *
+ * So the "baton invariant" is maintained.  QED.
+ */
+static void pollers_note_osevent_added(libxl_ctx *ctx) {
+    libxl__poller *poller;
+    LIBXL_LIST_FOREACH(poller, &ctx->pollers_active, active_entry)
+        poller->osevents_added = 1;
+}
+
+void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
+    /* Any poller we had must have been `put' already. */
+{
+    libxl__poller *search, *wake=0;
+
+    LIBXL_LIST_FOREACH(search, &CTX->pollers_active, active_entry) {
+        if (search == CTX->poller_app)
+            /* This one is special.  We can't give it the baton. */
+            continue;
+        if (!search->osevents_added)
+            /* This poller is up to date and will wake up as needed. */
+            return;
+        if (!wake)
+            wake = search;
+    }
+
+    if (!wake)
+        /* no-one in libxl waiting for any events */
+        return;
+
+    libxl__poller_wakeup(gc, wake);
+
+    wake->osevents_added = 0;
+    /* This serves to make _1_baton idempotent.  It is OK even though
+     * that poller may currently be sleeping on only old osevents,
+     * because it is going to wake up because we've just prodded it,
+     * and it pick up new osevents on its next iteration (or pass
+     * on the baton). */
+}
+
+/*
  * The counter osevent_in_hook is used to ensure that the application
  * honours the reentrancy restriction documented in libxl_event.h.
  *
@@ -194,6 +328,7 @@ int libxl__ev_fd_register(libxl__gc *gc, libxl__ev_fd *ev,
     ev->func = func;
 
     LIBXL_LIST_INSERT_HEAD(&CTX->efds, ev, entry);
+    pollers_note_osevent_added(CTX);
 
     rc = 0;
 
@@ -214,6 +349,8 @@ int libxl__ev_fd_modify(libxl__gc *gc, libxl__ev_fd *ev, short events)
     rc = OSEVENT_HOOK(fd,modify, noop, ev->fd, &ev->nexus->for_app_reg, events);
     if (rc) goto out;
 
+    if ((events & ~ev->events))
+        pollers_note_osevent_added(CTX);
     ev->events = events;
 
     rc = 0;
@@ -315,6 +452,7 @@ static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev,
     LIBXL_TAILQ_INSERT_SORTED(&CTX->etimes, entry, ev, evsearch, /*empty*/,
                               timercmp(&ev->abs, &evsearch->abs, >));
 
+    pollers_note_osevent_added(CTX);
     return 0;
 }
 
@@ -1121,6 +1259,7 @@ static int beforepoll_internal(libxl__gc *gc, libxl__poller *poller,
     *nfds_io = used;
 
     poller->fds_deregistered = 0;
+    poller->osevents_added = 0;
 
     libxl__ev_time *etime = LIBXL_TAILQ_FIRST(&CTX->etimes);
     if (etime) {
@@ -1442,7 +1581,7 @@ static void egc_run_callbacks(libxl__egc *egc)
     }
 }
 
-void libxl__egc_cleanup(libxl__egc *egc)
+void libxl__egc_cleanup_2_ul_cb_gc(libxl__egc *egc)
 {
     EGC_GC;
     egc_run_callbacks(egc);
@@ -1752,13 +1891,15 @@ int libxl_event_wait(libxl_ctx *ctx, libxl_event **event_r,
         rc = eventloop_iteration(egc, poller);
         if (rc) goto out;
 
-        /* we unlock and cleanup the egc each time we go through this loop,
-         * so that (a) we don't accumulate garbage and (b) any events
-         * which are to be dispatched by callback are actually delivered
-         * in a timely fashion.
+        /* we unlock and cleanup the egc each time we go through this
+         * loop, so that (a) we don't accumulate garbage and (b) any
+         * events which are to be dispatched by callback are actually
+         * delivered in a timely fashion.  _1_baton will be
+         * called to pass the baton iff we actually leave; otherwise
+         * we are still carrying it.
          */
         CTX_UNLOCK;
-        libxl__egc_cleanup(egc);
+        libxl__egc_cleanup_2_ul_cb_gc(egc);
         CTX_LOCK;
     }
 
@@ -2031,14 +2172,24 @@ int libxl__ao_inprogress(libxl__ao *ao,
                  * synchronous cancellation ability. */
             }
 
+            /* The call to egc..1_baton is below, only if we are leaving. */
             CTX_UNLOCK;
-            libxl__egc_cleanup(&egc);
+            libxl__egc_cleanup_2_ul_cb_gc(&egc);
             CTX_LOCK;
         }
+
+        /* Dispose of this early so libxl__egc_ao_cleanup_1_baton
+         * doesn't mistake us for a baton-holder.  No-one much is
+         * going to look at this ao now so setting this to 0 is fine.
+         * We can't call _baton below _leave because _leave destroys
+         * our gc, which _baton needs. */
+        libxl__poller_put(CTX, ao->poller);
+        ao->poller = 0;
     } else {
         rc = 0;
     }
 
+    libxl__egc_ao_cleanup_1_baton(gc);
     ao->in_initiator = 0;
     ao__manip_leave(CTX, ao);
 
@@ -2051,6 +2202,9 @@ int libxl__ao_inprogress(libxl__ao *ao,
 static int ao__abort(libxl_ctx *ctx, libxl__ao *parent)
 /* Temporarily unlocks ctx, which must be locked exactly once on entry. */
 {
+    libxl__egc egc;
+    LIBXL_INIT_EGC(egc,ctx);
+
     int rc;
     ao__manip_enter(parent);
 
@@ -2071,9 +2225,6 @@ static int ao__abort(libxl_ctx *ctx, libxl__ao *parent)
 
     /* We keep calling abort hooks until there are none left */
     while (!LIBXL_LIST_EMPTY(&parent->abortables)) {
-        libxl__egc egc;
-        LIBXL_INIT_EGC(egc,ctx);
-
         assert(!parent->complete);
 
         libxl__ao_abortable *abrt = LIBXL_LIST_FIRST(&parent->abortables);
@@ -2086,15 +2237,20 @@ static int ao__abort(libxl_ctx *ctx, libxl__ao *parent)
                    "ao %p: abrt=%p: aborting", parent, abrt->ao);
         abrt->callback(&egc, abrt, ERROR_ABORTED);
 
+        /* The call to egc..1_baton is in the out block below. */
         libxl__ctx_unlock(ctx);
-        libxl__egc_cleanup(&egc);
+        libxl__egc_cleanup_2_ul_cb_gc(&egc);
         libxl__ctx_lock(ctx);
     }
 
     rc = 0;
 
  out:
+    libxl__egc_ao_cleanup_1_baton(&egc.gc);
     ao__manip_leave(ctx, parent);
+    /* The call to egc..2_ul_cb_gc is above.  This is sufficient
+     * because only code inside the loop adds anything to the egc, and
+     * we ensures that the egc is clean when we leave the loop. */
     return rc;
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b68ab218b6..eec4bf767d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -634,9 +634,23 @@ struct libxl__poller {
      * event is deregistered, we set the fds_deregistered of all non-idle
      * pollers.  So afterpoll can tell whether any POLLNVAL is
      * plausibly due to an fd being closed and reopened.
+     *
+     * Additionally, we record whether any fd or time event sources
+     * have been registered.  This is necessary because sometimes we
+     * need to wake up the only libxl thread stuck in
+     * eventloop_iteration so that it will pick up new fds or earlier
+     * timeouts.  osevents_added is cleared by beforepoll, and set by
+     * fd or timeout event registration.  When we are about to leave
+     * libxl (strictly, when we are about to give up an egc), we check
+     * whether there are any pollers.  If there are, then at least one
+     * of them must have osevents_added clear.  If not, we wake up the
+     * first one on the list.  Any entry on pollers_active constitutes
+     * a promise to also make this check, so the baton will never be
+     * dropped.
      */
     LIBXL_LIST_ENTRY(libxl__poller) active_entry;
     bool fds_deregistered;
+    bool osevents_added;
 };
 
 struct libxl__gc {
@@ -2350,7 +2364,10 @@ _hidden libxl_device_model_version libxl__default_device_model(libxl__gc *gc);
         LIBXL_STAILQ_INIT(&(egc).ev_immediates);        \
     } while(0)
 
-_hidden void libxl__egc_cleanup(libxl__egc *egc);
+_hidden void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc);
+  /* Passes the baton for added osevents.  See comment for
+   * osevents_added in struct libxl__poller. */
+_hidden void libxl__egc_cleanup_2_ul_cb_gc(libxl__egc *egc);
   /* Frees memory allocated within this egc's gc, and and report all
    * occurred events via callback, if applicable.  May reenter the
    * application; see restrictions above.  The ctx must be UNLOCKED. */
@@ -2361,9 +2378,11 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
     libxl__egc egc[1]; LIBXL_INIT_EGC(egc[0],ctx);      \
     EGC_GC
 
-#define EGC_FREE           libxl__egc_cleanup(egc)
-
-#define CTX_UNLOCK_EGC_FREE  do{ CTX_UNLOCK; EGC_FREE; }while(0)
+#define CTX_UNLOCK_EGC_FREE  do{                        \
+        libxl__egc_ao_cleanup_1_baton(&egc->gc);        \
+        CTX_UNLOCK;                                     \
+        libxl__egc_cleanup_2_ul_cb_gc(egc);             \
+    }while(0)
 
 
 /*
@@ -2468,8 +2487,9 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
 
 #define AO_INPROGRESS ({                                        \
         libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
+        /* __ao_inprogress will do egc..1_baton if needed */	\
         CTX_UNLOCK;                                             \
-        EGC_FREE;                                               \
+        libxl__egc_cleanup_2_ul_cb_gc(egc);                     \
         CTX_LOCK;                                               \
         int ao__rc = libxl__ao_inprogress(ao,                   \
                                __FILE__, __LINE__, __func__);   \
@@ -2481,8 +2501,9 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc);
         libxl_ctx *ao__ctx = libxl__gc_owner(&ao->gc);          \
         assert(rc);                                             \
         libxl__ao_create_fail(ao);                              \
+        libxl__egc_ao_cleanup_1_baton(&egc->gc);                \
         libxl__ctx_unlock(ao__ctx); /* gc is now invalid */     \
-        EGC_FREE;                                               \
+        libxl__egc_cleanup_2_ul_cb_gc(egc);                     \
         (rc);                                                   \
     })
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (5 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 12:06   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake Ian Jackson
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

Track in userland whether the poller pipe is nonempty.  This saves us
writing many many bytes to the pipe if nothing ever reads them.

This is going to be relevant in a moment, where we are going to create
a situation where this will happen quite a lot.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

squash! libxl: event: poller pipe optimisation
---
 tools/libxl/libxl_event.c    | 3 +++
 tools/libxl/libxl_internal.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index b50d4e5074..3e76fa5af5 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1417,6 +1417,7 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller,
     }
 
     if (afterpoll_check_fd(poller,fds,nfds, poller->wakeup_pipe[0],POLLIN)) {
+        poller->pipe_nonempty = 0;
         int e = libxl__self_pipe_eatall(poller->wakeup_pipe[0]);
         if (e) LIBXL__EVENT_DISASTER(gc, "read wakeup", e, 0);
     }
@@ -1809,6 +1810,8 @@ void libxl__poller_put(libxl_ctx *ctx, libxl__poller *p)
 
 void libxl__poller_wakeup(libxl__gc *gc, libxl__poller *p)
 {
+    if (p->pipe_nonempty) return;
+    p->pipe_nonempty = 1;
     int e = libxl__self_pipe_wakeup(p->wakeup_pipe[1]);
     if (e) LIBXL__EVENT_DISASTER(gc, "cannot poke watch pipe", e, 0);
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index eec4bf767d..0ab324102b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -625,6 +625,7 @@ struct libxl__poller {
     int (*fd_rindices)[3]; /* see libxl_event.c:beforepoll_internal */
 
     int wakeup_pipe[2]; /* 0 means no fd allocated */
+    bool pipe_nonempty;
 
     /*
      * We also use the poller to record whether any fds have been
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (6 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 12:31   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll Ian Jackson
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Now it takes a gc, not an egc.
---
 tools/libxl/libxl_event.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 3e76fa5af5..45cc67942d 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -140,6 +140,18 @@ static void pollers_note_osevent_added(libxl_ctx *ctx) {
         poller->osevents_added = 1;
 }
 
+static void baton_wake(libxl__gc *gc, libxl__poller *wake)
+{
+    libxl__poller_wakeup(gc, wake);
+
+    wake->osevents_added = 0;
+    /* This serves to make _1_baton idempotent.  It is OK even though
+     * that poller may currently be sleeping on only old osevents,
+     * because it is going to wake up because we've just prodded it,
+     * and it pick up new osevents on its next iteration (or pass
+     * on the baton). */
+}
+
 void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
     /* Any poller we had must have been `put' already. */
 {
@@ -160,14 +172,7 @@ void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
         /* no-one in libxl waiting for any events */
         return;
 
-    libxl__poller_wakeup(gc, wake);
-
-    wake->osevents_added = 0;
-    /* This serves to make _1_baton idempotent.  It is OK even though
-     * that poller may currently be sleeping on only old osevents,
-     * because it is going to wake up because we've just prodded it,
-     * and it pick up new osevents on its next iteration (or pass
-     * on the baton). */
+    baton_wake(gc, wake);
 }
 
 /*
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (7 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 13:39   ` George Dunlap
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll Ian Jackson
  2020-01-17 13:40 ` [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications George Dunlap
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
   libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.

We fix the latent bug by waking up the "poller_app" pipe every time we
add osevents.  If the application does not ever call beforepoll, we
write one byte to the pipe and set pipe_nonempty and then we ignore
it.  We only write another byte if beforepoll is called again.

Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll.  The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent.  We'll fix that in a
moment.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New addition to correctness arguments in libxl_event.c comment.
---
 tools/libxl/libxl_event.c | 54 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 45cc67942d..5f6a607d80 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -41,18 +41,25 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  *
  * We need the following property (the "unstale liveness property"):
  *
- * Whenever any thread is blocking in the libxl event loop[1], at
- * least one thread must be using an up to date osevent set.  It is OK
- * for all but one threads to have stale event sets, because so long
- * as one waiting thread has the right event set, any actually
- * interesting event will, if nothing else, wake that "right" thread
- * up.  It will then make some progress and/or, if it exits, ensure
- * that some other thread becomes the "right" thread.
+ * Whenever any thread is blocking as a result of being given an fd
+ * set or timeout by libxl, at least one thread must be using an up to
+ * date osevent set.  It is OK for all but one threads to have stale
+ * event sets, because so long as one waiting thread has the right
+ * event set, any actually interesting event will, if nothing else,
+ * wake that "right" thread up.  It will then make some progress
+ * and/or, if it exits, ensure that some other thread becomes the
+ * "right" thread.
  *
- * [1] TODO: Right now we are considering only the libxl event loop.
- * We need to consider application event loop outside libxl too.
+ * For threads blocking outside libxl and which are receiving libxl's
+ * fd and timeout information via the libxl_osevent_hooks callbacks,
+ * libxl calls this function as soon as it becomes interested.  It is
+ * the responsiblity of a provider of these functions in a
+ * multithreaded environment to make arrangements to wake up event
+ * waiting thread(s) with stale event sets.
  *
- * Argument that our approach is sound:
+ * Waiters outside libxl using _beforepoll are dealt with below.
+ *
+ * For the libxl event loop, the argument is as follows:
  *
  * The issue we are concerned about is libxl sleeping on an out of
  * date fd set, or too long a timeout, so that it doesn't make
@@ -132,7 +139,29 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  * will reenter libxl when it gains the lock and necessarily then
  * becomes a baton holder in category (a).
  *
- * So the "baton invariant" is maintained.  QED.
+ * So the "baton invariant" is maintained.
+ * QED (for waiters in libxl).
+ *
+ *
+ * For waiters outside libxl which used libxl_osevent_beforepoll
+ * to get the fd set:
+ *
+ * As above, adding an osevent involves having an egc or an ao.
+ * It sets poller->osevents_added on all active pollers.  Notably
+ * it sets it on poller_app, which is always active.
+ *
+ * The thread which does this will dispose of its egc or ao before
+ * exiting libxl so it will always wake up the poller_app if the last
+ * call to _beforepoll was before the osevents were added.  So the
+ * application's fd set contains at least a wakeup in the form of the
+ * poller_app fd.  The application cannot sleep on the libxl fd set
+ * until it has called _afterpoll which empties the pipe, and it
+ * is expected to then call _beforepoll again before sleeping.
+ *
+ * So all the application's event waiting thread(s) will always have
+ * an up to date osevent set, and will be woken up if necessary to
+ * achieve this.  (This is in contrast libxl's own event loop where
+ * only one thread need be up to date, as discussed above.)
  */
 static void pollers_note_osevent_added(libxl_ctx *ctx) {
     libxl__poller *poller;
@@ -157,6 +186,9 @@ void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
 {
     libxl__poller *search, *wake=0;
 
+    if (CTX->poller_app->osevents_added)
+        baton_wake(gc, CTX->poller_app);
+
     LIBXL_LIST_FOREACH(search, &CTX->pollers_active, active_entry) {
         if (search == CTX->poller_app)
             /* This one is special.  We can't give it the baton. */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (8 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll Ian Jackson
@ 2020-01-13 17:08 ` Ian Jackson
  2020-01-17 12:32   ` George Dunlap
  2020-01-17 13:40 ` [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications George Dunlap
  10 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-13 17:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, George Dunlap

If a timer event callback causes this poller to be woken (not very
unlikely) we would go round the poll loop twice rather than once.

Do the poller pipe emptying at the end; this is slightly more
efficient because it can't cause any callbacks, so it happens after
all the callbacks have been run.

(This pipe-emptying has to happen in afterpoll rather than the
apparently more logical beforepoll, because the application calling
beforepoll doesn't constitute a promise to actually do anything.)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 5f6a607d80..7c5387e94f 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1453,12 +1453,6 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller,
         fd_occurs(egc, efd, revents);
     }
 
-    if (afterpoll_check_fd(poller,fds,nfds, poller->wakeup_pipe[0],POLLIN)) {
-        poller->pipe_nonempty = 0;
-        int e = libxl__self_pipe_eatall(poller->wakeup_pipe[0]);
-        if (e) LIBXL__EVENT_DISASTER(gc, "read wakeup", e, 0);
-    }
-
     for (;;) {
         libxl__ev_time *etime = LIBXL_TAILQ_FIRST(&CTX->etimes);
         if (!etime)
@@ -1473,6 +1467,12 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller,
 
         time_occurs(egc, etime, ERROR_TIMEDOUT);
     }
+
+    if (afterpoll_check_fd(poller,fds,nfds, poller->wakeup_pipe[0],POLLIN)) {
+        poller->pipe_nonempty = 0;
+        int e = libxl__self_pipe_eatall(poller->wakeup_pipe[0]);
+        if (e) LIBXL__EVENT_DISASTER(gc, "read wakeup", e, 0);
+    }
 }
 
 void libxl_osevent_afterpoll(libxl_ctx *ctx, int nfds, const struct pollfd *fds,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered Ian Jackson
@ 2020-01-17 11:07   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 11:07 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> This is only for deregistration.  We are going to add another variable
> for new events, with different semantics, and this overly-general name
> will become confusing.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active Ian Jackson
@ 2020-01-17 11:08   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 11:08 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> We are going to use this a bit more widely.  Make the name more
> general.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE Ian Jackson
@ 2020-01-17 11:08   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 11:08 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> This is a very common exit pattern.  We are going to want to change
> this pattern.  So we should make it into a macro of its own.
> 
> No functional change.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation Ian Jackson
@ 2020-01-17 12:06   ` George Dunlap
  2020-01-17 13:41     ` Ian Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: George Dunlap @ 2020-01-17 12:06 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> Track in userland whether the poller pipe is nonempty.  This saves us
> writing many many bytes to the pipe if nothing ever reads them.
> 
> This is going to be relevant in a moment, where we are going to create
> a situation where this will happen quite a lot.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> 
> squash! libxl: event: poller pipe optimisation

Stray "squash" detrius.

Other than that:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup " Ian Jackson
@ 2020-01-17 12:30   ` George Dunlap
  2020-01-17 13:46     ` Ian Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: George Dunlap @ 2020-01-17 12:30 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> We are going to want to call this in the following situation:
> 
>  * We have just set up an ao, which is to call back - so a
>    non-synchronous one.  It ought not to call the application
>    back right away, so no egc.
> 
>  * There is a libxl thread blocking somewhere but it is using
>    using an out of date fd or timeout set, which does not take into
>    account the ao we have just started.
> 
>  * We try to wake that thread up, but libxl__poller_wakeup fails.

So the idea before was that these two functions take an egc, not so much
because it actually uses the egc, but to make sure it's only called in a
restricted set of conditions; and now we're relaxing those conditions?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake Ian Jackson
@ 2020-01-17 12:31   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 12:31 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> No functional change.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll Ian Jackson
@ 2020-01-17 12:32   ` George Dunlap
  2020-01-17 14:24     ` Ian Jackson
  0 siblings, 1 reply; 28+ messages in thread
From: George Dunlap @ 2020-01-17 12:32 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> If a timer event callback causes this poller to be woken (not very
> unlikely) we would go round the poll loop twice rather than once.
> 
> Do the poller pipe emptying at the end; this is slightly more
> efficient because it can't cause any callbacks, so it happens after
> all the callbacks have been run.
> 
> (This pipe-emptying has to happen in afterpoll rather than the
> apparently more logical beforepoll, because the application calling
> beforepoll doesn't constitute a promise to actually do anything.)
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

I can't quite figure out: why would you end up going around the loop
twice, and how does this fix it?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls Ian Jackson
@ 2020-01-17 13:38   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 13:38 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> If the application calls libxl with ao_how==0 and also makes calls
> like _occurred, libxl will sometimes get stuck.
> 
> The bug happens as follows (for example):
> 
>   Thread A
>        libxl_do_thing(,ao_how==0)
>        libxl_do_thing starts, sets up some callbacks
>        libxl_do_thing exit path calls AO_INPROGRESS
>        libxl__ao_inprogress goes into event loop
>        eventloop_iteration sleeps on:
>           - do_thing's current fd set
>           - sigchld pipe if applicable
>           - its poller
> 
>   Thread B
>        libxl_something_occurred
>        the something is to do with do_thing, above
>        do_thing_next_callback does some more work
>        do_thing_next_callback becomes interested in fd N
>        thread B returns to application
> 
> Note that nothing wakes up thread A.  A is not listening on fd N.  So
> do_thing_* will not spot when fd N signals.  do_thing will not make
> further timely progress.  If there is no timeout thread A will never
> wake up.
> 
> The problem here occurs because thread A is waiting on an out of date
> osevent set.
> 
> There is also the possibility that a thread might block waiting for
> libxl osevents but outside libxl, eg if the application used
> libxl_osevent_beforepoll.  We will deal with that in a moment.
> 
> See the big comment in libxl_event.c for a fairly formal correctness
> argument.
> 
> This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
> an egc or ao is disposed of.  Firstly egcs: in this patch we rename
> libxl__egc_cleanup, which means we catch all the disposal sites.
> Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
> (ii) ao__inprogress and (iii) an event which completes the ao later.
> (i) and (ii) we handle by adding the call to _baton.  In the case of
> (iii) any such function must be an event-generating function so it has
> an egc too, so it will pass on the baton when the egc is disposed.
> 
> Reported-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

This all looks very plausible.  I don't feel confident I have enough of
a grasp of the situation to say that I would notice anything missing;
but I think it's worth putting in and letting osstest give it some
exercise (via libvirt).

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll Ian Jackson
@ 2020-01-17 13:39   ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 13:39 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> If the application uses libxl_osevent_beforepoll, a similar hang is
> possible to the one described and fixed in
>    libxl: event: Fix hang when mixing blocking and eventy calls
> Application behaviour would have to be fairly unusual, but it
> doesn't seem sensible to just leave this latent bug.
> 
> We fix the latent bug by waking up the "poller_app" pipe every time we
> add osevents.  If the application does not ever call beforepoll, we
> write one byte to the pipe and set pipe_nonempty and then we ignore
> it.  We only write another byte if beforepoll is called again.
> 
> Normally in an eventy program there would only be one thread calling
> libxl_osevent_beforepoll.  The effect in such a program is to
> sometimes needlessly go round the poll loop again if a timeout
> callback becomes interested in a new osevent.  We'll fix that in a
> moment.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Same as the comment on patch 5:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications
  2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
                   ` (9 preceding siblings ...)
  2020-01-13 17:08 ` [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll Ian Jackson
@ 2020-01-17 13:40 ` George Dunlap
  10 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 13:40 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/13/20 5:08 PM, Ian Jackson wrote:
> The meat here, including a description of the bug, is in:
>   libxl: event: Fix hang when mixing blocking and eventy calls
> 
> Re v1 I wrote:
>   I suggest we try to convince ourselves of its correctness
>   via a second round of code review.
> 
> I put this into practice by writing an informal proof of correctness.
> This found a bug, the fixing of which was not entirely trivial.
> 
> George tells me he tested v1 of this series.  As with v1, I have
> compiled this v2 but not executed it.

I have tested this series both with my C-based proof of concept, and now
with the golang bindings, and it solves my problem and seems to work as
advertised.

Tested-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation
  2020-01-17 12:06   ` George Dunlap
@ 2020-01-17 13:41     ` Ian Jackson
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Jackson @ 2020-01-17 13:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

George Dunlap writes ("Re: [PATCH v2 07/10] libxl: event: poller pipe optimisation"):
> On 1/13/20 5:08 PM, Ian Jackson wrote:
> > squash! libxl: event: poller pipe optimisation
> 
> Stray "squash" detrius.

Oops.

> Other than that:
> 
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Thanks, fixed.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc
  2020-01-17 12:30   ` George Dunlap
@ 2020-01-17 13:46     ` Ian Jackson
  2020-01-17 13:50       ` George Dunlap
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-17 13:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

George Dunlap writes ("Re: [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc"):
> On 1/13/20 5:08 PM, Ian Jackson wrote:
> > We are going to want to call this in the following situation:
> > 
> >  * We have just set up an ao, which is to call back - so a
> >    non-synchronous one.  It ought not to call the application
> >    back right away, so no egc.
> > 
> >  * There is a libxl thread blocking somewhere but it is using
> >    using an out of date fd or timeout set, which does not take into
> >    account the ao we have just started.
> > 
> >  * We try to wake that thread up, but libxl__poller_wakeup fails.
> 
> So the idea before was that these two functions take an egc, not so much
> because it actually uses the egc, but to make sure it's only called in a
> restricted set of conditions; and now we're relaxing those conditions?

Yes.  Specifically, we need to make one exception, relating to ao's.

In the situation described above, there is no egc, but we need to call
libxl__poller_wakeup.  Introducing an egc is wrong because that would
imply that this situation might result in application callbacks, but
it shouldn't (and not having an egc prevents that).

libxl__poller_wakeup and LIBXL__EVENT_DISASTER only take an egc for
form's sake; they don't use any part of it other than the gc.  The
"form's sake" is to stop them being called from libxl entrypoints that
are not involved in event generation.

Before this patch this is enforced by the types: you can't call it in
the wrong place because it wants an egc which you don't have.

After this patch this is no longer enforced.  But the mistake
(principally, calling _DISASTER) seems unlikely.  The type enforcement
I mention above was done because it was possible and easy, not because
it was important.

Does more of this want to be in the commit message ?

Thanks,
Ian.
(much text of this mail first written on irc)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc
  2020-01-17 13:46     ` Ian Jackson
@ 2020-01-17 13:50       ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 13:50 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On 1/17/20 1:46 PM, Ian Jackson wrote:
> George Dunlap writes ("Re: [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup take a gc, not an egc"):
>> On 1/13/20 5:08 PM, Ian Jackson wrote:
>>> We are going to want to call this in the following situation:
>>>
>>>  * We have just set up an ao, which is to call back - so a
>>>    non-synchronous one.  It ought not to call the application
>>>    back right away, so no egc.
>>>
>>>  * There is a libxl thread blocking somewhere but it is using
>>>    using an out of date fd or timeout set, which does not take into
>>>    account the ao we have just started.
>>>
>>>  * We try to wake that thread up, but libxl__poller_wakeup fails.
>>
>> So the idea before was that these two functions take an egc, not so much
>> because it actually uses the egc, but to make sure it's only called in a
>> restricted set of conditions; and now we're relaxing those conditions?
> 
> Yes.  Specifically, we need to make one exception, relating to ao's.
> 
> In the situation described above, there is no egc, but we need to call
> libxl__poller_wakeup.  Introducing an egc is wrong because that would
> imply that this situation might result in application callbacks, but
> it shouldn't (and not having an egc prevents that).
> 
> libxl__poller_wakeup and LIBXL__EVENT_DISASTER only take an egc for
> form's sake; they don't use any part of it other than the gc.  The
> "form's sake" is to stop them being called from libxl entrypoints that
> are not involved in event generation.
> 
> Before this patch this is enforced by the types: you can't call it in
> the wrong place because it wants an egc which you don't have.
> 
> After this patch this is no longer enforced.  But the mistake
> (principally, calling _DISASTER) seems unlikely.  The type enforcement
> I mention above was done because it was possible and easy, not because
> it was important.

That makes sense; just trying partly to make sure I have it right,
partly to have things in the public record.  In which case, re the code:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


> Does more of this want to be in the commit message ?

I was going to say I'm not sure we need another round-trip.  I'd be OK
with checking it in as-is; or you could edit the commit message on
check-in if you wanted.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-17 12:32   ` George Dunlap
@ 2020-01-17 14:24     ` Ian Jackson
  2020-01-17 14:33       ` Ian Jackson
  2020-01-17 14:34       ` George Dunlap
  0 siblings, 2 replies; 28+ messages in thread
From: Ian Jackson @ 2020-01-17 14:24 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

George Dunlap writes ("Re: [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll"):
> On 1/13/20 5:08 PM, Ian Jackson wrote:
> > If a timer event callback causes this poller to be woken (not very
> > unlikely) we would go round the poll loop twice rather than once.
> > 
> > Do the poller pipe emptying at the end; this is slightly more
> > efficient because it can't cause any callbacks, so it happens after
> > all the callbacks have been run.
> > 
> > (This pipe-emptying has to happen in afterpoll rather than the
> > apparently more logical beforepoll, because the application calling
> > beforepoll doesn't constitute a promise to actually do anything.)
> > 
> > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> 
> I can't quite figure out: why would you end up going around the loop
> twice, and how does this fix it?

I now think this is not true and the situation I describe cannot
happen.

What I was thinking was that pollers_note_osevent_added might be
called by something from time_occurs, and that would write a byte into
the poller pipe.  But pollers_note_osevent_added doesn't wake up
pollers; it just tags them osevents_added.

I now think the spurious wakeup cannot happen because:

For this patch to make any difference, the poller pipe would have to
be woken up by something in the time scan loop in afterpoll_internal.

But poller pipes are only woken up by ao completion or by
cleanup_1_baton.

cleanup_1_baton is not called anywhere there (as an argument against:
any such call would violate the rule that cleanup_1_baton may not be
called with a poller in hand).

And as for ao completion, we would indeed wake up the poller.  But we
also mark the ao as complete, so ao_inprogress would spot
!ao_work_outstanding, and not reenter eventloop_iteration at all.
The woken-up poller would be put by ao_inprogress.

This leads me to this observation: poller_get might give you a
woken-up poller.  This is not incorrect, but it is pointless.  So
maybe I should write a patch that puts a call to
libxl__self_pipe_eatall in libxl__poller_get.

TBH I still think this patch tidies the code up a bit.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-17 14:24     ` Ian Jackson
@ 2020-01-17 14:33       ` Ian Jackson
  2020-01-17 14:39         ` George Dunlap
  2020-01-17 14:34       ` George Dunlap
  1 sibling, 1 reply; 28+ messages in thread
From: Ian Jackson @ 2020-01-17 14:33 UTC (permalink / raw)
  To: George Dunlap, xen-devel

Ian Jackson writes ("Re: [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll"):
> TBH I still think this patch tidies the code up a bit.

Given you tested it with this change, and I think it makes it a bit
tidier and no less correct, I would like to keep it.

I rewrote the commit message - see below.

Ian.

libxl: event: Move poller pipe emptying to the end of afterpoll

This seems neater.  It doesn't have any significant effect because:

The poller fd wouldn't be emptied by time_occurs.  It would only be
woken by time_occurs as a result of an ao completing, or by
libxl__egc_ao_cleanup_1_baton.  But ...1_baton won't be called in
between (for one thing, this would violate the rule of not still
having the active caller when ...1_baton is called).

While discussing this patch, I noticed that there is a possibility (in
libxl in general) that poller_put might be called on a woken poller.
It would probably be sensible at some point to make poller_get empty
the pipe, at least if the pipe_nonempty flag is set.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Completely revised commit message; now we think this is just
    cleanup.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-17 14:24     ` Ian Jackson
  2020-01-17 14:33       ` Ian Jackson
@ 2020-01-17 14:34       ` George Dunlap
  1 sibling, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 14:34 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On 1/17/20 2:24 PM, Ian Jackson wrote:
> George Dunlap writes ("Re: [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll"):
>> On 1/13/20 5:08 PM, Ian Jackson wrote:
>>> If a timer event callback causes this poller to be woken (not very
>>> unlikely) we would go round the poll loop twice rather than once.
>>>
>>> Do the poller pipe emptying at the end; this is slightly more
>>> efficient because it can't cause any callbacks, so it happens after
>>> all the callbacks have been run.
>>>
>>> (This pipe-emptying has to happen in afterpoll rather than the
>>> apparently more logical beforepoll, because the application calling
>>> beforepoll doesn't constitute a promise to actually do anything.)
>>>
>>> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
>>
>> I can't quite figure out: why would you end up going around the loop
>> twice, and how does this fix it?
> 
> I now think this is not true and the situation I describe cannot
> happen.
> 
> What I was thinking was that pollers_note_osevent_added might be
> called by something from time_occurs, and that would write a byte into
> the poller pipe.  But pollers_note_osevent_added doesn't wake up
> pollers; it just tags them osevents_added.
> 
> I now think the spurious wakeup cannot happen because:
> 
> For this patch to make any difference, the poller pipe would have to
> be woken up by something in the time scan loop in afterpoll_internal.
> 
> But poller pipes are only woken up by ao completion or by
> cleanup_1_baton.
> 
> cleanup_1_baton is not called anywhere there (as an argument against:
> any such call would violate the rule that cleanup_1_baton may not be
> called with a poller in hand).
> 
> And as for ao completion, we would indeed wake up the poller.  But we
> also mark the ao as complete, so ao_inprogress would spot
> !ao_work_outstanding, and not reenter eventloop_iteration at all.
> The woken-up poller would be put by ao_inprogress.
> 
> This leads me to this observation: poller_get might give you a
> woken-up poller.  This is not incorrect, but it is pointless.  So
> maybe I should write a patch that puts a call to
> libxl__self_pipe_eatall in libxl__poller_get.
> 
> TBH I still think this patch tidies the code up a bit.

No objection to it on those grounds. :-)

Thanks for the explanation,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll
  2020-01-17 14:33       ` Ian Jackson
@ 2020-01-17 14:39         ` George Dunlap
  0 siblings, 0 replies; 28+ messages in thread
From: George Dunlap @ 2020-01-17 14:39 UTC (permalink / raw)
  To: Ian Jackson, xen-devel

On 1/17/20 2:33 PM, Ian Jackson wrote:
> Ian Jackson writes ("Re: [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll"):
>> TBH I still think this patch tidies the code up a bit.
> 
> Given you tested it with this change, and I think it makes it a bit
> tidier and no less correct, I would like to keep it.
> 
> I rewrote the commit message - see below.
> 
> Ian.
> 
> libxl: event: Move poller pipe emptying to the end of afterpoll
> 
> This seems neater.  It doesn't have any significant effect because:
> 
> The poller fd wouldn't be emptied by time_occurs.  It would only be
> woken by time_occurs as a result of an ao completing, or by
> libxl__egc_ao_cleanup_1_baton.  But ...1_baton won't be called in
> between (for one thing, this would violate the rule of not still
> having the active caller when ...1_baton is called).
> 
> While discussing this patch, I noticed that there is a possibility (in
> libxl in general) that poller_put might be called on a woken poller.
> It would probably be sensible at some point to make poller_get empty
> the pipe, at least if the pipe_nonempty flag is set.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> Tested-by: George Dunlap <george.dunlap@citrix.com>

With the new commit message:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-01-17 14:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-13 17:08 [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications Ian Jackson
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 01/10] libxl: event: Rename poller.fds_changed to .fds_deregistered Ian Jackson
2020-01-17 11:07   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 02/10] libxl: event: Rename ctx.pollers_fd_changed to .pollers_active Ian Jackson
2020-01-17 11:08   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 03/10] libxl: event: Introduce CTX_UNLOCK_EGC_FREE Ian Jackson
2020-01-17 11:08   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 04/10] libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc Ian Jackson
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 05/10] libxl: event: Make libxl__poller_wakeup " Ian Jackson
2020-01-17 12:30   ` George Dunlap
2020-01-17 13:46     ` Ian Jackson
2020-01-17 13:50       ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 06/10] libxl: event: Fix hang when mixing blocking and eventy calls Ian Jackson
2020-01-17 13:38   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 07/10] libxl: event: poller pipe optimisation Ian Jackson
2020-01-17 12:06   ` George Dunlap
2020-01-17 13:41     ` Ian Jackson
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 08/10] libxl: event: Break out baton_wake Ian Jackson
2020-01-17 12:31   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll Ian Jackson
2020-01-17 13:39   ` George Dunlap
2020-01-13 17:08 ` [Xen-devel] [PATCH v2 10/10] libxl: event: Move poller pipe emptying to the end of afterpoll Ian Jackson
2020-01-17 12:32   ` George Dunlap
2020-01-17 14:24     ` Ian Jackson
2020-01-17 14:33       ` Ian Jackson
2020-01-17 14:39         ` George Dunlap
2020-01-17 14:34       ` George Dunlap
2020-01-17 13:40 ` [Xen-devel] [PATCH v2 00/10] libxl: event: Fix hang for some applications George Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).