All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jim Fehlig <jfehlig@suse.com>
To: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Bamvor Jian Zhang <bjzhang@suse.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH 2/2] libxl: fix stale timeout event callback race
Date: Tue, 11 Dec 2012 15:53:40 -0700	[thread overview]
Message-ID: <50C7B974.4050706@suse.com> (raw)
In-Reply-To: <1355158624-11163-2-git-send-email-ian.jackson@eu.citrix.com>

Ian Jackson wrote:
> Because there is not necessarily any lock held at the point the
> application (eg, libvirt) calls libxl_osevent_occurred_timeout, in a
> multithreaded program those calls may be arbitrarily delayed in
> relation to other activities within the program.
>
> Specifically this means when ->timeout_deregister returns, libxl does
> not know whether it can safely dispose of the for_libxl value or
> whether it needs to retain it in case of an in-progress call to
> _occurred_timeout.
>
> The interface could be fixed by requiring the application to make a
> new call into libxl to say that the deregistration was complete.
>
> However that new call would have to be threaded through the
> application's event loop; this is complicated and some application
> authors are likely not to implement it properly.  Furthermore the
> easiest way to implement this facility in most event loops is to queue
> up a time event for "now".
>
> Shortcut all of this by having libxl always call timeout_modify
> setting abs={0,0} (ie, ASAP) instead of timeout_deregister.  This will
> cause the application to call _occurred_timeout.  When processing this
> calldown we see that we were no longer actually interested and simply
> throw it away.
>
> Additionally, there is a race between _occurred_timeout and
> ->timeout_modify.  If libxl ever adjusts the deadline for a timeout
> the application may already be in the process of calling _occurred, in
> which case the situation with for_app's lifetime becomes very
> complicated.  Therefore abolish libxl__ev_time_modify_{abs,rel} (which
> have no callers) and promise to the application only ever to call
> ->timeout_modify with abs=={0,0}.  The application still needs to cope
> with ->timeout_modify racing with its internal function which calls
> _occurred_timeout.  Document this.
>
> This is a forwards-compatible change for applications using the libxl
> API, and will hopefully eliminate these races in callback-supplying
> applications (such as libvirt) without the need for corresponding
> changes to the application.
>   

I'm not sure how to avoid changing the application (libvirt). Currently,
the libvirt libxl driver removes the timeout from the libvirt event loop
in the timeout_deregister() function. This function is never called now
and hence the timeout is never removed. I had to make the following
change in the libxl driver to use your patches

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 302f81c..d8f078e 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -225,16 +225,11 @@ static int libxlTimeoutRegisterEventHook(void *priv,

static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED,
void **hndp,
- struct timeval abs_t)
+ struct timeval abs_t ATTRIBUTE_UNUSED)
{
- struct timeval now;
- int timeout;
struct libxlOSEventHookTimerInfo *timer_info = *hndp;

- gettimeofday(&now, NULL);
- timeout = (abs_t.tv_usec - now.tv_usec) / 1000;
- timeout += (abs_t.tv_sec - now.tv_sec) * 1000;
- virEventUpdateTimeout(timer_info->id, timeout);
+ virEventRemoveTimeout(timer_info->id);
return 0;
}

I could also call virEventUpdateTimeout() with a timeout of 0 to make it
fire on the next iteration of the event loop, and then remove the
timeout in the callback before invoking
libxl_osevent_occurred_timeout(). But either way, changes need to be
made to the application.

> For clarity, fold the body of time_register_finite into its one
> remaining call site.  This makes the semantics of ev->infinite
> slightly clearer.
>
> Cc: Bamvor Jian Zhang <bjzhang@suse.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
> --
> v3:
>   - Fix null pointer dereference in case when hooks not supplied.
> ---
>  tools/libxl/libxl_event.c |   89 +++++++--------------------------------------
>  tools/libxl/libxl_event.h |   17 ++++++++-
>  2 files changed, 29 insertions(+), 77 deletions(-)
>
> diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
> index f1fe425..f86c528 100644
> --- a/tools/libxl/libxl_event.c
> +++ b/tools/libxl/libxl_event.c
> @@ -267,18 +267,11 @@ static int time_rel_to_abs(libxl__gc *gc, int ms, struct timeval *abs_out)
>      return 0;
>  }
>  
> -static void time_insert_finite(libxl__gc *gc, libxl__ev_time *ev)
> -{
> -    libxl__ev_time *evsearch;
> -    LIBXL_TAILQ_INSERT_SORTED(&CTX->etimes, entry, ev, evsearch, /*empty*/,
> -                              timercmp(&ev->abs, &evsearch->abs, >));
> -    ev->infinite = 0;
> -}
> -
>  static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev,
>                                  struct timeval absolute)
>  {
>      int rc;
> +    libxl__ev_time *evsearch;
>  
>      rc = OSEVENT_HOOK(timeout,register, alloc, &ev->nexus->for_app_reg,
>                        absolute, ev->nexus);
> @@ -286,7 +279,8 @@ static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev,
>  
>      ev->infinite = 0;
>      ev->abs = absolute;
> -    time_insert_finite(gc, ev);
> +    LIBXL_TAILQ_INSERT_SORTED(&CTX->etimes, entry, ev, evsearch, /*empty*/,
> +                              timercmp(&ev->abs, &evsearch->abs, >));
>  
>      return 0;
>  }
> @@ -294,7 +288,12 @@ static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev,
>  static void time_deregister(libxl__gc *gc, libxl__ev_time *ev)
>  {
>      if (!ev->infinite) {
> -        OSEVENT_HOOK_VOID(timeout,deregister, release, ev->nexus->for_app_reg);
> +        struct timeval right_away = { 0, 0 };
> +        if (ev->nexus) /* only set if app provided hooks */
> +            ev->nexus->ev = 0;
> +        OSEVENT_HOOK_VOID(timeout,modify,
>   

Nit again about the space here.

> +                          noop /* release nexus in _occurred_ */,
> +                          ev->nexus->for_app_reg, right_away);
>   

I had to change this to &ev->nexus->for_app_reg to get it to work
properly. But with this change, and the aforementioned hack to the
libvirt libxl driver, I'm not seeing the race any longer.

Regards,
Jim

>          LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
>      }
>  }
> @@ -364,70 +363,6 @@ int libxl__ev_time_register_rel(libxl__gc *gc, libxl__ev_time *ev,
>      return rc;
>  }
>  
> -int libxl__ev_time_modify_abs(libxl__gc *gc, libxl__ev_time *ev,
> -                              struct timeval absolute)
> -{
> -    int rc;
> -
> -    CTX_LOCK;
> -
> -    DBG("ev_time=%p modify abs==%lu.%06lu",
> -        ev, (unsigned long)absolute.tv_sec, (unsigned long)absolute.tv_usec);
> -
> -    assert(libxl__ev_time_isregistered(ev));
> -
> -    if (ev->infinite) {
> -        rc = time_register_finite(gc, ev, absolute);
> -        if (rc) goto out;
> -    } else {
> -        rc = OSEVENT_HOOK(timeout,modify, noop,
> -                          &ev->nexus->for_app_reg, absolute);
> -        if (rc) goto out;
> -
> -        LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
> -        ev->abs = absolute;
> -        time_insert_finite(gc, ev);
> -    }
> -
> -    rc = 0;
> - out:
> -    time_done_debug(gc,__func__,ev,rc);
> -    CTX_UNLOCK;
> -    return rc;
> -}
> -
> -int libxl__ev_time_modify_rel(libxl__gc *gc, libxl__ev_time *ev,
> -                              int milliseconds)
> -{
> -    struct timeval absolute;
> -    int rc;
> -
> -    CTX_LOCK;
> -
> -    DBG("ev_time=%p modify ms=%d", ev, milliseconds);
> -
> -    assert(libxl__ev_time_isregistered(ev));
> -
> -    if (milliseconds < 0) {
> -        time_deregister(gc, ev);
> -        ev->infinite = 1;
> -        rc = 0;
> -        goto out;
> -    }
> -
> -    rc = time_rel_to_abs(gc, milliseconds, &absolute);
> -    if (rc) goto out;
> -
> -    rc = libxl__ev_time_modify_abs(gc, ev, absolute);
> -    if (rc) goto out;
> -
> -    rc = 0;
> - out:
> -    time_done_debug(gc,__func__,ev,rc);
> -    CTX_UNLOCK;
> -    return rc;
> -}
> -
>  void libxl__ev_time_deregister(libxl__gc *gc, libxl__ev_time *ev)
>  {
>      CTX_LOCK;
> @@ -1161,7 +1096,11 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
>      CTX_LOCK;
>      assert(!CTX->osevent_in_hook);
>  
> -    libxl__ev_time *ev = osevent_ev_from_hook_nexus(ctx, for_libxl);
> +    libxl__osevent_hook_nexus *nexus = for_libxl;
> +    libxl__ev_time *ev = osevent_ev_from_hook_nexus(ctx, nexus);
> +
> +    osevent_release_nexus(gc, &CTX->hook_timeout_nexi_idle, nexus);
> +
>      if (!ev) goto out;
>      assert(!ev->infinite);
>  
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index 3bcb6d3..51f2721 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -287,8 +287,10 @@ typedef struct libxl_osevent_hooks {
>    int (*timeout_register)(void *user, void **for_app_registration_out,
>                            struct timeval abs, void *for_libxl);
>    int (*timeout_modify)(void *user, void **for_app_registration_update,
> -                         struct timeval abs);
> -  void (*timeout_deregister)(void *user, void *for_app_registration);
> +                         struct timeval abs)
> +      /* only ever called with abs={0,0}, meaning ASAP */;
> +  void (*timeout_deregister)(void *user, void *for_app_registration)
> +      /* will never be called */;
>  } libxl_osevent_hooks;
>  
>  /* The application which calls register_fd_hooks promises to
> @@ -337,6 +339,17 @@ typedef struct libxl_osevent_hooks {
>   * register (or modify), and pass it to subsequent calls to modify
>   * or deregister.
>   *
> + * Note that the application must cope with a call from libxl to
> + * timeout_modify racing with its own call to
> + * libxl__osevent_occurred_timeout.  libxl guarantees that
> + * timeout_modify will only be called with abs={0,0} but the
> + * application must still ensure that libxl's attempt to cause the
> + * timeout to occur immediately is safely ignored even the timeout is
> + * actually already in the process of occurring.
> + *
> + * timeout_deregister is not used because it forms part of a
> + * deprecated unsafe mode of use of the API.
> + *
>   * osevent_register_hooks may be called only once for each libxl_ctx.
>   * libxl may make calls to register/modify/deregister from within
>   * any libxl function (indeed, it will usually call register from
>   

  reply	other threads:[~2012-12-11 22:53 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-20  7:16 [PATCH] fix race condition between libvirtd event handling and libxl fd deregister Bamvor Jian Zhang
2012-11-23  9:43 ` Ian Campbell
2012-11-26 23:22   ` Jim Fehlig
2012-11-27 10:03     ` Ian Campbell
2012-11-27 19:08 ` Ian Jackson
2012-11-28 11:25   ` Ian Campbell
2012-12-06 16:11     ` 答复: " Bamvor Jian Zhang
2012-12-07 19:11     ` Ian Jackson
2012-12-07 19:15       ` [PATCH 1/2] libxl: fix stale fd event callback race Ian Jackson
2012-12-07 19:22         ` Ian Jackson
2012-12-07 19:15       ` [PATCH 2/2] libxl: fix stale timeout " Ian Jackson
2012-12-07 19:21       ` [PATCH 1/2] libxl: fix stale fd " Ian Jackson
2012-12-07 19:21       ` [PATCH 2/2] libxl: fix stale timeout " Ian Jackson
2012-12-10 10:19       ` [PATCH] fix race condition between libvirtd event handling and libxl fd deregister Ian Campbell
2012-12-10 10:37         ` Ian Jackson
2012-12-10 16:56           ` Ian Jackson
2012-12-10 16:57             ` [PATCH 1/2] libxl: fix stale fd event callback race Ian Jackson
2012-12-11 22:35               ` Jim Fehlig
2012-12-12 17:04                 ` Ian Jackson
2012-12-12 17:20                   ` Jim Fehlig
2012-12-10 16:57             ` [PATCH 2/2] libxl: fix stale timeout " Ian Jackson
2012-12-11 22:53               ` Jim Fehlig [this message]
2012-12-12 17:14                 ` Ian Jackson
2012-12-12 17:16                   ` Ian Jackson
2012-12-12 17:26                   ` Jim Fehlig
2012-12-12 17:37                     ` Ian Jackson
2012-12-12 18:01                       ` Jim Fehlig
2012-12-13 10:29                         ` Ian Campbell
2012-12-13 13:12                           ` Ian Jackson
2012-12-13 15:53                           ` Jim Fehlig
2012-12-13 15:58                             ` Ian Jackson
2012-12-13 16:53                               ` Jim Fehlig
2012-12-13 13:07                         ` Ian Jackson
2012-12-13 15:41                           ` Jim Fehlig
2012-12-13 15:51                             ` Ian Jackson
2012-12-13 15:57                               ` Jim Fehlig
2012-12-11 10:19             ` [PATCH] fix race condition between libvirtd event handling and libxl fd deregister Bamvor Jian Zhang
2012-12-11 11:20               ` Ian Jackson
2013-01-11 11:41             ` Ian Campbell
2013-01-11 17:51               ` Jim Fehlig
2013-01-15 10:40                 ` Ian Campbell
2013-01-15 17:02                   ` Jim Fehlig
2013-01-21 19:37               ` Jim Fehlig
2013-01-22 10:21                 ` Ian Campbell
2012-12-10 23:56           ` Jim Fehlig
2012-12-11 11:18             ` Ian Jackson
2012-12-06 16:06   ` 答复: " Bamvor Jian Zhang
2013-01-23 16:08 [PATCH 0/2 v4] libxl: fix event races exposed by libvirt Ian Jackson
2013-01-23 16:08 ` [PATCH 2/2] libxl: fix stale timeout event callback race Ian Jackson
2013-01-23 17:48   ` Jim Fehlig
2013-01-23 18:21 [PATCH 0/2 v4.1] libxl: fix event races exposed by libvirt Ian Jackson
2013-01-23 18:21 ` [PATCH 2/2] libxl: fix stale timeout event callback race Ian Jackson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50C7B974.4050706@suse.com \
    --to=jfehlig@suse.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=bjzhang@suse.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.