On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus network buffering"):

> tools/libxl: setup/teardown Remus network buffering

Thanks.

I'm afraid the event handling, particularly with respect to
subprocesses, is not yet right.

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700
> +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700
> @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li
> /* REMUS TODO: Wait for disk and memory ack, release network buffer */
> /* REMUS TODO: make this asynchronous */
> assert(!rc); /* REMUS TODO handle this error properly */
> - usleep(dss->interval * 1000);
> + usleep(dss->remus_ctx->interval * 1000);
> libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);

Do you plan to fix this later (perhaps as part of a future series) ?

Its actually fixed in the next patch in the same series.

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxl/libxl_netbuffer.c Thu Aug 29 14:36:36 2013 -0700
> @@ -0,0 +1,434 @@

...

> +static void netbuf_script_child_death_cb(libxl__egc *egc,
> + libxl__ev_child *child,
> + pid_t pid, int status)
> +{
> + /* No-op. hotplug-error will be read in setup/teardown_ifb */
> + return;
> +}

This can't possibly be right. You must always wait for your children
and you may not complete the ao until the child has finished. This
should all be documented in libxl_internal.h.

Thats true. But this was not meant to be an ao (see explanation at
end of this email).

Also I don't see where this other error handling code is.

> + if (!pid) {
> + /* child: Launch netbuf script */
> + libxl__exec(gc, -1, -1, -1, args[0], args, env);
> + /* notreached */
> + abort();
> + }
> +
> + assert(libxl__ev_child_inuse(&childw_out));

What purpose does this serve ? This can be false if the death_cb is
called from within libxl__ev_child_fork.

Sorry. This was copy pasted from code in device_hotplug(). I couldn't find
a simple fork & exec abstraction. The libxl_ev_child_fork seemed to be

the only option.

> + rc = libxl__exec_netbuf_script(gc, domid, devid, vif,
> + script, "setup");
> + if (rc) return rc;

This definitely isn't right - it needs to be asynchronous, with a
callback.

> + /* Now wait for the result (XENBUS_PATH/hotplug-status).
> + * It doesnt matter what the result is. If the hotplug-status path
> + * appears, then we are good to go.
> + */
> + rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT,
> + script,
> + /* path */
> + GCSPRINTF("%s/hotplug-status",
> + out_path_base),
> + NULL /* state */,
> + NULL, script_done_cb, NULL);

This wait does not actually block the current execution.

It actually does. Atleast based on the code in libxl_exec.c which uses
a select() call to wait on xenstore fds.

It arranges
for the callback to be called. You need to break this function apart.

> +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domid,
> + int devid, char *vif,
> + char *script)
> +{
> + /* Nothing to wait for during tear down */
> + return libxl__exec_netbuf_script(gc, domid, devid, vif,
> + script, "teardown");

And this one.

> + for (i = 0; i < num_netbufs; ++i) {
> +
> + /* The setup script does the following
> + * find a free IFB to act as buffer for the vif.
> + * set up the plug qdisc on the IFB.
> + */
> + ret = libxl__netbuf_script_setup(gc, domid, i, vif_list[i],
> + (char *) remus_ctx->netbufscript,
> + &ifb_list[i]);

Does this run a script per netbuf ? Do they want to run in
parallel ?

A script per netbuf. Running them in parallel is an overkill, since the remus
setup is a one-time task and the average domain is expected to have few interfaces.

Besides, the script has no waits. It just issues a bunch of shell commands that don't block

> +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,
> + libxl__remus_ctx *remus_ctx)
> +{

...

> + for (i = 0; i < netbuf_ctx->num_netbufs; ++i)
> + libxl__netbuf_script_teardown(gc, domid, i, vif_list[i],
> + (char *) remus_ctx->netbufscript);

I think you should definitely do this asynchronously and in parallel.

See answer above for running in parallel. In terms of running this asynchronously,
I had a comment in this file explaining why I decided to go the synchronous route.

Basically, the setup and teardown are part of the remus API call, which is already
asynchronous.

The remus external API (libxl_domain_remus_start) returns after starting remus.
It goes like this:
setup scripts
suspend_domain (which basically launches save_domain helper process)
return AO_IN_PROGRESS

The setup scripts are have no long running operations. They complete in jiffy

(adding a bunch of tc rules to interfaces)

The AO callback in libxl.c (named "remus_replication_failure_cb" [beginning
of this patch]), is called when the backup fails, i.e. when the replication

operations fail (such as write() calls).

The AO callback calls the above teardown function. I remember reading in the
comments that an AO op should not be issued from an AO callback

(please correct me if I am wrong).

What happens if I use "xl destroy" to destroy a domain which is being
handled by remus.

One of the replication steps (domain suspend or domain resume) would fail.
This would result in the AO callback being invoked, which in turn invokes the
external scripts. The scripts fail silently if the vif interface cannot be found,

but they go ahead and remove the IFB device, and so on.

Thanks,
Ian.

Ian J, do you have any further thoughts related to my responses on your concerns about

the event handling in this part of the code ?

Do you have suggestions on a cleaner way to go about the libxl_ev_child_fork code block ?

With regard to the AO ops, do my answers make sense ?

thanks

shriram