From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shriram Rajagopalan Subject: Re: [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus network buffering Date: Wed, 4 Sep 2013 13:22:28 -0400 Message-ID: References: <3e8b0aa7cd4a945ec3ef.1377814582@athos.nss.cs.ubc.ca> <21031.23781.949565.131838@mariner.uk.xensource.com> Reply-To: rshriram@cs.ubc.ca Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7012792103723273393==" Return-path: In-Reply-To: <21031.23781.949565.131838@mariner.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: Andrew Cooper , Stefano Stabellini , Ian Campbell , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org --===============7012792103723273393== Content-Type: multipart/alternative; boundary=047d7b874c921a65e804e5920f39 --047d7b874c921a65e804e5920f39 Content-Type: text/plain; charset=ISO-8859-1 On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson wrote: > Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 5 V2] tools/libxl: > setup/teardown Remus network buffering"): > > tools/libxl: setup/teardown Remus network buffering > > Thanks. > > I'm afraid the event handling, particularly with respect to > subprocesses, is not yet right. > > > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c > > --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700 > > +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700 > > @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li > > /* REMUS TODO: Wait for disk and memory ack, release network buffer > */ > > /* REMUS TODO: make this asynchronous */ > > assert(!rc); /* REMUS TODO handle this error properly */ > > - usleep(dss->interval * 1000); > > + usleep(dss->remus_ctx->interval * 1000); > > libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); > > Do you plan to fix this later (perhaps as part of a future series) ? > > Its actually fixed in the next patch in the same series. > > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c > > --- /dev/null Thu Jan 01 00:00:00 1970 +0000 > > +++ b/tools/libxl/libxl_netbuffer.c Thu Aug 29 14:36:36 2013 -0700 > > @@ -0,0 +1,434 @@ > ... > > +static void netbuf_script_child_death_cb(libxl__egc *egc, > > + libxl__ev_child *child, > > + pid_t pid, int status) > > +{ > > + /* No-op. hotplug-error will be read in setup/teardown_ifb */ > > + return; > > +} > > This can't possibly be right. You must always wait for your children > and you may not complete the ao until the child has finished. This > should all be documented in libxl_internal.h. > > Thats true. But this was not meant to be an ao (see explanation at end of this email). > Also I don't see where this other error handling code is. > > > + if (!pid) { > > + /* child: Launch netbuf script */ > > + libxl__exec(gc, -1, -1, -1, args[0], args, env); > > + /* notreached */ > > + abort(); > > + } > > + > > + assert(libxl__ev_child_inuse(&childw_out)); > > What purpose does this serve ? This can be false if the death_cb is > called from within libxl__ev_child_fork. > > Sorry. This was copy pasted from code in device_hotplug(). I couldn't find a simple fork & exec abstraction. The libxl_ev_child_fork seemed to be the only option. > > + rc = libxl__exec_netbuf_script(gc, domid, devid, vif, > > + script, "setup"); > > + if (rc) return rc; > > This definitely isn't right - it needs to be asynchronous, with a > callback. > > > + /* Now wait for the result (XENBUS_PATH/hotplug-status). > > + * It doesnt matter what the result is. If the hotplug-status path > > + * appears, then we are good to go. > > + */ > > + rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT, > > + script, > > + /* path */ > > + GCSPRINTF("%s/hotplug-status", > > + out_path_base), > > + NULL /* state */, > > + NULL, script_done_cb, NULL); > > This wait does not actually block the current execution. It actually does. Atleast based on the code in libxl_exec.c which uses a select() call to wait on xenstore fds. > It arranges > for the callback to be called. You need to break this function apart. > > > +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domid, > > + int devid, char *vif, > > + char *script) > > +{ > > + /* Nothing to wait for during tear down */ > > + return libxl__exec_netbuf_script(gc, domid, devid, vif, > > + script, "teardown"); > > And this one. > > > + for (i = 0; i < num_netbufs; ++i) { > > + > > + /* The setup script does the following > > + * find a free IFB to act as buffer for the vif. > > + * set up the plug qdisc on the IFB. > > + */ > > + ret = libxl__netbuf_script_setup(gc, domid, i, vif_list[i], > > + (char *) > remus_ctx->netbufscript, > > + &ifb_list[i]); > > Does this run a script per netbuf ? Do they want to run in > parallel ? > > A script per netbuf. Running them in parallel is an overkill, since the remus setup is a one-time task and the average domain is expected to have few interfaces. Besides, the script has no waits. It just issues a bunch of shell commands that don't block > > +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid, > > + libxl__remus_ctx *remus_ctx) > > +{ > ... > > + for (i = 0; i < netbuf_ctx->num_netbufs; ++i) > > + libxl__netbuf_script_teardown(gc, domid, i, vif_list[i], > > + (char *) remus_ctx->netbufscript); > > I think you should definitely do this asynchronously and in parallel. > > See answer above for running in parallel. In terms of running this asynchronously, I had a comment in this file explaining why I decided to go the synchronous route. Basically, the setup and teardown are part of the remus API call, which is already asynchronous. The remus external API (libxl_domain_remus_start) returns after starting remus. It goes like this: setup scripts suspend_domain (which basically launches save_domain helper process) return AO_IN_PROGRESS The setup scripts are have no long running operations. They complete in jiffy (adding a bunch of tc rules to interfaces) The AO callback in libxl.c (named "remus_replication_failure_cb" [beginning of this patch]), is called when the backup fails, i.e. when the replication operations fail (such as write() calls). The AO callback calls the above teardown function. I remember reading in the comments that an AO op should not be issued from an AO callback (please correct me if I am wrong). What happens if I use "xl destroy" to destroy a domain which is being > handled by remus. > > One of the replication steps (domain suspend or domain resume) would fail. This would result in the AO callback being invoked, which in turn invokes the external scripts. The scripts fail silently if the vif interface cannot be found, but they go ahead and remove the IFB device, and so on. > Thanks, > Ian. > > --047d7b874c921a65e804e5920f39 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson <Ian= .Jackson@eu.citrix.com> wrote:
=
Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of = 5 V2] tools/libxl: setup/teardown Remus network buffering"):
> tools/libxl: setup/teardown Remus network buffering<= br>
Thanks.

I'm afraid the event handling, particularly with respect to
subprocesses, is not yet right.

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700
> +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700
> @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li
> =A0 =A0 =A0/* REMUS TODO: Wait for disk and memory ack, release networ= k buffer */
> =A0 =A0 =A0/* REMUS TODO: make this asynchronous */
> =A0 =A0 =A0assert(!rc); /* REMUS TODO handle this error properly */ > - =A0 =A0usleep(dss->interval * 1000);
> + =A0 =A0usleep(dss->remus_ctx->interval * 1000);
> =A0 =A0 =A0libxl__xc_domain_saverestore_async_callback_done(egc, &= dss->shs, 1);

Do you plan to fix this later (perhaps as part of a future series) ?<= br>


Its actually f= ixed in the next patch in the same series.
=A0
> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxl/libxl_netbuffer.c =A0 Thu Aug 29 14:36:36 2013 -0700=
> @@ -0,0 +1,434 @@
...
> +static void netbuf_script_child_death_cb(libxl__egc= *egc,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 libxl__ev_child *child,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 pid_t pid, int status)
> +{
> + =A0 =A0/* No-op. hotplug-error will be read in setup/teardown_ifb */=
> + =A0 =A0return;
> +}

This can't possibly be right. =A0You must always wait for your ch= ildren
and you may not complete the ao until the child has finished. =A0This
should all be documented in libxl_internal.h.


Thats true. But this was not meant to = be an ao (see explanation at
end of this email).
=A0
Also I don't see where this other error handling code is.

> + =A0 =A0if (!pid) {
> + =A0 =A0 =A0 =A0/* child: Launch netbuf script */
> + =A0 =A0 =A0 =A0libxl__exec(gc, -1, -1, -1, args[0], args, env);
> + =A0 =A0 =A0 =A0/* notreached */
> + =A0 =A0 =A0 =A0abort();
> + =A0 =A0}
> +
> + =A0 =A0assert(libxl__ev_child_inuse(&childw_out));

What purpose does this serve ? =A0This can be false if the death_cb i= s
called from within libxl__ev_child_fork.


Sorry. This wa= s copy pasted from code in device_hotplug(). I couldn't find
= a simple fork & exec abstraction. The libxl_ev_child_fork seemed to be<= /div>
the only option.
=A0
> + =A0 =A0rc =3D libxl__exec_netbuf_script(gc, domid, devid, vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = script, "setup");
> + =A0 =A0if (rc) return rc;

This definitely isn't right - it needs to be asynchronous, with a=
callback.

> + =A0 =A0/* Now wait for the result (XENBUS_PATH/hotplug-status).
> + =A0 =A0 * It doesnt matter what the result is. If the hotplug-status= path
> + =A0 =A0 * appears, then we are good to go.
> + =A0 =A0 */
> + =A0 =A0rc =3D libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIM= EOUT,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = script,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = /* path */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = GCSPRINTF("%s/hotplug-status",
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 out_path_base),
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = NULL /* state */,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = NULL, script_done_cb, NULL);

This wait does not actually block the current execution.

It actually does. Atleast based on the code in libxl_e= xec.c which uses
a select() call to wait on xenstore fds.
=A0
=A0It arranges
for the callback to be called. =A0You need to break this function apart.

> +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domi= d,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 int devid, char *vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 char *script)
> +{
> + =A0 =A0/* Nothing to wait for during tear down */
> + =A0 =A0return libxl__exec_netbuf_script(gc, domid, devid, vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 script, "teardown");

And this one.

> + =A0 =A0for (i =3D 0; i < num_netbufs; ++i) {
> +
> + =A0 =A0 =A0 =A0/* The setup script does the following
> + =A0 =A0 =A0 =A0 * find a free IFB to act as buffer for the vif.
> + =A0 =A0 =A0 =A0 * set up the plug qdisc on the IFB.
> + =A0 =A0 =A0 =A0 */
> + =A0 =A0 =A0 =A0ret =3D libxl__netbuf_script_setup(gc, domid, i, vif_= list[i],
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 (char *) remus_ctx->netbufscript,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 &ifb_list[i]);

Does this run a script per netbuf ? =A0Do they want to run in
parallel ?


A script per n= etbuf. Running them in parallel is an overkill, since the remus
s= etup is a one-time task and the average domain is expected to have few inte= rfaces.
Besides, the script has no waits. It just issues a bunch of shell comm= ands that don't block
=A0
> +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0libxl__re= mus_ctx *remus_ctx)
> +{
...
> + =A0 =A0for (i =3D 0; i < netbuf_ctx->num_net= bufs; ++i)
> + =A0 =A0 =A0 =A0libxl__netbuf_script_teardown(gc, domid, i, vif_list[= i],
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0(char *) remus_ctx->netbufscript);

I think you should definitely do this asynchronously and in parallel.=


See answer above for running in parall= el. In terms of running this asynchronously,
I had a comment in t= his file explaining why I decided to go the synchronous route.
Basically, the setup and teardown are part of the remus API call, which is = already
asynchronous.

The remus external= API (libxl_domain_remus_start) returns after starting remus.
It goes like this:
=A0 setup scripts
=A0 suspend_domain= (which basically launches save_domain helper process)
=A0 return= AO_IN_PROGRESS

The setup scripts are have no long= running operations. They complete in jiffy
(adding a bunch of tc rules to interfaces)

Th= e AO callback in libxl.c (named "remus_replication_failure_cb" [b= eginning=A0
of this patch]), is called when the backup fails, i.e= . when the replication=A0
operations fail (such as write() calls).

The = AO callback calls the above teardown function. =A0I remember reading in the=
comments that an AO op should not be issued from an AO callback= =A0
(please correct me if I am wrong).

What happens if I use "xl destroy" to destroy a domain which is b= eing
handled by remus.


One of the replication steps (domain s= uspend or domain resume) would fail.
This would result in the AO = callback being invoked, which in turn invokes the
external script= s. The scripts fail silently if the vif interface cannot be found,=A0
but they go ahead and remove the IFB device, and so on.
=A0<= /div>
Thanks,
Ian.


--047d7b874c921a65e804e5920f39-- --===============7012792103723273393== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============7012792103723273393==--