From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shriram Rajagopalan Subject: Re: [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus network buffering Date: Wed, 9 Oct 2013 09:32:47 -0700 Message-ID: References: <3e8b0aa7cd4a945ec3ef.1377814582@athos.nss.cs.ubc.ca> <21031.23781.949565.131838@mariner.uk.xensource.com> Reply-To: rshriram@cs.ubc.ca Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============3316858617554060713==" Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: Andrew Cooper , Stefano Stabellini , Ian Campbell , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org --===============3316858617554060713== Content-Type: multipart/alternative; boundary=047d7bdc9e68e3bb1404e8517104 --047d7bdc9e68e3bb1404e8517104 Content-Type: text/plain; charset=ISO-8859-1 On Wed, Sep 4, 2013 at 10:22 AM, Shriram Rajagopalan wrote: > On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson wrote: > >> Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 5 V2] tools/libxl: >> setup/teardown Remus network buffering"): >> > tools/libxl: setup/teardown Remus network buffering >> >> Thanks. >> >> I'm afraid the event handling, particularly with respect to >> subprocesses, is not yet right. >> >> > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c >> > --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700 >> > +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700 >> > @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li >> > /* REMUS TODO: Wait for disk and memory ack, release network >> buffer */ >> > /* REMUS TODO: make this asynchronous */ >> > assert(!rc); /* REMUS TODO handle this error properly */ >> > - usleep(dss->interval * 1000); >> > + usleep(dss->remus_ctx->interval * 1000); >> > libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, >> 1); >> >> Do you plan to fix this later (perhaps as part of a future series) ? >> >> > Its actually fixed in the next patch in the same series. > > >> > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c >> > --- /dev/null Thu Jan 01 00:00:00 1970 +0000 >> > +++ b/tools/libxl/libxl_netbuffer.c Thu Aug 29 14:36:36 2013 -0700 >> > @@ -0,0 +1,434 @@ >> ... >> > +static void netbuf_script_child_death_cb(libxl__egc *egc, >> > + libxl__ev_child *child, >> > + pid_t pid, int status) >> > +{ >> > + /* No-op. hotplug-error will be read in setup/teardown_ifb */ >> > + return; >> > +} >> >> This can't possibly be right. You must always wait for your children >> and you may not complete the ao until the child has finished. This >> should all be documented in libxl_internal.h. >> >> > Thats true. But this was not meant to be an ao (see explanation at > end of this email). > > >> Also I don't see where this other error handling code is. >> >> > + if (!pid) { >> > + /* child: Launch netbuf script */ >> > + libxl__exec(gc, -1, -1, -1, args[0], args, env); >> > + /* notreached */ >> > + abort(); >> > + } >> > + >> > + assert(libxl__ev_child_inuse(&childw_out)); >> >> What purpose does this serve ? This can be false if the death_cb is >> called from within libxl__ev_child_fork. >> >> > Sorry. This was copy pasted from code in device_hotplug(). I couldn't find > a simple fork & exec abstraction. The libxl_ev_child_fork seemed to be > the only option. > > >> > + rc = libxl__exec_netbuf_script(gc, domid, devid, vif, >> > + script, "setup"); >> > + if (rc) return rc; >> >> This definitely isn't right - it needs to be asynchronous, with a >> callback. >> >> > + /* Now wait for the result (XENBUS_PATH/hotplug-status). >> > + * It doesnt matter what the result is. If the hotplug-status path >> > + * appears, then we are good to go. >> > + */ >> > + rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT, >> > + script, >> > + /* path */ >> > + GCSPRINTF("%s/hotplug-status", >> > + out_path_base), >> > + NULL /* state */, >> > + NULL, script_done_cb, NULL); >> >> This wait does not actually block the current execution. > > > It actually does. Atleast based on the code in libxl_exec.c which uses > a select() call to wait on xenstore fds. > > >> It arranges >> for the callback to be called. You need to break this function apart. >> >> > +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domid, >> > + int devid, char *vif, >> > + char *script) >> > +{ >> > + /* Nothing to wait for during tear down */ >> > + return libxl__exec_netbuf_script(gc, domid, devid, vif, >> > + script, "teardown"); >> >> And this one. >> >> > + for (i = 0; i < num_netbufs; ++i) { >> > + >> > + /* The setup script does the following >> > + * find a free IFB to act as buffer for the vif. >> > + * set up the plug qdisc on the IFB. >> > + */ >> > + ret = libxl__netbuf_script_setup(gc, domid, i, vif_list[i], >> > + (char *) >> remus_ctx->netbufscript, >> > + &ifb_list[i]); >> >> Does this run a script per netbuf ? Do they want to run in >> parallel ? >> >> > A script per netbuf. Running them in parallel is an overkill, since the > remus > setup is a one-time task and the average domain is expected to have few > interfaces. > Besides, the script has no waits. It just issues a bunch of shell commands > that don't block > > >> > +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid, >> > + libxl__remus_ctx *remus_ctx) >> > +{ >> ... >> > + for (i = 0; i < netbuf_ctx->num_netbufs; ++i) >> > + libxl__netbuf_script_teardown(gc, domid, i, vif_list[i], >> > + (char *) >> remus_ctx->netbufscript); >> >> I think you should definitely do this asynchronously and in parallel. >> >> > See answer above for running in parallel. In terms of running this > asynchronously, > I had a comment in this file explaining why I decided to go the > synchronous route. > Basically, the setup and teardown are part of the remus API call, which > is already > asynchronous. > > The remus external API (libxl_domain_remus_start) returns after starting > remus. > It goes like this: > setup scripts > suspend_domain (which basically launches save_domain helper process) > return AO_IN_PROGRESS > > The setup scripts are have no long running operations. They complete in > jiffy > (adding a bunch of tc rules to interfaces) > > The AO callback in libxl.c (named "remus_replication_failure_cb" > [beginning > of this patch]), is called when the backup fails, i.e. when the > replication > operations fail (such as write() calls). > > The AO callback calls the above teardown function. I remember reading in > the > comments that an AO op should not be issued from an AO callback > (please correct me if I am wrong). > > What happens if I use "xl destroy" to destroy a domain which is being >> handled by remus. >> >> > One of the replication steps (domain suspend or domain resume) would fail. > This would result in the AO callback being invoked, which in turn invokes > the > external scripts. The scripts fail silently if the vif interface cannot be > found, > but they go ahead and remove the IFB device, and so on. > > >> Thanks, >> Ian. >> >> > Ian J, do you have any further thoughts related to my responses on your concerns about the event handling in this part of the code ? Do you have suggestions on a cleaner way to go about the libxl_ev_child_fork code block ? With regard to the AO ops, do my answers make sense ? thanks shriram --047d7bdc9e68e3bb1404e8517104 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
On Wed, Sep 4, 2013 at 10:22 AM, Shriram Rajagopalan <rs= hriram@cs.ubc.ca> wrote:
On Wed, Sep 4, 2013 at = 12:16 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wr= ote:
Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of = 5 V2] tools/libxl: setup/teardown Remus network buffering"):
> tools/libxl: setup/teardown Remus network buffering

Thanks.

I'm afraid the event handling, particularly with respect to
subprocesses, is not yet right.

> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700
> +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700
> @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li
> =A0 =A0 =A0/* REMUS TODO: Wait for disk and memory ack, release networ= k buffer */
> =A0 =A0 =A0/* REMUS TODO: make this asynchronous */
> =A0 =A0 =A0assert(!rc); /* REMUS TODO handle this error properly */ > - =A0 =A0usleep(dss->interval * 1000);
> + =A0 =A0usleep(dss->remus_ctx->interval * 1000);
> =A0 =A0 =A0libxl__xc_domain_saverestore_async_callback_done(egc, &= dss->shs, 1);

Do you plan to fix this later (perhaps as part of a future series) ?<= br>


Its actually fixed in= the next patch in the same series.
=A0
> diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxl/libxl_netbuffer.c =A0 Thu Aug 29 14:36:36 2013 -0700=
> @@ -0,0 +1,434 @@
...
> +static void netbuf_script_child_death_cb(libxl__egc *egc,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 libxl__ev_child *child,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 pid_t pid, int status)
> +{
> + =A0 =A0/* No-op. hotplug-error will be read in setup/teardown_ifb */=
> + =A0 =A0return;
> +}

This can't possibly be right. =A0You must always wait for your ch= ildren
and you may not complete the ao until the child has finished. =A0This
should all be documented in libxl_internal.h.


Thats true. But this was not mea= nt to be an ao (see explanation at
end of this email).
=A0
Also I don't see where this other error handling code is.

> + =A0 =A0if (!pid) {
> + =A0 =A0 =A0 =A0/* child: Launch netbuf script */
> + =A0 =A0 =A0 =A0libxl__exec(gc, -1, -1, -1, args[0], args, env);
> + =A0 =A0 =A0 =A0/* notreached */
> + =A0 =A0 =A0 =A0abort();
> + =A0 =A0}
> +
> + =A0 =A0assert(libxl__ev_child_inuse(&childw_out));

What purpose does this serve ? =A0This can be false if the death_cb i= s
called from within libxl__ev_child_fork.


Sorry. This was copy = pasted from code in device_hotplug(). I couldn't find
a simpl= e fork & exec abstraction. The libxl_ev_child_fork seemed to be
the only option.
=A0
> + =A0 =A0rc =3D libxl__exec_netbuf_script(gc, domid, devid, vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = script, "setup");
> + =A0 =A0if (rc) return rc;

This definitely isn't right - it needs to be asynchronous, with a=
callback.

> + =A0 =A0/* Now wait for the result (XENBUS_PATH/hotplug-status).
> + =A0 =A0 * It doesnt matter what the result is. If the hotplug-status= path
> + =A0 =A0 * appears, then we are good to go.
> + =A0 =A0 */
> + =A0 =A0rc =3D libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIM= EOUT,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = script,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = /* path */
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = GCSPRINTF("%s/hotplug-status",
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 out_path_base),
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = NULL /* state */,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = NULL, script_done_cb, NULL);

This wait does not actually block the current execution.

It actually does. Atleast based on the code in l= ibxl_exec.c which uses
a select() call to wait on xenstore fds.
=A0
=A0It arranges
for the callback to be called. =A0You need to break this function apart.

> +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domi= d,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 int devid, char *vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 char *script)
> +{
> + =A0 =A0/* Nothing to wait for during tear down */
> + =A0 =A0return libxl__exec_netbuf_script(gc, domid, devid, vif,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 script, "teardown");

And this one.

> + =A0 =A0for (i =3D 0; i < num_netbufs; ++i) {
> +
> + =A0 =A0 =A0 =A0/* The setup script does the following
> + =A0 =A0 =A0 =A0 * find a free IFB to act as buffer for the vif.
> + =A0 =A0 =A0 =A0 * set up the plug qdisc on the IFB.
> + =A0 =A0 =A0 =A0 */
> + =A0 =A0 =A0 =A0ret =3D libxl__netbuf_script_setup(gc, domid, i, vif_= list[i],
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 (char *) remus_ctx->netbufscript,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 &ifb_list[i]);

Does this run a script per netbuf ? =A0Do they want to run in
parallel ?


A script per netbuf. = Running them in parallel is an overkill, since the remus
setup is= a one-time task and the average domain is expected to have few interfaces.=
Besides, the script has no waits. It just issues a bunch of shell comm= ands that don't block
=A0
> +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0libxl__re= mus_ctx *remus_ctx)
> +{
...
> + =A0 =A0for (i =3D 0; i < netbuf_ctx->num_netbufs; ++i) > + =A0 =A0 =A0 =A0libxl__netbuf_script_teardown(gc, domid, i, vif_list[= i],
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0(char *) remus_ctx->netbufscript);

I think you should definitely do this asynchronously and in parallel.=


See answer above for running in = parallel. In terms of running this asynchronously,
I had a commen= t in this file explaining why I decided to go the synchronous route.
Basically, the setup and teardown are part of the remus API call, which is = already
asynchronous.

The remus external= API (libxl_domain_remus_start) returns after starting remus.
It goes like this:
=A0 setup scripts
=A0 suspend_domain= (which basically launches save_domain helper process)
=A0 return= AO_IN_PROGRESS

The setup scripts are have no long= running operations. They complete in jiffy
(adding a bunch of tc rules to interfaces)

Th= e AO callback in libxl.c (named "remus_replication_failure_cb" [b= eginning=A0
of this patch]), is called when the backup fails, i.e= . when the replication=A0
operations fail (such as write() calls).

The = AO callback calls the above teardown function. =A0I remember reading in the=
comments that an AO op should not be issued from an AO callback= =A0
(please correct me if I am wrong).

What happens if I use "xl destroy" to destroy a domain which is b= eing
handled by remus.


One of the replication steps (do= main suspend or domain resume) would fail.
This would result in t= he AO callback being invoked, which in turn invokes the
external = scripts. The scripts fail silently if the vif interface cannot be found,=A0=
but they go ahead and remove the IFB device, and so on.
=A0<= /div>
Thanks,
Ian.



Ian J, do you= have any further thoughts related to my responses on your concerns about
the event handling in this part of the= code ?=A0
Do you have suggestions on a cleaner way t= o go about the libxl_ev_child_fork code block ?
With regard to the AO ops, do my answers make sense ?


thanks
shriram
--047d7bdc9e68e3bb1404e8517104-- --===============3316858617554060713== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============3316858617554060713==--