From mboxrd@z Thu Jan  1 00:00:00 1970
From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Subject: Re: [PATCH 3 of 5 V2] tools/libxl: setup/teardown Remus
 network buffering
Date: Wed, 4 Sep 2013 13:22:28 -0400
Message-ID: <CAP8mzPN+Pe9V9J7UHSkC81PZTT7Btx9L4WEK1BDhxe0dXs6T9A@mail.gmail.com>
References: <patchbomb.1377814579@athos.nss.cs.ubc.ca>
	<3e8b0aa7cd4a945ec3ef.1377814582@athos.nss.cs.ubc.ca>
	<21031.23781.949565.131838@mariner.uk.xensource.com>
Reply-To: rshriram@cs.ubc.ca
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============7012792103723273393=="
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <21031.23781.949565.131838@mariner.uk.xensource.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Ian Campbell <ian.campbell@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

--===============7012792103723273393==
Content-Type: multipart/alternative; boundary=047d7b874c921a65e804e5920f39

--047d7b874c921a65e804e5920f39
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> Shriram Rajagopalan writes ("[Xen-devel] [PATCH 3 of 5 V2] tools/libxl:
> setup/teardown Remus network buffering"):
> > tools/libxl: setup/teardown Remus network buffering
>
> Thanks.
>
> I'm afraid the event handling, particularly with respect to
> subprocesses, is not yet right.
>
> > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c
> > --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700
> > +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700
> > @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li
> >      /* REMUS TODO: Wait for disk and memory ack, release network buffer
> */
> >      /* REMUS TODO: make this asynchronous */
> >      assert(!rc); /* REMUS TODO handle this error properly */
> > -    usleep(dss->interval * 1000);
> > +    usleep(dss->remus_ctx->interval * 1000);
> >      libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
>
> Do you plan to fix this later (perhaps as part of a future series) ?
>
>
Its actually fixed in the next patch in the same series.


>  > diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c
> > --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> > +++ b/tools/libxl/libxl_netbuffer.c   Thu Aug 29 14:36:36 2013 -0700
> > @@ -0,0 +1,434 @@
> ...
> > +static void netbuf_script_child_death_cb(libxl__egc *egc,
> > +                                         libxl__ev_child *child,
> > +                                         pid_t pid, int status)
> > +{
> > +    /* No-op. hotplug-error will be read in setup/teardown_ifb */
> > +    return;
> > +}
>
> This can't possibly be right.  You must always wait for your children
> and you may not complete the ao until the child has finished.  This
> should all be documented in libxl_internal.h.
>
>
Thats true. But this was not meant to be an ao (see explanation at
end of this email).


> Also I don't see where this other error handling code is.
>
> > +    if (!pid) {
> > +        /* child: Launch netbuf script */
> > +        libxl__exec(gc, -1, -1, -1, args[0], args, env);
> > +        /* notreached */
> > +        abort();
> > +    }
> > +
> > +    assert(libxl__ev_child_inuse(&childw_out));
>
> What purpose does this serve ?  This can be false if the death_cb is
> called from within libxl__ev_child_fork.
>
>
Sorry. This was copy pasted from code in device_hotplug(). I couldn't find
a simple fork & exec abstraction. The libxl_ev_child_fork seemed to be
the only option.


> > +    rc = libxl__exec_netbuf_script(gc, domid, devid, vif,
> > +                                   script, "setup");
> > +    if (rc) return rc;
>
> This definitely isn't right - it needs to be asynchronous, with a
> callback.
>
> > +    /* Now wait for the result (XENBUS_PATH/hotplug-status).
> > +     * It doesnt matter what the result is. If the hotplug-status path
> > +     * appears, then we are good to go.
> > +     */
> > +    rc = libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIMEOUT,
> > +                                   script,
> > +                                   /* path */
> > +                                   GCSPRINTF("%s/hotplug-status",
> > +                                             out_path_base),
> > +                                   NULL /* state */,
> > +                                   NULL, script_done_cb, NULL);
>
> This wait does not actually block the current execution.


It actually does. Atleast based on the code in libxl_exec.c which uses
a select() call to wait on xenstore fds.


>  It arranges
> for the callback to be called.  You need to break this function apart.
>
> > +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domid,
> > +                                         int devid, char *vif,
> > +                                         char *script)
> > +{
> > +    /* Nothing to wait for during tear down */
> > +    return libxl__exec_netbuf_script(gc, domid, devid, vif,
> > +                                     script, "teardown");
>
> And this one.
>
> > +    for (i = 0; i < num_netbufs; ++i) {
> > +
> > +        /* The setup script does the following
> > +         * find a free IFB to act as buffer for the vif.
> > +         * set up the plug qdisc on the IFB.
> > +         */
> > +        ret = libxl__netbuf_script_setup(gc, domid, i, vif_list[i],
> > +                                         (char *)
> remus_ctx->netbufscript,
> > +                                         &ifb_list[i]);
>
> Does this run a script per netbuf ?  Do they want to run in
> parallel ?
>
>
A script per netbuf. Running them in parallel is an overkill, since the
remus
setup is a one-time task and the average domain is expected to have few
interfaces.
Besides, the script has no waits. It just issues a bunch of shell commands
that don't block


>  > +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,
> > +                              libxl__remus_ctx *remus_ctx)
> > +{
> ...
> > +    for (i = 0; i < netbuf_ctx->num_netbufs; ++i)
> > +        libxl__netbuf_script_teardown(gc, domid, i, vif_list[i],
> > +                                      (char *) remus_ctx->netbufscript);
>
> I think you should definitely do this asynchronously and in parallel.
>
>
See answer above for running in parallel. In terms of running this
asynchronously,
I had a comment in this file explaining why I decided to go the synchronous
route.
Basically, the setup and teardown are part of the remus API call, which is
already
asynchronous.

The remus external API (libxl_domain_remus_start) returns after starting
remus.
It goes like this:
  setup scripts
  suspend_domain (which basically launches save_domain helper process)
  return AO_IN_PROGRESS

The setup scripts are have no long running operations. They complete in
jiffy
(adding a bunch of tc rules to interfaces)

The AO callback in libxl.c (named "remus_replication_failure_cb" [beginning
of this patch]), is called when the backup fails, i.e. when the replication
operations fail (such as write() calls).

The AO callback calls the above teardown function.  I remember reading in
the
comments that an AO op should not be issued from an AO callback
(please correct me if I am wrong).

What happens if I use "xl destroy" to destroy a domain which is being
> handled by remus.
>
>
One of the replication steps (domain suspend or domain resume) would fail.
This would result in the AO callback being invoked, which in turn invokes
the
external scripts. The scripts fail silently if the vif interface cannot be
found,
but they go ahead and remove the IFB device, and so on.


> Thanks,
> Ian.
>
>

--047d7b874c921a65e804e5920f39
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">On Wed, Sep 4, 2013 at 12:16 PM, Ian Jackson <span dir=3D"=
ltr">&lt;<a href=3D"mailto:Ian.Jackson@eu.citrix.com" target=3D"_blank">Ian=
.Jackson@eu.citrix.com</a>&gt;</span> wrote:<br><div class=3D"gmail_extra">=
<div class=3D"gmail_quote">

<blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-=
left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;p=
adding-left:1ex">Shriram Rajagopalan writes (&quot;[Xen-devel] [PATCH 3 of =
5 V2] tools/libxl: setup/teardown Remus network buffering&quot;):<br>


<div class=3D"im">&gt; tools/libxl: setup/teardown Remus network buffering<=
br>
<br>
</div>Thanks.<br>
<br>
I&#39;m afraid the event handling, particularly with respect to<br>
subprocesses, is not yet right.<br>
<div class=3D"im"><br>
&gt; diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_dom.c<br>
&gt; --- a/tools/libxl/libxl_dom.c Thu Aug 29 14:36:25 2013 -0700<br>
&gt; +++ b/tools/libxl/libxl_dom.c Thu Aug 29 14:36:36 2013 -0700<br>
&gt; @@ -1259,7 +1259,7 @@ static void remus_checkpoint_dm_saved(li<br>
&gt; =A0 =A0 =A0/* REMUS TODO: Wait for disk and memory ack, release networ=
k buffer */<br>
&gt; =A0 =A0 =A0/* REMUS TODO: make this asynchronous */<br>
&gt; =A0 =A0 =A0assert(!rc); /* REMUS TODO handle this error properly */<br=
>
&gt; - =A0 =A0usleep(dss-&gt;interval * 1000);<br>
&gt; + =A0 =A0usleep(dss-&gt;remus_ctx-&gt;interval * 1000);<br>
&gt; =A0 =A0 =A0libxl__xc_domain_saverestore_async_callback_done(egc, &amp;=
dss-&gt;shs, 1);<br>
<br>
</div>Do you plan to fix this later (perhaps as part of a future series) ?<=
br>
<div class=3D"im"><br></div></blockquote><div><br></div><div>Its actually f=
ixed in the next patch in the same series.</div><div>=A0</div><blockquote c=
lass=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1p=
x;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1=
ex">

<div class=3D"im">
&gt; diff -r 79b21d778550 -r 3e8b0aa7cd4a tools/libxl/libxl_netbuffer.c<br>
&gt; --- /dev/null Thu Jan 01 00:00:00 1970 +0000<br>
&gt; +++ b/tools/libxl/libxl_netbuffer.c =A0 Thu Aug 29 14:36:36 2013 -0700=
<br>
&gt; @@ -0,0 +1,434 @@<br>
</div>...<br>
<div class=3D"im">&gt; +static void netbuf_script_child_death_cb(libxl__egc=
 *egc,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 libxl__ev_child *child,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 pid_t pid, int status)<br>
&gt; +{<br>
&gt; + =A0 =A0/* No-op. hotplug-error will be read in setup/teardown_ifb */=
<br>
&gt; + =A0 =A0return;<br>
&gt; +}<br>
<br>
</div>This can&#39;t possibly be right. =A0You must always wait for your ch=
ildren<br>
and you may not complete the ao until the child has finished. =A0This<br>
should all be documented in libxl_internal.h.<br>
<br></blockquote><div><br></div><div>Thats true. But this was not meant to =
be an ao (see explanation at</div><div>end of this email).</div><div>=A0</d=
iv><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bord=
er-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:soli=
d;padding-left:1ex">


Also I don&#39;t see where this other error handling code is.<br>
<div class=3D"im"><br>
&gt; + =A0 =A0if (!pid) {<br>
&gt; + =A0 =A0 =A0 =A0/* child: Launch netbuf script */<br>
&gt; + =A0 =A0 =A0 =A0libxl__exec(gc, -1, -1, -1, args[0], args, env);<br>
&gt; + =A0 =A0 =A0 =A0/* notreached */<br>
&gt; + =A0 =A0 =A0 =A0abort();<br>
&gt; + =A0 =A0}<br>
&gt; +<br>
&gt; + =A0 =A0assert(libxl__ev_child_inuse(&amp;childw_out));<br>
<br>
</div>What purpose does this serve ? =A0This can be false if the death_cb i=
s<br>
called from within libxl__ev_child_fork.<br>
<div class=3D"im"><br></div></blockquote><div><br></div><div>Sorry. This wa=
s copy pasted from code in device_hotplug(). I couldn&#39;t find</div><div>=
a simple fork &amp; exec abstraction. The libxl_ev_child_fork seemed to be<=
/div>

<div>the only option.</div><div>=A0</div><blockquote class=3D"gmail_quote" =
style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:r=
gb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=3D"im"=
>
&gt; + =A0 =A0rc =3D libxl__exec_netbuf_script(gc, domid, devid, vif,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
script, &quot;setup&quot;);<br>
&gt; + =A0 =A0if (rc) return rc;<br>
<br>
</div>This definitely isn&#39;t right - it needs to be asynchronous, with a=
<br>
callback.<br>
<div class=3D"im"><br>
&gt; + =A0 =A0/* Now wait for the result (XENBUS_PATH/hotplug-status).<br>
&gt; + =A0 =A0 * It doesnt matter what the result is. If the hotplug-status=
 path<br>
&gt; + =A0 =A0 * appears, then we are good to go.<br>
&gt; + =A0 =A0 */<br>
&gt; + =A0 =A0rc =3D libxl__wait_for_offspring(gc, domid, LIBXL_HOTPLUG_TIM=
EOUT,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
script,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
/* path */<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
GCSPRINTF(&quot;%s/hotplug-status&quot;,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 out_path_base),<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
NULL /* state */,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
NULL, script_done_cb, NULL);<br>
<br>
</div>This wait does not actually block the current execution. </blockquote=
><div><br></div><div>It actually does. Atleast based on the code in libxl_e=
xec.c which uses</div><div>a select() call to wait on xenstore fds.</div>

<div>=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px=
 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left=
-style:solid;padding-left:1ex">=A0It arranges<br>
for the callback to be called. =A0You need to break this function apart.<br=
>
<div class=3D"im"><br>
&gt; +static int libxl__netbuf_script_teardown(libxl__gc *gc, uint32_t domi=
d,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 int devid, char *vif,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 char *script)<br>
&gt; +{<br>
&gt; + =A0 =A0/* Nothing to wait for during tear down */<br>
&gt; + =A0 =A0return libxl__exec_netbuf_script(gc, domid, devid, vif,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 script, &quot;teardown&quot;);<br>
<br>
</div>And this one.<br>
<div class=3D"im"><br>
&gt; + =A0 =A0for (i =3D 0; i &lt; num_netbufs; ++i) {<br>
&gt; +<br>
&gt; + =A0 =A0 =A0 =A0/* The setup script does the following<br>
&gt; + =A0 =A0 =A0 =A0 * find a free IFB to act as buffer for the vif.<br>
&gt; + =A0 =A0 =A0 =A0 * set up the plug qdisc on the IFB.<br>
&gt; + =A0 =A0 =A0 =A0 */<br>
&gt; + =A0 =A0 =A0 =A0ret =3D libxl__netbuf_script_setup(gc, domid, i, vif_=
list[i],<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 (char *) remus_ctx-&gt;netbufscript,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 &amp;ifb_list[i]);<br>
<br>
</div>Does this run a script per netbuf ? =A0Do they want to run in<br>
parallel ?<br>
<div class=3D"im"><br></div></blockquote><div><br></div><div>A script per n=
etbuf. Running them in parallel is an overkill, since the remus</div><div>s=
etup is a one-time task and the average domain is expected to have few inte=
rfaces.</div>

<div>Besides, the script has no waits. It just issues a bunch of shell comm=
ands that don&#39;t block</div><div>=A0</div><blockquote class=3D"gmail_quo=
te" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-col=
or:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class=3D"im">
&gt; +int libxl__remus_netbuf_teardown(libxl__gc *gc, uint32_t domid,<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0libxl__re=
mus_ctx *remus_ctx)<br>
&gt; +{<br>
</div>...<br>
<div class=3D"im">&gt; + =A0 =A0for (i =3D 0; i &lt; netbuf_ctx-&gt;num_net=
bufs; ++i)<br>
&gt; + =A0 =A0 =A0 =A0libxl__netbuf_script_teardown(gc, domid, i, vif_list[=
i],<br>
&gt; + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0(char *) remus_ctx-&gt;netbufscript);<br>
<br>
</div>I think you should definitely do this asynchronously and in parallel.=
<br>
<br></blockquote><div><br></div><div>See answer above for running in parall=
el. In terms of running this asynchronously,</div><div>I had a comment in t=
his file explaining why I decided to go the synchronous route.</div><div>

Basically, the setup and teardown are part of the remus API call, which is =
already</div><div>asynchronous.</div><div><br></div><div>The remus external=
 API (libxl_domain_remus_start) returns after starting remus.</div><div>

It goes like this:</div><div>=A0 setup scripts</div><div>=A0 suspend_domain=
 (which basically launches save_domain helper process)</div><div>=A0 return=
 AO_IN_PROGRESS</div><div><br></div><div>The setup scripts are have no long=
 running operations. They complete in jiffy</div>

<div>(adding a bunch of tc rules to interfaces)</div><div><br></div><div>Th=
e AO callback in libxl.c (named &quot;remus_replication_failure_cb&quot; [b=
eginning=A0</div><div>of this patch]), is called when the backup fails, i.e=
. when the replication=A0</div>

<div>operations fail (such as write() calls).</div><div><br></div><div>The =
AO callback calls the above teardown function. =A0I remember reading in the=
</div><div>comments that an AO op should not be issued from an AO callback=
=A0</div>

<div>(please correct me if I am wrong).</div><div><br></div><blockquote cla=
ss=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left-width:1px;=
border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex=
">

What happens if I use &quot;xl destroy&quot; to destroy a domain which is b=
eing<br>
handled by remus.<br>
<br></blockquote><div><br></div><div>One of the replication steps (domain s=
uspend or domain resume) would fail.</div><div>This would result in the AO =
callback being invoked, which in turn invokes the</div><div>external script=
s. The scripts fail silently if the vif interface cannot be found,=A0</div>

<div>but they go ahead and remove the IFB device, and so on.</div><div>=A0<=
/div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bo=
rder-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:so=
lid;padding-left:1ex">


Thanks,<br>
Ian.<br>
<br>
</blockquote></div><br></div></div>

--047d7b874c921a65e804e5920f39--


--===============7012792103723273393==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============7012792103723273393==--