From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ian Jackson Subject: Re: [PATCH 05/19] libxl: domain save/restore: run in a separate process Date: Thu, 14 Jun 2012 17:48:44 +0100 Message-ID: <20442.5612.626430.387995@mariner.uk.xensource.com> References: <1339176870-32652-1-git-send-email-ian.jackson@eu.citrix.com> <1339176870-32652-6-git-send-email-ian.jackson@eu.citrix.com> <1339585498.24104.190.camel@zakaz.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1339585498.24104.190.camel@zakaz.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Campbell Cc: "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org Ian Campbell writes ("Re: [Xen-devel] [PATCH 05/19] libxl: domain save/restore: run in a separate process"): > On Fri, 2012-06-08 at 18:34 +0100, Ian Jackson wrote: > > libxenctrl expects to be able to simply run the save or restore > > operation synchronously. This won't work well in a process which is > > trying to handle multiple domains. ... > > diff --git a/.hgignore b/.hgignore > > index 27d8f79..05304ea 100644 > > ^tools/libxl/tmp\..*$ > > +^tools/libxl/.*\.new$ > > Our naming scheme for these sorts of temporary files is rather in > consistent (including an existing use of .new), oh well... Hmm, yes. I took my cue from libxl_list.h, which is immediately before _libxl_save_msgs_helper.[ch] in the Makefile. > > +libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so > > + $(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS) > > The changelog says that libxl-save-helper doesn't link libxenlight but > it is declared as a dependency here and CFLAGS_libxenlight is included > in SAVE_HELPER_OBJS' CFLAGS above. This is a mistake in the Makefile, which I have fixed. > > testidl: testidl.o libxlutil.so libxenlight.so > > $(CC) $(LDFLAGS) -o $@ testidl.o libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) > [...] > > @@ -170,6 +184,7 @@ install: all > > $(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR) > > $(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR) > > $(INSTALL_PROG) xl $(DESTDIR)$(SBINDIR) > > + $(INSTALL_PROG) libxl-save-helper $(DESTDIR)$(PRIVATE_BINDIR) > > This needs an INSTALL_DIR for $(DESTDIR)$(PRIVATE_BINDIR) to support > "make -C tools/libxl DESTDIR=xxx install" else: > install: cannot create regular file `/tmp/tmplKa0Y7/usr/lib/xen/bin': No such file or directory OK. I guess I never do that and it's a bit of a mystery why one would without having done a more general install before, but it's clearly a correct change. > > diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c > > index b48452c..fea0c94 100644 ... > > int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf, > > - uint32_t size, void *data) ... > > + libxl_ctx *ctx = CTX; > > Any reason you didn't s/ctx/CTX/ in one of your preparatory patches? I didn't want to do that unless it was necessary. There was already enough else going on. I don't think having a ctx rather than using CTX is wrong, although it is slightly less in the modern idiom. > > void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss, > > unsigned long vm_generationid_addr) > > { > > STATE_AO_GC(dss->ao); > > - int r; > > + int r, rc; > > + uint32_t toolstack_data_len; > > + > > + /* Resources we need to free */ > > + uint8_t *toolstack_data_buf = 0; > > + > > + unsigned cbflags = libxl__srm_callout_enumcallbacks_save > > + (&dss->shs.callbacks.save.a); > > + > > + r = libxl__toolstack_save(dss->domid, &toolstack_data_buf, > > + &toolstack_data_len, dss); > > This seems to be called twice? I'm looking at the source after the full > series is applied and I see > dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save; > in libxl__domain_suspend as well as this call here. But in fact dss->shs.callbacks.save.toolstack_save is never used anywhere in that version. The bug is that libxl__xc_domain_save should call the callback function (if it is non-null), not call libxl__toolstack_save directly. > The callback one seems to be otherwise unused. Exactly. I have fixed this: * libxl__xc_domain_save uses supplied callback function pointer, rather than calling libxl__toolstack_save directly; toolstack data save callback is only supplied to libxc if in-libxl caller supplied a callback. > > +static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, > > + const char *mode_arg, int data_fd, > > + const unsigned long *argnums, int num_argnums) > > +{ ... > > + libxl__carefd_begin(); > > + for (i=0; i<2; i++) { > > + int fds[2]; > > + if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; } > > + childs_pipes[i] = libxl__carefd_record(CTX, fds[i]); > > + shs->pipes[i] = libxl__carefd_record(CTX, fds[!i]); > > Using !i here is clever, but I had to deploy a pen to be sure the > assignments were all correct. > > Open coding this simple loop would also give you the opportunity the > have comments like "/* This is the helper processes's stdout */" etc in > the appropriate place to aid in comprehension when checking the correct > end of each pipe goes in the right place. Perhaps the right answer is to use a better variable name than `i'. I wrote it out longhand and it looked like this: libxl__carefd_begin(); int fds[2]; /* child's stdin */ if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; } childs_pipes[0] = libxl__carefd_record(CTX, fds[0]); shs->pipes[0] = libxl__carefd_record(CTX, fds[1]); /* child's stdout */ if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; } childs_pipes[1] = libxl__carefd_record(CTX, fds[1]); shs->pipes[1] = libxl__carefd_record(CTX, fds[0]); libxl__carefd_unlock(); which is a great way to obscure the subtle differences between the two stanzas. How about: libxl__carefd_begin(); int childfd; for (childfd=0; childfd<2; childfd++) { /* Setting up the pipe for the child's fd childfd */ int fds[2]; if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; } int childs_end = childfd==0 ? 0 /*read*/ : 1 /*write*/; int our_end = childfd==0 ? 1 /*write*/ : 0 /*read*/; childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]); shs->pipes[childfd] = libxl__carefd_record(CTX, fds[our_end]); } libxl__carefd_unlock(); ? > > +static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev, > > + int fd, short events, short revents) > > +{ > > + libxl__save_helper_state *shs = CONTAINER_OF(ev, *shs, readable); > > + STATE_AO_GC(shs->ao); > > + int rc, errnoval; > > + > > + if (revents & POLLPRI) { > > + LOG(ERROR, "%s signaled POLLERR", shs->stdout_what); > > signalled ? > > (it's not impossible that my spell checker is wrong, google seem to > think both are ok) http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html uses "signaled". > The if says POLLPRI but the log says POLLERR? This should check and log both. > > + rc = ERROR_FAIL; > > + out: > > + /* this is here because otherwise we bypass the decl of msg[] */ > > I don't get this comment, why can't "out:" be in the normal place after > the non-error return? Because it is not legal to "goto" within the scope of a variable whose size is not a constant. The alternative would be to introduce a block scope starting before `unsigned char msg[msglen]' and ending after `return'. Arguably msg[msglen] is asking too much (up to 64K) of the caller's stack. Should I change it ? > > + if (!shs->completed) { > > + if (!shs->rc) > > + LOG(ERROR,"%s exited without signaling completion",what); > > signalling again, same caveat I'll go with what SuS says. > > --- /dev/null > > +++ b/tools/libxl/libxl_save_helper.c > > @@ -0,0 +1,279 @@ > > +/* > > + * Copyright (C) 2012 Citrix Ltd. > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU Lesser General Public License as published > > + * by the Free Software Foundation; version 2.1 only. with the special > > + * exception on linking described in file LICENSE. > > LGPL, or perhaps GPL since this is a helper? > > (I suppose the same argument could be made for xl itself) I can see no reason to deviate from the licence of the rest of libxl. If nothing else, code may move between the helper and libxl proper. > > +int helper_getreply(void *user) > > +{ > > + int v; > > + int r = read_exactly(0, &v, sizeof(v)); > > + if (r<=0) exit(-2); > > You use -1 a lot but here you use -2, are we supposed to be able to > infer something from the specific error code? Not really. I used -1 for general system call failures. This particular situation might include a protocol error by the parent, though, so I thought I'd distinguish it. That way if you get a message saying the helper exited with a nonzero exit status foobar you get slightly more information. I could make this more formal and be more consistent with these exit statuses, or alternatively I could abolish it. I don't think it's critical - nothing turns on this exit status. > > diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl > > new file mode 100755 > > index 0000000..cd0837e > > --- /dev/null > > +++ b/tools/libxl/libxl_save_msgs_gen.pl > > @@ -0,0 +1,393 @@ > > +#!/usr/bin/perl -w > > + > > +use warnings; > > +use strict; > > +use POSIX; > > + > > +our $debug = 0; # produce copius debugging output at run-time? > > copious > > (which is probably the most useful feedback you are going to get from me > on a Perl script ;-)) Fixed. > > +foreach my $ah (qw(callout helper)) { > > + $out_body{$ah} .= < [...] > > +END > [...] > > +END > [...] > > +END > > I think I can infer what this does, but wow ;-) Following a suggestion from Mark Wooding I have changed this to: < #include #include #include END_BOTH #include "libxl_internal.h" END_CALLOUT #include "_libxl_save_msgs_${ah}.h" #include #include END_HELPER which I think is much clearer. Ian.