From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ian Jackson Subject: Re: [PATCH 4 of 5 V2] tools/libxl: Control network buffering in remus callbacks Date: Thu, 5 Sep 2013 12:25:13 +0100 Message-ID: <21032.27161.197547.33583@mariner.uk.xensource.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Shriram Rajagopalan Cc: Andrew Cooper , Stefano Stabellini , Ian Jackson , Ian Campbell , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org Shriram Rajagopalan writes ("[Xen-devel] [PATCH 4 of 5 V2] tools/libxl: Control network buffering in remus callbacks"): > tools/libxl: Control network buffering in remus callbacks ... > @@ -1256,10 +1272,36 @@ static void libxl__remus_domain_checkpoi > static void remus_checkpoint_dm_saved(libxl__egc *egc, > libxl__domain_suspend_state *dss, int rc) > { I think there are some error handling problems in this function. > - /* REMUS TODO: Wait for disk and memory ack, release network buffer */ > - /* REMUS TODO: make this asynchronous */ > - assert(!rc); /* REMUS TODO handle this error properly */ > - usleep(dss->remus_ctx->interval * 1000); > + /* > + * REMUS TODO: Wait for disk and explicit memory ack (through restore > + * callback from remote) before releasing network buffer. > + */ > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > + struct timespec epoch; > + int ret; > + STATE_AO_GC(dss->ao); > + > + if (rc) { > + LOG(ERROR, "Failed to save device model. Terminating Remus.."); > + libxl__xc_domain_saverestore_async_callback_done(egc, > + &dss->shs, rc); Firstly, it would be much clearer if you used the "goto out" error handling style, rather than calling ..._callback_done and return separately each time. Secondly, here you return rc, a libxl error code. > + ret = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid, > + remus_ctx); > + if (ret) { > + libxl__xc_domain_saverestore_async_callback_done(egc, > + &dss->shs, > + ret); Here you return "ret" which is what exactly ? > + return; ... > libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); And here the original function returns "1". The specification for the callback is in xenguest.h's struct save_callbacks, the "checkpoint" member. It says the return value must be 0 or 1. So if there is a failure I think you need to stash the error code somewhere because libxc won't pass it through for you. Ian.