From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ian Campbell <ian.campbell@citrix.com>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 16/27] tools/libxl: Infrastructure for reading a libxl migration v2 stream
Date: Fri, 10 Jul 2015 11:47:01 +0100
Message-ID: <559FA2A5.9090105@citrix.com>
In-Reply-To: <1436523793.23508.224.camel@citrix.com>

On 10/07/15 11:23, Ian Campbell wrote:
> On Thu, 2015-07-09 at 19:26 +0100, Andrew Cooper wrote:
>> From: Ross Lagerwall <ross.lagerwall@citrix.com>
>>
>> This contains the event machinary and state machines to read and act on a
> "machinery"
>
> [...]
>
>
>> Large quantities of the logic here are completely overhauled since v1, mostly
>> as part of fixing the checkpoint buffering bug which was the cause of the
>> broken Remus failover.  The result is actually more simple overall;
> I agree, it looks much nicer, thanks!
>
>> +struct libxl__stream_read_state {
>> +    /* filled by the user */
>> +    libxl__ao *ao;
>> +    int fd;
>> +    void (*completion_callback)(libxl__egc *egc,
>> +                                libxl__stream_read_state *srs,
>> +                                int rc);
>> +    /* Private */
>> +    int rc;
>> +    bool running;
> [...]
>> +void libxl__stream_read_start(libxl__egc *egc,
>> +                              libxl__stream_read_state *stream)
>> +{
>> +    libxl__datacopier_state *dc = &stream->dc;
>> +    int ret = 0;
>> +
>> +    /* State initialisation. */
>> +    assert(!stream->running);
> Since running is declared private and there is no _init function (I
> think _start is effectively filling that role) I'm not sure that the
> caller can necessarily be expected to have initialised anything other
> than the ao, fd and callback fields.

It was a sanity check that _start() doesn't get called twice (guess what
I managed to do while developing).  It can probably be dropped.

>
> You might choose to handle this as a request for a doc comment ("must
> call LIBXL_FILLZERO on it to init"), or to add a separate init function
> containing the memset or to do away with this check. I've not gotten to
> the caller yet so I don't know which you will prefer.

It is all zeroed because of the way dcs is constructed.  I suppose I can
also drop the zeroing of the dc.
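
For illustration, the "separate init function" option could look roughly
like the sketch below.  struct reader_state, reader_init() and
reader_start() are hypothetical stand-ins, not the real libxl names or
layout; the only point is that once an _init() zeroes the private fields,
the assert in _start() reads defined memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for libxl__stream_read_state; not the real layout. */
struct reader_state {
    /* filled by the user */
    int fd;
    /* private */
    int rc;
    bool running;
};

/* Zero everything first; the caller fills in the public fields afterwards. */
static void reader_init(struct reader_state *s)
{
    memset(s, 0, sizeof(*s));
    s->fd = -1;                 /* "no fd yet", so a forgotten fd is caught */
}

/* Returns 0 on success, -1 if the caller never supplied an fd. */
static int reader_start(struct reader_state *s)
{
    assert(!s->running);        /* well defined now: catches a double start */
    if (s->fd < 0)
        return -1;
    s->running = true;
    return 0;
}
```

A caller would do reader_init(&s), set s.fd, then reader_start(&s).
Calling reader_start() a second time trips the assert.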

>
>> +
>> +    memset(dc, 0, sizeof(*dc));
>> +    dc->ao = stream->ao;
>> +    dc->readfd = stream->fd;
>> +    dc->writefd = -1;
>> +
>> +    /* Start reading the stream header. */
>> +    ret = setup_read(stream, "stream header",
>> +                     &stream->hdr, sizeof(stream->hdr),
>> +                     stream_header_done);
>> +    if (ret)
>> +        goto err;
>> +
>> +    stream->running = true;
>> +    stream->phase = SRS_PHASE_NORMAL;
>> +    LIBXL_STAILQ_INIT(&stream->record_queue);
>> +    stream->recursion_guard = 0;
>> +
>> +    assert(!ret);
>> +    return;
>> +
>> + err:
>> +    assert(ret);
>> +    stream_failed(egc, stream, ret);
> stream_failed() looks at stream->running, which due to the above might
> also be uninitialised here.

Oops yes. I fixed this in the write() side but forgot to propagate the
bugfix back to the read side.

>
>> +static void stream_done(libxl__egc *egc,
>> +                        libxl__stream_read_state *stream)
>> +{
>> +    libxl__sr_record_buf *rec, *trec;
>> +
>> +    assert(stream->running);
>> +    stream->running = false;
>> +
>> +    if (stream->emu_carefd)
>> +        libxl__carefd_close(stream->emu_carefd);
>> +
>> +    LIBXL_STAILQ_FOREACH_SAFE(rec, &stream->record_queue, entry, trec) {
>> +        free(rec->body);
>> +        free(rec);
>> +    }
> Am I right in thinking that we should only get here with a non-empty
> queue on failure? If so then perhaps:
>         assert(LIBXL_STAILQ_EMPTY(...) || stream->rc);
>         
> ?

I believe so.  There is no way the stream should succeed if there are
outstanding buffered records.
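
A minimal sketch of that invariant, using a hand-rolled singly linked list
in place of LIBXL_STAILQ.  struct rec, struct mini_queue and mini_done()
are made up for the example and are not libxl code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffered record and queue; not the libxl structures. */
struct rec {
    struct rec *next;
    void *body;
};

struct mini_queue {
    struct rec *head;
    int rc;                     /* 0 on success, non-zero on failure */
};

/* Frees every buffered record and returns how many were freed. */
static size_t mini_done(struct mini_queue *q)
{
    size_t freed = 0;

    /* Leftover records are only legitimate on the error path. */
    assert(q->head == NULL || q->rc != 0);

    while (q->head) {
        struct rec *r = q->head;
        q->head = r->next;      /* advance before freeing the node */
        free(r->body);
        free(r);
        freed++;
    }
    return freed;
}
```

On a successful stream the loop body never runs.  On failure it releases
whatever was still buffered.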

>
>> +
>> +    stream->completion_callback(egc, stream, stream->rc);
>> +}
>> +
>> +static void stream_continue(libxl__egc *egc,
>> +                            libxl__stream_read_state *stream)
>> +{
>> +    STATE_AO_GC(stream->ao);
>> +
>> +    /* Must not mutually recurse with process_record() */
>> +    assert(stream->recursion_guard == false);
>> +    stream->recursion_guard = true;
> This smells a bit like it ought to be a SRS_PHASE_PROCESSING or some
> such, but let's leave that alone...

This check is there to pre-emptively avoid the naive bug which would occur
if process_record() called back into stream_continue() and there were many
TOOLSTACK records back to back in the processing queue.

In that case (and potentially future records as well), the two functions
would mutually recurse based on the contents of the stream.
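
A rough sketch of the pattern, assuming records that all complete
synchronously.  struct mini_stream, mini_continue() and
mini_process_record() are hypothetical and much simpler than the real
functions.  The loop keeps the stack depth constant however many records
are queued, and the guard asserts that nothing re-enters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the reader state. */
struct mini_stream {
    size_t queued;              /* records waiting to be processed */
    size_t processed;           /* records handled so far */
    bool recursion_guard;       /* set while mini_continue() is on the stack */
};

/*
 * Handles one record.  Returns true if the caller may carry on with the
 * next record.  It must never call mini_continue() itself.
 */
static bool mini_process_record(struct mini_stream *s)
{
    assert(s->recursion_guard); /* only reachable from mini_continue() */
    s->queued--;
    s->processed++;
    return true;
}

/* Drains the queue iteratively instead of by mutual recursion. */
static void mini_continue(struct mini_stream *s)
{
    assert(!s->recursion_guard);
    s->recursion_guard = true;

    while (s->queued > 0 && mini_process_record(s))
        ;

    s->recursion_guard = false;
}
```

If mini_process_record() did call mini_continue(), the first assert in
mini_continue() would fire immediately, rather than the stack growing with
the number of back-to-back records in the stream.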

>
>> +
>> +    switch (stream->phase) {
>> +    case SRS_PHASE_NORMAL:
>> +        /*
>> +         * Normal phase of the stream.  We arrive here in several senarios.
> "scenarios"
>
>> +static void stream_header_done(libxl__egc *egc,
>> +                               libxl__datacopier_state *dc,
>> +                               int rc, int onwrite, int errnoval)
>> +{
>> +    libxl__stream_read_state *stream = CONTAINER_OF(dc, *stream, dc);
>> +    libxl__sr_hdr *hdr = &stream->hdr;
>> +    STATE_AO_GC(dc->ao);
>> +    int ret = 0;
>> +
>> +    if (rc || onwrite || errnoval) {
>> +        ret = ERROR_FAIL;
>> +        LOG(ERROR, "rc %d, onwrite %d, errnoval %d", rc, onwrite, errnoval);
> Could use LOGEV(ERROR, errnoval, "rc %d, onwrite %d", rc, onwrite);
> (for all cases I think).
>
> Actually, doesn't dc guarantee to always have already logged on fail?
> Comments in the libxl_internal.h suggest so, apart from the abort case,
> so I think maybe you can avoid logging explicitly here.

So it does.  That is handy.

>
>> +        goto err;
>> +    }
>> +
>> +    hdr->ident   = be64toh(hdr->ident);
>> +    hdr->version = be32toh(hdr->version);
>> +    hdr->options = be32toh(hdr->options);
>> +
>> +    if (hdr->ident != RESTORE_STREAM_IDENT) {
>> +        ret = ERROR_FAIL;
> Eventually I suspect the xapi people would like to see something more
> specific at least for the general "SRS header fail" if not the
> individual reasons.

If you don't object too strongly, I would prefer to leave that
bikeshedding to the error value improvements work.

>
>> +        LOG(ERROR,
>> +            "Invalid ident: expected 0x%016"PRIx64", got 0x%016"PRIx64,
>> +            RESTORE_STREAM_IDENT, hdr->ident);
>> +        goto err;
>> +    }
>> +    if (hdr->version != RESTORE_STREAM_VERSION) {
>> +        ret = ERROR_FAIL;
>> +        LOG(ERROR, "Unexpected Version: expected %u, got %u",
> hdr->version is a uint32_t, so PRIu32 would be more appropriate.

In both 32- and 64-bit builds they are equivalent.  All parameters are
promoted to unsigned int.
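
For reference, the <inttypes.h> spelling of the two messages would look
something like the sketch below.  format_version_error() and
format_ident_error() are hypothetical helpers that write into a buffer
instead of calling LOG(), so the output can be checked.  Where uint32_t is
unsigned int, PRIu32 expands to "u" and the result is identical to the
plain %u version.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: version mismatch message using PRIu32. */
static int format_version_error(char *buf, size_t len,
                                uint32_t expected, uint32_t got)
{
    return snprintf(buf, len,
                    "Unexpected Version: expected %" PRIu32 ", got %" PRIu32,
                    expected, got);
}

/* Hypothetical helper: ident mismatch message using PRIx64. */
static int format_ident_error(char *buf, size_t len,
                              uint64_t expected, uint64_t got)
{
    return snprintf(buf, len,
                    "Invalid ident: expected 0x%016" PRIx64
                    ", got 0x%016" PRIx64,
                    expected, got);
}
```

The macros only change the spelling of the format string, not the
arguments.  They pick the conversion that matches the fixed-width type on
the target platform.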

>
>> +            RESTORE_STREAM_VERSION, hdr->version);
>> +        goto err;
>> +    }
>> +    if (hdr->options & RESTORE_OPT_BIG_ENDIAN) {
>> +        ret = ERROR_FAIL;
>> +        LOG(ERROR, "Unable to handle big endian streams");
>> +        goto err;
>> +    }
>> +
>> +    LOG(DEBUG, "Stream v%u%s", hdr->version,
> and again.
>
> Actually looking around since you've used uintXX_t throughout the format
> structs, I think you need a lot more PRI[ux]FOO around the place.

Will do

>
> _If_ you've compile tested this for both 32- and 64-bit and it works we
> could perhaps leave that audit until later.
>
>> +static void setup_read_record(libxl__egc *egc,
>> +                              libxl__stream_read_state *stream)
>> +{
>> +    STATE_AO_GC(stream->ao);
>> +    libxl__sr_record_buf *rec = NULL;
>> +    int ret;
>> +
>> +    assert(stream->incoming_record == NULL);
>> +
>> +    stream->incoming_record = rec = libxl__zalloc(NOGC, sizeof(*rec));
> I recall Ian J and you discussing NOGC allocations on IRC. Was the
> conclusion that it was OK, or that it could be fixed later, or that it
> should be fixed now via an nested ao or something similar?
>
> Unless the answer is "fixed now" I think the reason for the NOGC should
> be in either the commit log or a comment (in the header, around about
> the definition of the allocated data structure).

I will add a note about it in the commit message.  We agreed on IRC that
NOGC was OK.  It might be possible to switch to some nested ao later,
but that depends entirely on the COLO work.

~Andrew
