From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32886) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agxLC-0000MR-NH for qemu-devel@nongnu.org; Fri, 18 Mar 2016 12:33:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1agxL8-0007yN-K4 for qemu-devel@nongnu.org; Fri, 18 Mar 2016 12:33:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44180) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agxL8-0007yB-As for qemu-devel@nongnu.org; Fri, 18 Mar 2016 12:33:30 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id BB4E7C076620 for ; Fri, 18 Mar 2016 16:33:29 +0000 (UTC) Date: Fri, 18 Mar 2016 16:33:25 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20160318163323.GI2246@work-vm> References: <1458311463-28272-1-git-send-email-berrange@redhat.com> <1458311463-28272-11-git-send-email-berrange@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1458311463-28272-11-git-send-email-berrange@redhat.com> Subject: Re: [Qemu-devel] [PATCH v5 10/28] migration: add reporting of errors for outgoing migration List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Daniel P. Berrange" Cc: Amit Shah , qemu-devel@nongnu.org, Juan Quintela * Daniel P. Berrange (berrange@redhat.com) wrote: > Currently if an application initiates an outgoing migration, > it may or may not, get an error reported back on failure. If > the error occurs synchronously to the 'migrate' command > execution, the client app will see the error message. This > is the case for DNS lookup failures. If the error occurs > asynchronously to the monitor command though, the error > will be thrown away and the client left guessing about > what went wrong. This is the case for failure to connect > to the TCP server (eg due to wrong port, or firewall > rules, or other similar errors). > > In the future we'll be adding more scope for errors to > happen asynchronously with the TLS protocol handshake. > TLS errors are hard to diagnose even when they are well > reported, so discarding errors entirely will make it > impossible to debug TLS connection problems. > > Management apps which do migration are already using > 'query-migrate' / 'info migrate' to check up on progress > of background migration operations and to see their end > status. This is a fine place to also include the error > message when things go wrong. > > This patch thus adds an 'error-desc' field to the > MigrationInfo struct, which will be populated when > the 'status' is set to 'failed': Reviewed-by: Dr. David Alan Gilbert > (qemu) migrate -d tcp:localhost:9001 > (qemu) info migrate > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off > Migration status: failed (Error connecting to socket: Connection refused) > total time: 0 milliseconds > > In the HMP, when doing non-detached migration, it is > also possible to display this error message directly > to the app. > > (qemu) migrate tcp:localhost:9001 > Error connecting to socket: Connection refused > > Or with QMP > > { > "execute": "query-migrate", > "arguments": {} > } > { > "return": { > "status": "failed", > "error-desc": "address resolution failed for myhost:9000: No address associated with hostname" > } > } > > Signed-off-by: Daniel P. Berrange > --- > hmp.c | 13 ++++++++++++- > include/migration/migration.h | 5 ++++- > include/qapi/error.h | 2 +- > migration/migration.c | 15 ++++++++++++--- > migration/rdma.c | 10 +++------- > migration/tcp.c | 2 +- > migration/unix.c | 2 +- > qapi-schema.json | 7 ++++++- > trace-events | 2 +- > util/error.c | 2 +- > 10 files changed, 42 insertions(+), 18 deletions(-) > > diff --git a/hmp.c b/hmp.c > index 5b6084a..7126f17 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -34,6 +34,7 @@ > #include "ui/console.h" > #include "block/qapi.h" > #include "qemu-io.h" > +#include "qemu/error-report.h" > > #ifdef CONFIG_SPICE > #include > @@ -167,8 +168,15 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict) > } > > if (info->has_status) { > - monitor_printf(mon, "Migration status: %s\n", > + monitor_printf(mon, "Migration status: %s", > MigrationStatus_lookup[info->status]); > + if (info->status == MIGRATION_STATUS_FAILED && > + info->has_error_desc) { > + monitor_printf(mon, " (%s)\n", info->error_desc); > + } else { > + monitor_printf(mon, "\n"); > + } > + > monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n", > info->total_time); > if (info->has_expected_downtime) { > @@ -1532,6 +1540,9 @@ static void hmp_migrate_status_cb(void *opaque) > if (status->is_block_migration) { > monitor_printf(status->mon, "\n"); > } > + if (info->has_error_desc) { > + error_report("%s", info->error_desc); > + } > monitor_resume(status->mon); > timer_del(status->timer); > g_free(status); > diff --git a/include/migration/migration.h b/include/migration/migration.h > index e335380..46c1bbe 100644 > --- a/include/migration/migration.h > +++ b/include/migration/migration.h > @@ -171,6 +171,9 @@ struct MigrationState > QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; > /* The RAMBlock used in the last src_page_request */ > RAMBlock *last_req_rb; > + > + /* The last error that occurred */ > + Error *error; > }; > > void migrate_set_state(int *state, int old_state, int new_state); > @@ -207,7 +210,7 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error ** > > void rdma_start_incoming_migration(const char *host_port, Error **errp); > > -void migrate_fd_error(MigrationState *s); > +void migrate_fd_error(MigrationState *s, const Error *error); > > void migrate_fd_connect(MigrationState *s); > > diff --git a/include/qapi/error.h b/include/qapi/error.h > index 02e9dd2..c7e2869 100644 > --- a/include/qapi/error.h > +++ b/include/qapi/error.h > @@ -139,7 +139,7 @@ typedef enum ErrorClass { > /* > * Get @err's human-readable error message. > */ > -const char *error_get_pretty(Error *err); > +const char *error_get_pretty(const Error *err); > > /* > * Get @err's error class. > diff --git a/migration/migration.c b/migration/migration.c > index c22c0eb..b4a63a9 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -691,6 +691,10 @@ MigrationInfo *qmp_query_migrate(Error **errp) > break; > case MIGRATION_STATUS_FAILED: > info->has_status = true; > + if (s->error) { > + info->has_error_desc = true; > + info->error_desc = g_strdup(error_get_pretty(s->error)); > + } > break; > case MIGRATION_STATUS_CANCELLED: > info->has_status = true; > @@ -863,12 +867,15 @@ static void migrate_fd_cleanup(void *opaque) > notifier_list_notify(&migration_state_notifiers, s); > } > > -void migrate_fd_error(MigrationState *s) > +void migrate_fd_error(MigrationState *s, const Error *error) > { > - trace_migrate_fd_error(); > + trace_migrate_fd_error(error ? error_get_pretty(error) : ""); > assert(s->to_dst_file == NULL); > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > MIGRATION_STATUS_FAILED); > + if (!s->error) { > + s->error = error_copy(error); > + } > notifier_list_notify(&migration_state_notifiers, s); > } > > @@ -967,6 +974,8 @@ MigrationState *migrate_init(const MigrationParams *params) > s->postcopy_after_devices = false; > s->migration_thread_running = false; > s->last_req_rb = NULL; > + error_free(s->error); > + s->error = NULL; > > migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP); > > @@ -1067,7 +1076,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, > } > > if (local_err) { > - migrate_fd_error(s); > + migrate_fd_error(s, local_err); > error_propagate(errp, local_err); > return; > } > diff --git a/migration/rdma.c b/migration/rdma.c > index 187fc1c..cd33d90 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -3487,16 +3487,14 @@ void rdma_start_outgoing_migration(void *opaque, > const char *host_port, Error **errp) > { > MigrationState *s = opaque; > - Error *local_err = NULL, **temp = &local_err; > - RDMAContext *rdma = qemu_rdma_data_init(host_port, &local_err); > + RDMAContext *rdma = qemu_rdma_data_init(host_port, errp); > int ret = 0; > > if (rdma == NULL) { > - ERROR(temp, "Failed to initialize RDMA data structures! %d", ret); > goto err; > } > > - ret = qemu_rdma_source_init(rdma, &local_err, > + ret = qemu_rdma_source_init(rdma, errp, > s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL]); > > if (ret) { > @@ -3504,7 +3502,7 @@ void rdma_start_outgoing_migration(void *opaque, > } > > trace_rdma_start_outgoing_migration_after_rdma_source_init(); > - ret = qemu_rdma_connect(rdma, &local_err); > + ret = qemu_rdma_connect(rdma, errp); > > if (ret) { > goto err; > @@ -3516,7 +3514,5 @@ void rdma_start_outgoing_migration(void *opaque, > migrate_fd_connect(s); > return; > err: > - error_propagate(errp, local_err); > g_free(rdma); > - migrate_fd_error(s); > } > diff --git a/migration/tcp.c b/migration/tcp.c > index e1fa7f8..d0e0db9 100644 > --- a/migration/tcp.c > +++ b/migration/tcp.c > @@ -40,7 +40,7 @@ static void tcp_wait_for_connect(int fd, Error *err, void *opaque) > if (fd < 0) { > DPRINTF("migrate connect error: %s\n", error_get_pretty(err)); > s->to_dst_file = NULL; > - migrate_fd_error(s); > + migrate_fd_error(s, err); > } else { > DPRINTF("migrate connect success\n"); > s->to_dst_file = qemu_fopen_socket(fd, "wb"); > diff --git a/migration/unix.c b/migration/unix.c > index d9aac36..b3537fd 100644 > --- a/migration/unix.c > +++ b/migration/unix.c > @@ -40,7 +40,7 @@ static void unix_wait_for_connect(int fd, Error *err, void *opaque) > if (fd < 0) { > DPRINTF("migrate connect error: %s\n", error_get_pretty(err)); > s->to_dst_file = NULL; > - migrate_fd_error(s); > + migrate_fd_error(s, err); > } else { > DPRINTF("migrate connect success\n"); > s->to_dst_file = qemu_fopen_socket(fd, "wb"); > diff --git a/qapi-schema.json b/qapi-schema.json > index f253a37..a3fe948 100644 > --- a/qapi-schema.json > +++ b/qapi-schema.json > @@ -484,6 +484,10 @@ > # throttled during auto-converge. This is only present when auto-converge > # has started throttling guest cpus. (Since 2.5) > # > +# @error-desc: #optional the human readable error description string, when > +# @status is 'failed'. Clients should not attempt to parse the > +# error strings. (Since 2.6) > +# > # Since: 0.14.0 > ## > { 'struct': 'MigrationInfo', > @@ -494,7 +498,8 @@ > '*expected-downtime': 'int', > '*downtime': 'int', > '*setup-time': 'int', > - '*x-cpu-throttle-percentage': 'int'} } > + '*x-cpu-throttle-percentage': 'int', > + '*error-desc': 'str'} } > > ## > # @query-migrate > diff --git a/trace-events b/trace-events > index d494de1..8a29e94 100644 > --- a/trace-events > +++ b/trace-events > @@ -1483,7 +1483,7 @@ await_return_path_close_on_source_close(void) "" > await_return_path_close_on_source_joining(void) "" > migrate_set_state(int new_state) "new state %d" > migrate_fd_cleanup(void) "" > -migrate_fd_error(void) "" > +migrate_fd_error(const char *error_desc) "error=%s" > migrate_fd_cancel(void) "" > migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx" > migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")" > diff --git a/util/error.c b/util/error.c > index 47f93af..16132ec 100644 > --- a/util/error.c > +++ b/util/error.c > @@ -216,7 +216,7 @@ ErrorClass error_get_class(const Error *err) > return err->err_class; > } > > -const char *error_get_pretty(Error *err) > +const char *error_get_pretty(const Error *err) > { > return err->msg; > } > -- > 2.5.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK