From: Peter Xu <peterx@redhat.com>
To: guangrong.xiao@gmail.com
Cc: kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
    Xiao Guangrong <xiaoguangrong@tencent.com>,
    qemu-devel@nongnu.org, pbonzini@redhat.com
Subject: Re: [PATCH 3/8] migration: support to detect compression and decompression errors
Date: Wed, 21 Mar 2018 18:00:43 +0800
Message-ID: <20180321100043.GA30634@xz-mi>
In-Reply-To: <20180313075739.11194-4-xiaoguangrong@tencent.com>

On Tue, Mar 13, 2018 at 03:57:34PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>
> Currently the page being compressed is allowed to be updated by
> the VM on the source QEMU, correspondingly the destination QEMU
> just ignores the decompression error. However, we completely miss
> the chance to catch real errors, then the VM is corrupted silently
>
> To make the migration more robuster, we copy the page to a buffer
> first to avoid it being written by VM, then detect and handle the
> errors of both compression and decompression errors properly

Not sure whether I missed anything important, but I'll just shoot my
thoughts as questions (again)...

Actually, isn't this a more general question? Say, even without
compression, we can be sending a page that is being modified.

However, IMHO we don't need to worry about that: if the page is
modified, we'll definitely send it again, so the new page will replace
the old one. So on the destination side, even if decompress() fails on
a page, it'll be fine IMHO. Though now we are copying the corrupted
buffer. On that point, I fully agree that we should not; maybe we can
just drop the page entirely?

For non-compressed pages, we can't detect that, so we'll copy the page
even if it is corrupted.

The special part for compression would be: would deflate() fail if
there is a concurrent update to the buffer being compressed? And would
that corrupt the whole compression stream, or would it only fail the
deflate() call?

Thanks,

>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/qemu-file.c |  4 ++--
>  migration/ram.c       | 29 +++++++++++++++++++----------
>  2 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1ff33a1ffb..137bcc8bdc 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -711,9 +711,9 @@ ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
>      blen = qemu_compress_data(stream, f->buf + f->buf_index + sizeof(int32_t),
>                                blen, p, size);
>      if (blen < 0) {
> -        error_report("Compress Failed!");
> -        return 0;
> +        return -1;
>      }
> +
>      qemu_put_be32(f, blen);
>      if (f->ops->writev_buffer) {
>          add_to_iovec(f, f->buf + f->buf_index, blen, false);
> diff --git a/migration/ram.c b/migration/ram.c
> index fff3f31e90..c47185d38c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -273,6 +273,7 @@ struct DecompressParam {
>      bool quit;
>      QemuMutex mutex;
>      QemuCond cond;
> +    QEMUFile *file;
>      void *des;
>      uint8_t *compbuf;
>      int len;
> @@ -1051,11 +1052,13 @@ static int do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>  {
>      RAMState *rs = ram_state;
>      int bytes_sent, blen;
> -    uint8_t *p = block->host + (offset & TARGET_PAGE_MASK);
> +    uint8_t buf[TARGET_PAGE_SIZE], *p;
>
> +    p = block->host + (offset & TARGET_PAGE_MASK);
>      bytes_sent = save_page_header(rs, f, block, offset |
>                                    RAM_SAVE_FLAG_COMPRESS_PAGE);
> -    blen = qemu_put_compression_data(f, stream, p, TARGET_PAGE_SIZE);
> +    memcpy(buf, p, TARGET_PAGE_SIZE);
> +    blen = qemu_put_compression_data(f, stream, buf, TARGET_PAGE_SIZE);
>      if (blen < 0) {
>          bytes_sent = 0;
>          qemu_file_set_error(migrate_get_current()->to_dst_file, blen);
> @@ -2547,7 +2550,7 @@ static void *do_data_decompress(void *opaque)
>      DecompressParam *param = opaque;
>      unsigned long pagesize;
>      uint8_t *des;
> -    int len;
> +    int len, ret;
>
>      qemu_mutex_lock(&param->mutex);
>      while (!param->quit) {
> @@ -2563,8 +2566,12 @@ static void *do_data_decompress(void *opaque)
>               * not a problem because the dirty page will be retransferred
>               * and uncompress() won't break the data in other pages.
>               */
> -            qemu_uncompress(&param->stream, des, pagesize,
> -                            param->compbuf, len);
> +            ret = qemu_uncompress(&param->stream, des, pagesize,
> +                                  param->compbuf, len);
> +            if (ret < 0) {
> +                error_report("decompress data failed");
> +                qemu_file_set_error(param->file, ret);
> +            }
>
>              qemu_mutex_lock(&decomp_done_lock);
>              param->done = true;
> @@ -2581,12 +2588,12 @@ static void *do_data_decompress(void *opaque)
>      return NULL;
>  }
>
> -static void wait_for_decompress_done(void)
> +static int wait_for_decompress_done(QEMUFile *f)
>  {
>      int idx, thread_count;
>
>      if (!migrate_use_compression()) {
> -        return;
> +        return 0;
>      }
>
>      thread_count = migrate_decompress_threads();
> @@ -2597,6 +2604,7 @@ static void wait_for_decompress_done(void)
>          }
>      }
>      qemu_mutex_unlock(&decomp_done_lock);
> +    return qemu_file_get_error(f);
>  }
>
>  static void compress_threads_load_cleanup(void)
> @@ -2635,7 +2643,7 @@ static void compress_threads_load_cleanup(void)
>      decomp_param = NULL;
>  }
>
> -static int compress_threads_load_setup(void)
> +static int compress_threads_load_setup(QEMUFile *f)
>  {
>      int i, thread_count;
>
> @@ -2654,6 +2662,7 @@ static int compress_threads_load_setup(void)
>          }
>          decomp_param[i].stream.opaque = &decomp_param[i];
>
> +        decomp_param[i].file = f;
>          qemu_mutex_init(&decomp_param[i].mutex);
>          qemu_cond_init(&decomp_param[i].cond);
>          decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> @@ -2708,7 +2717,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>   */
>  static int ram_load_setup(QEMUFile *f, void *opaque)
>  {
> -    if (compress_threads_load_setup()) {
> +    if (compress_threads_load_setup(f)) {
>          return -1;
>      }
>
> @@ -3063,7 +3072,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>          }
>      }
>
> -    wait_for_decompress_done();
> +    ret |= wait_for_decompress_done(f);
>      rcu_read_unlock();
>      trace_ram_load_complete(ret, seq_iter);
>      return ret;
> --
> 2.14.3
>

--
Peter Xu
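
The two ideas discussed in this message, snapshotting the page before it is encoded and reporting a decode failure instead of silently ignoring it, can be sketched in plain C. This is only an illustration under stated assumptions: `Channel`, `channel_set_error()`, `encode_page()`, `decode_page()` and `load_pages()` are hypothetical names, the codec is a toy XOR checksum rather than zlib, and `PAGE_SIZE` is assumed to be 4096. It is not the QEMU implementation and says nothing about how deflate() itself behaves under concurrent writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096            /* assumed page size for this sketch */
#define ENC_SIZE  (PAGE_SIZE + 1) /* 1 checksum byte + payload */

/* Hypothetical stand-in for a stream's sticky error slot. */
typedef struct {
    int last_error;               /* 0 means "no error recorded" */
} Channel;

/* Record an error; the first one wins, later ones are ignored. */
static void channel_set_error(Channel *c, int err)
{
    if (c->last_error == 0) {
        c->last_error = err;
    }
}

static int channel_get_error(const Channel *c)
{
    return c->last_error;
}

/* XOR of all bytes; a deliberately weak toy checksum. */
static uint8_t checksum(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum ^= p[i];
    }
    return sum;
}

/*
 * Toy encoder (not zlib): snapshot the live page first, so the checksum
 * and the payload are computed from the same bytes even if the live page
 * is written concurrently.
 */
static void encode_page(const uint8_t *live_page, uint8_t *out)
{
    uint8_t snap[PAGE_SIZE];

    memcpy(snap, live_page, PAGE_SIZE);
    out[0] = checksum(snap, PAGE_SIZE);
    memcpy(out + 1, snap, PAGE_SIZE);
}

/* Toy decoder: returns 0 on success, -1 if the checksum does not match. */
static int decode_page(const uint8_t *in, uint8_t *page)
{
    if (checksum(in + 1, PAGE_SIZE) != in[0]) {
        return -1;                /* page is left untouched */
    }
    memcpy(page, in + 1, PAGE_SIZE);
    return 0;
}

/*
 * Load loop over `count` encoded pages stored back to back in `in`.
 * A page that fails to decode is not copied, the failure is recorded in
 * the channel, and the recorded status is folded into the return value.
 */
static int load_pages(Channel *c, const uint8_t *in, uint8_t *pages,
                      size_t count)
{
    int ret = 0;

    assert(c != NULL);
    for (size_t i = 0; i < count; i++) {
        if (decode_page(in + i * ENC_SIZE, pages + i * PAGE_SIZE) < 0) {
            channel_set_error(c, -1);
        }
    }
    ret |= channel_get_error(c);
    return ret;
}
```

A caller would encode pages on the source with `encode_page()`, hand the encoded buffers to `load_pages()` on the destination, and treat a non-zero return as a failed load. Whether a failed page should abort the migration or simply be dropped and retransmitted is the open question raised above; the sketch does both only to show the two mechanisms side by side.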