* [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() @ 2021-07-06 11:23 Andrew Cooper 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 11:23 UTC (permalink / raw) To: Xen-devel Cc: Andrew Cooper, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich, Olaf Hering These are a prerequisite to all currently on-list patches touching the function. Andrew Cooper (2): tools/migration: Fix iovec handling in send_checkpoint_dirty_pfn_list() tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() tools/libs/guest/xg_sr_restore.c | 48 ++++++++++++++-------------------------- 1 file changed, 17 insertions(+), 31 deletions(-) -- 2.11.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/2] tools/migration: Fix iovec handling in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() Andrew Cooper @ 2021-07-06 11:23 ` Andrew Cooper 2021-07-06 11:54 ` Jan Beulich 2021-07-06 12:20 ` Olaf Hering 2021-07-06 11:23 ` [PATCH 2/2] tools/migration: Fix potential overflow " Andrew Cooper 2021-07-06 12:07 ` [PATCH 0/2] tools/migration: Fixes " Jan Beulich 2 siblings, 2 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 11:23 UTC (permalink / raw) To: Xen-devel Cc: Andrew Cooper, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich, Olaf Hering We shouldn't be using two struct iovec's to write half of 'rec' each, and there is no need to malloc() for two struct iovec's at all. Simplify down to just two - one covering the whole of 'rec', and one covering the pfns array. No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Ian Jackson <iwj@xenproject.org> CC: Wei Liu <wl@xen.org> CC: Juergen Gross <jgross@suse.com> CC: Jan Beulich <JBeulich@suse.com> CC: Olaf Hering <olaf@aepfle.de> --- tools/libs/guest/xg_sr_restore.c | 25 ++++++------------------- 1 file changed, 6 insertions(+), 19 deletions(-) diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index aa4113d7f679..07c9e291610b 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -425,11 +425,13 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) int rc = -1; unsigned int count, written; uint64_t i, *pfns = NULL; - struct iovec *iov = NULL; xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size }; struct xc_sr_record rec = { .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST, }; + struct iovec iov[2] = { + { &rec, sizeof(rec) }, + }; DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, &ctx->restore.dirty_bitmap_hbuf); @@ -471,26 +473,12 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) pfns[written++] = i; } - /* iovec[] for writev(). */ - iov = malloc(3 * sizeof(*iov)); - if ( !iov ) - { - ERROR("Unable to allocate memory for sending dirty bitmap"); - goto err; - } - rec.length = count * sizeof(*pfns); - iov[0].iov_base = &rec.type; - iov[0].iov_len = sizeof(rec.type); - - iov[1].iov_base = &rec.length; - iov[1].iov_len = sizeof(rec.length); - - iov[2].iov_base = pfns; - iov[2].iov_len = count * sizeof(*pfns); + iov[1].iov_base = pfns; + iov[1].iov_len = rec.length; - if ( writev_exact(ctx->restore.send_back_fd, iov, 3) ) + if ( writev_exact(ctx->restore.send_back_fd, iov, ARRAY_SIZE(iov)) ) { PERROR("Failed to write dirty bitmap to stream"); goto err; @@ -499,7 +487,6 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) rc = 0; err: free(pfns); - free(iov); return rc; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] tools/migration: Fix iovec handling in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper @ 2021-07-06 11:54 ` Jan Beulich 2021-07-06 12:20 ` Olaf Hering 1 sibling, 0 replies; 20+ messages in thread From: Jan Beulich @ 2021-07-06 11:54 UTC (permalink / raw) To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering, Xen-devel On 06.07.2021 13:23, Andrew Cooper wrote: > We shouldn't be using two struct iovec's to write half of 'rec' each, and > there is no need to malloc() for two struct iovec's at all. I was indeed wondering about all of these while also touching the function. But I was guessing that there might be a deeper reason I don't see. > Simplify down to just two - one covering the whole of 'rec', and one covering > the pfns array. > > No functional change. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] tools/migration: Fix iovec handling in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper 2021-07-06 11:54 ` Jan Beulich @ 2021-07-06 12:20 ` Olaf Hering 1 sibling, 0 replies; 20+ messages in thread From: Olaf Hering @ 2021-07-06 12:20 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich [-- Attachment #1: Type: text/plain, Size: 385 bytes --] Am Tue, 6 Jul 2021 12:23:31 +0100 schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > We shouldn't be using two struct iovec's to write half of 'rec' each, and > there is no need to malloc() for two struct iovec's at all. > > Simplify down to just two - one covering the whole of 'rec', and one covering > the pfns array. Acked-by: Olaf Hering <olaf@aepfle.de> Olaf [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() Andrew Cooper 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper @ 2021-07-06 11:23 ` Andrew Cooper 2021-07-06 12:03 ` Jan Beulich 2021-07-06 12:58 ` Olaf Hering 2021-07-06 12:07 ` [PATCH 0/2] tools/migration: Fixes " Jan Beulich 2 siblings, 2 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 11:23 UTC (permalink / raw) To: Xen-devel Cc: Andrew Cooper, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich, Olaf Hering 'count * sizeof(*pfns)' can in principle overflow, but is implausible in practice as the time between checkpoints is typically sub-second. Nevertheless, simplify the code and remove the risk. There is no need to loop over the bitmap to calculate count. The number of set bits is returned in xc_shadow_op_stats_t which is already collected (and ignored). Bounds check the count against what will fit in REC_LENGTH_MAX. At the time of writing, this allows up to 0xffffff pfns. Rearrange the pfns loop to check for errors both ways, not simply that there were more pfns than expected. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Ian Jackson <iwj@xenproject.org> CC: Wei Liu <wl@xen.org> CC: Juergen Gross <jgross@suse.com> CC: Jan Beulich <JBeulich@suse.com> CC: Olaf Hering <olaf@aepfle.de> --- tools/libs/guest/xg_sr_restore.c | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 07c9e291610b..bda04ee42e3f 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -425,7 +425,7 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) int rc = -1; unsigned int count, written; uint64_t i, *pfns = NULL; - xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size }; + xc_shadow_op_stats_t stats; struct xc_sr_record rec = { .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST, }; @@ -444,14 +444,17 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) goto err; } - for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ ) + count = stats.dirty_count; + + if ( ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns)) < count ) { - if ( test_bit(i, dirty_bitmap) ) - count++; + ERROR("Too many PFNs (%u) to fit in record (limit %zu)", count, + ((REC_LENGTH_MAX - sizeof(rec)) / sizeof(*pfns))); + goto err; } - - pfns = malloc(count * sizeof(*pfns)); + iov[1].iov_len = rec.length = count * sizeof(*pfns); + iov[1].iov_base = pfns = malloc(rec.length); if ( !pfns ) { ERROR("Unable to allocate %zu bytes of memory for dirty pfn list", @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) goto err; } - for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i ) + for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count ) { if ( !test_bit(i, dirty_bitmap) ) continue; - if ( written > count ) - { - ERROR("Dirty pfn list exceed"); - goto err; - } - pfns[written++] = i; } - rec.length = count * sizeof(*pfns); - - iov[1].iov_base = pfns; - iov[1].iov_len = rec.length; + if ( written != stats.dirty_count ) + { + ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)", + written, stats.dirty_count); + goto err; + } if ( writev_exact(ctx->restore.send_back_fd, iov, ARRAY_SIZE(iov)) ) { -- 2.11.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 ` [PATCH 2/2] tools/migration: Fix potential overflow " Andrew Cooper @ 2021-07-06 12:03 ` Jan Beulich 2021-07-06 13:34 ` Andrew Cooper 2021-07-06 12:58 ` Olaf Hering 1 sibling, 1 reply; 20+ messages in thread From: Jan Beulich @ 2021-07-06 12:03 UTC (permalink / raw) To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering, Xen-devel On 06.07.2021 13:23, Andrew Cooper wrote: > 'count * sizeof(*pfns)' can in principle overflow, but is implausible in > practice as the time between checkpoints is typically sub-second. > Nevertheless, simplify the code and remove the risk. > > There is no need to loop over the bitmap to calculate count. The number of > set bits is returned in xc_shadow_op_stats_t which is already collected (and > ignored). > > Bounds check the count against what will fit in REC_LENGTH_MAX. At the time > of writing, this allows up to 0xffffff pfns. Well, okay, this then means that an overflow in the reporting of dirty_count doesn't matter for now, because the limit is lower anyway. > Rearrange the pfns loop to check > for errors both ways, not simply that there were more pfns than expected. Hmm, "both ways" to me would mean ... > @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) > goto err; > } > > - for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i ) > + for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count ) > { > if ( !test_bit(i, dirty_bitmap) ) > continue; > > - if ( written > count ) > - { > - ERROR("Dirty pfn list exceed"); > - goto err; > - } > - > pfns[written++] = i; > } > > - rec.length = count * sizeof(*pfns); > - > - iov[1].iov_base = pfns; > - iov[1].iov_len = rec.length; > + if ( written != stats.dirty_count ) > + { > + ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)", > + written, stats.dirty_count); > + goto err; > + } ... you then also check that there are no further bit set in the bitmap. As said elsewhere, I'm not convinced using statistics as a basis for actual operation (rather than just reporting) is appropriate. I'm unaware of there being any spelled out guarantee that the numbers reported back from the hypercall are accurate. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 12:03 ` Jan Beulich @ 2021-07-06 13:34 ` Andrew Cooper 2021-07-06 14:00 ` Jan Beulich 0 siblings, 1 reply; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 13:34 UTC (permalink / raw) To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering, Xen-devel On 06/07/2021 13:03, Jan Beulich wrote: > On 06.07.2021 13:23, Andrew Cooper wrote: >> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in >> practice as the time between checkpoints is typically sub-second. >> Nevertheless, simplify the code and remove the risk. >> >> There is no need to loop over the bitmap to calculate count. The number of >> set bits is returned in xc_shadow_op_stats_t which is already collected (and >> ignored). >> >> Bounds check the count against what will fit in REC_LENGTH_MAX. At the time >> of writing, this allows up to 0xffffff pfns. > Well, okay, this then means that an overflow in the reporting of > dirty_count doesn't matter for now, because the limit is lower > anyway. > >> Rearrange the pfns loop to check >> for errors both ways, not simply that there were more pfns than expected. > Hmm, "both ways" to me would mean ... > >> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) >> goto err; >> } >> >> - for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i ) >> + for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count ) >> { >> if ( !test_bit(i, dirty_bitmap) ) >> continue; >> >> - if ( written > count ) >> - { >> - ERROR("Dirty pfn list exceed"); >> - goto err; >> - } >> - >> pfns[written++] = i; >> } >> >> - rec.length = count * sizeof(*pfns); >> - >> - iov[1].iov_base = pfns; >> - iov[1].iov_len = rec.length; >> + if ( written != stats.dirty_count ) >> + { >> + ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)", >> + written, stats.dirty_count); >> + goto err; >> + } > ... you then also check that there are no further bit set in the > bitmap. As said elsewhere, I'm not convinced using statistics as > a basis for actual operation (rather than just reporting) is > appropriate. I'm not interested in inference based on the name of the structure. > I'm unaware of there being any spelled out guarantee > that the numbers reported back from the hypercall are accurate. The live loop uses this information already for this purpose. If it is wrong, we've got bigger problems that this. ~Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:34 ` Andrew Cooper @ 2021-07-06 14:00 ` Jan Beulich 0 siblings, 0 replies; 20+ messages in thread From: Jan Beulich @ 2021-07-06 14:00 UTC (permalink / raw) To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering, Xen-devel On 06.07.2021 15:34, Andrew Cooper wrote: > On 06/07/2021 13:03, Jan Beulich wrote: >> On 06.07.2021 13:23, Andrew Cooper wrote: >>> 'count * sizeof(*pfns)' can in principle overflow, but is implausible in >>> practice as the time between checkpoints is typically sub-second. >>> Nevertheless, simplify the code and remove the risk. >>> >>> There is no need to loop over the bitmap to calculate count. The number of >>> set bits is returned in xc_shadow_op_stats_t which is already collected (and >>> ignored). >>> >>> Bounds check the count against what will fit in REC_LENGTH_MAX. At the time >>> of writing, this allows up to 0xffffff pfns. >> Well, okay, this then means that an overflow in the reporting of >> dirty_count doesn't matter for now, because the limit is lower >> anyway. >> >>> Rearrange the pfns loop to check >>> for errors both ways, not simply that there were more pfns than expected. >> Hmm, "both ways" to me would mean ... >> >>> @@ -459,24 +462,20 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) >>> goto err; >>> } >>> >>> - for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i ) >>> + for ( i = 0, written = 0; count && i < ctx->restore.p2m_size; ++i, --count ) >>> { >>> if ( !test_bit(i, dirty_bitmap) ) >>> continue; >>> >>> - if ( written > count ) >>> - { >>> - ERROR("Dirty pfn list exceed"); >>> - goto err; >>> - } >>> - >>> pfns[written++] = i; >>> } >>> >>> - rec.length = count * sizeof(*pfns); >>> - >>> - iov[1].iov_base = pfns; >>> - iov[1].iov_len = rec.length; >>> + if ( written != stats.dirty_count ) >>> + { >>> + ERROR("Mismatch between dirty bitmap bits (%u), and dirty_count (%u)", >>> + written, stats.dirty_count); >>> + goto err; >>> + } >> ... you then also check that there are no further bit set in the >> bitmap. As said elsewhere, I'm not convinced using statistics as >> a basis for actual operation (rather than just reporting) is >> appropriate. > > I'm not interested in inference based on the name of the structure. I'm afraid that's the problem: Because you started using it for something it wasn't meant to be used for, you now think it's the name that's misleading, and the use is okay. I remain of the opinion that it's the other way around (but see below for there not being an real dependency at least in this particular case). >> I'm unaware of there being any spelled out guarantee >> that the numbers reported back from the hypercall are accurate. > > The live loop uses this information already for this purpose. If it is > wrong, we've got bigger problems that this. send_memory_live() passes the value to send_dirty_pages(), which in turn passes it only to xc_report_progress_step() and uses it in if ( written > entries ) DPRINTF("Bitmap contained more entries than expected..."); There's no relying on this number at all afaics. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 ` [PATCH 2/2] tools/migration: Fix potential overflow " Andrew Cooper 2021-07-06 12:03 ` Jan Beulich @ 2021-07-06 12:58 ` Olaf Hering 2021-07-06 13:19 ` Andrew Cooper 2021-07-06 14:11 ` Olaf Hering 1 sibling, 2 replies; 20+ messages in thread From: Olaf Hering @ 2021-07-06 12:58 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich [-- Attachment #1.1: Type: text/plain, Size: 883 bytes --] Am Tue, 6 Jul 2021 12:23:32 +0100 schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > + count = stats.dirty_count; Is this accurate? I remember the reporting is broken since a while, and testing a busy domU indicates it is still the case. # xen-logdirty `xl domid domU` 0: faults= 0 dirty= 258050 1: faults= 0 dirty= 257817 2: faults= 0 dirty= 253713 3: faults= 0 dirty= 253197 4: faults= 0 dirty= 255154 5: faults= 0 dirty= 260876 6: faults= 0 dirty= 259083 7: faults= 0 dirty= 253163 8: faults= 0 dirty= 258349 9: faults= 0 dirty= 260330 10: faults= 0 dirty= 259754 11: faults= 0 dirty= 257273 12: faults= 0 dirty= 255756 13: faults= 0 dirty= 258209 14: faults= 0 dirty= 257847 15: faults= 0 dirty= 59639 16: faults= 0 dirty= 81 17: faults= 0 dirty= 86 18: faults= 0 dirty= 111 19: faults= 0 dirty= 81 20: faults= 0 dirty= 80 .... Olaf [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1.2: logdirty.c --] [-- Type: text/x-c++src, Size: 3186 bytes --] /* gcc -Wall -o logdirty -O -lxenctrl logdirty.c */ #include <unistd.h> #include <errno.h> #include <inttypes.h> #include <stdlib.h> #include <stdio.h> #include <xenctrl.h> #include <signal.h> #define PAGE_SHIFT XC_PAGE_SHIFT #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1)) #define NRPAGES(x) (ROUNDUP(x, PAGE_SHIFT) >> PAGE_SHIFT) #define BITS_PER_LONG (sizeof(unsigned long) * 8) #define ORDER_LONG (sizeof(unsigned long) == 4 ? 5 : 6) static unsigned int domid = 0; static xc_interface *xch; static unsigned long xdmg(xc_interface *c, unsigned int d) { unsigned long r; #if XEN_DOMCTL_INTERFACE_VERSION < 0x0b r = xc_domain_maximum_gpfn(c, d); #else xen_pfn_t gpfns = 0; r = xc_domain_maximum_gpfn(c, d, &gpfns); #endif return r + 1; } static void sigint_handler(int sig) { int rc; fprintf(stderr, "User aborted\n"); rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_OFF, NULL, 0, NULL, 0, NULL); if (rc < 0) perror("XEN_DOMCTL_SHADOW_OP_OFF hypercall failed\n"); exit(1); } static inline int bitmap_size(int nr_bits) { int nr_long, nr_bytes; nr_long = (nr_bits + BITS_PER_LONG - 1) >> ORDER_LONG; nr_bytes = nr_long * sizeof(unsigned long); return nr_bytes; } int main(int argc, char *argv[]) { int rc, ret = 1, i, runs = 42; unsigned int lflags; xentoollog_level lvl; xentoollog_logger *l; DECLARE_HYPERCALL_BUFFER(unsigned long, to_skip); unsigned long p2m_size; xc_shadow_op_stats_t stats; errno = EINVAL; if (argc > 1) domid = atoi(argv[1]); if (!domid) goto out; if (argc > 2) runs = atoi(argv[2]); if (!runs) goto out; errno = 0; lvl = XTL_DEBUG; lflags = XTL_STDIOSTREAM_SHOW_PID | XTL_STDIOSTREAM_HIDE_PROGRESS; l = (xentoollog_logger *) xtl_createlogger_stdiostream(stderr, lvl, lflags); if (!l) goto out; xch = xc_interface_open(l, 0, 0); if (!xch) goto out; p2m_size = xdmg(xch, domid); if (!p2m_size) goto out; to_skip = xc_hypercall_buffer_alloc_pages(xch, to_skip, NRPAGES(bitmap_size(p2m_size))); if (!to_skip) goto out; signal(SIGINT, sigint_handler); rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, NULL, 0, NULL, 0, NULL); if (rc < 0) { rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_OFF, NULL, 0, NULL, 0, NULL); if (rc < 0) goto out; rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, NULL, 0, NULL, 0, NULL); if (rc < 0) goto out; } rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN, NULL, 0, NULL, 0, NULL); if (rc < 0) goto out; for (i = 0; i < runs; i++) { sleep(1); if (1) rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_skip), p2m_size, NULL, 0, &stats); if (rc < 0) goto out; printf("%d: faults= %" PRIu32 " dirty= %" PRIu32 "\n", i, stats.fault_count, stats.dirty_count); } rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_OFF, NULL, 0, NULL, 0, NULL); if (rc < 0) goto out; errno = ret = 0; out: perror(argv[0]); return ret; } [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 12:58 ` Olaf Hering @ 2021-07-06 13:19 ` Andrew Cooper 2021-07-06 13:22 ` Andrew Cooper ` (2 more replies) 2021-07-06 14:11 ` Olaf Hering 1 sibling, 3 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 13:19 UTC (permalink / raw) To: Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich On 06/07/2021 13:58, Olaf Hering wrote: > Am Tue, 6 Jul 2021 12:23:32 +0100 > schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > >> + count = stats.dirty_count; > Is this accurate? The live loop relies on it, and it worked correctly the last time I tested it. > I remember the reporting is broken since a while, and testing a busy domU indicates it is still the case. If its broken, it needs fixing. > # xen-logdirty `xl domid domU` > 0: faults= 0 dirty= 258050 > 1: faults= 0 dirty= 257817 > 2: faults= 0 dirty= 253713 > 3: faults= 0 dirty= 253197 > 4: faults= 0 dirty= 255154 > 5: faults= 0 dirty= 260876 > 6: faults= 0 dirty= 259083 > 7: faults= 0 dirty= 253163 > 8: faults= 0 dirty= 258349 > 9: faults= 0 dirty= 260330 > 10: faults= 0 dirty= 259754 > 11: faults= 0 dirty= 257273 > 12: faults= 0 dirty= 255756 > 13: faults= 0 dirty= 258209 > 14: faults= 0 dirty= 257847 > 15: faults= 0 dirty= 59639 > 16: faults= 0 dirty= 81 > 17: faults= 0 dirty= 86 > 18: faults= 0 dirty= 111 > 19: faults= 0 dirty= 81 > 20: faults= 0 dirty= 80 What is this showing, other than (unsurprisingly) faults doesn't work for an HVM guest? ~Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:19 ` Andrew Cooper @ 2021-07-06 13:22 ` Andrew Cooper 2021-07-06 13:39 ` Olaf Hering 2021-07-06 13:28 ` Olaf Hering 2021-07-06 13:56 ` Jan Beulich 2 siblings, 1 reply; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 13:22 UTC (permalink / raw) To: Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich On 06/07/2021 14:19, Andrew Cooper wrote: > On 06/07/2021 13:58, Olaf Hering wrote: >> Am Tue, 6 Jul 2021 12:23:32 +0100 >> schrieb Andrew Cooper <andrew.cooper3@citrix.com>: >> >>> + count = stats.dirty_count; >> Is this accurate? > The live loop relies on it, and it worked correctly the last time I > tested it. > >> I remember the reporting is broken since a while, and testing a busy domU indicates it is still the case. > If its broken, it needs fixing. > >> # xen-logdirty `xl domid domU` >> 0: faults= 0 dirty= 258050 >> 1: faults= 0 dirty= 257817 >> 2: faults= 0 dirty= 253713 >> 3: faults= 0 dirty= 253197 >> 4: faults= 0 dirty= 255154 >> 5: faults= 0 dirty= 260876 >> 6: faults= 0 dirty= 259083 >> 7: faults= 0 dirty= 253163 >> 8: faults= 0 dirty= 258349 >> 9: faults= 0 dirty= 260330 >> 10: faults= 0 dirty= 259754 >> 11: faults= 0 dirty= 257273 >> 12: faults= 0 dirty= 255756 >> 13: faults= 0 dirty= 258209 >> 14: faults= 0 dirty= 257847 >> 15: faults= 0 dirty= 59639 >> 16: faults= 0 dirty= 81 >> 17: faults= 0 dirty= 86 >> 18: faults= 0 dirty= 111 >> 19: faults= 0 dirty= 81 >> 20: faults= 0 dirty= 80 > What is this showing, other than (unsurprisingly) faults doesn't work > for an HVM guest? Sorry - I meant HAP guest. What hardware is this on? i.e. is the Page Modification Logging feature in use? ~Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:22 ` Andrew Cooper @ 2021-07-06 13:39 ` Olaf Hering 2021-07-06 13:43 ` Andrew Cooper 0 siblings, 1 reply; 20+ messages in thread From: Olaf Hering @ 2021-07-06 13:39 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich [-- Attachment #1: Type: text/plain, Size: 271 bytes --] Am Tue, 6 Jul 2021 14:22:58 +0100 schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > What hardware is this on? i.e. is the Page Modification Logging feature > in use? At least it gets reported as VMX feature during boot, this is a CoyotePass system. Olaf [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:39 ` Olaf Hering @ 2021-07-06 13:43 ` Andrew Cooper 0 siblings, 0 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 13:43 UTC (permalink / raw) To: Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich On 06/07/2021 14:39, Olaf Hering wrote: > Am Tue, 6 Jul 2021 14:22:58 +0100 > schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > >> What hardware is this on? i.e. is the Page Modification Logging feature >> in use? > At least it gets reported as VMX feature during boot, this is a CoyotePass system. That logging is problematic (that means "this hardware has the feature"), but yes - PML will be used by default when available. ~Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:19 ` Andrew Cooper 2021-07-06 13:22 ` Andrew Cooper @ 2021-07-06 13:28 ` Olaf Hering 2021-07-06 13:56 ` Jan Beulich 2 siblings, 0 replies; 20+ messages in thread From: Olaf Hering @ 2021-07-06 13:28 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich [-- Attachment #1: Type: text/plain, Size: 601 bytes --] Am Tue, 6 Jul 2021 14:19:21 +0100 schrieb Andrew Cooper <andrew.cooper3@citrix.com>: > > 20: faults= 0 dirty= 80 > > What is this showing, other than (unsurprisingly) faults doesn't work > for an HVM guest? The dirty count goes down after a while for a domU that constantly touches as many pages as it can. But yes, the current code uses stats.dirty_count in a number of places. It seems all of the usage is just for reporting, so if these code paths would get wrong input nothing bad happens. precopy_policy may indicate more or less iterations, this is not critical either. Olaf [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 13:19 ` Andrew Cooper 2021-07-06 13:22 ` Andrew Cooper 2021-07-06 13:28 ` Olaf Hering @ 2021-07-06 13:56 ` Jan Beulich 2 siblings, 0 replies; 20+ messages in thread From: Jan Beulich @ 2021-07-06 13:56 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering On 06.07.2021 15:19, Andrew Cooper wrote: > On 06/07/2021 13:58, Olaf Hering wrote: >> Am Tue, 6 Jul 2021 12:23:32 +0100 >> schrieb Andrew Cooper <andrew.cooper3@citrix.com>: >> >>> + count = stats.dirty_count; >> Is this accurate? > > The live loop relies on it, and it worked correctly the last time I > tested it. When still merely investigating in preparation of my recent series, i.e. without having made changes yet except to add some logging, I did observe "Bitmap contained more entries than expected..." a couple of times, with "written" and "entries" typically apart by just 1 (as determined by extra logging; to be honest I don't recall if they were farther apart at any point). So the number is _not_ accurate in any event, and cannot be used for other than reporting purposes (as also expressed elsewhere on this thread). This also underlines that, unlike you did say in a reply to one of my patches, this is only a "detail" message, not an error, because migration happily went on and succeeded. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 12:58 ` Olaf Hering 2021-07-06 13:19 ` Andrew Cooper @ 2021-07-06 14:11 ` Olaf Hering 2021-07-06 15:13 ` Jan Beulich 1 sibling, 1 reply; 20+ messages in thread From: Olaf Hering @ 2021-07-06 14:11 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Jan Beulich [-- Attachment #1: Type: text/plain, Size: 307 bytes --] Am Tue, 6 Jul 2021 14:58:04 +0200 schrieb Olaf Hering <olaf@aepfle.de>: > the reporting is broken since a while A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken. Olaf [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 14:11 ` Olaf Hering @ 2021-07-06 15:13 ` Jan Beulich 2021-07-06 15:22 ` Jan Beulich 0 siblings, 1 reply; 20+ messages in thread From: Jan Beulich @ 2021-07-06 15:13 UTC (permalink / raw) To: Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Andrew Cooper On 06.07.2021 16:11, Olaf Hering wrote: > Am Tue, 6 Jul 2021 14:58:04 +0200 > schrieb Olaf Hering <olaf@aepfle.de>: > >> the reporting is broken since a while > > A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken. Not surprising at all, considering PML support was added in 4.6. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 15:13 ` Jan Beulich @ 2021-07-06 15:22 ` Jan Beulich 2021-07-06 16:08 ` Andrew Cooper 0 siblings, 1 reply; 20+ messages in thread From: Jan Beulich @ 2021-07-06 15:22 UTC (permalink / raw) To: Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross, Andrew Cooper On 06.07.2021 17:13, Jan Beulich wrote: > On 06.07.2021 16:11, Olaf Hering wrote: >> Am Tue, 6 Jul 2021 14:58:04 +0200 >> schrieb Olaf Hering <olaf@aepfle.de>: >> >>> the reporting is broken since a while >> >> A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken. > > Not surprising at all, considering PML support was added in 4.6. Or perhaps still surprising, as the functions involved there don't appear to be bypassing any core logic, so after some looking around I can't say I see anything that's obviously wrong. Oh, and I notice only now the "not" in your original reply, which renders my earlier statement completely pointless anyway. I'm sorry. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] tools/migration: Fix potential overflow in send_checkpoint_dirty_pfn_list() 2021-07-06 15:22 ` Jan Beulich @ 2021-07-06 16:08 ` Andrew Cooper 0 siblings, 0 replies; 20+ messages in thread From: Andrew Cooper @ 2021-07-06 16:08 UTC (permalink / raw) To: Jan Beulich, Olaf Hering; +Cc: Xen-devel, Ian Jackson, Wei Liu, Juergen Gross On 06/07/2021 16:22, Jan Beulich wrote: > On 06.07.2021 17:13, Jan Beulich wrote: >> On 06.07.2021 16:11, Olaf Hering wrote: >>> Am Tue, 6 Jul 2021 14:58:04 +0200 >>> schrieb Olaf Hering <olaf@aepfle.de>: >>> >>>> the reporting is broken since a while >>> A quick check on a Dell T320 with E5-2430L, which does not have "Page Modification Logging", indicates that staging-4.5 appears to work, but reporting in staging-4.6 is broken. >> Not surprising at all, considering PML support was added in 4.6. > Or perhaps still surprising, as the functions involved there don't > appear to be bypassing any core logic, so after some looking around > I can't say I see anything that's obviously wrong. > > Oh, and I notice only now the "not" in your original reply, which > renders my earlier statement completely pointless anyway. I'm sorry. Ok, so my observations of dirty_count being reliable during development of migration v2 appears to be correct, and something regressed in the dev window where it got committed upstream... That is suspicious. ~Andrew ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() 2021-07-06 11:23 [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() Andrew Cooper 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper 2021-07-06 11:23 ` [PATCH 2/2] tools/migration: Fix potential overflow " Andrew Cooper @ 2021-07-06 12:07 ` Jan Beulich 2 siblings, 0 replies; 20+ messages in thread From: Jan Beulich @ 2021-07-06 12:07 UTC (permalink / raw) To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Juergen Gross, Olaf Hering, Xen-devel On 06.07.2021 13:23, Andrew Cooper wrote: > These are a prerequisite to all currently on-list patches touching the > function. Just as an observation, while I can see how from your pov (judging from your not-exactly-review-comments) patch 2 may be a prereq to one of my changes, I don't think I'd call patch 1 a prereq to anything - this is cleanup which can be done at any point. Of course it'll allow me to shrink that one of my patches, which is certainly nice. Jan ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-07-06 16:09 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-06 11:23 [PATCH 0/2] tools/migration: Fixes in send_checkpoint_dirty_pfn_list() Andrew Cooper 2021-07-06 11:23 ` [PATCH 1/2] tools/migration: Fix iovec handling " Andrew Cooper 2021-07-06 11:54 ` Jan Beulich 2021-07-06 12:20 ` Olaf Hering 2021-07-06 11:23 ` [PATCH 2/2] tools/migration: Fix potential overflow " Andrew Cooper 2021-07-06 12:03 ` Jan Beulich 2021-07-06 13:34 ` Andrew Cooper 2021-07-06 14:00 ` Jan Beulich 2021-07-06 12:58 ` Olaf Hering 2021-07-06 13:19 ` Andrew Cooper 2021-07-06 13:22 ` Andrew Cooper 2021-07-06 13:39 ` Olaf Hering 2021-07-06 13:43 ` Andrew Cooper 2021-07-06 13:28 ` Olaf Hering 2021-07-06 13:56 ` Jan Beulich 2021-07-06 14:11 ` Olaf Hering 2021-07-06 15:13 ` Jan Beulich 2021-07-06 15:22 ` Jan Beulich 2021-07-06 16:08 ` Andrew Cooper 2021-07-06 12:07 ` [PATCH 0/2] tools/migration: Fixes " Jan Beulich
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.