* [PATCH v2 0/2] migration: faster savevm_state_handler_insert() @ 2019-10-17 20:59 Scott Cheloha 2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha 2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha 0 siblings, 2 replies; 17+ messages in thread From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw) To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela The savevm_state.handlers queue of SaveStateEntry objects is a priority queue with an O(n) insertion cost. This is makes startup extremely slow when a VM has many such objects to register. For instance, a ppc64 VM with a large (8T+) maxmem needs to register tens of thousands of SaveStateEntry handlers: startup for these VMs is glacial. If we track insertion points within the priority queue we can make savevm_state_handler_insert() a constant-time operation with little change to the module's code. Startup times for VMs with many handlers are dramatically improved as a result. Changes since v1: * Split patch 1 into 2 patches. Scott Cheloha (2): migration: add savevm_state_handler_remove() migration: savevm_state_handler_insert: constant-time element insertion migration/savevm.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-) -- 2.23.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/2] migration: add savevm_state_handler_remove() 2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha @ 2019-10-17 20:59 ` Scott Cheloha 2019-12-04 16:43 ` Dr. David Alan Gilbert 2020-01-08 19:07 ` Juan Quintela 2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha 1 sibling, 2 replies; 17+ messages in thread From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw) To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela Create a function to abstract common logic needed when removing a SaveStateEntry element from the savevm_state.handlers queue. For now we just remove the element. Soon it will involve additional cleanup. Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> --- migration/savevm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/migration/savevm.c b/migration/savevm.c index 8d95e261f6..b2e3b7222a 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -725,6 +725,11 @@ static void savevm_state_handler_insert(SaveStateEntry *nse) } } +static void savevm_state_handler_remove(SaveStateEntry *se) +{ + QTAILQ_REMOVE(&savevm_state.handlers, se, entry); +} + /* TODO: Individual devices generally have very little idea about the rest of the system, so instance_id should be removed/replaced. Meanwhile pass -1 as instance_id if you do not already have a clearly @@ -777,7 +782,7 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) { if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) { - QTAILQ_REMOVE(&savevm_state.handlers, se, entry); + savevm_state_handler_remove(se); g_free(se->compat); g_free(se); } @@ -841,7 +846,7 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd, QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) { if (se->vmsd == vmsd && se->opaque == opaque) { - QTAILQ_REMOVE(&savevm_state.handlers, se, entry); + savevm_state_handler_remove(se); g_free(se->compat); g_free(se); } -- 2.23.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] migration: add savevm_state_handler_remove() 2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha @ 2019-12-04 16:43 ` Dr. David Alan Gilbert 2020-01-08 19:07 ` Juan Quintela 1 sibling, 0 replies; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-12-04 16:43 UTC (permalink / raw) To: Scott Cheloha; +Cc: qemu-devel, Juan Quintela * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > Create a function to abstract common logic needed when removing a > SaveStateEntry element from the savevm_state.handlers queue. > > For now we just remove the element. Soon it will involve additional > cleanup. > > Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/savevm.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index 8d95e261f6..b2e3b7222a 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -725,6 +725,11 @@ static void savevm_state_handler_insert(SaveStateEntry *nse) > } > } > > +static void savevm_state_handler_remove(SaveStateEntry *se) > +{ > + QTAILQ_REMOVE(&savevm_state.handlers, se, entry); > +} > + > /* TODO: Individual devices generally have very little idea about the rest > of the system, so instance_id should be removed/replaced. > Meanwhile pass -1 as instance_id if you do not already have a clearly > @@ -777,7 +782,7 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) > > QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) { > if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) { > - QTAILQ_REMOVE(&savevm_state.handlers, se, entry); > + savevm_state_handler_remove(se); > g_free(se->compat); > g_free(se); > } > @@ -841,7 +846,7 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd, > > QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) { > if (se->vmsd == vmsd && se->opaque == opaque) { > - QTAILQ_REMOVE(&savevm_state.handlers, se, entry); > + savevm_state_handler_remove(se); > g_free(se->compat); > g_free(se); > } > -- > 2.23.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] migration: add savevm_state_handler_remove() 2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha 2019-12-04 16:43 ` Dr. David Alan Gilbert @ 2020-01-08 19:07 ` Juan Quintela 1 sibling, 0 replies; 17+ messages in thread From: Juan Quintela @ 2020-01-08 19:07 UTC (permalink / raw) To: Scott Cheloha; +Cc: qemu-devel, Dr . David Alan Gilbert Scott Cheloha <cheloha@linux.vnet.ibm.com> wrote: > Create a function to abstract common logic needed when removing a > SaveStateEntry element from the savevm_state.handlers queue. > > For now we just remove the element. Soon it will involve additional > cleanup. > > Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> queued ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha 2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha @ 2019-10-17 20:59 ` Scott Cheloha 2019-10-18 8:16 ` Dr. David Alan Gilbert 2019-12-04 16:47 ` Dr. David Alan Gilbert 1 sibling, 2 replies; 17+ messages in thread From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw) To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela savevm_state's SaveStateEntry TAILQ is a priority queue. Priority sorting is maintained by searching from head to tail for a suitable insertion spot. Insertion is thus an O(n) operation. If we instead keep track of the head of each priority's subqueue within that larger queue we can reduce this operation to O(1) time. savevm_state_handler_remove() becomes slightly more complex to accomodate these gains: we need to replace the head of a priority's subqueue when removing it. With O(1) insertion, booting VMs with many SaveStateEntry objects is more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such objects to insert. Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> --- migration/savevm.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/migration/savevm.c b/migration/savevm.c index b2e3b7222a..f7a2d36bba 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -250,6 +250,7 @@ typedef struct SaveStateEntry { typedef struct SaveState { QTAILQ_HEAD(, SaveStateEntry) handlers; + SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1]; int global_section_id; uint32_t len; const char *name; @@ -261,6 +262,7 @@ typedef struct SaveState { static SaveState savevm_state = { .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers), + .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL }, .global_section_id = 0, }; @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse) { MigrationPriority priority = save_state_priority(nse); SaveStateEntry *se; + int i; assert(priority <= MIG_PRI_MAX); - QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { - if (save_state_priority(se) < priority) { + for (i = priority - 1; i >= 0; i--) { + se = savevm_state.handler_pri_head[i]; + if (se != NULL) { + assert(save_state_priority(se) < priority); break; } } - if (se) { + if (i >= 0) { QTAILQ_INSERT_BEFORE(se, nse, entry); } else { QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry); } + + if (savevm_state.handler_pri_head[priority] == NULL) { + savevm_state.handler_pri_head[priority] = nse; + } } static void savevm_state_handler_remove(SaveStateEntry *se) { + SaveStateEntry *next; + MigrationPriority priority = save_state_priority(se); + + if (se == savevm_state.handler_pri_head[priority]) { + next = QTAILQ_NEXT(se, entry); + if (next != NULL && save_state_priority(next) == priority) { + savevm_state.handler_pri_head[priority] = next; + } else { + savevm_state.handler_pri_head[priority] = NULL; + } + } QTAILQ_REMOVE(&savevm_state.handlers, se, entry); } -- 2.23.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha @ 2019-10-18 8:16 ` Dr. David Alan Gilbert 2019-10-18 8:34 ` Laurent Vivier 2019-12-04 16:47 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-10-18 8:16 UTC (permalink / raw) To: Scott Cheloha, david, lvivier; +Cc: qemu-devel, Juan Quintela * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > sorting is maintained by searching from head to tail for a suitable > insertion spot. Insertion is thus an O(n) operation. > > If we instead keep track of the head of each priority's subqueue > within that larger queue we can reduce this operation to O(1) time. > > savevm_state_handler_remove() becomes slightly more complex to > accomodate these gains: we need to replace the head of a priority's > subqueue when removing it. > > With O(1) insertion, booting VMs with many SaveStateEntry objects is > more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > objects to insert. Separate from reviewing this patch, I'd like to understand why you've got 40000 objects. This feels very very wrong and is likely to cause problems to random other bits of qemu as well. Dave > Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> > --- > migration/savevm.c | 26 +++++++++++++++++++++++--- > 1 file changed, 23 insertions(+), 3 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index b2e3b7222a..f7a2d36bba 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -250,6 +250,7 @@ typedef struct SaveStateEntry { > > typedef struct SaveState { > QTAILQ_HEAD(, SaveStateEntry) handlers; > + SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1]; > int global_section_id; > uint32_t len; > const char *name; > @@ -261,6 +262,7 @@ typedef struct SaveState { > > static SaveState savevm_state = { > .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers), > + .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL }, > .global_section_id = 0, > }; > > @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse) > { > MigrationPriority priority = save_state_priority(nse); > SaveStateEntry *se; > + int i; > > assert(priority <= MIG_PRI_MAX); > > - QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { > - if (save_state_priority(se) < priority) { > + for (i = priority - 1; i >= 0; i--) { > + se = savevm_state.handler_pri_head[i]; > + if (se != NULL) { > + assert(save_state_priority(se) < priority); > break; > } > } > > - if (se) { > + if (i >= 0) { > QTAILQ_INSERT_BEFORE(se, nse, entry); > } else { > QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry); > } > + > + if (savevm_state.handler_pri_head[priority] == NULL) { > + savevm_state.handler_pri_head[priority] = nse; > + } > } > > static void savevm_state_handler_remove(SaveStateEntry *se) > { > + SaveStateEntry *next; > + MigrationPriority priority = save_state_priority(se); > + > + if (se == savevm_state.handler_pri_head[priority]) { > + next = QTAILQ_NEXT(se, entry); > + if (next != NULL && save_state_priority(next) == priority) { > + savevm_state.handler_pri_head[priority] = next; > + } else { > + savevm_state.handler_pri_head[priority] = NULL; > + } > + } > QTAILQ_REMOVE(&savevm_state.handlers, se, entry); > } > > -- > 2.23.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 8:16 ` Dr. David Alan Gilbert @ 2019-10-18 8:34 ` Laurent Vivier 2019-10-18 9:43 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 17+ messages in thread From: Laurent Vivier @ 2019-10-18 8:34 UTC (permalink / raw) To: Dr. David Alan Gilbert, Scott Cheloha, david Cc: Michael Roth, qemu-devel, Juan Quintela On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority >> sorting is maintained by searching from head to tail for a suitable >> insertion spot. Insertion is thus an O(n) operation. >> >> If we instead keep track of the head of each priority's subqueue >> within that larger queue we can reduce this operation to O(1) time. >> >> savevm_state_handler_remove() becomes slightly more complex to >> accomodate these gains: we need to replace the head of a priority's >> subqueue when removing it. >> >> With O(1) insertion, booting VMs with many SaveStateEntry objects is >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such >> objects to insert. > > Separate from reviewing this patch, I'd like to understand why you've > got 40000 objects. This feels very very wrong and is likely to cause > problems to random other bits of qemu as well. I think the 40000 objects are the "dr-connectors" that are used to plug peripherals (memory, pci card, cpus, ...). https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c They are part of SPAPR specification. https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt CC Michael Roth Thanks, Laurent ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 8:34 ` Laurent Vivier @ 2019-10-18 9:43 ` Dr. David Alan Gilbert 2019-10-18 16:38 ` Michael Roth 2019-10-19 10:12 ` David Gibson 0 siblings, 2 replies; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-10-18 9:43 UTC (permalink / raw) To: Laurent Vivier Cc: Michael Roth, Juan Quintela, Scott Cheloha, qemu-devel, david * Laurent Vivier (lvivier@redhat.com) wrote: > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > >> sorting is maintained by searching from head to tail for a suitable > >> insertion spot. Insertion is thus an O(n) operation. > >> > >> If we instead keep track of the head of each priority's subqueue > >> within that larger queue we can reduce this operation to O(1) time. > >> > >> savevm_state_handler_remove() becomes slightly more complex to > >> accomodate these gains: we need to replace the head of a priority's > >> subqueue when removing it. > >> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > >> objects to insert. > > > > Separate from reviewing this patch, I'd like to understand why you've > > got 40000 objects. This feels very very wrong and is likely to cause > > problems to random other bits of qemu as well. > > I think the 40000 objects are the "dr-connectors" that are used to plug > peripherals (memory, pci card, cpus, ...). Yes, Scott confirmed that in the reply to the previous version. IMHO nothing in qemu is designed to deal with that many devices/objects - I'm sure that something other than the migration code is going to get upset. Is perhaps the structure wrong somewhere - should there be a single DRC device that knows about all DRCs? Dave > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c > > They are part of SPAPR specification. > > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt > > CC Michael Roth > > Thanks, > Laurent -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 9:43 ` Dr. David Alan Gilbert @ 2019-10-18 16:38 ` Michael Roth 2019-10-18 17:26 ` Dr. David Alan Gilbert 2019-10-21 7:33 ` David Gibson 2019-10-19 10:12 ` David Gibson 1 sibling, 2 replies; 17+ messages in thread From: Michael Roth @ 2019-10-18 16:38 UTC (permalink / raw) To: Dr. David Alan Gilbert, Laurent Vivier Cc: david, Scott Cheloha, qemu-devel, Juan Quintela Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52) > * Laurent Vivier (lvivier@redhat.com) wrote: > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > >> sorting is maintained by searching from head to tail for a suitable > > >> insertion spot. Insertion is thus an O(n) operation. > > >> > > >> If we instead keep track of the head of each priority's subqueue > > >> within that larger queue we can reduce this operation to O(1) time. > > >> > > >> savevm_state_handler_remove() becomes slightly more complex to > > >> accomodate these gains: we need to replace the head of a priority's > > >> subqueue when removing it. > > >> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > >> objects to insert. > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > got 40000 objects. This feels very very wrong and is likely to cause > > > problems to random other bits of qemu as well. > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > peripherals (memory, pci card, cpus, ...). > > Yes, Scott confirmed that in the reply to the previous version. > IMHO nothing in qemu is designed to deal with that many devices/objects > - I'm sure that something other than the migration code is going to get upset. The device/object management aspect seems to handle things *mostly* okay, at least ever since QOM child properties started being tracked by a hash table instead of a linked list. It's worth noting that that change (b604a854) was done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is inevitable that certain machine types/configurations will call for large numbers of objects and I think it is fair to improve things to allow for this sort of scalability. But I agree it shouldn't be abused, and you're right that there are some problem areas that arise. Trying to outline them: a) introspection commands like 'info qom-tree' become pretty unwieldly, and with large enough numbers of objects might even break things (QMP response size limits maybe?) b) various related lists like reset handlers, vmstate/savevm handlers might grow quite large I think we could work around a) with maybe flagging certain "internally-only" objects as 'hidden'. Introspection routines could then filter these out, and routines like qom-set/qom-get could return report something similar to EACCESS so they are never used/useful to management tools. In cases like b) we can optimize things where it makes sense like with Scott's patch here. In most cases these lists need to be walked one way or another, whether it's done internally by the object or through common interfaces provided by QEMU. It's really just the O(n^2) type handling where relying on common interfaces becomes drastically less efficient, but I think we should avoid implementing things in that way anyway, or improve them as needed. > > Is perhaps the structure wrong somewhere - should there be a single DRC > device that knows about all DRCs? That's an interesting proposition, I think it's worth exploring further, but from a high level: - each SpaprDrc has migration state, and some sub-classes SpaprDrc (e.g. SpaprDrcPhysical) have additional migration state. These are sent as-needed as separate VMState entries in the migration stream. Moving to a single DRC means we're either sending them as an flat array or a sparse list, which would put just as much load on the migration code (at least, with Scott's changes in place). It would also be difficult to do all this in a way which maintains migration compatibility with older machine types. - other aspects of modeling these as QOM objects, such as look-ups, reset-handling, and memory allocations, wouldn't be dramatically improved upon by handling it all internally within the object AFAICT the biggest issue with modeling the DRCs as individual objects is actually how we deal with introspection, and we should try to improve. What do you think of the alternative suggestion above of marking certain objects as 'hidden' from various introspection interfaces? > > Dave > > > > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c > > > > They are part of SPAPR specification. > > > > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt > > > > CC Michael Roth > > > > Thanks, > > Laurent > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 16:38 ` Michael Roth @ 2019-10-18 17:26 ` Dr. David Alan Gilbert 2019-10-21 7:33 ` David Gibson 1 sibling, 0 replies; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-10-18 17:26 UTC (permalink / raw) To: Michael Roth Cc: Laurent Vivier, david, Scott Cheloha, qemu-devel, Juan Quintela * Michael Roth (mdroth@linux.vnet.ibm.com) wrote: > Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52) > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > >> sorting is maintained by searching from head to tail for a suitable > > > >> insertion spot. Insertion is thus an O(n) operation. > > > >> > > > >> If we instead keep track of the head of each priority's subqueue > > > >> within that larger queue we can reduce this operation to O(1) time. > > > >> > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > >> accomodate these gains: we need to replace the head of a priority's > > > >> subqueue when removing it. > > > >> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > >> objects to insert. > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > problems to random other bits of qemu as well. > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > peripherals (memory, pci card, cpus, ...). > > > > Yes, Scott confirmed that in the reply to the previous version. > > IMHO nothing in qemu is designed to deal with that many devices/objects > > - I'm sure that something other than the migration code is going to get upset. > > The device/object management aspect seems to handle things *mostly* okay, at > least ever since QOM child properties started being tracked by a hash table > instead of a linked list. It's worth noting that that change (b604a854) was > done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is > inevitable that certain machine types/configurations will call for large > numbers of objects and I think it is fair to improve things to allow for this > sort of scalability. > > But I agree it shouldn't be abused, and you're right that there are some > problem areas that arise. Trying to outline them: > > a) introspection commands like 'info qom-tree' become pretty unwieldly, > and with large enough numbers of objects might even break things (QMP > response size limits maybe?) > b) various related lists like reset handlers, vmstate/savevm handlers might > grow quite large > > I think we could work around a) with maybe flagging certain > "internally-only" objects as 'hidden'. Introspection routines could then > filter these out, and routines like qom-set/qom-get could return report > something similar to EACCESS so they are never used/useful to management > tools. > > In cases like b) we can optimize things where it makes sense like with > Scott's patch here. In most cases these lists need to be walked one way > or another, whether it's done internally by the object or through common > interfaces provided by QEMU. It's really just the O(n^2) type handling > where relying on common interfaces becomes drastically less efficient, > but I think we should avoid implementing things in that way anyway, or > improve them as needed. > > > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > device that knows about all DRCs? > > That's an interesting proposition, I think it's worth exploring further, > but from a high level: > > - each SpaprDrc has migration state, and some sub-classes SpaprDrc (e.g. > SpaprDrcPhysical) have additional migration state. These are sent > as-needed as separate VMState entries in the migration stream. > Moving to a single DRC means we're either sending them as an flat > array or a sparse list, which would put just as much load on the > migration code (at least, with Scott's changes in place). It would > also be difficult to do all this in a way which maintains migration > compatibility with older machine types. Having sparse arrays etc within a vmstate isn't as bad; none of them actually need to be 'objects' as such - even if you have separate chunks of VMState. > - other aspects of modeling these as QOM objects, such as look-ups, > reset-handling, and memory allocations, wouldn't be dramatically > improved upon by handling it all internally within the object > > AFAICT the biggest issue with modeling the DRCs as individual objects > is actually how we deal with introspection, and we should try to > improve. What do you think of the alternative suggestion above of > marking certain objects as 'hidden' from various introspection > interfaces? That's one for someone who knows/cares about QOM more than me; Paolo, Dan Berrange, or Eduardo Habkost are QOM people. Dave > > > > Dave > > > > > > > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c > > > > > > They are part of SPAPR specification. > > > > > > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt > > > > > > CC Michael Roth > > > > > > Thanks, > > > Laurent > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 16:38 ` Michael Roth 2019-10-18 17:26 ` Dr. David Alan Gilbert @ 2019-10-21 7:33 ` David Gibson 1 sibling, 0 replies; 17+ messages in thread From: David Gibson @ 2019-10-21 7:33 UTC (permalink / raw) To: Michael Roth Cc: Laurent Vivier, qemu-devel, Scott Cheloha, Dr. David Alan Gilbert, Juan Quintela [-- Attachment #1: Type: text/plain, Size: 5256 bytes --] On Fri, Oct 18, 2019 at 11:38:37AM -0500, Michael Roth wrote: > Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52) > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > >> sorting is maintained by searching from head to tail for a suitable > > > >> insertion spot. Insertion is thus an O(n) operation. > > > >> > > > >> If we instead keep track of the head of each priority's subqueue > > > >> within that larger queue we can reduce this operation to O(1) time. > > > >> > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > >> accomodate these gains: we need to replace the head of a priority's > > > >> subqueue when removing it. > > > >> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > >> objects to insert. > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > problems to random other bits of qemu as well. > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > peripherals (memory, pci card, cpus, ...). > > > > Yes, Scott confirmed that in the reply to the previous version. > > IMHO nothing in qemu is designed to deal with that many devices/objects > > - I'm sure that something other than the migration code is going to get upset. > > The device/object management aspect seems to handle things *mostly* okay, at > least ever since QOM child properties started being tracked by a hash table > instead of a linked list. It's worth noting that that change (b604a854) was > done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is > inevitable that certain machine types/configurations will call for large > numbers of objects and I think it is fair to improve things to allow for this > sort of scalability. > > But I agree it shouldn't be abused, and you're right that there are some > problem areas that arise. Trying to outline them: > > a) introspection commands like 'info qom-tree' become pretty unwieldly, > and with large enough numbers of objects might even break things (QMP > response size limits maybe?) > b) various related lists like reset handlers, vmstate/savevm handlers might > grow quite large > > I think we could work around a) with maybe flagging certain > "internally-only" objects as 'hidden'. Introspection routines could then > filter these out, and routines like qom-set/qom-get could return report > something similar to EACCESS so they are never used/useful to management > tools. > > In cases like b) we can optimize things where it makes sense like with > Scott's patch here. In most cases these lists need to be walked one way > or another, whether it's done internally by the object or through common > interfaces provided by QEMU. It's really just the O(n^2) type handling > where relying on common interfaces becomes drastically less efficient, > but I think we should avoid implementing things in that way anyway, or > improve them as needed. > > > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > device that knows about all DRCs? > > That's an interesting proposition, I think it's worth exploring further, > but from a high level: > > - each SpaprDrc has migration state, and some sub-classes SpaprDrc (e.g. > SpaprDrcPhysical) have additional migration state. These are sent > as-needed as separate VMState entries in the migration stream. > Moving to a single DRC means we're either sending them as an flat > array or a sparse list, which would put just as much load on the > migration code (at least, with Scott's changes in place). It would > also be difficult to do all this in a way which maintains migration > compatibility with older machine types. > - other aspects of modeling these as QOM objects, such as look-ups, > reset-handling, and memory allocations, wouldn't be dramatically > improved upon by handling it all internally within the object > > AFAICT the biggest issue with modeling the DRCs as individual objects > is actually how we deal with introspection, and we should try to > improve. What do you think of the alternative suggestion above of > marking certain objects as 'hidden' from various introspection > interfaces? So, that's not something I'd considered particularly in this context, but it has bothered me in other contexts. The fact that all the QOM interfaces are freely user-inspectable, but are also used for a bunch of interaction between qemu components means that we (arguably) routinely make a bunch of stuff into user-visible API which we probably don't really need or want to. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-18 9:43 ` Dr. David Alan Gilbert 2019-10-18 16:38 ` Michael Roth @ 2019-10-19 10:12 ` David Gibson 2019-10-21 8:14 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 17+ messages in thread From: David Gibson @ 2019-10-19 10:12 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Laurent Vivier, Michael Roth, Scott Cheloha, qemu-devel, Juan Quintela [-- Attachment #1: Type: text/plain, Size: 2352 bytes --] On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote: > * Laurent Vivier (lvivier@redhat.com) wrote: > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > >> sorting is maintained by searching from head to tail for a suitable > > >> insertion spot. Insertion is thus an O(n) operation. > > >> > > >> If we instead keep track of the head of each priority's subqueue > > >> within that larger queue we can reduce this operation to O(1) time. > > >> > > >> savevm_state_handler_remove() becomes slightly more complex to > > >> accomodate these gains: we need to replace the head of a priority's > > >> subqueue when removing it. > > >> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > >> objects to insert. > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > got 40000 objects. This feels very very wrong and is likely to cause > > > problems to random other bits of qemu as well. > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > peripherals (memory, pci card, cpus, ...). > > Yes, Scott confirmed that in the reply to the previous version. > IMHO nothing in qemu is designed to deal with that many devices/objects > - I'm sure that something other than the migration code is going to > get upset. It kind of did. Particularly when there was n^2 and n^3 cubed behaviour in the property stuff we had some ludicrously long startup times (hours) with large maxmem values. Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem. The problem is the memory DRCs, there's one for each LMB - each 256MiB chunk of memory (or possible memory). > Is perhaps the structure wrong somewhere - should there be a single DRC > device that knows about all DRCs? Maybe. The tricky bit is how to get there from here without breaking migration or something else along the way. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-19 10:12 ` David Gibson @ 2019-10-21 8:14 ` Dr. David Alan Gilbert 2019-11-20 21:48 ` Scott Cheloha 0 siblings, 1 reply; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-10-21 8:14 UTC (permalink / raw) To: David Gibson Cc: Laurent Vivier, Michael Roth, Scott Cheloha, qemu-devel, Juan Quintela * David Gibson (david@gibson.dropbear.id.au) wrote: > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote: > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > >> sorting is maintained by searching from head to tail for a suitable > > > >> insertion spot. Insertion is thus an O(n) operation. > > > >> > > > >> If we instead keep track of the head of each priority's subqueue > > > >> within that larger queue we can reduce this operation to O(1) time. > > > >> > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > >> accomodate these gains: we need to replace the head of a priority's > > > >> subqueue when removing it. > > > >> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > >> objects to insert. > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > problems to random other bits of qemu as well. > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > peripherals (memory, pci card, cpus, ...). > > > > Yes, Scott confirmed that in the reply to the previous version. > > IMHO nothing in qemu is designed to deal with that many devices/objects > > - I'm sure that something other than the migration code is going to > > get upset. > > It kind of did. Particularly when there was n^2 and n^3 cubed > behaviour in the property stuff we had some ludicrously long startup > times (hours) with large maxmem values. > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem. > The problem is the memory DRCs, there's one for each LMB - each 256MiB > chunk of memory (or possible memory). > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > device that knows about all DRCs? > > Maybe. The tricky bit is how to get there from here without breaking > migration or something else along the way. Switch on the next machine type version - it doesn't matter if migration is incompatible then. Without knowing anything about the innards of DRCs, I suggest a DRCMulti that takes a parameter and represents 'n' DRCs at consecutive chunks of memory. Then use one DRCMulti for each RAMBlock or DIMM or other convenient sized thing. Dave > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-21 8:14 ` Dr. David Alan Gilbert @ 2019-11-20 21:48 ` Scott Cheloha 2019-12-04 16:49 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 17+ messages in thread From: Scott Cheloha @ 2019-11-20 21:48 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Laurent Vivier, qemu-devel, Juan Quintela, Michael Roth, David Gibson On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote: > * David Gibson (david@gibson.dropbear.id.au) wrote: > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote: > > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > > >> sorting is maintained by searching from head to tail for a suitable > > > > >> insertion spot. Insertion is thus an O(n) operation. > > > > >> > > > > >> If we instead keep track of the head of each priority's subqueue > > > > >> within that larger queue we can reduce this operation to O(1) time. > > > > >> > > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > > >> accomodate these gains: we need to replace the head of a priority's > > > > >> subqueue when removing it. > > > > >> > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > > >> objects to insert. > > > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > > problems to random other bits of qemu as well. > > > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > > peripherals (memory, pci card, cpus, ...). > > > > > > Yes, Scott confirmed that in the reply to the previous version. > > > IMHO nothing in qemu is designed to deal with that many devices/objects > > > - I'm sure that something other than the migration code is going to > > > get upset. > > > > It kind of did. Particularly when there was n^2 and n^3 cubed > > behaviour in the property stuff we had some ludicrously long startup > > times (hours) with large maxmem values. > > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem. > > The problem is the memory DRCs, there's one for each LMB - each 256MiB > > chunk of memory (or possible memory). > > > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > > device that knows about all DRCs? > > > > Maybe. The tricky bit is how to get there from here without breaking > > migration or something else along the way. > > Switch on the next machine type version - it doesn't matter if migration > is incompatible then. 1mo bump. Is there anything I need to do with this patch in particular to make it suitable for merging? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-11-20 21:48 ` Scott Cheloha @ 2019-12-04 16:49 ` Dr. David Alan Gilbert 2019-12-04 22:28 ` David Gibson 0 siblings, 1 reply; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-12-04 16:49 UTC (permalink / raw) To: Scott Cheloha Cc: Laurent Vivier, qemu-devel, Juan Quintela, Michael Roth, David Gibson * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote: > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote: > > > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > > > >> sorting is maintained by searching from head to tail for a suitable > > > > > >> insertion spot. Insertion is thus an O(n) operation. > > > > > >> > > > > > >> If we instead keep track of the head of each priority's subqueue > > > > > >> within that larger queue we can reduce this operation to O(1) time. > > > > > >> > > > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > > > >> accomodate these gains: we need to replace the head of a priority's > > > > > >> subqueue when removing it. > > > > > >> > > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > > > >> objects to insert. > > > > > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > > > problems to random other bits of qemu as well. > > > > > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > > > peripherals (memory, pci card, cpus, ...). > > > > > > > > Yes, Scott confirmed that in the reply to the previous version. > > > > IMHO nothing in qemu is designed to deal with that many devices/objects > > > > - I'm sure that something other than the migration code is going to > > > > get upset. > > > > > > It kind of did. Particularly when there was n^2 and n^3 cubed > > > behaviour in the property stuff we had some ludicrously long startup > > > times (hours) with large maxmem values. > > > > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem. > > > The problem is the memory DRCs, there's one for each LMB - each 256MiB > > > chunk of memory (or possible memory). > > > > > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > > > device that knows about all DRCs? > > > > > > Maybe. The tricky bit is how to get there from here without breaking > > > migration or something else along the way. > > > > Switch on the next machine type version - it doesn't matter if migration > > is incompatible then. > > 1mo bump. > > Is there anything I need to do with this patch in particular to make it suitable > for merging? Apologies for the delay; hopefully this will go in one of the pulls just after the tree opens again. Please please try and work on reducing the number of objects somehow - while this migration fix is a useful short term fix, and not too invasive; having that many objects around qemu is a really really bad idea so needs fixing properly. Dave > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-12-04 16:49 ` Dr. David Alan Gilbert @ 2019-12-04 22:28 ` David Gibson 0 siblings, 0 replies; 17+ messages in thread From: David Gibson @ 2019-12-04 22:28 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Laurent Vivier, Juan Quintela, Scott Cheloha, Michael Roth, qemu-devel [-- Attachment #1: Type: text/plain, Size: 3629 bytes --] On Wed, Dec 04, 2019 at 04:49:15PM +0000, Dr. David Alan Gilbert wrote: > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote: > > > * David Gibson (david@gibson.dropbear.id.au) wrote: > > > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote: > > > > > * Laurent Vivier (lvivier@redhat.com) wrote: > > > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote: > > > > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > > > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > > > > > > >> sorting is maintained by searching from head to tail for a suitable > > > > > > >> insertion spot. Insertion is thus an O(n) operation. > > > > > > >> > > > > > > >> If we instead keep track of the head of each priority's subqueue > > > > > > >> within that larger queue we can reduce this operation to O(1) time. > > > > > > >> > > > > > > >> savevm_state_handler_remove() becomes slightly more complex to > > > > > > >> accomodate these gains: we need to replace the head of a priority's > > > > > > >> subqueue when removing it. > > > > > > >> > > > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is > > > > > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > > > > > > >> objects to insert. > > > > > > > > > > > > > > Separate from reviewing this patch, I'd like to understand why you've > > > > > > > got 40000 objects. This feels very very wrong and is likely to cause > > > > > > > problems to random other bits of qemu as well. > > > > > > > > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug > > > > > > peripherals (memory, pci card, cpus, ...). > > > > > > > > > > Yes, Scott confirmed that in the reply to the previous version. > > > > > IMHO nothing in qemu is designed to deal with that many devices/objects > > > > > - I'm sure that something other than the migration code is going to > > > > > get upset. > > > > > > > > It kind of did. Particularly when there was n^2 and n^3 cubed > > > > behaviour in the property stuff we had some ludicrously long startup > > > > times (hours) with large maxmem values. > > > > > > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem. > > > > The problem is the memory DRCs, there's one for each LMB - each 256MiB > > > > chunk of memory (or possible memory). > > > > > > > > > Is perhaps the structure wrong somewhere - should there be a single DRC > > > > > device that knows about all DRCs? > > > > > > > > Maybe. The tricky bit is how to get there from here without breaking > > > > migration or something else along the way. > > > > > > Switch on the next machine type version - it doesn't matter if migration > > > is incompatible then. > > > > 1mo bump. > > > > Is there anything I need to do with this patch in particular to make it suitable > > for merging? > > Apologies for the delay; hopefully this will go in one of the pulls > just after the tree opens again. > > Please please try and work on reducing the number of objects somehow - > while this migration fix is a useful short term fix, and not too > invasive; having that many objects around qemu is a really really bad > idea so needs fixing properly. I'm hoping to have a crack at this tomorrow. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion 2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha 2019-10-18 8:16 ` Dr. David Alan Gilbert @ 2019-12-04 16:47 ` Dr. David Alan Gilbert 1 sibling, 0 replies; 17+ messages in thread From: Dr. David Alan Gilbert @ 2019-12-04 16:47 UTC (permalink / raw) To: Scott Cheloha; +Cc: qemu-devel, Juan Quintela * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote: > savevm_state's SaveStateEntry TAILQ is a priority queue. Priority > sorting is maintained by searching from head to tail for a suitable > insertion spot. Insertion is thus an O(n) operation. > > If we instead keep track of the head of each priority's subqueue > within that larger queue we can reduce this operation to O(1) time. > > savevm_state_handler_remove() becomes slightly more complex to > accomodate these gains: we need to replace the head of a priority's > subqueue when removing it. > > With O(1) insertion, booting VMs with many SaveStateEntry objects is > more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such > objects to insert. > > Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> OK, it took me a while to figure out why you didn't just turn handlers into handlers[MIG_PRI_MAX]; but I guess the problem is you would have to change all the foreach's scattered around that walk the list. So Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/savevm.c | 26 +++++++++++++++++++++++--- > 1 file changed, 23 insertions(+), 3 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index b2e3b7222a..f7a2d36bba 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -250,6 +250,7 @@ typedef struct SaveStateEntry { > > typedef struct SaveState { > QTAILQ_HEAD(, SaveStateEntry) handlers; > + SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1]; > int global_section_id; > uint32_t len; > const char *name; > @@ -261,6 +262,7 @@ typedef struct SaveState { > > static SaveState savevm_state = { > .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers), > + .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL }, > .global_section_id = 0, > }; > > @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse) > { > MigrationPriority priority = save_state_priority(nse); > SaveStateEntry *se; > + int i; > > assert(priority <= MIG_PRI_MAX); > > - QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { > - if (save_state_priority(se) < priority) { > + for (i = priority - 1; i >= 0; i--) { > + se = savevm_state.handler_pri_head[i]; > + if (se != NULL) { > + assert(save_state_priority(se) < priority); > break; > } > } > > - if (se) { > + if (i >= 0) { > QTAILQ_INSERT_BEFORE(se, nse, entry); > } else { > QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry); > } > + > + if (savevm_state.handler_pri_head[priority] == NULL) { > + savevm_state.handler_pri_head[priority] = nse; > + } > } > > static void savevm_state_handler_remove(SaveStateEntry *se) > { > + SaveStateEntry *next; > + MigrationPriority priority = save_state_priority(se); > + > + if (se == savevm_state.handler_pri_head[priority]) { > + next = QTAILQ_NEXT(se, entry); > + if (next != NULL && save_state_priority(next) == priority) { > + savevm_state.handler_pri_head[priority] = next; > + } else { > + savevm_state.handler_pri_head[priority] = NULL; > + } > + } > QTAILQ_REMOVE(&savevm_state.handlers, se, entry); > } > > -- > 2.23.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-01-08 19:08 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha 2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha 2019-12-04 16:43 ` Dr. David Alan Gilbert 2020-01-08 19:07 ` Juan Quintela 2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha 2019-10-18 8:16 ` Dr. David Alan Gilbert 2019-10-18 8:34 ` Laurent Vivier 2019-10-18 9:43 ` Dr. David Alan Gilbert 2019-10-18 16:38 ` Michael Roth 2019-10-18 17:26 ` Dr. David Alan Gilbert 2019-10-21 7:33 ` David Gibson 2019-10-19 10:12 ` David Gibson 2019-10-21 8:14 ` Dr. David Alan Gilbert 2019-11-20 21:48 ` Scott Cheloha 2019-12-04 16:49 ` Dr. David Alan Gilbert 2019-12-04 22:28 ` David Gibson 2019-12-04 16:47 ` Dr. David Alan Gilbert
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.