* [PATCH v2 0/2] migration: faster savevm_state_handler_insert()
@ 2019-10-17 20:59 Scott Cheloha
  2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha
  2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha
  0 siblings, 2 replies; 17+ messages in thread
From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela

The savevm_state.handlers queue of SaveStateEntry objects is a
priority queue with an O(n) insertion cost.  This makes startup
extremely slow when a VM has many such objects to register.  For
instance, a ppc64 VM with a large (8T+) maxmem needs to register tens
of thousands of SaveStateEntry handlers: startup for these VMs is
glacial.

If we track insertion points within the priority queue we can make
savevm_state_handler_insert() a constant-time operation with little
change to the module's code.  Startup times for VMs with many handlers
are dramatically improved as a result.
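
To illustrate the idea, here is a minimal standalone sketch (it uses a
hand-rolled doubly-linked list in place of QEMU's QTAILQ and
SaveStateEntry; the real change is in patch 2/2 below):

    /*
     * Minimal standalone sketch of the approach.  Entries of equal
     * priority stay contiguous in the queue (highest priority first),
     * and we remember the first entry of each priority so insertion
     * never has to scan the whole queue.
     */
    #include <assert.h>
    #include <stddef.h>

    #define PRI_MAX 3

    typedef struct Entry {
        int priority;
        struct Entry *prev, *next;
    } Entry;

    static Entry *head, *tail;
    static Entry *pri_head[PRI_MAX + 1];    /* first entry per priority */

    static void insert_entry(Entry *e)      /* O(1), no head-to-tail scan */
    {
        int i;

        assert(e->priority >= 0 && e->priority <= PRI_MAX);

        /* Find the head of the nearest lower-priority subqueue. */
        for (i = e->priority - 1; i >= 0; i--) {
            if (pri_head[i] != NULL) {
                break;
            }
        }

        if (i >= 0) {                       /* insert just before it */
            Entry *at = pri_head[i];
            e->prev = at->prev;
            e->next = at;
            if (at->prev != NULL) {
                at->prev->next = e;
            } else {
                head = e;
            }
            at->prev = e;
        } else {                            /* lowest priority so far: append */
            e->prev = tail;
            e->next = NULL;
            if (tail != NULL) {
                tail->next = e;
            } else {
                head = e;
            }
            tail = e;
        }

        if (pri_head[e->priority] == NULL) {
            pri_head[e->priority] = e;      /* first entry of this priority */
        }
    }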

Changes since v1:
  * Split patch 1 into 2 patches.

Scott Cheloha (2):
  migration: add savevm_state_handler_remove()
  migration: savevm_state_handler_insert: constant-time element
    insertion

 migration/savevm.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

-- 
2.23.0




* [PATCH v2 1/2] migration: add savevm_state_handler_remove()
  2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha
@ 2019-10-17 20:59 ` Scott Cheloha
  2019-12-04 16:43   ` Dr. David Alan Gilbert
  2020-01-08 19:07   ` Juan Quintela
  2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha
  1 sibling, 2 replies; 17+ messages in thread
From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela

Create a function to abstract common logic needed when removing a
SaveStateEntry element from the savevm_state.handlers queue.

For now we just remove the element.  Soon it will involve additional
cleanup.

Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
---
 migration/savevm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e261f6..b2e3b7222a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -725,6 +725,11 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
     }
 }
 
+static void savevm_state_handler_remove(SaveStateEntry *se)
+{
+    QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
+}
+
 /* TODO: Individual devices generally have very little idea about the rest
    of the system, so instance_id should be removed/replaced.
    Meanwhile pass -1 as instance_id if you do not already have a clearly
@@ -777,7 +782,7 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 
     QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
         if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
-            QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
+            savevm_state_handler_remove(se);
             g_free(se->compat);
             g_free(se);
         }
@@ -841,7 +846,7 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd,
 
     QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
         if (se->vmsd == vmsd && se->opaque == opaque) {
-            QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
+            savevm_state_handler_remove(se);
             g_free(se->compat);
             g_free(se);
         }
-- 
2.23.0




* [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha
  2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha
@ 2019-10-17 20:59 ` Scott Cheloha
  2019-10-18  8:16   ` Dr. David Alan Gilbert
  2019-12-04 16:47   ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 17+ messages in thread
From: Scott Cheloha @ 2019-10-17 20:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Juan Quintela

savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
sorting is maintained by searching from head to tail for a suitable
insertion spot.  Insertion is thus an O(n) operation.

If we instead keep track of the head of each priority's subqueue
within that larger queue we can reduce this operation to O(1) time.

savevm_state_handler_remove() becomes slightly more complex to
accommodate these gains: we need to replace the head of a priority's
subqueue when removing it.

With O(1) insertion, booting VMs with many SaveStateEntry objects is
more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
objects to insert.

Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
---
 migration/savevm.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index b2e3b7222a..f7a2d36bba 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
 
 typedef struct SaveState {
     QTAILQ_HEAD(, SaveStateEntry) handlers;
+    SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
     int global_section_id;
     uint32_t len;
     const char *name;
@@ -261,6 +262,7 @@ typedef struct SaveState {
 
 static SaveState savevm_state = {
     .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
+    .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
     .global_section_id = 0,
 };
 
@@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
 {
     MigrationPriority priority = save_state_priority(nse);
     SaveStateEntry *se;
+    int i;
 
     assert(priority <= MIG_PRI_MAX);
 
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (save_state_priority(se) < priority) {
+    for (i = priority - 1; i >= 0; i--) {
+        se = savevm_state.handler_pri_head[i];
+        if (se != NULL) {
+            assert(save_state_priority(se) < priority);
             break;
         }
     }
 
-    if (se) {
+    if (i >= 0) {
         QTAILQ_INSERT_BEFORE(se, nse, entry);
     } else {
         QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry);
     }
+
+    if (savevm_state.handler_pri_head[priority] == NULL) {
+        savevm_state.handler_pri_head[priority] = nse;
+    }
 }
 
 static void savevm_state_handler_remove(SaveStateEntry *se)
 {
+    SaveStateEntry *next;
+    MigrationPriority priority = save_state_priority(se);
+
+    if (se == savevm_state.handler_pri_head[priority]) {
+        next = QTAILQ_NEXT(se, entry);
+        if (next != NULL && save_state_priority(next) == priority) {
+            savevm_state.handler_pri_head[priority] = next;
+        } else {
+            savevm_state.handler_pri_head[priority] = NULL;
+        }
+    }
     QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
 }
 
-- 
2.23.0




* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha
@ 2019-10-18  8:16   ` Dr. David Alan Gilbert
  2019-10-18  8:34     ` Laurent Vivier
  2019-12-04 16:47   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-18  8:16 UTC (permalink / raw)
  To: Scott Cheloha, david, lvivier; +Cc: qemu-devel, Juan Quintela

* Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> sorting is maintained by searching from head to tail for a suitable
> insertion spot.  Insertion is thus an O(n) operation.
> 
> If we instead keep track of the head of each priority's subqueue
> within that larger queue we can reduce this operation to O(1) time.
> 
> savevm_state_handler_remove() becomes slightly more complex to
> accomodate these gains: we need to replace the head of a priority's
> subqueue when removing it.
> 
> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> objects to insert.

Separate from reviewing this patch, I'd like to understand why you've
got 40000 objects.  This feels very very wrong and is likely to cause
problems to random other bits of qemu as well.

Dave

> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>
> ---
>  migration/savevm.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b2e3b7222a..f7a2d36bba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
>  
>  typedef struct SaveState {
>      QTAILQ_HEAD(, SaveStateEntry) handlers;
> +    SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
>      int global_section_id;
>      uint32_t len;
>      const char *name;
> @@ -261,6 +262,7 @@ typedef struct SaveState {
>  
>  static SaveState savevm_state = {
>      .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
> +    .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
>      .global_section_id = 0,
>  };
>  
> @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
>  {
>      MigrationPriority priority = save_state_priority(nse);
>      SaveStateEntry *se;
> +    int i;
>  
>      assert(priority <= MIG_PRI_MAX);
>  
> -    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (save_state_priority(se) < priority) {
> +    for (i = priority - 1; i >= 0; i--) {
> +        se = savevm_state.handler_pri_head[i];
> +        if (se != NULL) {
> +            assert(save_state_priority(se) < priority);
>              break;
>          }
>      }
>  
> -    if (se) {
> +    if (i >= 0) {
>          QTAILQ_INSERT_BEFORE(se, nse, entry);
>      } else {
>          QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry);
>      }
> +
> +    if (savevm_state.handler_pri_head[priority] == NULL) {
> +        savevm_state.handler_pri_head[priority] = nse;
> +    }
>  }
>  
>  static void savevm_state_handler_remove(SaveStateEntry *se)
>  {
> +    SaveStateEntry *next;
> +    MigrationPriority priority = save_state_priority(se);
> +
> +    if (se == savevm_state.handler_pri_head[priority]) {
> +        next = QTAILQ_NEXT(se, entry);
> +        if (next != NULL && save_state_priority(next) == priority) {
> +            savevm_state.handler_pri_head[priority] = next;
> +        } else {
> +            savevm_state.handler_pri_head[priority] = NULL;
> +        }
> +    }
>      QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  }
>  
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18  8:16   ` Dr. David Alan Gilbert
@ 2019-10-18  8:34     ` Laurent Vivier
  2019-10-18  9:43       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: Laurent Vivier @ 2019-10-18  8:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Scott Cheloha, david
  Cc: Michael Roth, qemu-devel, Juan Quintela

On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
>> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
>> sorting is maintained by searching from head to tail for a suitable
>> insertion spot.  Insertion is thus an O(n) operation.
>>
>> If we instead keep track of the head of each priority's subqueue
>> within that larger queue we can reduce this operation to O(1) time.
>>
>> savevm_state_handler_remove() becomes slightly more complex to
>> accomodate these gains: we need to replace the head of a priority's
>> subqueue when removing it.
>>
>> With O(1) insertion, booting VMs with many SaveStateEntry objects is
>> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
>> objects to insert.
> 
> Separate from reviewing this patch, I'd like to understand why you've
> got 40000 objects.  This feels very very wrong and is likely to cause
> problems to random other bits of qemu as well.

I think the 40000 objects are the "dr-connectors" that are used to plug
peripherals (memory, pci card, cpus, ...).

https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c

They are part of SPAPR specification.

https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt

CC Michael Roth

Thanks,
Laurent



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18  8:34     ` Laurent Vivier
@ 2019-10-18  9:43       ` Dr. David Alan Gilbert
  2019-10-18 16:38         ` Michael Roth
  2019-10-19 10:12         ` David Gibson
  0 siblings, 2 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-18  9:43 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Michael Roth, Juan Quintela, Scott Cheloha, qemu-devel, david

* Laurent Vivier (lvivier@redhat.com) wrote:
> On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> >> sorting is maintained by searching from head to tail for a suitable
> >> insertion spot.  Insertion is thus an O(n) operation.
> >>
> >> If we instead keep track of the head of each priority's subqueue
> >> within that larger queue we can reduce this operation to O(1) time.
> >>
> >> savevm_state_handler_remove() becomes slightly more complex to
> >> accomodate these gains: we need to replace the head of a priority's
> >> subqueue when removing it.
> >>
> >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> >> objects to insert.
> > 
> > Separate from reviewing this patch, I'd like to understand why you've
> > got 40000 objects.  This feels very very wrong and is likely to cause
> > problems to random other bits of qemu as well.
> 
> I think the 40000 objects are the "dr-connectors" that are used to plug
> peripherals (memory, pci card, cpus, ...).

Yes, Scott confirmed that in the reply to the previous version.
IMHO nothing in qemu is designed to deal with that many devices/objects
- I'm sure that something other than the migration code is going to get upset.

Is perhaps the structure wrong somewhere - should there be a single DRC
device that knows about all DRCs?

Dave


> https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> 
> They are part of SPAPR specification.
> 
> https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> 
> CC Michael Roth
> 
> Thanks,
> Laurent
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18  9:43       ` Dr. David Alan Gilbert
@ 2019-10-18 16:38         ` Michael Roth
  2019-10-18 17:26           ` Dr. David Alan Gilbert
  2019-10-21  7:33           ` David Gibson
  2019-10-19 10:12         ` David Gibson
  1 sibling, 2 replies; 17+ messages in thread
From: Michael Roth @ 2019-10-18 16:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Laurent Vivier
  Cc: david, Scott Cheloha, qemu-devel, Juan Quintela

Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> * Laurent Vivier (lvivier@redhat.com) wrote:
> > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > >> sorting is maintained by searching from head to tail for a suitable
> > >> insertion spot.  Insertion is thus an O(n) operation.
> > >>
> > >> If we instead keep track of the head of each priority's subqueue
> > >> within that larger queue we can reduce this operation to O(1) time.
> > >>
> > >> savevm_state_handler_remove() becomes slightly more complex to
> > >> accomodate these gains: we need to replace the head of a priority's
> > >> subqueue when removing it.
> > >>
> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > >> objects to insert.
> > > 
> > > Separate from reviewing this patch, I'd like to understand why you've
> > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > problems to random other bits of qemu as well.
> > 
> > I think the 40000 objects are the "dr-connectors" that are used to plug
> > peripherals (memory, pci card, cpus, ...).
> 
> Yes, Scott confirmed that in the reply to the previous version.
> IMHO nothing in qemu is designed to deal with that many devices/objects
> - I'm sure that something other than the migration code is going to get upset.

The device/object management aspect seems to handle things *mostly* okay, at
least ever since QOM child properties started being tracked by a hash table
instead of a linked list. It's worth noting that that change (b604a854) was
done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
inevitable that certain machine types/configurations will call for large
numbers of objects and I think it is fair to improve things to allow for this
sort of scalability.

But I agree it shouldn't be abused, and you're right that there are some
problem areas that arise. Trying to outline them:

 a) introspection commands like 'info qom-tree' become pretty unwieldy,
    and with large enough numbers of objects might even break things (QMP
    response size limits maybe?)
 b) various related lists like reset handlers, vmstate/savevm handlers might
    grow quite large

I think we could work around a) with maybe flagging certain
"internally-only" objects as 'hidden'. Introspection routines could then
filter these out, and routines like qom-set/qom-get could report
something similar to EACCES so they are never used/useful to management
tools.

In cases like b) we can optimize things where it makes sense like with
Scott's patch here. In most cases these lists need to be walked one way
or another, whether it's done internally by the object or through common
interfaces provided by QEMU. It's really just the O(n^2) type handling
where relying on common interfaces becomes drastically less efficient,
but I think we should avoid implementing things in that way anyway, or
improve them as needed.

> 
> Is perhaps the structure wrong somewhere - should there be a single DRC
> device that knows about all DRCs?

That's an interesting proposition, I think it's worth exploring further,
but from a high level:

 - each SpaprDrc has migration state, and some sub-classes of SpaprDrc (e.g.
   SpaprDrcPhysical) have additional migration state. These are sent
   as-needed as separate VMState entries in the migration stream.
   Moving to a single DRC means we're either sending them as a flat
   array or a sparse list, which would put just as much load on the
   migration code (at least, with Scott's changes in place). It would
   also be difficult to do all this in a way which maintains migration
   compatibility with older machine types.
 - other aspects of modeling these as QOM objects, such as look-ups,
   reset-handling, and memory allocations, wouldn't be dramatically
   improved upon by handling it all internally within the object

AFAICT the biggest issue with modeling the DRCs as individual objects
is actually how we deal with introspection, and we should try to
improve. What do you think of the alternative suggestion above of
marking certain objects as 'hidden' from various introspection
interfaces?

> 
> Dave
> 
> 
> > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > 
> > They are part of SPAPR specification.
> > 
> > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> > 
> > CC Michael Roth
> > 
> > Thanks,
> > Laurent
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18 16:38         ` Michael Roth
@ 2019-10-18 17:26           ` Dr. David Alan Gilbert
  2019-10-21  7:33           ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-18 17:26 UTC (permalink / raw)
  To: Michael Roth
  Cc: Laurent Vivier, david, Scott Cheloha, qemu-devel, Juan Quintela

* Michael Roth (mdroth@linux.vnet.ibm.com) wrote:
> Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accomodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to get upset.
> 
> The device/object management aspect seems to handle things *mostly* okay, at
> least ever since QOM child properties started being tracked by a hash table
> instead of a linked list. It's worth noting that that change (b604a854) was
> done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
> inevitable that certain machine types/configurations will call for large
> numbers of objects and I think it is fair to improve things to allow for this
> sort of scalability.
> 
> But I agree it shouldn't be abused, and you're right that there are some
> problem areas that arise. Trying to outline them:
> 
>  a) introspection commands like 'info qom-tree' become pretty unwieldly,
>     and with large enough numbers of objects might even break things (QMP
>     response size limits maybe?)
>  b) various related lists like reset handlers, vmstate/savevm handlers might
>     grow quite large
> 
> I think we could work around a) with maybe flagging certain
> "internally-only" objects as 'hidden'. Introspection routines could then
> filter these out, and routines like qom-set/qom-get could return report
> something similar to EACCESS so they are never used/useful to management
> tools.
> 
> In cases like b) we can optimize things where it makes sense like with
> Scott's patch here. In most cases these lists need to be walked one way
> or another, whether it's done internally by the object or through common
> interfaces provided by QEMU. It's really just the O(n^2) type handling
> where relying on common interfaces becomes drastically less efficient,
> but I think we should avoid implementing things in that way anyway, or
> improve them as needed.
> 
> > 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> That's an interesting proposition, I think it's worth exploring further,
> but from a high level:
> 
>  - each SpaprDrc has migration state, and some sub-classes SpaprDrc (e.g.
>    SpaprDrcPhysical) have additional migration state. These are sent
>    as-needed as separate VMState entries in the migration stream.
>    Moving to a single DRC means we're either sending them as an flat
>    array or a sparse list, which would put just as much load on the
>    migration code (at least, with Scott's changes in place). It would
>    also be difficult to do all this in a way which maintains migration
>    compatibility with older machine types.

Having sparse arrays etc within a vmstate isn't as bad; none of
them actually need to be 'objects' as such - even if you have
separate chunks of VMState.

>  - other aspects of modeling these as QOM objects, such as look-ups,
>    reset-handling, and memory allocations, wouldn't be dramatically
>    improved upon by handling it all internally within the object
> 
> AFAICT the biggest issue with modeling the DRCs as individual objects
> is actually how we deal with introspection, and we should try to
> improve. What do you think of the alternative suggestion above of
> marking certain objects as 'hidden' from various introspection
> interfaces?

That's one for someone who knows/cares about QOM more than me;
Paolo, Dan Berrange, or Eduardo Habkost are QOM people.

Dave

> > 
> > Dave
> > 
> > 
> > > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > > 
> > > They are part of SPAPR specification.
> > > 
> > > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> > > 
> > > CC Michael Roth
> > > 
> > > Thanks,
> > > Laurent
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18  9:43       ` Dr. David Alan Gilbert
  2019-10-18 16:38         ` Michael Roth
@ 2019-10-19 10:12         ` David Gibson
  2019-10-21  8:14           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 17+ messages in thread
From: David Gibson @ 2019-10-19 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, Michael Roth, Scott Cheloha, qemu-devel, Juan Quintela


On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> * Laurent Vivier (lvivier@redhat.com) wrote:
> > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > >> sorting is maintained by searching from head to tail for a suitable
> > >> insertion spot.  Insertion is thus an O(n) operation.
> > >>
> > >> If we instead keep track of the head of each priority's subqueue
> > >> within that larger queue we can reduce this operation to O(1) time.
> > >>
> > >> savevm_state_handler_remove() becomes slightly more complex to
> > >> accomodate these gains: we need to replace the head of a priority's
> > >> subqueue when removing it.
> > >>
> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > >> objects to insert.
> > > 
> > > Separate from reviewing this patch, I'd like to understand why you've
> > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > problems to random other bits of qemu as well.
> > 
> > I think the 40000 objects are the "dr-connectors" that are used to plug
> > peripherals (memory, pci card, cpus, ...).
> 
> Yes, Scott confirmed that in the reply to the previous version.
> IMHO nothing in qemu is designed to deal with that many devices/objects
> - I'm sure that something other than the migration code is going to
> get upset.

It kind of did.  Particularly when there was n^2 and n^3
behaviour in the property stuff we had some ludicrously long startup
times (hours) with large maxmem values.

Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
The problem is the memory DRCs, there's one for each LMB - each 256MiB
chunk of memory (or possible memory).

> Is perhaps the structure wrong somewhere - should there be a single DRC
> device that knows about all DRCs?

Maybe.  The tricky bit is how to get there from here without breaking
migration or something else along the way.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-18 16:38         ` Michael Roth
  2019-10-18 17:26           ` Dr. David Alan Gilbert
@ 2019-10-21  7:33           ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-10-21  7:33 UTC (permalink / raw)
  To: Michael Roth
  Cc: Laurent Vivier, qemu-devel, Scott Cheloha,
	Dr. David Alan Gilbert, Juan Quintela


On Fri, Oct 18, 2019 at 11:38:37AM -0500, Michael Roth wrote:
> Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accomodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to get upset.
> 
> The device/object management aspect seems to handle things *mostly* okay, at
> least ever since QOM child properties started being tracked by a hash table
> instead of a linked list. It's worth noting that that change (b604a854) was
> done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
> inevitable that certain machine types/configurations will call for large
> numbers of objects and I think it is fair to improve things to allow for this
> sort of scalability.
> 
> But I agree it shouldn't be abused, and you're right that there are some
> problem areas that arise. Trying to outline them:
> 
>  a) introspection commands like 'info qom-tree' become pretty unwieldly,
>     and with large enough numbers of objects might even break things (QMP
>     response size limits maybe?)
>  b) various related lists like reset handlers, vmstate/savevm handlers might
>     grow quite large
> 
> I think we could work around a) with maybe flagging certain
> "internally-only" objects as 'hidden'. Introspection routines could then
> filter these out, and routines like qom-set/qom-get could return report
> something similar to EACCESS so they are never used/useful to management
> tools.
> 
> In cases like b) we can optimize things where it makes sense like with
> Scott's patch here. In most cases these lists need to be walked one way
> or another, whether it's done internally by the object or through common
> interfaces provided by QEMU. It's really just the O(n^2) type handling
> where relying on common interfaces becomes drastically less efficient,
> but I think we should avoid implementing things in that way anyway, or
> improve them as needed.
> 
> > 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> That's an interesting proposition, I think it's worth exploring further,
> but from a high level:
> 
>  - each SpaprDrc has migration state, and some sub-classes SpaprDrc (e.g.
>    SpaprDrcPhysical) have additional migration state. These are sent
>    as-needed as separate VMState entries in the migration stream.
>    Moving to a single DRC means we're either sending them as an flat
>    array or a sparse list, which would put just as much load on the
>    migration code (at least, with Scott's changes in place). It would
>    also be difficult to do all this in a way which maintains migration
>    compatibility with older machine types.
>  - other aspects of modeling these as QOM objects, such as look-ups,
>    reset-handling, and memory allocations, wouldn't be dramatically
>    improved upon by handling it all internally within the object
> 
> AFAICT the biggest issue with modeling the DRCs as individual objects
> is actually how we deal with introspection, and we should try to
> improve. What do you think of the alternative suggestion above of
> marking certain objects as 'hidden' from various introspection
> interfaces?

So, that's not something I'd considered particularly in this context,
but it has bothered me in other contexts.  The fact that all the QOM
interfaces are freely user-inspectable, but are also used for a bunch
of interaction between qemu components means that we (arguably)
routinely make a bunch of stuff into user-visible API which we
probably don't really need or want to.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-19 10:12         ` David Gibson
@ 2019-10-21  8:14           ` Dr. David Alan Gilbert
  2019-11-20 21:48             ` Scott Cheloha
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-21  8:14 UTC (permalink / raw)
  To: David Gibson
  Cc: Laurent Vivier, Michael Roth, Scott Cheloha, qemu-devel, Juan Quintela

* David Gibson (david@gibson.dropbear.id.au) wrote:
> On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accomodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to
> > get upset.
> 
> It kind of did.  Particularly when there was n^2 and n^3 cubed
> behaviour in the property stuff we had some ludicrously long startup
> times (hours) with large maxmem values.
> 
> Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> The problem is the memory DRCs, there's one for each LMB - each 256MiB
> chunk of memory (or possible memory).
> 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> Maybe.  The tricky bit is how to get there from here without breaking
> migration or something else along the way.

Switch on the next machine type version - it doesn't matter if migration
is incompatible then.

Without knowing anything about the innards of DRCs, I suggest a 
DRCMulti that takes a parameter and represents 'n' DRCs at consecutive
chunks of memory.  Then use one DRCMulti for each RAMBlock or DIMM or
other convenient sized thing.
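
To make the shape of that concrete - purely a hypothetical sketch, the
names are invented and this is not existing code - such a device might
carry little more than a base index, a count, and per-LMB state, and be
migrated as a single VMState entry:

    #include <stdint.h>

    /* Hypothetical "DRCMulti" sketch (illustration only, not existing
     * QEMU code): one object standing in for 'count' consecutive LMB
     * connectors instead of one SpaprDrc per 256MiB LMB. */
    typedef struct SpaprDrcMulti {
        uint32_t first_index;   /* DRC index of the first LMB covered */
        uint32_t count;         /* number of consecutive LMBs represented */
        uint32_t *lmb_states;   /* per-LMB allocation/isolation state */
    } SpaprDrcMulti;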

Dave

> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-21  8:14           ` Dr. David Alan Gilbert
@ 2019-11-20 21:48             ` Scott Cheloha
  2019-12-04 16:49               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: Scott Cheloha @ 2019-11-20 21:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, qemu-devel, Juan Quintela, Michael Roth, David Gibson

On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > >>
> > > > >> If we instead keep track of the head of each priority's subqueue
> > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > >>
> > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > >> accomodate these gains: we need to replace the head of a priority's
> > > > >> subqueue when removing it.
> > > > >>
> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > >> objects to insert.
> > > > > 
> > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > problems to random other bits of qemu as well.
> > > > 
> > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > peripherals (memory, pci card, cpus, ...).
> > > 
> > > Yes, Scott confirmed that in the reply to the previous version.
> > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > - I'm sure that something other than the migration code is going to
> > > get upset.
> > 
> > It kind of did.  Particularly when there was n^2 and n^3 cubed
> > behaviour in the property stuff we had some ludicrously long startup
> > times (hours) with large maxmem values.
> > 
> > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > chunk of memory (or possible memory).
> > 
> > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > device that knows about all DRCs?
> > 
> > Maybe.  The tricky bit is how to get there from here without breaking
> > migration or something else along the way.
> 
> Switch on the next machine type version - it doesn't matter if migration
> is incompatible then.

1mo bump.

Is there anything I need to do with this patch in particular to make it suitable
for merging?



* Re: [PATCH v2 1/2] migration: add savevm_state_handler_remove()
  2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha
@ 2019-12-04 16:43   ` Dr. David Alan Gilbert
  2020-01-08 19:07   ` Juan Quintela
  1 sibling, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-04 16:43 UTC (permalink / raw)
  To: Scott Cheloha; +Cc: qemu-devel, Juan Quintela

* Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> Create a function to abstract common logic needed when removing a
> SaveStateEntry element from the savevm_state.handlers queue.
> 
> For now we just remove the element.  Soon it will involve additional
> cleanup.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 8d95e261f6..b2e3b7222a 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -725,6 +725,11 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
>      }
>  }
>  
> +static void savevm_state_handler_remove(SaveStateEntry *se)
> +{
> +    QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
> +}
> +
>  /* TODO: Individual devices generally have very little idea about the rest
>     of the system, so instance_id should be removed/replaced.
>     Meanwhile pass -1 as instance_id if you do not already have a clearly
> @@ -777,7 +782,7 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
>  
>      QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
>          if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
> -            QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
> +            savevm_state_handler_remove(se);
>              g_free(se->compat);
>              g_free(se);
>          }
> @@ -841,7 +846,7 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd,
>  
>      QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
>          if (se->vmsd == vmsd && se->opaque == opaque) {
> -            QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
> +            savevm_state_handler_remove(se);
>              g_free(se->compat);
>              g_free(se);
>          }
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha
  2019-10-18  8:16   ` Dr. David Alan Gilbert
@ 2019-12-04 16:47   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-04 16:47 UTC (permalink / raw)
  To: Scott Cheloha; +Cc: qemu-devel, Juan Quintela

* Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> sorting is maintained by searching from head to tail for a suitable
> insertion spot.  Insertion is thus an O(n) operation.
> 
> If we instead keep track of the head of each priority's subqueue
> within that larger queue we can reduce this operation to O(1) time.
> 
> savevm_state_handler_remove() becomes slightly more complex to
> accomodate these gains: we need to replace the head of a priority's
> subqueue when removing it.
> 
> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> objects to insert.
> 
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>

OK, it took me a while to figure out why you didn't just
turn handlers into handlers[MIG_PRI_MAX]; but I guess the problem is
you would have to change all the foreach's scattered around that walk
the list.  So


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
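
(To spell out that alternative - hypothetical, and not what this patch
does: an array of per-priority queues would make insertion a trivial
append, but every existing walk of savevm_state.handlers would have to
grow an extra loop over the priorities, e.g.:)

    /* Hypothetical alternative, illustration only - not this patch:
     * replace the single sorted queue with one queue per priority. */
    QTAILQ_HEAD(, SaveStateEntry) handlers[MIG_PRI_MAX + 1];

    /* Insertion would become a plain O(1) append... */
    QTAILQ_INSERT_TAIL(&handlers[save_state_priority(nse)], nse, entry);

    /* ...but every head-to-tail walk of the old single list would need
     * to become a nested loop, highest priority first: */
    for (i = MIG_PRI_MAX; i >= 0; i--) {
        QTAILQ_FOREACH(se, &handlers[i], entry) {
            /* per-handler work */
        }
    }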

> ---
>  migration/savevm.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b2e3b7222a..f7a2d36bba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
>  
>  typedef struct SaveState {
>      QTAILQ_HEAD(, SaveStateEntry) handlers;
> +    SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
>      int global_section_id;
>      uint32_t len;
>      const char *name;
> @@ -261,6 +262,7 @@ typedef struct SaveState {
>  
>  static SaveState savevm_state = {
>      .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
> +    .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
>      .global_section_id = 0,
>  };
>  
> @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
>  {
>      MigrationPriority priority = save_state_priority(nse);
>      SaveStateEntry *se;
> +    int i;
>  
>      assert(priority <= MIG_PRI_MAX);
>  
> -    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (save_state_priority(se) < priority) {
> +    for (i = priority - 1; i >= 0; i--) {
> +        se = savevm_state.handler_pri_head[i];
> +        if (se != NULL) {
> +            assert(save_state_priority(se) < priority);
>              break;
>          }
>      }
>  
> -    if (se) {
> +    if (i >= 0) {
>          QTAILQ_INSERT_BEFORE(se, nse, entry);
>      } else {
>          QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry);
>      }
> +
> +    if (savevm_state.handler_pri_head[priority] == NULL) {
> +        savevm_state.handler_pri_head[priority] = nse;
> +    }
>  }
>  
>  static void savevm_state_handler_remove(SaveStateEntry *se)
>  {
> +    SaveStateEntry *next;
> +    MigrationPriority priority = save_state_priority(se);
> +
> +    if (se == savevm_state.handler_pri_head[priority]) {
> +        next = QTAILQ_NEXT(se, entry);
> +        if (next != NULL && save_state_priority(next) == priority) {
> +            savevm_state.handler_pri_head[priority] = next;
> +        } else {
> +            savevm_state.handler_pri_head[priority] = NULL;
> +        }
> +    }
>      QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  }
>  
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-11-20 21:48             ` Scott Cheloha
@ 2019-12-04 16:49               ` Dr. David Alan Gilbert
  2019-12-04 22:28                 ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-04 16:49 UTC (permalink / raw)
  To: Scott Cheloha
  Cc: Laurent Vivier, qemu-devel, Juan Quintela, Michael Roth, David Gibson

* Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > >>
> > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > > >>
> > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > >> accomodate these gains: we need to replace the head of a priority's
> > > > > >> subqueue when removing it.
> > > > > >>
> > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > > >> objects to insert.
> > > > > > 
> > > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > > problems to random other bits of qemu as well.
> > > > > 
> > > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > > peripherals (memory, pci card, cpus, ...).
> > > > 
> > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > > - I'm sure that something other than the migration code is going to
> > > > get upset.
> > > 
> > > It kind of did.  Particularly when there was n^2 and n^3 cubed
> > > behaviour in the property stuff we had some ludicrously long startup
> > > times (hours) with large maxmem values.
> > > 
> > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > chunk of memory (or possible memory).
> > > 
> > > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > > device that knows about all DRCs?
> > > 
> > > Maybe.  The tricky bit is how to get there from here without breaking
> > > migration or something else along the way.
> > 
> > Switch on the next machine type version - it doesn't matter if migration
> > is incompatible then.
> 
> 1mo bump.
> 
> Is there anything I need to do with this patch in particular to make it suitable
> for merging?

Apologies for the delay;  hopefully this will go in one of the pulls
just after the tree opens again.

Please please try and work on reducing the number of objects somehow -
while this migration fix is a useful short term fix, and not too
invasive; having that many objects around qemu is a really really bad
idea so needs fixing properly.

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
  2019-12-04 16:49               ` Dr. David Alan Gilbert
@ 2019-12-04 22:28                 ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2019-12-04 22:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Vivier, Juan Quintela, Scott Cheloha, Michael Roth, qemu-devel


On Wed, Dec 04, 2019 at 04:49:15PM +0000, Dr. David Alan Gilbert wrote:
> * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > > * David Gibson (david@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > > >>
> > > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > > > >>
> > > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > > >> accomodate these gains: we need to replace the head of a priority's
> > > > > > >> subqueue when removing it.
> > > > > > >>
> > > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > > > >> objects to insert.
> > > > > > > 
> > > > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > > > problems to random other bits of qemu as well.
> > > > > > 
> > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > > > peripherals (memory, pci card, cpus, ...).
> > > > > 
> > > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > > > - I'm sure that something other than the migration code is going to
> > > > > get upset.
> > > > 
> > > > It kind of did.  Particularly when there was n^2 and n^3 cubed
> > > > behaviour in the property stuff we had some ludicrously long startup
> > > > times (hours) with large maxmem values.
> > > > 
> > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > > chunk of memory (or possible memory).
> > > > 
> > > > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > > > device that knows about all DRCs?
> > > > 
> > > > Maybe.  The tricky bit is how to get there from here without breaking
> > > > migration or something else along the way.
> > > 
> > > Switch on the next machine type version - it doesn't matter if migration
> > > is incompatible then.
> > 
> > 1mo bump.
> > 
> > Is there anything I need to do with this patch in particular to make it suitable
> > for merging?
> 
> Apologies for the delay;  hopefully this will go in one of the pulls
> just after the tree opens again.
> 
> Please please try and work on reducing the number of objects somehow -
> while this migration fix is a useful short term fix, and not too
> invasive; having that many objects around qemu is a really really bad
> idea so needs fixing properly.

I'm hoping to have a crack at this tomorrow.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [PATCH v2 1/2] migration: add savevm_state_handler_remove()
  2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha
  2019-12-04 16:43   ` Dr. David Alan Gilbert
@ 2020-01-08 19:07   ` Juan Quintela
  1 sibling, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2020-01-08 19:07 UTC (permalink / raw)
  To: Scott Cheloha; +Cc: qemu-devel, Dr . David Alan Gilbert

Scott Cheloha <cheloha@linux.vnet.ibm.com> wrote:
> Create a function to abstract common logic needed when removing a
> SaveStateEntry element from the savevm_state.handlers queue.
>
> For now we just remove the element.  Soon it will involve additional
> cleanup.
>
> Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued



