QEMU-Devel Archive on lore.kernel.org
From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>
Cc: david@gibson.dropbear.id.au,
	Scott Cheloha <cheloha@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
Date: Fri, 18 Oct 2019 11:38:37 -0500
Message-ID: <157141671749.15348.15966144834012002565@sif>
In-Reply-To: <20191018094352.GC2990@work-vm>

Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> * Laurent Vivier (lvivier@redhat.com) wrote:
> > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > >> sorting is maintained by searching from head to tail for a suitable
> > >> insertion spot.  Insertion is thus an O(n) operation.
> > >>
> > >> If we instead keep track of the head of each priority's subqueue
> > >> within that larger queue we can reduce this operation to O(1) time.
> > >>
> > >> savevm_state_handler_remove() becomes slightly more complex to
> > >> accommodate these gains: we need to replace the head of a priority's
> > >> subqueue when removing it.
> > >>
> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > >> objects to insert.
> > > 
> > > Separate from reviewing this patch, I'd like to understand why you've
> > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > problems to random other bits of qemu as well.
> > 
> > I think the 40000 objects are the "dr-connectors" that are used to plug
> > peripherals (memory, pci card, cpus, ...).
> 
> Yes, Scott confirmed that in the reply to the previous version.
> IMHO nothing in qemu is designed to deal with that many devices/objects
> - I'm sure that something other than the migration code is going to get upset.

The device/object management aspect seems to handle things *mostly* okay, at
least ever since QOM child properties started being tracked by a hash table
instead of a linked list. It's worth noting that that change (b604a854) was
done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
inevitable that certain machine types/configurations will call for large
numbers of objects and I think it is fair to improve things to allow for this
sort of scalability.
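To illustrate why replacing a linked list with a hash table matters at this scale, here is a minimal sketch of a name-keyed child table in plain C. It is not QOM's implementation (QOM uses a GLib hash table); ChildTable, child_add(), child_find() and the fixed bucket count are hypothetical. A lookup compares only the entries that share one bucket, whereas a list has to be walked from the start on every lookup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical child-property table keyed by name (not QOM's real code).
 * Chained hashing: each bucket holds a singly linked list of entries. */
#define PROP_BUCKETS 4096

typedef struct ChildProp {
    char *name;
    void *child;
    struct ChildProp *next;   /* next entry in the same bucket */
} ChildProp;

typedef struct {
    ChildProp *buckets[PROP_BUCKETS];
} ChildTable;

/* djb2 string hash, reduced to a bucket index */
static unsigned long child_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h % PROP_BUCKETS;
}

/* Returns 0 on success, -1 on allocation failure. No duplicate check. */
static int child_add(ChildTable *t, const char *name, void *child)
{
    size_t len = strlen(name) + 1;
    ChildProp *p = malloc(sizeof(*p));
    if (!p) {
        return -1;
    }
    p->name = malloc(len);
    if (!p->name) {
        free(p);
        return -1;
    }
    memcpy(p->name, name, len);
    p->child = child;
    unsigned long b = child_hash(name);
    p->next = t->buckets[b];   /* push onto the bucket's chain */
    t->buckets[b] = p;
    return 0;
}

/* Average O(1): only entries in the matching bucket are compared. */
static void *child_find(const ChildTable *t, const char *name)
{
    for (ChildProp *p = t->buckets[child_hash(name)]; p; p = p->next) {
        if (strcmp(p->name, name) == 0) {
            return p->child;
        }
    }
    return NULL;
}

static void child_table_free(ChildTable *t)
{
    for (size_t i = 0; i < PROP_BUCKETS; i++) {
        ChildProp *p = t->buckets[i];
        while (p) {
            ChildProp *next = p->next;
            free(p->name);
            free(p);
            p = next;
        }
        t->buckets[i] = NULL;
    }
}
```

With 40000 children and 4096 buckets, each chain holds about ten entries on average (assuming the hash spreads the names evenly), so a lookup stays cheap no matter which child is requested.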

But I agree it shouldn't be abused, and you're right that there are some
problem areas that arise. Trying to outline them:

 a) introspection commands like 'info qom-tree' become pretty unwieldy,
    and with large enough numbers of objects might even break things (QMP
    response size limits maybe?)
 b) various related lists like reset handlers, vmstate/savevm handlers might
    grow quite large

I think we could work around a) by flagging certain
"internally-only" objects as 'hidden'. Introspection routines could then
filter these out, and routines like qom-set/qom-get could report an
error similar to EACCES, so these objects are never used by (or useful
to) management tools.
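A rough sketch of the 'hidden' idea, assuming a hypothetical per-object flag: QOM has no such flag today, and Obj, list_visible() and obj_get() are made-up names. Introspection-style listing skips hidden entries, and a qom-get-style accessor refuses them with -EACCES rather than returning a value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object record; the 'hidden' flag is an assumption. */
typedef struct {
    const char *path;
    bool hidden;      /* internal-only: excluded from introspection */
    int value;
} Obj;

/* Introspection-style walk: invokes cb on each visible object and
 * returns how many were visited. Hidden objects are skipped. */
static size_t list_visible(const Obj *objs, size_t n,
                           void (*cb)(const Obj *, void *), void *opaque)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (objs[i].hidden) {
            continue;
        }
        if (cb) {
            cb(&objs[i], opaque);
        }
        count++;
    }
    return count;
}

/* qom-get-style accessor: 0 on success, -EACCES for hidden objects,
 * -ENOENT if no object has the given path. */
static int obj_get(const Obj *objs, size_t n, const char *path, int *out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(objs[i].path, path) == 0) {
            if (objs[i].hidden) {
                return -EACCES;
            }
            *out = objs[i].value;
            return 0;
        }
    }
    return -ENOENT;
}
```

Returning an access error rather than "not found" keeps the object's existence honest while signalling to management tools that the object is not part of a supported interface.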

In cases like b) we can optimize things where it makes sense like with
Scott's patch here. In most cases these lists need to be walked one way
or another, whether it's done internally by the object or through common
interfaces provided by QEMU. It's really just the O(n^2) type handling
where relying on common interfaces becomes drastically less efficient,
but I think we should avoid implementing things in that way anyway, or
improve them as needed.
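As an illustration of the kind of optimization described in Scott's patch, here is a minimal sketch of a priority-sorted doubly linked list that also records the first entry of each priority level. It assumes higher priority values sort toward the head, entries of equal priority keep insertion order, and there is a small fixed number of levels; PrioQueue, pq_insert(), pq_remove() and PRI_MAX are hypothetical names, and this is not the code from the patch. Insertion only inspects the per-priority heads below the new entry's priority, so its cost depends on the number of priority levels rather than on the queue length.

```c
#include <stddef.h>

#define PRI_MAX 4   /* hypothetical highest priority level */

typedef struct Entry {
    int priority;              /* 0..PRI_MAX, higher sorts first */
    int id;
    struct Entry *prev, *next;
} Entry;

typedef struct {
    Entry *first, *last;
    Entry *pri_head[PRI_MAX + 1];  /* first entry of each level, or NULL */
} PrioQueue;

static void link_before(PrioQueue *q, Entry *pos, Entry *e)
{
    e->next = pos;
    e->prev = pos->prev;
    if (pos->prev) {
        pos->prev->next = e;
    } else {
        q->first = e;
    }
    pos->prev = e;
}

static void link_tail(PrioQueue *q, Entry *e)
{
    e->next = NULL;
    e->prev = q->last;
    if (q->last) {
        q->last->next = e;
    } else {
        q->first = e;
    }
    q->last = e;
}

/* Insert before the head of the highest non-empty lower priority level,
 * i.e. after every entry whose priority is >= e->priority. */
static void pq_insert(PrioQueue *q, Entry *e)
{
    Entry *pos = NULL;
    for (int p = e->priority - 1; p >= 0 && !pos; p--) {
        pos = q->pri_head[p];
    }
    if (pos) {
        link_before(q, pos, e);
    } else {
        link_tail(q, e);
    }
    if (!q->pri_head[e->priority]) {
        q->pri_head[e->priority] = e;   /* first entry of its level */
    }
}

/* Removing a level's head hands the role to the next entry if it has
 * the same priority; otherwise the level becomes empty. */
static void pq_remove(PrioQueue *q, Entry *e)
{
    if (q->pri_head[e->priority] == e) {
        Entry *n = e->next;
        q->pri_head[e->priority] =
            (n && n->priority == e->priority) ? n : NULL;
    }
    if (e->prev) {
        e->prev->next = e->next;
    } else {
        q->first = e->next;
    }
    if (e->next) {
        e->next->prev = e->prev;
    } else {
        q->last = e->prev;
    }
    e->prev = e->next = NULL;
}
```

The extra bookkeeping in pq_remove() mirrors the trade-off the commit message describes: removal gets slightly more complex so that insertion no longer has to scan the whole queue.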

> 
> Is perhaps the structure wrong somewhere - should there be a single DRC
> device that knows about all DRCs?

That's an interesting proposition, I think it's worth exploring further,
but from a high level:

 - each SpaprDrc has migration state, and some sub-classes of SpaprDrc (e.g.
   SpaprDrcPhysical) have additional migration state. These are sent
   as-needed as separate VMState entries in the migration stream.
   Moving to a single DRC means we're either sending them as a flat
   array or a sparse list, which would put just as much load on the
   migration code (at least, with Scott's changes in place). It would
   also be difficult to do all this in a way which maintains migration
   compatibility with older machine types.
 - other aspects of modeling these as QOM objects, such as look-ups,
   reset-handling, and memory allocations, wouldn't be dramatically
   improved upon by handling it all internally within the object

AFAICT the biggest issue with modeling the DRCs as individual objects
is actually how we deal with introspection, so that is what we should
try to improve. What do you think of the alternative suggestion above of
marking certain objects as 'hidden' from various introspection
interfaces?

> 
> Dave
> 
> 
> > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > 
> > They are part of SPAPR specification.
> > 
> > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> > 
> > CC Michael Roth
> > 
> > Thanks,
> > Laurent
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 



Thread overview: 17+ messages
2019-10-17 20:59 [PATCH v2 0/2] migration: faster savevm_state_handler_insert() Scott Cheloha
2019-10-17 20:59 ` [PATCH v2 1/2] migration: add savevm_state_handler_remove() Scott Cheloha
2019-12-04 16:43   ` Dr. David Alan Gilbert
2020-01-08 19:07   ` Juan Quintela
2019-10-17 20:59 ` [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion Scott Cheloha
2019-10-18  8:16   ` Dr. David Alan Gilbert
2019-10-18  8:34     ` Laurent Vivier
2019-10-18  9:43       ` Dr. David Alan Gilbert
2019-10-18 16:38         ` Michael Roth [this message]
2019-10-18 17:26           ` Dr. David Alan Gilbert
2019-10-21  7:33           ` David Gibson
2019-10-19 10:12         ` David Gibson
2019-10-21  8:14           ` Dr. David Alan Gilbert
2019-11-20 21:48             ` Scott Cheloha
2019-12-04 16:49               ` Dr. David Alan Gilbert
2019-12-04 22:28                 ` David Gibson
2019-12-04 16:47   ` Dr. David Alan Gilbert
