From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Michal Privoznik" <mprivozn@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Stefan Weil" <sw@weilnetz.de>
Subject: Re: [PATCH v3 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
Date: Thu, 27 Oct 2022 11:02:15 +0200
Message-ID: <312f188d-9b0c-839f-d747-9f7c4ac95683@redhat.com>
In-Reply-To: <20221014134720.168738-1-david@redhat.com>

On 14.10.22 15:47, David Hildenbrand wrote:
> This is a follow-up to "util: NUMA aware memory preallocation" [1] by
> Michal.
> 
> Setting the CPU affinity of threads from inside QEMU usually isn't
> easily possible, because we don't want QEMU -- once started and running
> guest code -- to be able to mess up the system. QEMU disallows the
> relevant syscalls using seccomp, so any such invocation will fail.
> 
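As an aside, here is a minimal standalone sketch of what the sandbox means
for affinity changes. It uses libseccomp directly and is only an
illustration of the resourcecontrol=deny idea, not QEMU's actual filter
from qemu-seccomp.c:

    /* Illustration only: deny sched_setaffinity() the way a
     * resourcecontrol=deny-style seccomp filter would, then show that a
     * later affinity change fails with EPERM.
     * Build: gcc -D_GNU_SOURCE sandbox_sketch.c -lseccomp -o sandbox_sketch
     */
    #include <seccomp.h>
    #include <sched.h>
    #include <errno.h>
    #include <stdio.h>

    int main(void)
    {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);

        /* Make affinity changes fail instead of killing the process. */
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
                         SCMP_SYS(sched_setaffinity), 0);
        seccomp_load(ctx);
        seccomp_release(ctx);

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");  /* EPERM: the filter is active */
        }
        return 0;
    }

The real filter covers more resource-control syscalls, but the effect is
the same: once the sandbox is active, affinity can no longer be changed
from inside the process.
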
> Especially for memory preallocation in memory backends, NUMA effects can
> significantly increase guest startup time, for example, when running
> large VMs backed by huge/gigantic pages and the preallocation threads
> end up on CPUs remote to the memory getting preallocated. For NUMA-aware
> preallocation, we have to set the CPU affinity of those threads, however:
> 
> (1) Once preallocation threads are created during preallocation, management
>      tools can no longer intervene to change their affinity. These threads
>      are created automatically on demand.
> (2) QEMU cannot easily set the CPU affinity itself.
> (3) The CPU affinity derived from the NUMA bindings of the memory backend
>      might not necessarily be exactly the CPUs we actually want to use
>      (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
> 
> There is an easy "workaround": if we have a thread with the right CPU
> affinity, we can simply create new threads on demand via that prepared
> context. So, all we have to do is set up such a context ahead of time
> and then configure preallocation to create new threads via that
> environment.
> 
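To make the mechanism concrete, here is a minimal standalone sketch (not
QEMU code; all names are made up): on Linux, a thread created via
pthread_create() inherits the CPU affinity mask of the thread that creates
it, so pinning one long-lived context thread up front is enough to get
correctly pinned workers later, without any affinity syscalls at
preallocation time.

    /* Sketch: pin a context thread once; threads it spawns later inherit
     * its affinity mask, so no sched_setaffinity() is needed at that point.
     * Build: gcc -D_GNU_SOURCE context_sketch.c -pthread -o context_sketch
     */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* In the real use case this would preallocate (touch) memory. */
        cpu_set_t set;
        (void)arg;
        pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
        printf("worker restricted to %d CPU(s)\n", CPU_COUNT(&set));
        return NULL;
    }

    static void *context_thread(void *arg)
    {
        /* Created and pinned ahead of time (e.g., before the seccomp
         * sandbox is enabled); here it simply pins itself. */
        cpu_set_t set;
        pthread_t workers[4];
        (void)arg;

        CPU_ZERO(&set);
        CPU_SET(0, &set);            /* e.g., the CPUs of NUMA node 0 */
        CPU_SET(1, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        /* Much later, on demand: spawn preallocation workers. */
        for (int i = 0; i < 4; i++) {
            pthread_create(&workers[i], NULL, worker, NULL);
        }
        for (int i = 0; i < 4; i++) {
            pthread_join(workers[i], NULL);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t ctx;

        pthread_create(&ctx, NULL, context_thread, NULL);
        pthread_join(ctx, NULL);
        return 0;
    }

That is essentially what the thread-context object below packages up in a
user-creatable form.
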
> So, let's introduce a user-creatable "thread-context" object that
> essentially consists of a context thread used to create new threads.
> QEMU can either try setting the CPU affinity itself ("cpu-affinity" and
> "node-affinity" properties), or upper layers can extract the thread id
> ("thread-id" property) to configure the affinity externally.
> 
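For the "node-affinity" variant, the node has to be translated into a set
of host CPUs at some point. A hedged sketch of that translation using
libnuma follows; whether the series uses libnuma or sysfs for this is an
implementation detail not shown here:

    /* Sketch: derive the CPUs belonging to a host NUMA node with libnuma.
     * Build: gcc node_affinity_sketch.c -lnuma -o node_affinity_sketch
     */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        int node = 0;               /* host-nodes=0 in the example below */

        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not available\n");
            return 1;
        }

        struct bitmask *cpus = numa_allocate_cpumask();
        if (numa_node_to_cpus(node, cpus) < 0) {
            perror("numa_node_to_cpus");
            return 1;
        }

        printf("CPUs of node %d:", node);
        for (int cpu = 0; cpu < numa_num_configured_cpus(); cpu++) {
            if (numa_bitmask_isbitset(cpus, cpu)) {
                printf(" %d", cpu);
            }
        }
        printf("\n");

        numa_free_cpumask(cpus);
        return 0;
    }

The resulting CPU set would then be applied to the context thread just
like an explicit cpu-affinity list.
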
> Make memory backends consume a thread-context object
> (via the "prealloc-context" property) and use it when preallocating to
> create new threads with the desired CPU affinity. Further, to make this
> easier to use, allow creating "thread-context" objects, including
> setting the CPU affinity directly from QEMU, before enabling the
> sandbox option.
> 
> 
> Quick test on a system with 2 NUMA nodes:
> 
> Without CPU affinity:
>      time qemu-system-x86_64 \
>          -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \
>          -nographic -monitor stdio
> 
>      real    0m5.383s
>      real    0m3.499s
>      real    0m5.129s
>      real    0m4.232s
>      real    0m5.220s
>      real    0m4.288s
>      real    0m3.582s
>      real    0m4.305s
>      real    0m5.421s
>      real    0m4.502s
> 
>      -> Runtime heavily depends on which CPUs the scheduler selects
> 
> With CPU affinity:
>      time qemu-system-x86_64 \
>          -object thread-context,id=tc1,node-affinity=0 \
>          -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \
>          -sandbox enable=on,resourcecontrol=deny \
>          -nographic -monitor stdio
> 
>      real    0m1.959s
>      real    0m1.942s
>      real    0m1.943s
>      real    0m1.941s
>      real    0m1.948s
>      real    0m1.964s
>      real    0m1.949s
>      real    0m1.948s
>      real    0m1.941s
>      real    0m1.937s
> 
> On reasonably large VMs, the speedup can be quite significant.
> 
> While this concept is currently only used for short-lived preallocation
> threads, nothing major prevents reusing it for other threads that are
> harder to identify/configure -- except that we would need additional
> (idle) context threads that are otherwise left unused.
> 
> This series does not yet tackle concurrent preallocation of memory
> backends. Memory backend objects are created and memory is preallocated one
> memory backend at a time -- and there is currently no way to do
> preallocation asynchronously.
> 
> [1] https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mprivozn@redhat.com
> 
> v2 -> v3:
> * "util: Introduce ThreadContext user-creatable object"
>   -> Further improve documentation and patch description and add ACK. [Markus]
> * "util: Add write-only "node-affinity" property for ThreadContext"
>   -> Further improve documentation and patch description and add ACK. [Markus]
> 
> v1 -> v2:
> * Fixed some minor style nits
> * "util: Introduce ThreadContext user-creatable object"
>   -> Improve documentation and patch description. [Markus]
> * "util: Add write-only "node-affinity" property for ThreadContext"
>   -> Improve documentation and patch description. [Markus]
> 
> RFC -> v1:
> * "vl: Allow ThreadContext objects to be created before the sandbox option"
>   -> Move parsing of the "name" property before object_create_pre_sandbox
> * Added RB's

I'm queuing this to

https://github.com/davidhildenbrand/qemu.git mem-next

and will most probably send an MR tomorrow, before soft freeze.

-- 
Thanks,

David / dhildenb



Thread overview: 12+ messages
2022-10-14 13:47 [PATCH v3 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 1/7] util: Cleanup and rename os_mem_prealloc() David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 2/7] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 3/7] util: Introduce ThreadContext user-creatable object David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 4/7] util: Add write-only "node-affinity" property for ThreadContext David Hildenbrand
2022-10-17  8:56   ` Markus Armbruster
2022-10-17 11:29     ` David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 5/7] util: Make qemu_prealloc_mem() optionally consume a ThreadContext David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 6/7] hostmem: Allow for specifying a ThreadContext for preallocation David Hildenbrand
2022-10-14 13:47 ` [PATCH v3 7/7] vl: Allow ThreadContext objects to be created before the sandbox option David Hildenbrand
2022-10-19 12:36 ` [PATCH v3 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext David Hildenbrand
2022-10-27  9:02 ` David Hildenbrand [this message]
