From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: "David Hildenbrand" <david@redhat.com>,
"Michal Privoznik" <mprivozn@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefan Weil" <sw@weilnetz.de>
Subject: [PATCH v1 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
Date: Wed, 28 Sep 2022 18:45:35 +0200
Message-ID: <20220928164542.117952-1-david@redhat.com>
This is a follow-up on "util: NUMA aware memory preallocation" [1] by
Michal.
Setting the CPU affinity of threads from inside QEMU usually isn't
easily possible, because we don't want QEMU -- once started and running
guest code -- to be able to mess up the system. QEMU disallows relevant
syscalls using seccomp, such that any such invocation will fail.
Especially for memory preallocation in memory backends, an unsuitable CPU
affinity can significantly increase guest startup time, for example, when
running large VMs backed by huge/gigantic pages, because of NUMA effects.
For NUMA-aware preallocation, we have to set the CPU affinity, however:
(1) Once preallocation threads are created during preallocation, management
tools cannot intercept anymore to change the affinity. These threads
are created automatically on demand.
(2) QEMU cannot easily set the CPU affinity itself.
(3) The CPU affinity derived from the NUMA bindings of the memory backend
might not necessarily be exactly the CPUs we actually want to use
(e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
There is an easy "workaround". If we have a thread with the right CPU
affinity, we can simply create new threads on demand via that prepared
context. So, all we have to do is set up and create such a context ahead
of time, to then configure preallocation to create new threads via that
environment.
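The mechanism the workaround relies on is that, on Linux, a new thread
inherits the CPU affinity of the thread that creates it. A minimal sketch in
Python (not QEMU code; spawn_via_context() is a hypothetical helper, and the
example assumes a Linux host where os.sched_setaffinity() is available and
permitted):

```python
import os
import threading


def spawn_via_context(cpus):
    """Create a worker from a pinned context thread; return the worker's affinity."""
    result = {}

    def worker():
        # The worker never sets its own affinity; it only reports what it inherited.
        result["worker"] = os.sched_getaffinity(0)

    def context():
        # Pin only the context thread (pid 0 refers to the calling thread on Linux).
        os.sched_setaffinity(0, cpus)
        # Threads created from here on inherit the context thread's affinity.
        t = threading.Thread(target=worker)
        t.start()
        t.join()

    ctx = threading.Thread(target=context)
    ctx.start()
    ctx.join()
    return result["worker"]


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = {min(allowed)}  # pick one CPU we are allowed to run on
    print("worker affinity:", spawn_via_context(target))
    print("main thread unchanged:", os.sched_getaffinity(0) == allowed)
```

The worker ends up on the CPUs chosen for the context thread without ever
calling an affinity syscall itself, which is the property that matters once
such syscalls are blocked.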
So, let's introduce a user-creatable "thread-context" object that
essentially consists of a context thread used to create new threads.
QEMU can either try setting the CPU affinity itself ("cpu-affinity",
"node-affinity" property), or upper layers can extract the thread id
("thread-id" property) to configure it externally.
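To illustrate what mapping a host NUMA node to a set of CPUs involves, here
is a small Python sketch that reads the kernel's per-node cpulist from sysfs.
It is not taken from the series; parse_cpulist() and node_cpus() are
hypothetical helpers, and the sysfs path assumes a Linux host:

```python
def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list, e.g. for a CPU-less NUMA node
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Return the CPUs the kernel reports for a host NUMA node (Linux sysfs)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


if __name__ == "__main__":
    print(parse_cpulist("0-3,8-11"))
```

A CPU-less node yields an empty set here, which is one of the cases from (3)
where the affinity derived from the NUMA binding is not usable as-is and an
explicit CPU list or external configuration via the thread id is needed.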
Make memory-backends consume a thread-context object
(via the "prealloc-context" property) and use it when preallocating to
create new threads with the desired CPU affinity. Further, to make it
easier to use, allow creation of "thread-context" objects, including
setting the CPU affinity directly from QEMU, before enabling the
sandbox option.
Quick test on a system with 2 NUMA nodes:
Without CPU affinity:
time qemu-system-x86_64 \
-object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \
-nographic -monitor stdio
real 0m5.383s
real 0m3.499s
real 0m5.129s
real 0m4.232s
real 0m5.220s
real 0m4.288s
real 0m3.582s
real 0m4.305s
real 0m5.421s
real 0m4.502s
-> The result heavily depends on the scheduler's CPU selection
With CPU affinity:
time qemu-system-x86_64 \
-object thread-context,id=tc1,node-affinity=0 \
-object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \
-sandbox enable=on,resourcecontrol=deny \
-nographic -monitor stdio
real 0m1.959s
real 0m1.942s
real 0m1.943s
real 0m1.941s
real 0m1.948s
real 0m1.964s
real 0m1.949s
real 0m1.948s
real 0m1.941s
real 0m1.937s
On reasonably large VMs, the speedup can be quite significant.
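To put numbers on the two sets of timings above, the following Python
snippet computes the mean and the spread (max - min) of each run series. The
sample values are copied from the "real" lines above; summarize() is just a
helper for this illustration:

```python
from statistics import mean

# "real" times in seconds, copied from the two test runs above
without_affinity = [5.383, 3.499, 5.129, 4.232, 5.220,
                    4.288, 3.582, 4.305, 5.421, 4.502]
with_affinity = [1.959, 1.942, 1.943, 1.941, 1.948,
                 1.964, 1.949, 1.948, 1.941, 1.937]


def summarize(samples):
    """Return (mean, spread) in seconds, where spread is max - min."""
    return mean(samples), max(samples) - min(samples)


if __name__ == "__main__":
    m_wo, s_wo = summarize(without_affinity)
    m_w, s_w = summarize(with_affinity)
    print(f"without affinity: mean {m_wo:.3f}s, spread {s_wo:.3f}s")
    print(f"with affinity:    mean {m_w:.3f}s, spread {s_w:.3f}s")
    print(f"speedup: {m_wo / m_w:.2f}x")
```

For these samples, the mean drops from roughly 4.56s to roughly 1.95s (about
2.3x), and the run-to-run spread shrinks from about 1.9s to under 0.03s.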
While this concept is currently only used for short-lived preallocation
threads, nothing major speaks against reusing the concept for other
threads that are harder to identify/configure -- except that
we need additional (idle) context threads that are otherwise left unused.
This series does not yet tackle concurrent preallocation of memory
backends. Memory backend objects are created and memory is preallocated one
memory backend at a time -- and there is currently no way to do
preallocation asynchronously.
[1] https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mprivozn@redhat.com
RFC -> v1:
* "vl: Allow ThreadContext objects to be created before the sandbox option"
-> Move parsing of the "name" property before object_create_pre_sandbox
* Added RB's
Cc: Michal Privoznik <mprivozn@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Stefan Weil <sw@weilnetz.de>
David Hildenbrand (7):
  util: Cleanup and rename os_mem_prealloc()
  util: Introduce qemu_thread_set_affinity() and
    qemu_thread_get_affinity()
  util: Introduce ThreadContext user-creatable object
  util: Add write-only "node-affinity" property for ThreadContext
  util: Make qemu_prealloc_mem() optionally consume a ThreadContext
  hostmem: Allow for specifying a ThreadContext for preallocation
  vl: Allow ThreadContext objects to be created before the sandbox
    option
backends/hostmem.c | 13 +-
hw/virtio/virtio-mem.c | 2 +-
include/qemu/osdep.h | 19 +-
include/qemu/thread-context.h | 58 ++++++
include/qemu/thread.h | 4 +
include/sysemu/hostmem.h | 2 +
meson.build | 16 ++
qapi/qom.json | 25 +++
softmmu/cpus.c | 2 +-
softmmu/vl.c | 36 +++-
util/meson.build | 1 +
util/oslib-posix.c | 39 ++--
util/oslib-win32.c | 8 +-
util/qemu-thread-posix.c | 70 +++++++
util/qemu-thread-win32.c | 12 ++
util/thread-context.c | 363 ++++++++++++++++++++++++++++++++++
16 files changed, 640 insertions(+), 30 deletions(-)
create mode 100644 include/qemu/thread-context.h
create mode 100644 util/thread-context.c
--
2.37.3