QEMU-Devel Archive on lore.kernel.org
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: bauerchen <bauerchen@tencent.com>
Subject: [PULL 132/136] mem-prealloc: optimize large guest startup
Date: Tue, 25 Feb 2020 13:07:30 +0100
Message-ID: <1582632454-16491-30-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1582631466-13880-1-git-send-email-pbonzini@redhat.com>

From: bauerchen <bauerchen@tencent.com>

[desc]:
    A large-memory VM starts slowly when using -mem-prealloc, and
    the current method has two areas that can be optimized:

    1. mmap is used to allocate the thread stacks while the page
    clearing threads are being created, and it tries to take
    mm->mmap_sem for writing, but the clearing threads that are
    already running hold it for reading; this contention makes
    thread creation very slow.

    2. The calculation of pages per thread is unbalanced: if we use
    64 threads to split 160 hugepages, 63 threads clear 2 pages each
    and 1 thread clears 34 pages, so the overall speed is very slow.

    To solve the first problem, we add a mutex and a condition
    variable to the thread function, and let the threads start
    clearing only once all of them have been created.
    To solve the second problem, we spread the remainder across the
    threads: with 160 hugepages and 64 threads, 32 threads clear
    3 pages each and 32 threads clear 2 pages each.

[test]:
    320G 84c VM start time can be reduced to 10s
    680G 84c VM start time can be reduced to 18s

Signed-off-by: bauerchen <bauerchen@tencent.com>
Reviewed-by: Pan Rui <ruippan@tencent.com>
Reviewed-by: Ivan Ren <ivanren@tencent.com>
[Simplify computation of the number of pages per thread. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/oslib-posix.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a291cc..897e8f3 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -76,6 +76,10 @@ static MemsetThread *memset_thread;
 static int memset_num_threads;
 static bool memset_thread_failed;
 
+static QemuMutex page_mutex;
+static QemuCond page_cond;
+static bool threads_created_flag;
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -403,6 +407,17 @@ static void *do_touch_pages(void *arg)
     MemsetThread *memset_args = (MemsetThread *)arg;
     sigset_t set, oldset;
 
+    /*
+     * On Linux, the page faults from the loop below can cause mmap_sem
+     * contention with allocation of the thread stacks.  Do not start
+     * clearing until all threads have been created.
+     */
+    qemu_mutex_lock(&page_mutex);
+    while (!threads_created_flag) {
+        qemu_cond_wait(&page_cond, &page_mutex);
+    }
+    qemu_mutex_unlock(&page_mutex);
+
     /* unblock SIGBUS */
     sigemptyset(&set);
     sigaddset(&set, SIGBUS);
@@ -451,27 +466,28 @@ static inline int get_memset_num_threads(int smp_cpus)
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
                             int smp_cpus)
 {
-    size_t numpages_per_thread;
-    size_t size_per_thread;
+    size_t numpages_per_thread, leftover;
     char *addr = area;
     int i = 0;
 
     memset_thread_failed = false;
+    threads_created_flag = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
     memset_thread = g_new0(MemsetThread, memset_num_threads);
-    numpages_per_thread = (numpages / memset_num_threads);
-    size_per_thread = (hpagesize * numpages_per_thread);
+    numpages_per_thread = numpages / memset_num_threads;
+    leftover = numpages % memset_num_threads;
     for (i = 0; i < memset_num_threads; i++) {
         memset_thread[i].addr = addr;
-        memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
-                                    numpages : numpages_per_thread;
+        memset_thread[i].numpages = numpages_per_thread + (i < leftover);
         memset_thread[i].hpagesize = hpagesize;
         qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
                            do_touch_pages, &memset_thread[i],
                            QEMU_THREAD_JOINABLE);
-        addr += size_per_thread;
-        numpages -= numpages_per_thread;
+        addr += memset_thread[i].numpages * hpagesize;
     }
+    threads_created_flag = true;
+    qemu_cond_broadcast(&page_cond);
+
     for (i = 0; i < memset_num_threads; i++) {
         qemu_thread_join(&memset_thread[i].pgthread);
     }
-- 
1.8.3.1



