QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Robert Foley <robert.foley@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: "Richard Henderson" <richard.henderson@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Peter Puhov" <peter.puhov@linaro.org>
Subject: Re: [PATCH v8 74/74] cputlb: queue async flush jobs without the BQL
Date: Wed, 20 May 2020 11:01:20 -0400
Message-ID: <CAEyhzFuraifsPTBrM2g+KQVWvD09Q3fwdh=fCo+a9YOTRBAMeg@mail.gmail.com> (raw)
In-Reply-To: <20200520044613.GA359481@sff>

On Wed, 20 May 2020 at 00:46, Emilio G. Cota <cota@braap.org> wrote:
>
> On Mon, May 18, 2020 at 09:46:36 -0400, Robert Foley wrote:
>
> Thanks for doing these tests. I know from experience that benchmarking
> is hard and incredibly time consuming, so please do not be discouraged by
> my comments below.
>

Hi,
Thanks for all the comments, and for including the script!
These are all very helpful.

We will work to replicate these results using a PPC VM,
and will re-post them here.

Thanks & Regards,
-Rob

> A couple of points:
>
> 1. I am not familiar with aarch64 KVM but I'd expect it to scale almost
> like the native run. Are you assigning enough RAM to the guest? Also,
> it can help to run the kernel build in a ramfs in the guest.

> 2. The build itself does not seem to impose a scaling limit, since
> it scales very well when run natively (per-thread I presume aarch64 TCG is
> still slower than native, even if TCG is run on a faster x86 machine).
> The limit here is probably aarch64 TCG. In particular, last time I
> checked aarch64 TCG has room for improvement scalability-wise handling
> interrupts and some TLB operations; this is likely to explain why we
> see no benefit with per-CPU locks, i.e. the bottleneck is elsewhere.
> This can be confirmed with the sync profiler.
>
> IIRC I originally used ppc64 for this test because ppc64 TCG does not
> have any other big bottlenecks scalability-wise. I just checked but
> unfortunately I can't find the ppc64 image I used :( What I can offer
> is the script I used to run these benchmarks; see the appended.
>
> Thanks,
>                 Emilio
>
> ---
> #!/bin/bash
>
> set -eu
>
> # path to host files
> MYHOME=/local/home/cota/src
>
> # guest image
> QEMU_INST_PATH=$MYHOME/qemu-inst
> IMG=$MYHOME/qemu/img/ppc64/ubuntu.qcow2
>
> ARCH=ppc64
> COMMON_ARGS="-M pseries -nodefaults \
>                 -hda $IMG -nographic -serial stdio \
>                 -net nic -net user,hostfwd=tcp::2222-:22 \
>                 -m 48G"
>
> # path to this script's directory, where .txt output will be copied
> # from the guest.
> QELT=$MYHOME/qelt
> HOST_PATH=$QELT/fig/kcomp
>
> # The guest must be able to SSH to the HOST without entering a password.
> # The way I set this up is to have a passwordless SSH key in the guest's
> # root user, and then copy that key's public key to the host.
> # I used the root user because the guest runs on bootup (as root) a
> # script that scp's run-guest.sh (see below) from the host, then executes it.
> # This is done via a tiny script in the guest invoked from systemd once
> # boot-up has completed.
> HOST=foo@bar.edu
>
> # This is a script in the host to use an appropriate cpumask to
> # use cores in the same socket if possible.
> # See https://github.com/cota/cputopology-perl
> CPUTOPO=$MYHOME/cputopology-perl
>
> # For each run we create this file that then the guest will SCP
> # and execute. It is a quick and dirty way of passing arguments to the guest.
> create_file () {
>     TAG=$1
>     CORES=$2
>     NAME=$ARCH.$TAG-$CORES.txt
>
>     echo '#!/bin/bash' > run-guest.sh
>     echo 'cp -r /home/cota/linux-4.18-rc7 /tmp2/linux' >> run-guest.sh
>     echo "cd /tmp2/linux" >> run-guest.sh
>     echo "{ time make -j $CORES vmlinux >/dev/null; } 2>>/home/cota/$NAME" >> run-guest.sh
>     # Output with execution time is then scp'ed to the host.
>     echo "ssh $HOST 'cat >> $HOST_PATH/$NAME' < /home/cota/$NAME" >> run-guest.sh
>     echo "poweroff" >> run-guest.sh
> }
>
> # Change here THREADS and also the TAGS that point to different QEMU installations.
> for THREADS in 64 32 16; do
>     for TAG in cpu-exclusive-work cputlb-no-bql per-cpu-lock cpu-has-work baseline; do
>         QEMU=$QEMU_INST_PATH/$TAG/bin/qemu-system-$ARCH
>         CPUMASK=$($CPUTOPO/list.pl --policy=compact-smt $THREADS)
>
>         create_file $TAG $THREADS
>         time taskset -c $CPUMASK $QEMU $COMMON_ARGS -smp $THREADS
>     done
> done


  reply index

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-26 19:30 [PATCH v8 00/74] per-CPU locks Robert Foley
2020-03-26 19:30 ` [PATCH v8 01/74] cpu: convert queued work to a QSIMPLEQ Robert Foley
2020-03-26 19:30 ` [PATCH v8 02/74] cpu: rename cpu->work_mutex to cpu->lock Robert Foley
2020-05-11 14:48   ` Alex Bennée
2020-05-11 16:33     ` Robert Foley
2020-03-26 19:30 ` [PATCH v8 03/74] cpu: introduce cpu_mutex_lock/unlock Robert Foley
2020-05-11 10:24   ` Alex Bennée
2020-05-11 16:09     ` Robert Foley
2020-03-26 19:30 ` [PATCH v8 04/74] cpu: make qemu_work_cond per-cpu Robert Foley
2020-03-26 19:30 ` [PATCH v8 05/74] cpu: move run_on_cpu to cpus-common Robert Foley
2020-03-26 19:30 ` [PATCH v8 06/74] cpu: introduce process_queued_cpu_work_locked Robert Foley
2020-03-26 19:30 ` [PATCH v8 07/74] cpu: make per-CPU locks an alias of the BQL in TCG rr mode Robert Foley
2020-03-26 19:30 ` [PATCH v8 08/74] tcg-runtime: define helper_cpu_halted_set Robert Foley
2020-03-26 19:30 ` [PATCH v8 09/74] ppc: convert to helper_cpu_halted_set Robert Foley
2020-03-26 19:30 ` [PATCH v8 10/74] cris: " Robert Foley
2020-03-26 19:30 ` [PATCH v8 11/74] hppa: " Robert Foley
2020-03-26 19:30 ` [PATCH v8 12/74] m68k: " Robert Foley
2020-03-26 19:30 ` [PATCH v8 13/74] alpha: " Robert Foley
2020-03-26 19:30 ` [PATCH v8 14/74] microblaze: " Robert Foley
2020-03-26 19:30 ` [PATCH v8 15/74] cpu: define cpu_halted helpers Robert Foley
2020-03-26 19:30 ` [PATCH v8 16/74] tcg-runtime: convert to cpu_halted_set Robert Foley
2020-03-26 19:30 ` [PATCH v8 17/74] hw/semihosting: " Robert Foley
2020-05-11 10:25   ` Alex Bennée
2020-03-26 19:31 ` [PATCH v8 18/74] arm: convert to cpu_halted Robert Foley
2020-03-26 19:31 ` [PATCH v8 19/74] ppc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 20/74] sh4: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 21/74] i386: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 22/74] lm32: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 23/74] m68k: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 24/74] mips: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 25/74] riscv: " Robert Foley
2020-05-11 10:40   ` Alex Bennée
2020-05-11 16:13     ` Robert Foley
2020-03-26 19:31 ` [PATCH v8 26/74] s390x: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 27/74] sparc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 28/74] xtensa: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 29/74] gdbstub: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 30/74] openrisc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 31/74] cpu-exec: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 32/74] cpu: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 33/74] cpu: define cpu_interrupt_request helpers Robert Foley
2020-03-26 19:31 ` [PATCH v8 34/74] ppc: use cpu_reset_interrupt Robert Foley
2020-03-26 19:31 ` [PATCH v8 35/74] exec: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 36/74] i386: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 37/74] s390x: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 38/74] openrisc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 39/74] arm: convert to cpu_interrupt_request Robert Foley
2020-03-26 19:31 ` [PATCH v8 40/74] i386: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 41/74] i386/kvm: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 42/74] i386/hax-all: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 43/74] i386/whpx-all: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 44/74] i386/hvf: convert to cpu_request_interrupt Robert Foley
2020-03-26 19:31 ` [PATCH v8 45/74] ppc: convert to cpu_interrupt_request Robert Foley
2020-03-26 19:31 ` [PATCH v8 46/74] sh4: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 47/74] cris: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 48/74] hppa: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 49/74] lm32: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 50/74] m68k: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 51/74] mips: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 52/74] nios: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 53/74] s390x: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 54/74] alpha: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 55/74] moxie: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 56/74] sparc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 57/74] openrisc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 58/74] unicore32: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 59/74] microblaze: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 60/74] accel/tcg: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 61/74] cpu: convert to interrupt_request Robert Foley
2020-03-26 19:31 ` [PATCH v8 62/74] cpu: call .cpu_has_work with the CPU lock held Robert Foley
2020-03-26 19:31 ` [PATCH v8 63/74] cpu: introduce cpu_has_work_with_iothread_lock Robert Foley
2020-03-26 19:31 ` [PATCH v8 64/74] ppc: convert to cpu_has_work_with_iothread_lock Robert Foley
2020-03-26 19:31 ` [PATCH v8 65/74] mips: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 66/74] s390x: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 67/74] riscv: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 68/74] sparc: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 69/74] xtensa: " Robert Foley
2020-03-26 19:31 ` [PATCH v8 70/74] cpu: rename all_cpu_threads_idle to qemu_tcg_rr_all_cpu_threads_idle Robert Foley
2020-03-26 19:31 ` [PATCH v8 71/74] cpu: protect CPU state with cpu->lock instead of the BQL Robert Foley
2020-03-26 19:31 ` [PATCH v8 72/74] cpus-common: release BQL earlier in run_on_cpu Robert Foley
2020-03-26 19:31 ` [PATCH v8 73/74] cpu: add async_run_on_cpu_no_bql Robert Foley
2020-03-26 19:31 ` [PATCH v8 74/74] cputlb: queue async flush jobs without the BQL Robert Foley
2020-05-12 16:27   ` Alex Bennée
2020-05-12 19:26     ` Robert Foley
2020-05-18 13:46       ` Robert Foley
2020-05-20  4:46         ` Emilio G. Cota
2020-05-20 15:01           ` Robert Foley [this message]
2020-05-21 14:17             ` Robert Foley
2020-05-12 18:38   ` Alex Bennée
2020-03-26 22:58 ` [PATCH v8 00/74] per-CPU locks Aleksandar Markovic
2020-03-27  9:39   ` Alex Bennée
2020-03-27  9:50     ` Aleksandar Markovic
2020-03-27 10:24       ` Aleksandar Markovic
2020-03-27 17:21         ` Robert Foley
2020-03-27  5:14 ` Emilio G. Cota
2020-03-27 10:59   ` Philippe Mathieu-Daudé
2020-03-30  8:57     ` Stefan Hajnoczi
2020-03-27 18:23   ` Alex Bennée
2020-03-27 18:30   ` Robert Foley
2020-05-12 16:29 ` Alex Bennée

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEyhzFuraifsPTBrM2g+KQVWvD09Q3fwdh=fCo+a9YOTRBAMeg@mail.gmail.com' \
    --to=robert.foley@linaro.org \
    --cc=alex.bennee@linaro.org \
    --cc=cota@braap.org \
    --cc=peter.puhov@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git