From: Sergey Fedorov
Date: Tue, 12 Jul 2016 23:13:35 +0300
Message-Id: <1468354426-837-1-git-send-email-sergey.fedorov@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
To: qemu-devel@nongnu.org, mttcg@listserver.greensocs.com,
 fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
 serge.fdrv@gmail.com, cota@braap.org, bobby.prani@gmail.com,
 rth@twiddle.net
Cc: patches@linaro.org, mark.burton@greensocs.com, pbonzini@redhat.com,
 jan.kiszka@siemens.com, peter.maydell@linaro.org,
 claudio.fontana@huawei.com, Alex Bennée

From: Sergey Fedorov

Hi,

This is my respin of Alex's v2 series [1].

The first 8 patches prepare for patch 9, the subject matter of this
series, which enables lockless translation block lookup. The main change
here is that Paolo's suggestion is implemented: TBs are marked with
invalid CPU state early during invalidation. This makes lockless lookup
safe from races on 'tb_jmp_cache' and direct block chaining.
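
To illustrate the idea of early invalidation, here is a minimal
standalone sketch. It is not the code from the patches: the function
names tb_mark_invalid() and tb_is_invalid() are borrowed from the
series, but their bodies, the simplified TB struct, the
TB_FLAGS_INVALID marker, the cache size, and the hash are all
assumptions made for illustration. It also uses plain C11 sequentially
consistent atomics rather than QEMU's own atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_CACHE_SIZE   16          /* hypothetical size, illustration only */
#define TB_FLAGS_INVALID UINT32_MAX  /* assumed never to equal a real CPU state */

/* Simplified stand-in for a translation block (not QEMU's struct). */
typedef struct TB {
    uint64_t pc;             /* immutable after tb_init() in this sketch */
    uint64_t cs_base;        /* immutable after tb_init() in this sketch */
    _Atomic uint32_t flags;  /* overwritten once, on invalidation */
} TB;

/* Per-CPU lookup cache, read without holding any lock. */
static _Atomic(TB *) jmp_cache[JMP_CACHE_SIZE];

static size_t jmp_cache_slot(uint64_t pc)
{
    return (size_t)(pc % JMP_CACHE_SIZE);
}

static void tb_init(TB *tb, uint64_t pc, uint64_t cs_base, uint32_t flags)
{
    tb->pc = pc;
    tb->cs_base = cs_base;
    atomic_init(&tb->flags, flags);
}

/* First step of invalidation: make the TB unmatchable by any CPU state. */
static void tb_mark_invalid(TB *tb)
{
    atomic_store(&tb->flags, TB_FLAGS_INVALID);
}

static bool tb_is_invalid(TB *tb)
{
    return atomic_load(&tb->flags) == TB_FLAGS_INVALID;
}

static void tb_jmp_cache_insert(TB *tb)
{
    assert(!tb_is_invalid(tb));  /* caching a dead TB would be a bug */
    atomic_store(&jmp_cache[jmp_cache_slot(tb->pc)], tb);
}

/*
 * Lockless lookup: a stale pointer may still sit in the cache, but a TB
 * that has been marked invalid fails the state comparison and is never
 * returned to the caller.
 */
static TB *tb_lookup(uint64_t pc, uint64_t cs_base, uint32_t flags)
{
    TB *tb = atomic_load(&jmp_cache[jmp_cache_slot(pc)]);

    if (tb && tb->pc == pc && tb->cs_base == cs_base &&
        atomic_load(&tb->flags) == flags) {
        return tb;
    }
    return NULL;
}
```

In this sketch the invalidating thread calls tb_mark_invalid() before it
touches the cache or any chaining, so a concurrent tb_lookup() either
sees the TB while it is still valid or misses and falls back to a slow
path (not shown here).
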

Patch 10 is a simple solution to avoid unnecessary bouncing on 'tb_lock'
between tb_gen_code() and tb_add_jump(). A local variable is used to
keep track of whether 'tb_lock' has already been taken.

The last patch is my attempt to restructure tb_find_{fast,slow}() into a
single function tb_find(). I think it will be easier to follow the
locking scheme this way. However, I am afraid this last patch may be
controversial, so it can simply be dropped.

This series can be fetched from the public git repository:

    https://github.com/sergefdrv/qemu.git lockless-tb-lookup-v3

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/424856

Kind regards,
Sergey

Summary of changes in v3:
 - QHT memory ordering assumptions documented
 - 'tb_jmp_cache' reset in tb_flush() made atomic
 - explicit memory barriers removed around 'tb_jmp_cache' access
 - safe access to 'tb_flushed' out of 'tb_lock' prepared
 - TBs marked with invalid CPU state early on invalidation
 - Alex's tb_find_{fast,slow}() roll-up related patches dropped
 - bouncing of tb_lock between tb_gen_code() and tb_add_jump() avoided
   with local variable 'have_tb_lock'
 - tb_find_{fast,slow}() merged

Alex Bennée (2):
  tcg: set up tb->page_addr before insertion
  tcg: cpu-exec: remove tb_lock from the hot-path

Sergey Fedorov (9):
  util/qht: Document memory ordering assumptions
  cpu-exec: Pass last_tb by value to tb_find_fast()
  tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  tcg: Prepare safe access to tb_flushed out of tb_lock
  target-i386: Remove redundant HF_SOFTMMU_MASK
  tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  tcg: Prepare TB invalidation for lockless TB lookup
  tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  tcg: Merge tb_find_slow() and tb_find_fast()

 cpu-exec.c               | 110 +++++++++++++++++++++--------------------------
 include/exec/exec-all.h  |  10 +++++
 include/qemu/qht.h       |   9 ++++
 target-alpha/cpu.h       |  14 ++++++
 target-arm/cpu.h         |  14 ++++++
 target-cris/cpu.h        |  14 ++++++
 target-i386/cpu.c        |   3 --
 target-i386/cpu.h        |  20 +++++++--
 target-i386/translate.c  |  12 ++----
 target-lm32/cpu.h        |  14 ++++++
 target-m68k/cpu.h        |  14 ++++++
 target-microblaze/cpu.h  |  14 ++++++
 target-mips/cpu.h        |  14 ++++++
 target-moxie/cpu.h       |  14 ++++++
 target-openrisc/cpu.h    |  14 ++++++
 target-ppc/cpu.h         |  14 ++++++
 target-s390x/cpu.h       |  14 ++++++
 target-sh4/cpu.h         |  14 ++++++
 target-sparc/cpu.h       |  14 ++++++
 target-sparc/translate.c |   1 +
 target-tilegx/cpu.h      |  14 ++++++
 target-tricore/cpu.h     |  14 ++++++
 target-unicore32/cpu.h   |  14 ++++++
 target-xtensa/cpu.h      |  14 ++++++
 translate-all.c          |  30 ++++++-------
 util/qht.c               |   8 ++++
 26 files changed, 349 insertions(+), 92 deletions(-)

-- 
1.9.1