From: alvise rigo <a.rigo@virtualopensystems.com>
To: Frederic Konrad <fred.konrad@greensocs.com>
Cc: mttcg@listserver.greensocs.com,
	"Claudio Fontana" <claudio.fontana@huawei.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Jani Kokkonen" <jani.kokkonen@huawei.com>,
	"VirtualOpenSystems Technical Team" <tech@virtualopensystems.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [RFC v3 00/13] Slow-path for atomic instruction translation
Date: Fri, 10 Jul 2015 11:04:46 +0200
Message-ID: <CAH47eN0-RpXTLymxU=9kA2RoGF2kZC6h5BFJ42bSudO5c8Q19g@mail.gmail.com>
In-Reply-To: <559F84C2.1090109@greensocs.com>

On Fri, Jul 10, 2015 at 10:39 AM, Frederic Konrad
<fred.konrad@greensocs.com> wrote:
> On 10/07/2015 10:23, Alvise Rigo wrote:
>>
>> This is the third iteration of the patch series; the patches from 07
>> onwards contain the changes that move the whole work to multi-threading.
>> Changes versus previous versions are at the bottom of this cover letter.
>>
>> This patch series provides an infrastructure for atomic
>> instruction implementation in QEMU, paving the way for TCG
>> multi-threading.
>> The adopted design does not rely on host atomic
>> instructions and is intended to propose a 'legacy' solution for
>> translating guest atomic instructions.
>>
>> The underlying idea is to provide new TCG instructions that guarantee
>> atomicity for some memory accesses or, more generally, a way to define
>> memory transactions. More specifically, a new pair of TCG instructions
>> is implemented, qemu_ldlink_i32 and qemu_stcond_i32, which behave as
>> LoadLink and StoreConditional primitives (only the 32-bit variant is
>> implemented). To achieve this, a new bitmap is added to the ram_list
>> structure (always unique), which flags all memory pages that cannot be
>> accessed directly through the fast-path because of previous exclusive
>> operations. This new bitmap is coupled with a new TLB flag that forces
>> slow-path execution. Any store performed by another vCPU to the same
>> (protected) address between a LoadLink and its StoreConditional makes
>> that StoreConditional fail.
>>
>> In theory, the provided implementation of TCG LoadLink/StoreConditional
>> can be used to properly handle atomic instructions on any architecture.
>>
>> The new slow-path is implemented such that:
>> - the LoadLink behaves as a normal load slow-path, except that it clears
>>    the dirty flag in the bitmap. The TLB entries created from now on will
>>    force the slow-path. To ensure this, we flush the TLB cache of the
>>    other vCPUs. The vCPU also stores the accessed address in a private
>>    variable, in order to make it visible to the other vCPUs
>> - the StoreConditional behaves as a normal store slow-path, except that
>>    it checks whether other vCPUs have set the same exclusive address
>>
>> All write accesses that are forced to follow the 'legacy' slow-path
>> will set the accessed memory page to dirty.
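
To make the LL/SC semantics described above concrete, here is a minimal
single-threaded toy model in C. It is only a sketch under stated
assumptions, not the code from the series: the names VCPUModel,
model_ldlink, model_store and model_stcond are hypothetical, guest memory
is a small word-indexed array, and the bitmap, the TLB flag and all
locking are left out. The only behaviour it shows is that a store from
another vCPU to the protected address makes the pending StoreConditional
fail. The 0 = success, 1 = failure return value follows the ARM strex
convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VCPUS     4    /* hypothetical number of vCPUs in the model */
#define MODEL_RAM_WORDS 16   /* hypothetical guest RAM size, in 32-bit words */

/* Hypothetical per-vCPU state: the address protected by the last LoadLink. */
typedef struct {
    bool     linked;     /* true between a LoadLink and its StoreConditional */
    uint64_t excl_addr;  /* word index protected by the LoadLink */
} VCPUModel;

static VCPUModel model_vcpus[MODEL_VCPUS];      /* zero-init: nothing linked */
static uint32_t  model_ram[MODEL_RAM_WORDS];    /* zero-init guest memory */

/* LoadLink: a normal load that also records the protected address. */
static uint32_t model_ldlink(int cpu, uint64_t addr)
{
    assert(cpu >= 0 && cpu < MODEL_VCPUS && addr < MODEL_RAM_WORDS);
    model_vcpus[cpu].linked = true;
    model_vcpus[cpu].excl_addr = addr;
    return model_ram[addr];
}

/* Slow-path store: breaks the link of every other vCPU on this address. */
static void model_store(int cpu, uint64_t addr, uint32_t val)
{
    assert(cpu >= 0 && cpu < MODEL_VCPUS && addr < MODEL_RAM_WORDS);
    for (int i = 0; i < MODEL_VCPUS; i++) {
        if (i != cpu && model_vcpus[i].linked &&
            model_vcpus[i].excl_addr == addr) {
            model_vcpus[i].linked = false;
        }
    }
    model_ram[addr] = val;
}

/* StoreConditional: returns 0 on success, 1 on failure (strex convention). */
static int model_stcond(int cpu, uint64_t addr, uint32_t val)
{
    assert(cpu >= 0 && cpu < MODEL_VCPUS && addr < MODEL_RAM_WORDS);
    if (!model_vcpus[cpu].linked || model_vcpus[cpu].excl_addr != addr) {
        return 1;                       /* link was broken: do not store */
    }
    model_store(cpu, addr, val);        /* also breaks other vCPUs' links */
    model_vcpus[cpu].linked = false;    /* the exclusive access is over */
    return 0;
}
```

In this model, if vCPU 0 calls model_ldlink(0, 3), vCPU 1 then calls
model_store(1, 3, 7), and vCPU 0 finally calls model_stcond(0, 3, 9), the
StoreConditional returns 1 and the word keeps the value 7. A guest
ldrex/strex loop would then retry from the load.
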
>>
>> In this series only the ARM ldrex/strex instructions are implemented
>> for ARM and i386 hosts.
>> The code has been tested with bare-metal test cases and by booting Linux,
>> using the latest mttcg QEMU branch available at
>> http://git.greensocs.com/fkonrad/mttcg.git.
>
> branch multi_tcg_v6 at this time.
>
>>
>> * Performance considerations
>> This implementation shows good results while booting a Linux kernel,
>> where the large number of flushes affects the overall performance.
>> A complete ARM Linux boot, without any filesystem, takes 30% longer
>> than with the mttcg implementation, with the benefit, however, of
>> offering the infrastructure to handle atomic instructions on any
>> architecture. Compared to the current TCG upstream, instead, it is 40%
>> faster with four vCPUs and 2.1 times faster with eight vCPUs.
>> In addition, there is still margin to improve performance, since at
>> the moment the TLB is flushed quite often, probably more than required.
>>
>> On the other hand, the test case
>> https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git
>> which heavily stresses the LL/SC mechanism but not so much the
>> TLB-related part, performs up to 1.9 times faster with 8 cores and one
>> million iterations compared with the mttcg implementation.
>>
>> Changes from v2:
>> - the bitmap accessors are now atomic
>> - a rendezvous between vCPUs and simple callback support before
>>    executing a TB have been added to handle the TLB flush support
>
> Isn't that exactly what my async_safe_work is supposed to do?

Hi Frederic,

I started this implementation on top of your v4 and missed this
feature while porting to v6.
I think it's doable, and it will make things simpler and cleaner.

Thank you,
alvise

>
>
>> - the softmmu_template and softmmu_llsc_template have been adapted to work
>>    on real multi-threading
>>
>> Changes from v1:
>> - The ram bitmap is not reversed anymore, 1 = dirty, 0 = exclusive
>> - The way the offset to access the bitmap is calculated has
>>    been improved and fixed
>> - Setting a page as dirty requires a vCPU to target the protected
>>    address, not just any address in the page
>> - Addressed comments from Richard Henderson to improve the logic in
>>    softmmu_template.h and to simplify the methods generation through
>>    softmmu_llsc_template.h
>> - Added initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386
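
As an illustration of the bitmap convention quoted above (1 = dirty,
0 = exclusive) combined with atomic accessors, here is a small sketch
using C11 <stdatomic.h>. It makes several assumptions: the names
excl_bitmap, page_set_exclusive, page_set_dirty and page_is_exclusive are
hypothetical, the bitmap is a fixed-size global indexed by page number,
and the rule that only an access to the protected address may set a page
dirty is not modelled. The actual accessors in the series live in
include/exec/ram_addr.h and may look quite different.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES   128   /* hypothetical number of guest pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MODEL_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per page: 1 = dirty (fast-path allowed), 0 = exclusive. */
static atomic_ulong excl_bitmap[BITMAP_WORDS];

/* Start with every page dirty, i.e. no page is under exclusive access. */
static void excl_bitmap_init(void)
{
    for (size_t i = 0; i < BITMAP_WORDS; i++) {
        atomic_store(&excl_bitmap[i], ULONG_MAX);
    }
}

/* Clear the dirty bit atomically: the page must now take the slow-path. */
static void page_set_exclusive(size_t page)
{
    assert(page < MODEL_PAGES);
    atomic_fetch_and(&excl_bitmap[page / BITS_PER_WORD],
                     ~(1UL << (page % BITS_PER_WORD)));
}

/* Set the dirty bit atomically: the page may use the fast-path again. */
static void page_set_dirty(size_t page)
{
    assert(page < MODEL_PAGES);
    atomic_fetch_or(&excl_bitmap[page / BITS_PER_WORD],
                    1UL << (page % BITS_PER_WORD));
}

/* A page is exclusive when its bit is 0. */
static bool page_is_exclusive(size_t page)
{
    assert(page < MODEL_PAGES);
    unsigned long word = atomic_load(&excl_bitmap[page / BITS_PER_WORD]);
    return (word & (1UL << (page % BITS_PER_WORD))) == 0;
}
```

The read-modify-write operations (atomic_fetch_and, atomic_fetch_or) are
what keep two vCPU threads that update bits in the same word from losing
each other's update, which a plain load/modify/store sequence would not
guarantee.
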
>>
>> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>>
>> Alvise Rigo (13):
>>    exec: Add new exclusive bitmap to ram_list
>>    cputlb: Add new TLB_EXCL flag
>>    softmmu: Add helpers for a new slow-path
>>    tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
>>    target-arm: translate: implement qemu_ldlink and qemu_stcond ops
>>    target-i386: translate: implement qemu_ldlink and qemu_stcond ops
>>    ram_addr.h: Make exclusive bitmap accessors atomic
>>    exec.c: introduce a simple rendezvous support
>>    cpus.c: introduce simple callback support
>>    Simple TLB flush wrap to use as exit callback
>>    Introduce exit_flush_req and tcg_excl_access_lock
>>    softmmu_llsc_template.h: move to multithreading
>>    softmmu_template.h: move to multithreading
>>
>>   cpus.c                  |  39 ++++++++
>>   cputlb.c                |  33 +++++-
>>   exec.c                  |  46 +++++++++
>>   include/exec/cpu-all.h  |   2 +
>>   include/exec/cpu-defs.h |   8 ++
>>   include/exec/memory.h   |   3 +-
>>   include/exec/ram_addr.h |  22 ++++
>>   include/qom/cpu.h       |  37 +++++++
>>   softmmu_llsc_template.h | 184 ++++++++++++++++++++++++++++++++++
>>   softmmu_template.h      | 261 +++++++++++++++++++++++++++++++++++-------------
>>   target-arm/translate.c  |  87 +++++++++++++++-
>>   tcg/arm/tcg-target.c    | 121 ++++++++++++++++------
>>   tcg/i386/tcg-target.c   | 136 +++++++++++++++++++++----
>>   tcg/tcg-be-ldst.h       |   1 +
>>   tcg/tcg-op.c            |  23 +++++
>>   tcg/tcg-op.h            |   3 +
>>   tcg/tcg-opc.h           |   4 +
>>   tcg/tcg.c               |   2 +
>>   tcg/tcg.h               |  20 ++++
>>   19 files changed, 910 insertions(+), 122 deletions(-)
>>   create mode 100644 softmmu_llsc_template.h
>>
>
