From: alvise rigo <a.rigo@virtualopensystems.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: mttcg@listserver.greensocs.com,
	Claudio Fontana <claudio.fontana@huawei.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jani Kokkonen <jani.kokkonen@huawei.com>,
	VirtualOpenSystems Technical Team <tech@virtualopensystems.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC v6 00/14] Slow-path for atomic instruction translation
Date: Thu, 17 Dec 2015 17:16:33 +0100
Message-ID: <CAH47eN0ShA_RzO251X04gh231+FGmyHemk+AgqJyS6nuukxy2A@mail.gmail.com>
In-Reply-To: <87si31f4a8.fsf@linaro.org>

Hi Alex,

On Thu, Dec 17, 2015 at 5:06 PM, Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>
> > This is the sixth iteration of the patch series which applies to the
> > upstream branch of QEMU (v2.5.0-rc3).
> >
> > Changes versus previous versions are at the bottom of this cover letter.
> >
> > The code is also available at the following repository:
> > https://git.virtualopensystems.com/dev/qemu-mt.git
> > branch:
> > slowpath-for-atomic-v6-no-mttcg
>
> I'm starting to look through this now. However one problem that
>

Thank you for this.


> immediately comes up is the aarch64 breakage. Because there is an
> intrinsic link between a lot of the arm and aarch64 code it breaks the
> other targets.
>
> You could fix this by ensuring that CONFIG_TCG_USE_LDST_EXCL doesn't get
> passed to the aarch64 build (tricky as aarch64-softmmu.mak includes
> arm-softmmu.mak) or bite the bullet now and add the 64 bit helpers that
> will be needed to convert the aarch64 exclusive equivalents.
>

This is what I'm doing right now :)

Best regards,
alvise


>
> >
> > This patch series provides an infrastructure for atomic instruction
> > implementation in QEMU, thus offering a 'legacy' solution for
> > translating guest atomic instructions. Moreover, it can be considered as
> > a first step toward a multi-thread TCG.
> >
> > The underlying idea is to provide new TCG helpers (sort of softmmu
> > helpers) that guarantee atomicity to some memory accesses or in general
> > a way to define memory transactions.
> >
> > More specifically, the new softmmu helpers behave as LoadLink and
> > StoreConditional instructions, and are called from TCG code by means of
> > target specific helpers. This work includes the implementation for all
> > the ARM atomic instructions, see target-arm/op_helper.c.
> >
> > The implementation heavily uses the software TLB together with a new
> > bitmap that has been added to the ram_list structure which flags, on a
> > per-CPU basis, all the memory pages that are in the middle of a LoadLink
> > (LL), StoreConditional (SC) operation.  Since all these pages can be
> > accessed directly through the fast-path and alter a vCPU's linked value,
> > the new bitmap has been coupled with a new TLB flag for the TLB virtual
> > address which forces the slow-path execution for all the accesses to a
> > page containing a linked address.
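
To make the bitmap/TLB-flag coupling described above concrete, here is a
minimal sketch, not the code from the series. It assumes one byte per page
with one bit per vCPU (1 = dirty, 0 = exclusive, as in the changelog). All
names, sizes and the flag value (SKETCH_TLB_EXCL, excl_bitmap, NUM_PAGES,
NUM_VCPUS) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes and flag value, for illustration only. */
#define NUM_PAGES       16
#define NUM_VCPUS       4
#define ALL_DIRTY       ((uint8_t)((1u << NUM_VCPUS) - 1))
#define SKETCH_TLB_EXCL 0x1u

/* One byte per page, one bit per vCPU: 1 = dirty, 0 = exclusive. */
uint8_t excl_bitmap[NUM_PAGES];

/* Start with every page dirty for every vCPU (nothing is linked). */
void bitmap_init(void)
{
    for (int i = 0; i < NUM_PAGES; i++) {
        excl_bitmap[i] = ALL_DIRTY;
    }
}

/* What an LL would do to the bitmap: clear this vCPU's dirty bit. */
void mark_exclusive(int page, int cpu)
{
    assert(page >= 0 && page < NUM_PAGES && cpu >= 0 && cpu < NUM_VCPUS);
    excl_bitmap[page] &= (uint8_t)~(1u << cpu);
}

/* Restore the page as non-exclusive for this vCPU. */
void mark_dirty(int page, int cpu)
{
    assert(page >= 0 && page < NUM_PAGES && cpu >= 0 && cpu < NUM_VCPUS);
    excl_bitmap[page] |= (uint8_t)(1u << cpu);
}

/* True if at least one vCPU has its bit cleared for this page. */
bool page_is_exclusive(int page)
{
    assert(page >= 0 && page < NUM_PAGES);
    return excl_bitmap[page] != ALL_DIRTY;
}

/*
 * Flags for a freshly generated TLB entry: if any vCPU holds the page
 * exclusive, every access to it must take the slow path.
 */
unsigned tlb_flags_for(int page)
{
    return page_is_exclusive(page) ? SKETCH_TLB_EXCL : 0u;
}
```

In this model a caller would run bitmap_init() once, call
mark_exclusive(page, cpu) on an LL, and consult tlb_flags_for(page) whenever
a TLB entry is filled. The TLB flush of the other vCPUs is left out.
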
> >
> > The new slow-path is implemented such that:
> > - the LL behaves as a normal load slow-path, except for clearing the
> >   dirty flag in the bitmap.  While generating a TLB entry, the cputlb.c
> >   code checks whether at least one vCPU has the bit cleared in the
> >   exclusive bitmap; in that case the TLB entry will have the EXCL
> >   flag set, thus forcing the slow-path.  In order to ensure that all the
> >   vCPUs will follow the slow-path for that page, we flush the TLB cache
> >   of all the other vCPUs.
> >
> >   The LL will also set the linked address and size of the access in a
> >   vCPU's private variable. After the corresponding SC, this address will
> >   be set to a reset value.
> >
> > - the SC can fail, returning 1, or succeed, returning 0.  It must
> >   always come after an LL and must access the same address 'linked' by
> >   the previous LL, otherwise it will fail. If in the time window delimited
> >   by a legit pair of LL/SC operations another write access happens to
> >   the linked address, the SC will fail.
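
The LL/SC rules listed above can be sketched as a toy model. This is an
assumption-laden illustration, not the helpers from the series: memory is a
small word array, the link is tracked per address rather than through the
page bitmap and the TLB, and every name (VCpu, NO_LINK, load_link,
store_cond, plain_store) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 8
#define NO_LINK   UINT32_MAX  /* hypothetical reset value: no LL pending */

typedef struct {
    uint32_t linked_addr;     /* word index linked by the last LL */
} VCpu;

static uint32_t mem[MEM_WORDS];

void vcpu_reset(VCpu *cpu)
{
    cpu->linked_addr = NO_LINK;
}

/* LL: a normal load that also records the linked address. */
uint32_t load_link(VCpu *cpu, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    cpu->linked_addr = addr;
    return mem[addr];
}

/* Any write to an address breaks the link of every vCPU that linked it. */
void plain_store(VCpu *cpus, int ncpus, uint32_t addr, uint32_t val)
{
    assert(addr < MEM_WORDS);
    mem[addr] = val;
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].linked_addr == addr) {
            cpus[i].linked_addr = NO_LINK;
        }
    }
}

/*
 * SC: returns 0 on success and 1 on failure. It succeeds only if this
 * vCPU still holds a link to exactly this address. The link is reset
 * afterwards in both cases.
 */
int store_cond(VCpu *cpus, int ncpus, int self, uint32_t addr, uint32_t val)
{
    VCpu *cpu = &cpus[self];
    bool ok = cpu->linked_addr != NO_LINK && cpu->linked_addr == addr;

    if (ok) {
        plain_store(cpus, ncpus, addr, val);
    }
    cpu->linked_addr = NO_LINK;
    return ok ? 0 : 1;
}
```

A guest-style atomic update would loop on load_link()/store_cond() until
store_cond() returns 0. An SC with no preceding LL, an SC to a different
address, or an SC after an intervening plain_store() to the linked address
all return 1 in this model.
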
> >
> > In theory, the provided implementation of TCG LoadLink/StoreConditional
> > can be used to properly handle atomic instructions on any architecture.
> >
> > The code has been tested with bare-metal test cases and by booting Linux.
> >
> > * Performance considerations
> > The new slow-path adds some overhead to the translation of the ARM
> > atomic instructions, since their emulation doesn't happen anymore only
> > in the guest (by mean of pure TCG generated code), but requires the
> > execution of two helpers functions. Despite this, the additional time
> > required to boot an ARM Linux kernel on an i7 clocked at 2.5GHz is
> > negligible.
> > Instead, on a LL/SC bound test scenario - like:
> > https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git - this
> > solution requires 30% (1 million iterations) and 70% (10 millions
> > iterations) of additional time for the test to complete.
> >
> > Changes from v5:
> > - The exclusive memory region is now set through a CPUClass hook,
> >   allowing any architecture to decide the memory area that will be
> >   protected during a LL/SC operation [PATCH 3]
> > - The runtime helpers dropped any target dependency and are now in a
> >   common file [PATCH 5]
> > - Improved the way we restore a guest page as non-exclusive [PATCH 9]
> > - Included MMIO memory as possible target of LL/SC
> >   instructions. This also required simplifying the
> >   helper_*_st_name helpers in softmmu_template.h [PATCH 8-14]
> >
> > Changes from v4:
> > - Reworked the exclusive bitmap to be of fixed size (8 bits per address)
> > - The slow-path is now TCG backend independent, no need to touch
> >   tcg/* anymore as suggested by Aurelien Jarno.
> >
> > Changes from v3:
> > - based on upstream QEMU
> > - addressed comments from Alex Bennée
> > - the slow path can be enabled by the user with:
> >   ./configure --enable-tcg-ldst-excl only if the backend supports it
> > - all the ARM ldex/stex instructions now make use of the slow path
> > - added aarch64 TCG backend support
> > - part of the code has been rewritten
> >
> > Changes from v2:
> > - the bitmap accessors are now atomic
> > - a rendezvous between vCPUs and a simple callback support before
> >   executing a TB have been added to handle the TLB flush support
> > - the softmmu_template and softmmu_llsc_template have been adapted to
> >   work on real multi-threading
> >
> > Changes from v1:
> > - The ram bitmap is not reversed anymore, 1 = dirty, 0 = exclusive
> > - The way the offset to access the bitmap is calculated has
> >   been improved and fixed
> > - A page to be set as dirty requires a vCPU to target the protected
> >   address and not just an address in the page
> > - Addressed comments from Richard Henderson to improve the logic in
> >   softmmu_template.h and to simplify the methods generation through
> >   softmmu_llsc_template.h
> > - Added initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386
> >
> > This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
> >
> > Alvise Rigo (14):
> >   exec.c: Add new exclusive bitmap to ram_list
> >   softmmu: Add new TLB_EXCL flag
> >   Add CPUClass hook to set exclusive range
> >   softmmu: Add helpers for a new slowpath
> >   tcg: Create new runtime helpers for excl accesses
> >   configure: Use slow-path for atomic only when the softmmu is enabled
> >   target-arm: translate: Use ld/st excl for atomic insns
> >   target-arm: Add atomic_clear helper for CLREX insn
> >   softmmu: Add history of excl accesses
> >   softmmu: Simplify helper_*_st_name, wrap unaligned code
> >   softmmu: Simplify helper_*_st_name, wrap MMIO code
> >   softmmu: Simplify helper_*_st_name, wrap RAM code
> >   softmmu: Include MMIO/invalid exclusive accesses
> >   softmmu: Protect MMIO exclusive range
> >
> >  Makefile.target             |   2 +-
> >  configure                   |   4 +
> >  cputlb.c                    |  67 ++++++++-
> >  exec.c                      |   8 +-
> >  include/exec/cpu-all.h      |   8 ++
> >  include/exec/cpu-defs.h     |   1 +
> >  include/exec/helper-gen.h   |   1 +
> >  include/exec/helper-proto.h |   1 +
> >  include/exec/helper-tcg.h   |   1 +
> >  include/exec/memory.h       |   4 +-
> >  include/exec/ram_addr.h     |  76 ++++++++++
> >  include/qom/cpu.h           |  21 +++
> >  qom/cpu.c                   |   7 +
> >  softmmu_llsc_template.h     | 144 +++++++++++++++++++
> >  softmmu_template.h          | 338 +++++++++++++++++++++++++++++++++-----------
> >  target-arm/helper.h         |   2 +
> >  target-arm/op_helper.c      |   6 +
> >  target-arm/translate.c      | 102 ++++++++++++-
> >  tcg-llsc-helper.c           | 109 ++++++++++++++
> >  tcg-llsc-helper.h           |  35 +++++
> >  tcg/tcg-llsc-gen-helper.h   |  32 +++++
> >  tcg/tcg.h                   |  31 ++++
> >  22 files changed, 909 insertions(+), 91 deletions(-)
> >  create mode 100644 softmmu_llsc_template.h
> >  create mode 100644 tcg-llsc-helper.c
> >  create mode 100644 tcg-llsc-helper.h
> >  create mode 100644 tcg/tcg-llsc-gen-helper.h
>
>
> --
> Alex Bennée
>

Thread overview: 60+ messages
2015-12-14  8:41 [Qemu-devel] [RFC v6 00/14] Slow-path for atomic instruction translation Alvise Rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 01/14] exec.c: Add new exclusive bitmap to ram_list Alvise Rigo
2015-12-18 13:18   ` Alex Bennée
2015-12-18 13:47     ` alvise rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 02/14] softmmu: Add new TLB_EXCL flag Alvise Rigo
2016-01-05 16:10   ` Alex Bennée
2016-01-05 17:27     ` alvise rigo
2016-01-05 18:39       ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 03/14] Add CPUClass hook to set exclusive range Alvise Rigo
2016-01-05 16:42   ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 04/14] softmmu: Add helpers for a new slowpath Alvise Rigo
2016-01-06 15:16   ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 05/14] tcg: Create new runtime helpers for excl accesses Alvise Rigo
2015-12-14  9:40   ` Paolo Bonzini
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 06/14] configure: Use slow-path for atomic only when the softmmu is enabled Alvise Rigo
2015-12-14  9:38   ` Paolo Bonzini
2015-12-14  9:39     ` Paolo Bonzini
2015-12-14 10:14   ` Laurent Vivier
2015-12-15 14:23     ` alvise rigo
2015-12-15 14:31       ` Paolo Bonzini
2015-12-15 15:18         ` Laurent Vivier
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 07/14] target-arm: translate: Use ld/st excl for atomic insns Alvise Rigo
2016-01-06 17:11   ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 08/14] target-arm: Add atomic_clear helper for CLREX insn Alvise Rigo
2016-01-06 17:13   ` Alex Bennée
2016-01-06 17:27     ` alvise rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 09/14] softmmu: Add history of excl accesses Alvise Rigo
2015-12-14  9:35   ` Paolo Bonzini
2015-12-15 14:26     ` alvise rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 10/14] softmmu: Simplify helper_*_st_name, wrap unaligned code Alvise Rigo
2016-01-07 14:46   ` Alex Bennée
2016-01-07 15:09     ` alvise rigo
2016-01-07 16:35       ` Alex Bennée
2016-01-07 16:54         ` alvise rigo
2016-01-07 17:36           ` Alex Bennée
2016-01-08 11:19   ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 11/14] softmmu: Simplify helper_*_st_name, wrap MMIO code Alvise Rigo
2016-01-11  9:54   ` Alex Bennée
2016-01-11 10:19     ` alvise rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 12/14] softmmu: Simplify helper_*_st_name, wrap RAM code Alvise Rigo
2015-12-17 16:52   ` Alex Bennée
2015-12-17 17:13     ` alvise rigo
2015-12-17 20:20       ` Alex Bennée
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 13/14] softmmu: Include MMIO/invalid exclusive accesses Alvise Rigo
2015-12-14  8:41 ` [Qemu-devel] [RFC v6 14/14] softmmu: Protect MMIO exclusive range Alvise Rigo
2015-12-14  9:33 ` [Qemu-devel] [RFC v6 00/14] Slow-path for atomic instruction translation Paolo Bonzini
2015-12-14 10:04   ` alvise rigo
2015-12-14 10:17     ` Paolo Bonzini
2015-12-15 13:59       ` alvise rigo
2015-12-15 14:18         ` Paolo Bonzini
2015-12-15 14:22           ` alvise rigo
2015-12-14 22:09 ` Andreas Tobler
2015-12-15  8:16   ` alvise rigo
2015-12-17 16:06 ` Alex Bennée
2015-12-17 16:16   ` alvise rigo [this message]
2016-01-06 18:00 ` Andrew Baumann
2016-01-07 10:21   ` alvise rigo
2016-01-07 10:22     ` Peter Maydell
2016-01-07 10:49       ` alvise rigo
2016-01-07 11:16         ` Peter Maydell
