From: alvise rigo <a.rigo@virtualopensystems.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: mttcg@listserver.greensocs.com,
	Claudio Fontana <claudio.fontana@huawei.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jani Kokkonen <jani.kokkonen@huawei.com>,
	VirtualOpenSystems Technical Team <tech@virtualopensystems.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC v6 10/14] softmmu: Simplify helper_*_st_name, wrap unaligned code
Date: Thu, 7 Jan 2016 17:54:29 +0100	[thread overview]
Message-ID: <CAH47eN2+_vBgajOqV__=qJ=BcV7t+kEU7qz6YJQaq4pJ33YD9w@mail.gmail.com> (raw)
In-Reply-To: <8737u9cppj.fsf@linaro.org>

On Thu, Jan 7, 2016 at 5:35 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> alvise rigo <a.rigo@virtualopensystems.com> writes:
>
>> On Thu, Jan 7, 2016 at 3:46 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>> Alvise Rigo <a.rigo@virtualopensystems.com> writes:
>>>
>>>> Attempting to simplify the helper_*_st_name helpers, wrap the
>>>> do_unaligned_access code into an inline function.
>>>> Also remove the goto statement.
>>>
>>> As I said in the other thread I think these sort of clean-ups can come
>>> before the ll/sc implementations and potentially get merged ahead of the
>>> rest of it.
>>>
>>>>
>>>> Suggested-by: Jani Kokkonen <jani.kokkonen@huawei.com>
>>>> Suggested-by: Claudio Fontana <claudio.fontana@huawei.com>
>>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>>> ---
>>>>  softmmu_template.h | 96 ++++++++++++++++++++++++++++++++++--------------------
>>>>  1 file changed, 60 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>>> index d3d5902..92f92b1 100644
>>>> --- a/softmmu_template.h
>>>> +++ b/softmmu_template.h
>>>> @@ -370,6 +370,32 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>>>                                   iotlbentry->attrs);
>>>>  }
>>>>
>>>> +static inline void glue(helper_le_st_name, _do_unl_access)(CPUArchState *env,
>>>> +                                                           DATA_TYPE val,
>>>> +                                                           target_ulong addr,
>>>> +                                                           TCGMemOpIdx oi,
>>>> +                                                           unsigned mmu_idx,
>>>> +                                                           uintptr_t retaddr)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
>>>> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
>>>> +                             mmu_idx, retaddr);
>>>> +    }
>>>> +    /* XXX: not efficient, but simple */
>>>> +    /* Note: relies on the fact that tlb_fill() does not remove the
>>>> +     * previous page from the TLB cache.  */
>>>> +    for (i = DATA_SIZE - 1; i >= 0; i--) {
>>>> +        /* Little-endian extract.  */
>>>> +        uint8_t val8 = val >> (i * 8);
>>>> +        /* Note the adjustment at the beginning of the function.
>>>> +           Undo that for the recursion.  */
>>>> +        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
>>>> +                                        oi, retaddr + GETPC_ADJ);
>>>> +    }
>>>> +}
>>>
>>> There is still duplication of 99% of the code here, which is silly given
>>
>> Then why should we keep this template-like design in the first place?
>> I tried to keep the code duplication for performance reasons
>> (otherwise how can we justify the two almost identical versions of the
>> helper?), while making the code more compact and readable.
>
> We shouldn't really - code duplication is bad for all the well known
> reasons. The main reason we need explicit helpers for the be/le case is
> that they are called directly from the TCG code, which encodes the
> endianness decision in the call it makes. However, that doesn't stop us
> making generic inline helpers (helpers for the helpers ;-) which the
> compiler can sort out.

I thought you wanted to make all the le/be differences conditional, not
just those helpers for the helpers...
So, if we are allowed to introduce this small overhead, all the
helper_{le,be}_st_name_do_{unl,mmio,ram}_access helpers can be squashed
into helper_generic_st_do_{unl,mmio,ram}_access. I think this is what you
proposed in the POC, right?

>
>>
>>> the compiler inlines the code anyway. If we gave the helper a more
>>> generic name and passed the endianness in via args, I would hope the
>>> compiler would do the sensible thing and constant-fold the code. Something
>>> like:
>>>
>>> static inline void glue(helper_generic_st_name, _do_unl_access)
>>>                         (CPUArchState *env,
>>>                         bool little_endian,
>>>                         DATA_TYPE val,
>>>                         target_ulong addr,
>>>                         TCGMemOpIdx oi,
>>>                         unsigned mmu_idx,
>>>                         uintptr_t retaddr)
>>> {
>>>     int i;
>>>     uint8_t val8;
>>>
>>>     if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
>>>         cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
>>>                              mmu_idx, retaddr);
>>>     }
>>>     /* Note: relies on the fact that tlb_fill() does not remove the
>>>      * previous page from the TLB cache.  */
>>>     for (i = DATA_SIZE - 1; i >= 0; i--) {
>>>         if (little_endian) {
>>
>> little_endian will have the same value >99% of the time; does it make
>> sense to have a branch here?
>
> The compiler should detect that little_endian is constant when it
> inlines the code and not bother generating a test/branch case for
> something that will never happen.
>
> Even if it did, though, I doubt a local branch would stall the processor
> that much. Have you counted how many instructions we execute once we are
> on the slow path?

Too many :)

Regards,
alvise

>
>>
>> Thank you,
>> alvise
>>
>>>                 /* Little-endian extract.  */
>>>                 val8 = val >> (i * 8);
>>>         } else {
>>>                 /* Big-endian extract.  */
>>>                 val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
>>>         }
>>>         /* Note the adjustment at the beginning of the function.
>>>            Undo that for the recursion.  */
>>>         glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
>>>                                         oi, retaddr + GETPC_ADJ);
>>>     }
>>> }
>>>
>>>
>>>> +
>>>>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>                         TCGMemOpIdx oi, uintptr_t retaddr)
>>>>  {
>>>> @@ -433,7 +459,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>              return;
>>>>          } else {
>>>>              if ((addr & (DATA_SIZE - 1)) != 0) {
>>>> -                goto do_unaligned_access;
>>>> +                glue(helper_le_st_name, _do_unl_access)(env, val, addr, oi,
>>>> +                                                        mmu_idx, retaddr);
>>>>              }
>>>>              iotlbentry = &env->iotlb[mmu_idx][index];
>>>>
>>>> @@ -449,23 +476,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>      if (DATA_SIZE > 1
>>>>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>>>>                       >= TARGET_PAGE_SIZE)) {
>>>> -        int i;
>>>> -    do_unaligned_access:
>>>> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
>>>> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
>>>> -                                 mmu_idx, retaddr);
>>>> -        }
>>>> -        /* XXX: not efficient, but simple */
>>>> -        /* Note: relies on the fact that tlb_fill() does not remove the
>>>> -         * previous page from the TLB cache.  */
>>>> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
>>>> -            /* Little-endian extract.  */
>>>> -            uint8_t val8 = val >> (i * 8);
>>>> -            /* Note the adjustment at the beginning of the function.
>>>> -               Undo that for the recursion.  */
>>>> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
>>>> -                                            oi, retaddr + GETPC_ADJ);
>>>> -        }
>>>> +        glue(helper_le_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
>>>> +                                                retaddr);
>>>>          return;
>>>>      }
>>>>
>>>> @@ -485,6 +497,32 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>  }
>>>>
>>>>  #if DATA_SIZE > 1
>>>> +static inline void glue(helper_be_st_name, _do_unl_access)(CPUArchState *env,
>>>> +                                                           DATA_TYPE val,
>>>> +                                                           target_ulong addr,
>>>> +                                                           TCGMemOpIdx oi,
>>>> +                                                           unsigned mmu_idx,
>>>> +                                                           uintptr_t retaddr)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
>>>> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
>>>> +                             mmu_idx, retaddr);
>>>> +    }
>>>> +    /* XXX: not efficient, but simple */
>>>> +    /* Note: relies on the fact that tlb_fill() does not remove the
>>>> +     * previous page from the TLB cache.  */
>>>> +    for (i = DATA_SIZE - 1; i >= 0; i--) {
>>>> +        /* Big-endian extract.  */
>>>> +        uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
>>>> +        /* Note the adjustment at the beginning of the function.
>>>> +           Undo that for the recursion.  */
>>>> +        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
>>>> +                                        oi, retaddr + GETPC_ADJ);
>>>> +    }
>>>> +}
>>>
>>> Not that it matters if you combine to two as suggested because anything
>>> not called shouldn't generate the code.
>>>
>>>>  void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>                         TCGMemOpIdx oi, uintptr_t retaddr)
>>>>  {
>>>> @@ -548,7 +586,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>              return;
>>>>          } else {
>>>>              if ((addr & (DATA_SIZE - 1)) != 0) {
>>>> -                goto do_unaligned_access;
>>>> +                glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi,
>>>> +                                                        mmu_idx, retaddr);
>>>>              }
>>>>              iotlbentry = &env->iotlb[mmu_idx][index];
>>>>
>>>> @@ -564,23 +603,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>      if (DATA_SIZE > 1
>>>>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>>>>                       >= TARGET_PAGE_SIZE)) {
>>>> -        int i;
>>>> -    do_unaligned_access:
>>>> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
>>>> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
>>>> -                                 mmu_idx, retaddr);
>>>> -        }
>>>> -        /* XXX: not efficient, but simple */
>>>> -        /* Note: relies on the fact that tlb_fill() does not remove the
>>>> -         * previous page from the TLB cache.  */
>>>> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
>>>> -            /* Big-endian extract.  */
>>>> -            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
>>>> -            /* Note the adjustment at the beginning of the function.
>>>> -               Undo that for the recursion.  */
>>>> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
>>>> -                                            oi, retaddr + GETPC_ADJ);
>>>> -        }
>>>> +        glue(helper_be_st_name, _do_unl_access)(env, val, addr, oi, mmu_idx,
>>>> +                                                retaddr);
>>>>          return;
>>>>      }
>>>
>>>
>>> --
>>> Alex Bennée
>
>
> --
> Alex Bennée
