* [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush
@ 2022-01-10 16:47 Idan Horowitz
  2022-01-10 16:47 ` [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate Idan Horowitz
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Idan Horowitz @ 2022-01-10 16:47 UTC (permalink / raw)
  To: qemu-arm
  Cc: peter.maydell, richard.henderson, Idan Horowitz, qemu-devel, pbonzini

When the length of the range is large enough, clearing the whole cache is
faster than iterating over the (possibly extremely large) set of pages
contained in the range.

This mirrors the similar pre-existing optimization done when flushing the
TLB itself.
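
(For scale: assuming 4 KiB target pages and the default TB_JMP_CACHE_BITS
of 12, the cutover point is TARGET_PAGE_SIZE * TB_JMP_CACHE_SIZE =
4096 * 4096 bytes, i.e. any range of 16 MiB or more.)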

Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>
---
 accel/tcg/cputlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 5e0d0eebc3..926d9a9192 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -783,6 +783,15 @@ static void tlb_flush_range_by_mmuidx_async_0(CPUState *cpu,
     }
     qemu_spin_unlock(&env_tlb(env)->c.lock);
 
+    /*
+     * If the length is larger than the jump cache size, then it will take
+     * longer to clear each entry individually than it will to clear it all.
+     */
+    if (d.len >= (TARGET_PAGE_SIZE * TB_JMP_CACHE_SIZE)) {
+        cpu_tb_jmp_cache_clear(cpu);
+        return;
+    }
+
     for (target_ulong i = 0; i < d.len; i += TARGET_PAGE_SIZE) {
         tb_flush_jmp_cache(cpu, d.addr + i);
     }
-- 
2.33.1




* [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate
  2022-01-10 16:47 [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Idan Horowitz
@ 2022-01-10 16:47 ` Idan Horowitz
  2022-01-25 22:05   ` Richard Henderson
  2022-01-14 15:48 ` [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Alex Bennée
  2022-01-25 21:55 ` Richard Henderson
  2 siblings, 1 reply; 9+ messages in thread
From: Idan Horowitz @ 2022-01-10 16:47 UTC (permalink / raw)
  To: qemu-arm
  Cc: peter.maydell, richard.henderson, Idan Horowitz, qemu-devel, pbonzini

If the given range specifies no addresses to be flushed, there's no reason
to schedule a function on all CPUs that does nothing.

Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>
---
 target/arm/helper.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index cfca0f5ba6..1e819835c2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4564,6 +4564,10 @@ static void do_rvae_write(CPUARMState *env, uint64_t value,
     length = tlbi_aa64_range_get_length(env, value);
     bits = tlbbits_for_regime(env, one_idx, baseaddr);
 
+    if (length == 0) {
+        return;
+    }
+
     if (synced) {
         tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
                                                   baseaddr,
-- 
2.33.1




* Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush
  2022-01-10 16:47 [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Idan Horowitz
  2022-01-10 16:47 ` [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate Idan Horowitz
@ 2022-01-14 15:48 ` Alex Bennée
  2022-01-14 16:41   ` Idan Horowitz
  2022-01-25 21:55 ` Richard Henderson
  2 siblings, 1 reply; 9+ messages in thread
From: Alex Bennée @ 2022-01-14 15:48 UTC (permalink / raw)
  To: Idan Horowitz
  Cc: peter.maydell, qemu-arm, richard.henderson, qemu-devel, pbonzini


Idan Horowitz <idan.horowitz@gmail.com> writes:

> When the length of the range is large enough, clearing the whole cache is
> faster than iterating over the (possibly extremely large) set of pages
> contained in the range.
>
> This mirrors the similar pre-existing optimization done when flushing the
> TLB itself.
>
> Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>

For multi-patch series please include a cover letter which is the parent
of all the patches. This is the default for git-send-email.

The code itself looks fine but what sort of improvements are we talking
about here? What measurements have you taken and under what conditions?

-- 
Alex Bennée



* Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush
  2022-01-14 15:48 ` [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Alex Bennée
@ 2022-01-14 16:41   ` Idan Horowitz
  2022-01-14 20:56     ` Idan Horowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Idan Horowitz @ 2022-01-14 16:41 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, qemu-arm, richard.henderson, qemu-devel, pbonzini

Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> For multi-patch series please include a cover letter which is the parent
> of all the patches. This is the default for git-send-email.
>

Sorry, I will do so from now on.

>
> The code itself looks fine but what sort of improvements are we talking
> about here? What measurements have you taken and under what conditions?
>

Execution time of the following assembly snippet in TCG
aarch64-softmmu with icount (shift=10) enabled decreased from 4
minutes and 53 seconds to below 1 second:

movk    x0, #0x0
movk    x0, #0x0, lsl #16
movk    x0, #0xff80, lsl #32
movk    x0, #0x0, lsl #48
mov     x9, #0x64
str     x9, [x8]
tlbi    rvae1, x0
ldr     x9, [x8]
sub     x9, x9, #0x1
str     x9, [x8]
cbnz    x9, 0x5168abc8

>
> --
> Alex Bennée

Idan Horowitz



* Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush
  2022-01-14 16:41   ` Idan Horowitz
@ 2022-01-14 20:56     ` Idan Horowitz
  0 siblings, 0 replies; 9+ messages in thread
From: Idan Horowitz @ 2022-01-14 20:56 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, qemu-arm, richard.henderson, qemu-devel, pbonzini

Idan Horowitz <idan.horowitz@gmail.com> wrote:
>
> cbnz     x9, 0x5168abc8
>

I forgot to include the addresses of the instructions, making the
jump target undecipherable. Here's the snippet again, this time
with addresses:

0x5168abb0 movk    x0, #0x0
0x5168abb4 movk    x0, #0x0, lsl #16
0x5168abb8 movk    x0, #0xff80, lsl #32
0x5168abbc movk    x0, #0x0, lsl #48
0x5168abc0 mov     x9, #0x64
0x5168abc4 str     x9, [x8]
0x5168abc8 tlbi    rvae1, x0
0x5168abcc ldr     x9, [x8]
0x5168abd0 sub     x9, x9, #0x1
0x5168abd4 str     x9, [x8]
0x5168abd8 cbnz    x9, 0x5168abc8
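
For reference, if I'm decoding the TLBI operand correctly (TG = 64 KiB
granule, SCALE = 3, NUM = 31, BaseADDR = 0), each invalidate covers a
128 GiB range, and the loop runs it 100 times (x9 starts at 0x64). The
old code therefore called tb_flush_jmp_cache() once per 4 KiB page of
that range on every iteration.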

>
> Idan Horowitz

Idan Horowitz



* Re: [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush
  2022-01-10 16:47 [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Idan Horowitz
  2022-01-10 16:47 ` [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate Idan Horowitz
  2022-01-14 15:48 ` [PATCH 1/2] accel/tcg: Optimize jump cache flush during tlb range flush Alex Bennée
@ 2022-01-25 21:55 ` Richard Henderson
  2 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2022-01-25 21:55 UTC (permalink / raw)
  To: Idan Horowitz, qemu-arm; +Cc: peter.maydell, qemu-devel, pbonzini

On 1/11/22 3:47 AM, Idan Horowitz wrote:
> +    /*
> +     * If the length is larger than the jump cache size, then it will take
> +     * longer to clear each entry individually than it will to clear it all.
> +     */
> +    if (d.len >= (TARGET_PAGE_SIZE * TB_JMP_CACHE_SIZE)) {
> +        cpu_tb_jmp_cache_clear(cpu);
> +        return;
> +    }
> +
>       for (target_ulong i = 0; i < d.len; i += TARGET_PAGE_SIZE) {
>           tb_flush_jmp_cache(cpu, d.addr + i);
>       }
> 

Hmm.  Valid.  It looks like we could reasonably rewrite tb_flush_jmp_cache to be
more efficient by passing down len.  But this is a decent intermediate step.
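
Something like the following, perhaps (just a sketch, and it assumes the
existing static tb_jmp_cache_clear_page() helper in cputlb.c; the extra
clear of the preceding page mirrors what tb_flush_jmp_cache does today,
since a TB may cross a page boundary):

     /*
      * Length-aware flush: clear each page's jump cache lines exactly
      * once, instead of twice via per-page tb_flush_jmp_cache() calls.
      */
     static void tb_flush_jmp_cache_range(CPUState *cpu, target_ulong addr,
                                          target_ulong len)
     {
         /* A TB on the preceding page may extend into the range. */
         tb_jmp_cache_clear_page(cpu, addr - TARGET_PAGE_SIZE);
         for (target_ulong i = 0; i < len; i += TARGET_PAGE_SIZE) {
             tb_jmp_cache_clear_page(cpu, addr + i);
         }
     }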

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Queued to tcg-next.

r~



* Re: [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate
  2022-01-10 16:47 ` [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate Idan Horowitz
@ 2022-01-25 22:05   ` Richard Henderson
  2022-01-25 22:06     ` Peter Maydell
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2022-01-25 22:05 UTC (permalink / raw)
  To: Idan Horowitz, qemu-arm; +Cc: peter.maydell, qemu-devel, pbonzini

On 1/11/22 3:47 AM, Idan Horowitz wrote:
> If the given range specifies no addresses to be flushed, there's no reason
> to schedule a function on all CPUs that does nothing.
> 
> Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>
> ---
>   target/arm/helper.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index cfca0f5ba6..1e819835c2 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -4564,6 +4564,10 @@ static void do_rvae_write(CPUARMState *env, uint64_t value,
>       length = tlbi_aa64_range_get_length(env, value);
>       bits = tlbbits_for_regime(env, one_idx, baseaddr);
>   
> +    if (length == 0) {
> +        return;
> +    }
> +
>       if (synced) {
>           tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
>                                                     baseaddr,
> 

Looks good.  I guess we could reorder the extractions above so that we do

     length = ...;
     if (length == 0) {
         return;
     }

     addr = ...
     bits = ...

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate
  2022-01-25 22:05   ` Richard Henderson
@ 2022-01-25 22:06     ` Peter Maydell
  2022-01-25 23:34       ` Richard Henderson
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2022-01-25 22:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: pbonzini, qemu-arm, Idan Horowitz, qemu-devel

On Tue, 25 Jan 2022 at 22:05, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/11/22 3:47 AM, Idan Horowitz wrote:
> > If the given range specifies no addresses to be flushed, there's no reason
> > to schedule a function on all CPUs that does nothing.
> >
> > Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>
> > ---
> >   target/arm/helper.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index cfca0f5ba6..1e819835c2 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -4564,6 +4564,10 @@ static void do_rvae_write(CPUARMState *env, uint64_t value,
> >       length = tlbi_aa64_range_get_length(env, value);
> >       bits = tlbbits_for_regime(env, one_idx, baseaddr);
> >
> > +    if (length == 0) {
> > +        return;
> > +    }
> > +
> >       if (synced) {
> >           tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
> >                                                     baseaddr,
> >
>
> Looks good.  I guess we could sort the extractions above so that we do
>
>      length = ...;
>      if (length == 0) {
>          return;
>      }
>
>      addr = ...
>      bits = ...
>
> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Should we handle this in the tlb_flush_* functions themselves,
or is it just Arm that has to fix up a special case of "actually
the length is zero"?

-- PMM



* Re: [PATCH 2/2] target/arm: Bail out early on 0-length tlb range invalidate
  2022-01-25 22:06     ` Peter Maydell
@ 2022-01-25 23:34       ` Richard Henderson
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2022-01-25 23:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: pbonzini, qemu-arm, Idan Horowitz, qemu-devel

On 1/26/22 9:06 AM, Peter Maydell wrote:
> On Tue, 25 Jan 2022 at 22:05, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 1/11/22 3:47 AM, Idan Horowitz wrote:
>>> If the given range specifies no addresses to be flushed, there's no reason
>>> to schedule a function on all CPUs that does nothing.
>>>
>>> Signed-off-by: Idan Horowitz <idan.horowitz@gmail.com>
>>> ---
>>>    target/arm/helper.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>>> index cfca0f5ba6..1e819835c2 100644
>>> --- a/target/arm/helper.c
>>> +++ b/target/arm/helper.c
>>> @@ -4564,6 +4564,10 @@ static void do_rvae_write(CPUARMState *env, uint64_t value,
>>>        length = tlbi_aa64_range_get_length(env, value);
>>>        bits = tlbbits_for_regime(env, one_idx, baseaddr);
>>>
>>> +    if (length == 0) {
>>> +        return;
>>> +    }
>>> +
>>>        if (synced) {
>>>            tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
>>>                                                      baseaddr,
>>>
>>
>> Looks good.  I guess we could sort the extractions above so that we do
>>
>>       length = ...;
>>       if (length == 0) {
>>           return;
>>       }
>>
>>       addr = ...
>>       bits = ...
>>
>> Either way,
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> Should we handle this in the tlb_flush_* functions themselves,
> or is it just Arm that has to fix up a special case of "actually
> the length is zero" ?

Hmm.  Probably should handle this in tlb_*, yes.
So far, Arm is the only user regardless.
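
Something along these lines at the top of tlb_flush_range_by_mmuidx
(just a sketch; the _all_cpus and _all_cpus_synced variants would want
the same check):

     void tlb_flush_range_by_mmuidx(CPUState *cpu, target_ulong addr,
                                    target_ulong len, uint16_t idxmap,
                                    unsigned bits)
     {
         /* A zero-length range invalidates nothing: skip scheduling work. */
         if (len == 0) {
             return;
         }
         ...
     }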

r~




