* [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
@ 2023-07-30 17:56 Helge Deller
2023-07-30 18:01 ` Richard Henderson
2023-07-30 20:03 ` Richard Henderson
0 siblings, 2 replies; 7+ messages in thread
From: Helge Deller @ 2023-07-30 17:56 UTC (permalink / raw)
To: Richard Henderson, Laurent Vivier, qemu-devel
I'm quite unclear about translator_use_goto_tb() for qemu-user
emulation....(and in general).
Based on the function name, the function translator_use_goto_tb() shall
help to decide if a program should use goto_tb() and exit_tb() to jump
to the next instruction.
Currently, if the destination is on the same page, it returns true.
I wonder, if it shouldn't return false in this case instead, because
arches have code like this: (taken from target/hppa/translate.c):
if (... && translator_use_goto_tb(ctx, f)) {
tcg_gen_goto_tb(which);
tcg_gen_movi_reg(cpu_iaoq_f, f);
tcg_gen_movi_reg(cpu_iaoq_b, b);
tcg_gen_exit_tb(ctx->base.tb, which);
} else {
copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
tcg_gen_lookup_and_goto_ptr();
}
Shouldn't, if the destination is on the same page, the (faster?)
path with tcg_gen_lookup_and_goto_ptr() be taken instead?
Now, for user-mode emulation page faults can't happen at all.
Shouldn't in this case the tcg_gen_lookup_and_goto_ptr() path been taken
unconditionally, as shown in the patch below?
Signed-off-by: Helge Deller <deller@gmx.de>
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 1a6a5448c8..07224a7f83 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest)
return false;
}
+#ifndef CONFIG_USER_ONLY
/* Check for the dest on the same page as the start of the TB. */
return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+ /* linux-user doesn't need to fear pagefaults for exec swap-in */
+ return false;
+#endif
}
void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 17:56 [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb() Helge Deller
@ 2023-07-30 18:01 ` Richard Henderson
2023-07-30 18:02 ` Richard Henderson
2023-07-30 20:03 ` Richard Henderson
1 sibling, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2023-07-30 18:01 UTC (permalink / raw)
To: Helge Deller, Laurent Vivier, qemu-devel
On 7/30/23 10:56, Helge Deller wrote:
> +++ b/accel/tcg/translator.c
> @@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest)
> return false;
> }
>
> +#ifndef CONFIG_USER_ONLY
> /* Check for the dest on the same page as the start of the TB. */
> return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
> +#else
> + /* linux-user doesn't need to fear pagefaults for exec swap-in */
> + return false;
> +#endif
> }
No, we still need to stop at pages for breakpoints.
r~
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 18:01 ` Richard Henderson
@ 2023-07-30 18:02 ` Richard Henderson
2023-07-30 18:19 ` Helge Deller
0 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2023-07-30 18:02 UTC (permalink / raw)
To: Helge Deller, Laurent Vivier, qemu-devel
On 7/30/23 11:01, Richard Henderson wrote:
> On 7/30/23 10:56, Helge Deller wrote:
>> +++ b/accel/tcg/translator.c
>> @@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest)
>> return false;
>> }
>>
>> +#ifndef CONFIG_USER_ONLY
>> /* Check for the dest on the same page as the start of the TB. */
>> return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
>> +#else
>> + /* linux-user doesn't need to fear pagefaults for exec swap-in */
>> + return false;
>> +#endif
>> }
>
> No, we still need to stop at pages for breakpoints.
... and munmap for e.g. dlclose.
r~
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 18:02 ` Richard Henderson
@ 2023-07-30 18:19 ` Helge Deller
0 siblings, 0 replies; 7+ messages in thread
From: Helge Deller @ 2023-07-30 18:19 UTC (permalink / raw)
To: Richard Henderson, Laurent Vivier, qemu-devel
On 7/30/23 20:02, Richard Henderson wrote:
> On 7/30/23 11:01, Richard Henderson wrote:
>> On 7/30/23 10:56, Helge Deller wrote:
>>> +++ b/accel/tcg/translator.c
>>> @@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr dest)
>>> return false;
>>> }
>>>
>>> +#ifndef CONFIG_USER_ONLY
>>> /* Check for the dest on the same page as the start of the TB. */
>>> return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
>>> +#else
>>> + /* linux-user doesn't need to fear pagefaults for exec swap-in */
>>> + return false;
>>> +#endif
>>> }
>>
>> No, we still need to stop at pages for breakpoints.
> ... and munmap for e.g. dlclose.
Ok, so we can't optimize for linux-user.
But what about my other question: Shouldn't it be
return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) != 0;
?
helge
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 17:56 [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb() Helge Deller
2023-07-30 18:01 ` Richard Henderson
@ 2023-07-30 20:03 ` Richard Henderson
2023-07-30 20:37 ` Helge Deller
1 sibling, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2023-07-30 20:03 UTC (permalink / raw)
To: Helge Deller, Laurent Vivier, qemu-devel
On 7/30/23 10:56, Helge Deller wrote:
> I'm quite unclear about translator_use_goto_tb() for qemu-user
> emulation....(and in general).
>
> Based on the function name, the function translator_use_goto_tb() shall
> help to decide if a program should use goto_tb() and exit_tb() to jump
> to the next instruction.
>
> Currently, if the destination is on the same page, it returns true.
> I wonder, if it shouldn't return false in this case instead, because
> arches have code like this: (taken from target/hppa/translate.c):
> if (... && translator_use_goto_tb(ctx, f)) {
> tcg_gen_goto_tb(which);
> tcg_gen_movi_reg(cpu_iaoq_f, f);
> tcg_gen_movi_reg(cpu_iaoq_b, b);
> tcg_gen_exit_tb(ctx->base.tb, which);
> } else {
> copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
> copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
> tcg_gen_lookup_and_goto_ptr();
> }
>
> Shouldn't, if the destination is on the same page, the (faster?)
> path with tcg_gen_lookup_and_goto_ptr() be taken instead?
No, because tcg_gen_lookup_and_goto_ptr is not the faster path.
That always involves a lookup, then an indirect branch.
The goto_tb path is linked, so only requires a lookup once, and the branch may be direct
(depending on the host architecture).
r~
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 20:03 ` Richard Henderson
@ 2023-07-30 20:37 ` Helge Deller
2023-07-31 14:50 ` Richard Henderson
0 siblings, 1 reply; 7+ messages in thread
From: Helge Deller @ 2023-07-30 20:37 UTC (permalink / raw)
To: Richard Henderson, Laurent Vivier, qemu-devel
On 7/30/23 22:03, Richard Henderson wrote:
> On 7/30/23 10:56, Helge Deller wrote:
>> I'm quite unclear about translator_use_goto_tb() for qemu-user
>> emulation....(and in general).
>>
>> Based on the function name, the function translator_use_goto_tb() shall
>> help to decide if a program should use goto_tb() and exit_tb() to jump
>> to the next instruction.
>>
>> Currently, if the destination is on the same page, it returns true.
>> I wonder, if it shouldn't return false in this case instead, because
>> arches have code like this: (taken from target/hppa/translate.c):
>> if (... && translator_use_goto_tb(ctx, f)) {
>> tcg_gen_goto_tb(which);
>> tcg_gen_movi_reg(cpu_iaoq_f, f);
>> tcg_gen_movi_reg(cpu_iaoq_b, b);
>> tcg_gen_exit_tb(ctx->base.tb, which);
>> } else {
>> copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
>> copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
>> tcg_gen_lookup_and_goto_ptr();
>> }
>>
>> Shouldn't, if the destination is on the same page, the (faster?)
>> path with tcg_gen_lookup_and_goto_ptr() be taken instead?
>
> No, because tcg_gen_lookup_and_goto_ptr is not the faster path.
> That always involves a lookup, then an indirect branch.
Ah, ok. So my assumption was wrong, and this explains it.
> The goto_tb path is linked, so only requires a lookup once, and the
> branch may be direct (depending on the host architecture).
Probably the last question in this regard:
This code:
IN:
0x00010c98: cmpib,<>,n 0,r19,0x10c98
generates "nop/jmp" in the code:
the tcg_gen_goto_tb() branch:
OUT:
0x7fd7e400070e: 85 db testl %ebx, %ebx
0x7fd7e4000710: 0f 85 20 00 00 00 jne 0x7fd7e4000736
0x7fd7e4000716: 90 nop <- from "tcg_gen_op1i(INDEX_op_goto_tb, idx)" in tcg_gen_goto_tb()
0x7fd7e4000717: e9 00 00 00 00 jmp 0x7fd7e400071c <- jump is effective useless.
0x7fd7e400071c: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp)
0x7fd7e4000723: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp)
0x7fd7e400072a: 48 8d 05 0f ff ff ff leaq -0xf1(%rip), %rax
0x7fd7e4000731: e9 e2 f8 ff ff jmp 0x7fd7e4000018
0x7fd7e4000736: 90 nop <- here too.
0x7fd7e4000737: e9 00 00 00 00 jmp 0x7fd7e400073c
0x7fd7e400073c: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp)
0x7fd7e4000743: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp)
0x7fd7e400074a: 48 8d 05 f0 fe ff ff leaq -0x110(%rip), %rax
0x7fd7e4000751: e9 c2 f8 ff ff jmp 0x7fd7e4000018
I assume those nops/jmp+0 is to be able to insert breakpoints?
But those nops/jmps are never in the tcg_gen_lookup_and_goto_ptr() branch,
probably because breakpoints are checked in HELPER(lookup_tb_ptr), right?
0x7ff47c0006d0: 0f 85 18 00 00 00 jne 0x7ff47c0006ee
0x7ff47c0006d6: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp)
0x7ff47c0006dd: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp)
0x7ff47c0006e4: 48 8b fd movq %rbp, %rdi
0x7ff47c0006e7: e8 34 bf 42 0f callq 0x7ff48b42c620
0x7ff47c0006ec: ff e0 jmpq *%rax
0x7ff47c0006ee: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp)
0x7ff47c0006f5: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp)
0x7ff47c0006fc: 48 8b fd movq %rbp, %rdi
0x7ff47c0006ff: e8 1c bf 42 0f callq 0x7ff48b42c620
0x7ff47c000704: ff e0 jmpq *%rax
Question:
Couldn't those nops be avoided in the tcg_gen_goto_tb() branch too, e.g.
if breakpoints are checked when returning through the prologue exit path?
Thanks!
Helge
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()
2023-07-30 20:37 ` Helge Deller
@ 2023-07-31 14:50 ` Richard Henderson
0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2023-07-31 14:50 UTC (permalink / raw)
To: Helge Deller, Laurent Vivier, qemu-devel
On 7/30/23 13:37, Helge Deller wrote:
> On 7/30/23 22:03, Richard Henderson wrote:
>> On 7/30/23 10:56, Helge Deller wrote:
>>> I'm quite unclear about translator_use_goto_tb() for qemu-user
>>> emulation....(and in general).
>>>
>>> Based on the function name, the function translator_use_goto_tb() shall
>>> help to decide if a program should use goto_tb() and exit_tb() to jump
>>> to the next instruction.
>>>
>>> Currently, if the destination is on the same page, it returns true.
>>> I wonder, if it shouldn't return false in this case instead, because
>>> arches have code like this: (taken from target/hppa/translate.c):
>>> if (... && translator_use_goto_tb(ctx, f)) {
>>> tcg_gen_goto_tb(which);
>>> tcg_gen_movi_reg(cpu_iaoq_f, f);
>>> tcg_gen_movi_reg(cpu_iaoq_b, b);
>>> tcg_gen_exit_tb(ctx->base.tb, which);
>>> } else {
>>> copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
>>> copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
>>> tcg_gen_lookup_and_goto_ptr();
>>> }
>>>
>>> Shouldn't, if the destination is on the same page, the (faster?)
>>> path with tcg_gen_lookup_and_goto_ptr() be taken instead?
>>
>> No, because tcg_gen_lookup_and_goto_ptr is not the faster path.
>> That always involves a lookup, then an indirect branch.
>
> Ah, ok. So my assumption was wrong, and this explains it.
>
>> The goto_tb path is linked, so only requires a lookup once, and the
>> branch may be direct (depending on the host architecture).
> Probably the last question in this regard:
>
> This code:
> IN:
> 0x00010c98: cmpib,<>,n 0,r19,0x10c98
>
> generates "nop/jmp" in the code:
>
> the tcg_gen_goto_tb() branch:
> OUT:
> 0x7fd7e400070e: 85 db testl %ebx, %ebx
> 0x7fd7e4000710: 0f 85 20 00 00 00 jne 0x7fd7e4000736
> 0x7fd7e4000716: 90 nop <- from
> "tcg_gen_op1i(INDEX_op_goto_tb, idx)" in tcg_gen_goto_tb()
> 0x7fd7e4000717: e9 00 00 00 00 jmp 0x7fd7e400071c <- jump is effective
> useless.
> 0x7fd7e400071c: c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp)
> 0x7fd7e4000723: c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp)
> 0x7fd7e400072a: 48 8d 05 0f ff ff ff leaq -0xf1(%rip), %rax
> 0x7fd7e4000731: e9 e2 f8 ff ff jmp 0x7fd7e4000018
> 0x7fd7e4000736: 90 nop <- here too.
> 0x7fd7e4000737: e9 00 00 00 00 jmp 0x7fd7e400073c
> 0x7fd7e400073c: c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp)
> 0x7fd7e4000743: c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp)
> 0x7fd7e400074a: 48 8d 05 f0 fe ff ff leaq -0x110(%rip), %rax
> 0x7fd7e4000751: e9 c2 f8 ff ff jmp 0x7fd7e4000018
>
> I assume those nops/jmp+0 is to be able to insert breakpoints?
No.
The destination of the jmp is patched by tb_target_set_jmp_target, which happens some time
after this disassembly. The nop is present to ensure that the patch point is aligned, so
that it is one 4-byte atomic store.
r~
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-31 14:51 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-30 17:56 [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb() Helge Deller
2023-07-30 18:01 ` Richard Henderson
2023-07-30 18:02 ` Richard Henderson
2023-07-30 18:19 ` Helge Deller
2023-07-30 20:03 ` Richard Henderson
2023-07-30 20:37 ` Helge Deller
2023-07-31 14:50 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).