qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: Question on implementation detail of `temp_sync`
       [not found] ` <tencent_3C5D583315945B14647C946B@qq.com>
@ 2020-08-05  8:39   ` Alex Bennée
  0 siblings, 0 replies; 2+ messages in thread
From: Alex Bennée @ 2020-08-05  8:39 UTC (permalink / raw)
  To: lrwei; +Cc: Richard Henderson, QEMU Developers, qemu-discuss


lrwei <lrwei@bupt.edu.cn> writes:

> Sorry for the unintentional sending of an uncompleted message.

Questions about the internals of the TCG are very much in the remit of
qemu-devel so are likely to get missed on qemu-discuss which is more
aimed at user questions.

>
<re-pasted to fix html noise>

> I understands that the current code works, but gets confused on why `ts` needs to be loaded in to a register when `free_or_dead` is not
> set.

It isn't, the break leaves the switch statement once it stores the
constant to memory.

> For example in the following scenario:
> movi_i32    r0, 0x1
> add_i32      r1, r1, r0
> ...
> (where r0 is not used any more, and both r0 and r1 are globals)

> If I am not mistaken, the code gen procedure of the first IR will call `temp_sync` with `free_or_dead` not set, which load the constant in to
> a register and store it back to memory. At this time, `r0` will be `TEMP_VAL_REG` instead of `TEMP_VAL_CONST`, so the following IR can't
> embed this constant operand in the assembly instruction it produces. Also, this results in a seemingly useless register allocation (, why
> don't the further use of r0 use the constant directly?)

Is this what you are actually seeing generated? If you run with -d
in_asm,op,op_opt,out_asm it should be clear what actually happened.

> So I wonder whether there is any reason for this loading a constant into register, I'll be very appreciated if someone can point out the
> reason for me.

<snip>
>
>
> Thanks in advance.
> lrwei&nbsp; 
> &nbsp;
> ------------------&nbsp;Original&nbsp;------------------
> From: &nbsp;"lrwei"<lrwei@bupt.edu.cn&gt;;
> Date: &nbsp;Tue, Aug 4, 2020 12:06 PM
> To: &nbsp;"qemu-discuss"<qemu-discuss@nongnu.org&gt;; 
> Subject: &nbsp;Question on implementation detail of `temp_sync`
>
<re-pasted fixing html noise>

> Hello to the list,
> Recently I have been studying the code of TCG, and get confused by the following detail in function `temp_sync` in tcg/tcg.c:

>     case TEMP_VAL_CONST:
>         /* If we're going to free the temp immediately, then we won't
>            require it later in a register, so attempt to store the
>            constant to memory directly.  */
>         if (free_or_dead
>            && tcg_out_sti(s, ts->type, ts->val,
>                            ts->mem_base->reg, ts->mem_offset)) {
>            break;
>         }
>         temp_load(s, ts, tcg_target_available_regs[ts->type],
>                   allocated_regs, preferred_regs);
>         /* fallthrough */

> movi_i32

> Would it be better to remove the `free_or_dead` in the if statement, i.e. turn the function to be:

>     case TEMP_VAL_CONST:
>         if (tcg_out_sti(s, ts->type, ts->val,
>                            ts->mem_base->reg, ts->mem_offset)) {
>            break;
>         }
>         temp_load(s, ts, tcg_target_available_regs[ts->type],
>                   allocated_regs, preferred_regs);
>         /* fallthrough */


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Question on implementation detail of `temp_sync`
@ 2020-08-06 15:53 lrwei
  0 siblings, 0 replies; 2+ messages in thread
From: lrwei @ 2020-08-06 15:53 UTC (permalink / raw)
  To: Alex Benn e; +Cc: Richard Henderson, QEMU Developers, qemu-discuss

> lrwei <[hidden email]> writes:
> 
> > Sorry for the unintentional sending of an uncompleted message.
> 
> Questions about the internals of the TCG are very much in the remit of
> qemu-devel so are likely to get missed on qemu-discuss which is more
> aimed at user questions.

Thanks for your reply! Next time I will send my questions there.

> 
> <re-pasted to fix html noise>
> 
> > I understands that the current code works, but gets confused on why
> > `ts` needs to be loaded in to a register when `free_or_dead` is not
> > set.
> 
> It isn't, the break leaves the switch statement once it stores the
> constant to memory.

It seems to me that if `free_or_dead` was zero, further `tcg_out_sti`
and `break` in the if-block won't be executed at all. And that is
exactly what confused me.

> 
> > For example in the following scenario:
> > movi_i32     r0, 0x1
> > add_i32      r1, r1, r0
> > ...
> > (where r0 is not used any more, and both r0 and r1 are globals)
> 
> > If I am not mistaken, the code gen procedure of the first IR will call
> > `temp_sync` with `free_or_dead` not set, which load the constant in to
> > a register and store it back to memory. At this time, `r0` will be
> > `TEMP_VAL_REG` instead of `TEMP_VAL_CONST`, so the following IR can't
> > embed this constant operand in the assembly instruction it produces.
> > Also, this results in a seemingly useless register allocation (, why
> > don't the further use of r0 use the constant directly?)
> 
> Is this what you are actually seeing generated? If you run with -d
> in_asm,op,op_opt,out_asm it should be clear what actually happened.

The above example is originally made up to illustrate my question, but
I modified the frontend to directly emit the above IR (meaninglessly),
and here is the result:

---- Pasting output of Qemu ----

QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
 ld_i32 tmp0,env,$0xfffffffffffffff0
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0

 ---- 00000000
 movi_i32 r0,$0x1
 add_i32 r1,r1,r0
 movi_i32 pc,$0x4
 call hlt,$0x8,$0,env
 exit_tb $0x0
 set_label $L0
 exit_tb $0x7f8510000043

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfffffffffffffff0      dead: 1  pref=0xffff
 movi_i32 tmp1,$0x0                       pref=0xffff
 brcond_i32 tmp0,tmp1,lt,$L0              dead: 0 1

 ---- 00000000                         
 movi_i32 r0,$0x1                         sync: 0  pref=0xffff
 add_i32 r1,r1,r0                         sync: 0  dead: 0 1 2  pref=0xffff
 movi_i32 pc,$0x4                         sync: 0  dead: 0  pref=0xffff
 call hlt,$0x8,$0,env                     dead: 0
 set_label $L0                          
 exit_tb $0x7f8510000043                

OUT: [size=72]
0x7f8510000100:  mov    -0x10(%rbp),%ebx		[tb header & initial instruction]
0x7f8510000103:  test   %ebx,%ebx
0x7f8510000105:  jl     0x7f8510000130
0x7f851000010b:  mov    $0x1,%ebx
0x7f8510000110:  mov    %ebx,0x0(%rbp)
0x7f8510000113:  mov    0x4(%rbp),%r12d
0x7f8510000117:  add    %r12d,%ebx
0x7f851000011a:  mov    %ebx,0x4(%rbp)
0x7f851000011d:  movl   $0x4,0x80(%rbp)
0x7f8510000127:  mov    %rbp,%rdi
0x7f851000012a:  callq  *0x10(%rip)        # 0x7f8510000140
0x7f8510000130:  lea    -0xf4(%rip),%rax        # 0x7f8510000043
0x7f8510000137:  jmpq   0x7f8510000018
  data: [size=8]
0x7f8510000140:  .quad  0x0000556e15d000d0

---- End of Qemu output ----

The constant is first loaded into `ebx` and then synced back to
its memory location. And further addition is of register format.
And I tried it again after remove the `free_or_dead` check in
`temp_sync`, it gives:

---- Pasting output of Qemu ----

QEMU 5.0.93 monitor - type 'help' for more information
(qemu) OP:
 ld_i32 tmp0,env,$0xfffffffffffffff0
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0

 ---- 00000000
 movi_i32 r0,$0x1
 add_i32 r1,r1,r0
 movi_i32 pc,$0x4
 call hlt,$0x8,$0,env
 exit_tb $0x0
 set_label $L0
 exit_tb $0x7f9db0000043

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfffffffffffffff0      dead: 1  pref=0xffff
 movi_i32 tmp1,$0x0                       pref=0xffff
 brcond_i32 tmp0,tmp1,lt,$L0              dead: 0 1

 ---- 00000000                         
 movi_i32 r0,$0x1                         sync: 0  pref=0xffff
 add_i32 r1,r1,r0                         sync: 0  dead: 0 1 2  pref=0xffff
 movi_i32 pc,$0x4                         sync: 0  dead: 0  pref=0xffff
 call hlt,$0x8,$0,env                     dead: 0
 set_label $L0                          
 exit_tb $0x7f9db0000043                

OUT: [size=72]
0x7f9db0000100:  mov    -0x10(%rbp),%ebx		[tb header & initial instruction]
0x7f9db0000103:  test   %ebx,%ebx
0x7f9db0000105:  jl     0x7f9db000012d
0x7f9db000010b:  movl   $0x1,0x0(%rbp)
0x7f9db0000112:  mov    0x4(%rbp),%ebx
0x7f9db0000115:  inc    %ebx
0x7f9db0000117:  mov    %ebx,0x4(%rbp)
0x7f9db000011a:  movl   $0x4,0x80(%rbp)
0x7f9db0000124:  mov    %rbp,%rdi
0x7f9db0000127:  callq  *0x13(%rip)        # 0x7f9db0000140
0x7f9db000012d:  lea    -0xf1(%rip),%rax        # 0x7f9db0000043
0x7f9db0000134:  jmpq   0x7f9db0000018
  data: [size=8]
0x7f9db0000140:  .quad  0x0000564aed2120d0

---- End of Qemu output ----

This syncs the constant directly back to memory and use `inc`
instead of register add for `add_i32`, which is kind of the intuitive
thing to do. But I'm not sure whether it will break something else.

Given the above result, I'm curious about why the register is load here.
Is it because that repeatedly embedding the same large constant in
instructions inflates the code? (or it could be other reasons?) Looking
forward to your further reply!

Thanks,
lrwei

> 
> > So I wonder whether there is any reason for this loading a constant
> > into register, I'll be very appreciated if someone can point out the
> > reason for me.
> 
> <snip>
> 
> >
> >
> > Thanks in advance.
> > lrwei
> > ------------------Original------------------
> > From: "lrwei"<[hidden email]>
> > Date: Tue, Aug 4, 2020 12:06 PM
> > To: "qemu-discuss"<[hidden email]>
> > Subject: Question on implementation detail of `temp_sync`
> >
> <re-pasted fixing html noise>
> 
> > Hello to the list,
> > Recently I have been studying the code of TCG, and get confused by
> > the following detail in function `temp_sync` in tcg/tcg.c:
> 
> >     case TEMP_VAL_CONST:
> >         /* If we're going to free the temp immediately, then we won't
> >            require it later in a register, so attempt to store the
> >            constant to memory directly.  */
> >         if (free_or_dead
> >             && tcg_out_sti(s, ts->type, ts->val,
> >                            ts->mem_base->reg, ts->mem_offset)) {
> >             break;
> >         }
> >         temp_load(s, ts, tcg_target_available_regs[ts->type],
> >                   allocated_regs, preferred_regs);
> >         /* fallthrough */
> 
> > Would it be better to remove the `free_or_dead` in the if statement,
> > i.e. turn the function to be:
> 
> >     case TEMP_VAL_CONST:
> >         if (tcg_out_sti(s, ts->type, ts->val,
> >                            ts->mem_base->reg, ts->mem_offset)) {
> >             break;
> >         }
> >         temp_load(s, ts, tcg_target_available_regs[ts->type],
> >                   allocated_regs, preferred_regs);
> >         /* fallthrough */
> 
> 
> -- 
> Alex Bennée

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-08-06 15:56 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <tencent_61ECB8BB3639D7BF2284FBDC@qq.com>
     [not found] ` <tencent_3C5D583315945B14647C946B@qq.com>
2020-08-05  8:39   ` Question on implementation detail of `temp_sync` Alex Bennée
2020-08-06 15:53 lrwei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).