x86 TCG helpers clobbered registers

* x86 TCG helpers clobbered registers
@ 2020-12-04 15:36 Stephane Duverger
  2020-12-04 19:35 ` Richard Henderson
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Duverger @ 2020-12-04 15:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Richard Henderson

Hello,

While looking at tcg/i386/tcg-target.c.inc:tcg_out_qemu_st(), I
discovered that the TCG generates a call to a store helper at the end
of the TB which is executed on TLB miss and get back to the remaining
translated ops. I tried to mimick this behavior around the fast path
(right between tcg_out_tlb_load() and tcg_out_qemu_st_direct()) to
filter on memory store accesses.

I know there is now TCG plugins for that purpose at TCG IR level,
which every tcg-target might benefit. FWIW, my design choice was more
led by the fact that I always work on an x86 host and plugins did not
exist by the time. Anyway, the point is more related to generating a
call to a helper at the TCG IR level (classic scenario), or later
during tcg-target code generation (slow path for instance).

The TCG when calling a helper knows that some registers will be call
clobbered and as such must free them. This is what I observed in
tcg_reg_alloc_call():

/* clobber call registers */
for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
    if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
        tcg_reg_free(s, i, allocated_regs);
    }
}

But in our case (ie. INDEX_op_qemu_st_i32), the TCG code path comes
from:

tcg_reg_alloc_op()
  tcg_out_op()
    tcg_out_qemu_st()

Then tcg_out_tlb_load() will inject a 'jmp' to the slow path, whose
generated code does not seem to take care of every call clobbered
registers, if we look at tcg_out_qemu_st_slow_path().

First for an i386 (32bits) tcg-target, as expected, the helper
arguments are injected into the stack. I noticed that 'esp' is not
shifted down before stacking up the args, which might corrupt last
stacked words.

Second, for both 32/64 bits tcg-targets since all of the 'call
clobbered' registers are not preserved, it may happen that depending
on the code executed by the helper (and so generated by GCC) these
registers will be clobbered (ie. R10 for x86-64).

While this never happened for the slow path helper call, I observed
that my guest had trouble running when filtering memory in the same
fashion the slow path helper would be called. Conversely, if I
push/pop all of the call clobbered regs around the call to the helper,
everything runs as expected.

Is this correct ? Am I missing something ?

Thanks a lot in advance for your eagle eye on this :)

^ permalink raw reply	[flat|nested] 7+ messages in thread