Re: [Qemu-devel] some thoughts on nanoMIPS solution

* Re: [Qemu-devel] some thoughts on nanoMIPS solution
       [not found] <BN7PR08MB4868A12171F6E86D64E8CB3CC64F0@BN7PR08MB4868.namprd08.prod.outlook.com>
@ 2018-06-28 22:51 ` Richard Henderson
  0 siblings, 0 replies; only message in thread
From: Richard Henderson @ 2018-06-28 22:51 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: Petar Jovanovic, qemu-devel

On 06/28/2018 01:07 PM, Aleksandar Markovic wrote:
> Therefore, I think it makes sense to (more or less) split translate.c into,
> let's say, following set of files:
> 
>   * translate_cmn_chk.h - definitions of (typically inlined) functions check_XXX()
>   * translate_cmn_gen.h - declarations of gen_XXX() family of functions, and
>     definitions of such functions among them that are inlined
>   * translate_cmn_gen.c - definitions of non-inlined gen_XXX() functions
>     (including all their callees)
> 
>   * translate.h - common structure definitions such as DisasContext
>   * translate.c - the rest of code after splitting
> 
> 
> The nanoMIPS code would be, in such organization, in a separate file:
> 
>   * translate_nanomips.c

This sounds fine.  Thanks for being willing to tackle this.

> Let me explain in some additional details what I have in mind. Consider, for
> example, the function gen_logic(). It currently expects one argument to be
> OPC_ADD, OPC_OR, OPC_NOR, or OPC_XOR. Note that those are pre-nanoMIPS opcodes.
> nanoMIPS may or may not have the same respective opcodes. In order to break
> dependency of such functions on particular opcode tables, I propose introducing
> new enum types, independant on any opcode organization. For example, for
> gen_logic(), the new corresponding enum would be:
> 
> 
> enum LogicOp {
>     opLogic_AND             = 1000,
>     opLogic_OR,
>     opLogic_NOR,
>     opLogic_XOR,
> };
> 
> and the new declaration of gen_logic() would be:
> 
> 
> extern void gen_logic(DisasContext *ctx, LogicOp op, int rd, int rs, int rt);

I would suggest an alternative in which the enum is removed, and split each
case into a separate function.  There is a small amount of code that is shared
between the cases, and that can be handled via subroutines.

I suggest "gen_insn_<opcode>" as the prefix for this because there are quite a
number of "gen_foo" functions that are not in the form to implement an
architectural instruction exactly.  Naming provides an easy way to distinguish
the two cases, especially during transition.

For example,

/* Expand an operation on 3 integer registers.
 * The given operation must have no side effects
 * other than setting the destination.
 */
static void gen_rrr(DisasContext *ctx, int rd, int rs, int rt,
                    void (*gen_fn)(TCGv, TCGv, TCGv))
{
    TCGv reg_s = cpu_gpr[rs];
    TCGv reg_t = cpu_gpr[rt];
    TCGv zero = NULL;

    if (unlikely(rd == 0)) {
        return;
    }
    if (unlikely(rs == 0 || rt == 0)) {
        zero = tcg_const_tl(0);
        if (rs == 0) {
            reg_s = zero;
        }
        if (rt == 0) {
            reg_t = zero;
        }
    }

    gen_fn(cpu_gpr[rd], reg_s, reg_t);

    if (unlikely(zero != NULL)) {
        tcg_temp_free(zero);
    }
}

void gen_insn_and(DisasContext *ctx, int rd, int rs, int rt)
{
    gen_rrr(ctx, rd, rs, rt, tcg_gen_and_tl);
}

void gen_insn_xor(DisasContext *ctx, int rd, int rs, int rt)
{
    gen_rrr(ctx, rd, rs, rt, tcg_gen_xor_tl);
}

void gen_insn_nor(DisasContext *ctx, int rd, int rs, int rt)
{
    if (rd != 0) {
        /* NOT is a special case of NOR D,S,0.  */
        if (rs != 0 && rt == 0) {
            tcg_gen_not_tl(cpu_gpr[rd], cpu_gpr[rs]);
        } else {
            gen_rrr(ctx, rd, rs, rt, tcg_gen_nor_tl);
        }
    }
}

void gen_insn_or(DisasContext *ctx, int rd, int rs, int rt)
{
    if (rd != 0) {
        /* MOVE is a special case of OR D,S,0.  */
        if (rt == 0) {
            if (rs == 0) {
                tcg_gen_movi_tl(cpu_gpr[rd], 0);
            } else {
                tcg_gen_mov_tl(cpu_gpr[rd], cpu_gpr[rs]);
            }
        } else {
            gen_rrr(ctx, rd, rs, rt, tcg_gen_or_tl);
        }
    }
}

static void gen_addu(TCGv reg_d, TCGv reg_s, TCGv reg_t)
{
    tcg_gen_add_tl(reg_d, reg_s, reg_t);
    tcg_gen_ext32s_tl(reg_d, reg_d);
}

void gen_insn_addu(DisasContext *ctx, int rd, int rs, int rt)
{
    if (rd != 0) {
        /* Old assemblers expand MOVE with ADDU D,S,0.  */
        if (rt == 0) {
            if (rs == 0) {
                tcg_gen_movi_tl(cpu_gpr[rd], 0);
            } else {
                tcg_gen_ext32s_tl(cpu_gpr[rd], cpu_gpr[rs]);
            }
        } else {
            gen_rrr(ctx, rd, rs, rt, gen_addu);
        }
    }
}

While making this transformation, please bear in mind

(1) The simplifications, particularly with constant arguments,
    that are already made by tcg/tcg-op.c.

    E.g. d = s|s -> d = s, and d = s+0 -> d = s.

    There is no point in checking for these again in target/mips/.

(2) The likelihood of r0 appearing in a given argument.

    E.g. while it is possible to encode mul d,s,0, it will probably
    never appear as a set of arguments, and so it is not worth
    handling specially.

    So long as d, s and a temporary containing 0 are passed to the
    proper tcg expander, you will emulate the edge case correctly.

    The approach that we have taken for other targets, e.g. AArch64
    and Alpha, is to only recognize as special cases those pseudo
    instructions that are called out in the manual.  Or failing that,
    the pseudo instructions that are defined by the assembler.

    E.g. for mips, I see in gas:
    move	or     d,s,0
    move	daddu  d,s,0
    move        addu   d,s,0

    so above, I check for "or d,s,0" but not "or d,0,t".

(3) The simplifications that are made during tcg optimization.

    Even when we do not special case as per point 2, we may still
    produce optimal results out of the constant folding stage of
    tcg optimization.

    The balancing act based on total emulation time.

    Common operand combinations like the official NOP or MOVE are
    special-cased in the translator in order to produce fewer
    opcodes, which requires less processing further down the line.

    Uncommon operand combinations require more processing up front,
    but since they are (by definition) uncommon there is no payoff
    and merely introduces more branches during translation.

> The pseudocode of invoking gen_logic() for existing platforms would be:
> 
> case OPC_ADD:
>     gen_logic(opLogic_AND); break;
> case OPC_OR:
>     gen_logic(opLogic_OR); break;
> case OPC_NOR:
>     gen_logic(opLogic_NOR); break;
> case OPC_XOR:
>     gen_logic(opLogic_XOR); break;

  case OPC_ADD:
      gen_insn_and(...);
      break;
  case OPC_OR:
      gen_insn_or(...);
      break;
  case OPC_NOR:
      gen_insn_nor(...);
      break;
  case OPC_XOR:
      gen_insn_xor(...);
      break;

You can see that there's no difference in complexity between these two
solutions at the source-code level.

r~

^ permalink raw reply	[flat|nested] only message in thread