From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:44397) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R5mKj-0006f9-VB for qemu-devel@nongnu.org; Mon, 19 Sep 2011 18:29:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1R5mKi-0006tL-GY for qemu-devel@nongnu.org; Mon, 19 Sep 2011 18:29:01 -0400 Received: from auth.b.painless.aaisp.net.uk ([90.155.4.48]:52995) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1R5mKi-0006tH-4i for qemu-devel@nongnu.org; Mon, 19 Sep 2011 18:29:00 -0400 Received: from zubnet.me.uk ([81.187.243.246] helo=circe) by b.painless.aaisp.net.uk with esmtp (Exim 4.72) (envelope-from ) id 1R5mKf-0005fj-Ov for qemu-devel@nongnu.org; Mon, 19 Sep 2011 23:28:58 +0100 Date: Mon, 19 Sep 2011 23:28:54 +0100 From: Stuart Brady Message-ID: <20110919222854.GB19005@zubnet.me.uk> References: <4E74FC29.1050003@mail.berlios.de> <1316289634-18786-6-git-send-email-weil@mail.berlios.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org On Sun, Sep 18, 2011 at 10:03:07AM +0000, Blue Swirl wrote: > I was wondering if this #ifdeffery is needed since TCI would probably > give more performance compared to the alternative, TCG generated > emulation sequences. But it could be useful for testing those. Maybe > there should be two options to enable and disable all non-mandatory TCI > versions. We could perhaps even allow enabling/disabling of optional ops from the command line, although I this would complicate tcg-op.h pretty badly. > > +/* > > + * This code implements a TCG which does not generate machine code f= or some > > + * real target machine but which generates virtual machine code for = an > > + * interpreter. Interpreted pseudo code is slow, but it works on any= host. > > + * > > + * Some remarks might help in understanding the code: > > + * > > + * "target" or "TCG target" is the machine which runs the generated = code. > > + * This is different to the usual meaning in QEMU where "target" is = the > > + * emulated machine. So normally QEMU host is identical to TCG targe= t. > > + * Here the TCG target is a virtual machine, but this virtual machin= e must > > + * use the same word size like the real machine. >=20 > Why, for performance? Allowing that could be useful for testing TCG, > perhaps we could even use non-native endianness? I suppose any mismatch between TCGv_ptr and the host pointer size must be avoided. Perhaps it would be worth adding a TCG_TARGET_PTR_BITS and converting users of TCG_TARGET_REG_BITS appropriately. I'm surprised at just how few places I've found that test TCG_TARGET_REG_BITS to determine the width of a TCGv_ptr. > > + * Therefore, we need both 32 and 64 bit virtual machines (interpret= er). > > + */ > > + > > +#if !defined(TCG_TARGET_H) > > +#define TCG_TARGET_H > > + > > +#include "config-host.h" > > + > > +#define TCG_TARGET_INTERPRETER 1 > > + > > +#ifdef CONFIG_DEBUG_TCG > > +/* Enable debug output. */ > > +#define CONFIG_DEBUG_TCG_INTERPRETER > > +#endif > > + > > +#if 0 /* TCI tries to emulate a little endian host. */ > > +#if defined(HOST_WORDS_BIGENDIAN) > > +# define TCG_TARGET_WORDS_BIGENDIAN > > +#endif > > +#endif > > + > > +/* Optional instructions. */ > > + > > +#define TCG_TARGET_HAS_bswap16_i32 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_bswap32_i32 =A0 =A0 =A01 > > +/* Not more than one of the next two defines must be 1. */ > > +#define TCG_TARGET_HAS_div_i32 =A0 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_div2_i32 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_ext8s_i32 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_ext16s_i32 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_ext8u_i32 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_ext16u_i32 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_andc_i32 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_deposit_i32 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_eqv_i32 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_nand_i32 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_nor_i32 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_neg_i32 =A0 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_not_i32 =A0 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_orc_i32 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_rot_i32 =A0 =A0 =A0 =A0 =A01 > > + > > +#if TCG_TARGET_REG_BITS =3D=3D 64 > > +#define TCG_TARGET_HAS_bswap16_i64 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_bswap32_i64 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_bswap64_i64 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_deposit_i64 =A0 =A0 =A00 > > +/* Not more than one of the next two defines must be 1. */ > > +#define TCG_TARGET_HAS_div_i64 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_div2_i64 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_ext8s_i64 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_ext16s_i64 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_ext32s_i64 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_ext8u_i64 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_ext16u_i64 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_ext32u_i64 =A0 =A0 =A0 1 > > +#define TCG_TARGET_HAS_andc_i64 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_eqv_i64 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_nand_i64 =A0 =A0 =A0 =A0 0 > > +#define TCG_TARGET_HAS_nor_i64 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_neg_i64 =A0 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_not_i64 =A0 =A0 =A0 =A0 =A01 > > +#define TCG_TARGET_HAS_orc_i64 =A0 =A0 =A0 =A0 =A00 > > +#define TCG_TARGET_HAS_rot_i64 =A0 =A0 =A0 =A0 =A01 > > +#endif /* TCG_TARGET_REG_BITS =3D=3D 64 */ > > + > > +/* Offset to user memory in user mode. */ > > +#define TCG_TARGET_HAS_GUEST_BASE > > + > > +/* Number of registers available. > > + =A0 For 32 bit hosts, we need more than 8 registers (call arguments= ). */ >=20 > On i386 there certainly aren't 8 registers, where does 8 come from? We need eight registers to allow passing of four 32-bit arguments using the registers. Alternatively, we could use a stack to pass arguments. For this, we'd need to point our stack register (tci_reg[TCG_REG_CALL_STACK]) at some memory that we use as a stack. It wouldn't need to be much, just enough to accomodate the arguments, AFAICT. Unless we use the stack pointer register, we should not define TCG_REG_CALL_STACK at all, and we should #ifdef out the parts of tcg_reg_alloc_call() that use it. If there's no stack available, the code should abort in the case where there aren't enough TCI registers for all of the parameters being passed, although in this case, there should be a compile time check to ensure that: ARRAY_SIZE(tcg_target_call_iarg_regs) =3D=3D (MAX_OPC_PARAM_PER_ARG * MAX_OPC_PARAM_IARGS) We might want to relax that check to allow tcg_target_call_iarg_regs to have list more TCI registers than required, but those registers would just be wasted. BTW, note that TCG limits us to 64 registers (due to TCGRegSet). I expect this could be changed (for TCI only) if needed, though, but if we allow a user-supplied TCG_TARGET_NB_REGS, then we should check that it is no more than 64, at least for the time being. > > +/* #define TCG_TARGET_NB_REGS 32 */ > > + > > +/* List of registers which are used by TCG. */ > > +typedef enum { > > + =A0 =A0TCG_REG_R0 =3D 0, > > + =A0 =A0TCG_REG_R1, > > + =A0 =A0TCG_REG_R2, > > + =A0 =A0TCG_REG_R3, > > + =A0 =A0TCG_REG_R4, > > + =A0 =A0TCG_REG_R5, > > + =A0 =A0TCG_REG_R6, > > + =A0 =A0TCG_REG_R7, > > + =A0 =A0TCG_AREG0 =3D TCG_REG_R7, > > +#if TCG_TARGET_NB_REGS >=3D 16 > > + =A0 =A0TCG_REG_R8, > > + =A0 =A0TCG_REG_R9, > > + =A0 =A0TCG_REG_R10, > > + =A0 =A0TCG_REG_R11, > > + =A0 =A0TCG_REG_R12, > > + =A0 =A0TCG_REG_R13, > > + =A0 =A0TCG_REG_R14, > > + =A0 =A0TCG_REG_R15, > > +#if TCG_TARGET_NB_REGS >=3D 32 > > + =A0 =A0TCG_REG_R16, > > + =A0 =A0TCG_REG_R17, > > + =A0 =A0TCG_REG_R18, > > + =A0 =A0TCG_REG_R19, > > + =A0 =A0TCG_REG_R20, > > + =A0 =A0TCG_REG_R21, > > + =A0 =A0TCG_REG_R22, > > + =A0 =A0TCG_REG_R23, > > + =A0 =A0TCG_REG_R24, > > + =A0 =A0TCG_REG_R25, > > + =A0 =A0TCG_REG_R26, > > + =A0 =A0TCG_REG_R27, > > + =A0 =A0TCG_REG_R28, > > + =A0 =A0TCG_REG_R29, > > + =A0 =A0TCG_REG_R30, > > + =A0 =A0TCG_REG_R31, > > +#endif This seems unfortunate to me... I wonder whether some sort of chain of defines would be better: /* already defined in osdep.h */ #define xglue(x, y) x ## y #define glue(x, y) xglue(x, y) #define stringify(s) tostring(s) #define tostring(s) #s /* common definitions */ #define NUM_DEF_1(n) n(0) #define NUM_DEF_2(n) NUM_DEF_1(n) n(1) #define NUM_DEF_3(n) NUM_DEF_2(n) n(2) #define NUM_DEF_4(n) NUM_DEF_3(n) n(3) #define NUM_DEF_5(n) NUM_DEF_4(n) n(4) #define NUM_DEF_6(n) NUM_DEF_5(n) n(5) #define NUM_DEF_7(n) NUM_DEF_6(n) n(6) #define NUM_DEF_8(n) NUM_DEF_7(n) n(7) #define NUM_DEF_9(n) NUM_DEF_8(n) n(8) #define NUM_DEF_10(n) NUM_DEF_9(n) n(9) #define NUM_DEF_11(n) NUM_DEF_10(n) n(10) #define NUM_DEF_12(n) NUM_DEF_11(n) n(11) #define NUM_DEF_13(n) NUM_DEF_12(n) n(12) #define NUM_DEF_14(n) NUM_DEF_13(n) n(13) #define NUM_DEF_15(n) NUM_DEF_14(n) n(14) #define NUM_DEF_16(n) NUM_DEF_15(n) n(15) #define DEF_TCG_REGS glue(NUM_DEF_,TCG_TARGET_NB_REGS) /* tcg-target.h */ #define DEF_TCG_REG_NUM(x) TCG_REG_R##x, typedef enum { DEF_TCG_REGS(DEF_TCG_REG_NUM) }; /* tcg-target.c */ #define DEF_TCG_REG_NAME(x) tostring(r##x), static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] =3D= { DEF_TCG_REGS(DEF_TCG_REG_NAME) }; Okay, so I accept that this is rather horrible, but it does allow us to define the right number of entries based on TCG_TARGET_NB_REGS without masses of #ifdefs or relying on the compiler for the host. It might be better to allow the number of registers to be defined at run-time -- although TCG would have to be modified not to rely upon TCG_TARGET_NB_REGS when compiled for TCI. Cheers, --=20 Stuart