* [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
@ 2015-03-21  6:27 Emilio G. Cota
  2015-03-23 21:42 ` Stefan Weil
  0 siblings, 1 reply; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-21  6:27 UTC
  To: qemu-devel; +Cc: qemu-trivial, Richard Henderson

This brings down the size of the struct from 56 to 48 bytes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..3276924 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -429,8 +429,8 @@ typedef struct TCGTemp {
     int val_type;
     int reg;
     tcg_target_long val;
-    int mem_reg;
     intptr_t mem_offset;
+    int mem_reg;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
-- 
1.9.1
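
The effect of the one-line move can be reproduced outside QEMU with a small sketch. The struct names and the stand-in types below (`int` for `TCGType`, `intptr_t` for `tcg_target_long`) are assumptions made for illustration, not the QEMU definitions, and the sizes only hold on a typical LP64 host such as x86_64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: mem_reg sits between two 8-byte members. */
struct temp_before {
    int base_type;                 /* stand-in for TCGType */
    int type;                      /* stand-in for TCGType */
    int val_type;
    int reg;
    intptr_t val;                  /* stand-in for tcg_target_long */
    int mem_reg;                   /* 4 bytes of padding follow on LP64 */
    intptr_t mem_offset;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1; /* 4 bytes of padding follow on LP64 */
    const char *name;
};

/* Layout after the patch: mem_reg shares an 8-byte slot with the flags. */
struct temp_after {
    int base_type;
    int type;
    int val_type;
    int reg;
    intptr_t val;
    intptr_t mem_offset;
    int mem_reg;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;
    const char *name;
};

/* The exact sizes are only checked where int is 4 bytes and pointers are 8. */
#define IS_LP64_LIKE \
    (sizeof(int) == 4 && sizeof(void *) == 8 && sizeof(intptr_t) == 8)

_Static_assert(!IS_LP64_LIKE || sizeof(struct temp_before) == 56,
               "expected 56 bytes before the reordering");
_Static_assert(!IS_LP64_LIKE || sizeof(struct temp_after) == 48,
               "expected 48 bytes after the reordering");
```

If the file compiles on an LP64 host, both sizes quoted in the commit message hold for this stand-in layout; running pahole on the real object file remains the authoritative check.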


* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
  2015-03-21  6:27 [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes Emilio G. Cota
@ 2015-03-23 21:42 ` Stefan Weil
  2015-03-24  1:07   ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Weil @ 2015-03-23 21:42 UTC
  To: Emilio G. Cota, qemu-devel; +Cc: qemu-trivial, Richard Henderson

On 21.03.2015 at 07:27, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 48 bytes.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>   tcg/tcg.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..3276924 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -429,8 +429,8 @@ typedef struct TCGTemp {
>       int val_type;
>       int reg;
>       tcg_target_long val;
> -    int mem_reg;
>       intptr_t mem_offset;
> +    int mem_reg;
>       unsigned int fixed_reg:1;
>       unsigned int mem_coherent:1;
>       unsigned int mem_allocated:1;

Reviewed-by: Stefan Weil <sw@weilnetz.de>

TCGContext includes an array of TCGTemp, so TCGContext itself shrinks by
4 KiB (good for caching), and tcg.o now uses 55364 instead of 56116 bytes
(maybe faster, too).
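
The 4 KiB figure follows from simple arithmetic once the array length is known. The sketch below assumes 512 array slots, which is what 4 KiB divided by 8 bytes implies; the constant and function names are hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of TCGTemp slots in TCGContext (inferred: 4096 / 8). */
enum { ASSUMED_MAX_TEMPS = 512 };

/* Bytes saved in an array of n elements when each element shrinks. */
static inline size_t array_bytes_saved(size_t n, size_t old_size,
                                       size_t new_size)
{
    return n * (old_size - new_size);
}
```

With these assumptions, `array_bytes_saved(ASSUMED_MAX_TEMPS, 56, 48)` evaluates to 4096 bytes, matching the 4 KiB quoted above.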

Further optimizations are possible. TCGTemp can be reduced to 32 bytes,
as the output of pahole shows:

struct TCGTemp {
         TCGTempVal                 val_type:8; /*     0:24  4 */
         unsigned int               reg:8; /*     0:16  4 */
         unsigned int               mem_reg:8; /*     0: 8  4 */

         /* Bitfield combined with next fields */

         _Bool                      fixed_reg:1; /*     3: 7  1 */
         _Bool                      mem_coherent:1; /*     3: 6  1 */
         _Bool                      mem_allocated:1; /*     3: 5  1 */
         _Bool                      temp_local:1; /*     3: 4  1 */
         _Bool                      temp_allocated:1; /*     3: 3  1 */

         /* XXX 3 bits hole, try to pack */

         TCGType                    base_type:16; /*     4:16  4 */
         TCGType                    type:16; /*     4: 0  4 */
         tcg_target_long            val; /*     8     8 */
         intptr_t                   mem_offset; /*    16     8 */
         const char  *              name; /*    24     8 */

         /* size: 32, cachelines: 1, members: 13 */
         /* bit holes: 1, sum bit holes: 3 bits */
         /* last cacheline: 32 bytes */
};

Here I used a new enum type for val_type and reduced some fields to 8 or
16 bits.
I also put the two most often used values at the beginning, so they can be
addressed with no offset or only a small one ("often" meaning how often they
appear in the code; no runtime data is available).

Are such optimizations useful?

Stefan


* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
  2015-03-23 21:42 ` Stefan Weil
@ 2015-03-24  1:07   ` Richard Henderson
  2015-03-25 19:50     ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2015-03-24  1:07 UTC
  To: Stefan Weil, Emilio G. Cota, qemu-devel; +Cc: qemu-trivial

On 03/23/2015 02:42 PM, Stefan Weil wrote:
> Further optimizations are possible. TCGTemp can be reduced to 32 bytes as the
> output
> of pahole shows:
> 
> struct TCGTemp {
>         TCGTempVal                 val_type:8; /*     0:24  4 */

Need only be 2 bits.

>         unsigned int               reg:8; /*     0:16  4 */
>         unsigned int               mem_reg:8; /*     0: 8  4 */

Need only be 6 bits (ia64), but an aligned 8-bit slot probably performs best.

> 
>         /* Bitfield combined with next fields */
> 
>         _Bool                      fixed_reg:1; /*     3: 7  1 */
>         _Bool                      mem_coherent:1; /*     3: 6  1 */
>         _Bool                      mem_allocated:1; /*     3: 5  1 */
>         _Bool                      temp_local:1; /*     3: 4  1 */
>         _Bool                      temp_allocated:1; /*     3: 3  1 */
> 
>         /* XXX 3 bits hole, try to pack */
> 
>         TCGType                    base_type:16; /*     4:16  4 */
>         TCGType                    type:16; /*     4: 0  4 */

Need only be 1 bit, honestly, but 2 bits might be easier to arrange.  Anyway,
you're down to 23 bits from the word, or 16 bytes on a 32-bit host.  It's no
better than the 32 bytes you got for a 64-bit host though.


>         tcg_target_long            val; /*     8     8 */
>         intptr_t                   mem_offset; /*    16     8 */
>         const char  *              name; /*    24     8 */
> 
>         /* size: 32, cachelines: 1, members: 13 */
>         /* bit holes: 1, sum bit holes: 3 bits */
>         /* last cacheline: 32 bytes */
> };
> 
> Here I used a new enum type for val_type and reduced some values to 8 or 16 bit.
> I also put the two most often used values at the beginning, so they can be
> addressed without or with a small offset ("often" in the code, no runtime
> data available).
> 
> Are such optimizations useful?

Yes, I think so.  Especially because of the rather large arrays we build.


r~


* [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-24  1:07   ` Richard Henderson
@ 2015-03-25 19:50     ` Emilio G. Cota
  2015-03-27  9:55       ` Alex Bennée
  2015-03-27 14:58       ` Richard Henderson
  0 siblings, 2 replies; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-25 19:50 UTC
  To: Stefan Weil, Richard Henderson; +Cc: qemu-trivial, qemu-devel

This brings down the size of the struct from 56 to 32 bytes on 64-bit,
and to 16 bytes on 32-bit.

The appended patch adds macros to prevent us from mistakenly overflowing
the bitfields when more elements are added to the corresponding
enums/macros.

Note that reg/mem_reg need only 6 bits (for ia64), but for performance
it is probably better to align them to a byte address.

Given that TCGTemp is used in large arrays, this leads to a few KB
of savings. However, unpacking the bits takes additional code, so
the net effect on code size depends on the target (host is x86_64):

Before:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
  37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
  39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
  40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
  39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o

After:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
  37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
  39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
  40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
  39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o

Suggested-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..71ae7b2 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
-    TCG_TYPE_COUNT, /* number of different types */
+    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
 
     /* An alias for the size of the host register.  */
 #if TCG_TARGET_REG_BITS == 32
@@ -217,6 +217,9 @@ typedef enum TCGType {
 #endif
 } TCGType;
 
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS 1
+
 /* Constants for qemu_ld and qemu_st for the Memory Operation field.  */
 typedef enum TCGMemOp {
     MO_8     = 0,
@@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
 #define TEMP_VAL_REG   1
 #define TEMP_VAL_MEM   2
 #define TEMP_VAL_CONST 3
+#define TEMP_VAL_NR_BITS 2
 
-/* XXX: optimize memory layout */
 typedef struct TCGTemp {
-    TCGType base_type;
-    TCGType type;
-    int val_type;
-    int reg;
-    tcg_target_long val;
-    int mem_reg;
-    intptr_t mem_offset;
+    unsigned int reg:8;
+    unsigned int mem_reg:8;
+    unsigned int val_type:TEMP_VAL_NR_BITS;
+    unsigned int base_type:TCG_TYPE_NR_BITS;
+    unsigned int type:TCG_TYPE_NR_BITS;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
@@ -438,6 +439,9 @@ typedef struct TCGTemp {
                                   basic blocks. Otherwise, it is not
                                   preserved across basic blocks. */
     unsigned int temp_allocated:1; /* never used for code gen */
+
+    tcg_target_long val;
+    intptr_t mem_offset;
     const char *name;
 } TCGTemp;
 
-- 
1.9.1


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-25 19:50     ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
@ 2015-03-27  9:55       ` Alex Bennée
  2015-03-27 21:09         ` Emilio G. Cota
  2015-03-27 14:58       ` Richard Henderson
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2015-03-27  9:55 UTC
  To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.

Have you been able to measure any performance improvement with these new
structures? In theory, if aligned with cache lines, performance should
improve but real numbers would be nice.

>
> The appended adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.

I can see the defines but I can't see any checks. Should we be able to
do a compile time check if TCG_TYPE_COUNT doesn't fit into
TCG_TYPE_NR_BITS?
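
A compile-time check of the kind asked for here could look like the sketch below. It uses plain C11 `_Static_assert` to stay self-contained; QEMU has its own build-time assertion helper that could play the same role. All `SKETCH_` names are hypothetical stand-ins for the identifiers in the patch.

```c
#include <assert.h>

typedef enum SketchType {
    SKETCH_TYPE_I32,
    SKETCH_TYPE_I64,
    SKETCH_TYPE_COUNT            /* number of enumerators above */
} SketchType;

/* Width of the bitfield that stores a SketchType. */
#define SKETCH_TYPE_NR_BITS 1

/* The largest stored value is COUNT - 1, so COUNT may equal 2^bits. */
_Static_assert(SKETCH_TYPE_COUNT <= (1 << SKETCH_TYPE_NR_BITS),
               "SKETCH_TYPE_NR_BITS is too small for SketchType");

#define SKETCH_VAL_DEAD  0
#define SKETCH_VAL_REG   1
#define SKETCH_VAL_MEM   2
#define SKETCH_VAL_CONST 3
#define SKETCH_VAL_MAX   SKETCH_VAL_CONST  /* must track the largest value */
#define SKETCH_VAL_NR_BITS 2

_Static_assert(SKETCH_VAL_MAX < (1 << SKETCH_VAL_NR_BITS),
               "SKETCH_VAL_NR_BITS is too small for the value kinds");
```

Adding a third type before `SKETCH_TYPE_COUNT` without raising `SKETCH_TYPE_NR_BITS` would then stop the build instead of silently truncating stored values.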

>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>  
>      /* An alias for the size of the host register.  */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>  
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +
>  /* Constants for qemu_ld and qemu_st for the Memory Operation field.  */
>  typedef enum TCGMemOp {
>      MO_8     = 0,
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG   1
>  #define TEMP_VAL_MEM   2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2

A similar compile time check could be added here.

>  
> -/* XXX: optimize memory layout */
>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> @@ -438,6 +439,9 @@ typedef struct TCGTemp {
>                                    basic blocks. Otherwise, it is not
>                                    preserved across basic blocks. */
>      unsigned int temp_allocated:1; /* never used for code gen */
> +
> +    tcg_target_long val;
> +    intptr_t mem_offset;
>      const char *name;
>  } TCGTemp;

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-25 19:50     ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
  2015-03-27  9:55       ` Alex Bennée
@ 2015-03-27 14:58       ` Richard Henderson
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2015-03-27 14:58 UTC
  To: Emilio G. Cota, Stefan Weil; +Cc: qemu-trivial, qemu-devel

On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
> 
> The appended adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
> 
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> is probably better to align them to a byte address.
> 
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
> 
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
> 
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
> 
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>  
>      /* An alias for the size of the host register.  */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>  
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1

I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define.  Perhaps
even as (1 << TCG_TYPE_NR_BITS).

> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG   1
>  #define TEMP_VAL_MEM   2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2

And make this an enumeration.

>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;

And do *not* change these from the enumeration to an unsigned int.

I know why you did this -- to keep the compiler from warning that the TCGType
enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
rather than an unrelated number.  Except that's exactly the warning we want to
keep, on the off-chance that someone modifies the enums without modifying the
_NR_BITS defines.
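
The failure mode being guarded against is silent truncation. The sketch below demonstrates it with an `unsigned int` bitfield, where the result is well defined; the names are hypothetical, and which diagnostic a compiler emits for an enum-typed bitfield is compiler-specific and not shown here.

```c
#include <assert.h>

#define SKETCH_TYPE_NR_BITS 1

struct sketch_temp {
    unsigned int type:SKETCH_TYPE_NR_BITS;
};

/* Store v in the 1-bit field and return what can be read back. */
static inline unsigned int sketch_store_type(unsigned int v)
{
    struct sketch_temp t = { 0 };
    t.type = v;   /* unsigned bitfield: value is reduced modulo 2^width */
    return t.type;
}
```

Here `sketch_store_type(1)` reads back 1, but `sketch_store_type(2)` reads back 0: a third enumerator would alias the first unless the width is raised, which is why a build-time diagnostic is worth keeping.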


r~


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-27  9:55       ` Alex Bennée
@ 2015-03-27 21:09         ` Emilio G. Cota
  2015-03-30  9:55           ` Laurent Desnogues
  0 siblings, 1 reply; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-27 21:09 UTC
  To: Alex Bennée, Richard Henderson; +Cc: qemu-trivial, Stefan Weil, qemu-devel

On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
> Have you been able to measure any performance improvement with these new
> structures? In theory, if aligned with cache lines, performance should
> improve but real numbers would be nice.

I haven't benchmarked anything, which makes me very uneasy. All
I've checked is that the system boots, and FWIW I notice no
difference in boot time.

Is there a benchmark suite to test TCG changes?

Until this has been properly benchmarked I wouldn't want to see it merged.
For now I propose to merge the initial change (remove 8 bytes of padding
on 64-bit), which is uncontroversial.

> > The appended adds macros to prevent us from mistakenly overflowing
> > the bitfields when more elements are added to the corresponding
> > enums/macros.
> 
> I can see the defines but I can't see any checks. Should we be able to
> do a compile time check if TCG_TYPE_COUNT doesn't fit into
> TCG_TYPE_NR_BITS?

> > +#define TEMP_VAL_NR_BITS 2
> 
> A similar compile time check could be added here.

Ack, addressed below.

On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> > +#define TCG_TYPE_NR_BITS 1
> 
> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define.  Perhaps
> even as (1 << TCG_TYPE_NR_BITS).
(snip)
> > +#define TEMP_VAL_NR_BITS 2
> 
> And make this an enumeration.
> 
> >  typedef struct TCGTemp {
(snip)
> > +    unsigned int base_type:TCG_TYPE_NR_BITS;
> > +    unsigned int type:TCG_TYPE_NR_BITS;
> 
> And do *not* change these from the enumeration to an unsigned int.
> 
> I know why you did this -- to keep the compiler from warning that the TCGType
> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
> rather than an unrelated number.  Except that's exactly the warning we want to
> keep, on the off-chance that someone modifies the enums without modifying the
> _NR_BITS defines.

Agreed, please see below.

Thanks,

		E.

[No signoff due to lack of provable perf improvement, see above.]

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..afd3f94 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,6 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
-    TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register.  */
 #if TCG_TARGET_REG_BITS == 32
@@ -217,6 +216,10 @@ typedef enum TCGType {
 #endif
 } TCGType;
 
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS	1
+#define TCG_TYPE_COUNT		BIT(TCG_TYPE_NR_BITS)
+
 /* Constants for qemu_ld and qemu_st for the Memory Operation field.  */
 typedef enum TCGMemOp {
     MO_8     = 0,
@@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
     }
 }
 
-#define TEMP_VAL_DEAD  0
-#define TEMP_VAL_REG   1
-#define TEMP_VAL_MEM   2
-#define TEMP_VAL_CONST 3
+typedef enum TCGTempVal {
+    TEMP_VAL_DEAD,
+    TEMP_VAL_REG,
+    TEMP_VAL_MEM,
+    TEMP_VAL_CONST,
+} TCGTempVal;
+
+#define TEMP_VAL_NR_BITS 2
 
-/* XXX: optimize memory layout */
 typedef struct TCGTemp {
-    TCGType base_type;
-    TCGType type;
-    int val_type;
-    int reg;
-    tcg_target_long val;
-    int mem_reg;
-    intptr_t mem_offset;
+    unsigned int reg:8;
+    unsigned int mem_reg:8;
+    TCGTempVal val_type:TEMP_VAL_NR_BITS;
+    TCGType base_type:TCG_TYPE_NR_BITS;
+    TCGType type:TCG_TYPE_NR_BITS;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
@@ -438,6 +442,9 @@ typedef struct TCGTemp {
                                   basic blocks. Otherwise, it is not
                                   preserved across basic blocks. */
     unsigned int temp_allocated:1; /* never used for code gen */
+
+    tcg_target_long val;
+    intptr_t mem_offset;
     const char *name;
 } TCGTemp;
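
The size claims for the packed layout (32 bytes on 64-bit, 16 bytes on 32-bit) can be checked with a stand-alone sketch. It uses `unsigned int` bitfields throughout and `intptr_t` as a stand-in for `tcg_target_long`; both are assumptions for portability, as are the struct name and the usual ABI behaviour of packing adjacent bitfields into one 32-bit unit.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_packed_temp {
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int val_type:2;       /* TEMP_VAL_NR_BITS in the patch */
    unsigned int base_type:1;      /* TCG_TYPE_NR_BITS in the patch */
    unsigned int type:1;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1; /* 25 bits so far: one 32-bit unit */

    intptr_t val;                  /* stand-in for tcg_target_long */
    intptr_t mem_offset;
    const char *name;
};

/*
 * One word of bitfields (padded up to pointer size) plus three
 * pointer-sized members: 4 * 8 = 32 bytes on LP64, 4 * 4 = 16 on ILP32.
 */
_Static_assert(sizeof(int) != 4 ||
               sizeof(void *) != sizeof(intptr_t) ||
               sizeof(struct sketch_packed_temp) == 4 * sizeof(intptr_t),
               "packed layout is not four machine words");
```

The check says nothing about speed; it only confirms the memory footprint under the stated assumptions.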
 


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-27 21:09         ` Emilio G. Cota
@ 2015-03-30  9:55           ` Laurent Desnogues
  0 siblings, 0 replies; 11+ messages in thread
From: Laurent Desnogues @ 2015-03-30  9:55 UTC
  To: Emilio G. Cota
  Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel,
	Richard Henderson

Hello,

On Fri, Mar 27, 2015 at 10:09 PM, Emilio G. Cota <cota@braap.org> wrote:
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
>> Have you been able to measure any performance improvement with these new
>> structures? In theory, if aligned with cache lines, performance should
>> improve but real numbers would be nice.
>
> I haven't benchmarked anything, which makes me very uneasy. All
> I've checked is that the system boots, and FWIW I appreciate no
> difference in boot time.
>
> Is there a benchmark suite to test TCG changes?
>
> Until proper benchmarking I wouldn't want to see this merged. For now I
> propose to merge the initial change (remove 8-byte hole in 64-bit),
> which is uncontroversial.

I tested the patch attached to your mail and saw no performance
difference on an ARM image booting Linux and then running
Sunspider with Google v8.  I also tested on one of the 176.gcc
inputs with QEMU ARM user mode and again saw no difference.

Thanks,

Laurent

>> > The appended adds macros to prevent us from mistakenly overflowing
>> > the bitfields when more elements are added to the corresponding
>> > enums/macros.
>>
>> I can see the defines but I can't see any checks. Should we be able to
>> do a compile time check if TCG_TYPE_COUNT doesn't fit into
>> TCG_TYPE_NR_BITS?
>
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> A similar compile time check could be added here.
>
> Ack, addressed below.
>
> On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
>> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
>> > +#define TCG_TYPE_NR_BITS 1
>>
>> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define.  Perhaps
>> even as (1 << TCG_TYPE_NR_BITS).
> (snip)
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> And make this an enumeration.
>>
>> >  typedef struct TCGTemp {
> (snip)
>> > +    unsigned int base_type:TCG_TYPE_NR_BITS;
>> > +    unsigned int type:TCG_TYPE_NR_BITS;
>>
>> And do *not* change these from the enumeration to an unsigned int.
>>
>> I know why you did this -- to keep the compiler from warning that the TCGType
>> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
>> rather than an unrelated number.  Except that's exactly the warning we want to
>> keep, on the off-chance that someone modifies the enums without modifying the
>> _NR_BITS defines.
>
> Agreed, please see below.
>
> Thanks,
>
>                 E.
>
> [No signoff due to lack of provable perf improvement, see above.]
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..afd3f94 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,6 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
>
>      /* An alias for the size of the host register.  */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +216,10 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS       1
> +#define TCG_TYPE_COUNT         BIT(TCG_TYPE_NR_BITS)
> +
>  /* Constants for qemu_ld and qemu_st for the Memory Operation field.  */
>  typedef enum TCGMemOp {
>      MO_8     = 0,
> @@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>      }
>  }
>
> -#define TEMP_VAL_DEAD  0
> -#define TEMP_VAL_REG   1
> -#define TEMP_VAL_MEM   2
> -#define TEMP_VAL_CONST 3
> +typedef enum TCGTempVal {
> +    TEMP_VAL_DEAD,
> +    TEMP_VAL_REG,
> +    TEMP_VAL_MEM,
> +    TEMP_VAL_CONST,
> +} TCGTempVal;
> +
> +#define TEMP_VAL_NR_BITS 2
>
> -/* XXX: optimize memory layout */
>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    TCGTempVal val_type:TEMP_VAL_NR_BITS;
> +    TCGType base_type:TCG_TYPE_NR_BITS;
> +    TCGType type:TCG_TYPE_NR_BITS;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> @@ -438,6 +442,9 @@ typedef struct TCGTemp {
>                                    basic blocks. Otherwise, it is not
>                                    preserved across basic blocks. */
>      unsigned int temp_allocated:1; /* never used for code gen */
> +
> +    tcg_target_long val;
> +    intptr_t mem_offset;
>      const char *name;
>  } TCGTemp;
>
>


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-29 21:52 Richard Henderson
  2015-03-30  5:33 ` Stefan Weil
@ 2015-03-30  5:43 ` Stefan Weil
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Weil @ 2015-03-30  5:43 UTC
  To: Richard Henderson, Emilio G. Cota
  Cc: qemu-trivial, Alex Bennée, qemu-devel

On 29.03.2015 at 23:52, Richard Henderson wrote:
> No decrease in boot time is good. We /know/ we're saving memory, after all.

Well, I would not mind a decrease in boot time, too.
The more it decreases, the better. :-)

To be honest: in my version I only used 1-bit bitfield entries for
boolean values, but 8-bit values (aligned on byte boundaries)
for other values because, as far as I know, most (all?) CPU
architectures need more time to extract some bits from
a machine word than to extract a byte.
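
The difference between the two access patterns can be sketched as follows. The field positions and function names are hypothetical; whether the extra shift and mask costs anything measurable depends on the CPU and the compiler and has not been measured here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed word: bits 0-5 hold reg, bits 6-11 hold mem_reg. */
static inline unsigned int get_mem_reg_6bit(uint32_t word)
{
    return (word >> 6) & 0x3f;     /* shift, then mask off six bits */
}

/* Byte-aligned alternative: mem_reg occupies the second byte. */
static inline unsigned int get_mem_reg_byte(const uint8_t *bytes)
{
    return bytes[1];               /* a single byte load */
}
```

Both functions return the same logical value for their respective encodings; the byte-aligned form trades two unused bits per field for a simpler load.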

I have no idea whether this makes a difference in performance
as I did not run any runtime benchmark.

Stefan


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-29 21:52 Richard Henderson
@ 2015-03-30  5:33 ` Stefan Weil
  2015-03-30  5:43 ` Stefan Weil
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Weil @ 2015-03-30  5:33 UTC
  To: Richard Henderson, Emilio G. Cota
  Cc: qemu-trivial, Alex Bennée, qemu-devel

On 29.03.2015 at 23:52, Richard Henderson wrote:
> On Mar 27, 2015 14:09, "Emilio G. Cota" <cota@braap.org> wrote:
>> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
>>> Have you been able to measure any performance improvement with these new
>>> structures? In theory, if aligned with cache lines, performance should
>>> improve but real numbers would be nice.
>> I haven't benchmarked anything, which makes me very uneasy. All
>> I've checked is that the system boots, and FWIW I appreciate no
>> difference in boot time.
> No decrease in boot time is good. We /know/ we're saving memory, after all.
>   
>> Is there a benchmark suite to test TCG changes?
> No, sorry.
>
>
> r~

Benchmarking TCG with QEMU's system emulation is nearly impossible
because operating systems usually contain lots of timer-based operations.
The TCG interpreter, for example, is really slow, but a BIOS will boot
faster than expected with it.

User mode emulation is much better for benchmarks.
Run some command-line Linux application which mainly does
computations (not file I/O) using user mode emulation on Linux.
The OpenSSL package contains bntest, which can be used
as a benchmark for TCG. Redirect all output to /dev/null when
you run it.

Binaries for i386 and x86_64 are available from
http://qemu.weilnetz.de/test/user/.

Stefan


* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
@ 2015-03-29 21:52 Richard Henderson
  2015-03-30  5:33 ` Stefan Weil
  2015-03-30  5:43 ` Stefan Weil
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2015-03-29 21:52 UTC
  To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel

On Mar 27, 2015 14:09, "Emilio G. Cota" <cota@braap.org> wrote:
>
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote: 
> > Have you been able to measure any performance improvement with these new 
> > structures? In theory, if aligned with cache lines, performance should 
> > improve but real numbers would be nice. 
>
> I haven't benchmarked anything, which makes me very uneasy. All 
> I've checked is that the system boots, and FWIW I appreciate no 
> difference in boot time.

No decrease in boot time is good. We /know/ we're saving memory, after all.
 
>
> Is there a benchmark suite to test TCG changes? 

No, sorry. 


r~


