* [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
@ 2015-03-21 6:27 Emilio G. Cota
2015-03-23 21:42 ` Stefan Weil
0 siblings, 1 reply; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-21 6:27 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-trivial, Richard Henderson
This brings down the size of the struct from 56 to 48 bytes.
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tcg/tcg.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..3276924 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -429,8 +429,8 @@ typedef struct TCGTemp {
     int val_type;
     int reg;
     tcg_target_long val;
-    int mem_reg;
     intptr_t mem_offset;
+    int mem_reg;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
--
1.9.1
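[Annotation] The saving comes from alignment padding on 64-bit hosts: a 4-byte int placed directly before an 8-byte intptr_t is followed by 4 bytes of padding, and the bitfield word before the pointer is padded the same way. A minimal sketch of the effect, assuming simplified stand-in structs (temp_before, temp_after and temp_bytes_saved are hypothetical names, intptr_t stands in for tcg_target_long, int stands in for the TCGType enum, and the five 1-bit flags are merged into one field):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the layout before the patch. */
struct temp_before {
    int base_type;
    int type;
    int val_type;
    int reg;
    intptr_t val;
    int mem_reg;            /* followed by padding when intptr_t is 8 bytes */
    intptr_t mem_offset;
    unsigned int flags:5;   /* followed by padding before the pointer */
    const char *name;
};

/* Hypothetical stand-in for the layout after the patch. */
struct temp_after {
    int base_type;
    int type;
    int val_type;
    int reg;
    intptr_t val;
    intptr_t mem_offset;
    int mem_reg;            /* now shares one 8-byte slot with the flags */
    unsigned int flags:5;
    const char *name;
};

/* Reordering members must never make the struct larger. */
static_assert(sizeof(struct temp_after) <= sizeof(struct temp_before),
              "reordered layout should not grow");

/* Bytes saved per element by the reordering. */
size_t temp_bytes_saved(void)
{
    return sizeof(struct temp_before) - sizeof(struct temp_after);
}
```

On a typical LP64 host this should report 8 bytes saved (56 versus 48); on a 32-bit host, where int and intptr_t have the same size, the two layouts are expected to be equally large.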
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
2015-03-21 6:27 [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes Emilio G. Cota
@ 2015-03-23 21:42 ` Stefan Weil
2015-03-24 1:07 ` Richard Henderson
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Weil @ 2015-03-23 21:42 UTC (permalink / raw)
To: Emilio G. Cota, qemu-devel; +Cc: qemu-trivial, Richard Henderson
On 21.03.2015 at 07:27, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 48 bytes.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> tcg/tcg.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..3276924 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -429,8 +429,8 @@ typedef struct TCGTemp {
> int val_type;
> int reg;
> tcg_target_long val;
> - int mem_reg;
> intptr_t mem_offset;
> + int mem_reg;
> unsigned int fixed_reg:1;
> unsigned int mem_coherent:1;
> unsigned int mem_allocated:1;
Reviewed-by: Stefan Weil <sw@weilnetz.de>
TCGContext includes an array of TCGTemp, so it even shrinks by 4 KiB
(good for caching), and tcg.o now uses 55364 instead of 56116 bytes
(maybe faster, too).
Further optimizations are possible. TCGTemp can be reduced to 32 bytes,
as the output of pahole shows:
struct TCGTemp {
    TCGTempVal      val_type:8;        /*  0:24  4 */
    unsigned int    reg:8;             /*  0:16  4 */
    unsigned int    mem_reg:8;         /*  0: 8  4 */
    /* Bitfield combined with next fields */
    _Bool           fixed_reg:1;       /*  3: 7  1 */
    _Bool           mem_coherent:1;    /*  3: 6  1 */
    _Bool           mem_allocated:1;   /*  3: 5  1 */
    _Bool           temp_local:1;      /*  3: 4  1 */
    _Bool           temp_allocated:1;  /*  3: 3  1 */
    /* XXX 3 bits hole, try to pack */
    TCGType         base_type:16;      /*  4:16  4 */
    TCGType         type:16;           /*  4: 0  4 */
    tcg_target_long val;               /*  8     8 */
    intptr_t        mem_offset;        /* 16     8 */
    const char *    name;              /* 24     8 */
    /* size: 32, cachelines: 1, members: 13 */
    /* bit holes: 1, sum bit holes: 3 bits */
    /* last cacheline: 32 bytes */
};
Here I used a new enum type for val_type and narrowed some fields to 8 or
16 bits.
I also put the two most often used fields at the beginning, so they can be
addressed with no offset or only a small one ("often" in the code; no
runtime data available).
Are such optimizations useful?
Stefan
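[Annotation] The layout above can be approximated in a few lines to see where the 32 bytes come from. This is a rough sketch, not the proposed patch: packed_temp and array_bytes_saved are hypothetical names, intptr_t stands in for tcg_target_long, plain unsigned int bitfields replace the enum- and _Bool-typed ones, and the array length of 512 used afterwards is an assumption chosen because it reproduces the 4 KiB figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical approximation of the pahole layout shown above. */
struct packed_temp {
    unsigned int val_type:8;
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;  /* 29 bits used, 3 bits left in this word */
    unsigned int base_type:16;      /* does not fit above, starts a new word */
    unsigned int type:16;
    intptr_t val;
    intptr_t mem_offset;
    const char *name;
};

/* Bytes saved across an array of `count` elements when the element
 * size drops from old_size to new_size. */
size_t array_bytes_saved(size_t count, size_t old_size, size_t new_size)
{
    assert(new_size <= old_size);
    return count * (old_size - new_size);
}
```

With GCC or Clang on an LP64 host, sizeof(struct packed_temp) is expected to be 32. Under the 512-element assumption, array_bytes_saved(512, 56, 48) gives 4096 bytes for the first patch alone, and array_bytes_saved(512, 56, 32) gives 12288 bytes for the fully packed layout.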
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
2015-03-23 21:42 ` Stefan Weil
@ 2015-03-24 1:07 ` Richard Henderson
2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2015-03-24 1:07 UTC (permalink / raw)
To: Stefan Weil, Emilio G. Cota, qemu-devel; +Cc: qemu-trivial
On 03/23/2015 02:42 PM, Stefan Weil wrote:
> Further optimizations are possible. TCGTemp can be reduced to 32 bytes as the
> output of pahole shows:
>
> struct TCGTemp {
> TCGTempVal val_type:8; /* 0:24 4 */
Need only be 2 bits.
> unsigned int reg:8; /* 0:16 4 */
> unsigned int mem_reg:8; /* 0: 8 4 */
Need only be 6 (ia64) bits, but an aligned 8-bit slot probably performs best.
>
> /* Bitfield combined with next fields */
>
> _Bool fixed_reg:1; /* 3: 7 1 */
> _Bool mem_coherent:1; /* 3: 6 1 */
> _Bool mem_allocated:1; /* 3: 5 1 */
> _Bool temp_local:1; /* 3: 4 1 */
> _Bool temp_allocated:1; /* 3: 3 1 */
>
> /* XXX 3 bits hole, try to pack */
>
> TCGType base_type:16; /* 4:16 4 */
> TCGType type:16; /* 4: 0 4 */
Need only be 1 bit, honestly, but 2 bits might be easier to arrange. Anyway,
you're down to 23 bits from the word, or 16 bytes on a 32-bit host. It's no
better than the 32 bytes you got for a 64-bit host though.
> tcg_target_long val; /* 8 8 */
> intptr_t mem_offset; /* 16 8 */
> const char * name; /* 24 8 */
>
> /* size: 32, cachelines: 1, members: 13 */
> /* bit holes: 1, sum bit holes: 3 bits */
> /* last cacheline: 32 bytes */
> };
>
> Here I used a new enum type for val_type and reduced some values to 8 or 16 bit.
> I also put the two most often used values at the beginning, so they can be
> addressed without or with a small offset ("often" in the code, no runtime
> data available).
>
> Are such optimizations useful?
Yes, I think so. Especially because of the rather large arrays we build.
r~
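[Annotation] The trade-off between minimal-width and byte-aligned fields can be made concrete by writing the extraction out by hand. A rough sketch, assuming a hypothetical 32-bit word layout (reg in bits 0-7, mem_reg in bits 8-15, val_type in bits 16-17); the helper names are made up, and a real compiler is free to generate different code for actual bitfields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-7 reg, bits 8-15 mem_reg, bits 16-17 val_type. */
#define REG_SHIFT       0
#define VAL_TYPE_SHIFT  16
#define VAL_TYPE_BITS   2

/* Byte-aligned field: typically a single zero-extending byte load. */
unsigned get_reg(uint32_t word)
{
    return (uint8_t)(word >> REG_SHIFT);
}

/* Sub-byte field: needs a shift and a mask after loading the word. */
unsigned get_val_type(uint32_t word)
{
    return (word >> VAL_TYPE_SHIFT) & ((1u << VAL_TYPE_BITS) - 1);
}

/* Updating a sub-byte field is a read-modify-write of the containing word. */
uint32_t set_val_type(uint32_t word, unsigned val_type)
{
    uint32_t mask = ((1u << VAL_TYPE_BITS) - 1) << VAL_TYPE_SHIFT;

    assert(val_type < (1u << VAL_TYPE_BITS));
    return (word & ~mask) | ((uint32_t)val_type << VAL_TYPE_SHIFT);
}
```

The 2-bit field saves space but costs an extra mask on every read and a read-modify-write on every update, which is the reason given above for keeping reg and mem_reg in aligned 8-bit slots.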
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-24 1:07 ` Richard Henderson
@ 2015-03-25 19:50 ` Emilio G. Cota
2015-03-27 9:55 ` Alex Bennée
2015-03-27 14:58 ` Richard Henderson
0 siblings, 2 replies; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-25 19:50 UTC (permalink / raw)
To: Stefan Weil, Richard Henderson; +Cc: qemu-trivial, qemu-devel
This brings down the size of the struct from 56 to 32 bytes on 64-bit,
and to 16 bytes on 32-bit.
The appended patch adds macros to prevent us from mistakenly overflowing
the bitfields when more elements are added to the corresponding
enums/macros.
Note that reg/mem_reg need only 6 bits (for ia64), but for performance
it is probably better to align them to a byte address.
Given that TCGTemp is used in large arrays, this leads to a few KB
of savings. However, unpacking the bits takes additional code, so
the net effect depends on the target (host is x86_64):
Before:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
  37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
  39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
  40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
  39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
After:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
  37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
  39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
  40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
  39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
Suggested-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
tcg/tcg.h | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..71ae7b2 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,7 @@ typedef struct TCGPool {
typedef enum TCGType {
TCG_TYPE_I32,
TCG_TYPE_I64,
- TCG_TYPE_COUNT, /* number of different types */
+ TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
/* An alias for the size of the host register. */
#if TCG_TARGET_REG_BITS == 32
@@ -217,6 +217,9 @@ typedef enum TCGType {
#endif
} TCGType;
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS 1
+
/* Constants for qemu_ld and qemu_st for the Memory Operation field. */
typedef enum TCGMemOp {
MO_8 = 0,
@@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
#define TEMP_VAL_REG 1
#define TEMP_VAL_MEM 2
#define TEMP_VAL_CONST 3
+#define TEMP_VAL_NR_BITS 2
-/* XXX: optimize memory layout */
typedef struct TCGTemp {
- TCGType base_type;
- TCGType type;
- int val_type;
- int reg;
- tcg_target_long val;
- int mem_reg;
- intptr_t mem_offset;
+ unsigned int reg:8;
+ unsigned int mem_reg:8;
+ unsigned int val_type:TEMP_VAL_NR_BITS;
+ unsigned int base_type:TCG_TYPE_NR_BITS;
+ unsigned int type:TCG_TYPE_NR_BITS;
unsigned int fixed_reg:1;
unsigned int mem_coherent:1;
unsigned int mem_allocated:1;
@@ -438,6 +439,9 @@ typedef struct TCGTemp {
basic blocks. Otherwise, it is not
preserved across basic blocks. */
unsigned int temp_allocated:1; /* never used for code gen */
+
+ tcg_target_long val;
+ intptr_t mem_offset;
const char *name;
} TCGTemp;
--
1.9.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
@ 2015-03-27 9:55 ` Alex Bennée
2015-03-27 21:09 ` Emilio G. Cota
2015-03-27 14:58 ` Richard Henderson
1 sibling, 1 reply; 11+ messages in thread
From: Alex Bennée @ 2015-03-27 9:55 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, qemu-devel, Richard Henderson
Emilio G. Cota <cota@braap.org> writes:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
Have you been able to measure any performance improvement with these new
structures? In theory, if aligned with cache lines, performance should
improve but real numbers would be nice.
>
> The appended adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
I can see the defines but I can't see any checks. Shouldn't we be able to
do a compile-time check that TCG_TYPE_COUNT fits into
TCG_TYPE_NR_BITS?
>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
> text data bss dec hex filename
> 41131 29800 88 71019 1156b ./aarch64-softmmu/tcg/tcg.o
> 37969 29416 96 67481 10799 ./x86_64-linux-user/tcg/tcg.o
> 39354 28816 96 68266 10aaa ./arm-linux-user/tcg/tcg.o
> 40802 29096 88 69986 11162 ./arm-softmmu/tcg/tcg.o
> 39417 29672 88 69177 10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
> text data bss dec hex filename
> 41187 29800 88 71075 115a3 ./aarch64-softmmu/tcg/tcg.o
> 37777 29416 96 67289 106d9 ./x86_64-linux-user/tcg/tcg.o
> 39162 28816 96 68074 109ea ./arm-linux-user/tcg/tcg.o
> 40858 29096 88 70042 1119a ./arm-softmmu/tcg/tcg.o
> 39473 29672 88 69233 10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> tcg/tcg.h | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
> typedef enum TCGType {
> TCG_TYPE_I32,
> TCG_TYPE_I64,
> - TCG_TYPE_COUNT, /* number of different types */
> + TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
> /* An alias for the size of the host register. */
> #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
> #endif
> } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +
> /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
> typedef enum TCGMemOp {
> MO_8 = 0,
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
> #define TEMP_VAL_REG 1
> #define TEMP_VAL_MEM 2
> #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2
A similar compile time check could be added here.
>
> -/* XXX: optimize memory layout */
> typedef struct TCGTemp {
> - TCGType base_type;
> - TCGType type;
> - int val_type;
> - int reg;
> - tcg_target_long val;
> - int mem_reg;
> - intptr_t mem_offset;
> + unsigned int reg:8;
> + unsigned int mem_reg:8;
> + unsigned int val_type:TEMP_VAL_NR_BITS;
> + unsigned int base_type:TCG_TYPE_NR_BITS;
> + unsigned int type:TCG_TYPE_NR_BITS;
> unsigned int fixed_reg:1;
> unsigned int mem_coherent:1;
> unsigned int mem_allocated:1;
> @@ -438,6 +439,9 @@ typedef struct TCGTemp {
> basic blocks. Otherwise, it is not
> preserved across basic blocks. */
> unsigned int temp_allocated:1; /* never used for code gen */
> +
> + tcg_target_long val;
> + intptr_t mem_offset;
> const char *name;
> } TCGTemp;
--
Alex Bennée
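[Annotation] One way to express the compile-time check asked for above is a C11 static assertion placed next to the defines. A minimal sketch under assumptions: the enum and macro names mirror the patch, but the enum is simplified (the host-register aliases are left out), the assertions and max_field_value are not part of the patch, and QEMU code would more likely use its own QEMU_BUILD_BUG_ON macro than static_assert.

```c
#include <assert.h>

/* Simplified copy of the enum from the patch, aliases omitted. */
typedef enum TCGType {
    TCG_TYPE_I32,
    TCG_TYPE_I64,
    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
} TCGType;

#define TCG_TYPE_NR_BITS 1

#define TEMP_VAL_DEAD    0
#define TEMP_VAL_REG     1
#define TEMP_VAL_MEM     2
#define TEMP_VAL_CONST   3
#define TEMP_VAL_NR_BITS 2

/* The build fails if a newly added value no longer fits in its bitfield. */
static_assert(TCG_TYPE_COUNT <= (1 << TCG_TYPE_NR_BITS),
              "TCGType values do not fit in TCG_TYPE_NR_BITS bits");
static_assert(TEMP_VAL_CONST < (1 << TEMP_VAL_NR_BITS),
              "TEMP_VAL_* values do not fit in TEMP_VAL_NR_BITS bits");

/* Largest value representable in an unsigned bitfield of nr_bits bits
 * (nr_bits must be smaller than the width of unsigned int). */
unsigned max_field_value(unsigned nr_bits)
{
    return (1u << nr_bits) - 1;
}
```

Adding a third type before TCG_TYPE_COUNT without raising TCG_TYPE_NR_BITS would then stop the build instead of silently truncating the stored value.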
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
2015-03-27 9:55 ` Alex Bennée
@ 2015-03-27 14:58 ` Richard Henderson
1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2015-03-27 14:58 UTC (permalink / raw)
To: Emilio G. Cota, Stefan Weil; +Cc: qemu-trivial, qemu-devel
On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
>
> The appended adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
> text data bss dec hex filename
> 41131 29800 88 71019 1156b ./aarch64-softmmu/tcg/tcg.o
> 37969 29416 96 67481 10799 ./x86_64-linux-user/tcg/tcg.o
> 39354 28816 96 68266 10aaa ./arm-linux-user/tcg/tcg.o
> 40802 29096 88 69986 11162 ./arm-softmmu/tcg/tcg.o
> 39417 29672 88 69177 10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
> text data bss dec hex filename
> 41187 29800 88 71075 115a3 ./aarch64-softmmu/tcg/tcg.o
> 37777 29416 96 67289 106d9 ./x86_64-linux-user/tcg/tcg.o
> 39162 28816 96 68074 109ea ./arm-linux-user/tcg/tcg.o
> 40858 29096 88 70042 1119a ./arm-softmmu/tcg/tcg.o
> 39473 29672 88 69233 10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> tcg/tcg.h | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
> typedef enum TCGType {
> TCG_TYPE_I32,
> TCG_TYPE_I64,
> - TCG_TYPE_COUNT, /* number of different types */
> + TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
> /* An alias for the size of the host register. */
> #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
> #endif
> } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
even as (1 << TCG_TYPE_NR_BITS).
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
> #define TEMP_VAL_REG 1
> #define TEMP_VAL_MEM 2
> #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2
And make this an enumeration.
> typedef struct TCGTemp {
> - TCGType base_type;
> - TCGType type;
> - int val_type;
> - int reg;
> - tcg_target_long val;
> - int mem_reg;
> - intptr_t mem_offset;
> + unsigned int reg:8;
> + unsigned int mem_reg:8;
> + unsigned int val_type:TEMP_VAL_NR_BITS;
> + unsigned int base_type:TCG_TYPE_NR_BITS;
> + unsigned int type:TCG_TYPE_NR_BITS;
And do *not* change these from the enumeration to an unsigned int.
I know why you did this -- to keep the compiler from warning that the TCGType
enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
rather than an unrelated number. Except that's exactly the warning we want to
keep, on the off-chance that someone modifies the enums without modifying the
_NR_BITS defines.
r~
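[Annotation] The point about keeping the enum type on the bitfield can be illustrated with a small sketch. Assumptions: all names below are hypothetical stand-ins; enum-typed bitfields are implementation-defined in ISO C but accepted by GCC and Clang; and the warning mentioned in the comment is the compiler behaviour described in the message above, not something this snippet verifies.

```c
#include <assert.h>

typedef enum DemoType {
    DEMO_TYPE_I32,
    DEMO_TYPE_I64,
} DemoType;

/* The count is an unrelated number derived from the width,
 * not an enumerator that would itself need an extra bit. */
#define DEMO_TYPE_NR_BITS 1
#define DEMO_TYPE_COUNT   (1 << DEMO_TYPE_NR_BITS)

static_assert(DEMO_TYPE_I64 < DEMO_TYPE_COUNT, "enumerators must fit");

struct demo_temp {
    unsigned int reg:8;
    /* Declared with the enum type so the compiler can compare the enum's
     * range with the field width. Adding a third enumerator without
     * raising DEMO_TYPE_NR_BITS should then draw a "too narrow" style
     * warning; with unsigned int here, no such warning is possible. */
    DemoType type:DEMO_TYPE_NR_BITS;
};

/* Store a value in the bitfield and read it back. */
DemoType demo_round_trip(DemoType t)
{
    struct demo_temp tmp = { .reg = 0, .type = t };
    return tmp.type;
}
```

With GCC and Clang the enum has an unsigned underlying type here, so both enumerators survive the round trip through the 1-bit field.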
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-27 9:55 ` Alex Bennée
@ 2015-03-27 21:09 ` Emilio G. Cota
2015-03-30 9:55 ` Laurent Desnogues
0 siblings, 1 reply; 11+ messages in thread
From: Emilio G. Cota @ 2015-03-27 21:09 UTC (permalink / raw)
To: Alex Bennée, Richard Henderson; +Cc: qemu-trivial, Stefan Weil, qemu-devel
On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
> Have you been able to measure any performance improvement with these new
> structures? In theory, if aligned with cache lines, performance should
> improve but real numbers would be nice.
I haven't benchmarked anything, which makes me very uneasy. All
I've checked is that the system boots, and FWIW I notice no
difference in boot time.
Is there a benchmark suite to test TCG changes?
Until it has been properly benchmarked, I wouldn't want to see this merged.
For now I propose to merge the initial change (remove the 8-byte hole on
64-bit), which is uncontroversial.
> > The appended adds macros to prevent us from mistakenly overflowing
> > the bitfields when more elements are added to the corresponding
> > enums/macros.
>
> I can see the defines but I can't see any checks. Should we be able to
> do a compile time check if TCG_TYPE_COUNT doesn't fit into
> TCG_TYPE_NR_BITS?
> > +#define TEMP_VAL_NR_BITS 2
>
> A similar compile time check could be added here.
Ack, addressed below.
On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> > +#define TCG_TYPE_NR_BITS 1
>
> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
> even as (1 << TCG_TYPE_NR_BITS).
(snip)
> > +#define TEMP_VAL_NR_BITS 2
>
> And make this an enumeration.
>
> > typedef struct TCGTemp {
(snip)
> > + unsigned int base_type:TCG_TYPE_NR_BITS;
> > + unsigned int type:TCG_TYPE_NR_BITS;
>
> And do *not* change these from the enumeration to an unsigned int.
>
> I know why you did this -- to keep the compiler from warning that the TCGType
> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
> rather than an unrelated number. Except that's exactly the warning we want to
> keep, on the off-chance that someone modifies the enums without modifying the
> _NR_BITS defines.
Agreed, please see below.
Thanks,
E.
[No signoff due to lack of provable perf improvement, see above.]
diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..afd3f94 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,6 @@ typedef struct TCGPool {
typedef enum TCGType {
TCG_TYPE_I32,
TCG_TYPE_I64,
- TCG_TYPE_COUNT, /* number of different types */
/* An alias for the size of the host register. */
#if TCG_TARGET_REG_BITS == 32
@@ -217,6 +216,10 @@ typedef enum TCGType {
#endif
} TCGType;
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS 1
+#define TCG_TYPE_COUNT BIT(TCG_TYPE_NR_BITS)
+
/* Constants for qemu_ld and qemu_st for the Memory Operation field. */
typedef enum TCGMemOp {
MO_8 = 0,
@@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
}
}
-#define TEMP_VAL_DEAD 0
-#define TEMP_VAL_REG 1
-#define TEMP_VAL_MEM 2
-#define TEMP_VAL_CONST 3
+typedef enum TCGTempVal {
+ TEMP_VAL_DEAD,
+ TEMP_VAL_REG,
+ TEMP_VAL_MEM,
+ TEMP_VAL_CONST,
+} TCGTempVal;
+
+#define TEMP_VAL_NR_BITS 2
-/* XXX: optimize memory layout */
typedef struct TCGTemp {
- TCGType base_type;
- TCGType type;
- int val_type;
- int reg;
- tcg_target_long val;
- int mem_reg;
- intptr_t mem_offset;
+ unsigned int reg:8;
+ unsigned int mem_reg:8;
+ TCGTempVal val_type:TEMP_VAL_NR_BITS;
+ TCGType base_type:TCG_TYPE_NR_BITS;
+ TCGType type:TCG_TYPE_NR_BITS;
unsigned int fixed_reg:1;
unsigned int mem_coherent:1;
unsigned int mem_allocated:1;
@@ -438,6 +442,9 @@ typedef struct TCGTemp {
basic blocks. Otherwise, it is not
preserved across basic blocks. */
unsigned int temp_allocated:1; /* never used for code gen */
+
+ tcg_target_long val;
+ intptr_t mem_offset;
const char *name;
} TCGTemp;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-27 21:09 ` Emilio G. Cota
@ 2015-03-30 9:55 ` Laurent Desnogues
0 siblings, 0 replies; 11+ messages in thread
From: Laurent Desnogues @ 2015-03-30 9:55 UTC (permalink / raw)
To: Emilio G. Cota
Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel,
Richard Henderson
Hello,
On Fri, Mar 27, 2015 at 10:09 PM, Emilio G. Cota <cota@braap.org> wrote:
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
>> Have you been able to measure any performance improvement with these new
>> structures? In theory, if aligned with cache lines, performance should
>> improve but real numbers would be nice.
>
> I haven't benchmarked anything, which makes me very uneasy. All
> I've checked is that the system boots, and FWIW I appreciate no
> difference in boot time.
>
> Is there a benchmark suite to test TCG changes?
>
> Until proper benchmarking I wouldn't want to see this merged. For now I
> propose to merge the initial change (remove 8-byte hole in 64-bit),
> which is uncontroversial.
I tested the patch attached to your mail and saw no performance
difference on an ARM image booting Linux and then running
Sunspider with Google v8. I also tested on one of the 176.gcc
inputs with QEMU ARM user mode and again saw no difference.
Thanks,
Laurent
>> > The appended adds macros to prevent us from mistakenly overflowing
>> > the bitfields when more elements are added to the corresponding
>> > enums/macros.
>>
>> I can see the defines but I can't see any checks. Should we be able to
>> do a compile time check if TCG_TYPE_COUNT doesn't fit into
>> TCG_TYPE_NR_BITS?
>
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> A similar compile time check could be added here.
>
> Ack, addressed below.
>
> On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
>> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
>> > +#define TCG_TYPE_NR_BITS 1
>>
>> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
>> even as (1 << TCG_TYPE_NR_BITS).
> (snip)
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> And make this an enumeration.
>>
>> > typedef struct TCGTemp {
> (snip)
>> > + unsigned int base_type:TCG_TYPE_NR_BITS;
>> > + unsigned int type:TCG_TYPE_NR_BITS;
>>
>> And do *not* change these from the enumeration to an unsigned int.
>>
>> I know why you did this -- to keep the compiler from warning that the TCGType
>> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
>> rather than an unrelated number. Except that's exactly the warning we want to
>> keep, on the off-chance that someone modifies the enums without modifying the
>> _NR_BITS defines.
>
> Agreed, please see below.
>
> Thanks,
>
> E.
>
> [No signoff due to lack of provable perf improvement, see above.]
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..afd3f94 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,6 @@ typedef struct TCGPool {
> typedef enum TCGType {
> TCG_TYPE_I32,
> TCG_TYPE_I64,
> - TCG_TYPE_COUNT, /* number of different types */
>
> /* An alias for the size of the host register. */
> #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +216,10 @@ typedef enum TCGType {
> #endif
> } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +#define TCG_TYPE_COUNT BIT(TCG_TYPE_NR_BITS)
> +
> /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
> typedef enum TCGMemOp {
> MO_8 = 0,
> @@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
> }
> }
>
> -#define TEMP_VAL_DEAD 0
> -#define TEMP_VAL_REG 1
> -#define TEMP_VAL_MEM 2
> -#define TEMP_VAL_CONST 3
> +typedef enum TCGTempVal {
> + TEMP_VAL_DEAD,
> + TEMP_VAL_REG,
> + TEMP_VAL_MEM,
> + TEMP_VAL_CONST,
> +} TCGTempVal;
> +
> +#define TEMP_VAL_NR_BITS 2
>
> -/* XXX: optimize memory layout */
> typedef struct TCGTemp {
> - TCGType base_type;
> - TCGType type;
> - int val_type;
> - int reg;
> - tcg_target_long val;
> - int mem_reg;
> - intptr_t mem_offset;
> + unsigned int reg:8;
> + unsigned int mem_reg:8;
> + TCGTempVal val_type:TEMP_VAL_NR_BITS;
> + TCGType base_type:TCG_TYPE_NR_BITS;
> + TCGType type:TCG_TYPE_NR_BITS;
> unsigned int fixed_reg:1;
> unsigned int mem_coherent:1;
> unsigned int mem_allocated:1;
> @@ -438,6 +442,9 @@ typedef struct TCGTemp {
> basic blocks. Otherwise, it is not
> preserved across basic blocks. */
> unsigned int temp_allocated:1; /* never used for code gen */
> +
> + tcg_target_long val;
> + intptr_t mem_offset;
> const char *name;
> } TCGTemp;
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-29 21:52 Richard Henderson
2015-03-30 5:33 ` Stefan Weil
@ 2015-03-30 5:43 ` Stefan Weil
1 sibling, 0 replies; 11+ messages in thread
From: Stefan Weil @ 2015-03-30 5:43 UTC (permalink / raw)
To: Richard Henderson, Emilio G. Cota
Cc: qemu-trivial, Alex Bennée, qemu-devel
On 29.03.2015 at 23:52, Richard Henderson wrote:
> No decrease in boot time is good. We /know/ we're saving memory, after all.
Well, I would not mind a decrease in boot time, too.
The more it decreases, the better. :-)
To be honest: in my version I only used 1-bit bitfield entries for
boolean values, but 8-bit values (aligned on byte boundaries)
for other values because, as far as I know, most (all?) CPU
architectures need more time to extract some bits from
a machine word than to extract a byte.
I have no idea whether this makes a difference in performance,
as I did not run any runtime benchmark.
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
2015-03-29 21:52 Richard Henderson
@ 2015-03-30 5:33 ` Stefan Weil
2015-03-30 5:43 ` Stefan Weil
1 sibling, 0 replies; 11+ messages in thread
From: Stefan Weil @ 2015-03-30 5:33 UTC (permalink / raw)
To: Richard Henderson, Emilio G. Cota
Cc: qemu-trivial, Alex Bennée, qemu-devel
On 29.03.2015 at 23:52, Richard Henderson wrote:
> On Mar 27, 2015 14:09, "Emilio G. Cota" <cota@braap.org> wrote:
>> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
>>> Have you been able to measure any performance improvement with these new
>>> structures? In theory, if aligned with cache lines, performance should
>>> improve but real numbers would be nice.
>> I haven't benchmarked anything, which makes me very uneasy. All
>> I've checked is that the system boots, and FWIW I appreciate no
>> difference in boot time.
> No decrease in boot time is good. We /know/ we're saving memory, after all.
>
>> Is there a benchmark suite to test TCG changes?
> No, sorry.
>
>
> r~
Benchmarking TCG with QEMU's system emulation is nearly impossible
because operating systems usually contain lots of timer-based operations.
The TCG interpreter, for example, is really slow, but a BIOS will boot
faster than expected with it.
User mode emulation is much better for benchmarks.
Run some command-line Linux application which mainly does
computations (not file I/O) using user mode emulation on Linux.
The OpenSSL package contains bntest which can be used
as a benchmark for TCG. Redirect all output to /dev/null when
you run it.
Binaries for i386 and x86_64 are available from
http://qemu.weilnetz.de/test/user/.
Stefan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
@ 2015-03-29 21:52 Richard Henderson
2015-03-30 5:33 ` Stefan Weil
2015-03-30 5:43 ` Stefan Weil
0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2015-03-29 21:52 UTC (permalink / raw)
To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel
On Mar 27, 2015 14:09, "Emilio G. Cota" <cota@braap.org> wrote:
>
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
> > Have you been able to measure any performance improvement with these new
> > structures? In theory, if aligned with cache lines, performance should
> > improve but real numbers would be nice.
>
> I haven't benchmarked anything, which makes me very uneasy. All
> I've checked is that the system boots, and FWIW I appreciate no
> difference in boot time.
No decrease in boot time is good. We /know/ we're saving memory, after all.
>
> Is there a benchmark suite to test TCG changes?
No, sorry.
r~
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-03-30 9:55 UTC | newest]
Thread overview: 11+ messages
2015-03-21 6:27 [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes Emilio G. Cota
2015-03-23 21:42 ` Stefan Weil
2015-03-24 1:07 ` Richard Henderson
2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
2015-03-27 9:55 ` Alex Bennée
2015-03-27 21:09 ` Emilio G. Cota
2015-03-30 9:55 ` Laurent Desnogues
2015-03-27 14:58 ` Richard Henderson
2015-03-29 21:52 Richard Henderson
2015-03-30 5:33 ` Stefan Weil
2015-03-30 5:43 ` Stefan Weil