[Qemu-devel] [PATCH v2 0/3] tcg: allocate TB structs preceding translated code

* [Qemu-devel] [PATCH v2 0/3] tcg: allocate TB structs preceding translated code
@ 2017-06-05 22:49 Emilio G. Cota
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE Emilio G. Cota
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-05 22:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, alex.bennee, Peter Maydell, Paolo Bonzini,
	Pranith Kumar

v1: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00780.html

Changes from v1:

- Define QEMU_CACHELINE_SIZE as suggested by Richard.
  We try to get the value from the machine running configure, but if we fail
  we use some reasonable defaults. In any case the value can be overridden
  from --extra-cflags at configure time, which is particularly useful when
  cross-compiling.

- Use QEMU_CACHELINE_SIZE where appropriate, namely in tests/.

- In the unlikely case that code_gen_buffer_size / avg block / 8 is 0, then
  set tbs_size to 64K instead of just 1K, as suggested by Richard.

This patchset applies on top of rth's tcg-next branch (pull-tcg-20170605 tag).

NB. Apologies if some emails sent to me bounced during the last couple of days;
my domain name (braap.org) was down.

Thanks,

		Emilio

* [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-05 22:49 [Qemu-devel] [PATCH v2 0/3] tcg: allocate TB structs preceding translated code Emilio G. Cota
@ 2017-06-05 22:49 ` Emilio G. Cota
  2017-06-06  5:39   ` Pranith Kumar
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 2/3] tests: use QEMU_CACHELINE_SIZE instead of hard-coding it Emilio G. Cota
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code Emilio G. Cota
  2 siblings, 1 reply; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-05 22:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, alex.bennee, Peter Maydell, Paolo Bonzini,
	Pranith Kumar

This is a constant used as a hint for padding structs to hopefully avoid
false cache line sharing.

The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
via --extra-cflags. If not set there, we try to obtain the value from
the machine running the configure script. If we fail, we default to
reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.

Note: the configure script only picks up the cache line size when run
on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
to test on.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 configure               | 38 ++++++++++++++++++++++++++++++++++++++
 include/qemu/compiler.h | 17 +++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/configure b/configure
index 13e040d..6a68cb2 100755
--- a/configure
+++ b/configure
@@ -4832,6 +4832,41 @@ EOF
   fi
 fi
 
+# Find out the size of a cache line on the host
+# TODO: support more platforms
+cat > $TMPC<<EOF
+#ifdef __linux__
+
+#include <stdio.h>
+
+#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
+
+int main(int argc, char *argv[])
+{
+    unsigned int size;
+    FILE *fp;
+
+    fp = fopen(SYSFS, "r");
+    if (fp == NULL) {
+        return -1;
+    }
+    if (!fscanf(fp, "%u", &size)) {
+        return -1;
+    }
+    return size;
+}
+#else
+#error Cannot find host cache line size
+#endif
+EOF
+
+host_cacheline_size=0
+if compile_prog "" "" ; then
+    ./$TMPE
+    host_cacheline_size=$?
+fi
+
+
 ##########################################
 # check for _Static_assert()
 
@@ -5284,6 +5319,9 @@ fi
 if test "$bigendian" = "yes" ; then
   echo "HOST_WORDS_BIGENDIAN=y" >> $config_host_mak
 fi
+if test "$host_cacheline_size" -gt 0 ; then
+    echo "HOST_CACHELINE_SIZE=$host_cacheline_size" >> $config_host_mak
+fi
 if test "$mingw32" = "yes" ; then
   echo "CONFIG_WIN32=y" >> $config_host_mak
   rc_version=$(cat $source_path/VERSION)
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 340e5fd..178d831 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -40,6 +40,23 @@
 # define QEMU_PACKED __attribute__((packed))
 #endif
 
+/*
+ * Cache line size of the host. Can be overridden.
+ * Note that this is just a compile-time hint to hopefully avoid false sharing
+ * of cache lines; code must be correct regardless of the constant's value.
+ */
+#ifndef QEMU_CACHELINE_SIZE
+# ifdef HOST_CACHELINE_SIZE
+#  define QEMU_CACHELINE_SIZE HOST_CACHELINE_SIZE
+# else
+#  if defined(__powerpc64__)
+#   define QEMU_CACHELINE_SIZE 128
+#  else
+#   define QEMU_CACHELINE_SIZE 64
+#  endif
+# endif
+#endif
+
 #define QEMU_ALIGNED(X) __attribute__((aligned(X)))
 
 #ifndef glue
-- 
2.7.4
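
For reference, the sysfs probe in the configure test above could be hardened
along the following lines. This is only a sketch, not part of the posted patch:
the helper name read_cacheline_size() is made up for illustration. It checks
that fscanf() converted exactly one item (fscanf() returns EOF, which is
negative, on an empty file, so a plain `!fscanf(...)` test misses that case)
and reports failure as 0 instead of -1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): read a cache line size in bytes
 * from a sysfs-style file. Returns 0 when the value cannot be determined.
 */
static unsigned int read_cacheline_size(const char *path)
{
    unsigned int size = 0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL) {
        return 0;
    }
    /* Exactly one successful conversion is required; EOF counts as failure. */
    if (fscanf(fp, "%u", &size) != 1) {
        size = 0;
    }
    fclose(fp);
    return size;
}
```

A caller would pass the coherency_line_size path used in the patch. Note that
a process exit status is limited to 0..255 on POSIX systems, so a program that
does `return -1` is seen by the shell as 255; printing the value to stdout and
capturing it in the script would avoid both that ambiguity and truncation of
larger values.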

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE Emilio G. Cota
@ 2017-06-06  5:39   ` Pranith Kumar
  2017-06-06  8:18     ` Richard Henderson
  2017-06-06 16:11     ` Emilio G. Cota
  0 siblings, 2 replies; 18+ messages in thread
From: Pranith Kumar @ 2017-06-06  5:39 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Peter Maydell,
	Paolo Bonzini

On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> This is a constant used as a hint for padding structs to hopefully avoid
> false cache line sharing.
>
> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> via --extra-cflags. If not set there, we try to obtain the value from
> the machine running the configure script. If we fail, we default to
> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
>
> Note: the configure script only picks up the cache line size when run
> on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
> to test on.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  configure               | 38 ++++++++++++++++++++++++++++++++++++++
>  include/qemu/compiler.h | 17 +++++++++++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/configure b/configure
> index 13e040d..6a68cb2 100755
> --- a/configure
> +++ b/configure
> @@ -4832,6 +4832,41 @@ EOF
>    fi
>  fi
>
> +# Find out the size of a cache line on the host
> +# TODO: support more platforms
> +cat > $TMPC<<EOF
> +#ifdef __linux__
> +
> +#include <stdio.h>
> +
> +#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
> +
> +int main(int argc, char *argv[])
> +{
> +    unsigned int size;
> +    FILE *fp;
> +
> +    fp = fopen(SYSFS, "r");
> +    if (fp == NULL) {
> +        return -1;
> +    }
> +    if (!fscanf(fp, "%u", &size)) {
> +        return -1;
> +    }
> +    return size;
> +}
> +#else
> +#error Cannot find host cache line size
> +#endif
> +EOF

Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

Thanks,
-- 
Pranith

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06  5:39   ` Pranith Kumar
@ 2017-06-06  8:18     ` Richard Henderson
  2017-06-06 16:11     ` Emilio G. Cota
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2017-06-06  8:18 UTC (permalink / raw)
  To: Pranith Kumar, Emilio G. Cota
  Cc: qemu-devel, Alex Bennée, Peter Maydell, Paolo Bonzini

On 06/05/2017 10:39 PM, Pranith Kumar wrote:
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

That's an excellent idea.  In fact... see reply to 3/3.


r~

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06  5:39   ` Pranith Kumar
  2017-06-06  8:18     ` Richard Henderson
@ 2017-06-06 16:11     ` Emilio G. Cota
  2017-06-06 17:39       ` Richard Henderson
  1 sibling, 1 reply; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-06 16:11 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Peter Maydell,
	Paolo Bonzini

On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> > This is a constant used as a hint for padding structs to hopefully avoid
> > false cache line sharing.
> >
> > The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> > via --extra-cflags. If not set there, we try to obtain the value from
> > the machine running the configure script. If we fail, we default to
> > reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
(snip)
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

I tried using sysconf, but it doesn't work on the PowerPC machine I have
access to (it returns 0). It might be a machine-specific thing though; I
don't know. Here's the machine's `uname -a':
  Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
    3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

		E.

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06 16:11     ` Emilio G. Cota
@ 2017-06-06 17:39       ` Richard Henderson
  2017-06-06 20:28         ` Geert Martin Ijewski
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2017-06-06 17:39 UTC (permalink / raw)
  To: Emilio G. Cota, Pranith Kumar
  Cc: qemu-devel, Alex Bennée, Peter Maydell, Paolo Bonzini

On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>> This is a constant used as a hint for padding structs to hopefully avoid
>>> false cache line sharing.
>>>
>>> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
>>> via --extra-cflags. If not set there, we try to obtain the value from
>>> the machine running the configure script. If we fail, we default to
>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
> (snip)
>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
> 
> I tried using sysconf, but it doesn't work on the PowerPC machine I have
> access to (it returns 0). It might be a machine-specific thing though-I
> don't know. Here's the machine's `uname -a':
>    Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
>      3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

Well that's unfortunate.

Doing some digging, the kernel has provided the info to userland via elf auxv 
data since the beginning of time (aka initial git repository build), but glibc 
still does not export that information properly for ppc.

For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).  Indeed, 
we already have 4 different system dependent methods for determining the icache 
size in tcg/ppc/tcg-target.inc.c.

So what I think we ought to do is create a new util/cachesize.c like so:

unsigned qemu_icache_linesize = 64;
unsigned qemu_dcache_linesize = 64;

static void init_icache_data(void)
{
#ifdef _SC_LEVEL1_ICACHE_LINESIZE
     {
         long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
         if (x > 0) {
             qemu_icache_linesize = x;
             return;
         }
     }
#endif
#ifdef AT_ICACHEBSIZE
     {
         unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
         if (x > 0) {
             qemu_icache_linesize = x;
             return;
         }
     }
#endif
     // Other system specific methods.
}

static void init_dcache_data(void)
{
     // Similarly.
}

static void __attribute__((constructor)) init_cache_data(void)
{
     init_icache_data();
     init_dcache_data();
}

In particular, I think you want to be padding to the icache linesize rather 
than the dcache linesize since what we're attempting is to avoid writable data 
in the icache.


r~
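
The "Similarly" placeholder for the dcache could look roughly like the sketch
below. It is an illustration under stated assumptions, not the code that was
merged: probe_dcache_linesize() and its fallback parameter are hypothetical,
and plain getauxval() from <sys/auxv.h> (available on Linux with glibc 2.16+
or musl) stands in for qemu_getauxval() so that the example is self-contained.

```c
#include <assert.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/auxv.h>   /* getauxval(); pulls in AT_* where the libc has them */
#endif

/*
 * Hypothetical helper: try each source in turn and fall back to the
 * caller-supplied default when none of them yields a positive value.
 */
static unsigned int probe_dcache_linesize(unsigned int fallback)
{
    assert(fallback > 0);
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    {
        /* glibc extension; may return 0 or -1 when the value is unknown */
        long x = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (x > 0) {
            return (unsigned int)x;
        }
    }
#endif
#if defined(__linux__) && defined(AT_DCACHEBSIZE)
    {
        /* ELF auxiliary vector entry; 0 means the kernel did not provide it */
        unsigned long x = getauxval(AT_DCACHEBSIZE);
        if (x > 0) {
            return (unsigned int)x;
        }
    }
#endif
    return fallback;
}
```

Calling probe_dcache_linesize(64) mirrors the proposal's default of 64 bytes;
the icache variant would differ only in the two constants queried.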

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06 17:39       ` Richard Henderson
@ 2017-06-06 20:28         ` Geert Martin Ijewski
  2017-06-06 21:38           ` Emilio G. Cota
  0 siblings, 1 reply; 18+ messages in thread
From: Geert Martin Ijewski @ 2017-06-06 20:28 UTC (permalink / raw)
  To: Richard Henderson, Emilio G. Cota, Pranith Kumar
  Cc: Peter Maydell, Alex Bennée, qemu-devel, Paolo Bonzini

Am 06.06.2017 um 19:39 schrieb Richard Henderson:
> On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
>> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>>> This is a constant used as a hint for padding structs to hopefully avoid
>>>> false cache line sharing.
>>>>
>>>> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
>>>> via --extra-cflags. If not set there, we try to obtain the value from
>>>> the machine running the configure script. If we fail, we default to
>>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
>> (snip)
>>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
>>
>> I tried using sysconf, but it doesn't work on the PowerPC machine I have
>> access to (it returns 0). It might be a machine-specific thing though-I
>> don't know. Here's the machine's `uname -a':
>>    Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
>>      3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux
> 
> Well that's unfortunate.
> 
> Doing some digging, the kernel has provided the info to userland via elf 
> auxv data since the beginning of time (aka initial git repository 
> build), but glibc still does not export that information properly for ppc.
> 
> For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).  
> Indeed, we already have 4 different system dependent methods for 
> determining the icache size in tcg/ppc/tcg-target.inc.c.
> 
> So what I think we ought to do is create a new util/cachesize.c like so:
> 
> unsigned qemu_icache_linesize = 64;
> unsigned qemu_dcache_linesize = 64;
> 
> static void init_icache_data(void)
> {
> #ifdef _SC_LEVEL1_ICACHE_LINESIZE
>      {
>          long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
>          if (x > 0) {
>              qemu_icache_linesize = x;
>              return;
>          }
>      }
> #endif
> #ifdef AT_ICACHEBSIZE
>      {
>          unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
>          if (x > 0) {
>              qemu_icache_linesize = x;
>              return;
>          }
>      }
> #endif
>      // Other system specific methods.

On a fully patched Windows 10 with an i5-4690 this code works for me (TM):

#ifdef _WIN32
     {
         DWORD bufferSize = 0;
         if (!GetLogicalProcessorInformation(0, &bufferSize) &&
                 GetLastError() == ERROR_INSUFFICIENT_BUFFER)
         {
             PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buffer =
                 (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(bufferSize);
             if (GetLogicalProcessorInformation(buffer, &bufferSize)) {
                 size_t i = 0,
                     numOfProcessors =
                         bufferSize /
                         sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
                 for (; i < numOfProcessors; i++) {
                     if (buffer[i].Relationship == RelationCache &&
                         buffer[i].Cache.Level == 1 &&
                         (  buffer[i].Cache.Type == CacheUnified ||
                            buffer[i].Cache.Type == CacheInstruction)
                         )
                     {
                         qemu_icache_linesize = buffer[i].Cache.LineSize;
                         break;
                     }
                 }
             }
             g_free(buffer);
         }
     }
#endif

I don't particularly like that stair of ifs style, so I guess if I were 
to do a proper patch this should become a function.
> }
> 
> static void init_dcache_data(void)
> {
>      // Similarly.

The code from above, just s/CacheInstruction/CacheData/ and 
s/qemu_icache/qemu_dcache/
> }
> 
> static void __attribute__((constructor)) init_cache_data(void)
> {
>      init_icache_data();
>      init_dcache_data();
> }
> 
> In particular, I think you want to be padding to the icache linesize 
> rather than the dcache linesize since what we're attempting is to avoid 
> writable data in the icache.
> 
> 
> r~
> 
> 

To quote from the documentation:
"RelationCache: [... snip ...]
Windows Server 2003:  This value is not supported until Windows Server 
2003 with SP1 and Windows XP Professional x64 Edition." --
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686694(v=vs.85).aspx

I'm not sure if that is considered a problem, as both systems have been
unsupported for almost 2 years now.

Geert

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06 20:28         ` Geert Martin Ijewski
@ 2017-06-06 21:38           ` Emilio G. Cota
  2017-06-06 22:01             ` Geert Martin Ijewski
  0 siblings, 1 reply; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-06 21:38 UTC (permalink / raw)
  To: Geert Martin Ijewski
  Cc: Richard Henderson, Pranith Kumar, Peter Maydell,
	Alex Bennée, qemu-devel, Paolo Bonzini

On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):

Thanks!
Can you please test this?

		Emilio
---
#include "qemu/osdep.h"
#include <windows.h>

static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
{
    PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
    DWORD size = 0;
    unsigned int ret = 0;
    BOOL success;
    size_t n;
    size_t i;

    success = GetLogicalProcessorInformation(0, &size);
    if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        return 0;
    }
    buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
    if (!GetLogicalProcessorInformation(buf, &size)) {
        goto out;
    }

    n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (i = 0; i < n; i++) {
        if (buf[i].Relationship == RelationCache &&
            buf[i].Cache.Level == 1 &&
            (buf[i].Cache.Type == CacheUnified ||
             buf[i].Cache.Type == type)) {
            ret = buf[i].Cache.LineSize;
            break;
        }
    }
 out:
    g_free(buf);
    return ret;
}

linesize_win(CacheInstruction);
linesize_win(CacheData);

* Re: [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE
  2017-06-06 21:38           ` Emilio G. Cota
@ 2017-06-06 22:01             ` Geert Martin Ijewski
  0 siblings, 0 replies; 18+ messages in thread
From: Geert Martin Ijewski @ 2017-06-06 22:01 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: Richard Henderson, Pranith Kumar, Peter Maydell,
	Alex Bennée, qemu-devel, Paolo Bonzini

Am 06.06.2017 um 23:38 schrieb Emilio G. Cota:
 > On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
 >> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):
 >
 > Thanks!
 > Can you please test this?
 >
 > 		Emilio
 > ---
 > #include "qemu/osdep.h"
 > #include <windows.h>

Unnecessary, as it's already included via qemu/osdep.h -> sysemu/os-win32.h
 >
 > static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
 > {
 >      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
 >      DWORD size = 0;
 >      unsigned int ret = 0;
 >      BOOL success;
 >      size_t n;
 >      size_t i;
 >
 >      success = GetLogicalProcessorInformation(0, &size);
 >      if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
 >          return 0;
 >      }
 >      buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
 >      if (!GetLogicalProcessorInformation(buf, &size)) {
 >          goto out;
 >      }
 >
 >      n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
 >      for (i = 0; i < n; i++) {
 >          if (buf[i].Relationship == RelationCache &&
 >              buf[i].Cache.Level == 1 &&
 >              (buf[i].Cache.Type == CacheUnified ||
 >               buf[i].Cache.Type == type)) {
 >              ret = buf[i].Cache.LineSize;
 >              break;
 >          }
 >      }
 >   out:
 >      g_free(buf);
 >      return ret;
 > }
 >
 > linesize_win(CacheInstruction);
 > linesize_win(CacheData);
 >
 >

Yes, that works.
Tested-by: Geert Martin Ijewski <gm.ijewski@web.de>

* [Qemu-devel] [PATCH v2 2/3] tests: use QEMU_CACHELINE_SIZE instead of hard-coding it
  2017-06-05 22:49 [Qemu-devel] [PATCH v2 0/3] tcg: allocate TB structs preceding translated code Emilio G. Cota
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE Emilio G. Cota
@ 2017-06-05 22:49 ` Emilio G. Cota
  2017-06-06  5:40   ` Pranith Kumar
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code Emilio G. Cota
  2 siblings, 1 reply; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-05 22:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, alex.bennee, Peter Maydell, Paolo Bonzini,
	Pranith Kumar

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tests/atomic_add-bench.c | 4 ++--
 tests/qht-bench.c        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index caa1e8e..c219109 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -5,11 +5,11 @@
 
 struct thread_info {
     uint64_t r;
-} QEMU_ALIGNED(64);
+} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);
 
 struct count {
     unsigned long val;
-} QEMU_ALIGNED(64);
+} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);
 
 static QemuThread *threads;
 static struct thread_info *th_info;
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 2afa09d..3f4b5eb 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -28,7 +28,7 @@ struct thread_info {
     uint64_t r;
     bool write_op; /* writes alternate between insertions and removals */
     bool resize_down;
-} QEMU_ALIGNED(64); /* avoid false sharing among threads */
+} QEMU_ALIGNED(QEMU_CACHELINE_SIZE); /* avoid false sharing among threads */
 
 static struct qht ht;
 static QemuThread *rw_threads;
-- 
2.7.4
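
To illustrate what the alignment attribute in this patch buys, here is a small
standalone sketch. It assumes a 64-byte line and a GCC/Clang-style aligned
attribute (which is what QEMU_ALIGNED expands to); CACHELINE_SIZE and
struct padded_counter are illustrative names, not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_SIZE 64  /* assumption for this sketch; the patch uses QEMU_CACHELINE_SIZE */

/* Per-thread counter, aligned so that each instance starts on its own line. */
struct padded_counter {
    uint64_t val;
} __attribute__((aligned(CACHELINE_SIZE)));

/*
 * The aligned attribute also rounds sizeof up to a multiple of the alignment,
 * so consecutive array elements never share a cache line.
 */
static_assert(sizeof(struct padded_counter) == CACHELINE_SIZE,
              "one counter should occupy exactly one cache line");
```

With an array of such structs indexed by thread, writes from one thread do not
invalidate the line holding another thread's counter, provided the constant
matches (or is a multiple of) the real line size. As the comment in patch 1/3
says, this is only a hint: code must stay correct for any value.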

* Re: [Qemu-devel] [PATCH v2 2/3] tests: use QEMU_CACHELINE_SIZE instead of hard-coding it
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 2/3] tests: use QEMU_CACHELINE_SIZE instead of hard-coding it Emilio G. Cota
@ 2017-06-06  5:40   ` Pranith Kumar
  0 siblings, 0 replies; 18+ messages in thread
From: Pranith Kumar @ 2017-06-06  5:40 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Peter Maydell,
	Paolo Bonzini

On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> Signed-off-by: Emilio G. Cota <cota@braap.org>

Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>

> ---
>  tests/atomic_add-bench.c | 4 ++--
>  tests/qht-bench.c        | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
> index caa1e8e..c219109 100644
> --- a/tests/atomic_add-bench.c
> +++ b/tests/atomic_add-bench.c
> @@ -5,11 +5,11 @@
>
>  struct thread_info {
>      uint64_t r;
> -} QEMU_ALIGNED(64);
> +} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);
>
>  struct count {
>      unsigned long val;
> -} QEMU_ALIGNED(64);
> +} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);
>
>  static QemuThread *threads;
>  static struct thread_info *th_info;
> diff --git a/tests/qht-bench.c b/tests/qht-bench.c
> index 2afa09d..3f4b5eb 100644
> --- a/tests/qht-bench.c
> +++ b/tests/qht-bench.c
> @@ -28,7 +28,7 @@ struct thread_info {
>      uint64_t r;
>      bool write_op; /* writes alternate between insertions and removals */
>      bool resize_down;
> -} QEMU_ALIGNED(64); /* avoid false sharing among threads */
> +} QEMU_ALIGNED(QEMU_CACHELINE_SIZE); /* avoid false sharing among threads */
>
>  static struct qht ht;
>  static QemuThread *rw_threads;
> --
> 2.7.4
>



-- 
Pranith

* [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-05 22:49 [Qemu-devel] [PATCH v2 0/3] tcg: allocate TB structs preceding translated code Emilio G. Cota
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 1/3] compiler: define QEMU_CACHELINE_SIZE Emilio G. Cota
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 2/3] tests: use QEMU_CACHELINE_SIZE instead of hard-coding it Emilio G. Cota
@ 2017-06-05 22:49 ` Emilio G. Cota
  2017-06-06  5:36   ` Pranith Kumar
  2017-06-06  8:24   ` Richard Henderson
  2 siblings, 2 replies; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-05 22:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, alex.bennee, Peter Maydell, Paolo Bonzini,
	Pranith Kumar

Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.

An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).

Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:

  https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
  Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
  Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net>

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   |  2 +-
 include/exec/tb-context.h |  3 ++-
 tcg/tcg.c                 | 16 ++++++++++++++++
 tcg/tcg.h                 |  2 +-
 translate-all.c           | 37 ++++++++++++++++++++++---------------
 5 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87ae10b..00c0f43 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -363,7 +363,7 @@ struct TranslationBlock {
      */
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_list_first;
-};
+} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);
 
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index c7f17f2..25c2afe 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,8 +31,9 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-    TranslationBlock *tbs;
+    TranslationBlock **tbs;
     struct qht htable;
+    size_t tbs_size;
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 564292f..f657c51 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -383,6 +383,22 @@ void tcg_context_init(TCGContext *s)
     }
 }
 
+/*
+ * Allocate TBs right before their corresponding translated code, making
+ * sure that TBs and code are on different cache lines.
+ */
+TranslationBlock *tcg_tb_alloc(TCGContext *s)
+{
+    void *aligned;
+
+    aligned = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, QEMU_CACHELINE_SIZE);
+    if (unlikely(aligned + sizeof(TranslationBlock) > s->code_gen_highwater)) {
+        return NULL;
+    }
+    s->code_gen_ptr += aligned - s->code_gen_ptr + sizeof(TranslationBlock);
+    return aligned;
+}
+
 void tcg_prologue_init(TCGContext *s)
 {
     size_t prologue_size, total_size;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5ec48d1..9e37722 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -697,7 +697,6 @@ struct TCGContext {
        here, because there's too much arithmetic throughout that relies
        on addition and subtraction working on bytes.  Rely on the GCC
        extension that allows arithmetic on void*.  */
-    int code_gen_max_blocks;
     void *code_gen_prologue;
     void *code_gen_epilogue;
     void *code_gen_buffer;
@@ -756,6 +755,7 @@ static inline bool tcg_op_buf_full(void)
 /* tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
+TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
 void tb_lock(void);
 void tb_unlock(void);
diff --git a/translate-all.c b/translate-all.c
index b3ee876..0eb9d13 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -781,12 +781,13 @@ static inline void code_gen_alloc(size_t tb_size)
         exit(1);
     }
 
-    /* Estimate a good size for the number of TBs we can support.  We
-       still haven't deducted the prologue from the buffer size here,
-       but that's minimal and won't affect the estimate much.  */
-    tcg_ctx.code_gen_max_blocks
-        = tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE;
-    tcg_ctx.tb_ctx.tbs = g_new(TranslationBlock, tcg_ctx.code_gen_max_blocks);
+    /* size this conservatively -- realloc later if needed */
+    tcg_ctx.tb_ctx.tbs_size =
+        tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE / 8;
+    if (unlikely(!tcg_ctx.tb_ctx.tbs_size)) {
+        tcg_ctx.tb_ctx.tbs_size = 64 * 1024;
+    }
+    tcg_ctx.tb_ctx.tbs = g_new(TranslationBlock *, tcg_ctx.tb_ctx.tbs_size);
 
     qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
@@ -828,13 +829,20 @@ bool tcg_enabled(void)
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
+    TBContext *ctx;
 
     assert_tb_locked();
 
-    if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
+    tb = tcg_tb_alloc(&tcg_ctx);
+    if (unlikely(tb == NULL)) {
         return NULL;
     }
-    tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
+    ctx = &tcg_ctx.tb_ctx;
+    if (unlikely(ctx->nb_tbs == ctx->tbs_size)) {
+        ctx->tbs_size *= 2;
+        ctx->tbs = g_renew(TranslationBlock *, ctx->tbs, ctx->tbs_size);
+    }
+    ctx->tbs[ctx->nb_tbs++] = tb;
     tb->pc = pc;
     tb->cflags = 0;
     tb->invalid = false;
@@ -850,8 +858,8 @@ void tb_free(TranslationBlock *tb)
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
     if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
-            tb == &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
-        tcg_ctx.code_gen_ptr = tb->tc_ptr;
+            tb == tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
+        tcg_ctx.code_gen_ptr = tb->tc_ptr - sizeof(TranslationBlock);
         tcg_ctx.tb_ctx.nb_tbs--;
     }
 }
@@ -1666,7 +1674,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
     m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
     while (m_min <= m_max) {
         m = (m_min + m_max) >> 1;
-        tb = &tcg_ctx.tb_ctx.tbs[m];
+        tb = tcg_ctx.tb_ctx.tbs[m];
         v = (uintptr_t)tb->tc_ptr;
         if (v == tc_ptr) {
             return tb;
@@ -1676,7 +1684,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
             m_min = m + 1;
         }
     }
-    return &tcg_ctx.tb_ctx.tbs[m_max];
+    return tcg_ctx.tb_ctx.tbs[m_max];
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1874,7 +1882,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     direct_jmp_count = 0;
     direct_jmp2_count = 0;
     for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
-        tb = &tcg_ctx.tb_ctx.tbs[i];
+        tb = tcg_ctx.tb_ctx.tbs[i];
         target_code_size += tb->size;
         if (tb->size > max_target_code_size) {
             max_target_code_size = tb->size;
@@ -1894,8 +1902,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cpu_fprintf(f, "gen code size       %td/%zd\n",
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
-    cpu_fprintf(f, "TB count            %d/%d\n",
-            tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.code_gen_max_blocks);
+    cpu_fprintf(f, "TB count            %d\n", tcg_ctx.tb_ctx.nb_tbs);
     cpu_fprintf(f, "TB avg target size  %d max=%d bytes\n",
             tcg_ctx.tb_ctx.nb_tbs ? target_code_size /
                     tcg_ctx.tb_ctx.nb_tbs : 0,
-- 
2.7.4

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code Emilio G. Cota
@ 2017-06-06  5:36   ` Pranith Kumar
  2017-06-06 17:13     ` Emilio G. Cota
  2017-06-06  8:24   ` Richard Henderson
  1 sibling, 1 reply; 18+ messages in thread
From: Pranith Kumar @ 2017-06-06  5:36 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Peter Maydell,
	Paolo Bonzini

On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> Allocating an arbitrarily-sized array of tbs results in either
> (a) a lot of memory wasted or (b) unnecessary flushes of the code
> cache when we run out of TB structs in the array.
>
> An obvious solution would be to just malloc a TB struct when needed,
> and keep the TB array as an array of pointers (recall that tb_find_pc()
> needs the TB array to run in O(log n)).
>
> Perhaps a better solution, which is implemented in this patch, is to
> allocate TB's right before the translated code they describe. This
> results in some memory waste due to padding to have code and TBs in
> separate cache lines--for instance, I measured 4.7% of padding in the
> used portion of code_gen_buffer when booting aarch64 Linux on a
> host with 64-byte cache lines. However, it can allow for optimizations
> in some host architectures, since TCG backends could safely assume that
> the TB and the corresponding translated code are very close to each
> other in memory. See this message by rth for a detailed explanation:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
>   Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
>   Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net>

Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>

Thanks for doing this Emilio. Do you plan to continue working on rth's
suggestions in that email? If so, can we co-ordinate our work?

-- 
Pranith

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-06  5:36   ` Pranith Kumar
@ 2017-06-06 17:13     ` Emilio G. Cota
  0 siblings, 0 replies; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-06 17:13 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: qemu-devel, Richard Henderson, Alex Bennée, Peter Maydell,
	Paolo Bonzini

On Tue, Jun 06, 2017 at 01:36:50 -0400, Pranith Kumar wrote:
> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
> 
> Thanks for doing this Emilio. Do you plan to continue working on rth's
> suggestions in that email? If so, can we co-ordinate our work?

My plan is to work on instrumentation. This was just low-hanging fruit;
I was curious to see the impact on cache miss rates of bringing the TBs
close to the corresponding translated code. It turns out the impact is
pretty small, or my L1 caches are too big :-) The memory savings are
significant though, with the added benefit that this can enable more
efficient translated code, as Richard pointed out.

I've just left a message on the GSoC thread with ideas.

		Emilio

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-05 22:49 ` [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code Emilio G. Cota
  2017-06-06  5:36   ` Pranith Kumar
@ 2017-06-06  8:24   ` Richard Henderson
  2017-06-06 16:25     ` Emilio G. Cota
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2017-06-06  8:24 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: alex.bennee, Peter Maydell, Paolo Bonzini, Pranith Kumar

On 06/05/2017 03:49 PM, Emilio G. Cota wrote:
> +TranslationBlock *tcg_tb_alloc(TCGContext *s)
> +{
> +    void *aligned;
> +
> +    aligned = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, QEMU_CACHELINE_SIZE);
> +    if (unlikely(aligned + sizeof(TranslationBlock) > s->code_gen_highwater)) {
> +        return NULL;
> +    }
> +    s->code_gen_ptr += aligned - s->code_gen_ptr + sizeof(TranslationBlock);
> +    return aligned;

We don't really need the 2/3 patch.  We don't gain anything by telling the 
compiler that the structure is more aligned than it needs to be.

We can query the line size at runtime, as suggested by Pranith, and use that 
for the alignment here.  Which means that the binary isn't tied to a particular 
cpu implementation, which is clearly preferable for distributions.


r~
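
A minimal sketch of the runtime query suggested above, assuming a Linux
host with glibc. The helper name host_dcache_linesize() and the 64-byte
fallback are hypothetical and not part of the patch series;
_SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, so the #ifdef guards it
on other hosts.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): return the L1 data cache
 * line size in bytes, or an assumed default of 64 when the host does
 * not report a usable value.
 */
static size_t host_dcache_linesize(void)
{
    long sz = -1;

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; can return 0 or -1 when the size is unknown */
    sz = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    /* Reject unknown or non-power-of-two values so ROUND_UP-style math works */
    if (sz <= 0 || (sz & (sz - 1)) != 0) {
        sz = 64; /* assumed default, not a value taken from the thread */
    }
    return (size_t)sz;
}
```

The result would be read once at startup and then used in place of the
compile-time constant when aligning allocations.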

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-06  8:24   ` Richard Henderson
@ 2017-06-06 16:25     ` Emilio G. Cota
  2017-06-06 17:02       ` Richard Henderson
  0 siblings, 1 reply; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-06 16:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, Peter Maydell, Paolo Bonzini, Pranith Kumar

On Tue, Jun 06, 2017 at 01:24:11 -0700, Richard Henderson wrote:
> On 06/05/2017 03:49 PM, Emilio G. Cota wrote:
> >+TranslationBlock *tcg_tb_alloc(TCGContext *s)
> >+{
> >+    void *aligned;
> >+
> >+    aligned = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, QEMU_CACHELINE_SIZE);
> >+    if (unlikely(aligned + sizeof(TranslationBlock) > s->code_gen_highwater)) {
> >+        return NULL;
> >+    }
> >+    s->code_gen_ptr += aligned - s->code_gen_ptr + sizeof(TranslationBlock);
> >+    return aligned;
> 
> We don't really need the 2/3 patch.  We don't gain anything by telling the
> compiler that the structure is more aligned than it needs to be.

The compile-time requirement is for the compiler to pad the structs
appropriately; this is critical to avoid false sharing when allocating
arrays of structs like those test programs do.
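
As an illustration of that compile-time padding, here is a small C11
sketch. CACHELINE stands in for QEMU_CACHELINE_SIZE and is assumed to be
64; the struct names are hypothetical and do not come from tests/.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed value, stand-in for QEMU_CACHELINE_SIZE */

/* Unpadded: several array elements share one cache line. */
struct counter_plain {
    unsigned long n;
};

/*
 * Padded: aligning the member raises the alignment of the struct, and
 * the compiler rounds sizeof() up to it, so each array element occupies
 * a cache line of its own.
 */
struct counter_padded {
    alignas(CACHELINE) unsigned long n;
};

/* One element per thread; adjacent elements are CACHELINE bytes apart. */
static struct counter_padded per_thread[4];
```

With a runtime-determined line size the compiler cannot do this for us,
which is why arrays of such structs still need the constant.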

> We can query the line size at runtime, as suggested by Pranith, and use that
> for the alignment here.  Which means that the binary isn't tied to a
> particular cpu implementation, which is clearly preferable for
> distributions.

For this particular case we can get away without padding the structs if
we're OK with having the end of a TB struct immediately followed
by its translated code, instead of having that code on the following
cache line.

		E.

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-06 16:25     ` Emilio G. Cota
@ 2017-06-06 17:02       ` Richard Henderson
  2017-06-06 17:31         ` Emilio G. Cota
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2017-06-06 17:02 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, alex.bennee, Peter Maydell, Paolo Bonzini, Pranith Kumar

On 06/06/2017 09:25 AM, Emilio G. Cota wrote:
> For this particular case we can get away without padding the structs if
> we're OK with having the end of a TB struct immediately followed
> by its translated code, instead of having that code on the following
> cache line.

Uh, no, if you can manually pad before the struct, you can manually pad after 
the struct too.


r~
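
A rough sketch of what padding on both sides could look like, under
stated assumptions: the two structs below are small stand-ins for the
real QEMU types, and tb_alloc_padded(), round_up() and the linesize
parameter are hypothetical names, not the code from the respin.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the QEMU types; the real definitions are much larger. */
typedef struct TranslationBlock {
    uintptr_t pc;
    uint32_t cflags;
} TranslationBlock;

typedef struct TCGContext {
    char *code_gen_ptr;       /* next free byte in the code buffer */
    char *code_gen_highwater; /* allocation limit */
} TCGContext;

/* Round p up to a multiple of align; align must be a power of two. */
static uintptr_t round_up(uintptr_t p, uintptr_t align)
{
    return (p + align - 1) & ~(align - 1);
}

/*
 * Pad before the TB struct so it starts on a cache line, then pad
 * after it so the translated code starts on the following line.
 * linesize is a runtime value, so no compile-time constant is needed.
 */
static TranslationBlock *tb_alloc_padded(TCGContext *s, size_t linesize)
{
    uintptr_t tb = round_up((uintptr_t)s->code_gen_ptr, linesize);
    uintptr_t next = round_up(tb + sizeof(TranslationBlock), linesize);

    if (next > (uintptr_t)s->code_gen_highwater) {
        return NULL; /* buffer exhausted; the caller would flush */
    }
    s->code_gen_ptr = (char *)next;
    return (TranslationBlock *)tb;
}
```

After a successful call, both the returned TB and s->code_gen_ptr are
multiples of linesize, so the struct and its code never share a line.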

* Re: [Qemu-devel] [PATCH v2 3/3] tcg: allocate TB structs before the corresponding translated code
  2017-06-06 17:02       ` Richard Henderson
@ 2017-06-06 17:31         ` Emilio G. Cota
  0 siblings, 0 replies; 18+ messages in thread
From: Emilio G. Cota @ 2017-06-06 17:31 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, alex.bennee, Peter Maydell, Paolo Bonzini, Pranith Kumar

On Tue, Jun 06, 2017 at 10:02:17 -0700, Richard Henderson wrote:
> On 06/06/2017 09:25 AM, Emilio G. Cota wrote:
> >For this particular case we can get away without padding the structs if
> >we're OK with having the end of a TB struct immediately followed
> >by its translated code, instead of having that code on the following
> >cache line.
> 
> Uh, no, if you can manually pad before the struct, you can manually pad
> after the struct too.

Yes of course =) I'll respin the series to do this.

		E.
