All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH  v1 0/4] Fix codegen translation cache size
@ 2020-02-26 18:10 Alex Bennée
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-26 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, Alex Bennée

Hi,

Slightly more cleaned up series based on what I sent earlier today to
fix the recent slowdown in emulation caused by an overly small
translation cache.

Please review.

Alex Bennée (4):
  accel/tcg: use units.h for defining code gen buffer sizes
  accel/tcg: remove link between guest ram and TCG cache size
  accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
  accel/tcg: increase default code gen buffer size for 64 bit

 accel/tcg/translate-all.c | 42 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes
  2020-02-26 18:10 [PATCH v1 0/4] Fix codegen translation cache size Alex Bennée
@ 2020-02-26 18:10 ` Alex Bennée
  2020-02-26 22:00   ` Niek Linnenbank
                     ` (2 more replies)
  2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-26 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Alex Bennée, Richard Henderson

It's easier to read.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a08ab11f657..238b0e575bf 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "qemu-common.h"
 
 #define NO_CPU_IO_DEFS
@@ -901,33 +902,33 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
-#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
+#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
 
 /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
    indicated, this is constrained by the range of direct branches on the
    host cpu, as used by the TCG implementation of goto_tb.  */
 #if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__sparc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
 #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
 #elif defined(__mips__)
   /* We have a 256MB branch region, but leave room to make sure the
      main executable is also within that region.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
 #else
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size
  2020-02-26 18:10 [PATCH v1 0/4] Fix codegen translation cache size Alex Bennée
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
@ 2020-02-26 18:10 ` Alex Bennée
  2020-02-26 22:26   ` Niek Linnenbank
                     ` (2 more replies)
  2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
  2020-02-26 18:10 ` [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit Alex Bennée
  3 siblings, 3 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-26 18:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Niek Linnenbank, qemu-arm, Paolo Bonzini, Igor Mammedov,
	Alex Bennée, Richard Henderson

Basing the TB cache size on the ram_size was always a little heuristic
and was broken by a1b18df9a4 which caused ram_size not to be fully
realised at the time we initialise the TCG translation cache.

The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
but follow-up patches will address that.

Fixes: a1b18df9a4
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Niek Linnenbank <nieklinnenbank@gmail.com>
Cc: Igor Mammedov <imammedo@redhat.com>
---
 accel/tcg/translate-all.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 238b0e575bf..5b66af783b5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -938,15 +938,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
-#ifdef USE_STATIC_CODE_GEN_BUFFER
         tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-#else
-        /* ??? Needs adjustments.  */
-        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
-           static buffer, we could size this on RESERVED_VA, on the text
-           segment size of the executable, or continue to use the default.  */
-        tb_size = (unsigned long)(ram_size / 4);
-#endif
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
  2020-02-26 18:10 [PATCH v1 0/4] Fix codegen translation cache size Alex Bennée
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
  2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
@ 2020-02-26 18:10 ` Alex Bennée
  2020-02-26 22:50   ` Richard Henderson
                     ` (2 more replies)
  2020-02-26 18:10 ` [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit Alex Bennée
  3 siblings, 3 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-26 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Alex Bennée, Richard Henderson

There is no particular reason to use a static codegen buffer on 64 bit
hosts as we have address space to burn. Allow the common CONFIG_USER
case to use the mmap'ed buffers like SoftMMU.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5b66af783b5..4ce5d1b3931 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -892,11 +892,12 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY)
-/* Currently it is not recommended to allocate big chunks of data in
-   user mode. It will change when a dedicated libc will be used.  */
-/* ??? 64-bit hosts ought to have no problem mmaping data outside the
-   region in which the guest needs to run.  Revisit this.  */
+#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
 #define USE_STATIC_CODE_GEN_BUFFER
 #endif
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-26 18:10 [PATCH v1 0/4] Fix codegen translation cache size Alex Bennée
                   ` (2 preceding siblings ...)
  2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
@ 2020-02-26 18:10 ` Alex Bennée
  2020-02-26 22:45   ` Niek Linnenbank
  2020-02-26 22:55   ` Richard Henderson
  3 siblings, 2 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-26 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Alex Bennée, Richard Henderson

While 32mb is certainly usable a full system boot ends up flushing the
codegen buffer nearly 100 times. Increase the default on 64 bit hosts
to take advantage of all that spare memory. After this change I can
boot my tests system without any TB flushes.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/translate-all.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ce5d1b3931..f7baa512059 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
+#if TCG_TARGET_REG_BITS == 32
 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#else
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
+#endif
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
@ 2020-02-26 22:00   ` Niek Linnenbank
  2020-02-26 22:49   ` Richard Henderson
  2020-02-27 10:54   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-26 22:00 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, qemu-arm, QEMU Developers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

On Wed, Feb 26, 2020 at 7:11 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> It's easier to read.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>


> ---
>  accel/tcg/translate-all.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a08ab11f657..238b0e575bf 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -18,6 +18,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "qemu-common.h"
>
>  #define NO_CPU_IO_DEFS
> @@ -901,33 +902,33 @@ static void page_lock_pair(PageDesc **ret_p1,
> tb_page_addr_t phys1,
>
>  /* Minimum size of the code gen buffer.  This number is randomly chosen,
>     but not so small that we can't have a fair number of TB's live.  */
> -#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
> +#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
>
>  /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
>     indicated, this is constrained by the range of direct branches on the
>     host cpu, as used by the TCG implementation of goto_tb.  */
>  #if defined(__x86_64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>  #elif defined(__sparc__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>  #elif defined(__powerpc64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>  #elif defined(__powerpc__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
>  #elif defined(__aarch64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>  #elif defined(__s390x__)
>    /* We have a +- 4GB range on the branches; leave some slop.  */
> -# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
>  #elif defined(__mips__)
>    /* We have a 256MB branch region, but leave room to make sure the
>       main executable is also within that region.  */
> -# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
>  #else
>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>  #endif
>
> -#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>
>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> --
> 2.20.1
>
>
>

-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 3744 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size
  2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
@ 2020-02-26 22:26   ` Niek Linnenbank
  2020-02-26 22:50     ` Richard Henderson
  2020-02-26 22:49   ` Richard Henderson
  2020-02-27 10:58   ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-26 22:26 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Igor Mammedov, qemu-arm, QEMU Developers, Paolo Bonzini,
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

Hi Alex,

On Wed, Feb 26, 2020 at 7:10 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> Basing the TB cache size on the ram_size was always a little heuristic
> and was broken by a1b18df9a4 which caused ram_size not to be fully
> realised at the time we initialise the TCG translation cache.
>

Now I'm beginning to understand the issue better. So without this patch,
the TCG translation
cache effectively was disabled, causing the slowdown, correct?


>
> The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
> but follow-up patches will address that.
>
> Fixes: a1b18df9a4
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Niek Linnenbank <nieklinnenbank@gmail.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
>
---
>  accel/tcg/translate-all.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 238b0e575bf..5b66af783b5 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -938,15 +938,7 @@ static inline size_t size_code_gen_buffer(size_t
> tb_size)
>  {
>      /* Size the buffer.  */
>      if (tb_size == 0) {
> -#ifdef USE_STATIC_CODE_GEN_BUFFER
>          tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
>
-#else
> -        /* ??? Needs adjustments.  */
> -        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
> -           static buffer, we could size this on RESERVED_VA, on the text
> -           segment size of the executable, or continue to use the
> default.  */
> -        tb_size = (unsigned long)(ram_size / 4);
>

As you wrote in the commit message, I think we are indeed reducing the
cache size here to 32MiB
versus a larger size without this patch. In the next patch #4 in this
series you are increasing it for 64-bit hosts,
but what about the 32-bit hosts? Or will that be addressed in a later
series?

For now, this fix works and resolves the slowdown, so:

Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>

Regards,
Niek


> -#endif
>      }
>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
>          tb_size = MIN_CODE_GEN_BUFFER_SIZE;
> --
> 2.20.1
>
>

-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 3704 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-26 18:10 ` [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit Alex Bennée
@ 2020-02-26 22:45   ` Niek Linnenbank
  2020-02-27 12:19     ` Alex Bennée
  2020-02-26 22:55   ` Richard Henderson
  1 sibling, 1 reply; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-26 22:45 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, qemu-arm, QEMU Developers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 2159 bytes --]

Hi Alex,

On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> While 32mb is certainly usable a full system boot ends up flushing the
> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
> to take advantage of all that spare memory. After this change I can
> boot my tests system without any TB flushes.
>

That great, with this change I'm seeing a performance improvement when
running the avocado tests for cubieboard.
It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit.

I don't know much about the internals of TCG nor how it actually uses the
cache,
but it seems logical to me that increasing the cache size would improve
performance.

What I'm wondering is: will this also result in TCG translating larger
chunks in one shot, so potentially
taking more time to do the translation? If so, could it perhaps affect more
latency sensitive code?


>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>


> ---
>  accel/tcg/translate-all.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 4ce5d1b3931..f7baa512059 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1,
> tb_page_addr_t phys1,
>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>  #endif
>
> +#if TCG_TARGET_REG_BITS == 32
>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> +#else
> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> +#endif
>

The qemu process now takes up more virtual memory, about ~2.5GiB in my
test, which can be expected with this change.

Is it very likely that the TCG cache will be filled quickly and completely?
I'm asking because I also use Qemu to do automated testing
where the nodes are 64-bit but each have only 2GiB physical RAM.

Regards,
Niek


>
>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> --
> 2.20.1
>
>
>

-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 3380 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
  2020-02-26 22:00   ` Niek Linnenbank
@ 2020-02-26 22:49   ` Richard Henderson
  2020-02-27 10:54   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2020-02-26 22:49 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Richard Henderson

On 2/26/20 10:10 AM, Alex Bennée wrote:
> It's easier to read.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size
  2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
  2020-02-26 22:26   ` Niek Linnenbank
@ 2020-02-26 22:49   ` Richard Henderson
  2020-02-27 10:58   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2020-02-26 22:49 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, Niek Linnenbank, qemu-arm, Richard Henderson,
	Igor Mammedov

On 2/26/20 10:10 AM, Alex Bennée wrote:
> Basing the TB cache size on the ram_size was always a little heuristic
> and was broken by a1b18df9a4 which caused ram_size not to be fully
> realised at the time we initialise the TCG translation cache.
> 
> The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
> but follow-up patches will address that.
> 
> Fixes: a1b18df9a4
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Niek Linnenbank <nieklinnenbank@gmail.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> ---
>  accel/tcg/translate-all.c | 8 --------
>  1 file changed, 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size
  2020-02-26 22:26   ` Niek Linnenbank
@ 2020-02-26 22:50     ` Richard Henderson
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2020-02-26 22:50 UTC (permalink / raw)
  To: Niek Linnenbank, Alex Bennée
  Cc: Igor Mammedov, Richard Henderson, qemu-arm, QEMU Developers,
	Paolo Bonzini

On 2/26/20 2:26 PM, Niek Linnenbank wrote:
> Hi Alex,
> 
> On Wed, Feb 26, 2020 at 7:10 PM Alex Bennée <alex.bennee@linaro.org
> <mailto:alex.bennee@linaro.org>> wrote:
> 
>     Basing the TB cache size on the ram_size was always a little heuristic
>     and was broken by a1b18df9a4 which caused ram_size not to be fully
>     realised at the time we initialise the TCG translation cache.
> 
> 
> Now I'm beginning to understand the issue better. So without this patch, the
> TCG translation
> cache effectively was disabled, causing the slowdown, correct?

Yes.


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
  2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
@ 2020-02-26 22:50   ` Richard Henderson
  2020-02-27 10:55   ` Philippe Mathieu-Daudé
  2020-02-27 19:20   ` Niek Linnenbank
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Henderson @ 2020-02-26 22:50 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Richard Henderson

On 2/26/20 10:10 AM, Alex Bennée wrote:
> There is no particular reason to use a static codegen buffer on 64 bit
> hosts as we have address space to burn. Allow the common CONFIG_USER
> case to use the mmap'ed buffers like SoftMMU.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-26 18:10 ` [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit Alex Bennée
  2020-02-26 22:45   ` Niek Linnenbank
@ 2020-02-26 22:55   ` Richard Henderson
  2020-02-27 12:31     ` Alex Bennée
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2020-02-26 22:55 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Laurent Vivier

On 2/26/20 10:10 AM, Alex Bennée wrote:
> While 32mb is certainly usable a full system boot ends up flushing the
> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
> to take advantage of all that spare memory. After this change I can
> boot my tests system without any TB flushes.

> +#if TCG_TARGET_REG_BITS == 32
>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> +#else
> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> +#endif

This particular number, I'm not so sure about.

It makes sense for a lone vm, running in system mode, on a large-ish host.
It's more questionable for a large-ish host running many system mode vm's,
although one can tune that from the command-line, so perhaps it's still ok.

It does not make sense for a linux-user chroot, running make -jN, on just about
any host.  For linux-user, I could be happy with a modest increase, but not all
the way out to 2GiB.

Discuss.


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes
  2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
  2020-02-26 22:00   ` Niek Linnenbank
  2020-02-26 22:49   ` Richard Henderson
@ 2020-02-27 10:54   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 26+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-27 10:54 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Richard Henderson

On 2/26/20 7:10 PM, Alex Bennée wrote:
> It's easier to read.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   accel/tcg/translate-all.c | 19 ++++++++++---------
>   1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index a08ab11f657..238b0e575bf 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -18,6 +18,7 @@
>    */
>   
>   #include "qemu/osdep.h"
> +#include "qemu/units.h"
>   #include "qemu-common.h"
>   
>   #define NO_CPU_IO_DEFS
> @@ -901,33 +902,33 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
>   
>   /* Minimum size of the code gen buffer.  This number is randomly chosen,
>      but not so small that we can't have a fair number of TB's live.  */
> -#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
> +#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
>   
>   /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
>      indicated, this is constrained by the range of direct branches on the
>      host cpu, as used by the TCG implementation of goto_tb.  */
>   #if defined(__x86_64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>   #elif defined(__sparc__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>   #elif defined(__powerpc64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>   #elif defined(__powerpc__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
>   #elif defined(__aarch64__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
>   #elif defined(__s390x__)
>     /* We have a +- 4GB range on the branches; leave some slop.  */
> -# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
>   #elif defined(__mips__)
>     /* We have a 256MB branch region, but leave room to make sure the
>        main executable is also within that region.  */
> -# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
> +# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
>   #else
>   # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>   #endif
>   
> -#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>   
>   #define DEFAULT_CODE_GEN_BUFFER_SIZE \
>     (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> 

Thanks!

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
  2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
  2020-02-26 22:50   ` Richard Henderson
@ 2020-02-27 10:55   ` Philippe Mathieu-Daudé
  2020-02-27 19:20   ` Niek Linnenbank
  2 siblings, 0 replies; 26+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-27 10:55 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini, qemu-arm, Richard Henderson

On 2/26/20 7:10 PM, Alex Bennée wrote:
> There is no particular reason to use a static codegen buffer on 64 bit
> hosts as we have address space to burn. Allow the common CONFIG_USER
> case to use the mmap'ed buffers like SoftMMU.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   accel/tcg/translate-all.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5b66af783b5..4ce5d1b3931 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -892,11 +892,12 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
>       }
>   }
>   
> -#if defined(CONFIG_USER_ONLY)
> -/* Currently it is not recommended to allocate big chunks of data in
> -   user mode. It will change when a dedicated libc will be used.  */
> -/* ??? 64-bit hosts ought to have no problem mmaping data outside the
> -   region in which the guest needs to run.  Revisit this.  */
> +#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
> +/*
> + * For user mode on smaller 32 bit systems we may run into trouble
> + * allocating big chunks of data in the right place. On these systems
> + * we utilise a static code generation buffer directly in the binary.
> + */
>   #define USE_STATIC_CODE_GEN_BUFFER
>   #endif
>   
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size
  2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
  2020-02-26 22:26   ` Niek Linnenbank
  2020-02-26 22:49   ` Richard Henderson
@ 2020-02-27 10:58   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 26+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-27 10:58 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: Paolo Bonzini, Niek Linnenbank, qemu-arm, Richard Henderson,
	Igor Mammedov

On 2/26/20 7:10 PM, Alex Bennée wrote:
> Basing the TB cache size on the ram_size was always a little heuristic
> and was broken by a1b18df9a4 which caused ram_size not to be fully
> realised at the time we initialise the TCG translation cache.
> 
> The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
> but follow-up patches will address that.
> 
> Fixes: a1b18df9a4
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Niek Linnenbank <nieklinnenbank@gmail.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> ---
>   accel/tcg/translate-all.c | 8 --------
>   1 file changed, 8 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 238b0e575bf..5b66af783b5 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -938,15 +938,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
>   {
>       /* Size the buffer.  */
>       if (tb_size == 0) {
> -#ifdef USE_STATIC_CODE_GEN_BUFFER
>           tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
> -#else
> -        /* ??? Needs adjustments.  */
> -        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
> -           static buffer, we could size this on RESERVED_VA, on the text
> -           segment size of the executable, or continue to use the default.  */
> -        tb_size = (unsigned long)(ram_size / 4);
> -#endif
>       }
>       if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
>           tb_size = MIN_CODE_GEN_BUFFER_SIZE;
> 

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-26 22:45   ` Niek Linnenbank
@ 2020-02-27 12:19     ` Alex Bennée
  2020-02-27 19:01       ` Niek Linnenbank
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2020-02-27 12:19 UTC (permalink / raw)
  To: Niek Linnenbank
  Cc: Paolo Bonzini, qemu-arm, QEMU Developers, Richard Henderson


Niek Linnenbank <nieklinnenbank@gmail.com> writes:

> Hi Alex,
>
> On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> While 32mb is certainly usable a full system boot ends up flushing the
>> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
>> to take advantage of all that spare memory. After this change I can
>> boot my tests system without any TB flushes.
>>
>
> That great, with this change I'm seeing a performance improvement when
> running the avocado tests for cubieboard.
> It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit.
>
> I don't know much about the internals of TCG nor how it actually uses the
> cache,
> but it seems logical to me that increasing the cache size would improve
> performance.
>
> What I'm wondering is: will this also result in TCG translating larger
> chunks in one shot, so potentially
> taking more time to do the translation? If so, could it perhaps affect more
> latency sensitive code?

No - the size of the translation blocks is governed by the guest code
and where it ends a basic block. In system mode we also care about
crossing guest page boundaries.

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
> Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
>
>
>> ---
>>  accel/tcg/translate-all.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>> index 4ce5d1b3931..f7baa512059 100644
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1,
>> tb_page_addr_t phys1,
>>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>>  #endif
>>
>> +#if TCG_TARGET_REG_BITS == 32
>>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>> +#else
>> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
>> +#endif
>>
>
> The qemu process now takes up more virtual memory, about ~2.5GiB in my
> test, which can be expected with this change.
>
> Is it very likely that the TCG cache will be filled quickly and completely?
> I'm asking because I also use Qemu to do automated testing
> where the nodes are 64-bit but each have only 2GiB physical RAM.

Well so this is the interesting question and as ever it depends.

For system emulation the buffer will just slowly fill-up over time until
exhausted and which point it will flush and reset. Each time the guest
needs to flush a page and load fresh code in we will generate more
translated code. If the guest isn't under load and never uses all it's
RAM for code then in theory the pages of the mmap that are never filled
never need to be actualised by the host kernel.

You can view the behaviour by running "info jit" from the HMP monitor in
your tests. The "TB Flush" value shows the number of times this has
happened along with other information about translation state.

>
> Regards,
> Niek
>
>
>>
>>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
>>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
>> --
>> 2.20.1
>>
>>
>>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-26 22:55   ` Richard Henderson
@ 2020-02-27 12:31     ` Alex Bennée
  2020-02-27 12:56       ` Richard Henderson
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2020-02-27 12:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, qemu-arm, qemu-devel, Laurent Vivier


Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/26/20 10:10 AM, Alex Bennée wrote:
>> While 32mb is certainly usable a full system boot ends up flushing the
>> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
>> to take advantage of all that spare memory. After this change I can
>> boot my tests system without any TB flushes.
>
>> +#if TCG_TARGET_REG_BITS == 32
>>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>> +#else
>> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
>> +#endif
>
> This particular number, I'm not so sure about.
>
> It makes sense for a lone vm, running in system mode, on a large-ish host.
> It's more questionable for a large-ish host running many system mode vm's,
> although one can tune that from the command-line, so perhaps it's
> still ok.

Yeah it would be nice to get some feedback from users. I suspect system
emulation means the mmap is less efficient due to the sharding of the
translation buffer.

> It does not make sense for a linux-user chroot, running make -jN, on just about
> any host.  For linux-user, I could be happy with a modest increase, but not all
> the way out to 2GiB.
>
> Discuss.

Does it matter that much? Surely for small programs the kernel just
never pages in the used portions of the mmap?

That said does linux-user have a better idea of the size of the problem
set before we start running? Could we defer calling tcg_exec_init until
we have mapped in the main executable and then size based on that?

>
>
> r~


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 12:31     ` Alex Bennée
@ 2020-02-27 12:56       ` Richard Henderson
  2020-02-27 14:13         ` Igor Mammedov
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Richard Henderson @ 2020-02-27 12:56 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Paolo Bonzini, qemu-arm, qemu-devel, Laurent Vivier

On 2/27/20 4:31 AM, Alex Bennée wrote:
>> It does not make sense for a linux-user chroot, running make -jN, on just about
>> any host.  For linux-user, I could be happy with a modest increase, but not all
>> the way out to 2GiB.
>>
>> Discuss.
> 
> Does it matter that much? Surely for small programs the kernel just
> never pages in the used portions of the mmap?

That's why I used the example of a build under the chroot, because the compiler
is not a small program.

Consider when the memory *is* used, and N * 2GB implies lots of paging, where
the previous N * 32MB did not.

I'm saying that we should consider a setting more like 128MB or so, since the
value cannot be changed from the command-line, or through the environment.


r~


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 12:56       ` Richard Henderson
@ 2020-02-27 14:13         ` Igor Mammedov
  2020-02-27 19:07         ` Niek Linnenbank
  2020-02-28  7:54         ` [PATCH] " Alex Bennée
  2 siblings, 0 replies; 26+ messages in thread
From: Igor Mammedov @ 2020-02-27 14:13 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, qemu-arm, Alex Bennée, qemu-devel, Laurent Vivier

On Thu, 27 Feb 2020 04:56:46 -0800
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/27/20 4:31 AM, Alex Bennée wrote:
> >> It does not make sense for a linux-user chroot, running make -jN, on just about
> >> any host.  For linux-user, I could be happy with a modest increase, but not all
> >> the way out to 2GiB.
> >>
> >> Discuss.  
> > 
> > Does it matter that much? Surely for small programs the kernel just
> > never pages in the used portions of the mmap?  
> 
> That's why I used the example of a build under the chroot, because the compiler
> is not a small program.
> 
> Consider when the memory *is* used, and N * 2GB implies lots of paging, where
> the previous N * 32MB did not.
> 
> I'm saying that we should consider a setting more like 128MB or so, since the
> value cannot be changed from the command-line, or through the environment.

That's what BSD guys force tb-size to, to speed up system emulation.

> 
> 
> r~
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 12:19     ` Alex Bennée
@ 2020-02-27 19:01       ` Niek Linnenbank
  0 siblings, 0 replies; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-27 19:01 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, qemu-arm, QEMU Developers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 3565 bytes --]

Hi Alex,

On Thu, Feb 27, 2020 at 1:19 PM Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Niek Linnenbank <nieklinnenbank@gmail.com> writes:
>
> > Hi Alex,
> >
> > On Wed, Feb 26, 2020 at 7:13 PM Alex Bennée <alex.bennee@linaro.org>
> wrote:
> >
> >> While 32mb is certainly usable a full system boot ends up flushing the
> >> codegen buffer nearly 100 times. Increase the default on 64 bit hosts
> >> to take advantage of all that spare memory. After this change I can
> >> boot my tests system without any TB flushes.
> >>
> >
> > That great, with this change I'm seeing a performance improvement when
> > running the avocado tests for cubieboard.
> > It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit.
> >
> > I don't know much about the internals of TCG nor how it actually uses the
> > cache,
> > but it seems logical to me that increasing the cache size would improve
> > performance.
> >
> > What I'm wondering is: will this also result in TCG translating larger
> > chunks in one shot, so potentially
> > taking more time to do the translation? If so, could it perhaps affect
> more
> > latency sensitive code?
>
> No - the size of the translation blocks is governed by the guest code
> and where it ends a basic block. In system mode we also care about
> crossing guest page boundaries.
>
> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> >>
> > Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
> >
> >
> >> ---
> >>  accel/tcg/translate-all.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 4ce5d1b3931..f7baa512059 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1,
> >> tb_page_addr_t phys1,
> >>  # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
> >>  #endif
> >>
> >> +#if TCG_TARGET_REG_BITS == 32
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
> >> +#else
> >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB)
> >> +#endif
> >>
> >
> > The qemu process now takes up more virtual memory, about ~2.5GiB in my
> > test, which can be expected with this change.
> >
> > Is it very likely that the TCG cache will be filled quickly and
> completely?
> > I'm asking because I also use Qemu to do automated testing
> > where the nodes are 64-bit but each have only 2GiB physical RAM.
>
> Well so this is the interesting question and as ever it depends.
>
> For system emulation the buffer will just slowly fill-up over time until
> exhausted and which point it will flush and reset. Each time the guest
> needs to flush a page and load fresh code in we will generate more
> translated code. If the guest isn't under load and never uses all it's
> RAM for code then in theory the pages of the mmap that are never filled
> never need to be actualised by the host kernel.
>
> You can view the behaviour by running "info jit" from the HMP monitor in
> your tests. The "TB Flush" value shows the number of times this has
> happened along with other information about translation state.
>

Thanks for clarifying this, now it all starts to make more sense to me.

Regards,
Niek


>
> >
> > Regards,
> > Niek
> >
> >
> >>
> >>  #define DEFAULT_CODE_GEN_BUFFER_SIZE \
> >>    (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
> >> --
> >> 2.20.1
> >>
> >>
> >>
>
>
> --
> Alex Bennée
>


-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 4968 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 12:56       ` Richard Henderson
  2020-02-27 14:13         ` Igor Mammedov
@ 2020-02-27 19:07         ` Niek Linnenbank
  2020-02-28  7:47           ` Igor Mammedov
  2020-02-28  7:54         ` [PATCH] " Alex Bennée
  2 siblings, 1 reply; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-27 19:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, qemu-arm, Alex Bennée, QEMU Developers,
	Laurent Vivier

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi Richard,

On Thu, Feb 27, 2020 at 1:57 PM Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 2/27/20 4:31 AM, Alex Bennée wrote:
> >> It does not make sense for a linux-user chroot, running make -jN, on
> just about
> >> any host.  For linux-user, I could be happy with a modest increase, but
> not all
> >> the way out to 2GiB.
> >>
> >> Discuss.
> >
> > Does it matter that much? Surely for small programs the kernel just
> > never pages in the used portions of the mmap?
>
> That's why I used the example of a build under the chroot, because the
> compiler
> is not a small program.
>
> Consider when the memory *is* used, and N * 2GB implies lots of paging,
> where
> the previous N * 32MB did not.
>
> I agree that a lower default value probably is safer until we have more
proof that a larger value does not give any issues.


> I'm saying that we should consider a setting more like 128MB or so, since
> the
> value cannot be changed from the command-line, or through the environment.
>

Proposal: can we then introduce a new command line parameter for this?
Maybe in a new patch?
Since the size of the code generation buffer appears to have an impact on
performance,
in my opinion it would make sense to make it configurable by the user.

Regards,
Niek


>
>
> r~
>
>

-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 2223 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
  2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
  2020-02-26 22:50   ` Richard Henderson
  2020-02-27 10:55   ` Philippe Mathieu-Daudé
@ 2020-02-27 19:20   ` Niek Linnenbank
  2 siblings, 0 replies; 26+ messages in thread
From: Niek Linnenbank @ 2020-02-27 19:20 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, qemu-arm, QEMU Developers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]

On Wed, Feb 26, 2020 at 7:12 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> There is no particular reason to use a static codegen buffer on 64 bit
> hosts as we have address space to burn. Allow the common CONFIG_USER
> case to use the mmap'ed buffers like SoftMMU.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  accel/tcg/translate-all.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5b66af783b5..4ce5d1b3931 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -892,11 +892,12 @@ static void page_lock_pair(PageDesc **ret_p1,
> tb_page_addr_t phys1,
>      }
>  }
>
> -#if defined(CONFIG_USER_ONLY)
> -/* Currently it is not recommended to allocate big chunks of data in
> -   user mode. It will change when a dedicated libc will be used.  */
> -/* ??? 64-bit hosts ought to have no problem mmaping data outside the
> -   region in which the guest needs to run.  Revisit this.  */
> +#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
> +/*
> + * For user mode on smaller 32 bit systems we may run into trouble
> + * allocating big chunks of data in the right place. On these systems
> + * we utilise a static code generation buffer directly in the binary.
> + */
>  #define USE_STATIC_CODE_GEN_BUFFER
>  #endif
>
> --
> 2.20.1
>
>
> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>

-- 
Niek Linnenbank

[-- Attachment #2: Type: text/html, Size: 2150 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 19:07         ` Niek Linnenbank
@ 2020-02-28  7:47           ` Igor Mammedov
  2020-02-28 19:20             ` Alex Bennée
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Mammedov @ 2020-02-28  7:47 UTC (permalink / raw)
  To: Niek Linnenbank
  Cc: Richard Henderson, QEMU Developers, Laurent Vivier, qemu-arm,
	Paolo Bonzini, Alex Bennée

On Thu, 27 Feb 2020 20:07:24 +0100
Niek Linnenbank <nieklinnenbank@gmail.com> wrote:

> Hi Richard,
> 
> On Thu, Feb 27, 2020 at 1:57 PM Richard Henderson <
> richard.henderson@linaro.org> wrote:  
> 
> > On 2/27/20 4:31 AM, Alex Bennée wrote:  
> > >> It does not make sense for a linux-user chroot, running make -jN, on  
> > just about  
> > >> any host.  For linux-user, I could be happy with a modest increase, but  
> > not all  
> > >> the way out to 2GiB.
> > >>
> > >> Discuss.  
> > >
> > > Does it matter that much? Surely for small programs the kernel just
> > > never pages in the used portions of the mmap?  
> >
> > That's why I used the example of a build under the chroot, because the
> > compiler
> > is not a small program.
> >
> > Consider when the memory *is* used, and N * 2GB implies lots of paging,
> > where
> > the previous N * 32MB did not.
> >
> > I agree that a lower default value probably is safer until we have more  
> proof that a larger value does not give any issues.
> 
> 
> > I'm saying that we should consider a setting more like 128MB or so, since
> > the
> > value cannot be changed from the command-line, or through the environment.
> >  
> 
> Proposal: can we then introduce a new command line parameter for this?
> Maybe in a new patch?

linux-user currently uses 32Mb static buffer so it probably fine to
leave it as is or bump it to 128Mb regardless of the 32/64bit host.

for system emulation, we already have tb-size option to set user
specified buffer size.

Issue is with system emulation is that it sizes buffer to 1/4 of
ram_size and dependency on ram_size is what we are trying to get
rid of. If we consider unit/acceptance tests as main target/user,
then they mostly use default ram_size value which varies mostly
from 16Mb to 1Gb depending on the board. So used buffer size is
in 4-256Mb range.
Considering that current CI runs fine with max 256Mb buffer,
it might make sense to use it as new heuristic which would not
regress our test infrastructure and might improve performance
for boards where smaller default buffer was used.


> Since the size of the code generation buffer appears to have an impact on
> performance,
> in my opinion it would make sense to make it configurable by the user.
> 
> Regards,
> 
> 
> >
> >
> > r~
> >
> >  
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-27 12:56       ` Richard Henderson
  2020-02-27 14:13         ` Igor Mammedov
  2020-02-27 19:07         ` Niek Linnenbank
@ 2020-02-28  7:54         ` Alex Bennée
  2 siblings, 0 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-28  7:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: richard.henderson, Niek Linnenbank, qemu-arm, Paolo Bonzini,
	Alex Bennée, Richard Henderson

While 32mb is certainly usable a full system boot ends up flushing the
codegen buffer nearly 100 times. Increase the default on 64 bit hosts
to take advantage of all that spare memory. After this change I can
boot my tests system without any TB flushes.

As we usually run more CONFIG_USER binaries at a time in typical usage
we aren't quite as profligate for user-mode code generation usage. We
also bring the static code gen defies to the same place to keep all
the reasoning in the comments together.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>

---
v3
  - 2gb->1gb for system emulation
  - split user and system emulation buffer sizes
---
 accel/tcg/translate-all.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ce5d1b3931..78914154bfc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -892,15 +892,6 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
 #define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
@@ -929,7 +920,33 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
+#if TCG_TARGET_REG_BITS == 32
 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
+#define USE_STATIC_CODE_GEN_BUFFER
+#endif
+#else /* TCG_TARGET_REG_BITS == 64 */
+#ifdef CONFIG_USER_ONLY
+/*
+ * As user-mode emulation typically means running multiple instances
+ * of the translator don't go too nuts with our default code gen
+ * buffer lest we make things too hard for the OS.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
+#else
+/*
+ * We expect most system emulation to run one or two guests per host.
+ * Users running large scale system emulation may want to tweak their
+ * runtime setup via the tb-size control on the command line.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
+#endif
+#endif
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit
  2020-02-28  7:47           ` Igor Mammedov
@ 2020-02-28 19:20             ` Alex Bennée
  0 siblings, 0 replies; 26+ messages in thread
From: Alex Bennée @ 2020-02-28 19:20 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Richard Henderson, QEMU Developers, Laurent Vivier,
	Niek Linnenbank, qemu-arm, Paolo Bonzini


Igor Mammedov <imammedo@redhat.com> writes:

> On Thu, 27 Feb 2020 20:07:24 +0100
> Niek Linnenbank <nieklinnenbank@gmail.com> wrote:
>
>> Hi Richard,
>> 
>> On Thu, Feb 27, 2020 at 1:57 PM Richard Henderson <
>> richard.henderson@linaro.org> wrote:  
>> 
>> > On 2/27/20 4:31 AM, Alex Bennée wrote:  
>> > >> It does not make sense for a linux-user chroot, running make -jN, on  
>> > just about  
>> > >> any host.  For linux-user, I could be happy with a modest increase, but  
>> > not all  
>> > >> the way out to 2GiB.
>> > >>
>> > >> Discuss.  
>> > >
>> > > Does it matter that much? Surely for small programs the kernel just
>> > > never pages in the used portions of the mmap?  
>> >
>> > That's why I used the example of a build under the chroot, because the
>> > compiler
>> > is not a small program.
>> >
>> > Consider when the memory *is* used, and N * 2GB implies lots of paging,
>> > where
>> > the previous N * 32MB did not.
>> >
>> > I agree that a lower default value probably is safer until we have more  
>> proof that a larger value does not give any issues.
>> 
>> 
>> > I'm saying that we should consider a setting more like 128MB or so, since
>> > the
>> > value cannot be changed from the command-line, or through the environment.
>> >  
>> 
>> Proposal: can we then introduce a new command line parameter for this?
>> Maybe in a new patch?
>
> linux-user currently uses 32Mb static buffer so it probably fine to
> leave it as is or bump it to 128Mb regardless of the 32/64bit host.
>
> for system emulation, we already have tb-size option to set user
> specified buffer size.
>
> Issue is with system emulation is that it sizes buffer to 1/4 of
> ram_size and dependency on ram_size is what we are trying to get
> rid of. If we consider unit/acceptance tests as main target/user,
> then they mostly use default ram_size value which varies mostly
> from 16Mb to 1Gb depending on the board. So used buffer size is
> in 4-256Mb range.
> Considering that current CI runs fine with max 256Mb buffer,
> it might make sense to use it as new heuristic which would not
> regress our test infrastructure and might improve performance
> for boards where smaller default buffer was used.

I've dropped it from 2gb to 1gb for system emulation. For the acceptance
tests I doubt we'll even fill the buffer but the mmap memory should
overcommit fine.

>
>
>> Since the size of the code generation buffer appears to have an impact on
>> performance,
>> in my opinion it would make sense to make it configurable by the user.
>> 
>> Regards,
>> 
>> 
>> >
>> >
>> > r~
>> >
>> >  
>> 


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-02-28 19:21 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-26 18:10 [PATCH v1 0/4] Fix codegen translation cache size Alex Bennée
2020-02-26 18:10 ` [PATCH v1 1/4] accel/tcg: use units.h for defining code gen buffer sizes Alex Bennée
2020-02-26 22:00   ` Niek Linnenbank
2020-02-26 22:49   ` Richard Henderson
2020-02-27 10:54   ` Philippe Mathieu-Daudé
2020-02-26 18:10 ` [PATCH v1 2/4] accel/tcg: remove link between guest ram and TCG cache size Alex Bennée
2020-02-26 22:26   ` Niek Linnenbank
2020-02-26 22:50     ` Richard Henderson
2020-02-26 22:49   ` Richard Henderson
2020-02-27 10:58   ` Philippe Mathieu-Daudé
2020-02-26 18:10 ` [PATCH v1 3/4] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts Alex Bennée
2020-02-26 22:50   ` Richard Henderson
2020-02-27 10:55   ` Philippe Mathieu-Daudé
2020-02-27 19:20   ` Niek Linnenbank
2020-02-26 18:10 ` [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit Alex Bennée
2020-02-26 22:45   ` Niek Linnenbank
2020-02-27 12:19     ` Alex Bennée
2020-02-27 19:01       ` Niek Linnenbank
2020-02-26 22:55   ` Richard Henderson
2020-02-27 12:31     ` Alex Bennée
2020-02-27 12:56       ` Richard Henderson
2020-02-27 14:13         ` Igor Mammedov
2020-02-27 19:07         ` Niek Linnenbank
2020-02-28  7:47           ` Igor Mammedov
2020-02-28 19:20             ` Alex Bennée
2020-02-28  7:54         ` [PATCH] " Alex Bennée

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.