All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] tcg: Dynamically allocate temporaries
@ 2021-01-19 18:34 Richard Henderson
  2021-01-19 18:34 ` [PATCH 1/5] tcg: Add an index to TCGTemp Richard Henderson
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

My recent change for caching tcg constants has, in a number of cases,
overflowed the statically allocated array of temporaries.  Change to
dynamic allocation.

I'll note that nothing in check-acceptance triggers this overflow.
Anyone care to add some more test cases there?

Also, there's some outstanding weirdness in gitlab testing that I
cannot reproduce locally.


r~


Richard Henderson (5):
  tcg: Add an index to TCGTemp
  tcg: Introduce and use tcg_temp
  tcg: Make TCGTempSet expandable
  tcg: Adjust tcgv_*_temp/temp_tcgv_*
  tcg: Dynamically allocate temporaries

 include/tcg/tcg.h |  79 ++++++++++++++-----
 tcg/optimize.c    |  23 +++---
 tcg/tcg.c         | 196 +++++++++++++++++++++++++++++++---------------
 3 files changed, 205 insertions(+), 93 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] tcg: Add an index to TCGTemp
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
@ 2021-01-19 18:34 ` Richard Henderson
  2021-01-19 18:34 ` [PATCH 2/5] tcg: Introduce and use tcg_temp Richard Henderson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

This allows to break the tie between the memory location
of TCGTemp and TCGContext->temps[].  Which is the first
step to allowing the array to be reallocated.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  5 ++---
 tcg/tcg.c         | 10 ++++++++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 504c5e9bb0..5ef644ceae 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -497,6 +497,7 @@ typedef enum TCGTempKind {
 } TCGTempKind;
 
 typedef struct TCGTemp {
+    unsigned int index:16;
     TCGReg reg:8;
     TCGTempVal val_type:8;
     TCGType base_type:8;
@@ -721,9 +722,7 @@ static inline void *tcg_splitwx_to_rw(const void *rx)
 
 static inline size_t temp_idx(TCGTemp *ts)
 {
-    ptrdiff_t n = ts - tcg_ctx->temps;
-    tcg_debug_assert(n >= 0 && n < tcg_ctx->nb_temps);
-    return n;
+    return ts->index;
 }
 
 static inline TCGArg temp_arg(TCGTemp *ts)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8f8badb61c..4a8dfb8f67 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1204,11 +1204,17 @@ void tcg_func_start(TCGContext *s)
     QSIMPLEQ_INIT(&s->labels);
 }
 
-static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
+static TCGTemp *tcg_temp_alloc(TCGContext *s)
 {
     int n = s->nb_temps++;
+    TCGTemp *ret;
+
     tcg_debug_assert(n < TCG_MAX_TEMPS);
-    return memset(&s->temps[n], 0, sizeof(TCGTemp));
+    ret = &s->temps[n];
+    memset(ret, 0, sizeof(TCGTemp));
+    ret->index = n;
+
+    return ret;
 }
 
 static inline TCGTemp *tcg_global_alloc(TCGContext *s)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] tcg: Introduce and use tcg_temp
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
  2021-01-19 18:34 ` [PATCH 1/5] tcg: Add an index to TCGTemp Richard Henderson
@ 2021-01-19 18:34 ` Richard Henderson
  2021-01-19 18:34 ` [PATCH 3/5] tcg: Make TCGTempSet expandable Richard Henderson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

We're about to change the representation of the array of TCGTemps.
Wrap the indexing into an inline to minimize changes later.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  6 ++++
 tcg/optimize.c    |  5 ++--
 tcg/tcg.c         | 71 ++++++++++++++++++++++++++---------------------
 3 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 5ef644ceae..0d90701dcd 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -720,6 +720,12 @@ static inline void *tcg_splitwx_to_rw(const void *rx)
 }
 #endif
 
+static inline TCGTemp *tcg_temp(TCGContext *s, size_t idx)
+{
+    tcg_debug_assert(idx < s->nb_temps);
+    return &s->temps[idx];
+}
+
 static inline size_t temp_idx(TCGTemp *ts)
 {
     return ts->index;
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 37c902283e..2aa491605e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -614,7 +614,8 @@ void tcg_optimize(TCGContext *s)
 
     memset(&temps_used, 0, sizeof(temps_used));
     for (i = 0; i < nb_temps; ++i) {
-        s->temps[i].state_ptr = NULL;
+        TCGTemp *ts = tcg_temp(s, i);
+        ts->state_ptr = NULL;
     }
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
@@ -1485,7 +1486,7 @@ void tcg_optimize(TCGContext *s)
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
                     if (test_bit(i, temps_used.l)) {
-                        reset_ts(&s->temps[i]);
+                        reset_ts(tcg_temp(s, i));
                     }
                 }
             }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4a8dfb8f67..7284209cff 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -824,10 +824,10 @@ void tcg_register_thread(void)
 
     /* Relink mem_base.  */
     for (i = 0, n = tcg_init_ctx.nb_globals; i < n; ++i) {
-        if (tcg_init_ctx.temps[i].mem_base) {
-            ptrdiff_t b = tcg_init_ctx.temps[i].mem_base - tcg_init_ctx.temps;
-            tcg_debug_assert(b >= 0 && b < n);
-            s->temps[i].mem_base = &s->temps[b];
+        TCGTemp *its = tcg_temp(&tcg_init_ctx, i);
+        if (its->mem_base) {
+            TCGTemp *ots = tcg_temp(s, i);
+            ots->mem_base = tcg_temp(s, temp_idx(its->mem_base));
         }
     }
 
@@ -1332,7 +1332,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
         /* There is already an available temp with the right type.  */
         clear_bit(idx, s->free_temps[k].l);
 
-        ts = &s->temps[idx];
+        ts = tcg_temp(s, idx);
         ts->temp_allocated = 1;
         tcg_debug_assert(ts->base_type == type);
         tcg_debug_assert(ts->kind == kind);
@@ -2016,7 +2016,7 @@ static void tcg_reg_alloc_start(TCGContext *s)
     int i, n;
 
     for (i = 0, n = s->nb_temps; i < n; i++) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
         TCGTempVal val = TEMP_VAL_MEM;
 
         switch (ts->kind) {
@@ -2654,12 +2654,14 @@ static void la_func_end(TCGContext *s, int ng, int nt)
     int i;
 
     for (i = 0; i < ng; ++i) {
-        s->temps[i].state = TS_DEAD | TS_MEM;
-        la_reset_pref(&s->temps[i]);
+        TCGTemp *ts = tcg_temp(s, i);
+        ts->state = TS_DEAD | TS_MEM;
+        la_reset_pref(ts);
     }
     for (i = ng; i < nt; ++i) {
-        s->temps[i].state = TS_DEAD;
-        la_reset_pref(&s->temps[i]);
+        TCGTemp *ts = tcg_temp(s, i);
+        ts->state = TS_DEAD;
+        la_reset_pref(ts);
     }
 }
 
@@ -2670,7 +2672,7 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
     int i;
 
     for (i = 0; i < nt; ++i) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
         int state;
 
         switch (ts->kind) {
@@ -2697,11 +2699,13 @@ static void la_global_sync(TCGContext *s, int ng)
     int i;
 
     for (i = 0; i < ng; ++i) {
-        int state = s->temps[i].state;
-        s->temps[i].state = state | TS_MEM;
+        TCGTemp *ts = tcg_temp(s, i);
+        int state = ts->state;
+
+        ts->state = state | TS_MEM;
         if (state == TS_DEAD) {
             /* If the global was previously dead, reset prefs.  */
-            la_reset_pref(&s->temps[i]);
+            la_reset_pref(ts);
         }
     }
 }
@@ -2715,7 +2719,7 @@ static void la_bb_sync(TCGContext *s, int ng, int nt)
     la_global_sync(s, ng);
 
     for (int i = ng; i < nt; ++i) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
         int state;
 
         switch (ts->kind) {
@@ -2727,14 +2731,14 @@ static void la_bb_sync(TCGContext *s, int ng, int nt)
             }
             break;
         case TEMP_NORMAL:
-            s->temps[i].state = TS_DEAD;
+            ts->state = TS_DEAD;
             break;
         case TEMP_CONST:
             continue;
         default:
             g_assert_not_reached();
         }
-        la_reset_pref(&s->temps[i]);
+        la_reset_pref(ts);
     }
 }
 
@@ -2744,8 +2748,9 @@ static void la_global_kill(TCGContext *s, int ng)
     int i;
 
     for (i = 0; i < ng; i++) {
-        s->temps[i].state = TS_DEAD | TS_MEM;
-        la_reset_pref(&s->temps[i]);
+        TCGTemp *ts = tcg_temp(s, i);
+        ts->state = TS_DEAD | TS_MEM;
+        la_reset_pref(ts);
     }
 }
 
@@ -2756,7 +2761,8 @@ static void la_cross_call(TCGContext *s, int nt)
     int i;
 
     for (i = 0; i < nt; i++) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
+
         if (!(ts->state & TS_DEAD)) {
             TCGRegSet *pset = la_temp_pref(ts);
             TCGRegSet set = *pset;
@@ -2784,7 +2790,8 @@ static void liveness_pass_1(TCGContext *s)
 
     prefs = tcg_malloc(sizeof(TCGRegSet) * nb_temps);
     for (i = 0; i < nb_temps; ++i) {
-        s->temps[i].state_ptr = prefs + i;
+        TCGTemp *ts = tcg_temp(s, i);
+        ts->state_ptr = prefs + i;
     }
 
     /* ??? Should be redundant with the exit_tb that ends the TB.  */
@@ -3094,7 +3101,7 @@ static bool liveness_pass_2(TCGContext *s)
 
     /* Create a temporary for each indirect global.  */
     for (i = 0; i < nb_globals; ++i) {
-        TCGTemp *its = &s->temps[i];
+        TCGTemp *its = tcg_temp(s, i);
         if (its->indirect_reg) {
             TCGTemp *dts = tcg_temp_alloc(s);
             dts->type = its->type;
@@ -3107,7 +3114,7 @@ static bool liveness_pass_2(TCGContext *s)
         its->state = TS_DEAD;
     }
     for (nb_temps = s->nb_temps; i < nb_temps; ++i) {
-        TCGTemp *its = &s->temps[i];
+        TCGTemp *its = tcg_temp(s, i);
         its->state_ptr = NULL;
         its->state = TS_DEAD;
     }
@@ -3190,7 +3197,7 @@ static bool liveness_pass_2(TCGContext *s)
             for (i = 0; i < nb_globals; ++i) {
                 /* Liveness should see that globals are synced back,
                    that is, either TS_DEAD or TS_MEM.  */
-                arg_ts = &s->temps[i];
+                arg_ts = tcg_temp(s, i);
                 tcg_debug_assert(arg_ts->state_ptr == 0
                                  || arg_ts->state != 0);
             }
@@ -3198,7 +3205,7 @@ static bool liveness_pass_2(TCGContext *s)
             for (i = 0; i < nb_globals; ++i) {
                 /* Liveness should see that globals are saved back,
                    that is, TS_DEAD, waiting to be reloaded.  */
-                arg_ts = &s->temps[i];
+                arg_ts = tcg_temp(s, i);
                 tcg_debug_assert(arg_ts->state_ptr == 0
                                  || arg_ts->state == TS_DEAD);
             }
@@ -3277,12 +3284,11 @@ static bool liveness_pass_2(TCGContext *s)
 #ifdef CONFIG_DEBUG_TCG
 static void dump_regs(TCGContext *s)
 {
-    TCGTemp *ts;
     int i;
     char buf[64];
 
     for(i = 0; i < s->nb_temps; i++) {
-        ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
         printf("  %10s: ", tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
         switch(ts->val_type) {
         case TEMP_VAL_REG:
@@ -3332,7 +3338,7 @@ static void check_regs(TCGContext *s)
         }
     }
     for (k = 0; k < s->nb_temps; k++) {
-        ts = &s->temps[k];
+        ts = tcg_temp(s, k);
         if (ts->val_type == TEMP_VAL_REG
             && ts->kind != TEMP_FIXED
             && s->reg_to_temp[ts->reg] != ts) {
@@ -3594,7 +3600,7 @@ static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
     int i, n;
 
     for (i = 0, n = s->nb_globals; i < n; i++) {
-        temp_save(s, &s->temps[i], allocated_regs);
+        temp_save(s, tcg_temp(s, i), allocated_regs);
     }
 }
 
@@ -3606,7 +3612,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
     int i, n;
 
     for (i = 0, n = s->nb_globals; i < n; i++) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
         tcg_debug_assert(ts->val_type != TEMP_VAL_REG
                          || ts->kind == TEMP_FIXED
                          || ts->mem_coherent);
@@ -3620,7 +3626,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
     int i;
 
     for (i = s->nb_globals; i < s->nb_temps; i++) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
 
         switch (ts->kind) {
         case TEMP_LOCAL:
@@ -3652,7 +3658,8 @@ static void tcg_reg_alloc_cbranch(TCGContext *s, TCGRegSet allocated_regs)
     sync_globals(s, allocated_regs);
 
     for (int i = s->nb_globals; i < s->nb_temps; i++) {
-        TCGTemp *ts = &s->temps[i];
+        TCGTemp *ts = tcg_temp(s, i);
+
         /*
          * The liveness analysis already ensures that temps are dead.
          * Keep tcg_debug_asserts for safety.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] tcg: Make TCGTempSet expandable
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
  2021-01-19 18:34 ` [PATCH 1/5] tcg: Add an index to TCGTemp Richard Henderson
  2021-01-19 18:34 ` [PATCH 2/5] tcg: Introduce and use tcg_temp Richard Henderson
@ 2021-01-19 18:34 ` Richard Henderson
  2021-01-19 18:34 ` [PATCH 4/5] tcg: Adjust tcgv_*_temp/temp_tcgv_* Richard Henderson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

Introduce a complete set of operations on TCGTempSet,
and do not directly call <qemu/bitops.h> functions.
Expand the array as necessary on SET.  Use the tcg
allocation pool so that we do not have to worry about
explicitly freeing the array.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h | 28 +++++++++++++++++++++++++-
 tcg/optimize.c    | 18 ++++++++---------
 tcg/tcg.c         | 51 ++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 80 insertions(+), 17 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0d90701dcd..4d001fed39 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -524,9 +524,35 @@ typedef struct TCGTemp {
 typedef struct TCGContext TCGContext;
 
 typedef struct TCGTempSet {
-    unsigned long l[BITS_TO_LONGS(TCG_MAX_TEMPS)];
+    unsigned long *data;
+    size_t word_len;
 } TCGTempSet;
 
+void tempset_init(TCGTempSet *set, size_t len);
+bool tempset_find_first(const TCGTempSet *set, size_t *i);
+void tempset_set(TCGTempSet *set, size_t i);
+
+static inline void tempset_clear_all(TCGTempSet *set)
+{
+    memset(set->data, 0, set->word_len * sizeof(unsigned long));
+}
+
+static inline void tempset_clear(TCGTempSet *set, size_t i)
+{
+    size_t l = i / BITS_PER_LONG;
+    size_t b = i % BITS_PER_LONG;
+    if (likely(l < set->word_len)) {
+        set->data[l] &= ~BIT(b);
+    }
+}
+
+static inline bool tempset_test(const TCGTempSet *set, size_t i)
+{
+    size_t l = i / BITS_PER_LONG;
+    size_t b = i % BITS_PER_LONG;
+    return l < set->word_len && (set->data[l] & BIT(b));
+}
+
 /* While we limit helpers to 6 arguments, for 32-bit hosts, with padding,
    this imples a max of 6*2 (64-bit in) + 2 (64-bit out) = 14 operands.
    There are never more than 2 outputs, which means that we can store all
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2aa491605e..b0ecef1fb6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -94,10 +94,10 @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
     size_t idx = temp_idx(ts);
     TempOptInfo *ti;
 
-    if (test_bit(idx, temps_used->l)) {
+    if (tempset_test(temps_used, idx)) {
         return;
     }
-    set_bit(idx, temps_used->l);
+    tempset_set(temps_used, idx);
 
     ti = ts->state_ptr;
     if (ti == NULL) {
@@ -612,7 +612,7 @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
 
-    memset(&temps_used, 0, sizeof(temps_used));
+    tempset_init(&temps_used, nb_temps);
     for (i = 0; i < nb_temps; ++i) {
         TCGTemp *ts = tcg_temp(s, i);
         ts->state_ptr = NULL;
@@ -1254,7 +1254,7 @@ void tcg_optimize(TCGContext *s)
                                            op->args[1], op->args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    memset(&temps_used, 0, sizeof(temps_used));
+                    tempset_clear_all(&temps_used);
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[3];
                 } else {
@@ -1338,7 +1338,7 @@ void tcg_optimize(TCGContext *s)
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    memset(&temps_used, 0, sizeof(temps_used));
+                    tempset_clear_all(&temps_used);
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[5];
                 } else {
@@ -1354,7 +1354,7 @@ void tcg_optimize(TCGContext *s)
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                memset(&temps_used, 0, sizeof(temps_used));
+                tempset_clear_all(&temps_used);
                 op->opc = INDEX_op_brcond_i32;
                 op->args[0] = op->args[1];
                 op->args[1] = op->args[3];
@@ -1380,7 +1380,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                memset(&temps_used, 0, sizeof(temps_used));
+                tempset_clear_all(&temps_used);
                 op->opc = INDEX_op_brcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -1485,7 +1485,7 @@ void tcg_optimize(TCGContext *s)
             if (!(op->args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
+                    if (tempset_test(&temps_used, i)) {
                         reset_ts(tcg_temp(s, i));
                     }
                 }
@@ -1500,7 +1500,7 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                memset(&temps_used, 0, sizeof(temps_used));
+                tempset_clear_all(&temps_used);
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 7284209cff..a505457cee 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1182,7 +1182,9 @@ void tcg_func_start(TCGContext *s)
     s->nb_temps = s->nb_globals;
 
     /* No temps have been previously allocated for size or locality.  */
-    memset(s->free_temps, 0, sizeof(s->free_temps));
+    for (int i = 0; i < ARRAY_SIZE(s->free_temps); ++i) {
+        tempset_init(&s->free_temps[i], TCG_MAX_TEMPS);
+    }
 
     /* No constant temps have been previously allocated. */
     for (int i = 0; i < TCG_TYPE_COUNT; ++i) {
@@ -1324,13 +1326,12 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
     TCGContext *s = tcg_ctx;
     TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
     TCGTemp *ts;
-    int idx, k;
+    size_t idx, k;
 
     k = type + (temp_local ? TCG_TYPE_COUNT : 0);
-    idx = find_first_bit(s->free_temps[k].l, TCG_MAX_TEMPS);
-    if (idx < TCG_MAX_TEMPS) {
+    if (tempset_find_first(&s->free_temps[k], &idx)) {
         /* There is already an available temp with the right type.  */
-        clear_bit(idx, s->free_temps[k].l);
+        tempset_clear(&s->free_temps[k], idx);
 
         ts = tcg_temp(s, idx);
         ts->temp_allocated = 1;
@@ -1403,7 +1404,7 @@ TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match)
 void tcg_temp_free_internal(TCGTemp *ts)
 {
     TCGContext *s = tcg_ctx;
-    int k, idx;
+    size_t k, idx;
 
     /* In order to simplify users of tcg_constant_*, silently ignore free. */
     if (ts->kind == TEMP_CONST) {
@@ -1423,7 +1424,7 @@ void tcg_temp_free_internal(TCGTemp *ts)
 
     idx = temp_idx(ts);
     k = ts->base_type + (ts->kind == TEMP_NORMAL ? 0 : TCG_TYPE_COUNT);
-    set_bit(idx, s->free_temps[k].l);
+    tempset_set(&s->free_temps[k], idx);
 }
 
 TCGTemp *tcg_constant_internal(TCGType type, int64_t val)
@@ -4665,6 +4666,42 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     return tcg_current_code_size(s);
 }
 
+void tempset_init(TCGTempSet *set, size_t len)
+{
+    size_t word_len = BITS_TO_LONGS(len);
+
+    set->word_len = word_len;
+    set->data = tcg_malloc(word_len * sizeof(unsigned long));
+    memset(set->data, 0, word_len * sizeof(unsigned long));
+}
+
+void tempset_set(TCGTempSet *set, size_t i)
+{
+    size_t l = i / BITS_PER_LONG;
+    size_t b = i % BITS_PER_LONG;
+
+    if (l >= set->word_len) {
+        size_t old_blen = set->word_len * sizeof(unsigned long);
+        size_t new_wlen = set->word_len * 2;
+        unsigned long *new_data = tcg_malloc(old_blen * 2);
+
+        memcpy(new_data, set->data, old_blen);
+        memset((char *)new_data + old_blen, 0, old_blen);
+
+        set->data = new_data;
+        set->word_len = new_wlen;
+    }
+    set->data[l] |= BIT(b);
+}
+
+bool tempset_find_first(const TCGTempSet *set, size_t *i)
+{
+    size_t max = set->word_len * BITS_PER_LONG;
+    size_t ret = find_first_bit(set->data, max);
+    *i = ret;
+    return ret < max;
+}
+
 #ifdef CONFIG_PROFILER
 void tcg_dump_info(void)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] tcg: Adjust tcgv_*_temp/temp_tcgv_*
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
                   ` (2 preceding siblings ...)
  2021-01-19 18:34 ` [PATCH 3/5] tcg: Make TCGTempSet expandable Richard Henderson
@ 2021-01-19 18:34 ` Richard Henderson
  2021-01-19 18:34 ` [PATCH 5/5] tcg: Dynamically allocate temporaries Richard Henderson
  2021-01-19 23:06 ` [PATCH 0/5] " BALATON Zoltan
  5 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

Encode the index of the temporary within the "pointer" rather
than its offset.  This breaks a tie with a statically allocated
array of temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 4d001fed39..996addd90c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -757,6 +757,11 @@ static inline size_t temp_idx(TCGTemp *ts)
     return ts->index;
 }
 
+/*
+ * TCGArg is a convenience for TCGOps, and never exist outside of
+ * code generation with a specific TCGContext.  Simply store the
+ * pointer value within the TCGArg.
+ */
 static inline TCGArg temp_arg(TCGTemp *ts)
 {
     return (uintptr_t)ts;
@@ -767,15 +772,15 @@ static inline TCGTemp *arg_temp(TCGArg a)
     return (TCGTemp *)(uintptr_t)a;
 }
 
-/* Using the offset of a temporary, relative to TCGContext, rather than
-   its index means that we don't use 0.  That leaves offset 0 free for
-   a NULL representation without having to leave index 0 unused.  */
+/*
+ * TCGv_{i32,i64,ptr,vec} must be independent of TCGContext,
+ * so that the globals that we allocate at startup are valid for
+ * the thread-specfic TCGContext when we generate code.
+ * Reserve 0 for NULL, and use the temp index + 1 otherwise.
+ */
 static inline TCGTemp *tcgv_i32_temp(TCGv_i32 v)
 {
-    uintptr_t o = (uintptr_t)v;
-    TCGTemp *t = (void *)tcg_ctx + o;
-    tcg_debug_assert(offsetof(TCGContext, temps[temp_idx(t)]) == o);
-    return t;
+    return v ? tcg_temp(tcg_ctx, (uintptr_t)v - 1) : NULL;
 }
 
 static inline TCGTemp *tcgv_i64_temp(TCGv_i64 v)
@@ -815,8 +820,7 @@ static inline TCGArg tcgv_vec_arg(TCGv_vec v)
 
 static inline TCGv_i32 temp_tcgv_i32(TCGTemp *t)
 {
-    (void)temp_idx(t); /* trigger embedded assert */
-    return (TCGv_i32)((void *)t - (void *)tcg_ctx);
+    return (TCGv_i32)(t ? (uintptr_t)temp_idx(t) + 1 : 0);
 }
 
 static inline TCGv_i64 temp_tcgv_i64(TCGTemp *t)
@@ -837,12 +841,20 @@ static inline TCGv_vec temp_tcgv_vec(TCGTemp *t)
 #if TCG_TARGET_REG_BITS == 32
 static inline TCGv_i32 TCGV_LOW(TCGv_i64 t)
 {
-    return temp_tcgv_i32(tcgv_i64_temp(t));
+    /*
+     * The 64-bit value is a pair of TCGv_i32, with the low part at index 0.
+     * Since we're encoding the index in @t, pass it through unchanged.
+     */
+    return (TCGv_i32)t;
 }
 
 static inline TCGv_i32 TCGV_HIGH(TCGv_i64 t)
 {
-    return temp_tcgv_i32(tcgv_i64_temp(t) + 1);
+    /*
+     * The 64-bit value is a pair of TCGv_i32, with the high part at index 1.
+     * Since we're encoding the index in @t, add one.
+     */
+    return (TCGv_i32)((uintptr_t)t + 1);
 }
 #endif
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] tcg: Dynamically allocate temporaries
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
                   ` (3 preceding siblings ...)
  2021-01-19 18:34 ` [PATCH 4/5] tcg: Adjust tcgv_*_temp/temp_tcgv_* Richard Henderson
@ 2021-01-19 18:34 ` Richard Henderson
  2021-01-19 23:06 ` [PATCH 0/5] " BALATON Zoltan
  5 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-19 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: lvivier, alistair23

The caching of constant temps has resulted in a larger overall usage,
overflowing the statically allocated array.  Instead, allocate temps
as needed, placing pointers to the allocated temps into a GPtrArray
for later indexing.

Buglink: https://bugs.launchpad.net/bugs/1912065
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  6 ++--
 tcg/tcg.c         | 74 +++++++++++++++++++++++++++++------------------
 2 files changed, 49 insertions(+), 31 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 996addd90c..1da3bce0c8 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -275,7 +275,7 @@ typedef struct TCGPool {
 
 #define TCG_POOL_CHUNK_SIZE 32768
 
-#define TCG_MAX_TEMPS 512
+#define TCG_INIT_TEMPS 512
 #define TCG_MAX_INSNS 512
 
 /* when the size of the arguments of a called function is smaller than
@@ -696,7 +696,7 @@ struct TCGContext {
 
     GHashTable *const_table[TCG_TYPE_COUNT];
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
-    TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
+    GPtrArray *temps; /* globals first, temps after */
 
     QTAILQ_HEAD(, TCGOp) ops, free_ops;
     QSIMPLEQ_HEAD(, TCGLabel) labels;
@@ -749,7 +749,7 @@ static inline void *tcg_splitwx_to_rw(const void *rx)
 static inline TCGTemp *tcg_temp(TCGContext *s, size_t idx)
 {
     tcg_debug_assert(idx < s->nb_temps);
-    return &s->temps[idx];
+    return g_ptr_array_index(s->temps, idx);
 }
 
 static inline size_t temp_idx(TCGTemp *ts)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a505457cee..5ca6860107 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -816,19 +816,24 @@ void tcg_register_thread(void)
 void tcg_register_thread(void)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
-    TCGContext *s = g_malloc(sizeof(*s));
-    unsigned int i, n;
+    TCGContext *s = g_memdup(&tcg_init_ctx, sizeof(*s));
+    unsigned int i, n = tcg_init_ctx.nb_globals;
+    TCGTemp *glob;
     bool err;
 
-    *s = tcg_init_ctx;
-
-    /* Relink mem_base.  */
-    for (i = 0, n = tcg_init_ctx.nb_globals; i < n; ++i) {
+    /* Copy the globals. */
+    s->temps = g_ptr_array_sized_new(TCG_INIT_TEMPS);
+    glob = g_new(TCGTemp, n);
+    for (i = 0; i < n; ++i) {
         TCGTemp *its = tcg_temp(&tcg_init_ctx, i);
+        TCGTemp *ots = glob + i;
+
+        *ots = *its;
         if (its->mem_base) {
-            TCGTemp *ots = tcg_temp(s, i);
-            ots->mem_base = tcg_temp(s, temp_idx(its->mem_base));
+            /* Relink mem_base. */
+            ots->mem_base = glob + temp_idx(its->mem_base);
         }
+        g_ptr_array_add(s->temps, ots);
     }
 
     /* Claim an entry in tcg_ctxs */
@@ -986,11 +991,11 @@ void tcg_context_init(TCGContext *s)
 
     memset(s, 0, sizeof(*s));
     s->nb_globals = 0;
+    s->temps = g_ptr_array_sized_new(TCG_INIT_TEMPS);
 
-    /* Count total number of arguments and allocate the corresponding
-       space */
+    /* Count total number of arguments and allocate the corresponding space */
     total_args = 0;
-    for(op = 0; op < NB_OPS; op++) {
+    for (op = 0; op < NB_OPS; op++) {
         def = &tcg_op_defs[op];
         n = def->nb_iargs + def->nb_oargs;
         total_args += n;
@@ -998,7 +1003,7 @@ void tcg_context_init(TCGContext *s)
 
     args_ct = g_new0(TCGArgConstraint, total_args);
 
-    for(op = 0; op < NB_OPS; op++) {
+    for (op = 0; op < NB_OPS; op++) {
         def = &tcg_op_defs[op];
         def->args_ct = args_ct;
         n = def->nb_iargs + def->nb_oargs;
@@ -1179,11 +1184,13 @@ void tcg_prologue_init(TCGContext *s)
 void tcg_func_start(TCGContext *s)
 {
     tcg_pool_reset(s);
+
     s->nb_temps = s->nb_globals;
+    g_ptr_array_set_size(s->temps, s->nb_globals);
 
     /* No temps have been previously allocated for size or locality.  */
     for (int i = 0; i < ARRAY_SIZE(s->free_temps); ++i) {
-        tempset_init(&s->free_temps[i], TCG_MAX_TEMPS);
+        tempset_init(&s->free_temps[i], TCG_INIT_TEMPS);
     }
 
     /* No constant temps have been previously allocated. */
@@ -1208,27 +1215,40 @@ void tcg_func_start(TCGContext *s)
 
 static TCGTemp *tcg_temp_alloc(TCGContext *s)
 {
-    int n = s->nb_temps++;
     TCGTemp *ret;
+    int n = s->nb_temps;
 
-    tcg_debug_assert(n < TCG_MAX_TEMPS);
-    ret = &s->temps[n];
+    /* Note that TCGTemp.index is 16 bits. */
+    tcg_debug_assert(n <= UINT16_MAX);
+    s->nb_temps = n + 1;
+
+    /* Non-global temps are allocated from the pool. */
+    ret = tcg_malloc(sizeof(TCGTemp));
     memset(ret, 0, sizeof(TCGTemp));
     ret->index = n;
 
+    g_ptr_array_add(s->temps, ret);
     return ret;
 }
 
-static inline TCGTemp *tcg_global_alloc(TCGContext *s)
+static TCGTemp *tcg_global_alloc(TCGContext *s)
 {
-    TCGTemp *ts;
+    TCGTemp *ret;
+    int n = s->nb_globals;
 
+    /* Note that TCGTemp.index is 16 bits. */
+    tcg_debug_assert(n <= UINT16_MAX);
     tcg_debug_assert(s->nb_globals == s->nb_temps);
-    s->nb_globals++;
-    ts = tcg_temp_alloc(s);
-    ts->kind = TEMP_GLOBAL;
+    s->nb_globals = n + 1;
+    s->nb_temps = n + 1;
 
-    return ts;
+    /* Global temps are allocated from the main heap and live forever. */
+    ret = g_new0(TCGTemp, 1);
+    ret->index = n;
+    ret->kind = TEMP_GLOBAL;
+
+    g_ptr_array_add(s->temps, ret);
+    return ret;
 }
 
 static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
@@ -1236,9 +1256,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
 {
     TCGTemp *ts;
 
-    if (TCG_TARGET_REG_BITS == 32 && type != TCG_TYPE_I32) {
-        tcg_abort();
-    }
+    tcg_debug_assert(TCG_TARGET_REG_BITS != 32 || type == TCG_TYPE_I32);
 
     ts = tcg_global_alloc(s);
     ts->base_type = type;
@@ -1299,7 +1317,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
         pstrcat(buf, sizeof(buf), "_0");
         ts->name = strdup(buf);
 
-        tcg_debug_assert(ts2 == ts + 1);
+        tcg_debug_assert(temp_idx(ts2) == temp_idx(ts) + 1);
         ts2->base_type = TCG_TYPE_I64;
         ts2->type = TCG_TYPE_I32;
         ts2->indirect_reg = indirect_reg;
@@ -1347,7 +1365,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
             ts->temp_allocated = 1;
             ts->kind = kind;
 
-            tcg_debug_assert(ts2 == ts + 1);
+            tcg_debug_assert(temp_idx(ts2) == temp_idx(ts) + 1);
             ts2->base_type = TCG_TYPE_I64;
             ts2->type = TCG_TYPE_I32;
             ts2->temp_allocated = 1;
@@ -1456,7 +1474,7 @@ TCGTemp *tcg_constant_internal(TCGType type, int64_t val)
              */
             ts->val = val;
 
-            tcg_debug_assert(ts2 == ts + 1);
+            tcg_debug_assert(temp_idx(ts2) == temp_idx(ts) + 1);
             ts2->base_type = TCG_TYPE_I64;
             ts2->type = TCG_TYPE_I32;
             ts2->kind = TEMP_CONST;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tcg: Dynamically allocate temporaries
  2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
                   ` (4 preceding siblings ...)
  2021-01-19 18:34 ` [PATCH 5/5] tcg: Dynamically allocate temporaries Richard Henderson
@ 2021-01-19 23:06 ` BALATON Zoltan
  2021-01-19 23:33   ` Philippe Mathieu-Daudé
  2021-01-21 20:09   ` BALATON Zoltan
  5 siblings, 2 replies; 11+ messages in thread
From: BALATON Zoltan @ 2021-01-19 23:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: lvivier, alistair23, qemu-devel

On Tue, 19 Jan 2021, Richard Henderson wrote:
> My recent change for caching tcg constants has, in a number of cases,
> overflowed the statically allocated array of temporaries.  Change to
> dynamic allocation.

This seems to work for me so

Tested-by: BALATON Zoltan <balaton@eik.bme.hu>

but have you done any performance tests to check that this actually 
improves emulation speed? To mee it seems slower. Booting AmigaOS on 
sam460ex with c0dd6654f207 (just before your TCG series) takes:

real	0m33.829s
user	0m34.432s
sys	0m0.296s

but on HEAD with this series:

real	0m44.381s
user	0m46.058s
sys	0m0.532s

This is noticable decrease in speed also without measuring it. With just 
increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I 
get:

real	0m42.681s
user	0m44.208s
sys	0m0.435s

So the performance regression is somewhere in the original series not in 
this fix up series.

> I'll note that nothing in check-acceptance triggers this overflow.
> Anyone care to add some more test cases there?

The proposed test for the upcoming pegasos2 machine may also catch this 
(when that will be merged, its dependencies are still under review) or the 
sam460ex test that currently only checks the firmware could be enhanced to 
try to boot AROS if somebody wants to do that. The drawback is that it 
needs an external iso whereas the current test doesn't need any additional 
images but it did not catch problems with IRQ and neither this problem 
with TCG temps. This problem was also found with riscv and mips I think 
but don't know if those would be easier to test.

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tcg: Dynamically allocate temporaries
  2021-01-19 23:06 ` [PATCH 0/5] " BALATON Zoltan
@ 2021-01-19 23:33   ` Philippe Mathieu-Daudé
  2021-01-20  9:03     ` BALATON Zoltan
  2021-01-21 20:09   ` BALATON Zoltan
  1 sibling, 1 reply; 11+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-19 23:33 UTC (permalink / raw)
  To: BALATON Zoltan, Richard Henderson, Lukáš Doktor
  Cc: lvivier, alistair23, qemu-devel

On 1/20/21 12:06 AM, BALATON Zoltan wrote:
> On Tue, 19 Jan 2021, Richard Henderson wrote:
>> My recent change for caching tcg constants has, in a number of cases,
>> overflowed the statically allocated array of temporaries.  Change to
>> dynamic allocation.
> 
> This seems to work for me so
> 
> Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
> 
> but have you done any performance tests to check that this actually
> improves emulation speed? To mee it seems slower. Booting AmigaOS on
> sam460ex with c0dd6654f207 (just before your TCG series) takes:
> 
> real    0m33.829s
> user    0m34.432s
> sys    0m0.296s
> 
> but on HEAD with this series:
> 
> real    0m44.381s
> user    0m46.058s
> sys    0m0.532s
> 
> This is noticable decrease in speed also without measuring it. With just
> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series
> I get:
> 
> real    0m42.681s
> user    0m44.208s
> sys    0m0.435s
> 
> So the performance regression is somewhere in the original series not in
> this fix up series.

Cc'ing Lukas for the performance part, as he is investigating
how to catch such regressions.

>> I'll note that nothing in check-acceptance triggers this overflow.
>> Anyone care to add some more test cases there?
> 
> The proposed test for the upcoming pegasos2 machine may also catch this
> (when that will be merged, its dependencies are still under review)

What are your running on pegasos2?

> or
> the sam460ex test that currently only checks the firmware could be
> enhanced to try to boot AROS if somebody wants to do that. The drawback
> is that it needs an external iso whereas the current test doesn't need
> any additional images but it did not catch problems with IRQ and neither
> this problem with TCG temps.

So this other option is not very useful, right?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tcg: Dynamically allocate temporaries
  2021-01-19 23:33   ` Philippe Mathieu-Daudé
@ 2021-01-20  9:03     ` BALATON Zoltan
  0 siblings, 0 replies; 11+ messages in thread
From: BALATON Zoltan @ 2021-01-20  9:03 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Lukáš Doktor, lvivier, Richard Henderson, qemu-devel,
	alistair23

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4061 bytes --]

On Wed, 20 Jan 2021, Philippe Mathieu-Daudé wrote:
> On 1/20/21 12:06 AM, BALATON Zoltan wrote:
>> On Tue, 19 Jan 2021, Richard Henderson wrote:
>>> My recent change for caching tcg constants has, in a number of cases,
>>> overflowed the statically allocated array of temporaries.  Change to
>>> dynamic allocation.
>>
>> This seems to work for me so
>>
>> Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
>>
>> but have you done any performance tests to check that this actually
>> improves emulation speed? To mee it seems slower. Booting AmigaOS on
>> sam460ex with c0dd6654f207 (just before your TCG series) takes:
>>
>> real    0m33.829s
>> user    0m34.432s
>> sys    0m0.296s
>>
>> but on HEAD with this series:
>>
>> real    0m44.381s
>> user    0m46.058s
>> sys    0m0.532s
>>
>> This is noticable decrease in speed also without measuring it. With just
>> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series
>> I get:
>>
>> real    0m42.681s
>> user    0m44.208s
>> sys    0m0.435s
>>
>> So the performance regression is somewhere in the original series not in
>> this fix up series.
>
> Cc'ing Lukas for the performance part, as he is investigating
> how to catch such regressions.

I think there was a GSoC last year that resulted in some scripts to do 
performance testing and even bisecting regressions. I've seen a few 
reports posted about those but maybe the project should also use the 
results and run these on some dedicated test machines regularly to be 
useful. The GSoC has ended, so the student has left and I think the mentor 
was Aleksandar M. who may also be doing something else now so these 
benchmarking scripts seem to be abandoned. But maybe could be picked up as 
a starting point or inspiration for any similar activity to build on those 
results.

>>> I'll note that nothing in check-acceptance triggers this overflow.
>>> Anyone care to add some more test cases there?
>>
>> The proposed test for the upcoming pegasos2 machine may also catch this
>> (when that will be merged, its dependencies are still under review)
>
> What are your running on pegasos2?

I've sent you before what test I think we could do for pegasos2, see:

https://lists.nongnu.org/archive/html/qemu-ppc/2021-01/msg00112.html

but I could not write the script for that and have no way to test it so 
some help would be needed with that. By the way, before that there are 
also the vt82c686 patches still waiting for review. I hope you haven't 
forgot and will eventually come back to them.

>> or
>> the sam460ex test that currently only checks the firmware could be
>> enhanced to try to boot AROS if somebody wants to do that. The drawback
>> is that it needs an external iso whereas the current test doesn't need
>> any additional images but it did not catch problems with IRQ and neither
>> this problem with TCG temps.
>
> So this other option is not very useful, right?

It's still useful to test if the machine is working at all but the 
firmware does not seem to use interrupts and if you don't boot anything it 
won't access disks so some parts will not be tested by only firmware level 
testing. Basically only CPU, RAM, ROM, serial would be tested and that the 
machine could be created which would still catch some bugs but IRQs and 
IDE probably would only be tested by trying to boot an OS. I think Guenter 
runs Linux kernel boot tests but with -kernel option so disks would still 
not be tested by that therefore it may be more useful to run some other OS 
booted from CD just to increase coverage. Since I plan to use MorphOS for 
pegasos2 as test, sam460ex could use AROS (AmigaOS is not freely available 
so that cannot be used but I also test with that). The AROS isos are free 
but may be somewhat unstable so instead of using the latest, one should 
use a known working version and only update after manual testing, 
otherwise the test may break due to change in AROS.

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tcg: Dynamically allocate temporaries
  2021-01-19 23:06 ` [PATCH 0/5] " BALATON Zoltan
  2021-01-19 23:33   ` Philippe Mathieu-Daudé
@ 2021-01-21 20:09   ` BALATON Zoltan
  2021-01-21 20:17     ` Richard Henderson
  1 sibling, 1 reply; 11+ messages in thread
From: BALATON Zoltan @ 2021-01-21 20:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: lvivier, alistair23, qemu-devel

On Wed, 20 Jan 2021, BALATON Zoltan wrote:
> On Tue, 19 Jan 2021, Richard Henderson wrote:
>> My recent change for caching tcg constants has, in a number of cases,
>> overflowed the statically allocated array of temporaries.  Change to
>> dynamic allocation.
>
> This seems to work for me so
>
> Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
>
> but have you done any performance tests to check that this actually improves 
> emulation speed? To mee it seems slower. Booting AmigaOS on sam460ex with 
> c0dd6654f207 (just before your TCG series) takes:
>
> real	0m33.829s
> user	0m34.432s
> sys	0m0.296s
>
> but on HEAD with this series:
>
> real	0m44.381s
> user	0m46.058s
> sys	0m0.532s
>
> This is noticable decrease in speed also without measuring it. With just 
> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I 
> get:
>
> real	0m42.681s
> user	0m44.208s
> sys	0m0.435s
>
> So the performance regression is somewhere in the original series not in this 
> fix up series.

I've tried to do more measurements to identify where it got slower but I 
could not reproduce it today. I'm now getting around 42 seconds both 
before and after the series so not sure what made it faster before but 
it's probably not because of a code change then.

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] tcg: Dynamically allocate temporaries
  2021-01-21 20:09   ` BALATON Zoltan
@ 2021-01-21 20:17     ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2021-01-21 20:17 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: lvivier, alistair23, qemu-devel

On 1/21/21 10:09 AM, BALATON Zoltan wrote:
> On Wed, 20 Jan 2021, BALATON Zoltan wrote:
>> On Tue, 19 Jan 2021, Richard Henderson wrote:
>>> My recent change for caching tcg constants has, in a number of cases,
>>> overflowed the statically allocated array of temporaries.  Change to
>>> dynamic allocation.
>>
>> This seems to work for me so
>>
>> Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
>>
>> but have you done any performance tests to check that this actually improves
>> emulation speed? To mee it seems slower. Booting AmigaOS on sam460ex with
>> c0dd6654f207 (just before your TCG series) takes:
>>
>> real    0m33.829s
>> user    0m34.432s
>> sys    0m0.296s
>>
>> but on HEAD with this series:
>>
>> real    0m44.381s
>> user    0m46.058s
>> sys    0m0.532s
>>
>> This is noticable decrease in speed also without measuring it. With just
>> increasing the TCG_MAX_TEMPS to 2048 on 7c79721606be without this series I get:
>>
>> real    0m42.681s
>> user    0m44.208s
>> sys    0m0.435s
>>
>> So the performance regression is somewhere in the original series not in this
>> fix up series.
> 
> I've tried to do more measurements to identify where it got slower but I could
> not reproduce it today. I'm now getting around 42 seconds both before and after
> the series so not sure what made it faster before but it's probably not because
> of a code change then.

That's reassuring.  I hadn't been able to measure a performance regression myself.

(The kind of TB that caused this SEGV thread and creates oodles of temps seems
to be an outlier.  Otherwise there should be very little change to non-vector
code.)


r~


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-21 20:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19 18:34 [PATCH 0/5] tcg: Dynamically allocate temporaries Richard Henderson
2021-01-19 18:34 ` [PATCH 1/5] tcg: Add an index to TCGTemp Richard Henderson
2021-01-19 18:34 ` [PATCH 2/5] tcg: Introduce and use tcg_temp Richard Henderson
2021-01-19 18:34 ` [PATCH 3/5] tcg: Make TCGTempSet expandable Richard Henderson
2021-01-19 18:34 ` [PATCH 4/5] tcg: Adjust tcgv_*_temp/temp_tcgv_* Richard Henderson
2021-01-19 18:34 ` [PATCH 5/5] tcg: Dynamically allocate temporaries Richard Henderson
2021-01-19 23:06 ` [PATCH 0/5] " BALATON Zoltan
2021-01-19 23:33   ` Philippe Mathieu-Daudé
2021-01-20  9:03     ` BALATON Zoltan
2021-01-21 20:09   ` BALATON Zoltan
2021-01-21 20:17     ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.