From: Richard Henderson
Date: Fri, 22 Apr 2016 12:59:52 -0700
Subject: Re: [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info to 'info jit' dump
Message-ID: <571A82B8.5080908@twiddle.net>
In-Reply-To: <571A6245.4070209@twiddle.net>
References: <1461107270-19234-1-git-send-email-cota@braap.org>
 <1461107270-19234-12-git-send-email-cota@braap.org>
 <571A6245.4070209@twiddle.net>
To: "Emilio G. Cota", QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
 Sergey Fedorov

On 04/22/2016 10:41 AM, Richard Henderson wrote:
> On 04/19/2016 04:07 PM, Emilio G. Cota wrote:
>> +    ht_avg_len = qht_avg_bucket_chain_length(&tcg_ctx.tb_ctx.htable, &ht_heads);
>> +    cpu_fprintf(f, "TB hash avg chain   %0.5f buckets\n", ht_avg_len);
>> +    cpu_fprintf(f, "TB hash size        %zu head buckets\n", ht_heads);
>
> I think the accounting is questionable here.
>
> Consider the following data:
>
> TB count             230467/671088
> TB invalidate count  25915
> TB hash avg chain    1.03073 buckets
> TB hash size         131072 head buckets
>
> This means that we've got 230467 - 25915 = 204552 active TBs, installed
> into a hash table with 131072 heads.  For a perfectly uniform distribution
> of TBs, that would be an average chain length of 204552 / 131072 = 1.56.
>
> In order to get the average down to 1.03, a substantial number of heads
> must have zero entries.
>
> I think it would be more enlightening to account separately for empty and
> non-empty heads, e.g.
>
>   TB hash buckets used  xxxx/131072
>   TB hash avg chain     yyyy
>
> where xxxx is the number of non-empty heads, and yyyy = |TBs| / xxxx.
>
> I also wonder whether it would be better to size the hash table for the
> maximum number of allowable TBs.

FWIW, to get an idea of how the stats change as we improve the hashing, I
inserted the attachment 1 patch (z1.txt) between patches 5 and 6, with
attachment 2 (z2.txt) attempting to fix the accounting for patches 9 and 10.

For booting an alpha kernel to the login prompt:

Before hashing changes (@5/11)

TB count             175363/671088
TB invalidate count  3996
TB hash buckets      31731/32768
TB hash avg chain    5.289 max=59

After xxhash patch (@7/11)

TB hash buckets      32582/32768
TB hash avg chain    5.260 max=18

So far so good!
After qht patches (@11/11)

TB hash buckets      94360/131072
TB hash avg chain    1.774 max=8

Do note that those last numbers are off: 1.774 avg chain * 94360 used buckets
= 167394 total entries, which is far from 171367, the correct number of total
entries.

I'm tempted to pull over gcc's non-chaining hash table implementation
(libiberty/hashtab.c, still GPLv2+) and compare...


r~

Content-Type: text/plain; charset=UTF-8; name="z1.txt"
Content-Disposition: attachment; filename="z1.txt"

From cdc7b3631fd78bd2e31d2823f7543e2a56681149 Mon Sep 17 00:00:00 2001
From: Richard Henderson
Date: Fri, 22 Apr 2016 11:28:52 -0700
Subject: translate-all: Add hashtable accounting to info jit

Dump hash table occupancy numbers with "info jit".

Signed-off-by: Richard Henderson

diff --git a/translate-all.c b/translate-all.c
index 1a8f68b..ed296d5 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1671,39 +1671,55 @@ void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr)

 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 {
-    int i, target_code_size, max_target_code_size;
-    int direct_jmp_count, direct_jmp2_count, cross_page;
+    size_t target_code_size, max_target_code_size;
+    unsigned direct_jmp_count, direct_jmp2_count, cross_page;
+    unsigned used_buckets, max_chain, hash_tbs;
     TranslationBlock *tb;
+    int i;

     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
     direct_jmp_count = 0;
     direct_jmp2_count = 0;
-    for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
-        tb = &tcg_ctx.tb_ctx.tbs[i];
-        target_code_size += tb->size;
-        if (tb->size > max_target_code_size) {
-            max_target_code_size = tb->size;
-        }
-        if (tb->page_addr[1] != -1) {
-            cross_page++;
-        }
-        if (tb->tb_next_offset[0] != 0xffff) {
-            direct_jmp_count++;
-            if (tb->tb_next_offset[1] != 0xffff) {
-                direct_jmp2_count++;
+    used_buckets = 0;
+    hash_tbs = 0;
+    max_chain = 0;
+
+    for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
+        if (tcg_ctx.tb_ctx.tb_phys_hash[i]) {
+            unsigned this_chain = 0;
+            for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
+                 tb = tb->phys_hash_next) {
+                this_chain++;
+                hash_tbs++;
+                target_code_size += tb->size;
+                if (tb->page_addr[1] != -1) {
+                    cross_page++;
+                }
+                if (tb->tb_next_offset[0] != 0xffff) {
+                    direct_jmp_count++;
+                    if (tb->tb_next_offset[1] != 0xffff) {
+                        direct_jmp2_count++;
+                    }
+                }
             }
+            if (this_chain > max_chain) {
+                max_chain = this_chain;
+            }
+            used_buckets++;
         }
     }
-    /* XXX: avoid using doubles ? */
+    assert(hash_tbs ==
+           tcg_ctx.tb_ctx.nb_tbs - tcg_ctx.tb_ctx.tb_phys_invalidate_count);
+
     cpu_fprintf(f, "Translation buffer state:\n");
     cpu_fprintf(f, "gen code size       %td/%zd\n",
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
     cpu_fprintf(f, "TB count            %d/%d\n",
             tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.code_gen_max_blocks);
-    cpu_fprintf(f, "TB avg target size  %d max=%d bytes\n",
+    cpu_fprintf(f, "TB avg target size  %zd max=%zd bytes\n",
                 tcg_ctx.tb_ctx.nb_tbs ? target_code_size / tcg_ctx.tb_ctx.nb_tbs : 0,
                 max_target_code_size);
@@ -1717,13 +1733,18 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cpu_fprintf(f, "cross page TB count %d (%d%%)\n", cross_page,
             tcg_ctx.tb_ctx.nb_tbs ? (cross_page * 100) /
                     tcg_ctx.tb_ctx.nb_tbs : 0);
-    cpu_fprintf(f, "direct jump count   %d (%d%%) (2 jumps=%d %d%%)\n",
+    cpu_fprintf(f, "direct jump count   %u (%u%%) (2 jumps=%u %u%%)\n",
                 direct_jmp_count,
                 tcg_ctx.tb_ctx.nb_tbs ?
                 (direct_jmp_count * 100) / tcg_ctx.tb_ctx.nb_tbs : 0,
                 direct_jmp2_count,
                 tcg_ctx.tb_ctx.nb_tbs ?
                 (direct_jmp2_count * 100) / tcg_ctx.tb_ctx.nb_tbs : 0);
+    cpu_fprintf(f, "TB hash buckets     %u/%d\n",
+                used_buckets, CODE_GEN_PHYS_HASH_SIZE);
+    cpu_fprintf(f, "TB hash avg chain   %0.3f max=%u\n",
+                used_buckets ? (double)hash_tbs / used_buckets : 0.0,
+                max_chain);

     cpu_fprintf(f, "\nStatistics:\n");
     cpu_fprintf(f, "TB flush count      %d\n", tcg_ctx.tb_ctx.tb_flush_count);
     cpu_fprintf(f, "TB invalidate count %d\n",

Content-Type: text/plain; charset=UTF-8; name="z2.txt"
Content-Disposition: attachment; filename="z2.txt"

From 7f1d677f3d085b5891e1adbd5f602185d68ba81a Mon Sep 17 00:00:00 2001
From: Richard Henderson
Date: Fri, 22 Apr 2016 12:50:00 -0700
Subject: fixup to {09,10,11}/11

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index a0a1aa8..2d0b58f 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -49,6 +49,14 @@ void qht_grow(struct qht *ht);

 void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
                  uint32_t hash);
-double qht_avg_bucket_chain_length(struct qht *ht, size_t *n_head_buckets);
+
+struct qht_stats {
+    size_t used_buckets;
+    size_t max_buckets;
+    size_t used_entries;
+    size_t max_chain;
+};
+
+struct qht_stats qht_statistics(struct qht *ht);

 #endif /* QHT_H */
diff --git a/translate-all.c b/translate-all.c
index 3b73b46..a9ceb0a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1664,8 +1664,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 {
     size_t target_code_size, max_target_code_size;
     unsigned direct_jmp_count, direct_jmp2_count, cross_page;
-    unsigned used_buckets, max_chain, hash_tbs;
-    TranslationBlock *tb;
+    struct qht_stats hinfo;
     int i;

     target_code_size = 0;
@@ -1673,35 +1672,23 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cross_page = 0;
     direct_jmp_count = 0;
     direct_jmp2_count = 0;
-    used_buckets = 0;
-    hash_tbs = 0;
-    max_chain = 0;
-
-    for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
-        if (tcg_ctx.tb_ctx.tb_phys_hash[i]) {
-            unsigned this_chain = 0;
-            for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
-                 tb = tb->phys_hash_next) {
-                this_chain++;
-                hash_tbs++;
-                target_code_size += tb->size;
-                if (tb->page_addr[1] != -1) {
-                    cross_page++;
-                }
-                if (tb->tb_next_offset[0] != 0xffff) {
-                    direct_jmp_count++;
-                    if (tb->tb_next_offset[1] != 0xffff) {
-                        direct_jmp2_count++;
-                    }
-                }
-            }
-            if (this_chain > max_chain) {
-                max_chain = this_chain;
+
+    for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
+        const TranslationBlock *tb = &tcg_ctx.tb_ctx.tbs[i];
+        target_code_size += tb->size;
+        if (tb->page_addr[1] != -1) {
+            cross_page++;
+        }
+        if (tb->tb_next_offset[0] != 0xffff) {
+            direct_jmp_count++;
+            if (tb->tb_next_offset[1] != 0xffff) {
+                direct_jmp2_count++;
             }
-            used_buckets++;
         }
     }
-    assert(hash_tbs ==
+
+    hinfo = qht_statistics(&tcg_ctx.tb_ctx.htable);
+    assert(hinfo.used_entries ==
            tcg_ctx.tb_ctx.nb_tbs - tcg_ctx.tb_ctx.tb_phys_invalidate_count);

     cpu_fprintf(f, "Translation buffer state:\n");
@@ -1731,11 +1718,12 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 direct_jmp2_count,
                 tcg_ctx.tb_ctx.nb_tbs ?
                 (direct_jmp2_count * 100) / tcg_ctx.tb_ctx.nb_tbs : 0);
-    cpu_fprintf(f, "TB hash buckets     %u/%d\n",
-                used_buckets, CODE_GEN_PHYS_HASH_SIZE);
-    cpu_fprintf(f, "TB hash avg chain   %0.3f max=%u\n",
-                used_buckets ? (double)hash_tbs / used_buckets : 0.0,
-                max_chain);
+    cpu_fprintf(f, "TB hash buckets     %zu/%zu\n",
+                hinfo.used_buckets, hinfo.max_buckets);
+    cpu_fprintf(f, "TB hash avg chain   %0.3f max=%zu\n",
+                hinfo.used_buckets
+                ? (double)hinfo.used_entries / hinfo.used_buckets : 0.0,
+                hinfo.max_chain);

     cpu_fprintf(f, "\nStatistics:\n");
     cpu_fprintf(f, "TB flush count      %d\n", tcg_ctx.tb_ctx.tb_flush_count);
     cpu_fprintf(f, "TB invalidate count %d\n",
diff --git a/util/qht.c b/util/qht.c
index 05ea5e8..535057b 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -556,35 +556,45 @@
  * value should be close to 1.
  * Note that each bucket tracks up to QHT_BUCKET_ENTRIES items.
  */
-double qht_avg_bucket_chain_length(struct qht *ht, size_t *n_head_buckets)
+struct qht_stats qht_statistics(struct qht *ht)
 {
     struct qht_map *map;
-    size_t count = 0;
-    size_t i;
+    struct qht_stats s = {};
+    size_t i, n;

     map = atomic_read(&ht->map);
     /* paired with smp_wmb() before setting ht->map */
     smp_rmb();
+    s.max_buckets = n = map->n;

-    for (i = 0; i < map->n; i++) {
+    for (i = 0; i < n; i++) {
         struct qht_bucket *head = &map->buckets[i];
         struct qht_bucket *b;
-        size_t bucket_count;
+        size_t this_chain;
         uint32_t version;

         do {
             version = seqlock_read_begin(&head->sequence);
-            bucket_count = 0;
+            this_chain = 0;
             b = head;
             do {
-                bucket_count++;
+                int j;
+                for (j = 0; j < QHT_BUCKET_ENTRIES; j++) {
+                    if (b->hashes[j]) {
+                        this_chain++;
+                    }
+                }
                 b = b->next;
             } while (b);
         } while (seqlock_read_retry(&head->sequence, version));
-        count += bucket_count;
-    }
-    if (n_head_buckets) {
-        *n_head_buckets = map->n;
+        if (this_chain != 0) {
+            s.used_entries += this_chain;
+            if (s.max_chain < this_chain) {
+                s.max_chain = this_chain;
+            }
+            s.used_buckets++;
+        }
     }
-    return (double)count / map->n;
+
+    return s;
 }
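
As a standalone illustration of the per-head accounting suggested above
(count only non-empty heads, and report entries / non-empty heads as the
average chain length), a minimal sketch for a plain chained hash table could
look like this; the node/table/chain_stats types and names below are
hypothetical stand-ins, not the QEMU tb_phys_hash or qht structures:

/* Minimal sketch of the per-head accounting discussed above: count only
 * non-empty heads, and report entries / non-empty heads as the average
 * chain length.  The types here are stand-ins for a plain chained hash
 * table, not the QEMU structures. */

#include <stdio.h>
#include <stddef.h>

struct node {
    struct node *next;
};

struct table {
    struct node **heads;
    size_t n_heads;
};

struct chain_stats {
    size_t used_heads;   /* heads with at least one entry */
    size_t entries;      /* total entries in the table */
    size_t max_chain;    /* longest chain seen */
};

static struct chain_stats table_stats(const struct table *t)
{
    struct chain_stats s = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < t->n_heads; i++) {
        const struct node *n;
        size_t chain = 0;

        /* Walk the chain hanging off this head. */
        for (n = t->heads[i]; n != NULL; n = n->next) {
            chain++;
        }
        /* Only non-empty heads contribute to the average. */
        if (chain != 0) {
            s.used_heads++;
            s.entries += chain;
            if (chain > s.max_chain) {
                s.max_chain = chain;
            }
        }
    }
    return s;
}

int main(void)
{
    /* Tiny demo: 4 heads, 3 entries, one collision on head 2. */
    struct node n0 = { NULL }, n1 = { NULL }, n2 = { &n1 };
    struct node *heads[4] = { &n0, NULL, &n2, NULL };
    struct table t = { heads, 4 };
    struct chain_stats s = table_stats(&t);

    printf("hash buckets   %zu/%zu\n", s.used_heads, t.n_heads);
    printf("hash avg chain %0.3f max=%zu\n",
           s.used_heads ? (double)s.entries / s.used_heads : 0.0,
           s.max_chain);
    return 0;
}

Built standalone, the demo prints "hash buckets   2/4" and
"hash avg chain 1.500 max=2", the same shape of output as the
"TB hash buckets" / "TB hash avg chain" lines added by the patches above.
With this split, an average of 1.0 means every used head holds exactly one
entry, so empty heads no longer drag the average down.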