All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
	"MTTCG Devel" <mttcg@greensocs.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Peter Crosthwaite" <crosthwaite.peter@gmail.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Sergey Fedorov" <serge.fdrv@gmail.com>
Subject: Re: [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info to 'info jit' dump
Date: Fri, 22 Apr 2016 19:57:38 -0400	[thread overview]
Message-ID: <20160422235738.GA2410@flamenco> (raw)
In-Reply-To: <571A82B8.5080908@twiddle.net>

On Fri, Apr 22, 2016 at 12:59:52 -0700, Richard Henderson wrote:
> FWIW, so that I could get an idea of how the stats change as we improve the
> hashing, I inserted the attachment 1 patch between patches 5 and 6, and with
> attachment 2 attempting to fix the accounting for patches 9 and 10.

For qht, I dislike the approach of reporting "avg chain" per-element,
instead of per-bucket. Performance for a bucket whose entries are
all valid is virtually the same as that of a bucket that only
has one valid element; thus, with per-bucket reporting, we'd say that
the chain lenght is 1 in both cases, i.e. "perfect". With per-element
reporting, we'd report 4 (on a 64-bit host, since that's the value of
QHT_BUCKET_ENTRIES) when the bucket is full, which IMO gives the
wrong idea (users would think they're in trouble, when they're not).

Using the avg-bucket-chain metric you can test how good the hashing is.
For instance, the metric is 1.01 for xxhash with phys_pc, pc and flags
(i.e. func5), and 1.21 if func5 takes only a valid phys_pc (the other two are 0).

I think reporting fully empty buckets as well as the longest chain
(of buckets for qht) in addition to this metric is a good idea, though.

> For booting an alpha kernel to login prompt:
> 
> Before hashing changes (@5/11)
> 
> TB count             175363/671088
> TB invalidate count  3996
> TB hash buckets      31731/32768
> TB hash avg chain    5.289 max=59
> 
> After xxhash patch (@7/11)
> 
> TB hash buckets      32582/32768
> TB hash avg chain    5.260 max=18
> 
> So far so good!
> 
> After qht patches (@11/11)
> 
> TB hash buckets      94360/131072
> TB hash avg chain    1.774 max=8
> 
> Do note that those last numbers are off: 1.774 avg * 94360 used buckets =
> 167394 total entries, which is far from 171367, the correct number of total
> entries.

If those numbers are off, then either this
    assert(hinfo.used_entries ==
           tcg_ctx.tb_ctx.nb_tbs - tcg_ctx.tb_ctx.tb_phys_invalidate_count);
should trigger, or the accounting isn't right.

Another option is that the "TB count - invalidate_count" is different
for each test you ran. I think this is what's going on, otherwise we couldn't
explain why the first report ("before 5/11") is also "wrong":

  5.289*31731=167825.259

Only the second report ("after 7/11") seems good (taking into account
lack of precision of just 3 decimals):
  5.26*32582=171381.32 ~= 171367
which leads me to believe that you've used the TB and invalidate
counts from that test.

I just tested your patches (on an ARM bootup) and the assert doesn't trigger,
and the stats are spot on for "after 11/11":

TB count            643610/2684354
TB hash buckets     369534/524288
TB hash avg chain   1.729 max=8
TB flush count      0
TB invalidate count 4718

1.729*369534=638924.286, which is ~= 643610-4718 = 638892.

> I'm tempted to pull over gcc's non-chaining hash table implementation
> (libiberty/hashtab.c, still gplv2+) and compare...

You can try, but I think performance wouldn't be great, because
the comparison function would be called way too often due to the
ht using open addressing. The problem there is not only the comparisons
themselves, but the all the cache lines needed to read the fields of
the comparison. I haven't tested libiberty's htable but I did test
the htable in concurrencykit[1], which also uses open addressing.

With ck's ht, performance was not good when booting ARM: IIRC ~30% of
runtime was spent on tb_cmp(); I also added the full hash to each TB so
that it would be compared first, but it didn't make a difference since
the delay was due to loading the cache line (I saw this with perf(1)'s
annotated code, which showed that ~80% of the time spent in tb_cmp()
was in performing the first load of the TB's fields).

This led me to a design that had buckets with a small set of
hash & pointer pairs, all in the same cache line as the head (then
I discovered somebody else had thought of this, and that's why there's
a link to the CLHT paper in qht.c).

BTW I tested ck's htable also because of a requirement we have for MTTCG,
which is to support lock-free concurrent lookups. AFAICT libiberty's ht
doesn't support this, so it might be a bit faster than ck's.

Thanks,

		Emilio

[1] http://concurrencykit.org/
    More info on their htable implementation here:
    http://backtrace.io/blog/blog/2015/03/13/workload-specialization/

  reply	other threads:[~2016-04-22 23:57 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-19 23:07 [Qemu-devel] [PATCH v3 00/11] tb hash improvements Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 01/11] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
2016-04-22  9:32   ` Alex Bennée
2016-04-22  9:35   ` Peter Maydell
2016-04-22 15:50     ` Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 02/11] seqlock: remove optional mutex Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 03/11] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 04/11] include/processor.h: define cpu_relax() Emilio G. Cota
2016-04-20 12:15   ` KONRAD Frederic
2016-04-20 17:16     ` Emilio G. Cota
2016-04-20 17:18       ` Peter Maydell
2016-04-20 17:32   ` [Qemu-devel] [UPDATED " Emilio G. Cota
2016-04-22  9:35   ` [Qemu-devel] [PATCH " Alex Bennée
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 05/11] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
2016-04-20 15:18   ` Richard Henderson
2016-04-20 17:17     ` Emilio G. Cota
2016-04-20 17:55       ` Richard Henderson
2016-04-20 18:11         ` Emilio G. Cota
2016-04-20 19:39           ` Richard Henderson
2016-04-21 16:24             ` Emilio G. Cota
2016-04-20 17:35   ` [Qemu-devel] [UPDATED " Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 06/11] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
2016-04-20 15:19   ` Richard Henderson
2016-04-22 12:58   ` Alex Bennée
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 07/11] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-04-22 12:58   ` Alex Bennée
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 08/11] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-04-22 14:04   ` Alex Bennée
2016-04-24 20:01   ` Richard Henderson
2016-04-24 21:58     ` Emilio G. Cota
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 09/11] qht: add test program Emilio G. Cota
2016-04-22 14:35   ` Alex Bennée
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 10/11] tb hash: track translated blocks with qht Emilio G. Cota
2016-04-28 13:27   ` Alex Bennée
2016-04-19 23:07 ` [Qemu-devel] [PATCH v3 11/11] translate-all: add tb hash bucket info to 'info jit' dump Emilio G. Cota
2016-04-20 15:21   ` Richard Henderson
2016-04-22 14:38   ` Alex Bennée
2016-04-22 17:41   ` Richard Henderson
2016-04-22 19:23     ` Emilio G. Cota
2016-04-22 19:59     ` Richard Henderson
2016-04-22 23:57       ` Emilio G. Cota [this message]
2016-04-24 19:46         ` Richard Henderson
2016-04-24 22:06           ` Emilio G. Cota
2016-04-27  2:43             ` Emilio G. Cota
2016-04-28 16:37               ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160422235738.GA2410@flamenco \
    --to=cota@braap.org \
    --cc=alex.bennee@linaro.org \
    --cc=crosthwaite.peter@gmail.com \
    --cc=mttcg@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.