From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: Pranith Kumar <bobby.prani@gmail.com>,
	Richard Henderson <rth@twiddle.net>,
	Peter Maydell <peter.maydell@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] GSoC 2017 Proposal: TCG performance enhancements
Date: Wed, 07 Jun 2017 11:15:32 +0100
Message-ID: <87ink8cddn.fsf@linaro.org>
In-Reply-To: <20170606171320.GA8115@flamenco>


Emilio G. Cota <cota@braap.org> writes:

> On Sat, Mar 25, 2017 at 12:52:35 -0400, Pranith Kumar wrote:
> (snip)
>> * Implement an LRU translation block code cache.
>>
>>   In the current TCG design, when the translation cache fills up, we flush all
>>   the translated blocks (TBs) to free up space. We can improve this situation
>>   by not flushing the TBs that were recently used i.e., by implementing an LRU
>>   policy for freeing the blocks. This should avoid the re-translation overhead
>>   for frequently used blocks and improve performance.
>
> I doubt this will yield any benefits because:
>
> - I still have not found a workload where the performance bottleneck is
>   code retranslation due to unnecessary flushes (unless of course we
>   artificially restrict the size of code_gen_buffer.)
> - To keep track of LRU you need at least one extra instruction on every
>   TB, e.g. to increase a counter or add a timestamp. This might be expensive
>   and possibly a scalability bottleneck (e.g. what to do when several
>   cores are executing the same TB?).
> - tb_find_pc now does a simple binary search. This is easy because we
>   know that TB's are allocated from code_gen_buffer in order. If they
>   were out of order, we'd need another data structure (e.g. some sort of
>   tree) to have quick searches. This is not a fast path though so this
>   could be OK.

Certainly, to make changes here we would need some proper numbers showing
it is a problem. Even my re-compile stress-ng test only flushes every
now and then.
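
To illustrate the tb_find_pc point quoted above: as long as blocks are
laid out in ascending address order, a plain binary search over their
start addresses is enough to map a host PC back to its block. The
following is only a minimal sketch under that assumption. The names
demo_tb and demo_find_pc and the struct layout are hypothetical and are
not QEMU's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a translation block descriptor. */
struct demo_tb {
    uintptr_t code_start; /* host address of the first translated byte */
    size_t    code_size;  /* length of the translated code in bytes */
};

/*
 * Return the index of the block whose code range contains host_pc, or -1
 * if no block does. Assumes tbs[] is sorted by ascending code_start and
 * that the ranges do not overlap, which is what in-order allocation from
 * a single buffer would give us.
 */
static int demo_find_pc(const struct demo_tb *tbs, int n, uintptr_t host_pc)
{
    int lo = 0;
    int hi = n - 1;

    assert(n >= 0);

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (host_pc < tbs[mid].code_start) {
            hi = mid - 1;       /* target lies before this block */
        } else if (host_pc - tbs[mid].code_start >= tbs[mid].code_size) {
            lo = mid + 1;       /* target lies after this block */
        } else {
            return mid;         /* host_pc falls inside this block */
        }
    }
    return -1;
}
```

If blocks could be freed and reused out of order (as an LRU policy
would require), the sorted-array assumption no longer holds and the
lookup would need a different structure, as Emilio notes.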

>
> (snip)
>> Please let me know if you have any comments or suggestions. Also please let me
>> know if there are other enhancements that are easily implementable to increase
>> TCG performance as part of this project or otherwise.
>
> My not-necessarily-easy-to-implement wishlist would be:
>
> - Reduction of tb_lock contention when booting many cores. For instance,
>   booting 64 aarch64 cores on a 64-core host shows quite a bit of contention (host
>   cores are 80% idle, i.e. waiting to acquire tb_lock); fortunately this is not a
>   big deal (e.g. 4s for booting 1 core vs. ~14s to boot 64) and anyway most
>   long-running workloads are cached a lot more effectively.
>   Still, it would make sense to consider the option of not going through tb_lock
>   etc. (via a private cache? or simply not caching at all) for code that is not
>   executed many times. Another option is to translate privately, and only acquire
>   tb_lock to copy the translated code to the shared buffer.

Currently tb_lock protects the whole translation cycle. However, to get
any sort of parallelism in a different translation cache, we would also
need to make the translators thread safe. Currently, translation involves
too many shared globals across the core TCG state as well as the
per-arch translate.c functions.
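
The "translate privately, then copy under the lock" option can be
pictured roughly as below. This is a sketch under stated assumptions,
not how QEMU works today: demo_code_cache and demo_cache_publish are
hypothetical names, the translator is assumed to already be thread safe
and to have filled a private buffer, and the copied bytes are assumed
to need no fix-ups after being moved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 4096

/* Hypothetical shared code buffer guarded by a single lock. */
struct demo_code_cache {
    pthread_mutex_t lock;
    unsigned char   buf[DEMO_BUF_SIZE];
    size_t          used;   /* bytes of buf[] already handed out */
};

static void demo_cache_init(struct demo_code_cache *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->used = 0;
}

/*
 * Copy code that was generated in a thread-private buffer into the
 * shared buffer. The lock is held only for the space reservation and
 * the copy, not for the translation itself. Returns the offset of the
 * copy inside the shared buffer, or (size_t)-1 if it does not fit.
 */
static size_t demo_cache_publish(struct demo_code_cache *c,
                                 const unsigned char *code, size_t len)
{
    size_t off = (size_t)-1;

    assert(code != NULL);

    pthread_mutex_lock(&c->lock);
    if (len <= DEMO_BUF_SIZE - c->used) {
        off = c->used;
        memcpy(c->buf + off, code, len);
        c->used += len;
    }
    pthread_mutex_unlock(&c->lock);

    return off;
}
```

The critical section shrinks to a bounded copy, but this only helps
once the translation step before it no longer touches shared globals.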

>
> - Instrumentation. I think QEMU should have a good interface to enable
>   dynamic binary instrumentation. This has many uses and in fact there
>   are quite a few forks of QEMU doing this.
>   I think Lluís Vilanova's work [1] is a good start to eventually get
>   something upstream.

I too want to see more here. It would be nice to have a hit count for
each block and some live introspection so we could investigate the
hottest blocks and examine the code they generate more closely.
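
As a rough model of what a per-block hit count could look like, here is
a sketch using C11 atomics. Everything in it is an assumption made for
illustration: demo_tb_stats, demo_tb_hit and demo_hottest are
hypothetical names, and in a real translator the increment would have
to be emitted into the generated code rather than called as a C
function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-block statistics record. */
struct demo_tb_stats {
    unsigned long guest_pc;   /* guest address the block was built from */
    atomic_ulong  exec_count; /* number of times the block was entered */
};

/*
 * Record one execution of a block. A relaxed increment is used because
 * only the final tally matters, not ordering against other memory
 * accesses. Several vCPUs hitting the same block still contend on one
 * cache line, which is the scalability cost raised earlier for LRU
 * tracking.
 */
static void demo_tb_hit(struct demo_tb_stats *tb)
{
    assert(tb != NULL);
    atomic_fetch_add_explicit(&tb->exec_count, 1, memory_order_relaxed);
}

/* Return the block with the highest count, or NULL if n is zero. */
static struct demo_tb_stats *demo_hottest(struct demo_tb_stats *tbs, size_t n)
{
    struct demo_tb_stats *best = NULL;
    unsigned long best_count = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long c = atomic_load_explicit(&tbs[i].exec_count,
                                               memory_order_relaxed);
        if (best == NULL || c > best_count) {
            best = &tbs[i];
            best_count = c;
        }
    }
    return best;
}
```

A live introspection command could then walk the table, pick the top
few entries and dump the generated code for each.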

I think there is scope for a big improvement if you could create a
hot-path series of basic blocks with multiple exit points and avoid the
spill/fills of registers in the hot path. However this is a fairly major
change to the current design.

Outside of performance improvements, having a good instrumentation story
would be good for people who want to do analysis of guest behaviour.

>
> 		Emilio
>
> [1] https://projects.gso.ac.upc.edu/projects/qemu-dbi


--
Alex Bennée
