* Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

From: Emilio G. Cota @ 2018-10-01 18:34 UTC
To: Alex Bennée
Cc: qemu-devel, Pranith Kumar, Richard Henderson

On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
> If we are going to have an indirection then we can also drop the
> requirement to scale the TLB according to the number of MMU indexes we
> have to support. It's fairly wasteful when a bunch of them are almost
> never used unless you are running stuff that uses them.

So with dynamic TLB sizing, what you're suggesting here is to resize
each MMU array independently (depending on its use rate) instead
of using a single "TLB size" for all MMU indexes. Am I understanding
your point correctly?

Thanks,

		E.
* Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

From: Richard Henderson @ 2018-10-01 20:40 UTC
To: Emilio G. Cota, Alex Bennée
Cc: qemu-devel, Pranith Kumar

On 10/1/18 1:34 PM, Emilio G. Cota wrote:
> On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
>> If we are going to have an indirection then we can also drop the
>> requirement to scale the TLB according to the number of MMU indexes we
>> have to support. It's fairly wasteful when a bunch of them are almost
>> never used unless you are running stuff that uses them.
>
> So with dynamic TLB sizing, what you're suggesting here is to resize
> each MMU array independently (depending on its use rate) instead
> of using a single "TLB size" for all MMU indexes. Am I understanding
> your point correctly?

You cannot do that without flushing the TBs (and, with out-of-line
memory ops, the prologue as well) and regenerating. The TLB size is
baked into the code. And we really don't have any extra registers
free to vary that.

r~
* Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

From: Emilio G. Cota @ 2018-10-02  1:54 UTC
To: Richard Henderson
Cc: Alex Bennée, qemu-devel, Pranith Kumar

On Mon, Oct 01, 2018 at 15:40:37 -0500, Richard Henderson wrote:
> On 10/1/18 1:34 PM, Emilio G. Cota wrote:
> > So with dynamic TLB sizing, what you're suggesting here is to resize
> > each MMU array independently (depending on its use rate) instead
> > of using a single "TLB size" for all MMU indexes. Am I understanding
> > your point correctly?
>
> You cannot do that without flushing the TBs (and, with out-of-line memory
> ops, the prologue as well) and regenerating. The TLB size is baked into
> the code. And we really don't have any extra registers free to vary that.

Can you please elaborate on this? I can't see where the size is baked
into the generated code, other than in the TLB lookup itself. Grepping
for CPU_TLB_SIZE and CPU_TLB_BITS only shows a few places.

I have written today a prototype of dynamic TLB resizing. It uses no
extra registers because mmu_idx is known at generation time.

I haven't done any extensive testing yet, but at least it boots aarch64
and x86_64 guests on an x86_64 host.

The code (some messy WIP commits in there, sorry) is at:

  https://github.com/cota/qemu/tree/tlb2

Please take a look -- am I doing anything horribly wrong there?

Thanks,

		Emilio
* Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

From: Alex Bennée @ 2018-10-02 6:48 UTC
To: Emilio G. Cota
Cc: qemu-devel, Pranith Kumar, Richard Henderson

Emilio G. Cota <cota@braap.org> writes:

> On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
>> If we are going to have an indirection then we can also drop the
>> requirement to scale the TLB according to the number of MMU indexes we
>> have to support. It's fairly wasteful when a bunch of them are almost
>> never used unless you are running stuff that uses them.
>
> So with dynamic TLB sizing, what you're suggesting here is to resize
> each MMU array independently (depending on its use rate) instead
> of using a single "TLB size" for all MMU indexes. Am I understanding
> your point correctly?

Not quite - I think it would overly complicate the lookup to have a
differently sized TLB for each mmu index - even if their usage
patterns are different.

I just meant that if we already have the cost of an indirection we
don't have to ensure:

  CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
  CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];

restrict their sizes so any entry in the 2D array can be indexed
directly from env. Currently CPU_TLB_SIZE/CPU_TLB_BITS is restricted
by the number of NB_MMU_MODES we have to support. But if each mode can
be flushed and managed separately we can have:

  CPUTLBEntry *tlb_table[NB_MMU_MODES];

and size CPU_TLB_SIZE for the maximum offset we can manage in the
lookup code. This is mainly driven by the varying
TCG_TARGET_TLB_DISPLACEMENT_BITS each backend has available to it.

--
Alex Bennée
* Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

From: Emilio G. Cota @ 2018-10-02 18:09 UTC
To: Alex Bennée
Cc: qemu-devel, Pranith Kumar, Richard Henderson

On Tue, Oct 02, 2018 at 07:48:20 +0100, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
> > So with dynamic TLB sizing, what you're suggesting here is to resize
> > each MMU array independently (depending on its use rate) instead
> > of using a single "TLB size" for all MMU indexes. Am I understanding
> > your point correctly?
>
> Not quite - I think it would overly complicate the lookup to have a
> differently sized TLB for each mmu index - even if their usage
> patterns are different.

It just adds a load to get the mask, which will most likely be in the
L1. The value is not used until 3 instructions later, by which time the
L1 read will have completed.

> I just meant that if we already have the cost of an indirection we
> don't have to ensure:
>
>   CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
>   CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
>
> restrict their sizes so any entry in the 2D array can be indexed
> directly from env. Currently CPU_TLB_SIZE/CPU_TLB_BITS is restricted
> by the number of NB_MMU_MODES we have to support. But if each mode can
> be flushed and managed separately we can have:
>
>   CPUTLBEntry *tlb_table[NB_MMU_MODES];
>
> and size CPU_TLB_SIZE for the maximum offset we can manage in the
> lookup code. This is mainly driven by the varying
> TCG_TARGET_TLB_DISPLACEMENT_BITS each backend has available to it.

What I implemented is what you suggest, but with dynamic resizing based
on usage. I'm keeping the current CPU_TLB_SIZE as the minimum size, and
took Pranith's TCG_TARGET_TLB_MAX_INDEX_BITS definitions (from 2017) to
limit the maximum TLB size per mmu index.

I'll prepare an RFC.

Thanks,

		Emilio
Thread overview: 5 messages (newest: 2018-10-02 18:09 UTC)

  [not found] <20180919175423.GA25553@flamenco>
  [not found] ` <87va71uijc.fsf@linaro.org>
    2018-10-01 18:34 ` [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted) Emilio G. Cota
      2018-10-01 20:40 ` Richard Henderson
        2018-10-02  1:54 ` Emilio G. Cota
      2018-10-02  6:48 ` Alex Bennée
        2018-10-02 18:09 ` Emilio G. Cota