From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paul Mackerras Date: Tue, 09 Feb 2021 07:19:26 +0000 Subject: Re: [PATCH 1/2] KVM: PPC: Book3S HV: Remove shared-TLB optimisation from vCPU TLB coherency logic Message-Id: <20210209071926.GA2841126@thinks.paulus.ozlabs.org> List-Id: References: <20200528015331.GD307798@thinks.paulus.ozlabs.org> In-Reply-To: <20200528015331.GD307798@thinks.paulus.ozlabs.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: kvm-ppc@vger.kernel.org On Mon, Jan 18, 2021 at 10:26:08PM +1000, Nicholas Piggin wrote: > Processors that implement ISA v3.0 or later don't necessarily have > threads in a core sharing all translations, and/or TLBIEL does not > necessarily invalidate translations on all other threads (the > architecture talks only about the effect on translations for "the thread > executing the tlbiel instruction". It seems to me that to have an implementation where TLB entries created on one thread (say T0) are visible to and usable by another thread (T1), but a tlbiel on thread T0 does not result in the entry being removed from visibility/usability on T1, is a pretty insane implementation. I'm not sure that the architecture envisaged allowing this kind of implementation, though perhaps the language doesn't absolutely prohibit it. This kind of implementation is what you are allowing for in this patch, isn't it? The sane implementations would be ones where either (a) TLB entries are private to each thread and tlbiel only works on the local thread, or (b) TLB entries can be shared and tlbiel works across all threads. I think this is the conclusion we collectively came to when working on that bug we worked on towards the end of last year. > While this worked for POWER9, it may not for future implementations, so > remove it. A POWER9 specific optimisation would have to have a specific > CPU feature to check, if it were to be re-added. Did you do any measurements of how much performance impact this has on POWER9? I don't believe this patch will actually be necessary on POWER10, so it seems like this patch is just to allow for some undefined possible future CPU. It may still be worth putting in for the sake of strict architecture compliance if the performance impact is minimal. Paul.