On 06/04/2018 06:11 AM, speck for Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 04, 2018 at 10:24:59AM +0200, speck for Martin Pohlack wrote:
>> [resending as a new message, as the reply seems to have been lost on at
>> least some mail paths]
>>
>> On 30.05.2018 11:01, speck for Paolo Bonzini wrote:
>>> On 30/05/2018 01:54, speck for Andrew Cooper wrote:
>>>> Other bits I don't understand are the 64k limit in the first place, why
>>>> it gets walked over in 4k strides to begin with (I'm not aware of any
>>>> prefetching which would benefit that...) and why a particularly
>>>> obfuscated piece of magic is used for the 64-byte strides.
>>>
>>> That is the only part I understood. :)  The 4k strides ensure that the
>>> source data is in the TLB.  Why that is needed is still a mystery though.
>>
>> I think the reasoning is that you first want to populate the TLB for the
>> whole flush array, then fence, to make sure TLB walks do not interfere
>> with the actual flushing later, either for performance reasons or for
>> preventing leakage of partial walk results.
>>
>> Not sure about the 64K.  It is likely about the LRU implementation for L1
>> replacement not being perfect (pseudo-LRU rather than true LRU), so you
>> need to flush more than the L1 size (32K) in software.  But I have also
>> seen smaller recommendations for that (52K).
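
For illustration only, the two-pass access pattern discussed above could be
sketched in C roughly as follows.  This is not the code under review: the
function name, the constants, and the use of a compiler-only barrier in
place of a real hardware fence are all assumptions made for the sketch.  It
reads one byte per 4K page first (to populate the TLB), places a barrier,
and then reads one byte per 64-byte line across the whole 64K buffer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values taken from the discussion, not from any specification. */
#define FLUSH_BUF_SIZE (64 * 1024) /* 64K buffer, twice a 32K L1D */
#define PAGE_STRIDE    4096        /* assumed 4K page size */
#define LINE_STRIDE    64          /* assumed 64-byte cache line */

/*
 * Hypothetical helper: walks buf in two passes and returns the number of
 * loads issued, so the access pattern can be checked.
 */
static size_t software_l1d_flush_sketch(const volatile uint8_t *buf)
{
    size_t loads = 0;
    uint8_t sink = 0;

    /* Pass 1: one load per page, so every translation is in the TLB. */
    for (size_t off = 0; off < FLUSH_BUF_SIZE; off += PAGE_STRIDE) {
        sink ^= buf[off];
        loads++;
    }

    /*
     * Barrier between the passes.  This is only a compiler barrier; a
     * real implementation would need an actual fencing instruction here.
     */
    atomic_signal_fence(memory_order_seq_cst);

    /* Pass 2: one load per cache line, displacing the L1D contents. */
    for (size_t off = 0; off < FLUSH_BUF_SIZE; off += LINE_STRIDE) {
        sink ^= buf[off];
        loads++;
    }

    (void)sink;
    return loads;
}
```

With these assumed constants the first pass issues 16 loads and the second
1024.  Whether 64K is the right size, as opposed to the smaller figures
mentioned above, is exactly the open question in this thread.
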
> 
> Isn't Tim Chen from Intel on this mailing list? Tim, could you find out
> please?
> 

Will do.

Tim