From mboxrd@z Thu Jan  1 00:00:00 1970
From: Scott Wood <scottwood@freescale.com>
Subject: Re: [PATCH 04/14] KVM: PPC: e500: MMU API
Date: Tue, 1 Nov 2011 11:16:11 -0500
Message-ID: <4EB01B4B.8090209@freescale.com>
References: <1320047596-20577-1-git-send-email-agraf@suse.de> <1320047596-20577-5-git-send-email-agraf@suse.de> <4EAEA184.4050807@redhat.com> <4EAF013C.7050206@freescale.com> <4EAFB4B9.2040806@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Cc: Alexander Graf <agraf@suse.de>, <kvm-ppc@vger.kernel.org>,
	kvm list <kvm@vger.kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-ppc-owner@vger.kernel.org>
In-Reply-To: <4EAFB4B9.2040806@redhat.com>
Sender: kvm-ppc-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 11/01/2011 03:58 AM, Avi Kivity wrote:
> On 10/31/2011 10:12 PM, Scott Wood wrote:
>>>> +4.59 KVM_DIRTY_TLB
>>>> +
>>>> +Capability: KVM_CAP_SW_TLB
>>>> +Architectures: ppc
>>>> +Type: vcpu ioctl
>>>> +Parameters: struct kvm_dirty_tlb (in)
>>>> +Returns: 0 on success, -1 on error
>>>> +
>>>> +struct kvm_dirty_tlb {
>>>> +	__u64 bitmap;
>>>> +	__u32 num_dirty;
>>>> +};
>>>
>>> This is not 32/64 bit safe.  e500 is 32-bit only, yes?
>>
>> e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.
>>
>>> but what if someone wants to emulate an e500 on a ppc64?  maybe it's better to add
>>> padding here.
>>
>> What is unsafe about it?  Are you picturing TLBs with more than 4
>> billion entries?
> 
> sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
> 64-bit userspace and the kernel.  ABI structures must have the same
> alignment and size for 32/64 bit userspace, or they need compat handling.

The size is 16 on 32-bit ppc -- the alignment of __u64 forces this.  It
looks like this is different in the 32-bit x86 ABI.

We can pad explicitly if you prefer.
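
A minimal sketch of what explicit padding could look like.  The struct
name and the "pad" field are hypothetical, not part of the patch, and
uint64_t/uint32_t stand in for __u64/__u32 so that it builds in
userspace.  With the trailing field in place the layout no longer
depends on how strictly a given ABI aligns 64-bit members:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical padded variant of struct kvm_dirty_tlb.
 * The explicit "pad" member fills the 4 bytes that 32-bit ppc would
 * otherwise add implicitly and that 32-bit x86 would leave out.
 */
struct kvm_dirty_tlb_padded {
	uint64_t bitmap;	/* same field as in the patch */
	uint32_t num_dirty;	/* same field as in the patch */
	uint32_t pad;		/* illustrative only */
};

/* Compile-time layout checks; these hold on both 32-bit and 64-bit ABIs. */
static_assert(sizeof(struct kvm_dirty_tlb_padded) == 16,
	      "size must not depend on the ABI");
static_assert(offsetof(struct kvm_dirty_tlb_padded, num_dirty) == 8,
	      "num_dirty follows the 64-bit bitmap field");
```

The static asserts fail the build if a compiler ever lays the struct
out differently, which is cheaper than finding out via a compat bug.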

>> There shouldn't be any alignment issues.
>>
>>> Another alternative is to drop the num_dirty field (and let the kernel
>>> compute it instead, shouldn't take long?), and have the third argument
>>> to ioctl() reference the bitmap directly.
>>
>> The idea was to make it possible for the kernel to apply a threshold
>> above which it would be better to ignore the bitmap entirely and flush
>> everything:
>>
>> http://www.spinics.net/lists/kvm/msg50079.html
>>
>> Currently we always just flush everything, and QEMU always says
>> everything is dirty when it makes a change, but the API is there if needed.
> 
> Right, but you don't need num_dirty for it.  There are typically only a
> few dozen entries, yes?  It should take a trivial amount of time to
> calculate its weight.

There are over 500 entries currently, and QEMU could make that number
much larger if it wants to decrease guest-visible faults on certain
workloads.

It's not the most important feature; indeed, we currently ignore the
bitmap entirely.  But it could be useful depending on how the API is
used in the future, and I don't think we gain much by dropping it at
this point.  Alex, any thoughts?
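
For reference, a rough sketch of the alternative where the weight is
computed from the bitmap instead of being passed in as num_dirty.  The
function names, the threshold parameter and the bit order within each
64-bit word are assumptions made for illustration; none of this is in
the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the set bits among the first nbits bits of the bitmap.
 * Assumes bit i lives in word i / 64 at position i % 64.
 */
static unsigned int dirty_weight(const uint64_t *bitmap, size_t nbits)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < nbits; i++)
		if (bitmap[i / 64] & (UINT64_C(1) << (i % 64)))
			weight++;

	return weight;
}

/*
 * Hypothetical policy: above the threshold, ignore the bitmap and
 * flush everything rather than walking individual entries.
 */
static bool should_flush_all(const uint64_t *bitmap, size_t nbits,
			     unsigned int threshold)
{
	return dirty_weight(bitmap, nbits) > threshold;
}
```

With 512 entries this touches eight words, so either approach is cheap
at the current size; the cost only starts to matter if the array grows
a lot.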

>> This API has been discussed extensively, and the code using it is
>> already in mainline QEMU.  This aspect of it hasn't changed since the
>> discussion back in February:
>>
>> http://www.spinics.net/lists/kvm/msg50102.html
>>
>> I'd prefer to avoid another round of major overhaul without a really
>> good reason.
> 
> Me too, but I also prefer not to make ABI choices by inertia.  ABI is
> practically the only thing I care about wrt non-x86 (other than
> whitespace, of course).  Please involve me in the discussions earlier in
> the future.

You participated in that thread. :-)

I apologize for forgetting the main kvm list (rather than just kvm-ppc)
when sending out the most recent batch of patches.

>>>> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
>>>> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
>>>> + - The "array" field points to an array of type "struct
>>>> +   kvm_book3e_206_tlb_entry".
>>>> + - The array consists of all entries in the first TLB, followed by all
>>>> +   entries in the second TLB.
>>>> + - Within a TLB, entries are ordered first by increasing set number.  Within a
>>>> +   set, entries are ordered by way (increasing ESEL).
>>>> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
>>>> +   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
>>>> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
>>>> +   hardware ignores this value for TLB0.
>>>
>>> Holy shit.
>>
>> You were the one that first suggested we use shared data:
>> http://www.spinics.net/lists/kvm/msg49802.html
>>
>> These are the assumptions needed to make such an interface well-defined.
> 
> Just remarking on the complexity, don't take it personally.

:-)

Just wasn't sure whether the implication was that it was too complex.
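
To make the ordering rules quoted above concrete, here is a small
sketch of how a TLB0 entry's position in the shared array could be
computed.  The helper name and its parameters are hypothetical; only
the hash and the set-then-way ordering come from the documentation
text, and num_sets is assumed to be a power of two:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Index of a TLB0 entry within the shared array.
 *
 * tlb0_size and tlb0_ways correspond to tlb_sizes[0] and tlb_ways[0].
 * TLB0 comes first in the array, so no base offset is added.
 */
static size_t tlb0_array_index(uint64_t mas2, unsigned int way,
			       unsigned int tlb0_size, unsigned int tlb0_ways)
{
	unsigned int num_sets = tlb0_size / tlb0_ways;
	unsigned int set = (unsigned int)(mas2 >> 12) & (num_sets - 1);

	/* Entries are ordered by set first, then by way within the set. */
	return (size_t)set * tlb0_ways + way;
}
```

For example, with 512 entries and 4 ways there are 128 sets, so an
effective address of 0x12345000 in MAS2 hashes to set 0x45, and way 2
of that set sits at index 0x45 * 4 + 2 = 278.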

-scott
