From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Graf <agraf@suse.de>
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
Date: Wed, 24 Jul 2013 11:21:11 +0200
Message-ID: <708AA5B2-2073-4D24-8B3C-23CC113594C2@suse.de>
References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <ABD89368-97E2-4CC4-A6A1-FDD78DA074E2@suse.de> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <E7A150B3-7263-470E-ABF2-680B0FA4FE7D@suse.de> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039
 -SN2MPN1-013.039d.mgd.msft.net>
Mime-Version: 1.0 (Apple Message framework v1278)
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?=
	<tiejun.chen@windriver.com>,
	"kvm-ppc@vger.kernel.org" <kvm-ppc@vger.kernel.org>,
	"kvm@vger.kernel.org list" <kvm@vger.kernel.org>,
	Wood Scott-B07421 <B07421@freescale.com>,
	Gleb Natapov <gleb@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
To: Bhushan Bharat-R65777 <R65777@freescale.com>
Return-path: <kvm-ppc-owner@vger.kernel.org>
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039-SN2MPN1-013.039d.mgd.msft.net>
Sender: kvm-ppc-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org


On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

>=20
>=20
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Wednesday, July 24, 2013 1:55 PM
>> To: "=93tiejun.chen=94"
>> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.=
org list;
>> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for =
kernel
>> managed pages
>>=20
>>=20
>> On 24.07.2013, at 04:26, =93tiejun.chen=94 wrote:
>>=20
>>> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>>>>=20
>>>> On 18.07.2013, at 12:19, =93tiejun.chen=94 wrote:
>>>>=20
>>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>>>>=20
>>>>>> On 18.07.2013, at 12:08, =93tiejun.chen=94 wrote:
>>>>>>=20
>>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>>>>=20
>>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>>>> To: '" tiejun.chen "'
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>> agraf@suse.de; Wood Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency o=
nly
>>>>>>>>>> for kernel managed pages
>>>>>>>>>>=20
>>>>>>>>>>=20
>>>>>>>>>>=20
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>=20
>>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>=20
>>>>>>>>>>>>=20
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tie=
jun.chen "
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>> Scott-
>>>>>>>>>>>>> B07421
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherenc=
y
>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com=
]
>>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache cohere=
ncy
>>>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>>>> If there is a struct page for the requested mapping th=
en
>>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent=
,
>>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set  "I =
+
>>>>>>>>>>>>>>>> G"  (cache inhibited,
>>>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assig=
ned
>>>>>>>>>>>>>>>> device
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
>>>>>>>>>>>>>>>> <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>   arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++=
-----
>>>>>>>>>>>>>>>>   1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>>   	return mas3;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, i=
nt
>>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
>>>>>>>>>>>>>>>> +pfn_t pfn)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>> +	u32 mas2_attr;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	mas2_attr =3D mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>> What I understand from this function (someone can correc=
t
>>>>>>>>>>>>>> me) is that it
>>>>>>>>>>>>> returns "false" when the page is managed by kernel and is
>>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does =
not
>>>>>>>>>>>>> matter whether the page is reserved or not, if it is kern=
el
>>>>>>>>>>>>> visible page then it
>>>>>>>>>> is DDR.
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
>>>>>>>>>>>>> right? If so,
>>>>>>>>>>>>>=20
>>>>>>>>>>>>>     KVM: direct mmio pfn check
>>>>>>>>>>>>>=20
>>>>>>>>>>>>>     Userspace may specify memory slots that are backed by
>>>>>>>>>>>>> mmio pages rather than
>>>>>>>>>>>>>     normal RAM.  In some cases it is not enough to identi=
fy
>>>>>>>>>>>>> these mmio
>>>>>>>>>>> pages
>>>>>>>>>>>>>     by pfn_valid().  This patch adds checking the PageRes=
erved as
>> well.
>>>>>>>>>>>>=20
>>>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>>>> PageReserved helps in
>>>>>>>>>>> those cases?
>>>>>>>>>>>=20
>>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But t=
his
>>>>>>>>>>> should be chronically persistent as I understand ;-)
>>>>>>>>>>=20
>>>>>>>>>> Then I will wait till someone educate me :)
>>>>>>>>>=20
>>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy=
 and I do
>> not want to call this for all tlbwe operation unless it is necessary=
=2E
>>>>>>>>=20
>>>>>>>> It certainly does more than we need and potentially slows down=
 the fast
>> path (RAM mapping). The only thing it does on top of "if (pfn_valid(=
))" is to
>> check for pages that are declared reserved on the host. This happens=
 in 2 cases:
>>>>>>>>=20
>>>>>>>>  1) Non cache coherent DMA
>>>>>>>>  2) Memory hot remove
>>>>>>>>=20
>>>>>>>> The non coherent DMA case would be interesting, as with the me=
chanism as
>> it is in place in Linux today, we could potentially break normal gue=
st operation
>> if we don't take it into account. However, it's Kconfig guarded by:
>>>>>>>>=20
>>>>>>>>        depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUB=
E_COMMON
>>>>>>>>        default n if PPC_47x
>>>>>>>>        default y
>>>>>>>>=20
>>>>>>>> so we never hit it with any core we care about ;).
>>>>>>>>=20
>>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't hav=
e to worry
>> about that one either.
>>>>>>>=20
>>>>>>> Thanks for this good information :)
>>>>>>>=20
>>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
>> kvm_is_mmio_pfn() to make sure that check is only valid when that is=
 really
>> needed? This can decrease those unnecessary performance loss.
>>>>>>>=20
>>>>>>> If I'm wrong please correct me :)
>>>>>>=20
>>>>>> You're perfectly right, but this is generic KVM code. So it gets=
 run across
>> all architectures. What if someone has the great idea to add a new c=
ase here for
>> x86, but doesn't tell us? In that case we potentially break x86.
>>>>>>=20
>>>>>> I'd rather not like to break x86 :).
>>>>>>=20
>>>>>> However, it'd be very interesting to see a benchmark with this c=
hange. Do
>> you think you could just rip out the whole reserved check and run a =
few
>> benchmarks and show us the results?
>>>>>>=20
>>>>>=20
>>>>> Often what case should be adopted to validate this scenario?
>>>>=20
>>>> Something which hammers the TLB emulation heavily. I usually just =
run
>>>> /bin/echo a thousand times in "time" and see how long it takes ;)
>>>>=20
>>>=20
>>> I tried to run five times with this combination, "time `for ((i=3D0=
; i<5000;
>> i++));  do /bin/echo; done`", to calculate the average value with th=
is change:
>>>=20
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
>>> 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled =3D true;
>>>=20
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>=20
>> I'd feel safer if we narrow this down to e500.
>>=20
>>> +       /*
>>> +        * Currently only in memory hot remove case we may still ne=
ed this.
>>> +        */
>>>       if (pfn_valid(pfn)) {
>>=20
>> We still have to check for pfn_valid, no? So the #ifdef should be do=
wn here.
>>=20
>>>               int reserved;
>>>               struct page *tail =3D pfn_to_page(pfn); @@ -124,6 +12=
8,7
>>> @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>               }
>>>               return PageReserved(tail);
>>>       }
>>> +#endif
>>>=20
>>>       return true;
>>> }
>>>=20
>>> Before apply this change:
>>>=20
>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)=
/5=3D
>> 1m21.376s
>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)=
/5=3D
>> 0m23.433s
>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5=3D=
 0m50.349s
>>>=20
>>> After apply this change:
>>>=20
>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)=
/5=3D
>> 1m20.667s
>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)=
/5=3D
>> 0m22.615s
>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5=3D=
 0m49.496s
>>>=20
>>> So,
>>>=20
>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% =3D -0.6%
>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% =3D -3.5%
>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% =3D -1.7%
>>=20
>> Very nice, so there is a real world performance benefit to doing thi=
s. Then yes,
>> I think it would make sense to change the global helper function to =
be fast on
>> e500 and use that one from e500_shadow_mas2_attrib() instead.
>=20
> Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib()=
 as Scott commented?

rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?


Alex

From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Graf <agraf@suse.de>
Date: Wed, 24 Jul 2013 09:21:11 +0000
Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
Message-Id: <708AA5B2-2073-4D24-8B3C-23CC113594C2@suse.de>
List-Id: <kvm-ppc.vger.kernel.org>
References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <ABD89368-97E2-4CC4-A6A1-FDD78DA074E2@suse.de> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <E7A150B3-7263-470E-ABF2-680B0FA4FE7D@suse.de> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039
 -SN2MPN1-013.039d.mgd.msft.net>
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039-SN2MPN1-013.039d.mgd.msft.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
To: Bhushan Bharat-R65777 <R65777@freescale.com>
Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= <tiejun.chen@windriver.com>, "kvm-ppc@vger.kernel.org" <kvm-ppc@vger.kernel.org>, "kvm@vger.kernel.org list" <kvm@vger.kernel.org>, Wood Scott-B07421 <B07421@freescale.com>, Gleb Natapov <gleb@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>


On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

>=20
>=20
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Wednesday, July 24, 2013 1:55 PM
>> To: "=93tiejun.chen=94"
>> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org =
list;
>> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kern=
el
>> managed pages
>>=20
>>=20
>> On 24.07.2013, at 04:26, =93tiejun.chen=94 wrote:
>>=20
>>> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>>>>=20
>>>> On 18.07.2013, at 12:19, =93tiejun.chen=94 wrote:
>>>>=20
>>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>>>>=20
>>>>>> On 18.07.2013, at 12:08, =93tiejun.chen=94 wrote:
>>>>>>=20
>>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>>>>=20
>>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>=20
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>>>> To: '" tiejun.chen "'
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>> agraf@suse.de; Wood Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>> for kernel managed pages
>>>>>>>>>>=20
>>>>>>>>>>=20
>>>>>>>>>>=20
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>=20
>>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>=20
>>>>>>>>>>>>=20
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.=
chen "
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>> Scott-
>>>>>>>>>>>>> B07421
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>>>> If there is a struct page for the requested mapping then
>>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,
>>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set  "I +
>>>>>>>>>>>>>>>> G"  (cache inhibited,
>>>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned
>>>>>>>>>>>>>>>> device
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
>>>>>>>>>>>>>>>> <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>   arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
>>>>>>>>>>>>>>>>   1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>>   	return mas3;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
>>>>>>>>>>>>>>>> +pfn_t pfn)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>> +	u32 mas2_attr;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	mas2_attr =3D mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
>>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>> What I understand from this function (someone can correct
>>>>>>>>>>>>>> me) is that it
>>>>>>>>>>>>> returns "false" when the page is managed by kernel and is
>>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not
>>>>>>>>>>>>> matter whether the page is reserved or not, if it is kernel
>>>>>>>>>>>>> visible page then it
>>>>>>>>>> is DDR.
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
>>>>>>>>>>>>> right? If so,
>>>>>>>>>>>>>=20
>>>>>>>>>>>>>     KVM: direct mmio pfn check
>>>>>>>>>>>>>=20
>>>>>>>>>>>>>     Userspace may specify memory slots that are backed by
>>>>>>>>>>>>> mmio pages rather than
>>>>>>>>>>>>>     normal RAM.  In some cases it is not enough to identify
>>>>>>>>>>>>> these mmio
>>>>>>>>>>> pages
>>>>>>>>>>>>>     by pfn_valid().  This patch adds checking the PageReserve=
d as
>> well.
>>>>>>>>>>>>=20
>>>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>>>> PageReserved helps in
>>>>>>>>>>> those cases?
>>>>>>>>>>>=20
>>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this
>>>>>>>>>>> should be chronically persistent as I understand ;-)
>>>>>>>>>>=20
>>>>>>>>>> Then I will wait till someone educate me :)
>>>>>>>>>=20
>>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and=
 I do
>> not want to call this for all tlbwe operation unless it is necessary.
>>>>>>>>=20
>>>>>>>> It certainly does more than we need and potentially slows down the=
 fast
>> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" =
is to
>> check for pages that are declared reserved on the host. This happens in =
2 cases:
>>>>>>>>=20
>>>>>>>>  1) Non cache coherent DMA
>>>>>>>>  2) Memory hot remove
>>>>>>>>=20
>>>>>>>> The non coherent DMA case would be interesting, as with the mechan=
ism as
>> it is in place in Linux today, we could potentially break normal guest o=
peration
>> if we don't take it into account. However, it's Kconfig guarded by:
>>>>>>>>=20
>>>>>>>>        depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_CO=
MMON
>>>>>>>>        default n if PPC_47x
>>>>>>>>        default y
>>>>>>>>=20
>>>>>>>> so we never hit it with any core we care about ;).
>>>>>>>>=20
>>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to=
 worry
>> about that one either.
>>>>>>>=20
>>>>>>> Thanks for this good information :)
>>>>>>>=20
>>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
>> kvm_is_mmio_pfn() to make sure that check is only valid when that is rea=
lly
>> needed? This can decrease those unnecessary performance loss.
>>>>>>>=20
>>>>>>> If I'm wrong please correct me :)
>>>>>>=20
>>>>>> You're perfectly right, but this is generic KVM code. So it gets run=
 across
>> all architectures. What if someone has the great idea to add a new case =
here for
>> x86, but doesn't tell us? In that case we potentially break x86.
>>>>>>=20
>>>>>> I'd rather not like to break x86 :).
>>>>>>=20
>>>>>> However, it'd be very interesting to see a benchmark with this chang=
e. Do
>> you think you could just rip out the whole reserved check and run a few
>> benchmarks and show us the results?
>>>>>>=20
>>>>>=20
>>>>> Often what case should be adopted to validate this scenario?
>>>>=20
>>>> Something which hammers the TLB emulation heavily. I usually just run
>>>> /bin/echo a thousand times in "time" and see how long it takes ;)
>>>>=20
>>>=20
>>> I tried to run five times with this combination, "time `for ((i=3D0; i<=
5000;
>> i++));  do /bin/echo; done`", to calculate the average value with this c=
hange:
>>>=20
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
>>> 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled =3D true;
>>>=20
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>=20
>> I'd feel safer if we narrow this down to e500.
>>=20
>>> +       /*
>>> +        * Currently only in memory hot remove case we may still need t=
his.
>>> +        */
>>>       if (pfn_valid(pfn)) {
>>=20
>> We still have to check for pfn_valid, no? So the #ifdef should be down h=
ere.
>>=20
>>>               int reserved;
>>>               struct page *tail =3D pfn_to_page(pfn); @@ -124,6 +128,7
>>> @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>               }
>>>               return PageReserved(tail);
>>>       }
>>> +#endif
>>>=20
>>>       return true;
>>> }
>>>=20
>>> Before apply this change:
>>>=20
>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5>>=
 1m21.376s
>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5>>=
 0m23.433s
>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5=3D 0m=
50.349s
>>>=20
>>> After apply this change:
>>>=20
>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5>>=
 1m20.667s
>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5>>=
 0m22.615s
>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5=3D 0m=
49.496s
>>>=20
>>> So,
>>>=20
>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% =3D -0.6%
>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% =3D -3.5%
>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% =3D -1.7%
>>=20
>> Very nice, so there is a real world performance benefit to doing this. T=
hen yes,
>> I think it would make sense to change the global helper function to be f=
ast on
>> e500 and use that one from e500_shadow_mas2_attrib() instead.
>=20
> Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as =
Scott commented?

rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?


Alex