From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Date: Wed, 24 Jul 2013 11:21:11 +0200 Message-ID: <708AA5B2-2073-4D24-8B3C-23CC113594C2@suse.de> References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039 -SN2MPN1-013.039d.mgd.msft.net> Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , "kvm-ppc@vger.kernel.org" , "kvm@vger.kernel.org list" , Wood Scott-B07421 , Gleb Natapov , Paolo Bonzini To: Bhushan Bharat-R65777 Return-path: In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039-SN2MPN1-013.039d.mgd.msft.net> Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote: >=20 >=20 >> -----Original Message----- >> From: Alexander Graf [mailto:agraf@suse.de] >> Sent: Wednesday, July 24, 2013 1:55 PM >> To: "=93tiejun.chen=94" >> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.= org list; >> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for = kernel >> managed pages >>=20 >>=20 >> On 24.07.2013, at 04:26, =93tiejun.chen=94 wrote: >>=20 >>> On 07/18/2013 06:27 PM, Alexander Graf wrote: >>>>=20 >>>> On 18.07.2013, at 12:19, =93tiejun.chen=94 wrote: >>>>=20 >>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote: >>>>>>=20 >>>>>> On 18.07.2013, at 12:08, =93tiejun.chen=94 wrote: >>>>>>=20 >>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote: >>>>>>>>=20 >>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: >>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Bhushan Bharat-R65777 >>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>>>>>> To: '" tiejun.chen "' >>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>> agraf@suse.de; Wood Scott- >>>>>>>>>> B07421 >>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency o= nly >>>>>>>>>> for kernel managed pages >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com] >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>> Scott- >>>>>>>>>>> B07421 >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency >>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>=20 >>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>>>=20 >>>>>>>>>>>>=20 >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org >>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tie= jun.chen " >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>>>> Scott- >>>>>>>>>>>>> B07421 >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherenc= y >>>>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>>>=20 >>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com= ] >>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache cohere= ncy >>>>>>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>>>>>> If there is a struct page for the requested mapping th= en >>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent= , >>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set "I = + >>>>>>>>>>>>>>>> G" (cache inhibited, >>>>>>>>>>>>>>>> guarded) >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assig= ned >>>>>>>>>>>>>>>> device >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++= ----- >>>>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>>>>>> usermode) >>>>>>>>>>>>>>>> return mas3; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, i= nt >>>>>>>>>>>>>>>> usermode) >>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, >>>>>>>>>>>>>>>> +pfn_t pfn) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + mas2_attr =3D mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> What I understand from this function (someone can correc= t >>>>>>>>>>>>>> me) is that it >>>>>>>>>>>>> returns "false" when the page is managed by kernel and is >>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does = not >>>>>>>>>>>>> matter whether the page is reserved or not, if it is kern= el >>>>>>>>>>>>> visible page then it >>>>>>>>>> is DDR. >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, >>>>>>>>>>>>> right? If so, >>>>>>>>>>>>>=20 >>>>>>>>>>>>> KVM: direct mmio pfn check >>>>>>>>>>>>>=20 >>>>>>>>>>>>> Userspace may specify memory slots that are backed by >>>>>>>>>>>>> mmio pages rather than >>>>>>>>>>>>> normal RAM. In some cases it is not enough to identi= fy >>>>>>>>>>>>> these mmio >>>>>>>>>>> pages >>>>>>>>>>>>> by pfn_valid(). This patch adds checking the PageRes= erved as >> well. >>>>>>>>>>>>=20 >>>>>>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>>>>>> PageReserved helps in >>>>>>>>>>> those cases? >>>>>>>>>>>=20 >>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But t= his >>>>>>>>>>> should be chronically persistent as I understand ;-) >>>>>>>>>>=20 >>>>>>>>>> Then I will wait till someone educate me :) >>>>>>>>>=20 >>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy= and I do >> not want to call this for all tlbwe operation unless it is necessary= =2E >>>>>>>>=20 >>>>>>>> It certainly does more than we need and potentially slows down= the fast >> path (RAM mapping). The only thing it does on top of "if (pfn_valid(= ))" is to >> check for pages that are declared reserved on the host. This happens= in 2 cases: >>>>>>>>=20 >>>>>>>> 1) Non cache coherent DMA >>>>>>>> 2) Memory hot remove >>>>>>>>=20 >>>>>>>> The non coherent DMA case would be interesting, as with the me= chanism as >> it is in place in Linux today, we could potentially break normal gue= st operation >> if we don't take it into account. However, it's Kconfig guarded by: >>>>>>>>=20 >>>>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUB= E_COMMON >>>>>>>> default n if PPC_47x >>>>>>>> default y >>>>>>>>=20 >>>>>>>> so we never hit it with any core we care about ;). >>>>>>>>=20 >>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't hav= e to worry >> about that one either. >>>>>>>=20 >>>>>>> Thanks for this good information :) >>>>>>>=20 >>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside >> kvm_is_mmio_pfn() to make sure that check is only valid when that is= really >> needed? This can decrease those unnecessary performance loss. >>>>>>>=20 >>>>>>> If I'm wrong please correct me :) >>>>>>=20 >>>>>> You're perfectly right, but this is generic KVM code. So it gets= run across >> all architectures. What if someone has the great idea to add a new c= ase here for >> x86, but doesn't tell us? In that case we potentially break x86. >>>>>>=20 >>>>>> I'd rather not like to break x86 :). >>>>>>=20 >>>>>> However, it'd be very interesting to see a benchmark with this c= hange. Do >> you think you could just rip out the whole reserved check and run a = few >> benchmarks and show us the results? >>>>>>=20 >>>>>=20 >>>>> Often what case should be adopted to validate this scenario? >>>>=20 >>>> Something which hammers the TLB emulation heavily. I usually just = run >>>> /bin/echo a thousand times in "time" and see how long it takes ;) >>>>=20 >>>=20 >>> I tried to run five times with this combination, "time `for ((i=3D0= ; i<5000; >> i++)); do /bin/echo; done`", to calculate the average value with th= is change: >>>=20 >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index >>> 1580dd4..5e8635b 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -102,6 +102,10 @@ static bool largepages_enabled =3D true; >>>=20 >>> bool kvm_is_mmio_pfn(pfn_t pfn) >>> { >>> +#ifdef CONFIG_MEMORY_HOTPLUG >>=20 >> I'd feel safer if we narrow this down to e500. >>=20 >>> + /* >>> + * Currently only in memory hot remove case we may still ne= ed this. >>> + */ >>> if (pfn_valid(pfn)) { >>=20 >> We still have to check for pfn_valid, no? So the #ifdef should be do= wn here. >>=20 >>> int reserved; >>> struct page *tail =3D pfn_to_page(pfn); @@ -124,6 +12= 8,7 >>> @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>> } >>> return PageReserved(tail); >>> } >>> +#endif >>>=20 >>> return true; >>> } >>>=20 >>> Before apply this change: >>>=20 >>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)= /5=3D >> 1m21.376s >>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)= /5=3D >> 0m23.433s >>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5=3D= 0m50.349s >>>=20 >>> After apply this change: >>>=20 >>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)= /5=3D >> 1m20.667s >>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)= /5=3D >> 0m22.615s >>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5=3D= 0m49.496s >>>=20 >>> So, >>>=20 >>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% =3D -0.6% >>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% =3D -3.5% >>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% =3D -1.7% >>=20 >> Very nice, so there is a real world performance benefit to doing thi= s. Then yes, >> I think it would make sense to change the global helper function to = be fast on >> e500 and use that one from e500_shadow_mas2_attrib() instead. >=20 > Are not we going to use page_is_ram() from e500_shadow_mas2_attrib()= as Scott commented? rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()? Alex From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Date: Wed, 24 Jul 2013 09:21:11 +0000 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Message-Id: <708AA5B2-2073-4D24-8B3C-23CC113594C2@suse.de> List-Id: References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039 -SN2MPN1-013.039d.mgd.msft.net> In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D070E17FA@039-SN2MPN1-013.039d.mgd.msft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable To: Bhushan Bharat-R65777 Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , "kvm-ppc@vger.kernel.org" , "kvm@vger.kernel.org list" , Wood Scott-B07421 , Gleb Natapov , Paolo Bonzini On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote: >=20 >=20 >> -----Original Message----- >> From: Alexander Graf [mailto:agraf@suse.de] >> Sent: Wednesday, July 24, 2013 1:55 PM >> To: "=93tiejun.chen=94" >> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org = list; >> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kern= el >> managed pages >>=20 >>=20 >> On 24.07.2013, at 04:26, =93tiejun.chen=94 wrote: >>=20 >>> On 07/18/2013 06:27 PM, Alexander Graf wrote: >>>>=20 >>>> On 18.07.2013, at 12:19, =93tiejun.chen=94 wrote: >>>>=20 >>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote: >>>>>>=20 >>>>>> On 18.07.2013, at 12:08, =93tiejun.chen=94 wrote: >>>>>>=20 >>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote: >>>>>>>>=20 >>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: >>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Bhushan Bharat-R65777 >>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>>>>>> To: '" tiejun.chen "' >>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>> agraf@suse.de; Wood Scott- >>>>>>>>>> B07421 >>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only >>>>>>>>>> for kernel managed pages >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com] >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>> Scott- >>>>>>>>>>> B07421 >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency >>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>=20 >>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>>>=20 >>>>>>>>>>>>=20 >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org >>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.= chen " >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>>>> Scott- >>>>>>>>>>>>> B07421 >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency >>>>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>>>=20 >>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com] >>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; >>>>>>>>>>>>>>> agraf@suse.de; Wood >>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency >>>>>>>>>>>>>>> only for kernel managed pages >>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>>>>>> If there is a struct page for the requested mapping then >>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent, >>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set "I + >>>>>>>>>>>>>>>> G" (cache inhibited, >>>>>>>>>>>>>>>> guarded) >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned >>>>>>>>>>>>>>>> device >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++----- >>>>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>>>>>> usermode) >>>>>>>>>>>>>>>> return mas3; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int >>>>>>>>>>>>>>>> usermode) >>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, >>>>>>>>>>>>>>>> +pfn_t pfn) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + mas2_attr =3D mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> What I understand from this function (someone can correct >>>>>>>>>>>>>> me) is that it >>>>>>>>>>>>> returns "false" when the page is managed by kernel and is >>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not >>>>>>>>>>>>> matter whether the page is reserved or not, if it is kernel >>>>>>>>>>>>> visible page then it >>>>>>>>>> is DDR. >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, >>>>>>>>>>>>> right? If so, >>>>>>>>>>>>>=20 >>>>>>>>>>>>> KVM: direct mmio pfn check >>>>>>>>>>>>>=20 >>>>>>>>>>>>> Userspace may specify memory slots that are backed by >>>>>>>>>>>>> mmio pages rather than >>>>>>>>>>>>> normal RAM. In some cases it is not enough to identify >>>>>>>>>>>>> these mmio >>>>>>>>>>> pages >>>>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserve= d as >> well. >>>>>>>>>>>>=20 >>>>>>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>>>>>> PageReserved helps in >>>>>>>>>>> those cases? >>>>>>>>>>>=20 >>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this >>>>>>>>>>> should be chronically persistent as I understand ;-) >>>>>>>>>>=20 >>>>>>>>>> Then I will wait till someone educate me :) >>>>>>>>>=20 >>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and= I do >> not want to call this for all tlbwe operation unless it is necessary. >>>>>>>>=20 >>>>>>>> It certainly does more than we need and potentially slows down the= fast >> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" = is to >> check for pages that are declared reserved on the host. This happens in = 2 cases: >>>>>>>>=20 >>>>>>>> 1) Non cache coherent DMA >>>>>>>> 2) Memory hot remove >>>>>>>>=20 >>>>>>>> The non coherent DMA case would be interesting, as with the mechan= ism as >> it is in place in Linux today, we could potentially break normal guest o= peration >> if we don't take it into account. However, it's Kconfig guarded by: >>>>>>>>=20 >>>>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_CO= MMON >>>>>>>> default n if PPC_47x >>>>>>>> default y >>>>>>>>=20 >>>>>>>> so we never hit it with any core we care about ;). >>>>>>>>=20 >>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to= worry >> about that one either. >>>>>>>=20 >>>>>>> Thanks for this good information :) >>>>>>>=20 >>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside >> kvm_is_mmio_pfn() to make sure that check is only valid when that is rea= lly >> needed? This can decrease those unnecessary performance loss. >>>>>>>=20 >>>>>>> If I'm wrong please correct me :) >>>>>>=20 >>>>>> You're perfectly right, but this is generic KVM code. So it gets run= across >> all architectures. What if someone has the great idea to add a new case = here for >> x86, but doesn't tell us? In that case we potentially break x86. >>>>>>=20 >>>>>> I'd rather not like to break x86 :). >>>>>>=20 >>>>>> However, it'd be very interesting to see a benchmark with this chang= e. Do >> you think you could just rip out the whole reserved check and run a few >> benchmarks and show us the results? >>>>>>=20 >>>>>=20 >>>>> Often what case should be adopted to validate this scenario? >>>>=20 >>>> Something which hammers the TLB emulation heavily. I usually just run >>>> /bin/echo a thousand times in "time" and see how long it takes ;) >>>>=20 >>>=20 >>> I tried to run five times with this combination, "time `for ((i=3D0; i<= 5000; >> i++)); do /bin/echo; done`", to calculate the average value with this c= hange: >>>=20 >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index >>> 1580dd4..5e8635b 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -102,6 +102,10 @@ static bool largepages_enabled =3D true; >>>=20 >>> bool kvm_is_mmio_pfn(pfn_t pfn) >>> { >>> +#ifdef CONFIG_MEMORY_HOTPLUG >>=20 >> I'd feel safer if we narrow this down to e500. >>=20 >>> + /* >>> + * Currently only in memory hot remove case we may still need t= his. >>> + */ >>> if (pfn_valid(pfn)) { >>=20 >> We still have to check for pfn_valid, no? So the #ifdef should be down h= ere. >>=20 >>> int reserved; >>> struct page *tail =3D pfn_to_page(pfn); @@ -124,6 +128,7 >>> @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>> } >>> return PageReserved(tail); >>> } >>> +#endif >>>=20 >>> return true; >>> } >>>=20 >>> Before apply this change: >>>=20 >>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5>>= 1m21.376s >>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5>>= 0m23.433s >>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5=3D 0m= 50.349s >>>=20 >>> After apply this change: >>>=20 >>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5>>= 1m20.667s >>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5>>= 0m22.615s >>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5=3D 0m= 49.496s >>>=20 >>> So, >>>=20 >>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% =3D -0.6% >>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% =3D -3.5% >>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% =3D -1.7% >>=20 >> Very nice, so there is a real world performance benefit to doing this. T= hen yes, >> I think it would make sense to change the global helper function to be f= ast on >> e500 and use that one from e500_shadow_mas2_attrib() instead. >=20 > Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as = Scott commented? rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()? Alex