From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Date: Wed, 24 Jul 2013 12:09:42 +0200 Message-ID: References: <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <20130724100104.GE16400@redhat.com> Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , Bhushan Bharat-R65777 , kvm-ppc@vger.kernel.org, "kvm@vger.kernel.org list" , Wood Scott-B07421 , Paolo Bonzini , Andrea Arcangeli To: Gleb Natapov Return-path: In-Reply-To: <20130724100104.GE16400@redhat.com> Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 24.07.2013, at 12:01, Gleb Natapov wrote: > Copying Andrea for him to verify that I am not talking nonsense :) > > On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote: >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>> index 1580dd4..5e8635b 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true; >>> >>> bool kvm_is_mmio_pfn(pfn_t pfn) >>> { >>> +#ifdef CONFIG_MEMORY_HOTPLUG >> >> I'd feel safer if we narrow this down to e500. >> >>> + /* >>> + * Currently only in memory hot remove case we may still need this. >>> + */ >>> if (pfn_valid(pfn)) { >> >> We still have to check for pfn_valid, no? So the #ifdef should be down here. >> >>> int reserved; >>> struct page *tail = pfn_to_page(pfn); >>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>> } >>> return PageReserved(tail); >>> } >>> +#endif >>> >>> return true; >>> } >>> >>> Before apply this change: >>> >>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s >>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s >>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s >>> >>> After apply this change: >>> >>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s >>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s >>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s >>> >>> So, >>> >>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% >>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% >>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% >> >> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead. >> >> Gleb, Paolo, any hard feelings? >> > I do not see how can we break the function in such a way and get > away with it. Not all valid pfns point to memory. Physical address can > be sparse (due to PCI hole, framebuffer or just because). But we don't check for sparseness today in here either. We merely check for incomplete huge pages. Alex From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Date: Wed, 24 Jul 2013 10:09:42 +0000 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Message-Id: List-Id: References: <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <20130724100104.GE16400@redhat.com> In-Reply-To: <20130724100104.GE16400@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Gleb Natapov Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , Bhushan Bharat-R65777 , kvm-ppc@vger.kernel.org, "kvm@vger.kernel.org list" , Wood Scott-B07421 , Paolo Bonzini , Andrea Arcangeli On 24.07.2013, at 12:01, Gleb Natapov wrote: > Copying Andrea for him to verify that I am not talking nonsense :) > > On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote: >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>> index 1580dd4..5e8635b 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true; >>> >>> bool kvm_is_mmio_pfn(pfn_t pfn) >>> { >>> +#ifdef CONFIG_MEMORY_HOTPLUG >> >> I'd feel safer if we narrow this down to e500. >> >>> + /* >>> + * Currently only in memory hot remove case we may still need this. >>> + */ >>> if (pfn_valid(pfn)) { >> >> We still have to check for pfn_valid, no? So the #ifdef should be down here. >> >>> int reserved; >>> struct page *tail = pfn_to_page(pfn); >>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>> } >>> return PageReserved(tail); >>> } >>> +#endif >>> >>> return true; >>> } >>> >>> Before apply this change: >>> >>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s >>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s >>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s >>> >>> After apply this change: >>> >>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s >>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s >>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s >>> >>> So, >>> >>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% >>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% >>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% >> >> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead. >> >> Gleb, Paolo, any hard feelings? >> > I do not see how can we break the function in such a way and get > away with it. Not all valid pfns point to memory. Physical address can > be sparse (due to PCI hole, framebuffer or just because). But we don't check for sparseness today in here either. We merely check for incomplete huge pages. Alex