From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Date: Wed, 24 Jul 2013 12:25:18 +0200 Message-ID: <64EAFC75-2C30-4DCF-9DD2-132D5F286667@suse.de> References: <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <20130724100104.GE16400@redhat.com> <20130724101932.GA14229@redhat.com> Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , Bhushan Bharat-R65777 , kvm-ppc@vger.kernel.org, "kvm@vger.kernel.org list" , Wood Scott-B07421 , Paolo Bonzini , Andrea Arcangeli To: Gleb Natapov Return-path: Received: from cantor2.suse.de ([195.135.220.15]:50422 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751222Ab3GXKZX convert rfc822-to-8bit (ORCPT ); Wed, 24 Jul 2013 06:25:23 -0400 In-Reply-To: <20130724101932.GA14229@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 24.07.2013, at 12:19, Gleb Natapov wrote: > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote: >> >> On 24.07.2013, at 12:01, Gleb Natapov wrote: >> >>> Copying Andrea for him to verify that I am not talking nonsense :) >>> >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote: >>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>>>> index 1580dd4..5e8635b 100644 >>>>> --- a/virt/kvm/kvm_main.c >>>>> +++ b/virt/kvm/kvm_main.c >>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true; >>>>> >>>>> bool kvm_is_mmio_pfn(pfn_t pfn) >>>>> { >>>>> +#ifdef CONFIG_MEMORY_HOTPLUG >>>> >>>> I'd feel safer if we narrow this down to e500. >>>> >>>>> + /* >>>>> + * Currently only in memory hot remove case we may still need this. >>>>> + */ >>>>> if (pfn_valid(pfn)) { >>>> >>>> We still have to check for pfn_valid, no? So the #ifdef should be down here. >>>> >>>>> int reserved; >>>>> struct page *tail = pfn_to_page(pfn); >>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>>>> } >>>>> return PageReserved(tail); >>>>> } >>>>> +#endif >>>>> >>>>> return true; >>>>> } >>>>> >>>>> Before apply this change: >>>>> >>>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s >>>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s >>>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s >>>>> >>>>> After apply this change: >>>>> >>>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s >>>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s >>>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s >>>>> >>>>> So, >>>>> >>>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% >>>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% >>>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% >>>> >>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead. >>>> >>>> Gleb, Paolo, any hard feelings? >>>> >>> I do not see how can we break the function in such a way and get >>> away with it. Not all valid pfns point to memory. Physical address can >>> be sparse (due to PCI hole, framebuffer or just because). >> >> But we don't check for sparseness today in here either. We merely check for incomplete huge pages. >> > That's not how I read the code. The code checks for reserved flag set. > It should be set on pfns that point to memory holes. As far as I I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500. But I'd be more than happy to get proven wrong :). Alex > understand huge page tricks they are there to guaranty that THP does not > change flags under us, Andrea? > > -- > Gleb. > -- > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Graf Date: Wed, 24 Jul 2013 10:25:18 +0000 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Message-Id: <64EAFC75-2C30-4DCF-9DD2-132D5F286667@suse.de> List-Id: References: <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7BE90.60300@windriver.com> <50683BE9-8BDD-4730-B866-F972BCF1EDD6@suse.de> <51E7C14C.6060600@windriver.com> <51EF3B67.5060403@windriver.com> <597D9B3F-BCE8-4483-B485-3035D6D443AC@suse.de> <20130724100104.GE16400@redhat.com> <20130724101932.GA14229@redhat.com> In-Reply-To: <20130724101932.GA14229@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Gleb Natapov Cc: =?windows-1252?Q?=22=93tiejun=2Echen=94=22?= , Bhushan Bharat-R65777 , kvm-ppc@vger.kernel.org, "kvm@vger.kernel.org list" , Wood Scott-B07421 , Paolo Bonzini , Andrea Arcangeli On 24.07.2013, at 12:19, Gleb Natapov wrote: > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote: >> >> On 24.07.2013, at 12:01, Gleb Natapov wrote: >> >>> Copying Andrea for him to verify that I am not talking nonsense :) >>> >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote: >>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>>>> index 1580dd4..5e8635b 100644 >>>>> --- a/virt/kvm/kvm_main.c >>>>> +++ b/virt/kvm/kvm_main.c >>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true; >>>>> >>>>> bool kvm_is_mmio_pfn(pfn_t pfn) >>>>> { >>>>> +#ifdef CONFIG_MEMORY_HOTPLUG >>>> >>>> I'd feel safer if we narrow this down to e500. >>>> >>>>> + /* >>>>> + * Currently only in memory hot remove case we may still need this. >>>>> + */ >>>>> if (pfn_valid(pfn)) { >>>> >>>> We still have to check for pfn_valid, no? So the #ifdef should be down here. >>>> >>>>> int reserved; >>>>> struct page *tail = pfn_to_page(pfn); >>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn) >>>>> } >>>>> return PageReserved(tail); >>>>> } >>>>> +#endif >>>>> >>>>> return true; >>>>> } >>>>> >>>>> Before apply this change: >>>>> >>>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s >>>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s >>>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s >>>>> >>>>> After apply this change: >>>>> >>>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s >>>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s >>>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s >>>>> >>>>> So, >>>>> >>>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% >>>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% >>>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% >>>> >>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead. >>>> >>>> Gleb, Paolo, any hard feelings? >>>> >>> I do not see how can we break the function in such a way and get >>> away with it. Not all valid pfns point to memory. Physical address can >>> be sparse (due to PCI hole, framebuffer or just because). >> >> But we don't check for sparseness today in here either. We merely check for incomplete huge pages. >> > That's not how I read the code. The code checks for reserved flag set. > It should be set on pfns that point to memory holes. As far as I I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500. But I'd be more than happy to get proven wrong :). Alex > understand huge page tricks they are there to guaranty that THP does not > change flags under us, Andrea? > > -- > Gleb. > -- > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html