From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?IuKAnHRpZWp1bi5jaGVu4oCdIg==?= Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Date: Thu, 18 Jul 2013 18:14:46 +0800 Message-ID: <51E7C016.1070808@windriver.com> References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7AD7F.5080004@windriver.com> <42B0AAD2-F16A-4931-9B84-284260E4B436@suse.de> <51E7BBCA.6020908@windriver.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Bhushan Bharat-R65777 , "kvm-ppc@vger.kernel.org" , "kvm@vger.kernel.org" , Wood Scott-B07421 To: Alexander Graf Return-path: In-Reply-To: Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 07/18/2013 06:00 PM, Alexander Graf wrote: > > On 18.07.2013, at 11:56, =E2=80=9Ctiejun.chen=E2=80=9D wrote: > >> On 07/18/2013 05:44 PM, Alexander Graf wrote: >>> >>> On 18.07.2013, at 10:55, =EF=BF=BDtiejun.chen=EF=BF=BD wrote: >>> >>>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Bhushan Bharat-R65777 >>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>> To: '"=EF=BF=BDtiejun.chen=EF=BF=BD"' >>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;= Wood Scott- >>>>>> B07421 >>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only = for kernel >>>>>> managed pages >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: "=EF=BF=BDtiejun.chen=EF=BF=BD" [mailto:tiejun.chen@windr= iver.com] >>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>> To: Bhushan Bharat-R65777 >>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de= ; Wood >>>>>>> Scott- >>>>>>> B07421 >>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only= for >>>>>>> kernel managed pages >>>>>>> >>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: kvm-ppc-owner@vger.kernel.org >>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "=EF=BF=BD= tiejun.chen=EF=BF=BD" >>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.= de; >>>>>>>>> Wood >>>>>>>>> Scott- >>>>>>>>> B07421 >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency on= ly for >>>>>>>>> kernel managed pages >>>>>>>>> >>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: "=EF=BF=BDtiejun.chen=EF=BF=BD" [mailto:tiejun.chen@w= indriver.com] >>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@sus= e.de; >>>>>>>>>>> Wood >>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency = only >>>>>>>>>>> for kernel managed pages >>>>>>>>>>> >>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>> If there is a struct page for the requested mapping then i= t's >>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheab= le) >>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache >>>>>>>>>>>> inhibited, >>>>>>>>>>>> guarded) >>>>>>>>>>>> >>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned = device >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Bharat Bhushan >>>>>>>>>>>> --- >>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-= ---- >>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>> usermode) >>>>>>>>>>>> return mas3; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int >>>>>>>>>>>> usermode) >>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t= pfn) >>>>>>>>>>>> { >>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>> + >>>>>>>>>>>> + mas2_attr =3D mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>> + >>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>> >>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>> >>>>>>>>>> What I understand from this function (someone can correct me= ) is >>>>>>>>>> that it >>>>>>>>> returns "false" when the page is managed by kernel and is not >>>>>>>>> marked as RESERVED (for some reason). For us it does not matt= er >>>>>>>>> whether the page is reserved or not, if it is kernel visible = page then it >>>>>> is DDR. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think you are setting I|G by addressing all mmio pages, rig= ht? If >>>>>>>>> so, >>>>>>>>> >>>>>>>>> KVM: direct mmio pfn check >>>>>>>>> >>>>>>>>> Userspace may specify memory slots that are backed by = mmio >>>>>>>>> pages rather than >>>>>>>>> normal RAM. In some cases it is not enough to identif= y these >>>>>>>>> mmio >>>>>>> pages >>>>>>>>> by pfn_valid(). This patch adds checking the PageRese= rved as well. >>>>>>>> >>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>> PageReserved helps in >>>>>>> those cases? >>>>>>> >>>>>>> No, myself didn't see these actual cases in qemu,too. But this = should >>>>>>> be chronically persistent as I understand ;-) >>>>>> >>>>>> Then I will wait till someone educate me :) >>>>> >>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and= I do not want to call this for all tlbwe operation unless it is necess= ary. >>>> >>>> Furthermore, how to distinguish we're creating TLB entry for the d= evice assigned directly to the GS? >>> >>> Because other devices wouldn't be available to the guest through me= mory slots. >> >> Yes. >> >>> >>>> I think its unnecessary to always check if that is mmio's pfn sinc= e we have more non direct assigned devices. >>> >>> I'm not sure I understand. The shadow TLB code only knows "here is = a host virtual address". It needs to figure out whether the host physic= al address behind that is RAM (can access with cache enabled) or not (h= as to disable cache) >>> >> >> Sorry, looks I'm misleading you :-P >> >>>> So maybe we can introduce another helper to fixup that TLB entry i= n instead of this path. >>> >>> This path does fix up the shadow (host) TLB entry :). >>> >> >> I just mean whether we can have a separate path dedicated to those d= irect assigned devices, not go this common path :) > > I don't think it's possible to have a separate path without a certain= level of trust. In the current flow we don't trust anyone. We just che= ck every translated page whether we should enable caching or not. > > We could take that information from 2 other side though: > > 1) Memory Slot > 2) Guest TLB Flags > > If we take it from the memory slot we would have to trust QEMU (or an= y other user space) to give us the right hints. Malicious user space co= uld set invalid flags. Also we'd have to add logic to track this - whic= h doesn't exist today. > > If we take it from the guest we have to trust the guest. Malicious gu= ests could set invalid flags. Understood. > > Now why is setting invalid flags a problem? If I understand Scott cor= rectly, it can break the host if you access certain host devices with c= aching enabled. But to be sure I'd say we ask him directly :). Yes, we should certainly set I|G for that TLB entry mapping to device. > > Either way, not trusting anyone is definitely the safer choice. Definitely :) Tiejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?IuKAnHRpZWp1bi5jaGVu4oCdIg==?= Date: Thu, 18 Jul 2013 10:14:46 +0000 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Message-Id: <51E7C016.1070808@windriver.com> List-Id: References: <1374127456-9614-1-git-send-email-Bharat.Bhushan@freescale.com> <1374127456-9614-2-git-send-email-Bharat.Bhushan@freescale.com> <51E78A88.4000904@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6AFF@039-SN2MPN1-013.039d.mgd.msft.net> <51E799D0.1030406@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6DAB@039-SN2MPN1-013.039d.mgd.msft.net> <51E7A5A2.5040107@windriver.com> <6A3DF150A5B70D4F9B66A25E3F7C888D070D6E79@039-SN2MPN1-013.039d.mgd.msft.net> <51E7AD7F.5080004@windriver.com> <42B0AAD2-F16A-4931-9B84-284260E4B436@suse.de> <51E7BBCA.6020908@windriver.com> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit To: Alexander Graf Cc: Bhushan Bharat-R65777 , "kvm-ppc@vger.kernel.org" , "kvm@vger.kernel.org" , Wood Scott-B07421 On 07/18/2013 06:00 PM, Alexander Graf wrote: > > On 18.07.2013, at 11:56, “tiejun.chen” wrote: > >> On 07/18/2013 05:44 PM, Alexander Graf wrote: >>> >>> On 18.07.2013, at 10:55, �tiejun.chen� wrote: >>> >>>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Bhushan Bharat-R65777 >>>>>> Sent: Thursday, July 18, 2013 1:53 PM >>>>>> To: '"�tiejun.chen�"' >>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott- >>>>>> B07421 >>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel >>>>>> managed pages >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com] >>>>>>> Sent: Thursday, July 18, 2013 1:52 PM >>>>>>> To: Bhushan Bharat-R65777 >>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood >>>>>>> Scott- >>>>>>> B07421 >>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>>>>>> kernel managed pages >>>>>>> >>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: kvm-ppc-owner@vger.kernel.org >>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�" >>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM >>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; >>>>>>>>> Wood >>>>>>>>> Scott- >>>>>>>>> B07421 >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for >>>>>>>>> kernel managed pages >>>>>>>>> >>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com] >>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM >>>>>>>>>>> To: Bhushan Bharat-R65777 >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; >>>>>>>>>>> Wood >>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only >>>>>>>>>>> for kernel managed pages >>>>>>>>>>> >>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: >>>>>>>>>>>> If there is a struct page for the requested mapping then it's >>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable) >>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache >>>>>>>>>>>> inhibited, >>>>>>>>>>>> guarded) >>>>>>>>>>>> >>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Bharat Bhushan >>>>>>>>>>>> --- >>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++----- >>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> index 1c6a9d7..089c227 100644 >>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c >>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 >>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int >>>>>>>>>>> usermode) >>>>>>>>>>>> return mas3; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int >>>>>>>>>>>> usermode) >>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) >>>>>>>>>>>> { >>>>>>>>>>>> + u32 mas2_attr; >>>>>>>>>>>> + >>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK; >>>>>>>>>>>> + >>>>>>>>>>>> + if (!pfn_valid(pfn)) { >>>>>>>>>>> >>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? >>>>>>>>>> >>>>>>>>>> What I understand from this function (someone can correct me) is >>>>>>>>>> that it >>>>>>>>> returns "false" when the page is managed by kernel and is not >>>>>>>>> marked as RESERVED (for some reason). For us it does not matter >>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it >>>>>> is DDR. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If >>>>>>>>> so, >>>>>>>>> >>>>>>>>> KVM: direct mmio pfn check >>>>>>>>> >>>>>>>>> Userspace may specify memory slots that are backed by mmio >>>>>>>>> pages rather than >>>>>>>>> normal RAM. In some cases it is not enough to identify these >>>>>>>>> mmio >>>>>>> pages >>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well. >>>>>>>> >>>>>>>> Do you know what are those "some cases" and how checking >>>>>>>> PageReserved helps in >>>>>>> those cases? >>>>>>> >>>>>>> No, myself didn't see these actual cases in qemu,too. But this should >>>>>>> be chronically persistent as I understand ;-) >>>>>> >>>>>> Then I will wait till someone educate me :) >>>>> >>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. >>>> >>>> Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS? >>> >>> Because other devices wouldn't be available to the guest through memory slots. >> >> Yes. >> >>> >>>> I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices. >>> >>> I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache) >>> >> >> Sorry, looks I'm misleading you :-P >> >>>> So maybe we can introduce another helper to fixup that TLB entry in instead of this path. >>> >>> This path does fix up the shadow (host) TLB entry :). >>> >> >> I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :) > > I don't think it's possible to have a separate path without a certain level of trust. In the current flow we don't trust anyone. We just check every translated page whether we should enable caching or not. > > We could take that information from 2 other side though: > > 1) Memory Slot > 2) Guest TLB Flags > > If we take it from the memory slot we would have to trust QEMU (or any other user space) to give us the right hints. Malicious user space could set invalid flags. Also we'd have to add logic to track this - which doesn't exist today. > > If we take it from the guest we have to trust the guest. Malicious guests could set invalid flags. Understood. > > Now why is setting invalid flags a problem? If I understand Scott correctly, it can break the host if you access certain host devices with caching enabled. But to be sure I'd say we ask him directly :). Yes, we should certainly set I|G for that TLB entry mapping to device. > > Either way, not trusting anyone is definitely the safer choice. Definitely :) Tiejun