* [PATCH 1/2] kvm: powerpc: Do not ignore "E" attribute in mas2
@ 2013-07-18 6:04 Bharat Bhushan
2013-07-18 6:04 ` [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Bharat Bhushan
0 siblings, 1 reply; 44+ messages in thread
From: Bharat Bhushan @ 2013-07-18 6:04 UTC (permalink / raw)
To: kvm-ppc, kvm, agraf, scottwood; +Cc: Bharat Bhushan, Bharat Bhushan
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
arch/powerpc/kvm/e500.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index c2e5e98..277cb18 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
#define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
#define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
#define MAS2_ATTRIB_MASK \
- (MAS2_X0 | MAS2_X1)
+ (MAS2_X0 | MAS2_X1 | MAS2_E)
#define MAS3_ATTRIB_MASK \
(MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
| E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
--
1.7.0.4
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 6:04 [PATCH 1/2] kvm: powerpc: Do not ignore "E" attribute in mas2 Bharat Bhushan
@ 2013-07-18 6:04 ` Bharat Bhushan
2013-07-18 6:26 ` "“tiejun.chen”"
2013-07-18 8:27 ` "“tiejun.chen”"
0 siblings, 2 replies; 44+ messages in thread
From: Bharat Bhushan @ 2013-07-18 6:04 UTC (permalink / raw)
To: kvm-ppc, kvm, agraf, scottwood; +Cc: Bharat Bhushan, Bharat Bhushan
If there is a struct page for the requested mapping then it's
normal DDR and the mapping sets "M" bit (coherent, cacheable)
else this is treated as I/O and we set "I + G" (cache inhibited, guarded)
This helps setting proper TLB mapping for direct assigned device
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..089c227 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
return mas3;
}
-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
+ u32 mas2_attr;
+
+ mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+ if (!pfn_valid(pfn)) {
+ mas2_attr |= MAS2_I | MAS2_G;
+ } else {
#ifdef CONFIG_SMP
- return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
- return mas2 & MAS2_ATTRIB_MASK;
+ mas2_attr |= MAS2_M;
#endif
+ }
+ return mas2_attr;
}
/*
@@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
--
1.7.0.4
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 6:04 ` [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Bharat Bhushan
@ 2013-07-18 6:26 ` "“tiejun.chen”"
2013-07-18 7:12 ` Bhushan Bharat-R65777
2013-07-18 8:27 ` "“tiejun.chen”"
1 sibling, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 6:26 UTC (permalink / raw)
To: Bharat Bhushan; +Cc: kvm-ppc, kvm, agraf, scottwood, Bharat Bhushan
On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> If there is a struct page for the requested mapping then it's
> normal DDR and the mapping sets "M" bit (coherent, cacheable)
> else this is treated as I/O and we set "I + G" (cache inhibited, guarded)
>
> This helps setting proper TLB mapping for direct assigned device
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..089c227 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> return mas3;
> }
>
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
> + u32 mas2_attr;
> +
> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> +
> + if (!pfn_valid(pfn)) {
Why not directly use kvm_is_mmio_pfn()?
Tiejun
> + mas2_attr |= MAS2_I | MAS2_G;
> + } else {
> #ifdef CONFIG_SMP
> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> -#else
> - return mas2 & MAS2_ATTRIB_MASK;
> + mas2_attr |= MAS2_M;
> #endif
> + }
> + return mas2_attr;
> }
>
> /*
> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> /* Force IPROT=0 for all guest mappings. */
> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> stlbe->mas2 = (gvaddr & MAS2_EPN) |
> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 6:26 ` "“tiejun.chen”"
@ 2013-07-18 7:12 ` Bhushan Bharat-R65777
2013-07-18 7:31 ` "“tiejun.chen”"
0 siblings, 1 reply; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-18 7:12 UTC (permalink / raw)
To: "“tiejun.chen”"
Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
> -----Original Message-----
> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> Sent: Thursday, July 18, 2013 11:56 AM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
> B07421; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > If there is a struct page for the requested mapping then it's normal
> > DDR and the mapping sets "M" bit (coherent, cacheable) else this is
> > treated as I/O and we set "I + G" (cache inhibited, guarded)
> >
> > This helps setting proper TLB mapping for direct assigned device
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> > ---
> > arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> > 1 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > b/arch/powerpc/kvm/e500_mmu_host.c
> > index 1c6a9d7..089c227 100644
> > --- a/arch/powerpc/kvm/e500_mmu_host.c
> > +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int
> usermode)
> > return mas3;
> > }
> >
> > -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> > +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > {
> > + u32 mas2_attr;
> > +
> > + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > +
> > + if (!pfn_valid(pfn)) {
>
> Why not directly use kvm_is_mmio_pfn()?
What I understand from this function (someone can correct me) is that it returns "false" when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR.
-Bharat
>
> Tiejun
>
> > + mas2_attr |= MAS2_I | MAS2_G;
> > + } else {
> > #ifdef CONFIG_SMP
> > - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > -#else
> > - return mas2 & MAS2_ATTRIB_MASK;
> > + mas2_attr |= MAS2_M;
> > #endif
> > + }
> > + return mas2_attr;
> > }
> >
> > /*
> > @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> > /* Force IPROT=0 for all guest mappings. */
> > stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> > stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >
> >
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 7:12 ` Bhushan Bharat-R65777
@ 2013-07-18 7:31 ` "“tiejun.chen”"
2013-07-18 8:08 ` Bhushan Bharat-R65777
0 siblings, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 7:31 UTC (permalink / raw)
To: Bhushan Bharat-R65777; +Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>> Sent: Thursday, July 18, 2013 11:56 AM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>> B07421; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>> If there is a struct page for the requested mapping then it's normal
>>> DDR and the mapping sets "M" bit (coherent, cacheable) else this is
>>> treated as I/O and we set "I + G" (cache inhibited, guarded)
>>>
>>> This helps setting proper TLB mapping for direct assigned device
>>>
>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>> ---
>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>> index 1c6a9d7..089c227 100644
>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int
>> usermode)
>>> return mas3;
>>> }
>>>
>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>> {
>>> + u32 mas2_attr;
>>> +
>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>> +
>>> + if (!pfn_valid(pfn)) {
>>
>> Why not directly use kvm_is_mmio_pfn()?
>
> What I understand from this function (someone can correct me) is that it returns "false" when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR.
>
I think you are setting I|G by addressing all mmio pages, right? If so,
KVM: direct mmio pfn check
Userspace may specify memory slots that are backed by mmio pages rather than
normal RAM. In some cases it is not enough to identify these mmio pages
by pfn_valid(). This patch adds checking the PageReserved as well.
Tiejun
> -Bharat
>
>>
>> Tiejun
>>
>>> + mas2_attr |= MAS2_I | MAS2_G;
>>> + } else {
>>> #ifdef CONFIG_SMP
>>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>> -#else
>>> - return mas2 & MAS2_ATTRIB_MASK;
>>> + mas2_attr |= MAS2_M;
>>> #endif
>>> + }
>>> + return mas2_attr;
>>> }
>>>
>>> /*
>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
>>> /* Force IPROT=0 for all guest mappings. */
>>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
>>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
>>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
>>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
>>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 7:31 ` "“tiejun.chen”"
@ 2013-07-18 8:08 ` Bhushan Bharat-R65777
2013-07-18 8:21 ` "“tiejun.chen”"
0 siblings, 1 reply; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-18 8:08 UTC (permalink / raw)
To: "“tiejun.chen”"
Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org] On
> Behalf Of "“tiejun.chen”"
> Sent: Thursday, July 18, 2013 1:01 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -----Original Message-----
> >> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> >> Sent: Thursday, July 18, 2013 11:56 AM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
> >> Scott- B07421; Bhushan Bharat-R65777
> >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>> If there is a struct page for the requested mapping then it's normal
> >>> DDR and the mapping sets "M" bit (coherent, cacheable) else this is
> >>> treated as I/O and we set "I + G" (cache inhibited, guarded)
> >>>
> >>> This helps setting proper TLB mapping for direct assigned device
> >>>
> >>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> >>> ---
> >>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> >>> 1 files changed, 12 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>> index 1c6a9d7..089c227 100644
> >>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32
> >>> mas3, int
> >> usermode)
> >>> return mas3;
> >>> }
> >>>
> >>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> >>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >>> {
> >>> + u32 mas2_attr;
> >>> +
> >>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>> +
> >>> + if (!pfn_valid(pfn)) {
> >>
> >> Why not directly use kvm_is_mmio_pfn()?
> >
> > What I understand from this function (someone can correct me) is that it
> returns "false" when the page is managed by kernel and is not marked as RESERVED
> (for some reason). For us it does not matter whether the page is reserved or
> not, if it is kernel visible page then it is DDR.
> >
>
> I think you are setting I|G by addressing all mmio pages, right? If so,
>
> KVM: direct mmio pfn check
>
> Userspace may specify memory slots that are backed by mmio pages rather
> than
> normal RAM. In some cases it is not enough to identify these mmio pages
> by pfn_valid(). This patch adds checking the PageReserved as well.
Do you know what are those "some cases" and how checking PageReserved helps in those cases?
-Bharat
>
> Tiejun
>
> > -Bharat
> >
> >>
> >> Tiejun
> >>
> >>> + mas2_attr |= MAS2_I | MAS2_G;
> >>> + } else {
> >>> #ifdef CONFIG_SMP
> >>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> >>> -#else
> >>> - return mas2 & MAS2_ATTRIB_MASK;
> >>> + mas2_attr |= MAS2_M;
> >>> #endif
> >>> + }
> >>> + return mas2_attr;
> >>> }
> >>>
> >>> /*
> >>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> >>> /* Force IPROT=0 for all guest mappings. */
> >>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> >>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
> >>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> >>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> >>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> >>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >>>
> >>>
> >>
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:08 ` Bhushan Bharat-R65777
@ 2013-07-18 8:21 ` "“tiejun.chen”"
2013-07-18 8:22 ` Bhushan Bharat-R65777
2013-07-18 8:25 ` Bhushan Bharat-R65777
0 siblings, 2 replies; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 8:21 UTC (permalink / raw)
To: Bhushan Bharat-R65777; +Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org] On
>> Behalf Of "“tiejun.chen”"
>> Sent: Thursday, July 18, 2013 1:01 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>> B07421
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>> To: Bhushan Bharat-R65777
>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>> Scott- B07421; Bhushan Bharat-R65777
>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>> kernel managed pages
>>>>
>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>> If there is a struct page for the requested mapping then it's normal
>>>>> DDR and the mapping sets "M" bit (coherent, cacheable) else this is
>>>>> treated as I/O and we set "I + G" (cache inhibited, guarded)
>>>>>
>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>
>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>> ---
>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>> index 1c6a9d7..089c227 100644
>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32
>>>>> mas3, int
>>>> usermode)
>>>>> return mas3;
>>>>> }
>>>>>
>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>> {
>>>>> + u32 mas2_attr;
>>>>> +
>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>> +
>>>>> + if (!pfn_valid(pfn)) {
>>>>
>>>> Why not directly use kvm_is_mmio_pfn()?
>>>
>>> What I understand from this function (someone can correct me) is that it
>> returns "false" when the page is managed by kernel and is not marked as RESERVED
>> (for some reason). For us it does not matter whether the page is reserved or
>> not, if it is kernel visible page then it is DDR.
>>>
>>
>> I think you are setting I|G by addressing all mmio pages, right? If so,
>>
>> KVM: direct mmio pfn check
>>
>> Userspace may specify memory slots that are backed by mmio pages rather
>> than
>> normal RAM. In some cases it is not enough to identify these mmio pages
>> by pfn_valid(). This patch adds checking the PageReserved as well.
>
> Do you know what are those "some cases" and how checking PageReserved helps in those cases?
No, myself didn't see these actual cases in qemu,too. But this should be
chronically persistent as I understand ;-)
Tiejun
>
> -Bharat
>
>>
>> Tiejun
>>
>>> -Bharat
>>>
>>>>
>>>> Tiejun
>>>>
>>>>> + mas2_attr |= MAS2_I | MAS2_G;
>>>>> + } else {
>>>>> #ifdef CONFIG_SMP
>>>>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>>>> -#else
>>>>> - return mas2 & MAS2_ATTRIB_MASK;
>>>>> + mas2_attr |= MAS2_M;
>>>>> #endif
>>>>> + }
>>>>> + return mas2_attr;
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
>>>>> /* Force IPROT=0 for all guest mappings. */
>>>>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
>>>>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
>>>>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
>>>>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
>>>>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>>>>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:21 ` "“tiejun.chen”"
@ 2013-07-18 8:22 ` Bhushan Bharat-R65777
2013-07-18 8:25 ` Bhushan Bharat-R65777
1 sibling, 0 replies; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-18 8:22 UTC (permalink / raw)
To: "“tiejun.chen”"
Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
> -----Original Message-----
> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> Sent: Thursday, July 18, 2013 1:52 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -----Original Message-----
> >> From: kvm-ppc-owner@vger.kernel.org
> >> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> >> Sent: Thursday, July 18, 2013 1:01 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
> >> Scott-
> >> B07421
> >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> >>>> Sent: Thursday, July 18, 2013 11:56 AM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> >>>> Wood
> >>>> Scott- B07421; Bhushan Bharat-R65777
> >>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >>>> kernel managed pages
> >>>>
> >>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>>>> If there is a struct page for the requested mapping then it's
> >>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable) else
> >>>>> this is treated as I/O and we set "I + G" (cache inhibited,
> >>>>> guarded)
> >>>>>
> >>>>> This helps setting proper TLB mapping for direct assigned device
> >>>>>
> >>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> >>>>> ---
> >>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> >>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>> index 1c6a9d7..089c227 100644
> >>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32
> >>>>> mas3, int
> >>>> usermode)
> >>>>> return mas3;
> >>>>> }
> >>>>>
> >>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> >>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >>>>> {
> >>>>> + u32 mas2_attr;
> >>>>> +
> >>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>>>> +
> >>>>> + if (!pfn_valid(pfn)) {
> >>>>
> >>>> Why not directly use kvm_is_mmio_pfn()?
> >>>
> >>> What I understand from this function (someone can correct me) is
> >>> that it
> >> returns "false" when the page is managed by kernel and is not marked
> >> as RESERVED (for some reason). For us it does not matter whether the
> >> page is reserved or not, if it is kernel visible page then it is DDR.
> >>>
> >>
> >> I think you are setting I|G by addressing all mmio pages, right? If
> >> so,
> >>
> >> KVM: direct mmio pfn check
> >>
> >> Userspace may specify memory slots that are backed by mmio
> >> pages rather than
> >> normal RAM. In some cases it is not enough to identify these mmio
> pages
> >> by pfn_valid(). This patch adds checking the PageReserved as well.
> >
> > Do you know what are those "some cases" and how checking PageReserved helps in
> those cases?
>
> No, myself didn't see these actual cases in qemu,too. But this should be
> chronically persistent as I understand ;-)
Then I will wait till someone educate me :)
-Bharat
>
> Tiejun
>
> >
> > -Bharat
> >
> >>
> >> Tiejun
> >>
> >>> -Bharat
> >>>
> >>>>
> >>>> Tiejun
> >>>>
> >>>>> + mas2_attr |= MAS2_I | MAS2_G;
> >>>>> + } else {
> >>>>> #ifdef CONFIG_SMP
> >>>>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> >>>>> -#else
> >>>>> - return mas2 & MAS2_ATTRIB_MASK;
> >>>>> + mas2_attr |= MAS2_M;
> >>>>> #endif
> >>>>> + }
> >>>>> + return mas2_attr;
> >>>>> }
> >>>>>
> >>>>> /*
> >>>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> >>>>> /* Force IPROT=0 for all guest mappings. */
> >>>>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> >>>>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
> >>>>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> >>>>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> >>>>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> >>>>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> >> the body of a message to majordomo@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
> >
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:21 ` "“tiejun.chen”"
2013-07-18 8:22 ` Bhushan Bharat-R65777
@ 2013-07-18 8:25 ` Bhushan Bharat-R65777
2013-07-18 8:55 ` "“tiejun.chen”"
2013-07-18 9:48 ` Alexander Graf
1 sibling, 2 replies; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-18 8:25 UTC (permalink / raw)
To: Bhushan Bharat-R65777, "“tiejun.chen”"
Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
> -----Original Message-----
> From: Bhushan Bharat-R65777
> Sent: Thursday, July 18, 2013 1:53 PM
> To: '"“tiejun.chen”"'
> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
> B07421
> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
>
>
> > -----Original Message-----
> > From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> > Sent: Thursday, July 18, 2013 1:52 PM
> > To: Bhushan Bharat-R65777
> > Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
> > Scott-
> > B07421
> > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> > kernel managed pages
> >
> > On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: kvm-ppc-owner@vger.kernel.org
> > >> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> > >> Sent: Thursday, July 18, 2013 1:01 PM
> > >> To: Bhushan Bharat-R65777
> > >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> > >> Wood
> > >> Scott-
> > >> B07421
> > >> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> > >> kernel managed pages
> > >>
> > >> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> > >>>> Sent: Thursday, July 18, 2013 11:56 AM
> > >>>> To: Bhushan Bharat-R65777
> > >>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> > >>>> Wood
> > >>>> Scott- B07421; Bhushan Bharat-R65777
> > >>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> > >>>> for kernel managed pages
> > >>>>
> > >>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> > >>>>> If there is a struct page for the requested mapping then it's
> > >>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
> > >>>>> else this is treated as I/O and we set "I + G" (cache
> > >>>>> inhibited,
> > >>>>> guarded)
> > >>>>>
> > >>>>> This helps setting proper TLB mapping for direct assigned device
> > >>>>>
> > >>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> > >>>>> ---
> > >>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> > >>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
> > >>>>>
> > >>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> > >>>>> b/arch/powerpc/kvm/e500_mmu_host.c
> > >>>>> index 1c6a9d7..089c227 100644
> > >>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> > >>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> > >>>>> @@ -64,13 +64,20 @@ static inline u32
> > >>>>> e500_shadow_mas3_attrib(u32 mas3, int
> > >>>> usermode)
> > >>>>> return mas3;
> > >>>>> }
> > >>>>>
> > >>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> > >>>>> usermode)
> > >>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> > >>>>> {
> > >>>>> + u32 mas2_attr;
> > >>>>> +
> > >>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> > >>>>> +
> > >>>>> + if (!pfn_valid(pfn)) {
> > >>>>
> > >>>> Why not directly use kvm_is_mmio_pfn()?
> > >>>
> > >>> What I understand from this function (someone can correct me) is
> > >>> that it
> > >> returns "false" when the page is managed by kernel and is not
> > >> marked as RESERVED (for some reason). For us it does not matter
> > >> whether the page is reserved or not, if it is kernel visible page then it
> is DDR.
> > >>>
> > >>
> > >> I think you are setting I|G by addressing all mmio pages, right? If
> > >> so,
> > >>
> > >> KVM: direct mmio pfn check
> > >>
> > >> Userspace may specify memory slots that are backed by mmio
> > >> pages rather than
> > >> normal RAM. In some cases it is not enough to identify these
> > >> mmio
> > pages
> > >> by pfn_valid(). This patch adds checking the PageReserved as well.
> > >
> > > Do you know what are those "some cases" and how checking
> > > PageReserved helps in
> > those cases?
> >
> > No, myself didn't see these actual cases in qemu,too. But this should
> > be chronically persistent as I understand ;-)
>
> Then I will wait till someone educate me :)
The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
-Bharat
> > >>>>> + mas2_attr |= MAS2_I | MAS2_G;
> > >>>>> + } else {
> > >>>>> #ifdef CONFIG_SMP
> > >>>>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> > >>>>> -#else
> > >>>>> - return mas2 & MAS2_ATTRIB_MASK;
> > >>>>> + mas2_attr |= MAS2_M;
> > >>>>> #endif
> > >>>>> + }
> > >>>>> + return mas2_attr;
> > >>>>> }
> > >>>>>
> > >>>>> /*
> > >>>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> > >>>>> /* Force IPROT=0 for all guest mappings. */
> > >>>>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> > >>>>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
> > >>>>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> > >>>>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> > >>>>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> > >>>>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe kvm-ppc"
> > >> in the body of a message to majordomo@vger.kernel.org More
> > >> majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> >
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 6:04 ` [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Bharat Bhushan
2013-07-18 6:26 ` "“tiejun.chen”"
@ 2013-07-18 8:27 ` "“tiejun.chen”"
1 sibling, 0 replies; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 8:27 UTC (permalink / raw)
To: Bharat Bhushan; +Cc: kvm-ppc, kvm, agraf, scottwood, Bharat Bhushan
On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> If there is a struct page for the requested mapping then it's
> normal DDR and the mapping sets "M" bit (coherent, cacheable)
> else this is treated as I/O and we set "I + G" (cache inhibited, guarded)
>
> This helps setting proper TLB mapping for direct assigned device
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> 1 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..089c227 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
> return mas3;
> }
>
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> {
> + u32 mas2_attr;
> +
> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> +
> + if (!pfn_valid(pfn)) {
> + mas2_attr |= MAS2_I | MAS2_G;
> + } else {
> #ifdef CONFIG_SMP
> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> -#else
> - return mas2 & MAS2_ATTRIB_MASK;
> + mas2_attr |= MAS2_M;
> #endif
> + }
Additionally, in UP case this little chunk of code is equivalent to
if (1) {
mas2_attr |= MAS2_I | MAS2_G;
} else {
}
So you'd better wrapper MAS2_m in advance like,
#ifdef CONFIG_SMP
#define M_IF_SMP MAS2_M
#else
#define M_IF_SMP 0
#endif
Then
if (1)
mas2_attr |= MAS2_I | MAS2_G;
else
mas2_attr |= M_IF_SMP;
Tiejun
> + return mas2_attr;
> }
>
> /*
> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
> /* Force IPROT=0 for all guest mappings. */
> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> stlbe->mas2 = (gvaddr & MAS2_EPN) |
> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:25 ` Bhushan Bharat-R65777
@ 2013-07-18 8:55 ` "“tiejun.chen”"
2013-07-18 9:44 ` Alexander Graf
2013-07-18 9:48 ` Alexander Graf
1 sibling, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 8:55 UTC (permalink / raw)
To: Bhushan Bharat-R65777; +Cc: kvm-ppc, kvm, agraf, Wood Scott-B07421
On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: Bhushan Bharat-R65777
>> Sent: Thursday, July 18, 2013 1:53 PM
>> To: '"“tiejun.chen”"'
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>> B07421
>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>>
>>
>>> -----Original Message-----
>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>> Sent: Thursday, July 18, 2013 1:52 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>>
>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>> To: Bhushan Bharat-R65777
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>> Wood
>>>>> Scott-
>>>>> B07421
>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>> kernel managed pages
>>>>>
>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>> Wood
>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>> for kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>> inhibited,
>>>>>>>> guarded)
>>>>>>>>
>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>
>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>> ---
>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>> usermode)
>>>>>>>> return mas3;
>>>>>>>> }
>>>>>>>>
>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>> usermode)
>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>> {
>>>>>>>> + u32 mas2_attr;
>>>>>>>> +
>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>> +
>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>
>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>
>>>>>> What I understand from this function (someone can correct me) is
>>>>>> that it
>>>>> returns "false" when the page is managed by kernel and is not
>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>> whether the page is reserved or not, if it is kernel visible page then it
>> is DDR.
>>>>>>
>>>>>
>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>> so,
>>>>>
>>>>> KVM: direct mmio pfn check
>>>>>
>>>>> Userspace may specify memory slots that are backed by mmio
>>>>> pages rather than
>>>>> normal RAM. In some cases it is not enough to identify these
>>>>> mmio
>>> pages
>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>
>>>> Do you know what are those "some cases" and how checking
>>>> PageReserved helps in
>>> those cases?
>>>
>>> No, myself didn't see these actual cases in qemu,too. But this should
>>> be chronically persistent as I understand ;-)
>>
>> Then I will wait till someone educate me :)
>
> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
Furthermore, how to distinguish we're creating TLB entry for the device assigned
directly to the GS?
I think its unnecessary to always check if that is mmio's pfn since we have more
non direct assigned devices.
So maybe we can introduce another helper to fixup that TLB entry in instead of
this path.
Tiejun
>
> -Bharat
>
>>>>>>>> + mas2_attr |= MAS2_I | MAS2_G;
>>>>>>>> + } else {
>>>>>>>> #ifdef CONFIG_SMP
>>>>>>>> - return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
>>>>>>>> -#else
>>>>>>>> - return mas2 & MAS2_ATTRIB_MASK;
>>>>>>>> + mas2_attr |= MAS2_M;
>>>>>>>> #endif
>>>>>>>> + }
>>>>>>>> + return mas2_attr;
>>>>>>>> }
>>>>>>>>
>>>>>>>> /*
>>>>>>>> @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
>>>>>>>> /* Force IPROT=0 for all guest mappings. */
>>>>>>>> stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
>>>>>>>> stlbe->mas2 = (gvaddr & MAS2_EPN) |
>>>>>>>> - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
>>>>>>>> + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
>>>>>>>> stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>>>>>>>> e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe kvm-ppc"
>>>>> in the body of a message to majordomo@vger.kernel.org More
>>>>> majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:55 ` "“tiejun.chen”"
@ 2013-07-18 9:44 ` Alexander Graf
2013-07-18 9:56 ` "“tiejun.chen”"
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-18 9:44 UTC (permalink / raw)
To: “tiejun.chen”
Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 18.07.2013, at 10:55, “tiejun.chen” wrote:
> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>>
>>
>>> -----Original Message-----
>>> From: Bhushan Bharat-R65777
>>> Sent: Thursday, July 18, 2013 1:53 PM
>>> To: '"“tiejun.chen”"'
>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>> B07421
>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>> managed pages
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>> To: Bhushan Bharat-R65777
>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>> Scott-
>>>> B07421
>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>> kernel managed pages
>>>>
>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>> To: Bhushan Bharat-R65777
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>> Wood
>>>>>> Scott-
>>>>>> B07421
>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>> kernel managed pages
>>>>>>
>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>> Wood
>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>> for kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>> inhibited,
>>>>>>>>> guarded)
>>>>>>>>>
>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>
>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>> ---
>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>> usermode)
>>>>>>>>> return mas3;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>> usermode)
>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>> {
>>>>>>>>> + u32 mas2_attr;
>>>>>>>>> +
>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>> +
>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>
>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>
>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>> that it
>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>> is DDR.
>>>>>>>
>>>>>>
>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>> so,
>>>>>>
>>>>>> KVM: direct mmio pfn check
>>>>>>
>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>> pages rather than
>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>> mmio
>>>> pages
>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>
>>>>> Do you know what are those "some cases" and how checking
>>>>> PageReserved helps in
>>>> those cases?
>>>>
>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>> be chronically persistent as I understand ;-)
>>>
>>> Then I will wait till someone educate me :)
>>
>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>
> Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS?
Because other devices wouldn't be available to the guest through memory slots.
> I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices.
I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache)
> So maybe we can introduce another helper to fixup that TLB entry in instead of this path.
This path does fix up the shadow (host) TLB entry :).
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 8:25 ` Bhushan Bharat-R65777
2013-07-18 8:55 ` "“tiejun.chen”"
@ 2013-07-18 9:48 ` Alexander Graf
2013-07-18 9:51 ` Bhushan Bharat-R65777
2013-07-18 10:08 ` "“tiejun.chen”"
1 sibling, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2013-07-18 9:48 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: "“tiejun.chen”", kvm-ppc, kvm, Wood Scott-B07421
On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: Bhushan Bharat-R65777
>> Sent: Thursday, July 18, 2013 1:53 PM
>> To: '"“tiejun.chen”"'
>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>> B07421
>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>>
>>
>>> -----Original Message-----
>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>> Sent: Thursday, July 18, 2013 1:52 PM
>>> To: Bhushan Bharat-R65777
>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>> Scott-
>>> B07421
>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>> kernel managed pages
>>>
>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>> To: Bhushan Bharat-R65777
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>> Wood
>>>>> Scott-
>>>>> B07421
>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>> kernel managed pages
>>>>>
>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>> Wood
>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>> for kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>> inhibited,
>>>>>>>> guarded)
>>>>>>>>
>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>
>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>> ---
>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>> usermode)
>>>>>>>> return mas3;
>>>>>>>> }
>>>>>>>>
>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>> usermode)
>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>> {
>>>>>>>> + u32 mas2_attr;
>>>>>>>> +
>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>> +
>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>
>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>
>>>>>> What I understand from this function (someone can correct me) is
>>>>>> that it
>>>>> returns "false" when the page is managed by kernel and is not
>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>> whether the page is reserved or not, if it is kernel visible page then it
>> is DDR.
>>>>>>
>>>>>
>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>> so,
>>>>>
>>>>> KVM: direct mmio pfn check
>>>>>
>>>>> Userspace may specify memory slots that are backed by mmio
>>>>> pages rather than
>>>>> normal RAM. In some cases it is not enough to identify these
>>>>> mmio
>>> pages
>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>
>>>> Do you know what are those "some cases" and how checking
>>>> PageReserved helps in
>>> those cases?
>>>
>>> No, myself didn't see these actual cases in qemu,too. But this should
>>> be chronically persistent as I understand ;-)
>>
>> Then I will wait till someone educate me :)
>
> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
1) Non cache coherent DMA
2) Memory hot remove
The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
default n if PPC_47x
default y
so we never hit it with any core we care about ;).
Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
Which means I think it's fine to slim this whole thing down to only check for pfn_valid(), as the code does in this patch. It would however be very useful to have a comment there that explains why it's safe to do so.
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 9:48 ` Alexander Graf
@ 2013-07-18 9:51 ` Bhushan Bharat-R65777
2013-07-18 10:08 ` "“tiejun.chen”"
1 sibling, 0 replies; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-18 9:51 UTC (permalink / raw)
To: Alexander Graf
Cc: "“tiejun.chen”", kvm-ppc, kvm, Wood Scott-B07421
> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-owner@vger.kernel.org] On
> Behalf Of Alexander Graf
> Sent: Thursday, July 18, 2013 3:19 PM
> To: Bhushan Bharat-R65777
> Cc: "“tiejun.chen”"; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-
> B07421
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
>
> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>
> >
> >
> >> -----Original Message-----
> >> From: Bhushan Bharat-R65777
> >> Sent: Thursday, July 18, 2013 1:53 PM
> >> To: '"“tiejun.chen”"'
> >> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
> >> Scott-
> >> B07421
> >> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >> kernel managed pages
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> >>> Sent: Thursday, July 18, 2013 1:52 PM
> >>> To: Bhushan Bharat-R65777
> >>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> >>> Wood
> >>> Scott-
> >>> B07421
> >>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
> >>> kernel managed pages
> >>>
> >>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: kvm-ppc-owner@vger.kernel.org
> >>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "“tiejun.chen”"
> >>>>> Sent: Thursday, July 18, 2013 1:01 PM
> >>>>> To: Bhushan Bharat-R65777
> >>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> >>>>> Wood
> >>>>> Scott-
> >>>>> B07421
> >>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >>>>> for kernel managed pages
> >>>>>
> >>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: "“tiejun.chen”" [mailto:tiejun.chen@windriver.com]
> >>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
> >>>>>>> To: Bhushan Bharat-R65777
> >>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
> >>>>>>> Wood
> >>>>>>> Scott- B07421; Bhushan Bharat-R65777
> >>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >>>>>>> for kernel managed pages
> >>>>>>>
> >>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>>>>>>> If there is a struct page for the requested mapping then it's
> >>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
> >>>>>>>> else this is treated as I/O and we set "I + G" (cache
> >>>>>>>> inhibited,
> >>>>>>>> guarded)
> >>>>>>>>
> >>>>>>>> This helps setting proper TLB mapping for direct assigned
> >>>>>>>> device
> >>>>>>>>
> >>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> >>>>>>>> ---
> >>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> >>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>> index 1c6a9d7..089c227 100644
> >>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>> @@ -64,13 +64,20 @@ static inline u32
> >>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
> >>>>>>> usermode)
> >>>>>>>> return mas3;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> >>>>>>>> usermode)
> >>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
> >>>>>>>> {
> >>>>>>>> + u32 mas2_attr;
> >>>>>>>> +
> >>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>>>>>>> +
> >>>>>>>> + if (!pfn_valid(pfn)) {
> >>>>>>>
> >>>>>>> Why not directly use kvm_is_mmio_pfn()?
> >>>>>>
> >>>>>> What I understand from this function (someone can correct me) is
> >>>>>> that it
> >>>>> returns "false" when the page is managed by kernel and is not
> >>>>> marked as RESERVED (for some reason). For us it does not matter
> >>>>> whether the page is reserved or not, if it is kernel visible page
> >>>>> then it
> >> is DDR.
> >>>>>>
> >>>>>
> >>>>> I think you are setting I|G by addressing all mmio pages, right?
> >>>>> If so,
> >>>>>
> >>>>> KVM: direct mmio pfn check
> >>>>>
> >>>>> Userspace may specify memory slots that are backed by mmio
> >>>>> pages rather than
> >>>>> normal RAM. In some cases it is not enough to identify these
> >>>>> mmio
> >>> pages
> >>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
> >>>>
> >>>> Do you know what are those "some cases" and how checking
> >>>> PageReserved helps in
> >>> those cases?
> >>>
> >>> No, myself didn't see these actual cases in qemu,too. But this
> >>> should be chronically persistent as I understand ;-)
> >>
> >> Then I will wait till someone educate me :)
> >
> > The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not
> want to call this for all tlbwe operation unless it is necessary.
>
> It certainly does more than we need and potentially slows down the fast path
> (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check
> for pages that are declared reserved on the host. This happens in 2 cases:
>
> 1) Non cache coherent DMA
> 2) Memory hot remove
>
> The non coherent DMA case would be interesting, as with the mechanism as it is
> in place in Linux today, we could potentially break normal guest operation if we
> don't take it into account. However, it's Kconfig guarded by:
>
> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> default n if PPC_47x
> default y
>
> so we never hit it with any core we care about ;).
>
> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about
> that one either.
>
> Which means I think it's fine to slim this whole thing down to only check for
> pfn_valid(), as the code does in this patch. It would however be very useful to
> have a comment there that explains why it's safe to do so.
Big thanks for the details :-)
Will add a comment.
-Bharat
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 9:44 ` Alexander Graf
@ 2013-07-18 9:56 ` "“tiejun.chen”"
2013-07-18 10:00 ` Alexander Graf
0 siblings, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 9:56 UTC (permalink / raw)
To: Alexander Graf; +Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 05:44 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 10:55, �tiejun.chen� wrote:
>
>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bhushan Bharat-R65777
>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>> To: '"�tiejun.chen�"'
>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>> B07421
>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>> managed pages
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>> To: Bhushan Bharat-R65777
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>> Scott-
>>>>> B07421
>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>> kernel managed pages
>>>>>
>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>> Wood
>>>>>>> Scott-
>>>>>>> B07421
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>> kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>> Wood
>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>> for kernel managed pages
>>>>>>>>>
>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>> inhibited,
>>>>>>>>>> guarded)
>>>>>>>>>>
>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>> ---
>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>> usermode)
>>>>>>>>>> return mas3;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>> usermode)
>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>> {
>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>> +
>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>> +
>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>
>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>
>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>> that it
>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>> is DDR.
>>>>>>>>
>>>>>>>
>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>> so,
>>>>>>>
>>>>>>> KVM: direct mmio pfn check
>>>>>>>
>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>> pages rather than
>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>> mmio
>>>>> pages
>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>
>>>>>> Do you know what are those "some cases" and how checking
>>>>>> PageReserved helps in
>>>>> those cases?
>>>>>
>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>> be chronically persistent as I understand ;-)
>>>>
>>>> Then I will wait till someone educate me :)
>>>
>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>
>> Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS?
>
> Because other devices wouldn't be available to the guest through memory slots.
Yes.
>
>> I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices.
>
> I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache)
>
Sorry, looks I'm misleading you :-P
>> So maybe we can introduce another helper to fixup that TLB entry in instead of this path.
>
> This path does fix up the shadow (host) TLB entry :).
>
I just mean whether we can have a separate path dedicated to those direct
assigned devices, not go this common path :)
Tiejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 9:56 ` "“tiejun.chen”"
@ 2013-07-18 10:00 ` Alexander Graf
2013-07-18 10:14 ` "“tiejun.chen”"
2013-07-18 16:11 ` Scott Wood
0 siblings, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2013-07-18 10:00 UTC (permalink / raw)
To: “tiejun.chen”
Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 18.07.2013, at 11:56, “tiejun.chen” wrote:
> On 07/18/2013 05:44 PM, Alexander Graf wrote:
>>
>> On 18.07.2013, at 10:55, �tiejun.chen� wrote:
>>
>>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Bhushan Bharat-R65777
>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>> To: '"�tiejun.chen�"'
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>> B07421
>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>> managed pages
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>> To: Bhushan Bharat-R65777
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>> Scott-
>>>>>> B07421
>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>> kernel managed pages
>>>>>>
>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>> Wood
>>>>>>>> Scott-
>>>>>>>> B07421
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>> kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>> Wood
>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>> for kernel managed pages
>>>>>>>>>>
>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>> inhibited,
>>>>>>>>>>> guarded)
>>>>>>>>>>>
>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>> ---
>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>> usermode)
>>>>>>>>>>> return mas3;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>> usermode)
>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>> {
>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>> +
>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>> +
>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>
>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>
>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>> that it
>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>> is DDR.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>> so,
>>>>>>>>
>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>
>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>> pages rather than
>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>> mmio
>>>>>> pages
>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>
>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>> PageReserved helps in
>>>>>> those cases?
>>>>>>
>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>> be chronically persistent as I understand ;-)
>>>>>
>>>>> Then I will wait till someone educate me :)
>>>>
>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>
>>> Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS?
>>
>> Because other devices wouldn't be available to the guest through memory slots.
>
> Yes.
>
>>
>>> I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices.
>>
>> I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache)
>>
>
> Sorry, looks I'm misleading you :-P
>
>>> So maybe we can introduce another helper to fixup that TLB entry in instead of this path.
>>
>> This path does fix up the shadow (host) TLB entry :).
>>
>
> I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :)
I don't think it's possible to have a separate path without a certain level of trust. In the current flow we don't trust anyone. We just check every translated page whether we should enable caching or not.
We could take that information from 2 other side though:
1) Memory Slot
2) Guest TLB Flags
If we take it from the memory slot we would have to trust QEMU (or any other user space) to give us the right hints. Malicious user space could set invalid flags. Also we'd have to add logic to track this - which doesn't exist today.
If we take it from the guest we have to trust the guest. Malicious guests could set invalid flags.
Now why is setting invalid flags a problem? If I understand Scott correctly, it can break the host if you access certain host devices with caching enabled. But to be sure I'd say we ask him directly :).
Either way, not trusting anyone is definitely the safer choice.
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 9:48 ` Alexander Graf
2013-07-18 9:51 ` Bhushan Bharat-R65777
@ 2013-07-18 10:08 ` "“tiejun.chen”"
2013-07-18 10:12 ` Alexander Graf
1 sibling, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 10:08 UTC (permalink / raw)
To: Alexander Graf; +Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 05:48 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>
>>
>>
>>> -----Original Message-----
>>> From: Bhushan Bharat-R65777
>>> Sent: Thursday, July 18, 2013 1:53 PM
>>> To: '"�tiejun.chen�"'
>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>> B07421
>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>> managed pages
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>> To: Bhushan Bharat-R65777
>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>> Scott-
>>>> B07421
>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>> kernel managed pages
>>>>
>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>> To: Bhushan Bharat-R65777
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>> Wood
>>>>>> Scott-
>>>>>> B07421
>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>> kernel managed pages
>>>>>>
>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>> Wood
>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>> for kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>> inhibited,
>>>>>>>>> guarded)
>>>>>>>>>
>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>
>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>> ---
>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>> usermode)
>>>>>>>>> return mas3;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>> usermode)
>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>> {
>>>>>>>>> + u32 mas2_attr;
>>>>>>>>> +
>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>> +
>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>
>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>
>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>> that it
>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>> is DDR.
>>>>>>>
>>>>>>
>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>> so,
>>>>>>
>>>>>> KVM: direct mmio pfn check
>>>>>>
>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>> pages rather than
>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>> mmio
>>>> pages
>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>
>>>>> Do you know what are those "some cases" and how checking
>>>>> PageReserved helps in
>>>> those cases?
>>>>
>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>> be chronically persistent as I understand ;-)
>>>
>>> Then I will wait till someone educate me :)
>>
>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>
> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>
> 1) Non cache coherent DMA
> 2) Memory hot remove
>
> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>
> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> default n if PPC_47x
> default y
>
> so we never hit it with any core we care about ;).
>
> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
Thanks for this good information :)
So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn()
to make sure that check is only valid when that is really needed? This can
decrease those unnecessary performance loss.
If I'm wrong please correct me :)
Tiejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:08 ` "“tiejun.chen”"
@ 2013-07-18 10:12 ` Alexander Graf
2013-07-18 10:19 ` "“tiejun.chen”"
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-18 10:12 UTC (permalink / raw)
To: “tiejun.chen”
Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 18.07.2013, at 12:08, “tiejun.chen” wrote:
> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>
>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bhushan Bharat-R65777
>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>> To: '"�tiejun.chen�"'
>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>> B07421
>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>> managed pages
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>> To: Bhushan Bharat-R65777
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>> Scott-
>>>>> B07421
>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>> kernel managed pages
>>>>>
>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>> Wood
>>>>>>> Scott-
>>>>>>> B07421
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>> kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>> Wood
>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>> for kernel managed pages
>>>>>>>>>
>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>> inhibited,
>>>>>>>>>> guarded)
>>>>>>>>>>
>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>> ---
>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>> usermode)
>>>>>>>>>> return mas3;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>> usermode)
>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>> {
>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>> +
>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>> +
>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>
>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>
>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>> that it
>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>> is DDR.
>>>>>>>>
>>>>>>>
>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>> so,
>>>>>>>
>>>>>>> KVM: direct mmio pfn check
>>>>>>>
>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>> pages rather than
>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>> mmio
>>>>> pages
>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>
>>>>>> Do you know what are those "some cases" and how checking
>>>>>> PageReserved helps in
>>>>> those cases?
>>>>>
>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>> be chronically persistent as I understand ;-)
>>>>
>>>> Then I will wait till someone educate me :)
>>>
>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>
>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>
>> 1) Non cache coherent DMA
>> 2) Memory hot remove
>>
>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>
>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>> default n if PPC_47x
>> default y
>>
>> so we never hit it with any core we care about ;).
>>
>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>
> Thanks for this good information :)
>
> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>
> If I'm wrong please correct me :)
You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
I'd rather not like to break x86 :).
However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:00 ` Alexander Graf
@ 2013-07-18 10:14 ` "“tiejun.chen”"
2013-07-18 16:11 ` Scott Wood
1 sibling, 0 replies; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 10:14 UTC (permalink / raw)
To: Alexander Graf; +Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 06:00 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 11:56, “tiejun.chen” wrote:
>
>> On 07/18/2013 05:44 PM, Alexander Graf wrote:
>>>
>>> On 18.07.2013, at 10:55, �tiejun.chen� wrote:
>>>
>>>> On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bhushan Bharat-R65777
>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>> To: '"�tiejun.chen�"'
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>> B07421
>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>> managed pages
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>> Scott-
>>>>>>> B07421
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>> kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>> Wood
>>>>>>>>> Scott-
>>>>>>>>> B07421
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>> kernel managed pages
>>>>>>>>>
>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>> Wood
>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>
>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>>> inhibited,
>>>>>>>>>>>> guarded)
>>>>>>>>>>>>
>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>> usermode)
>>>>>>>>>>>> return mas3;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>> usermode)
>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>> {
>>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>>> +
>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>>
>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>
>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>> that it
>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>> is DDR.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>> so,
>>>>>>>>>
>>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>>
>>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>>> pages rather than
>>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>>> mmio
>>>>>>> pages
>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>>
>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>> PageReserved helps in
>>>>>>> those cases?
>>>>>>>
>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>> be chronically persistent as I understand ;-)
>>>>>>
>>>>>> Then I will wait till someone educate me :)
>>>>>
>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>
>>>> Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS?
>>>
>>> Because other devices wouldn't be available to the guest through memory slots.
>>
>> Yes.
>>
>>>
>>>> I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices.
>>>
>>> I'm not sure I understand. The shadow TLB code only knows "here is a host virtual address". It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache)
>>>
>>
>> Sorry, looks I'm misleading you :-P
>>
>>>> So maybe we can introduce another helper to fixup that TLB entry in instead of this path.
>>>
>>> This path does fix up the shadow (host) TLB entry :).
>>>
>>
>> I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :)
>
> I don't think it's possible to have a separate path without a certain level of trust. In the current flow we don't trust anyone. We just check every translated page whether we should enable caching or not.
>
> We could take that information from 2 other side though:
>
> 1) Memory Slot
> 2) Guest TLB Flags
>
> If we take it from the memory slot we would have to trust QEMU (or any other user space) to give us the right hints. Malicious user space could set invalid flags. Also we'd have to add logic to track this - which doesn't exist today.
>
> If we take it from the guest we have to trust the guest. Malicious guests could set invalid flags.
Understood.
>
> Now why is setting invalid flags a problem? If I understand Scott correctly, it can break the host if you access certain host devices with caching enabled. But to be sure I'd say we ask him directly :).
Yes, we should certainly set I|G for that TLB entry mapping to device.
>
> Either way, not trusting anyone is definitely the safer choice.
Definitely :)
Tiejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:12 ` Alexander Graf
@ 2013-07-18 10:19 ` "“tiejun.chen”"
2013-07-18 10:27 ` Alexander Graf
0 siblings, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-18 10:19 UTC (permalink / raw)
To: Alexander Graf; +Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 06:12 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>
>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>
>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Bhushan Bharat-R65777
>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>> To: '"�tiejun.chen�"'
>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>> B07421
>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>> managed pages
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>> To: Bhushan Bharat-R65777
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>> Scott-
>>>>>> B07421
>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>> kernel managed pages
>>>>>>
>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>> Wood
>>>>>>>> Scott-
>>>>>>>> B07421
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>> kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>> Wood
>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>> for kernel managed pages
>>>>>>>>>>
>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>> inhibited,
>>>>>>>>>>> guarded)
>>>>>>>>>>>
>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>> ---
>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>> usermode)
>>>>>>>>>>> return mas3;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>> usermode)
>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>> {
>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>> +
>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>> +
>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>
>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>
>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>> that it
>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>> is DDR.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>> so,
>>>>>>>>
>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>
>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>> pages rather than
>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>> mmio
>>>>>> pages
>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>
>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>> PageReserved helps in
>>>>>> those cases?
>>>>>>
>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>> be chronically persistent as I understand ;-)
>>>>>
>>>>> Then I will wait till someone educate me :)
>>>>
>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>
>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>
>>> 1) Non cache coherent DMA
>>> 2) Memory hot remove
>>>
>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>
>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>> default n if PPC_47x
>>> default y
>>>
>>> so we never hit it with any core we care about ;).
>>>
>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>
>> Thanks for this good information :)
>>
>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>
>> If I'm wrong please correct me :)
>
> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>
> I'd rather not like to break x86 :).
>
> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>
Often what case should be adopted to validate this scenario?
Tiejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:19 ` "“tiejun.chen”"
@ 2013-07-18 10:27 ` Alexander Graf
2013-07-24 2:26 ` "“tiejun.chen”"
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-18 10:27 UTC (permalink / raw)
To: "“tiejun.chen”"
Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 18.07.2013, at 12:19, “tiejun.chen” wrote:
> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>
>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>
>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>
>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bhushan Bharat-R65777
>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>> To: '"�tiejun.chen�"'
>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>> B07421
>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>> managed pages
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>> To: Bhushan Bharat-R65777
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>> Scott-
>>>>>>> B07421
>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>> kernel managed pages
>>>>>>>
>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>> Wood
>>>>>>>>> Scott-
>>>>>>>>> B07421
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>> kernel managed pages
>>>>>>>>>
>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>> Wood
>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>
>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>>> inhibited,
>>>>>>>>>>>> guarded)
>>>>>>>>>>>>
>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>> usermode)
>>>>>>>>>>>> return mas3;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>> usermode)
>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>> {
>>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>>> +
>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>>
>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>
>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>> that it
>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>> is DDR.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>> so,
>>>>>>>>>
>>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>>
>>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>>> pages rather than
>>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>>> mmio
>>>>>>> pages
>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>>
>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>> PageReserved helps in
>>>>>>> those cases?
>>>>>>>
>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>> be chronically persistent as I understand ;-)
>>>>>>
>>>>>> Then I will wait till someone educate me :)
>>>>>
>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>
>>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>
>>>> 1) Non cache coherent DMA
>>>> 2) Memory hot remove
>>>>
>>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>>
>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>> default n if PPC_47x
>>>> default y
>>>>
>>>> so we never hit it with any core we care about ;).
>>>>
>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>>
>>> Thanks for this good information :)
>>>
>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>>
>>> If I'm wrong please correct me :)
>>
>> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>>
>> I'd rather not like to break x86 :).
>>
>> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>>
>
> Often what case should be adopted to validate this scenario?
Something which hammers the TLB emulation heavily. I usually just run /bin/echo a thousand times in "time" and see how long it takes ;)
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:00 ` Alexander Graf
2013-07-18 10:14 ` "“tiejun.chen”"
@ 2013-07-18 16:11 ` Scott Wood
1 sibling, 0 replies; 44+ messages in thread
From: Scott Wood @ 2013-07-18 16:11 UTC (permalink / raw)
To: Alexander Graf
Cc: “tiejun.chen”,
Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 05:00:42 AM, Alexander Graf wrote:
> Now why is setting invalid flags a problem? If I understand Scott
> correctly, it can break the host if you access certain host devices
> with caching enabled. But to be sure I'd say we ask him directly :).
The architecture makes it illegal to mix cacheable and cache-inhibited
mappings to the same physical page. Mixing W or M bits is generally
bad as well. I've seen it cause machine checks, error interrupts, etc.
-- not just corrupting the page in question.
-Scott
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-18 10:27 ` Alexander Graf
@ 2013-07-24 2:26 ` "“tiejun.chen”"
2013-07-24 8:25 ` Alexander Graf
0 siblings, 1 reply; 44+ messages in thread
From: "“tiejun.chen”" @ 2013-07-24 2:26 UTC (permalink / raw)
To: Alexander Graf; +Cc: Bhushan Bharat-R65777, kvm-ppc, kvm, Wood Scott-B07421
On 07/18/2013 06:27 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>
>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>
>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>
>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>
>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Bhushan Bharat-R65777
>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>> To: '"�tiejun.chen�"'
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>>> B07421
>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>>> managed pages
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>>> Scott-
>>>>>>>> B07421
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>> kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>> Wood
>>>>>>>>>> Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>>> kernel managed pages
>>>>>>>>>>
>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>>> Wood
>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>>
>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>>>> inhibited,
>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>> usermode)
>>>>>>>>>>>>> return mas3;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>>> {
>>>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>>>
>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>
>>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>>> that it
>>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>>> is DDR.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>>> so,
>>>>>>>>>>
>>>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>>>
>>>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>>>> pages rather than
>>>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>>>> mmio
>>>>>>>> pages
>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>>>
>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>> PageReserved helps in
>>>>>>>> those cases?
>>>>>>>>
>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>>> be chronically persistent as I understand ;-)
>>>>>>>
>>>>>>> Then I will wait till someone educate me :)
>>>>>>
>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>>
>>>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>
>>>>> 1) Non cache coherent DMA
>>>>> 2) Memory hot remove
>>>>>
>>>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>>>
>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>> default n if PPC_47x
>>>>> default y
>>>>>
>>>>> so we never hit it with any core we care about ;).
>>>>>
>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>>>
>>>> Thanks for this good information :)
>>>>
>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>>>
>>>> If I'm wrong please correct me :)
>>>
>>> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>>>
>>> I'd rather not like to break x86 :).
>>>
>>> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>>>
>>
>> Often what case should be adopted to validate this scenario?
>
> Something which hammers the TLB emulation heavily. I usually just run /bin/echo a thousand times in "time" and see how long it takes ;)
>
I tried to run five times with this combination, "time `for ((i=0; i<5000;
i++)); do /bin/echo; done`", to calculate the average value with this change:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1580dd4..5e8635b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,6 +102,10 @@ static bool largepages_enabled = true;
bool kvm_is_mmio_pfn(pfn_t pfn)
{
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Currently only in memory hot remove case we may still need this.
+ */
if (pfn_valid(pfn)) {
int reserved;
struct page *tail = pfn_to_page(pfn);
@@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
}
return PageReserved(tail);
}
+#endif
return true;
}
Before apply this change:
real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
After apply this change:
real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
So,
real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
Note I only boot one VM.
Tiejun
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 2:26 ` "“tiejun.chen”"
@ 2013-07-24 8:25 ` Alexander Graf
2013-07-24 9:11 ` Bhushan Bharat-R65777
2013-07-24 10:01 ` Gleb Natapov
0 siblings, 2 replies; 44+ messages in thread
From: Alexander Graf @ 2013-07-24 8:25 UTC (permalink / raw)
To: "“tiejun.chen”"
Cc: Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Gleb Natapov, Paolo Bonzini
On 24.07.2013, at 04:26, “tiejun.chen” wrote:
> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>>
>> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>>
>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>>
>>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>>
>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>>
>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>> To: '"�tiejun.chen�"'
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>>>> B07421
>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>>>> managed pages
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>>>> Scott-
>>>>>>>>> B07421
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>> kernel managed pages
>>>>>>>>>
>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "�tiejun.chen�"
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>> Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>>>> kernel managed pages
>>>>>>>>>>>
>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: "�tiejun.chen�" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>>>> Wood
>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>>>> else this is treated as I/O and we set "I + G" (cache
>>>>>>>>>>>>>> inhibited,
>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>> return mas3;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>>
>>>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>>>> that it
>>>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>>>> is DDR.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>>>> so,
>>>>>>>>>>>
>>>>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>>>>
>>>>>>>>>>> Userspace may specify memory slots that are backed by mmio
>>>>>>>>>>> pages rather than
>>>>>>>>>>> normal RAM. In some cases it is not enough to identify these
>>>>>>>>>>> mmio
>>>>>>>>> pages
>>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as well.
>>>>>>>>>>
>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>> PageReserved helps in
>>>>>>>>> those cases?
>>>>>>>>>
>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>>>> be chronically persistent as I understand ;-)
>>>>>>>>
>>>>>>>> Then I will wait till someone educate me :)
>>>>>>>
>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>>>
>>>>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>>
>>>>>> 1) Non cache coherent DMA
>>>>>> 2) Memory hot remove
>>>>>>
>>>>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>>>>
>>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>>> default n if PPC_47x
>>>>>> default y
>>>>>>
>>>>>> so we never hit it with any core we care about ;).
>>>>>>
>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>>>>
>>>>> Thanks for this good information :)
>>>>>
>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>>>>
>>>>> If I'm wrong please correct me :)
>>>>
>>>> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>>>>
>>>> I'd rather not like to break x86 :).
>>>>
>>>> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>>>>
>>>
>>> Often what case should be adopted to validate this scenario?
>>
>> Something which hammers the TLB emulation heavily. I usually just run /bin/echo a thousand times in "time" and see how long it takes ;)
>>
>
> I tried to run five times with this combination, "time `for ((i=0; i<5000; i++)); do /bin/echo; done`", to calculate the average value with this change:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1580dd4..5e8635b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>
> bool kvm_is_mmio_pfn(pfn_t pfn)
> {
> +#ifdef CONFIG_MEMORY_HOTPLUG
I'd feel safer if we narrow this down to e500.
> + /*
> + * Currently only in memory hot remove case we may still need this.
> + */
> if (pfn_valid(pfn)) {
We still have to check for pfn_valid, no? So the #ifdef should be down here.
> int reserved;
> struct page *tail = pfn_to_page(pfn);
> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> }
> return PageReserved(tail);
> }
> +#endif
>
> return true;
> }
>
> Before apply this change:
>
> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>
> After apply this change:
>
> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>
> So,
>
> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
Gleb, Paolo, any hard feelings?
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 8:25 ` Alexander Graf
@ 2013-07-24 9:11 ` Bhushan Bharat-R65777
2013-07-24 9:21 ` Alexander Graf
2013-07-24 10:01 ` Gleb Natapov
1 sibling, 1 reply; 44+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-07-24 9:11 UTC (permalink / raw)
To: Alexander Graf, "“tiejun.chen”"
Cc: kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Gleb Natapov, Paolo Bonzini
> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Wednesday, July 24, 2013 1:55 PM
> To: "“tiejun.chen”"
> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org list;
> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
>
>
> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
>
> > On 07/18/2013 06:27 PM, Alexander Graf wrote:
> >>
> >> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
> >>
> >>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
> >>>>
> >>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
> >>>>
> >>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
> >>>>>>
> >>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Bhushan Bharat-R65777
> >>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
> >>>>>>>> To: '" tiejun.chen "'
> >>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> >>>>>>>> agraf@suse.de; Wood Scott-
> >>>>>>>> B07421
> >>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >>>>>>>> for kernel managed pages
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
> >>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
> >>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> >>>>>>>>> agraf@suse.de; Wood
> >>>>>>>>> Scott-
> >>>>>>>>> B07421
> >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>> only for kernel managed pages
> >>>>>>>>>
> >>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
> >>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.chen "
> >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
> >>>>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> >>>>>>>>>>> agraf@suse.de; Wood
> >>>>>>>>>>> Scott-
> >>>>>>>>>>> B07421
> >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>>>> only for kernel managed pages
> >>>>>>>>>>>
> >>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
> >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
> >>>>>>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
> >>>>>>>>>>>>> agraf@suse.de; Wood
> >>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
> >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>>>>>> only for kernel managed pages
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>>>>>>>>>>>>> If there is a struct page for the requested mapping then
> >>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,
> >>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set "I +
> >>>>>>>>>>>>>> G" (cache inhibited,
> >>>>>>>>>>>>>> guarded)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned
> >>>>>>>>>>>>>> device
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
> >>>>>>>>>>>>>> <bharat.bhushan@freescale.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
> >>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
> >>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
> >>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
> >>>>>>>>>>>>> usermode)
> >>>>>>>>>>>>>> return mas3;
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> >>>>>>>>>>>>>> usermode)
> >>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
> >>>>>>>>>>>>>> +pfn_t pfn)
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>> + u32 mas2_attr;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
> >>>>>>>>>>>>
> >>>>>>>>>>>> What I understand from this function (someone can correct
> >>>>>>>>>>>> me) is that it
> >>>>>>>>>>> returns "false" when the page is managed by kernel and is
> >>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not
> >>>>>>>>>>> matter whether the page is reserved or not, if it is kernel
> >>>>>>>>>>> visible page then it
> >>>>>>>> is DDR.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
> >>>>>>>>>>> right? If so,
> >>>>>>>>>>>
> >>>>>>>>>>> KVM: direct mmio pfn check
> >>>>>>>>>>>
> >>>>>>>>>>> Userspace may specify memory slots that are backed by
> >>>>>>>>>>> mmio pages rather than
> >>>>>>>>>>> normal RAM. In some cases it is not enough to identify
> >>>>>>>>>>> these mmio
> >>>>>>>>> pages
> >>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as
> well.
> >>>>>>>>>>
> >>>>>>>>>> Do you know what are those "some cases" and how checking
> >>>>>>>>>> PageReserved helps in
> >>>>>>>>> those cases?
> >>>>>>>>>
> >>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this
> >>>>>>>>> should be chronically persistent as I understand ;-)
> >>>>>>>>
> >>>>>>>> Then I will wait till someone educate me :)
> >>>>>>>
> >>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do
> not want to call this for all tlbwe operation unless it is necessary.
> >>>>>>
> >>>>>> It certainly does more than we need and potentially slows down the fast
> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to
> check for pages that are declared reserved on the host. This happens in 2 cases:
> >>>>>>
> >>>>>> 1) Non cache coherent DMA
> >>>>>> 2) Memory hot remove
> >>>>>>
> >>>>>> The non coherent DMA case would be interesting, as with the mechanism as
> it is in place in Linux today, we could potentially break normal guest operation
> if we don't take it into account. However, it's Kconfig guarded by:
> >>>>>>
> >>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> >>>>>> default n if PPC_47x
> >>>>>> default y
> >>>>>>
> >>>>>> so we never hit it with any core we care about ;).
> >>>>>>
> >>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry
> about that one either.
> >>>>>
> >>>>> Thanks for this good information :)
> >>>>>
> >>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
> kvm_is_mmio_pfn() to make sure that check is only valid when that is really
> needed? This can decrease those unnecessary performance loss.
> >>>>>
> >>>>> If I'm wrong please correct me :)
> >>>>
> >>>> You're perfectly right, but this is generic KVM code. So it gets run across
> all architectures. What if someone has the great idea to add a new case here for
> x86, but doesn't tell us? In that case we potentially break x86.
> >>>>
> >>>> I'd rather not like to break x86 :).
> >>>>
> >>>> However, it'd be very interesting to see a benchmark with this change. Do
> you think you could just rip out the whole reserved check and run a few
> benchmarks and show us the results?
> >>>>
> >>>
> >>> Often what case should be adopted to validate this scenario?
> >>
> >> Something which hammers the TLB emulation heavily. I usually just run
> >> /bin/echo a thousand times in "time" and see how long it takes ;)
> >>
> >
> > I tried to run five times with this combination, "time `for ((i=0; i<5000;
> i++)); do /bin/echo; done`", to calculate the average value with this change:
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
>
> I'd feel safer if we narrow this down to e500.
>
> > + /*
> > + * Currently only in memory hot remove case we may still need this.
> > + */
> > if (pfn_valid(pfn)) {
>
> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>
> > int reserved;
> > struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7
> > @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> > }
> > return PageReserved(tail);
> > }
> > +#endif
> >
> > return true;
> > }
> >
> > Before apply this change:
> >
> > real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=
> 1m21.376s
> > user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=
> 0m23.433s
> > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >
> > After apply this change:
> >
> > real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=
> 1m20.667s
> > user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=
> 0m22.615s
> > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >
> > So,
> >
> > real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>
> Very nice, so there is a real world performance benefit to doing this. Then yes,
> I think it would make sense to change the global helper function to be fast on
> e500 and use that one from e500_shadow_mas2_attrib() instead.
Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as Scott commented?
-Bharat
>
> Gleb, Paolo, any hard feelings?
>
>
> Alex
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 9:11 ` Bhushan Bharat-R65777
@ 2013-07-24 9:21 ` Alexander Graf
2013-07-24 9:35 ` Gleb Natapov
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-24 9:21 UTC (permalink / raw)
To: Bhushan Bharat-R65777
Cc: "“tiejun.chen”",
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Gleb Natapov, Paolo Bonzini
On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:
>
>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Wednesday, July 24, 2013 1:55 PM
>> To: "“tiejun.chen”"
>> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org list;
>> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>>
>>
>> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
>>
>>> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>>>>
>>>> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>>>>
>>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>>>>
>>>>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>>>>
>>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>>>> To: '" tiejun.chen "'
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>> agraf@suse.de; Wood Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>> for kernel managed pages
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>
>>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.chen "
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>> Scott-
>>>>>>>>>>>>> B07421
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>>>> If there is a struct page for the requested mapping then
>>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,
>>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set "I +
>>>>>>>>>>>>>>>> G" (cache inhibited,
>>>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned
>>>>>>>>>>>>>>>> device
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
>>>>>>>>>>>>>>>> <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++-----
>>>>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>> return mas3;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
>>>>>>>>>>>>>>>> +pfn_t pfn)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> + u32 mas2_attr;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + if (!pfn_valid(pfn)) {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I understand from this function (someone can correct
>>>>>>>>>>>>>> me) is that it
>>>>>>>>>>>>> returns "false" when the page is managed by kernel and is
>>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not
>>>>>>>>>>>>> matter whether the page is reserved or not, if it is kernel
>>>>>>>>>>>>> visible page then it
>>>>>>>>>> is DDR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
>>>>>>>>>>>>> right? If so,
>>>>>>>>>>>>>
>>>>>>>>>>>>> KVM: direct mmio pfn check
>>>>>>>>>>>>>
>>>>>>>>>>>>> Userspace may specify memory slots that are backed by
>>>>>>>>>>>>> mmio pages rather than
>>>>>>>>>>>>> normal RAM. In some cases it is not enough to identify
>>>>>>>>>>>>> these mmio
>>>>>>>>>>> pages
>>>>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as
>> well.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>>>> PageReserved helps in
>>>>>>>>>>> those cases?
>>>>>>>>>>>
>>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this
>>>>>>>>>>> should be chronically persistent as I understand ;-)
>>>>>>>>>>
>>>>>>>>>> Then I will wait till someone educate me :)
>>>>>>>>>
>>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do
>> not want to call this for all tlbwe operation unless it is necessary.
>>>>>>>>
>>>>>>>> It certainly does more than we need and potentially slows down the fast
>> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to
>> check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>>>>
>>>>>>>> 1) Non cache coherent DMA
>>>>>>>> 2) Memory hot remove
>>>>>>>>
>>>>>>>> The non coherent DMA case would be interesting, as with the mechanism as
>> it is in place in Linux today, we could potentially break normal guest operation
>> if we don't take it into account. However, it's Kconfig guarded by:
>>>>>>>>
>>>>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>>>>> default n if PPC_47x
>>>>>>>> default y
>>>>>>>>
>>>>>>>> so we never hit it with any core we care about ;).
>>>>>>>>
>>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry
>> about that one either.
>>>>>>>
>>>>>>> Thanks for this good information :)
>>>>>>>
>>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
>> kvm_is_mmio_pfn() to make sure that check is only valid when that is really
>> needed? This can decrease those unnecessary performance loss.
>>>>>>>
>>>>>>> If I'm wrong please correct me :)
>>>>>>
>>>>>> You're perfectly right, but this is generic KVM code. So it gets run across
>> all architectures. What if someone has the great idea to add a new case here for
>> x86, but doesn't tell us? In that case we potentially break x86.
>>>>>>
>>>>>> I'd rather not like to break x86 :).
>>>>>>
>>>>>> However, it'd be very interesting to see a benchmark with this change. Do
>> you think you could just rip out the whole reserved check and run a few
>> benchmarks and show us the results?
>>>>>>
>>>>>
>>>>> Often what case should be adopted to validate this scenario?
>>>>
>>>> Something which hammers the TLB emulation heavily. I usually just run
>>>> /bin/echo a thousand times in "time" and see how long it takes ;)
>>>>
>>>
>>> I tried to run five times with this combination, "time `for ((i=0; i<5000;
>> i++)); do /bin/echo; done`", to calculate the average value with this change:
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
>>> 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>>
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>
>> I'd feel safer if we narrow this down to e500.
>>
>>> + /*
>>> + * Currently only in memory hot remove case we may still need this.
>>> + */
>>> if (pfn_valid(pfn)) {
>>
>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>>
>>> int reserved;
>>> struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7
>>> @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>> }
>>> return PageReserved(tail);
>>> }
>>> +#endif
>>>
>>> return true;
>>> }
>>>
>>> Before apply this change:
>>>
>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=
>> 1m21.376s
>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=
>> 0m23.433s
>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>>
>>> After apply this change:
>>>
>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=
>> 1m20.667s
>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=
>> 0m22.615s
>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>>
>>> So,
>>>
>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>>
>> Very nice, so there is a real world performance benefit to doing this. Then yes,
>> I think it would make sense to change the global helper function to be fast on
>> e500 and use that one from e500_shadow_mas2_attrib() instead.
>
> Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as Scott commented?
rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 9:21 ` Alexander Graf
@ 2013-07-24 9:35 ` Gleb Natapov
2013-07-24 9:39 ` Alexander Graf
0 siblings, 1 reply; 44+ messages in thread
From: Gleb Natapov @ 2013-07-24 9:35 UTC (permalink / raw)
To: Alexander Graf
Cc: Bhushan Bharat-R65777, "“tiejun.chen”",
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini
On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> > Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as Scott commented?
>
> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>
>
Because it is much slower and, IIRC, actually used to build pfn map that allow
us to check quickly for valid pfn.
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 9:35 ` Gleb Natapov
@ 2013-07-24 9:39 ` Alexander Graf
2013-07-24 20:32 ` Scott Wood
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-24 9:39 UTC (permalink / raw)
To: Gleb Natapov
Cc: Bhushan Bharat-R65777, "“tiejun.chen”",
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini
On 24.07.2013, at 11:35, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>> Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as Scott commented?
>>
>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>
>>
> Because it is much slower and, IIRC, actually used to build pfn map that allow
> us to check quickly for valid pfn.
Then why should we use page_is_ram()? :)
I really don't want the e500 code to diverge too much from what the rest of the kvm code is doing.
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 8:25 ` Alexander Graf
2013-07-24 9:11 ` Bhushan Bharat-R65777
@ 2013-07-24 10:01 ` Gleb Natapov
2013-07-24 10:09 ` Alexander Graf
1 sibling, 1 reply; 44+ messages in thread
From: Gleb Natapov @ 2013-07-24 10:01 UTC (permalink / raw)
To: Alexander Graf
Cc: "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini, Andrea Arcangeli
Copying Andrea for him to verify that I am not talking nonsense :)
On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
>
> I'd feel safer if we narrow this down to e500.
>
> > + /*
> > + * Currently only in memory hot remove case we may still need this.
> > + */
> > if (pfn_valid(pfn)) {
>
> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>
> > int reserved;
> > struct page *tail = pfn_to_page(pfn);
> > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> > }
> > return PageReserved(tail);
> > }
> > +#endif
> >
> > return true;
> > }
> >
> > Before apply this change:
> >
> > real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> > user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >
> > After apply this change:
> >
> > real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> > user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >
> > So,
> >
> > real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>
> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>
> Gleb, Paolo, any hard feelings?
>
I do not see how can we break the function in such a way and get
away with it. Not all valid pfns point to memory. Physical address can
be sparse (due to PCI hole, framebuffer or just because).
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 10:01 ` Gleb Natapov
@ 2013-07-24 10:09 ` Alexander Graf
2013-07-24 10:19 ` Gleb Natapov
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-24 10:09 UTC (permalink / raw)
To: Gleb Natapov
Cc: "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini, Andrea Arcangeli
On 24.07.2013, at 12:01, Gleb Natapov wrote:
> Copying Andrea for him to verify that I am not talking nonsense :)
>
> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>>
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>
>> I'd feel safer if we narrow this down to e500.
>>
>>> + /*
>>> + * Currently only in memory hot remove case we may still need this.
>>> + */
>>> if (pfn_valid(pfn)) {
>>
>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>>
>>> int reserved;
>>> struct page *tail = pfn_to_page(pfn);
>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>> }
>>> return PageReserved(tail);
>>> }
>>> +#endif
>>>
>>> return true;
>>> }
>>>
>>> Before apply this change:
>>>
>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>>
>>> After apply this change:
>>>
>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>>
>>> So,
>>>
>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>>
>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>>
>> Gleb, Paolo, any hard feelings?
>>
> I do not see how can we break the function in such a way and get
> away with it. Not all valid pfns point to memory. Physical address can
> be sparse (due to PCI hole, framebuffer or just because).
But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
Alex
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 10:09 ` Alexander Graf
@ 2013-07-24 10:19 ` Gleb Natapov
2013-07-24 10:25 ` Alexander Graf
0 siblings, 1 reply; 44+ messages in thread
From: Gleb Natapov @ 2013-07-24 10:19 UTC (permalink / raw)
To: Alexander Graf
Cc: "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini, Andrea Arcangeli
On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
>
> On 24.07.2013, at 12:01, Gleb Natapov wrote:
>
> > Copying Andrea for him to verify that I am not talking nonsense :)
> >
> > On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index 1580dd4..5e8635b 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >>>
> >>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> {
> >>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >>
> >> I'd feel safer if we narrow this down to e500.
> >>
> >>> + /*
> >>> + * Currently only in memory hot remove case we may still need this.
> >>> + */
> >>> if (pfn_valid(pfn)) {
> >>
> >> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> >>
> >>> int reserved;
> >>> struct page *tail = pfn_to_page(pfn);
> >>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> }
> >>> return PageReserved(tail);
> >>> }
> >>> +#endif
> >>>
> >>> return true;
> >>> }
> >>>
> >>> Before apply this change:
> >>>
> >>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> >>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> >>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >>>
> >>> After apply this change:
> >>>
> >>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> >>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> >>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >>>
> >>> So,
> >>>
> >>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> >>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> >>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >>
> >> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> >>
> >> Gleb, Paolo, any hard feelings?
> >>
> > I do not see how can we break the function in such a way and get
> > away with it. Not all valid pfns point to memory. Physical address can
> > be sparse (due to PCI hole, framebuffer or just because).
>
> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
>
That's not how I read the code. The code checks for reserved flag set.
It should be set on pfns that point to memory holes. As far as I
understand huge page tricks they are there to guaranty that THP does not
change flags under us, Andrea?
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 10:19 ` Gleb Natapov
@ 2013-07-24 10:25 ` Alexander Graf
2013-07-24 10:30 ` Gleb Natapov
0 siblings, 1 reply; 44+ messages in thread
From: Alexander Graf @ 2013-07-24 10:25 UTC (permalink / raw)
To: Gleb Natapov
Cc: "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini, Andrea Arcangeli
On 24.07.2013, at 12:19, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
>>
>> On 24.07.2013, at 12:01, Gleb Natapov wrote:
>>
>>> Copying Andrea for him to verify that I am not talking nonsense :)
>>>
>>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>> index 1580dd4..5e8635b 100644
>>>>> --- a/virt/kvm/kvm_main.c
>>>>> +++ b/virt/kvm/kvm_main.c
>>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>>>>
>>>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>>>> {
>>>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>>>
>>>> I'd feel safer if we narrow this down to e500.
>>>>
>>>>> + /*
>>>>> + * Currently only in memory hot remove case we may still need this.
>>>>> + */
>>>>> if (pfn_valid(pfn)) {
>>>>
>>>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>>>>
>>>>> int reserved;
>>>>> struct page *tail = pfn_to_page(pfn);
>>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>>> }
>>>>> return PageReserved(tail);
>>>>> }
>>>>> +#endif
>>>>>
>>>>> return true;
>>>>> }
>>>>>
>>>>> Before apply this change:
>>>>>
>>>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
>>>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
>>>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>>>>
>>>>> After apply this change:
>>>>>
>>>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
>>>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
>>>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>>>>
>>>>> So,
>>>>>
>>>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>>>>
>>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>>>>
>>>> Gleb, Paolo, any hard feelings?
>>>>
>>> I do not see how can we break the function in such a way and get
>>> away with it. Not all valid pfns point to memory. Physical address can
>>> be sparse (due to PCI hole, framebuffer or just because).
>>
>> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
>>
> That's not how I read the code. The code checks for reserved flag set.
> It should be set on pfns that point to memory holes. As far as I
I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500.
But I'd be more than happy to get proven wrong :).
Alex
> understand huge page tricks they are there to guaranty that THP does not
> change flags under us, Andrea?
>
> --
> Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 10:25 ` Alexander Graf
@ 2013-07-24 10:30 ` Gleb Natapov
2013-07-25 1:04 ` Andrea Arcangeli
0 siblings, 1 reply; 44+ messages in thread
From: Gleb Natapov @ 2013-07-24 10:30 UTC (permalink / raw)
To: Alexander Graf
Cc: "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini, Andrea Arcangeli
On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
>
> On 24.07.2013, at 12:19, Gleb Natapov wrote:
>
> > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> >>
> >> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> >>
> >>> Copying Andrea for him to verify that I am not talking nonsense :)
> >>>
> >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> >>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>>>> index 1580dd4..5e8635b 100644
> >>>>> --- a/virt/kvm/kvm_main.c
> >>>>> +++ b/virt/kvm/kvm_main.c
> >>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >>>>>
> >>>>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>>> {
> >>>>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >>>>
> >>>> I'd feel safer if we narrow this down to e500.
> >>>>
> >>>>> + /*
> >>>>> + * Currently only in memory hot remove case we may still need this.
> >>>>> + */
> >>>>> if (pfn_valid(pfn)) {
> >>>>
> >>>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> >>>>
> >>>>> int reserved;
> >>>>> struct page *tail = pfn_to_page(pfn);
> >>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>>> }
> >>>>> return PageReserved(tail);
> >>>>> }
> >>>>> +#endif
> >>>>>
> >>>>> return true;
> >>>>> }
> >>>>>
> >>>>> Before apply this change:
> >>>>>
> >>>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> >>>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> >>>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >>>>>
> >>>>> After apply this change:
> >>>>>
> >>>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> >>>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> >>>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >>>>>
> >>>>> So,
> >>>>>
> >>>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> >>>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> >>>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >>>>
> >>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> >>>>
> >>>> Gleb, Paolo, any hard feelings?
> >>>>
> >>> I do not see how can we break the function in such a way and get
> >>> away with it. Not all valid pfns point to memory. Physical address can
> >>> be sparse (due to PCI hole, framebuffer or just because).
> >>
> >> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
> >>
> > That's not how I read the code. The code checks for reserved flag set.
> > It should be set on pfns that point to memory holes. As far as I
>
> I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500.
>
> But I'd be more than happy to get proven wrong :).
>
Can you write a module that scans all page structures? AFAIK all pages
are marked as reserved and then those that become regular memory are
marked as unreserved. Hope Andrea will chime in here :)
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 9:39 ` Alexander Graf
@ 2013-07-24 20:32 ` Scott Wood
0 siblings, 0 replies; 44+ messages in thread
From: Scott Wood @ 2013-07-24 20:32 UTC (permalink / raw)
To: Alexander Graf
Cc: Gleb Natapov, Bhushan Bharat-R65777, “tiejun.chen”,
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini, linuxppc-dev
On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
>
> On 24.07.2013, at 11:35, Gleb Natapov wrote:
>
> > On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>> Are not we going to use page_is_ram() from
> e500_shadow_mas2_attrib() as Scott commented?
> >>
> >> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>
> >>
> > Because it is much slower and, IIRC, actually used to build pfn map
> that allow
> > us to check quickly for valid pfn.
>
> Then why should we use page_is_ram()? :)
>
> I really don't want the e500 code to diverge too much from what the
> rest of the kvm code is doing.
I don't understand "actually used to build pfn map...". What code is
this? I don't see any calls to page_is_ram() in the KVM code, or in
generic mm code. Is this a statement about what x86 does?
On PPC page_is_ram() is only called (AFAICT) for determining what
attributes to set on mmaps. We want to be sure that KVM always makes
the same decision. While pfn_valid() seems like it should be
equivalent, it's not obvious from the PPC code that it is.
If pfn_valid() is better, why is that not used for mmap? Why are there
two different names for the same thing?
-Scott
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
@ 2013-07-24 20:32 ` Scott Wood
0 siblings, 0 replies; 44+ messages in thread
From: Scott Wood @ 2013-07-24 20:32 UTC (permalink / raw)
To: Alexander Graf
Cc: Wood Scott-B07421, Gleb Natapov, kvm@vger.kernel.org list,
kvm-ppc, “tiejun.chen”,
Bhushan Bharat-R65777, Paolo Bonzini, linuxppc-dev
On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
>=20
> On 24.07.2013, at 11:35, Gleb Natapov wrote:
>=20
> > On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>> Are not we going to use page_is_ram() from =20
> e500_shadow_mas2_attrib() as Scott commented?
> >>
> >> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>
> >>
> > Because it is much slower and, IIRC, actually used to build pfn map =20
> that allow
> > us to check quickly for valid pfn.
>=20
> Then why should we use page_is_ram()? :)
>=20
> I really don't want the e500 code to diverge too much from what the =20
> rest of the kvm code is doing.
I don't understand "actually used to build pfn map...". What code is =20
this? I don't see any calls to page_is_ram() in the KVM code, or in =20
generic mm code. Is this a statement about what x86 does?
On PPC page_is_ram() is only called (AFAICT) for determining what =20
attributes to set on mmaps. We want to be sure that KVM always makes =20
the same decision. While pfn_valid() seems like it should be =20
equivalent, it's not obvious from the PPC code that it is.
If pfn_valid() is better, why is that not used for mmap? Why are there =20
two different names for the same thing?
-Scott=
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 10:30 ` Gleb Natapov
@ 2013-07-25 1:04 ` Andrea Arcangeli
0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2013-07-25 1:04 UTC (permalink / raw)
To: Gleb Natapov
Cc: Alexander Graf, "“tiejun.chen”",
Bhushan Bharat-R65777, kvm-ppc, kvm@vger.kernel.org list,
Wood Scott-B07421, Paolo Bonzini
Hi!
On Wed, Jul 24, 2013 at 01:30:12PM +0300, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
> >
> > On 24.07.2013, at 12:19, Gleb Natapov wrote:
> >
> > > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> > >>
> > >> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> > >>
> > >>> Copying Andrea for him to verify that I am not talking nonsense :)
> > >>>
> > >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > >>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > >>>>> index 1580dd4..5e8635b 100644
> > >>>>> --- a/virt/kvm/kvm_main.c
> > >>>>> +++ b/virt/kvm/kvm_main.c
> > >>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> > >>>>>
> > >>>>> bool kvm_is_mmio_pfn(pfn_t pfn)
> > >>>>> {
> > >>>>> +#ifdef CONFIG_MEMORY_HOTPLUG
> > >>>>
> > >>>> I'd feel safer if we narrow this down to e500.
> > >>>>
> > >>>>> + /*
> > >>>>> + * Currently only in memory hot remove case we may still need this.
> > >>>>> + */
> > >>>>> if (pfn_valid(pfn)) {
> > >>>>
> > >>>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> > >>>>
> > >>>>> int reserved;
> > >>>>> struct page *tail = pfn_to_page(pfn);
> > >>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> > >>>>> }
> > >>>>> return PageReserved(tail);
> > >>>>> }
> > >>>>> +#endif
> > >>>>>
> > >>>>> return true;
> > >>>>> }
> > >>>>>
> > >>>>> Before apply this change:
> > >>>>>
> > >>>>> real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> > >>>>> user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> > >>>>> sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> > >>>>>
> > >>>>> After apply this change:
> > >>>>>
> > >>>>> real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> > >>>>> user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> > >>>>> sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> > >>>>>
> > >>>>> So,
> > >>>>>
> > >>>>> real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > >>>>> user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > >>>>> sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> > >>>>
> > >>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> > >>>>
> > >>>> Gleb, Paolo, any hard feelings?
> > >>>>
> > >>> I do not see how can we break the function in such a way and get
> > >>> away with it. Not all valid pfns point to memory. Physical address can
> > >>> be sparse (due to PCI hole, framebuffer or just because).
> > >>
> > >> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
> > >>
> > > That's not how I read the code. The code checks for reserved flag set.
> > > It should be set on pfns that point to memory holes. As far as I
> >
> > I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500.
> >
> > But I'd be more than happy to get proven wrong :).
> >
> Can you write a module that scans all page structures? AFAIK all pages
> are marked as reserved and then those that become regular memory are
> marked as unreserved. Hope Andrea will chime in here :)
So the situation with regard to non-RAM and PageReserved/pfn_valid is
quite simple.
"struct page" exists for non-RAM too as "struct page" must exist up to
at least 2^MAX_ORDER pfn alignment or things breaks, like the first
pfn must be 2^MXA_ORDER aligned or again things break in the buddy. We
don't make an effort to save a few "struct page" to keep it simpler.
But those non-RAM pages (or tiny non-RAM page holes if any) are marked
PageReserved.
If "struct page" doesn't exist pfn_valid returns false.
So you cannot get away skipping pfn_valid and at least one
PageReserved.
However it gets more complex than just ram vs non-RAM, because there
are pages that are real RAM (not left marked PageReserved at boot
after checking e820 or equivalent bios data for non-x86 archs) but
that are taken over by drivers, than then could use it as mmio regions
snooping the writes and mapping them in userland too as hugepages
maybe. That is the motivation for the THP related code in
kvm_is_mmio_pfn.
Those vmas have VM_PFNMAP set so vm_normal_page is zero and the
refcounting is skipped like if it's non-RAM and they're mapped with
remap_pfn_range (different mechanism for VM_MIXEDMAP that does the
refcounting and doesn't require in turn the driver to mark the page
PageReserved).
The above explains why KVM needs to skip the refcounting on
PageReserved == true && pfn_valid() == true, and it must skip the
refcounting for pfn_valid == false without trying to call pfn_to_page
(or it'll crash).
Now the code doing the THP check with smp_rmb is very safe, possibly
too safe. Looking at it now, it looks a minor overengineering
oversight.
The slight oversight is that split_huge_page cannot transfer the
PG_reserved bit from head to tail.
So there's no real risk that the driver allocates an hugepage, marks
the head reserved (the PG_ bits of a THP page are only relevant in the
head), maps the page with some new version of remap_pfn_range_huge
(not possible right now, PFNMAP|MIXEDMAP only can handle 4k mappings
right now) and then split_huge_page runs and we miss the reserved bit
on the tail page. Because the reserved bit wouldn't be transferred to
the tail page anyway by split_huge_page so we'd miss it anyway if
anything like that would happen.
Besides split_huge_page couldn't run on a device owned page as it's
not anonymous but device-owned and there's no way to map it with a
hugepmd too.
So in short, it's probably never going to help to have such a check
there. We can probably optimize away the THP code in there.
No matter how the driver maps this hypotetic new type of reserved
hugepage in userland, it should never allow split_huge_page to run on
it, and then it should take care of marking all subpages as reserved
too. And KVM won't need to worry about a driver setting reserved only
on a head page anymore.
Untested RFC patch follows.
==
>From 76927680df7034a575bed5da754f7ebe94481fb3 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Thu, 25 Jul 2013 02:56:08 +0200
Subject: [PATCH] kvm: optimize away THP checks in kvm_is_mmio_pfn()
The checks on PG_reserved in the page structure on head and tail pages
aren't necessary because split_huge_page wouldn't transfer the
PG_reserved bit from head to tail anyway.
This was a forward-thinking check done in the case PageReserved was
set by a driver-owned page mapped in userland with something like
remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
possible right now). It was meant to be very safe, but it's overkill
as it's unlikely split_huge_page could ever run without the driver
noticing and tearing down the hugepage itself.
And if a driver in the future will really want to map a reserved
hugepage in userland using an huge pmd it should simply take care of
marking all subpages reserved too to keep KVM safe. This of course
would require such a hypothetical driver to tear down the huge pmd
itself and splitting the hugepage itself, instead of relaying on
split_huge_page, but that sounds very reasonable, especially
considering split_huge_page wouldn't currently transfer the reserved
bit anyway.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
virt/kvm/kvm_main.c | 24 ++----------------------
1 file changed, 2 insertions(+), 22 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1580dd4..fa030fb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,28 +102,8 @@ static bool largepages_enabled = true;
bool kvm_is_mmio_pfn(pfn_t pfn)
{
- if (pfn_valid(pfn)) {
- int reserved;
- struct page *tail = pfn_to_page(pfn);
- struct page *head = compound_trans_head(tail);
- reserved = PageReserved(head);
- if (head != tail) {
- /*
- * "head" is not a dangling pointer
- * (compound_trans_head takes care of that)
- * but the hugepage may have been splitted
- * from under us (and we may not hold a
- * reference count on the head page so it can
- * be reused before we run PageReferenced), so
- * we've to check PageTail before returning
- * what we just read.
- */
- smp_rmb();
- if (PageTail(tail))
- return reserved;
- }
- return PageReserved(tail);
- }
+ if (pfn_valid(pfn))
+ return PageReserved(pfn_to_page(pfn));
return true;
}
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-24 20:32 ` Scott Wood
@ 2013-07-25 8:50 ` Gleb Natapov
-1 siblings, 0 replies; 44+ messages in thread
From: Gleb Natapov @ 2013-07-25 8:50 UTC (permalink / raw)
To: Scott Wood
Cc: Alexander Graf, Bhushan Bharat-R65777, “tiejun.chen”,
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini, linuxppc-dev
On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >
> >On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >
> >> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>> Are not we going to use page_is_ram() from
> >e500_shadow_mas2_attrib() as Scott commented?
> >>>
> >>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>
> >>>
> >> Because it is much slower and, IIRC, actually used to build pfn
> >map that allow
> >> us to check quickly for valid pfn.
> >
> >Then why should we use page_is_ram()? :)
> >
> >I really don't want the e500 code to diverge too much from what
> >the rest of the kvm code is doing.
>
> I don't understand "actually used to build pfn map...". What code
> is this? I don't see any calls to page_is_ram() in the KVM code, or
> in generic mm code. Is this a statement about what x86 does?
It may be not page_is_ram() directly, but the same into page_is_ram() is
using. On power both page_is_ram() and do_init_bootmem() walks some kind
of memblock_region data structure. What important is that pfn_valid()
does not mean that there is a memory behind page structure. See Andrea's
reply.
>
> On PPC page_is_ram() is only called (AFAICT) for determining what
> attributes to set on mmaps. We want to be sure that KVM always
> makes the same decision. While pfn_valid() seems like it should be
> equivalent, it's not obvious from the PPC code that it is.
>
Again pfn_valid() is not enough.
> If pfn_valid() is better, why is that not used for mmap? Why are
> there two different names for the same thing?
>
They are not the same thing. page_is_ram() tells you if phys address is
ram backed. pfn_valid() tells you if there is struct page behind the
pfn. PageReserved() tells if you a pfn is marked as reserved. All non
ram pfns should be reserved, but ram pfns can be reserved too. Again,
see Andrea's reply.
Why ppc uses page_is_ram() for mmap? How should I know? But looking at
the function it does it only as a fallback if
ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
noncached as a safe fallback makes sense. It is also make sense to allow
noncached access to reserved ram sometimes.
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
@ 2013-07-25 8:50 ` Gleb Natapov
0 siblings, 0 replies; 44+ messages in thread
From: Gleb Natapov @ 2013-07-25 8:50 UTC (permalink / raw)
To: Scott Wood
Cc: Wood Scott-B07421, kvm@vger.kernel.org list, Alexander Graf,
kvm-ppc, “tiejun.chen”,
Bhushan Bharat-R65777, Paolo Bonzini, linuxppc-dev
On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >
> >On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >
> >> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>> Are not we going to use page_is_ram() from
> >e500_shadow_mas2_attrib() as Scott commented?
> >>>
> >>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>
> >>>
> >> Because it is much slower and, IIRC, actually used to build pfn
> >map that allow
> >> us to check quickly for valid pfn.
> >
> >Then why should we use page_is_ram()? :)
> >
> >I really don't want the e500 code to diverge too much from what
> >the rest of the kvm code is doing.
>
> I don't understand "actually used to build pfn map...". What code
> is this? I don't see any calls to page_is_ram() in the KVM code, or
> in generic mm code. Is this a statement about what x86 does?
It may be not page_is_ram() directly, but the same into page_is_ram() is
using. On power both page_is_ram() and do_init_bootmem() walks some kind
of memblock_region data structure. What important is that pfn_valid()
does not mean that there is a memory behind page structure. See Andrea's
reply.
>
> On PPC page_is_ram() is only called (AFAICT) for determining what
> attributes to set on mmaps. We want to be sure that KVM always
> makes the same decision. While pfn_valid() seems like it should be
> equivalent, it's not obvious from the PPC code that it is.
>
Again pfn_valid() is not enough.
> If pfn_valid() is better, why is that not used for mmap? Why are
> there two different names for the same thing?
>
They are not the same thing. page_is_ram() tells you if phys address is
ram backed. pfn_valid() tells you if there is struct page behind the
pfn. PageReserved() tells if you a pfn is marked as reserved. All non
ram pfns should be reserved, but ram pfns can be reserved too. Again,
see Andrea's reply.
Why ppc uses page_is_ram() for mmap? How should I know? But looking at
the function it does it only as a fallback if
ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
noncached as a safe fallback makes sense. It is also make sense to allow
noncached access to reserved ram sometimes.
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-25 8:50 ` Gleb Natapov
@ 2013-07-25 16:07 ` Alexander Graf
-1 siblings, 0 replies; 44+ messages in thread
From: Alexander Graf @ 2013-07-25 16:07 UTC (permalink / raw)
To: Gleb Natapov
Cc: Scott Wood, Bhushan Bharat-R65777,
"“tiejun.chen” Chen",
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini, linuxppc-dev, Benjamin Herrenschmidt
On 25.07.2013, at 10:50, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
>> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
>>>
>>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
>>>
>>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>>>>> Are not we going to use page_is_ram() from
>>> e500_shadow_mas2_attrib() as Scott commented?
>>>>>
>>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>>>>
>>>>>
>>>> Because it is much slower and, IIRC, actually used to build pfn
>>> map that allow
>>>> us to check quickly for valid pfn.
>>>
>>> Then why should we use page_is_ram()? :)
>>>
>>> I really don't want the e500 code to diverge too much from what
>>> the rest of the kvm code is doing.
>>
>> I don't understand "actually used to build pfn map...". What code
>> is this? I don't see any calls to page_is_ram() in the KVM code, or
>> in generic mm code. Is this a statement about what x86 does?
> It may be not page_is_ram() directly, but the same into page_is_ram() is
> using. On power both page_is_ram() and do_init_bootmem() walks some kind
> of memblock_region data structure. What important is that pfn_valid()
> does not mean that there is a memory behind page structure. See Andrea's
> reply.
>
>>
>> On PPC page_is_ram() is only called (AFAICT) for determining what
>> attributes to set on mmaps. We want to be sure that KVM always
>> makes the same decision. While pfn_valid() seems like it should be
>> equivalent, it's not obvious from the PPC code that it is.
>>
> Again pfn_valid() is not enough.
>
>> If pfn_valid() is better, why is that not used for mmap? Why are
>> there two different names for the same thing?
>>
> They are not the same thing. page_is_ram() tells you if phys address is
> ram backed. pfn_valid() tells you if there is struct page behind the
> pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> ram pfns should be reserved, but ram pfns can be reserved too. Again,
> see Andrea's reply.
>
> Why ppc uses page_is_ram() for mmap? How should I know? But looking at
That one's easy. Let's just ask Ben. Ben, is there any particular reason PPC uses page_is_ram() rather than what KVM does here to figure out whether a pfn is RAM or not? It would be really useful to be able to run the exact same logic that figures out whether we're cacheable or not in both TLB writers (KVM and linux-mm).
Alex
> the function it does it only as a fallback if
> ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
> noncached as a safe fallback makes sense. It is also make sense to allow
> noncached access to reserved ram sometimes.
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
@ 2013-07-25 16:07 ` Alexander Graf
0 siblings, 0 replies; 44+ messages in thread
From: Alexander Graf @ 2013-07-25 16:07 UTC (permalink / raw)
To: Gleb Natapov
Cc: Wood Scott-B07421, kvm@vger.kernel.org list, kvm-ppc,
"“tiejun.chen” Chen",
Bhushan Bharat-R65777, Scott Wood, Paolo Bonzini, linuxppc-dev
On 25.07.2013, at 10:50, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
>> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
>>>=20
>>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
>>>=20
>>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>>>>> Are not we going to use page_is_ram() from
>>> e500_shadow_mas2_attrib() as Scott commented?
>>>>>=20
>>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>>>>=20
>>>>>=20
>>>> Because it is much slower and, IIRC, actually used to build pfn
>>> map that allow
>>>> us to check quickly for valid pfn.
>>>=20
>>> Then why should we use page_is_ram()? :)
>>>=20
>>> I really don't want the e500 code to diverge too much from what
>>> the rest of the kvm code is doing.
>>=20
>> I don't understand "actually used to build pfn map...". What code
>> is this? I don't see any calls to page_is_ram() in the KVM code, or
>> in generic mm code. Is this a statement about what x86 does?
> It may be not page_is_ram() directly, but the same into page_is_ram() =
is
> using. On power both page_is_ram() and do_init_bootmem() walks some =
kind
> of memblock_region data structure. What important is that pfn_valid()
> does not mean that there is a memory behind page structure. See =
Andrea's
> reply.
>=20
>>=20
>> On PPC page_is_ram() is only called (AFAICT) for determining what
>> attributes to set on mmaps. We want to be sure that KVM always
>> makes the same decision. While pfn_valid() seems like it should be
>> equivalent, it's not obvious from the PPC code that it is.
>>=20
> Again pfn_valid() is not enough.
>=20
>> If pfn_valid() is better, why is that not used for mmap? Why are
>> there two different names for the same thing?
>>=20
> They are not the same thing. page_is_ram() tells you if phys address =
is
> ram backed. pfn_valid() tells you if there is struct page behind the
> pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> ram pfns should be reserved, but ram pfns can be reserved too. Again,
> see Andrea's reply.
>=20
> Why ppc uses page_is_ram() for mmap? How should I know? But looking at
That one's easy. Let's just ask Ben. Ben, is there any particular reason =
PPC uses page_is_ram() rather than what KVM does here to figure out =
whether a pfn is RAM or not? It would be really useful to be able to run =
the exact same logic that figures out whether we're cacheable or not in =
both TLB writers (KVM and linux-mm).
Alex
> the function it does it only as a fallback if
> ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
> noncached as a safe fallback makes sense. It is also make sense to =
allow
> noncached access to reserved ram sometimes.
>=20
> --
> Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-25 16:07 ` Alexander Graf
@ 2013-07-25 16:14 ` Gleb Natapov
-1 siblings, 0 replies; 44+ messages in thread
From: Gleb Natapov @ 2013-07-25 16:14 UTC (permalink / raw)
To: Alexander Graf
Cc: Scott Wood, Bhushan Bharat-R65777,
"“tiejun.chen” Chen",
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini, linuxppc-dev, Benjamin Herrenschmidt
On Thu, Jul 25, 2013 at 06:07:55PM +0200, Alexander Graf wrote:
>
> On 25.07.2013, at 10:50, Gleb Natapov wrote:
>
> > On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> >> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >>>
> >>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >>>
> >>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>>>> Are not we going to use page_is_ram() from
> >>> e500_shadow_mas2_attrib() as Scott commented?
> >>>>>
> >>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>>>
> >>>>>
> >>>> Because it is much slower and, IIRC, actually used to build pfn
> >>> map that allow
> >>>> us to check quickly for valid pfn.
> >>>
> >>> Then why should we use page_is_ram()? :)
> >>>
> >>> I really don't want the e500 code to diverge too much from what
> >>> the rest of the kvm code is doing.
> >>
> >> I don't understand "actually used to build pfn map...". What code
> >> is this? I don't see any calls to page_is_ram() in the KVM code, or
> >> in generic mm code. Is this a statement about what x86 does?
> > It may be not page_is_ram() directly, but the same into page_is_ram() is
> > using. On power both page_is_ram() and do_init_bootmem() walks some kind
> > of memblock_region data structure. What important is that pfn_valid()
> > does not mean that there is a memory behind page structure. See Andrea's
> > reply.
> >
> >>
> >> On PPC page_is_ram() is only called (AFAICT) for determining what
> >> attributes to set on mmaps. We want to be sure that KVM always
> >> makes the same decision. While pfn_valid() seems like it should be
> >> equivalent, it's not obvious from the PPC code that it is.
> >>
> > Again pfn_valid() is not enough.
> >
> >> If pfn_valid() is better, why is that not used for mmap? Why are
> >> there two different names for the same thing?
> >>
> > They are not the same thing. page_is_ram() tells you if phys address is
> > ram backed. pfn_valid() tells you if there is struct page behind the
> > pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> > ram pfns should be reserved, but ram pfns can be reserved too. Again,
> > see Andrea's reply.
> >
> > Why ppc uses page_is_ram() for mmap? How should I know? But looking at
>
> That one's easy. Let's just ask Ben. Ben, is there any particular reason PPC uses page_is_ram() rather than what KVM does here to figure out whether a pfn is RAM or not? It would be really useful to be able to run the exact same logic that figures out whether we're cacheable or not in both TLB writers (KVM and linux-mm).
>
KVM does not only try to figure out what is RAM or not! Look at how KVM
uses the function. KVM tries to figure out if refcounting needed to be
used on this page among other things.
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
@ 2013-07-25 16:14 ` Gleb Natapov
0 siblings, 0 replies; 44+ messages in thread
From: Gleb Natapov @ 2013-07-25 16:14 UTC (permalink / raw)
To: Alexander Graf
Cc: Wood Scott-B07421, kvm@vger.kernel.org list, kvm-ppc,
"“tiejun.chen” Chen",
Bhushan Bharat-R65777, Scott Wood, Paolo Bonzini, linuxppc-dev
On Thu, Jul 25, 2013 at 06:07:55PM +0200, Alexander Graf wrote:
>
> On 25.07.2013, at 10:50, Gleb Natapov wrote:
>
> > On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> >> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >>>
> >>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >>>
> >>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>>>> Are not we going to use page_is_ram() from
> >>> e500_shadow_mas2_attrib() as Scott commented?
> >>>>>
> >>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>>>
> >>>>>
> >>>> Because it is much slower and, IIRC, actually used to build pfn
> >>> map that allow
> >>>> us to check quickly for valid pfn.
> >>>
> >>> Then why should we use page_is_ram()? :)
> >>>
> >>> I really don't want the e500 code to diverge too much from what
> >>> the rest of the kvm code is doing.
> >>
> >> I don't understand "actually used to build pfn map...". What code
> >> is this? I don't see any calls to page_is_ram() in the KVM code, or
> >> in generic mm code. Is this a statement about what x86 does?
> > It may be not page_is_ram() directly, but the same into page_is_ram() is
> > using. On power both page_is_ram() and do_init_bootmem() walks some kind
> > of memblock_region data structure. What important is that pfn_valid()
> > does not mean that there is a memory behind page structure. See Andrea's
> > reply.
> >
> >>
> >> On PPC page_is_ram() is only called (AFAICT) for determining what
> >> attributes to set on mmaps. We want to be sure that KVM always
> >> makes the same decision. While pfn_valid() seems like it should be
> >> equivalent, it's not obvious from the PPC code that it is.
> >>
> > Again pfn_valid() is not enough.
> >
> >> If pfn_valid() is better, why is that not used for mmap? Why are
> >> there two different names for the same thing?
> >>
> > They are not the same thing. page_is_ram() tells you if phys address is
> > ram backed. pfn_valid() tells you if there is struct page behind the
> > pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> > ram pfns should be reserved, but ram pfns can be reserved too. Again,
> > see Andrea's reply.
> >
> > Why ppc uses page_is_ram() for mmap? How should I know? But looking at
>
> That one's easy. Let's just ask Ben. Ben, is there any particular reason PPC uses page_is_ram() rather than what KVM does here to figure out whether a pfn is RAM or not? It would be really useful to be able to run the exact same logic that figures out whether we're cacheable or not in both TLB writers (KVM and linux-mm).
>
KVM does not only try to figure out what is RAM or not! Look at how KVM
uses the function. KVM tries to figure out if refcounting needed to be
used on this page among other things.
--
Gleb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
2013-07-25 8:50 ` Gleb Natapov
@ 2013-07-26 22:27 ` Scott Wood
-1 siblings, 0 replies; 44+ messages in thread
From: Scott Wood @ 2013-07-26 22:27 UTC (permalink / raw)
To: Gleb Natapov
Cc: Alexander Graf, Bhushan Bharat-R65777, “tiejun.chen”,
kvm-ppc, kvm@vger.kernel.org list, Wood Scott-B07421,
Paolo Bonzini, linuxppc-dev
On 07/25/2013 03:50:42 AM, Gleb Natapov wrote:
> Why ppc uses page_is_ram() for mmap? How should I know? But looking at
> the function it does it only as a fallback if
> ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
> noncached as a safe fallback makes sense.
There's only one current implementation of
ppc_md.phys_mem_access_prot(), which is pci_phys_mem_access_prot(),
which also uses page_is_ram(). If page_is_ram() returns false then it
checks for write-combining PCI. But yes, we would want to call
ppc_md.phys_mem_access_prot() if present.
Copying from the host PTE would be ideal if doesn't come with a
noticeable performance impact compared to other methods, but one way or
another we want to be sure we match.
> It is also make sense to allow noncached access to reserved ram
> sometimes.
Perhaps, but that's not KVM's decision to make. You should get the
same result as if you mmaped it -- because QEMU already did and we need
to be consistent. Not to mention the large page kernel mapping that
will have been done on e500...
-Scott
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
@ 2013-07-26 22:27 ` Scott Wood
0 siblings, 0 replies; 44+ messages in thread
From: Scott Wood @ 2013-07-26 22:27 UTC (permalink / raw)
To: Gleb Natapov
Cc: Wood Scott-B07421, kvm@vger.kernel.org list, Alexander Graf,
kvm-ppc, “tiejun.chen”,
Bhushan Bharat-R65777, Paolo Bonzini, linuxppc-dev
On 07/25/2013 03:50:42 AM, Gleb Natapov wrote:
> Why ppc uses page_is_ram() for mmap? How should I know? But looking at
> the function it does it only as a fallback if
> ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
> noncached as a safe fallback makes sense.
There's only one current implementation of =20
ppc_md.phys_mem_access_prot(), which is pci_phys_mem_access_prot(), =20
which also uses page_is_ram(). If page_is_ram() returns false then it =20
checks for write-combining PCI. But yes, we would want to call =20
ppc_md.phys_mem_access_prot() if present.
Copying from the host PTE would be ideal if doesn't come with a =20
noticeable performance impact compared to other methods, but one way or =20
another we want to be sure we match.
> It is also make sense to allow noncached access to reserved ram =20
> sometimes.
Perhaps, but that's not KVM's decision to make. You should get the =20
same result as if you mmaped it -- because QEMU already did and we need =20
to be consistent. Not to mention the large page kernel mapping that =20
will have been done on e500...
-Scott=
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2013-07-26 22:27 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-18 6:04 [PATCH 1/2] kvm: powerpc: Do not ignore "E" attribute in mas2 Bharat Bhushan
2013-07-18 6:04 ` [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages Bharat Bhushan
2013-07-18 6:26 ` "“tiejun.chen”"
2013-07-18 7:12 ` Bhushan Bharat-R65777
2013-07-18 7:31 ` "“tiejun.chen”"
2013-07-18 8:08 ` Bhushan Bharat-R65777
2013-07-18 8:21 ` "“tiejun.chen”"
2013-07-18 8:22 ` Bhushan Bharat-R65777
2013-07-18 8:25 ` Bhushan Bharat-R65777
2013-07-18 8:55 ` "“tiejun.chen”"
2013-07-18 9:44 ` Alexander Graf
2013-07-18 9:56 ` "“tiejun.chen”"
2013-07-18 10:00 ` Alexander Graf
2013-07-18 10:14 ` "“tiejun.chen”"
2013-07-18 16:11 ` Scott Wood
2013-07-18 9:48 ` Alexander Graf
2013-07-18 9:51 ` Bhushan Bharat-R65777
2013-07-18 10:08 ` "“tiejun.chen”"
2013-07-18 10:12 ` Alexander Graf
2013-07-18 10:19 ` "“tiejun.chen”"
2013-07-18 10:27 ` Alexander Graf
2013-07-24 2:26 ` "“tiejun.chen”"
2013-07-24 8:25 ` Alexander Graf
2013-07-24 9:11 ` Bhushan Bharat-R65777
2013-07-24 9:21 ` Alexander Graf
2013-07-24 9:35 ` Gleb Natapov
2013-07-24 9:39 ` Alexander Graf
2013-07-24 20:32 ` Scott Wood
2013-07-24 20:32 ` Scott Wood
2013-07-25 8:50 ` Gleb Natapov
2013-07-25 8:50 ` Gleb Natapov
2013-07-25 16:07 ` Alexander Graf
2013-07-25 16:07 ` Alexander Graf
2013-07-25 16:14 ` Gleb Natapov
2013-07-25 16:14 ` Gleb Natapov
2013-07-26 22:27 ` Scott Wood
2013-07-26 22:27 ` Scott Wood
2013-07-24 10:01 ` Gleb Natapov
2013-07-24 10:09 ` Alexander Graf
2013-07-24 10:19 ` Gleb Natapov
2013-07-24 10:25 ` Alexander Graf
2013-07-24 10:30 ` Gleb Natapov
2013-07-25 1:04 ` Andrea Arcangeli
2013-07-18 8:27 ` "“tiejun.chen”"
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.