All of lore.kernel.org
* How to handle PTE tables with non contiguous entries ?
@ 2018-09-10 14:34 Christophe Leroy
  2018-09-10 20:05   ` Dan Malek
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Christophe Leroy @ 2018-09-10 14:34 UTC (permalink / raw)
  To: akpm, linux-mm, aneesh.kumar, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev
  Cc: LKML

Hi,

I'm having a hard time figuring out the best way to handle the following 
situation:

On the powerpc 8xx, handling 16k pages requires page tables with 4 
identical entries per page.

Initially I was thinking about handling this by simply modifying 
pte_index() and changing the pte_t type so that there is one entry every 
16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and 
*ptep+3 in both set_pte_at() and pte_update().

However, this doesn't work, because many places in the mm core part 
of the kernel loop over ptep with a single ptep++ increment.

Therefore I did it with the following hack:

  /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
  typedef struct { pte_basic_t pte; } pte_t;
+#endif

@@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
         : "cc" );
  #else /* PTE_ATOMIC_UPDATES */
         unsigned long old = pte_val(*p);
-       *p = __pte((old & ~clr) | set);
+       unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+       *p = __pte(new);
+#endif
  #endif /* !PTE_ATOMIC_UPDATES */

  #ifdef CONFIG_44x


@@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
         /* Anything else just stores the PTE normally. That covers all 64-bit
          * cases, and 32-bit non-hash with 32-bit PTEs.
          */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
         *ptep = pte;
+#endif



But I'm not too happy with it, as it means pte_t is no longer a single 
type, so passing it from one function to another is quite heavy.
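In user-space terms, the hack boils down to the sketch below (a 
standalone illustration only, with pte_basic_t assumed 32-bit as on the 
8xx): sizeof(pte_t) grows from 4 to 16 bytes, which is exactly the 
pass-by-value cost.

```c
#include <stdint.h>

typedef uint32_t pte_basic_t;

/* 16k-page variant: four identical copies per logical PTE */
typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;

/* All four copies must be written together, as in the hacked
 * pte_update()/__set_pte_at() above. */
static void set_pte(pte_t *ptep, pte_basic_t val)
{
	ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = val;
}
```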


Would someone have an idea of an elegant way to handle that?

Thanks
Christophe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-10 14:34 How to handle PTE tables with non contiguous entries ? Christophe Leroy
@ 2018-09-10 20:05   ` Dan Malek
  2018-09-10 21:06 ` Nicholas Piggin
  2018-09-17  9:03 ` Aneesh Kumar K.V
  2 siblings, 0 replies; 15+ messages in thread
From: Dan Malek @ 2018-09-10 20:05 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: akpm, linux-mm, aneesh.kumar, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, LKML


Hello Christophe.

> On Sep 10, 2018, at 7:34 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
> On the powerpc8xx, handling 16k size pages requires to have page tables with 4 identical entries.

Do you think a 16k page is useful?  Back in the day, the goal was to keep the fault handling and management overhead as simple and generic as possible since, as you know, this affects system performance.  I understand there would be fewer page faults and more efficient use of the MMU resources with 16k pages, but if this comes at an overhead cost, is it really worth it?

In addition to the normal 4k mapping, I had thought about using 512k mapping, which could be easily detected at level 2 (PMD), with a single entry loaded into the MMU.  We would need an aux header or something from the executable/library to assist with knowing when this could be done.  I never got around to it. :)

The 8xx platforms tended to have smaller memory resources, so the 4k granularity was also useful in making better use of the available space.

> Would someone have an idea of an elegent way to handle that ?

My suggestion would be to not change the PTE table, but have the fault handler detect a 16k page and load any one of the four entries based upon the miss offset.  Kinda use the same 4k miss handler, but with 16k knowledge.  You wouldn’t save any PTE table space, but the MMU efficiency may be worth it.  As I recall, the hardware may ignore/mask any LS bits, and there is PMD-level information to utilize as well.
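In rough C terms, the idea is something like this (purely illustrative, 
names invented; it assumes the classic 32-bit PTE layout with the 
physical page number in the upper 20 bits, so a 16k-aligned page has 
zeroes in address bits 12-13):

```c
#include <stdint.h>

#define SUBPAGE_MASK 0x3000u	/* address bits 12-13: which 4k chunk of the 16k page */

/* Hypothetical helper: build the 4k TLB entry to load from the single
 * 16k PTE plus the faulting address.  Only the two low bits of the
 * physical page number differ between the four sub-pages; permission
 * and flag bits are taken from the PTE as-is. */
static uint32_t tlb_entry_for_miss(uint32_t pte_16k, uint32_t miss_addr)
{
	return (pte_16k & ~SUBPAGE_MASK) | (miss_addr & SUBPAGE_MASK);
}
```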

It’s been a long time since I’ve investigated how things have evolved, glad it’s still in use, and I hope you at least have some fun with the development :)

Thanks.

	— Dan




* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-10 14:34 How to handle PTE tables with non contiguous entries ? Christophe Leroy
  2018-09-10 20:05   ` Dan Malek
@ 2018-09-10 21:06 ` Nicholas Piggin
  2018-09-11  5:39     ` Christophe LEROY
  2018-09-17  9:03 ` Aneesh Kumar K.V
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas Piggin @ 2018-09-10 21:06 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: akpm, linux-mm, aneesh.kumar, Michael Ellerman, linuxppc-dev, LKML

On Mon, 10 Sep 2018 14:34:37 +0000
Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> Hi,
> 
> I'm having a hard time figuring out the best way to handle the following 
> situation:
> 
> On the powerpc8xx, handling 16k size pages requires to have page tables 
> with 4 identical entries.
> 
> Initially I was thinking about handling this by simply modifying 
> pte_index() which changing pte_t type in order to have one entry every 
> 16 bytes, then replicate the PTE value at *ptep, *ptep+1,*ptep+2 and 
> *ptep+3 both in set_pte_at() and pte_update().
> 
> However, this doesn't work because many many places in the mm core part 
> of the kernel use loops on ptep with single ptep++ increment.
> 
> Therefore did it with the following hack:
> 
>   /* PTE level */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
> +#else
>   typedef struct { pte_basic_t pte; } pte_t;
> +#endif
> 
> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>          : "cc" );
>   #else /* PTE_ATOMIC_UPDATES */
>          unsigned long old = pte_val(*p);
> -       *p = __pte((old & ~clr) | set);
> +       unsigned long new = (old & ~clr) | set;
> +
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
> +#else
> +       *p = __pte(new);
> +#endif
>   #endif /* !PTE_ATOMIC_UPDATES */
> 
>   #ifdef CONFIG_44x
> 
> 
> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct 
> *mm, unsigned long addr,
>          /* Anything else just stores the PTE normally. That covers all 
> 64-bit
>           * cases, and 32-bit non-hash with 32-bit PTEs.
>           */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
> +#else
>          *ptep = pte;
> +#endif
> 
> 
> 
> But I'm not too happy with it as it means pte_t is not a single type 
> anymore so passing it from one function to the other is quite heavy.
> 
> 
> Would someone have an idea of an elegent way to handle that ?

I can't think of anything better. Do we pass pte by value to a lot of
non-inlined functions? Possible to inline the important ones?

Other option, try to get an iterator like pte = pte_next(pte) into core
code.
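For instance (just a sketch; the CONFIG names follow the hack above, and
the stride of 4 entries on 8xx/16k is an assumption of that layout):

```c
/* Core loops would say ptep = pte_next(ptep) instead of ptep++, letting
 * an architecture step over replicated entries. */

#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
#define PTE_STRIDE	4	/* four identical entries per 16k page */
#else
#define PTE_STRIDE	1	/* everybody else: one entry per page */
#endif

typedef unsigned long pte_basic_t;
typedef struct { pte_basic_t pte; } pte_t;

static inline pte_t *pte_next(pte_t *ptep)
{
	return ptep + PTE_STRIDE;
}
```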

Thanks,
Nick


* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-10 20:05   ` Dan Malek
@ 2018-09-11  5:28     ` Christophe LEROY
  -1 siblings, 0 replies; 15+ messages in thread
From: Christophe LEROY @ 2018-09-11  5:28 UTC (permalink / raw)
  To: Dan Malek
  Cc: akpm, linux-mm, aneesh.kumar, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, LKML



On 10/09/2018 at 22:05, Dan Malek wrote:
> 
> Hello Cristophe.
> 
>> On Sep 10, 2018, at 7:34 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>> On the powerpc8xx, handling 16k size pages requires to have page tables with 4 identical entries.
> 
> Do you think a 16k page is useful?  Back in the day, the goal was to keep the fault handling and management overhead as simple and generic as possible, as you know this affects the system performance.  I understand there would be fewer page faults and more efficient use of the MMU resources with 16k, but if this comes at an overhead cost, is it really worth it?

Yes, that's definitely useful: the current 16k implementation already 
provides nice results, but it is based on the Linux page table 
structure, which means the 8xx HW assistance cannot be used in TLB miss 
handlers.

That's the reason why I'm trying to alter the Linux structure to match 
the 8xx page layout, hence the need to have 4 entries in the PTE table 
for each 16k page.

> 
> In addition to the normal 4k mapping, I had thought about using 512k mapping, which could be easily detected at level 2 (PMD), with a single entry loaded into the MMU.  We would need an aux header or something from the executable/library to assist with knowing when this could be done.  I never got around to it. :)

Yes, 512k and 8M hugepages are implemented as well, but they are based 
on the Linux structure, hence requiring some time-consuming handling 
like checking the page size on every miss in order to run the 
appropriate part of the handler.

With the HW layout, the 512k entries are spread every 128 bytes in the 
PTE table, but those don't cause me much trouble because the hugepage 
code uses huge_pte_offset() and never increments the pte pointer 
directly.


> 
> The 8xx platforms tended to have smaller memory resources, so the 4k granularity was also useful in making better use of the available space.

Well, on my boards I have 128 Mbytes of memory, and 16k pages and 
hugepages have shown their benefit.

> 
>> Would someone have an idea of an elegent way to handle that ?
> 
> My suggestion would be to not change the PTE table, but have the fault handler detect a 16k page and load any one of the four entries based upon miss offset.  Kinda use the same 4k miss hander, but with 16k knowledge.  You wouldn’t save any PTE table space, but the MMU efficiency may be worth it.  As I recall, the hardware may ignore/mask any LS bits, and there is PMD level information to utilize as well.

That's exactly what I want to do, which means that every place where 
pte++ is encountered in mm/memory.c and friends needs to push the index 
to the next 16k page, i.e. increase the pointer by 4 entries.

> 
> It’s been a long time since I’ve investigated how things have evolved, glad it’s still in use, and I hope you at least have some fun with the development :)

Thanks
Christophe




* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-10 21:06 ` Nicholas Piggin
@ 2018-09-11  5:39     ` Christophe LEROY
  0 siblings, 0 replies; 15+ messages in thread
From: Christophe LEROY @ 2018-09-11  5:39 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: akpm, linux-mm, aneesh.kumar, Michael Ellerman, linuxppc-dev, LKML



On 10/09/2018 at 23:06, Nicholas Piggin wrote:
> On Mon, 10 Sep 2018 14:34:37 +0000
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> Hi,
>>
>> I'm having a hard time figuring out the best way to handle the following
>> situation:
>>
>> On the powerpc8xx, handling 16k size pages requires to have page tables
>> with 4 identical entries.
>>
>> Initially I was thinking about handling this by simply modifying
>> pte_index() which changing pte_t type in order to have one entry every
>> 16 bytes, then replicate the PTE value at *ptep, *ptep+1,*ptep+2 and
>> *ptep+3 both in set_pte_at() and pte_update().
>>
>> However, this doesn't work because many many places in the mm core part
>> of the kernel use loops on ptep with single ptep++ increment.
>>
>> Therefore did it with the following hack:
>>
>>    /* PTE level */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
>> +#else
>>    typedef struct { pte_basic_t pte; } pte_t;
>> +#endif
>>
>> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>>           : "cc" );
>>    #else /* PTE_ATOMIC_UPDATES */
>>           unsigned long old = pte_val(*p);
>> -       *p = __pte((old & ~clr) | set);
>> +       unsigned long new = (old & ~clr) | set;
>> +
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
>> +#else
>> +       *p = __pte(new);
>> +#endif
>>    #endif /* !PTE_ATOMIC_UPDATES */
>>
>>    #ifdef CONFIG_44x
>>
>>
>> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
>> *mm, unsigned long addr,
>>           /* Anything else just stores the PTE normally. That covers all
>> 64-bit
>>            * cases, and 32-bit non-hash with 32-bit PTEs.
>>            */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
>> +#else
>>           *ptep = pte;
>> +#endif
>>
>>
>>
>> But I'm not too happy with it as it means pte_t is not a single type
>> anymore so passing it from one function to the other is quite heavy.
>>
>>
>> Would someone have an idea of an elegent way to handle that ?
> 
> I can't think of anything better. Do we pass pte by value to a lot of
> non inlined functions? Possible to inline the important ones?

Good question, I need to check that.

> 
> Other option, try to get an iterator like pte = pte_next(pte) into core
> code.

Yes, I've been thinking about that, but it looks like a huge job to 
identify all the places, as some drivers are also playing with it.
I'm not sure it is only a matter of finding all 'pte++' and 'ptep++'; I 
fear there might be places using more unexpected names.

Thanks
Christophe

> 
> Thanks,
> Nick
> 



* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-10 14:34 How to handle PTE tables with non contiguous entries ? Christophe Leroy
  2018-09-10 20:05   ` Dan Malek
  2018-09-10 21:06 ` Nicholas Piggin
@ 2018-09-17  9:03 ` Aneesh Kumar K.V
  2018-09-17  9:47     ` Christophe LEROY
  2 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-17  9:03 UTC (permalink / raw)
  To: Christophe Leroy, akpm, linux-mm, aneesh.kumar, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev
  Cc: LKML

Christophe Leroy <christophe.leroy@c-s.fr> writes:

> Hi,
>
> I'm having a hard time figuring out the best way to handle the following 
> situation:
>
> On the powerpc8xx, handling 16k size pages requires to have page tables 
> with 4 identical entries.

I assume that is for a hugetlb page size? If so, isn't that similar to 
the FSL hugetlb page table layout?

>
> Initially I was thinking about handling this by simply modifying 
> pte_index() which changing pte_t type in order to have one entry every 
> 16 bytes, then replicate the PTE value at *ptep, *ptep+1,*ptep+2 and 
> *ptep+3 both in set_pte_at() and pte_update().
>
> However, this doesn't work because many many places in the mm core part 
> of the kernel use loops on ptep with single ptep++ increment.
>
> Therefore did it with the following hack:
>
>   /* PTE level */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
> +#else
>   typedef struct { pte_basic_t pte; } pte_t;
> +#endif
>
> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>          : "cc" );
>   #else /* PTE_ATOMIC_UPDATES */
>          unsigned long old = pte_val(*p);
> -       *p = __pte((old & ~clr) | set);
> +       unsigned long new = (old & ~clr) | set;
> +
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
> +#else
> +       *p = __pte(new);
> +#endif
>   #endif /* !PTE_ATOMIC_UPDATES */
>
>   #ifdef CONFIG_44x
>
>
> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct 
> *mm, unsigned long addr,
>          /* Anything else just stores the PTE normally. That covers all 
> 64-bit
>           * cases, and 32-bit non-hash with 32-bit PTEs.
>           */
> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
> +#else
>          *ptep = pte;
> +#endif
>
>
>
> But I'm not too happy with it as it means pte_t is not a single type 
> anymore so passing it from one function to the other is quite heavy.
>
>
> Would someone have an idea of an elegent way to handle that ?
>
> Thanks
> Christophe

Why would pte_update bother about updating all 4 entries? Can you 
help me understand the issue?

-aneesh



* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-17  9:03 ` Aneesh Kumar K.V
@ 2018-09-17  9:47     ` Christophe LEROY
  0 siblings, 0 replies; 15+ messages in thread
From: Christophe LEROY @ 2018-09-17  9:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V, akpm, linux-mm, aneesh.kumar, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev
  Cc: LKML



On 17/09/2018 at 11:03, Aneesh Kumar K.V wrote:
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> Hi,
>>
>> I'm having a hard time figuring out the best way to handle the following
>> situation:
>>
>> On the powerpc8xx, handling 16k size pages requires to have page tables
>> with 4 identical entries.
> 
> I assume that hugetlb page size? If so isn't that similar to FSL hugetlb
> page table layout?

No, this is not about a 16k hugepage size on top of a standard 4k page 
size.

Here I'm trying to handle the case of CONFIG_PPC_16K_PAGES.
As of today, it is implemented using the standard Linux page layout, 
i.e. one PTE entry for each 16k page. This forbids the use of the 8xx 
HW assistance.

> 
>>
>> Initially I was thinking about handling this by simply modifying
>> pte_index() which changing pte_t type in order to have one entry every
>> 16 bytes, then replicate the PTE value at *ptep, *ptep+1,*ptep+2 and
>> *ptep+3 both in set_pte_at() and pte_update().
>>
>> However, this doesn't work because many many places in the mm core part
>> of the kernel use loops on ptep with single ptep++ increment.
>>
>> Therefore did it with the following hack:
>>
>>    /* PTE level */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
>> +#else
>>    typedef struct { pte_basic_t pte; } pte_t;
>> +#endif
>>
>> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>>           : "cc" );
>>    #else /* PTE_ATOMIC_UPDATES */
>>           unsigned long old = pte_val(*p);
>> -       *p = __pte((old & ~clr) | set);
>> +       unsigned long new = (old & ~clr) | set;
>> +
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
>> +#else
>> +       *p = __pte(new);
>> +#endif
>>    #endif /* !PTE_ATOMIC_UPDATES */
>>
>>    #ifdef CONFIG_44x
>>
>>
>> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
>> *mm, unsigned long addr,
>>           /* Anything else just stores the PTE normally. That covers all
>> 64-bit
>>            * cases, and 32-bit non-hash with 32-bit PTEs.
>>            */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
>> +#else
>>           *ptep = pte;
>> +#endif
>>
>>
>>
>> But I'm not too happy with it as it means pte_t is not a single type
>> anymore so passing it from one function to the other is quite heavy.
>>
>>
>> Would someone have an idea of an elegent way to handle that ?
>>
>> Thanks
>> Christophe
> 
> Why would pte_update bother about updating all the 4 entries?. Can you
> help me understand the issue?

Because the 8xx HW assistance expects 4 identical entries for each 16k 
page, so every time a PTE is updated all 4 entries have to be updated.

Christophe


* Re: How to handle PTE tables with non contiguous entries ?
@ 2018-09-17  9:47     ` Christophe LEROY
  0 siblings, 0 replies; 15+ messages in thread
From: Christophe LEROY @ 2018-09-17  9:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V, akpm, linux-mm, aneesh.kumar, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev
  Cc: LKML



Le 17/09/2018 A  11:03, Aneesh Kumar K.V a A(C)critA :
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> Hi,
>>
>> I'm having a hard time figuring out the best way to handle the following
>> situation:
>>
>> On the powerpc8xx, handling 16k size pages requires to have page tables
>> with 4 identical entries.
> 
> I assume that hugetlb page size? If so isn't that similar to FSL hugetlb
> page table layout?

No, it is not for 16k hugepage size with a standard page size of 4k.

Here I'm trying to handle the case of CONFIG_PPC_16K_PAGES.
As of today, it is implemented by using the standard Linux page layout, 
ie one PTE entry for each 16k page. This forbids the use the 8xx HW 
assistance.

> 
>>
>> Initially I was thinking about handling this by simply modifying
>> pte_index() which changing pte_t type in order to have one entry every
>> 16 bytes, then replicate the PTE value at *ptep, *ptep+1,*ptep+2 and
>> *ptep+3 both in set_pte_at() and pte_update().
>>
>> However, this doesn't work because many many places in the mm core part
>> of the kernel use loops on ptep with single ptep++ increment.
>>
>> Therefore did it with the following hack:
>>
>>    /* PTE level */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
>> +#else
>>    typedef struct { pte_basic_t pte; } pte_t;
>> +#endif
>>
>> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>>           : "cc" );
>>    #else /* PTE_ATOMIC_UPDATES */
>>           unsigned long old = pte_val(*p);
>> -       *p = __pte((old & ~clr) | set);
>> +       unsigned long new = (old & ~clr) | set;
>> +
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
>> +#else
>> +       *p = __pte(new);
>> +#endif
>>    #endif /* !PTE_ATOMIC_UPDATES */
>>
>>    #ifdef CONFIG_44x
>>
>>
>> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
>> *mm, unsigned long addr,
>>           /* Anything else just stores the PTE normally. That covers all
>> 64-bit
>>            * cases, and 32-bit non-hash with 32-bit PTEs.
>>            */
>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
>> +#else
>>           *ptep = pte;
>> +#endif
>>
>>
>>
>> But I'm not too happy with it as it means pte_t is not a single type
>> anymore so passing it from one function to the other is quite heavy.
>>
>>
>> Would someone have an idea of an elegent way to handle that ?
>>
>> Thanks
>> Christophe
> 
> Why would pte_update bother about updating all the 4 entries?. Can you
> help me understand the issue?

Because the 8xx HW assistance expects 4 identical entries for each 16k 
page, so everytime a PTE is updated the 4 entries have to be updated.

Christophe


* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-17  9:47     ` Christophe LEROY
@ 2018-09-18 11:47       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-18 11:47 UTC (permalink / raw)
  To: Christophe LEROY, akpm, linux-mm, aneesh.kumar, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev
  Cc: LKML

Christophe LEROY <christophe.leroy@c-s.fr> writes:

> Le 17/09/2018 à 11:03, Aneesh Kumar K.V a écrit :
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> 
>>> Hi,
>>>
>>> I'm having a hard time figuring out the best way to handle the following
>>> situation:
>>>
>>> On the powerpc 8xx, handling 16k-size pages requires page tables
>>> with 4 identical entries.
>> 
>> I assume that is a hugetlb page size? If so, isn't that similar to the
>> FSL hugetlb page table layout?
>
> No, it is not about a 16k hugepage size with a standard page size of 4k.
>
> Here I'm trying to handle the case of CONFIG_PPC_16K_PAGES.
> As of today, it is implemented by using the standard Linux page layout,
> ie one PTE entry for each 16k page. This forbids the use of the 8xx HW
> assistance.
>
>> 
>>>
>>> Initially I was thinking about handling this by simply modifying
>>> pte_index() and changing the pte_t type in order to have one entry every
>>> 16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and
>>> *ptep+3 both in set_pte_at() and pte_update().
>>>
>>> However, this doesn't work because many places in the mm core part
>>> of the kernel use loops on ptep with a single ptep++ increment.
>>>
>>> Therefore I did it with the following hack:
>>>
>>>    /* PTE level */
>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
>>> +#else
>>>    typedef struct { pte_basic_t pte; } pte_t;
>>> +#endif
>>>
>>> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>>>           : "cc" );
>>>    #else /* PTE_ATOMIC_UPDATES */
>>>           unsigned long old = pte_val(*p);
>>> -       *p = __pte((old & ~clr) | set);
>>> +       unsigned long new = (old & ~clr) | set;
>>> +
>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
>>> +#else
>>> +       *p = __pte(new);
>>> +#endif
>>>    #endif /* !PTE_ATOMIC_UPDATES */
>>>
>>>    #ifdef CONFIG_44x
>>>
>>>
>>> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
>>> *mm, unsigned long addr,
>>>           /* Anything else just stores the PTE normally. That covers all
>>> 64-bit
>>>            * cases, and 32-bit non-hash with 32-bit PTEs.
>>>            */
>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
>>> +#else
>>>           *ptep = pte;
>>> +#endif
>>>
>>>
>>>
>>> But I'm not too happy with it, as it means pte_t is no longer a single
>>> type, so passing it from one function to another is quite heavy.
>>>
>>>
>>> Would someone have an idea of an elegant way to handle that?
>>>
>>> Thanks
>>> Christophe
>> 
>> Why would pte_update bother with updating all 4 entries? Can you
>> help me understand the issue?
>
> Because the 8xx HW assistance expects 4 identical entries for each 16k
> page, so every time a PTE is updated the 4 entries have to be updated.
>

What you suggested in the original mail is what matches that best, isn't it?
That is, a Linux PTE update involves updating 4 slots; hence a Linux PTE
consists of 4 unsigned longs?

-aneesh




* Re: How to handle PTE tables with non contiguous entries ?
  2018-09-18 11:47       ` Aneesh Kumar K.V
@ 2018-09-18 11:53         ` Christophe LEROY
  -1 siblings, 0 replies; 15+ messages in thread
From: Christophe LEROY @ 2018-09-18 11:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V, akpm, linux-mm, aneesh.kumar, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev
  Cc: LKML



Le 18/09/2018 à 13:47, Aneesh Kumar K.V a écrit :
> Christophe LEROY <christophe.leroy@c-s.fr> writes:
> 
>> Le 17/09/2018 à 11:03, Aneesh Kumar K.V a écrit :
>>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>>
>>>> Hi,
>>>>
>>>> I'm having a hard time figuring out the best way to handle the following
>>>> situation:
>>>>
>>>> On the powerpc 8xx, handling 16k-size pages requires page tables
>>>> with 4 identical entries.
>>>
>>> I assume that is a hugetlb page size? If so, isn't that similar to the
>>> FSL hugetlb page table layout?
>>
>> No, it is not about a 16k hugepage size with a standard page size of 4k.
>>
>> Here I'm trying to handle the case of CONFIG_PPC_16K_PAGES.
>> As of today, it is implemented by using the standard Linux page layout,
>> ie one PTE entry for each 16k page. This forbids the use of the 8xx HW
>> assistance.
>>
>>>
>>>>
>>>> Initially I was thinking about handling this by simply modifying
>>>> pte_index() and changing the pte_t type in order to have one entry every
>>>> 16 bytes, then replicating the PTE value at *ptep, *ptep+1, *ptep+2 and
>>>> *ptep+3 both in set_pte_at() and pte_update().
>>>>
>>>> However, this doesn't work because many places in the mm core part
>>>> of the kernel use loops on ptep with a single ptep++ increment.
>>>>
>>>> Therefore I did it with the following hack:
>>>>
>>>>     /* PTE level */
>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>> +typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
>>>> +#else
>>>>     typedef struct { pte_basic_t pte; } pte_t;
>>>> +#endif
>>>>
>>>> @@ -181,7 +192,13 @@ static inline unsigned long pte_update(pte_t *p,
>>>>            : "cc" );
>>>>     #else /* PTE_ATOMIC_UPDATES */
>>>>            unsigned long old = pte_val(*p);
>>>> -       *p = __pte((old & ~clr) | set);
>>>> +       unsigned long new = (old & ~clr) | set;
>>>> +
>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>> +       p->pte = p->pte1 = p->pte2 = p->pte3 = new;
>>>> +#else
>>>> +       *p = __pte(new);
>>>> +#endif
>>>>     #endif /* !PTE_ATOMIC_UPDATES */
>>>>
>>>>     #ifdef CONFIG_44x
>>>>
>>>>
>>>> @@ -161,7 +161,11 @@ static inline void __set_pte_at(struct mm_struct
>>>> *mm, unsigned long addr,
>>>>            /* Anything else just stores the PTE normally. That covers all
>>>> 64-bit
>>>>             * cases, and 32-bit non-hash with 32-bit PTEs.
>>>>             */
>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>> +       ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
>>>> +#else
>>>>            *ptep = pte;
>>>> +#endif
>>>>
>>>>
>>>>
>>>> But I'm not too happy with it, as it means pte_t is no longer a single
>>>> type, so passing it from one function to another is quite heavy.
>>>>
>>>>
>>>> Would someone have an idea of an elegant way to handle that?
>>>>
>>>> Thanks
>>>> Christophe
>>>
>>> Why would pte_update bother with updating all 4 entries? Can you
>>> help me understand the issue?
>>
>> Because the 8xx HW assistance expects 4 identical entries for each 16k
>> page, so every time a PTE is updated the 4 entries have to be updated.
>>
> 
> What you suggested in the original mail is what matches that best, isn't it?
> That is, a Linux PTE update involves updating 4 slots; hence a Linux PTE
> consists of 4 unsigned longs?
> 

Yes indeed.
It seems hopeless to avoid carrying the 4 longs from one function to the
other, although it's four times the same value.

Christophe



end of thread, other threads:[~2018-09-18 11:53 UTC | newest]

Thread overview: 15+ messages
-- links below jump to the message on this page --
2018-09-10 14:34 How to handle PTE tables with non contiguous entries ? Christophe Leroy
2018-09-10 20:05 ` Dan Malek
2018-09-11  5:28   ` Christophe LEROY
2018-09-10 21:06 ` Nicholas Piggin
2018-09-11  5:39   ` Christophe LEROY
2018-09-17  9:03 ` Aneesh Kumar K.V
2018-09-17  9:47   ` Christophe LEROY
2018-09-18 11:47     ` Aneesh Kumar K.V
2018-09-18 11:53       ` Christophe LEROY
