All of lore.kernel.org
* [Xenomai] ipipe x86_64 huge page ioremap
@ 2016-01-14 17:34 Henning Schild
  2016-01-15  8:32 ` Philippe Gerum
  2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
  0 siblings, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-01-14 17:34 UTC (permalink / raw)
  To: Xenomai; +Cc: Gilles Chanteperdrix

Hey,

The 4.1 kernel supports mapping I/O memory using huge pages; see commits
0f616be120c632c818faaea9adcb8f05a7a8601f ..
6b6378355b925050eb6fa966742d8c2d65ff0d83

In ipipe, memory that gets ioremapped is pinned using
__ipipe_pin_mapping_globally. However, in the x86_64 case that function
uses vmalloc_sync_one, which must only be used on 4k pages.

We found the problem when running the kernel in a VirtualBox VM, where
the paravirtualized PCI device has enough iomem to cause huge page
mappings. Loading the device driver then triggers a BUG in
__ipipe_pin_mapping_globally.

I will work on a fix for the problem. But I would also like to
understand the initial purpose of the pinning. Is it even supposed to
work for I/O memory as well? It looks like a way to commit address space
changes right down into the page tables, to avoid page faults in the
kernel address space. Probably for more predictable timing ...
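For illustration, the relevant walk can be modeled as a toy in plain C
(simplified assumptions only - the struct, names, and return codes are
made up, not the real fault.c code). The point is that a large
(huge-page) entry is a leaf: the walk has to stop there instead of
descending to the next level:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a page-table walk, NOT the real mm code: an entry
 * either points to a next-level table or is a "large" leaf (a huge
 * page), in which case there is no lower level to descend into. */
struct entry {
	bool present;
	bool large;          /* huge-page leaf at this level */
	struct entry *next;  /* next-level table, only valid if !large */
};

/* Returns the depth reached: 1 = stopped at the pud (1 GiB page),
 * 2 = stopped at the pmd (2 MiB page), 3 = reached the 4k ptes.
 * Returns -1 when the walk would descend below a large entry,
 * which models the BUG the unpatched pinning code runs into. */
int sync_walk(const struct entry *pud, bool check_large)
{
	if (!pud->present)
		return 0;
	if (pud->large)
		return check_large ? 1 : -1;
	const struct entry *pmd = pud->next;
	if (!pmd->present)
		return 1;
	if (pmd->large)
		return check_large ? 2 : -1;
	return 3;
}
```

With the large-page check enabled the walk terminates cleanly at the
huge entry, which is the behavior a fix would need to add at the pud
and pmd levels.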

regards,
Henning


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-01-14 17:34 [Xenomai] ipipe x86_64 huge page ioremap Henning Schild
@ 2016-01-15  8:32 ` Philippe Gerum
  2016-01-15 12:34   ` Jan Kiszka
  2016-02-02 17:43   ` Henning Schild
  2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
  1 sibling, 2 replies; 37+ messages in thread
From: Philippe Gerum @ 2016-01-15  8:32 UTC (permalink / raw)
  To: Henning Schild, Xenomai; +Cc: Gilles Chanteperdrix

On 01/14/2016 06:34 PM, Henning Schild wrote:
> Hey,
> 
> the 4.1 kernel supports mapping IO memory using huge pages.
> 0f616be120c632c818faaea9adcb8f05a7a8601f ..
> 6b6378355b925050eb6fa966742d8c2d65ff0d83
> 
> In ipipe memory that gets ioremapped will get pinned using
> __ipipe_pin_mapping_globally, however in the x86_64 case that function
> uses vmalloc_sync_one which must only be used on 4k pages.
> 
> We found the problem when using the kernel in a VBox VM, where the
> paravirtualized PCI device has enough iomem to cause huge page
> mappings. When loading the device driver you will get a BUG caused by
> __ipipe_pin_mapping_globally.
> 
> I will work on a fix for the problem. But i would also like to
> understand the initial purpose of the pinning. Is it even supposed to
> work for io memory as well? It looks like a way to commit address space
> changes right down into the page tables, to avoid page-faults in the
> kernel address space. Probably for more predictable timing ...
> 

This is for pinning the page table entries referencing kernel mappings,
so that we don't get minor faults when treading over kernel memory -
unless the fault fixup code were both compatible with primary domain
execution and cheaper than tracking the pgds.
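A rough sketch of the idea (hypothetical names, plain C, not the actual
I-pipe code): instead of letting each process's top-level table be
fixed up lazily at fault time, the pinning eagerly syncs the kernel
entry into all of them up front, so the real-time domain never takes
the minor fault:

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC  4   /* toy "processes" */
#define NSLOTS 8   /* toy top-level table entries */

/* One reference table holds the kernel mappings; each process has
 * its own copy of the top-level table that may lag behind it. */
bool init_ref[NSLOTS];
bool proc_tables[NPROC][NSLOTS];
int minor_faults;

/* Lazy scheme: a missing entry is fixed up at fault time. */
bool access_slot(int proc, int slot)
{
	if (!proc_tables[proc][slot]) {
		minor_faults++;  /* this would be a page fault */
		proc_tables[proc][slot] = init_ref[slot];
	}
	return proc_tables[proc][slot];
}

/* Eager pinning: sync the reference entry into every table up
 * front, so no process faults on this range later. */
void pin_mapping_globally(int slot)
{
	for (int p = 0; p < NPROC; p++)
		proc_tables[p][slot] = init_ref[slot];
}
```

The lazy fixup is what mainline tolerates; a dual-kernel setup pins
eagerly because taking that fault from the primary domain is either
unsafe or too expensive.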

-- 
Philippe.



* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-01-15  8:32 ` Philippe Gerum
@ 2016-01-15 12:34   ` Jan Kiszka
  2016-02-02 17:43   ` Henning Schild
  1 sibling, 0 replies; 37+ messages in thread
From: Jan Kiszka @ 2016-01-15 12:34 UTC (permalink / raw)
  To: Philippe Gerum, Henning Schild, Xenomai; +Cc: Gilles Chanteperdrix

On 2016-01-15 09:32, Philippe Gerum wrote:
> On 01/14/2016 06:34 PM, Henning Schild wrote:
>> Hey,
>>
>> the 4.1 kernel supports mapping IO memory using huge pages.
>> 0f616be120c632c818faaea9adcb8f05a7a8601f ..
>> 6b6378355b925050eb6fa966742d8c2d65ff0d83
>>
>> In ipipe memory that gets ioremapped will get pinned using
>> __ipipe_pin_mapping_globally, however in the x86_64 case that function
>> uses vmalloc_sync_one which must only be used on 4k pages.
>>
>> We found the problem when using the kernel in a VBox VM, where the
>> paravirtualized PCI device has enough iomem to cause huge page
>> mappings. When loading the device driver you will get a BUG caused by
>> __ipipe_pin_mapping_globally.
>>
>> I will work on a fix for the problem. But i would also like to
>> understand the initial purpose of the pinning. Is it even supposed to
>> work for io memory as well? It looks like a way to commit address space
>> changes right down into the page tables, to avoid page-faults in the
>> kernel address space. Probably for more predictable timing ...
>>
> 
> This is for pinning the page table entries referencing kernel mappings,
> so that we don't get minor faults when treading over kernel memory,
> unless the fault fixup code is compatible with primary domain execution,
> and cheaper than tracking the pgds.

I suppose a critical scenario would already be a real-time driver that
accesses I/O regions in its interrupt handler or in the context of some
real-time task.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-14 17:34 [Xenomai] ipipe x86_64 huge page ioremap Henning Schild
  2016-01-15  8:32 ` Philippe Gerum
@ 2016-01-26 15:20 ` Henning Schild
  2016-01-26 20:18   ` Jan Kiszka
  2016-01-27 13:41   ` [Xenomai] [PATCH v2] " Henning Schild
  1 sibling, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-01-26 15:20 UTC (permalink / raw)
  To: Xenomai

In 4.1, huge page mapping of I/O memory was introduced; enable ipipe to
handle that when pinning kernel memory.

Change that introduced the feature:
0f616be120c632c818faaea9adcb8f05a7a8601f

Signed-off-by: Henning Schild <henning.schild@siemens.com>
---
 arch/x86/mm/fault.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd5bbcc..5585610 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pud_k = pud_offset(pgd_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pud_large(*pud))
+		return pud_k;
+#endif
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
 	if (!pmd_present(*pmd_k))
 		return NULL;
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pmd_large(*pmd))
+		return pmd_k;
+#endif
 
 	if (!pmd_present(*pmd))
 		set_pmd(pmd, *pmd_k);
@@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
 		BUG();
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pud_large(*pud))
+		return 0;
+#endif
 
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
@@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
 		BUG();
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pmd_large(*pmd))
+		return 0;
+#endif
 
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
-- 
2.4.10




* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
@ 2016-01-26 20:18   ` Jan Kiszka
  2016-01-27  9:54     ` Henning Schild
  2016-01-27 13:41   ` [Xenomai] [PATCH v2] " Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2016-01-26 20:18 UTC (permalink / raw)
  To: Henning Schild, Xenomai

On 2016-01-26 16:20, Henning Schild wrote:
> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> handle that when pinning kernel memory.
> 
> change that introduced the feature
> 0f616be120c632c818faaea9adcb8f05a7a8601f
> 
> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fd5bbcc..5585610 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  	pud_k = pud_offset(pgd_k, address);
>  	if (!pud_present(*pud_k))
>  		return NULL;
> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> +	if (pud_large(*pud))
> +		return pud_k;
> +#endif
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_k = pmd_offset(pud_k, address);
>  	if (!pmd_present(*pmd_k))
>  		return NULL;
> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> +	if (pmd_large(*pmd))
> +		return pmd_k;
> +#endif
>  
>  	if (!pmd_present(*pmd))
>  		set_pmd(pmd, *pmd_k);
> @@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
>  		BUG();
> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> +	if (pud_large(*pud))
> +		return 0;
> +#endif
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_ref = pmd_offset(pud_ref, address);
> @@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>  		BUG();
> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> +	if (pmd_large(*pmd))
> +		return 0;
> +#endif
>  
>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>  	if (!pte_present(*pte_ref))
> 

Do we need the #ifdefs? If not, maybe just leave comments to establish
the relationship to I-pipe.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-26 20:18   ` Jan Kiszka
@ 2016-01-27  9:54     ` Henning Schild
  2016-01-27 10:31       ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Henning Schild @ 2016-01-27  9:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On Tue, 26 Jan 2016 21:18:53 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2016-01-26 16:20, Henning Schild wrote:
> > In 4.1 huge page mapping of io memory was introduced, enable ipipe
> > to handle that when pinning kernel memory.
> > 
> > change that introduced the feature
> > 0f616be120c632c818faaea9adcb8f05a7a8601f
> > 
> > Signed-off-by: Henning Schild <henning.schild@siemens.com>
> > ---
> >  arch/x86/mm/fault.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fd5bbcc..5585610 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
> > *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
> >  	if (!pud_present(*pud_k))
> >  		return NULL;
> > +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> > +	if (pud_large(*pud))
> > +		return pud_k;
> > +#endif
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_k = pmd_offset(pud_k, address);
> >  	if (!pmd_present(*pmd_k))
> >  		return NULL;
> > +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> > +	if (pmd_large(*pmd))
> > +		return pmd_k;
> > +#endif
> >  
> >  	if (!pmd_present(*pmd))
> >  		set_pmd(pmd, *pmd_k);
> > @@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> > unsigned long address) 
> >  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> > pud_page_vaddr(*pud_ref)) BUG();
> > +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> > +	if (pud_large(*pud))
> > +		return 0;
> > +#endif
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_ref = pmd_offset(pud_ref, address);
> > @@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> > unsigned long address) 
> >  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
> >  		BUG();
> > +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
> > +	if (pmd_large(*pmd))
> > +		return 0;
> > +#endif
> >  
> >  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >  	if (!pte_present(*pte_ref))
> >   
> 
> Do we need the #ifdefs? If not, maybe just leave comments to establish
> the relationship to I-pipe.

We do not; without ipipe the function is never called from a
context where that check is required. If it were, this change or
something similar would be in mainline.
I see the ifdefs as a kind of comment that establishes the relationship
to ipipe and the huge_vmap feature. Plus, in all other cases the
pointless check is not even compiled in.
If comments should be used instead of ifdefs, I will change that.

Henning



* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-27  9:54     ` Henning Schild
@ 2016-01-27 10:31       ` Jan Kiszka
  2016-01-27 10:44         ` Philippe Gerum
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2016-01-27 10:31 UTC (permalink / raw)
  To: Henning Schild, Philippe Gerum; +Cc: Xenomai

On 2016-01-27 10:54, Henning Schild wrote:
> On Tue, 26 Jan 2016 21:18:53 +0100
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>> On 2016-01-26 16:20, Henning Schild wrote:
>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
>>> to handle that when pinning kernel memory.
>>>
>>> change that introduced the feature
>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>
>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>> ---
>>>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>> index fd5bbcc..5585610 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
>>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
>>>  	if (!pud_present(*pud_k))
>>>  		return NULL;
>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>> +	if (pud_large(*pud))
>>> +		return pud_k;
>>> +#endif
>>>  
>>>  	pmd = pmd_offset(pud, address);
>>>  	pmd_k = pmd_offset(pud_k, address);
>>>  	if (!pmd_present(*pmd_k))
>>>  		return NULL;
>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>> +	if (pmd_large(*pmd))
>>> +		return pmd_k;
>>> +#endif
>>>  
>>>  	if (!pmd_present(*pmd))
>>>  		set_pmd(pmd, *pmd_k);
>>> @@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>> unsigned long address) 
>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>> pud_page_vaddr(*pud_ref)) BUG();
>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>> +	if (pud_large(*pud))
>>> +		return 0;
>>> +#endif
>>>  
>>>  	pmd = pmd_offset(pud, address);
>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>> @@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>> unsigned long address) 
>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>>>  		BUG();
>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>> +	if (pmd_large(*pmd))
>>> +		return 0;
>>> +#endif
>>>  
>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>  	if (!pte_present(*pte_ref))
>>>   
>>
>> Do we need the #ifdefs? If not, maybe just leave comments to establish
>> the relationship to I-pipe.
> 
> We do not, without ipipe the function is never called from a
> context where that check is required. If it was, this change or
> something similar would be in mainline.
> I see the ifdefs as kind of comments that establish the relationship to
> ipipe and the huge_vmap feature. Plus in all other cases a pointless
> check is not even compiled in.
> If comments should be used instead of ifdefs i will change that.

The optimal way of documenting the motivation for these changes is in a
changelog. Provided the patch will survive future I-pipe versions and
not be folded into a bigger patch (Philippe, please confirm), let's go
for that. Plan B is inline comments. But #ifdefs for documentation
purposes are not desirable.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-27 10:31       ` Jan Kiszka
@ 2016-01-27 10:44         ` Philippe Gerum
  2016-01-27 10:46           ` Philippe Gerum
  0 siblings, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2016-01-27 10:44 UTC (permalink / raw)
  To: Jan Kiszka, Henning Schild; +Cc: Xenomai

On 01/27/2016 11:31 AM, Jan Kiszka wrote:
> On 2016-01-27 10:54, Henning Schild wrote:
>> On Tue, 26 Jan 2016 21:18:53 +0100
>> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>
>>> On 2016-01-26 16:20, Henning Schild wrote:
>>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
>>>> to handle that when pinning kernel memory.
>>>>
>>>> change that introduced the feature
>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>>
>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>>> ---
>>>>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>>>>  1 file changed, 16 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>>> index fd5bbcc..5585610 100644
>>>> --- a/arch/x86/mm/fault.c
>>>> +++ b/arch/x86/mm/fault.c
>>>> @@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
>>>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
>>>>  	if (!pud_present(*pud_k))
>>>>  		return NULL;
>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>> +	if (pud_large(*pud))
>>>> +		return pud_k;
>>>> +#endif
>>>>  
>>>>  	pmd = pmd_offset(pud, address);
>>>>  	pmd_k = pmd_offset(pud_k, address);
>>>>  	if (!pmd_present(*pmd_k))
>>>>  		return NULL;
>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>> +	if (pmd_large(*pmd))
>>>> +		return pmd_k;
>>>> +#endif
>>>>  
>>>>  	if (!pmd_present(*pmd))
>>>>  		set_pmd(pmd, *pmd_k);
>>>> @@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>>> unsigned long address) 
>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>>> pud_page_vaddr(*pud_ref)) BUG();
>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>> +	if (pud_large(*pud))
>>>> +		return 0;
>>>> +#endif
>>>>  
>>>>  	pmd = pmd_offset(pud, address);
>>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>>> @@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>>> unsigned long address) 
>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>>>>  		BUG();
>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>> +	if (pmd_large(*pmd))
>>>> +		return 0;
>>>> +#endif
>>>>  
>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>>  	if (!pte_present(*pte_ref))
>>>>   
>>>
>>> Do we need the #ifdefs? If not, maybe just leave comments to establish
>>> the relationship to I-pipe.
>>
>> We do not, without ipipe the function is never called from a
>> context where that check is required. If it was, this change or
>> something similar would be in mainline.
>> I see the ifdefs as kind of comments that establish the relationship to
>> ipipe and the huge_vmap feature. Plus in all other cases a pointless
>> check is not even compiled in.
>> If comments should be used instead of ifdefs i will change that.
> 
> The optimal way of documenting the motivation for these changes is in a
> changelog. Provided the patch will survive future I-pipe versions and
> not be folded into a bigger patch (Philippe, please confirm),

In the current workflow, only raw/* branches may be rebased, so folding
patches within a maintenance branch can't happen.

> let's go
> for that. Plan B is inline comments. But #ifdefs for documentation
> purposes are not desirable.
> 

Documentation is best when close to the documented code, so to me plan B
is actually plan A, although I agree that complex/tricky/non-obvious
matters addressed by any given patch should be described accurately in
its changelog.

-- 
Philippe.



* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-27 10:44         ` Philippe Gerum
@ 2016-01-27 10:46           ` Philippe Gerum
  0 siblings, 0 replies; 37+ messages in thread
From: Philippe Gerum @ 2016-01-27 10:46 UTC (permalink / raw)
  To: Jan Kiszka, Henning Schild; +Cc: Xenomai

On 01/27/2016 11:44 AM, Philippe Gerum wrote:
> On 01/27/2016 11:31 AM, Jan Kiszka wrote:
>> On 2016-01-27 10:54, Henning Schild wrote:
>>> On Tue, 26 Jan 2016 21:18:53 +0100
>>> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>
>>>> On 2016-01-26 16:20, Henning Schild wrote:
>>>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
>>>>> to handle that when pinning kernel memory.
>>>>>
>>>>> change that introduced the feature
>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>>>
>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>>>> ---
>>>>>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>>>>>  1 file changed, 16 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>>>> index fd5bbcc..5585610 100644
>>>>> --- a/arch/x86/mm/fault.c
>>>>> +++ b/arch/x86/mm/fault.c
>>>>> @@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
>>>>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
>>>>>  	if (!pud_present(*pud_k))
>>>>>  		return NULL;
>>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>>> +	if (pud_large(*pud))
>>>>> +		return pud_k;
>>>>> +#endif
>>>>>  
>>>>>  	pmd = pmd_offset(pud, address);
>>>>>  	pmd_k = pmd_offset(pud_k, address);
>>>>>  	if (!pmd_present(*pmd_k))
>>>>>  		return NULL;
>>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>>> +	if (pmd_large(*pmd))
>>>>> +		return pmd_k;
>>>>> +#endif
>>>>>  
>>>>>  	if (!pmd_present(*pmd))
>>>>>  		set_pmd(pmd, *pmd_k);
>>>>> @@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>>>> unsigned long address) 
>>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>>>> pud_page_vaddr(*pud_ref)) BUG();
>>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>>> +	if (pud_large(*pud))
>>>>> +		return 0;
>>>>> +#endif
>>>>>  
>>>>>  	pmd = pmd_offset(pud, address);
>>>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>>>> @@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>>>> unsigned long address) 
>>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>>>>>  		BUG();
>>>>> +#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
>>>>> +	if (pmd_large(*pmd))
>>>>> +		return 0;
>>>>> +#endif
>>>>>  
>>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>>>  	if (!pte_present(*pte_ref))
>>>>>   
>>>>
>>>> Do we need the #ifdefs? If not, maybe just leave comments to establish
>>>> the relationship to I-pipe.
>>>
>>> We do not, without ipipe the function is never called from a
>>> context where that check is required. If it was, this change or
>>> something similar would be in mainline.
>>> I see the ifdefs as kind of comments that establish the relationship to
>>> ipipe and the huge_vmap feature. Plus in all other cases a pointless
>>> check is not even compiled in.
>>> If comments should be used instead of ifdefs i will change that.
>>
>> The optimal way of documenting the motivation for these changes is in a
>> changelog. Provided the patch will survive future I-pipe versions and
>> not be folded into a bigger patch (Philippe, please confirm),
> 
> In the current workflow, only raw/* branches may be rebased, so folding
> patches within a maintenance branch can't happen.
> 
>> let's go
>> for that. Plan B is inline comments. But #ifdefs for documentation
>> purposes are not desirable.
>>
> 
> Documentation is best when close to the documented code, so to me plan B
> is actually plan A, although I agree that complex/tricky/non-obvious
> matters addressed by any given patch should be described accurately in
> its changelog.
> 

This said, I agree that #ifdefery should be reduced by factoring out
code when possible.
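For instance (a sketch only, with stub types so it compiles standalone;
the kernel's real pud_t/pmd_t and config machinery differ), the
repeated #if blocks could be factored into one helper whose call sites
stay #ifdef-free:

```c
#include <stdbool.h>

/* Stub stand-ins for the kernel's types (sketch only). */
typedef struct { unsigned long v; } pmd_t;
#define PSE_BIT (1UL << 7)           /* "large page" bit, toy value */
static bool pmd_large(pmd_t p) { return p.v & PSE_BIT; }

#define CONFIG_IPIPE 1               /* pretend both options are on */
#define CONFIG_HAVE_ARCH_HUGE_VMAP 1

/* The #ifdef lives in exactly one place; vmalloc_sync_one() would
 * just call pin_stop_at_pmd() and return early when it says so. */
#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
static inline bool pin_stop_at_pmd(pmd_t *pmd)
{
	return pmd_large(*pmd);
}
#else
static inline bool pin_stop_at_pmd(pmd_t *pmd)
{
	(void)pmd;
	return false;    /* compiled out: plain kernels never stop here */
}
#endif
```

A matching helper for the pud level would work the same way, keeping
the documentation value of the config symbols without sprinkling
conditionals through the walk.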

-- 
Philippe.



* [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
  2016-01-26 20:18   ` Jan Kiszka
@ 2016-01-27 13:41   ` Henning Schild
  2016-01-28 10:53     ` Philippe Gerum
  2016-02-03 12:59     ` [Xenomai] [PATCH v3] " Henning Schild
  1 sibling, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-01-27 13:41 UTC (permalink / raw)
  To: Xenomai; +Cc: Jan Kiszka

In 4.1, huge page mapping of I/O memory was introduced; enable ipipe to
handle that when pinning kernel memory.

Change that introduced the feature:
0f616be120c632c818faaea9adcb8f05a7a8601f

Signed-off-by: Henning Schild <henning.schild@siemens.com>
---
 arch/x86/mm/fault.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd5bbcc..ca1e75b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pud_k = pud_offset(pgd_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
+	if (pud_large(*pud))
+		return pud_k;
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
 	if (!pmd_present(*pmd_k))
 		return NULL;
+	if (pmd_large(*pmd))
+		return pmd_k;
 
 	if (!pmd_present(*pmd))
 		set_pmd(pmd, *pmd_k);
@@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
 		BUG();
+	if (pud_large(*pud))
+		return 0;
 
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
@@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
 		BUG();
+	if (pmd_large(*pmd))
+		return 0;
 
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
-- 
2.4.10




* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-27 13:41   ` [Xenomai] [PATCH v2] " Henning Schild
@ 2016-01-28 10:53     ` Philippe Gerum
  2016-01-28 20:53       ` Henning Schild
  2016-02-03 12:59     ` [Xenomai] [PATCH v3] " Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2016-01-28 10:53 UTC (permalink / raw)
  To: Henning Schild, Xenomai; +Cc: Jan Kiszka

On 01/27/2016 02:41 PM, Henning Schild wrote:
> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> handle that when pinning kernel memory.
> 
> change that introduced the feature
> 0f616be120c632c818faaea9adcb8f05a7a8601f
> 
> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> ---
>  arch/x86/mm/fault.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fd5bbcc..ca1e75b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  	pud_k = pud_offset(pgd_k, address);
>  	if (!pud_present(*pud_k))
>  		return NULL;
> +	if (pud_large(*pud))
> +		return pud_k;
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_k = pmd_offset(pud_k, address);
>  	if (!pmd_present(*pmd_k))
>  		return NULL;
> +	if (pmd_large(*pmd))
> +		return pmd_k;
>  
>  	if (!pmd_present(*pmd))
>  		set_pmd(pmd, *pmd_k);
> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
>  		BUG();
> +	if (pud_large(*pud))
> +		return 0;
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_ref = pmd_offset(pud_ref, address);
> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>  		BUG();
> +	if (pmd_large(*pmd))
> +		return 0;
>  
>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>  	if (!pte_present(*pte_ref))
> 

I'm confused. Assuming the purpose of that patch is to exclude huge I/O
mappings from pte pinning, why do the changes to the x86_32 version of
the vmalloc_sync_one() helper actually prevent such pinning, while the
x86_64 version does not?

Could you explain the logic behind the changes?

-- 
Philippe.



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-28 10:53     ` Philippe Gerum
@ 2016-01-28 20:53       ` Henning Schild
  2016-01-29 17:11         ` Philippe Gerum
  0 siblings, 1 reply; 37+ messages in thread
From: Henning Schild @ 2016-01-28 20:53 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Xenomai

On Thu, 28 Jan 2016 11:53:08 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 01/27/2016 02:41 PM, Henning Schild wrote:
> > In 4.1 huge page mapping of io memory was introduced, enable ipipe
> > to handle that when pinning kernel memory.
> > 
> > change that introduced the feature
> > 0f616be120c632c818faaea9adcb8f05a7a8601f
> > 
> > Signed-off-by: Henning Schild <henning.schild@siemens.com>
> > ---
> >  arch/x86/mm/fault.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fd5bbcc..ca1e75b 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
> > *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
> >  	if (!pud_present(*pud_k))
> >  		return NULL;
> > +	if (pud_large(*pud))
> > +		return pud_k;
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_k = pmd_offset(pud_k, address);
> >  	if (!pmd_present(*pmd_k))
> >  		return NULL;
> > +	if (pmd_large(*pmd))
> > +		return pmd_k;
> >  
> >  	if (!pmd_present(*pmd))
> >  		set_pmd(pmd, *pmd_k);
> > @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> > unsigned long address) 
> >  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> > pud_page_vaddr(*pud_ref)) BUG();
> > +	if (pud_large(*pud))
> > +		return 0;
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_ref = pmd_offset(pud_ref, address);
> > @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> > unsigned long address) 
> >  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
> >  		BUG();
> > +	if (pmd_large(*pmd))
> > +		return 0;
> >  
> >  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >  	if (!pte_present(*pte_ref))
> >   
> 
> I'm confused. Assuming the purpose of that patch is to exclude huge
> I/O mappings from pte pinning, why does the changes to the x86_32
> version of the vmalloc_sync_one() helper actually prevent such
> pinning, while the x86_64 version does not?

No, the purpose is to include them just like they were before.
Vanilla vmalloc_sync_one just must not be called on huge mappings,
because it can't handle them. The patch is supposed to make the function
return successfully, stopping early when huge pages are detected.

It changes the implementation of both x86_32 and x86_64.

> Could you explain the logic behind the changes?






* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-28 20:53       ` Henning Schild
@ 2016-01-29 17:11         ` Philippe Gerum
  2016-01-29 18:39           ` Gilles Chanteperdrix
  2016-02-02 11:41           ` Henning Schild
  0 siblings, 2 replies; 37+ messages in thread
From: Philippe Gerum @ 2016-01-29 17:11 UTC (permalink / raw)
  To: Henning Schild; +Cc: Jan Kiszka, Xenomai

On 01/28/2016 09:53 PM, Henning Schild wrote:
> On Thu, 28 Jan 2016 11:53:08 +0100
> Philippe Gerum <rpm@xenomai.org> wrote:
> 
>> On 01/27/2016 02:41 PM, Henning Schild wrote:
>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
>>> to handle that when pinning kernel memory.
>>>
>>> change that introduced the feature
>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>
>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>> ---
>>>  arch/x86/mm/fault.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>> index fd5bbcc..ca1e75b 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
>>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
>>>  	if (!pud_present(*pud_k))
>>>  		return NULL;
>>> +	if (pud_large(*pud))
>>> +		return pud_k;
>>>  
>>>  	pmd = pmd_offset(pud, address);
>>>  	pmd_k = pmd_offset(pud_k, address);
>>>  	if (!pmd_present(*pmd_k))
>>>  		return NULL;
>>> +	if (pmd_large(*pmd))
>>> +		return pmd_k;
>>>  
>>>  	if (!pmd_present(*pmd))
>>>  		set_pmd(pmd, *pmd_k);
>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>> unsigned long address) 
>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>> pud_page_vaddr(*pud_ref)) BUG();
>>> +	if (pud_large(*pud))
>>> +		return 0;
>>>  
>>>  	pmd = pmd_offset(pud, address);
>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
>>> unsigned long address) 
>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>>>  		BUG();
>>> +	if (pmd_large(*pmd))
>>> +		return 0;
>>>  
>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>  	if (!pte_present(*pte_ref))
>>>   
>>
>> I'm confused. Assuming the purpose of that patch is to exclude huge
>> I/O mappings from pte pinning, why does the changes to the x86_32
>> version of the vmalloc_sync_one() helper actually prevent such
>> pinning, while the x86_64 version does not?
> 
> No the purpose is to include them just like they were before.
> vanilla vmalloc_sync_one just must not be called on huge mappings
> because it cant handle them. The patch is supposed to make the function
> return successfully, stopping early when huge pages are detected.
> 
> It changes the implementation of both x86_32 and x86_64.
> 

Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ the
pinning, by copying over the kernel mapping, early in the course of the
routine for x86_64, late for x86_32.

Please explain why your changes prevent huge I/O mappings from being
pinned into the current page directory in the x86_32 implementation, but
still allow this to be done in the x86_64 version. The section of code
you patched in the latter case is basically a series of sanity checks
done after the pinning took place, not before.

On a more general note, a better approach would be to filter out calls
to vmalloc_sync_one() for huge pages directly from
__ipipe_pin_mapping_globally().

-- 
Philippe.



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-29 17:11         ` Philippe Gerum
@ 2016-01-29 18:39           ` Gilles Chanteperdrix
  2016-02-02 12:08             ` Henning Schild
  2016-02-02 11:41           ` Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2016-01-29 18:39 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Xenomai, Jan Kiszka

On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:
> On 01/28/2016 09:53 PM, Henning Schild wrote:
> > On Thu, 28 Jan 2016 11:53:08 +0100
> > Philippe Gerum <rpm@xenomai.org> wrote:
> > 
> >> On 01/27/2016 02:41 PM, Henning Schild wrote:
> >>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
> >>> to handle that when pinning kernel memory.
> >>>
> >>> change that introduced the feature
> >>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> >>>
> >>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> >>> ---
> >>>  arch/x86/mm/fault.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >>> index fd5bbcc..ca1e75b 100644
> >>> --- a/arch/x86/mm/fault.c
> >>> +++ b/arch/x86/mm/fault.c
> >>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
> >>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
> >>>  	if (!pud_present(*pud_k))
> >>>  		return NULL;
> >>> +	if (pud_large(*pud))
> >>> +		return pud_k;
> >>>  
> >>>  	pmd = pmd_offset(pud, address);
> >>>  	pmd_k = pmd_offset(pud_k, address);
> >>>  	if (!pmd_present(*pmd_k))
> >>>  		return NULL;
> >>> +	if (pmd_large(*pmd))
> >>> +		return pmd_k;
> >>>  
> >>>  	if (!pmd_present(*pmd))
> >>>  		set_pmd(pmd, *pmd_k);
> >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> >>> unsigned long address) 
> >>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> >>> pud_page_vaddr(*pud_ref)) BUG();
> >>> +	if (pud_large(*pud))
> >>> +		return 0;
> >>>  
> >>>  	pmd = pmd_offset(pud, address);
> >>>  	pmd_ref = pmd_offset(pud_ref, address);
> >>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> >>> unsigned long address) 
> >>>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
> >>>  		BUG();
> >>> +	if (pmd_large(*pmd))
> >>> +		return 0;
> >>>  
> >>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >>>  	if (!pte_present(*pte_ref))
> >>>   
> >>
> >> I'm confused. Assuming the purpose of that patch is to exclude huge
> >> I/O mappings from pte pinning, why does the changes to the x86_32
> >> version of the vmalloc_sync_one() helper actually prevent such
> >> pinning, while the x86_64 version does not?
> > 
> > No the purpose is to include them just like they were before.
> > vanilla vmalloc_sync_one just must not be called on huge mappings
> > because it cant handle them. The patch is supposed to make the function
> > return successfully, stopping early when huge pages are detected.
> > 
> > It changes the implementation of both x86_32 and x86_64.
> > 
> 
> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ the
> pinning, by copying over the kernel mapping, early in the course of the
> routine for x86_64, late for x86_32.
> 
> Please explain why your changes prevent huge I/O mappings from being
> pinned into the current page directory in the x86_32 implementation, but
> still allow this to be done in the x86_64 version. The section of code
> you patched in the latter case is basically a series of sanity checks
> done after the pinning took place, not before.
> 
> On a more general note, a better approach would be to filter out calls
> to vmalloc_sync_one() for huge pages directly from
> __ipipe_pin_mapping_globally().

Since the ioremap/vmalloc range is mapped with huge pages or not
globally (the process mappings are copied from the kernel mapping), we
can probably even avoid the call to __ipipe_pin_mapping for huge page
mappings. After all, since the mainline kernel is able to avoid calls
to vmalloc_sync_one() for huge page mappings, the I-pipe should be
able to do the same.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-29 17:11         ` Philippe Gerum
  2016-01-29 18:39           ` Gilles Chanteperdrix
@ 2016-02-02 11:41           ` Henning Schild
  1 sibling, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-02 11:41 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Xenomai

On Fri, 29 Jan 2016 18:11:07 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 01/28/2016 09:53 PM, Henning Schild wrote:
> > On Thu, 28 Jan 2016 11:53:08 +0100
> > Philippe Gerum <rpm@xenomai.org> wrote:
> >   
> >> On 01/27/2016 02:41 PM, Henning Schild wrote:  
> >>> In 4.1 huge page mapping of io memory was introduced, enable ipipe
> >>> to handle that when pinning kernel memory.
> >>>
> >>> change that introduced the feature
> >>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> >>>
> >>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> >>> ---
> >>>  arch/x86/mm/fault.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >>> index fd5bbcc..ca1e75b 100644
> >>> --- a/arch/x86/mm/fault.c
> >>> +++ b/arch/x86/mm/fault.c
> >>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t
> >>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address);
> >>>  	if (!pud_present(*pud_k))
> >>>  		return NULL;
> >>> +	if (pud_large(*pud))
> >>> +		return pud_k;
> >>>  
> >>>  	pmd = pmd_offset(pud, address);
> >>>  	pmd_k = pmd_offset(pud_k, address);
> >>>  	if (!pmd_present(*pmd_k))
> >>>  		return NULL;
> >>> +	if (pmd_large(*pmd))
> >>> +		return pmd_k;
> >>>  
> >>>  	if (!pmd_present(*pmd))
> >>>  		set_pmd(pmd, *pmd_k);
> >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> >>> unsigned long address) 
> >>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> >>> pud_page_vaddr(*pud_ref)) BUG();
> >>> +	if (pud_large(*pud))
> >>> +		return 0;
> >>>  
> >>>  	pmd = pmd_offset(pud, address);
> >>>  	pmd_ref = pmd_offset(pud_ref, address);
> >>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd,
> >>> unsigned long address) 
> >>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
> >>> pmd_page(*pmd_ref)) BUG();
> >>> +	if (pmd_large(*pmd))
> >>> +		return 0;
> >>>  
> >>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >>>  	if (!pte_present(*pte_ref))
> >>>     
> >>
> >> I'm confused. Assuming the purpose of that patch is to exclude huge
> >> I/O mappings from pte pinning, why does the changes to the x86_32
> >> version of the vmalloc_sync_one() helper actually prevent such
> >> pinning, while the x86_64 version does not?  
> > 
> > No the purpose is to include them just like they were before.
> > vanilla vmalloc_sync_one just must not be called on huge mappings
> > because it cant handle them. The patch is supposed to make the
> > function return successfully, stopping early when huge pages are
> > detected.
> > 
> > It changes the implementation of both x86_32 and x86_64.
> >   
> 
> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_
> the pinning, by copying over the kernel mapping, early in the course
> of the routine for x86_64, late for x86_32.
> 
> Please explain why your changes prevent huge I/O mappings from being
> pinned into the current page directory in the x86_32 implementation,
> but still allow this to be done in the x86_64 version. The section of
> code you patched in the latter case is basically a series of sanity
> checks done after the pinning took place, not before.

There is no difference between 32 and 64 bits. After the patch the
memory will get pinned like it was before. The "sanity checks" are
required when you want to call vmalloc_sync_one() on a range that
contains huge pages. They make sure that the function does not dig
deeper, treating the huge pages as page tables.
The initial problem is that the huge page itself was accessed as if it
were a page table: an offset into it was dereferenced, which caused a
#PF. The upstream kernel seems to never take this path for areas that
contain huge pages, but we do. That is why we have to introduce these
checks in the page-table walker.

> On a more general note, a better approach would be to filter out calls
> to vmalloc_sync_one() for huge pages directly from
> __ipipe_pin_mapping_globally().
> 




* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-29 18:39           ` Gilles Chanteperdrix
@ 2016-02-02 12:08             ` Henning Schild
  2016-02-02 13:58               ` Philippe Gerum
  2016-02-02 14:18               ` Philippe Gerum
  0 siblings, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-02 12:08 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Fri, 29 Jan 2016 19:39:48 +0100
Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:

> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:
> > On 01/28/2016 09:53 PM, Henning Schild wrote:  
> > > On Thu, 28 Jan 2016 11:53:08 +0100
> > > Philippe Gerum <rpm@xenomai.org> wrote:
> > >   
> > >> On 01/27/2016 02:41 PM, Henning Schild wrote:  
> > >>> In 4.1 huge page mapping of io memory was introduced, enable
> > >>> ipipe to handle that when pinning kernel memory.
> > >>>
> > >>> change that introduced the feature
> > >>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> > >>>
> > >>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> > >>> ---
> > >>>  arch/x86/mm/fault.c | 8 ++++++++
> > >>>  1 file changed, 8 insertions(+)
> > >>>
> > >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > >>> index fd5bbcc..ca1e75b 100644
> > >>> --- a/arch/x86/mm/fault.c
> > >>> +++ b/arch/x86/mm/fault.c
> > >>> @@ -211,11 +211,15 @@ static inline pmd_t
> > >>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k =
> > >>> pud_offset(pgd_k, address); if (!pud_present(*pud_k))
> > >>>  		return NULL;
> > >>> +	if (pud_large(*pud))
> > >>> +		return pud_k;
> > >>>  
> > >>>  	pmd = pmd_offset(pud, address);
> > >>>  	pmd_k = pmd_offset(pud_k, address);
> > >>>  	if (!pmd_present(*pmd_k))
> > >>>  		return NULL;
> > >>> +	if (pmd_large(*pmd))
> > >>> +		return pmd_k;
> > >>>  
> > >>>  	if (!pmd_present(*pmd))
> > >>>  		set_pmd(pmd, *pmd_k);
> > >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t
> > >>> *pgd, unsigned long address) 
> > >>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> > >>> pud_page_vaddr(*pud_ref)) BUG();
> > >>> +	if (pud_large(*pud))
> > >>> +		return 0;
> > >>>  
> > >>>  	pmd = pmd_offset(pud, address);
> > >>>  	pmd_ref = pmd_offset(pud_ref, address);
> > >>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t
> > >>> *pgd, unsigned long address) 
> > >>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
> > >>> pmd_page(*pmd_ref)) BUG();
> > >>> +	if (pmd_large(*pmd))
> > >>> +		return 0;
> > >>>  
> > >>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
> > >>>  	if (!pte_present(*pte_ref))
> > >>>     
> > >>
> > >> I'm confused. Assuming the purpose of that patch is to exclude
> > >> huge I/O mappings from pte pinning, why does the changes to the
> > >> x86_32 version of the vmalloc_sync_one() helper actually prevent
> > >> such pinning, while the x86_64 version does not?  
> > > 
> > > No the purpose is to include them just like they were before.
> > > vanilla vmalloc_sync_one just must not be called on huge mappings
> > > because it cant handle them. The patch is supposed to make the
> > > function return successfully, stopping early when huge pages are
> > > detected.
> > > 
> > > It changes the implementation of both x86_32 and x86_64.
> > >   
> > 
> > Sorry, your answer confuses me even more. vmalloc_sync_one() _does_
> > the pinning, by copying over the kernel mapping, early in the
> > course of the routine for x86_64, late for x86_32.
> > 
> > Please explain why your changes prevent huge I/O mappings from being
> > pinned into the current page directory in the x86_32
> > implementation, but still allow this to be done in the x86_64
> > version. The section of code you patched in the latter case is
> > basically a series of sanity checks done after the pinning took
> > place, not before.
> > 
> > On a more general note, a better approach would be to filter out
> > calls to vmalloc_sync_one() for huge pages directly from
> > __ipipe_pin_mapping_globally().
> 
> Since the ioremap/vmalloc range is based on huge pages or not,
> globally (since the processes mapping are copied from the kernel
> mapping), we can probably even avoid the call to __ipipe_pin_mapping
> for hug page mappings. After all, since the mainline kernel is able
> to avoid calls to vmalloc_sync_one() for huge page mappings, the
> I-pipe should be able to do the same.

As said, my patch preserves the old ipipe behaviour, making it able to
deal with huge pages. Whether or not the pinning is required, and for
which regions, is a valid but totally different question. Down in these
low-level functions you cannot tell why the pinning was requested; it
has to be dealt with at a higher level.
But you are right, the kernel never ends up in vmalloc_sync_one() for
these regions. Hence we can probably expect that an ioremapped region
is always mapped anyway, since vmalloc_sync_one() will ultimately also
be used when handling a #PF.

I just traced the change in lib/ioremap.c to
d41282baf7c1f2bb517d6df95571e7da865f5e37
Not too helpful, at least to me.

Henning 




* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 12:08             ` Henning Schild
@ 2016-02-02 13:58               ` Philippe Gerum
  2016-02-02 16:38                 ` Henning Schild
  2016-02-02 14:18               ` Philippe Gerum
  1 sibling, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2016-02-02 13:58 UTC (permalink / raw)
  To: Henning Schild, Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On 02/02/2016 01:08 PM, Henning Schild wrote:
> On Fri, 29 Jan 2016 19:39:48 +0100
> Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
> 
>> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:
>>> On 01/28/2016 09:53 PM, Henning Schild wrote:  
>>>> On Thu, 28 Jan 2016 11:53:08 +0100
>>>> Philippe Gerum <rpm@xenomai.org> wrote:
>>>>   
>>>>> On 01/27/2016 02:41 PM, Henning Schild wrote:  
>>>>>> In 4.1 huge page mapping of io memory was introduced, enable
>>>>>> ipipe to handle that when pinning kernel memory.
>>>>>>
>>>>>> change that introduced the feature
>>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>>>>
>>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>>>>> ---
>>>>>>  arch/x86/mm/fault.c | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>>>>> index fd5bbcc..ca1e75b 100644
>>>>>> --- a/arch/x86/mm/fault.c
>>>>>> +++ b/arch/x86/mm/fault.c
>>>>>> @@ -211,11 +211,15 @@ static inline pmd_t
>>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k =
>>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k))
>>>>>>  		return NULL;
>>>>>> +	if (pud_large(*pud))
>>>>>> +		return pud_k;
>>>>>>  
>>>>>>  	pmd = pmd_offset(pud, address);
>>>>>>  	pmd_k = pmd_offset(pud_k, address);
>>>>>>  	if (!pmd_present(*pmd_k))
>>>>>>  		return NULL;
>>>>>> +	if (pmd_large(*pmd))
>>>>>> +		return pmd_k;
>>>>>>  
>>>>>>  	if (!pmd_present(*pmd))
>>>>>>  		set_pmd(pmd, *pmd_k);
>>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t
>>>>>> *pgd, unsigned long address) 
>>>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>>>>> pud_page_vaddr(*pud_ref)) BUG();
>>>>>> +	if (pud_large(*pud))
>>>>>> +		return 0;
>>>>>>  
>>>>>>  	pmd = pmd_offset(pud, address);
>>>>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t
>>>>>> *pgd, unsigned long address) 
>>>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
>>>>>> pmd_page(*pmd_ref)) BUG();
>>>>>> +	if (pmd_large(*pmd))
>>>>>> +		return 0;
>>>>>>  
>>>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>>>>  	if (!pte_present(*pte_ref))
>>>>>>     
>>>>>
>>>>> I'm confused. Assuming the purpose of that patch is to exclude
>>>>> huge I/O mappings from pte pinning, why does the changes to the
>>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent
>>>>> such pinning, while the x86_64 version does not?  
>>>>
>>>> No the purpose is to include them just like they were before.
>>>> vanilla vmalloc_sync_one just must not be called on huge mappings
>>>> because it cant handle them. The patch is supposed to make the
>>>> function return successfully, stopping early when huge pages are
>>>> detected.
>>>>
>>>> It changes the implementation of both x86_32 and x86_64.
>>>>   
>>>
>>> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_
>>> the pinning, by copying over the kernel mapping, early in the
>>> course of the routine for x86_64, late for x86_32.
>>>
>>> Please explain why your changes prevent huge I/O mappings from being
>>> pinned into the current page directory in the x86_32
>>> implementation, but still allow this to be done in the x86_64
>>> version. The section of code you patched in the latter case is
>>> basically a series of sanity checks done after the pinning took
>>> place, not before.
>>>
>>> On a more general note, a better approach would be to filter out
>>> calls to vmalloc_sync_one() for huge pages directly from
>>> __ipipe_pin_mapping_globally().
>>
>> Since the ioremap/vmalloc range is based on huge pages or not,
>> globally (since the processes mapping are copied from the kernel
>> mapping), we can probably even avoid the call to __ipipe_pin_mapping
>> for hug page mappings. After all, since the mainline kernel is able
>> to avoid calls to vmalloc_sync_one() for huge page mappings, the
>> I-pipe should be able to do the same.
> 
> As said, my patch preserves the old ipipe behaviour, making it able to
> deal with huge pages. Whether or not the pinning is required and on
> which regions, is a valid but totally different question. Down in these
> low-level functions you can not tell why the pinning was requested. It
> has to be dealt with at a higher level.
> But you are right, the kernel never ends up in vmalloc_sync_one for
> these regions. Hence we can probably expect that in ioremapped region
> is always mapped anyways. Since vmalloc_sync_one will ultimately also
> be used when handling a #PF.
> 
> I just traced the change in lib/ioremap.c to
> d41282baf7c1f2bb517d6df95571e7da865f5e37
> Not too helpful, at least to me.

So why did you trace it?

-- 
Philippe.



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 12:08             ` Henning Schild
  2016-02-02 13:58               ` Philippe Gerum
@ 2016-02-02 14:18               ` Philippe Gerum
  2016-02-02 16:30                 ` Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Philippe Gerum @ 2016-02-02 14:18 UTC (permalink / raw)
  To: Henning Schild, Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On 02/02/2016 01:08 PM, Henning Schild wrote:
> On Fri, 29 Jan 2016 19:39:48 +0100
> Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
> 
>> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:
>>> On 01/28/2016 09:53 PM, Henning Schild wrote:  
>>>> On Thu, 28 Jan 2016 11:53:08 +0100
>>>> Philippe Gerum <rpm@xenomai.org> wrote:
>>>>   
>>>>> On 01/27/2016 02:41 PM, Henning Schild wrote:  
>>>>>> In 4.1 huge page mapping of io memory was introduced, enable
>>>>>> ipipe to handle that when pinning kernel memory.
>>>>>>
>>>>>> change that introduced the feature
>>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>>>>
>>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
>>>>>> ---
>>>>>>  arch/x86/mm/fault.c | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>>>>> index fd5bbcc..ca1e75b 100644
>>>>>> --- a/arch/x86/mm/fault.c
>>>>>> +++ b/arch/x86/mm/fault.c
>>>>>> @@ -211,11 +211,15 @@ static inline pmd_t
>>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k =
>>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k))
>>>>>>  		return NULL;
>>>>>> +	if (pud_large(*pud))
>>>>>> +		return pud_k;
>>>>>>  
>>>>>>  	pmd = pmd_offset(pud, address);
>>>>>>  	pmd_k = pmd_offset(pud_k, address);
>>>>>>  	if (!pmd_present(*pmd_k))
>>>>>>  		return NULL;
>>>>>> +	if (pmd_large(*pmd))
>>>>>> +		return pmd_k;
>>>>>>  
>>>>>>  	if (!pmd_present(*pmd))
>>>>>>  		set_pmd(pmd, *pmd_k);
>>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t
>>>>>> *pgd, unsigned long address) 
>>>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
>>>>>> pud_page_vaddr(*pud_ref)) BUG();
>>>>>> +	if (pud_large(*pud))
>>>>>> +		return 0;
>>>>>>  
>>>>>>  	pmd = pmd_offset(pud, address);
>>>>>>  	pmd_ref = pmd_offset(pud_ref, address);
>>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t
>>>>>> *pgd, unsigned long address) 
>>>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
>>>>>> pmd_page(*pmd_ref)) BUG();
>>>>>> +	if (pmd_large(*pmd))
>>>>>> +		return 0;
>>>>>>  
>>>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>>>>>>  	if (!pte_present(*pte_ref))
>>>>>>     
>>>>>
>>>>> I'm confused. Assuming the purpose of that patch is to exclude
>>>>> huge I/O mappings from pte pinning, why does the changes to the
>>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent
>>>>> such pinning, while the x86_64 version does not?  
>>>>
>>>> No the purpose is to include them just like they were before.
>>>> vanilla vmalloc_sync_one just must not be called on huge mappings
>>>> because it cant handle them. The patch is supposed to make the
>>>> function return successfully, stopping early when huge pages are
>>>> detected.
>>>>
>>>> It changes the implementation of both x86_32 and x86_64.
>>>>   
>>>
>>> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_
>>> the pinning, by copying over the kernel mapping, early in the
>>> course of the routine for x86_64, late for x86_32.
>>>
>>> Please explain why your changes prevent huge I/O mappings from being
>>> pinned into the current page directory in the x86_32
>>> implementation, but still allow this to be done in the x86_64
>>> version. The section of code you patched in the latter case is
>>> basically a series of sanity checks done after the pinning took
>>> place, not before.
>>>
>>> On a more general note, a better approach would be to filter out
>>> calls to vmalloc_sync_one() for huge pages directly from
>>> __ipipe_pin_mapping_globally().
>>
>> Since the ioremap/vmalloc range is based on huge pages or not,
>> globally (since the processes mapping are copied from the kernel
>> mapping), we can probably even avoid the call to __ipipe_pin_mapping
>> for hug page mappings. After all, since the mainline kernel is able
>> to avoid calls to vmalloc_sync_one() for huge page mappings, the
>> I-pipe should be able to do the same.
> 
> As said, my patch preserves the old ipipe behaviour, making it able to
> deal with huge pages.

No, it does not, because it creates an early exit path for x86_32 that
did not exist before, preventing the page directory from being updated.
That is at odds with the x86_64 implementation of the very same
routine, where your changes do not filter out huge pages from pinning.

So your patch now prevents PTEs for huge pages from being pinned in the
pgd for x86_32, which departs from the current behavior. However, it
does not do so for x86_64, which keeps the current behavior.

So the question remains: why this difference?

-- 
Philippe.



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 14:18               ` Philippe Gerum
@ 2016-02-02 16:30                 ` Henning Schild
  0 siblings, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-02 16:30 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Gilles Chanteperdrix, Xenomai

On Tue, 2 Feb 2016 15:18:15 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 02/02/2016 01:08 PM, Henning Schild wrote:
> > On Fri, 29 Jan 2016 19:39:48 +0100
> > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
> >   
> >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:  
> >>> On 01/28/2016 09:53 PM, Henning Schild wrote:    
> >>>> On Thu, 28 Jan 2016 11:53:08 +0100
> >>>> Philippe Gerum <rpm@xenomai.org> wrote:
> >>>>     
> >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote:    
> >>>>>> In 4.1 huge page mapping of io memory was introduced, enable
> >>>>>> ipipe to handle that when pinning kernel memory.
> >>>>>>
> >>>>>> change that introduced the feature
> >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> >>>>>>
> >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> >>>>>> ---
> >>>>>>  arch/x86/mm/fault.c | 8 ++++++++
> >>>>>>  1 file changed, 8 insertions(+)
> >>>>>>
> >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >>>>>> index fd5bbcc..ca1e75b 100644
> >>>>>> --- a/arch/x86/mm/fault.c
> >>>>>> +++ b/arch/x86/mm/fault.c
> >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t
> >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k =
> >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k))
> >>>>>>  		return NULL;
> >>>>>> +	if (pud_large(*pud))
> >>>>>> +		return pud_k;
> >>>>>>  
> >>>>>>  	pmd = pmd_offset(pud, address);
> >>>>>>  	pmd_k = pmd_offset(pud_k, address);
> >>>>>>  	if (!pmd_present(*pmd_k))
> >>>>>>  		return NULL;
> >>>>>> +	if (pmd_large(*pmd))
> >>>>>> +		return pmd_k;

Are you talking about this change? Looking at it now, it indeed looks
wrong and should not be there.

> >>>>>>  
> >>>>>>  	if (!pmd_present(*pmd))
> >>>>>>  		set_pmd(pmd, *pmd_k);
> >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t
> >>>>>> *pgd, unsigned long address) 
> >>>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> >>>>>> pud_page_vaddr(*pud_ref)) BUG();
> >>>>>> +	if (pud_large(*pud))
> >>>>>> +		return 0;
> >>>>>>  
> >>>>>>  	pmd = pmd_offset(pud, address);
> >>>>>>  	pmd_ref = pmd_offset(pud_ref, address);
> >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t
> >>>>>> *pgd, unsigned long address) 
> >>>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
> >>>>>> pmd_page(*pmd_ref)) BUG();
> >>>>>> +	if (pmd_large(*pmd))
> >>>>>> +		return 0;
> >>>>>>  
> >>>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >>>>>>  	if (!pte_present(*pte_ref))
> >>>>>>       
> >>>>>
> >>>>> I'm confused. Assuming the purpose of that patch is to exclude
> >>>>> huge I/O mappings from pte pinning, why does the changes to the
> >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent
> >>>>> such pinning, while the x86_64 version does not?    
> >>>>
> >>>> No the purpose is to include them just like they were before.
> >>>> vanilla vmalloc_sync_one just must not be called on huge mappings
> >>>> because it cant handle them. The patch is supposed to make the
> >>>> function return successfully, stopping early when huge pages are
> >>>> detected.
> >>>>
> >>>> It changes the implementation of both x86_32 and x86_64.
> >>>>     
> >>>
> >>> Sorry, your answer confuses me even more. vmalloc_sync_one()
> >>> _does_ the pinning, by copying over the kernel mapping, early in
> >>> the course of the routine for x86_64, late for x86_32.
> >>>
> >>> Please explain why your changes prevent huge I/O mappings from
> >>> being pinned into the current page directory in the x86_32
> >>> implementation, but still allow this to be done in the x86_64
> >>> version. The section of code you patched in the latter case is
> >>> basically a series of sanity checks done after the pinning took
> >>> place, not before.
> >>>
> >>> On a more general note, a better approach would be to filter out
> >>> calls to vmalloc_sync_one() for huge pages directly from
> >>> __ipipe_pin_mapping_globally().
> >>
> >> Since the ioremap/vmalloc range is based on huge pages or not,
> >> globally (since the processes mapping are copied from the kernel
> >> mapping), we can probably even avoid the call to
> >> __ipipe_pin_mapping for huge page mappings. After all, since the
> >> mainline kernel is able to avoid calls to vmalloc_sync_one() for
> >> huge page mappings, the I-pipe should be able to do the same.  
> > 
> > As said, my patch preserves the old ipipe behaviour, making it able
> > to deal with huge pages.  
> 
> No, it does not because it creates an early exit path for x86_32 which
> did not exist preventing the page directory to be updated, which is at
> odds with the implementation for x86_64 of the very same routine where
> your changes do not filter out huge pages from pinning.
> 
> So your patch now prevents PTEs for huge pages from being pinned in
> the pgd for x86_32, which departs from the current behavior. However,
> it does not for x86_64 which is already the current behavior.
> 
> So the question remains: why this difference?

We are still not on the same page, and I still do not know exactly which
difference you are talking about. Let us please write inline when
talking about concrete code.

The return values differ between the 32-bit and 64-bit versions because
that is just how mainline works.

The large-page check is done on both the pmd and the pud because the
huge ioremap patch has the potential to make entries at either of these
two levels large.

The 32-bit patch is derived from the 64-bit one, and I might have to
check it in more detail.

Henning


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 13:58               ` Philippe Gerum
@ 2016-02-02 16:38                 ` Henning Schild
  2016-02-02 16:39                   ` Jan Kiszka
  2016-02-02 19:26                   ` Gilles Chanteperdrix
  0 siblings, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-02 16:38 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Gilles Chanteperdrix, Xenomai

On Tue, 2 Feb 2016 14:58:41 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 02/02/2016 01:08 PM, Henning Schild wrote:
> > On Fri, 29 Jan 2016 19:39:48 +0100
> > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
> >   
> >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote:  
> >>> On 01/28/2016 09:53 PM, Henning Schild wrote:    
> >>>> On Thu, 28 Jan 2016 11:53:08 +0100
> >>>> Philippe Gerum <rpm@xenomai.org> wrote:
> >>>>     
> >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote:    
> >>>>>> In 4.1 huge page mapping of io memory was introduced, enable
> >>>>>> ipipe to handle that when pinning kernel memory.
> >>>>>>
> >>>>>> change that introduced the feature
> >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> >>>>>>
> >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> >>>>>> ---
> >>>>>>  arch/x86/mm/fault.c | 8 ++++++++
> >>>>>>  1 file changed, 8 insertions(+)
> >>>>>>
> >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >>>>>> index fd5bbcc..ca1e75b 100644
> >>>>>> --- a/arch/x86/mm/fault.c
> >>>>>> +++ b/arch/x86/mm/fault.c
> >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t
> >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k =
> >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k))
> >>>>>>  		return NULL;
> >>>>>> +	if (pud_large(*pud))
> >>>>>> +		return pud_k;
> >>>>>>  
> >>>>>>  	pmd = pmd_offset(pud, address);
> >>>>>>  	pmd_k = pmd_offset(pud_k, address);
> >>>>>>  	if (!pmd_present(*pmd_k))
> >>>>>>  		return NULL;
> >>>>>> +	if (pmd_large(*pmd))
> >>>>>> +		return pmd_k;
> >>>>>>  
> >>>>>>  	if (!pmd_present(*pmd))
> >>>>>>  		set_pmd(pmd, *pmd_k);
> >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t
> >>>>>> *pgd, unsigned long address) 
> >>>>>>  	if (pud_none(*pud) || pud_page_vaddr(*pud) !=
> >>>>>> pud_page_vaddr(*pud_ref)) BUG();
> >>>>>> +	if (pud_large(*pud))
> >>>>>> +		return 0;
> >>>>>>  
> >>>>>>  	pmd = pmd_offset(pud, address);
> >>>>>>  	pmd_ref = pmd_offset(pud_ref, address);
> >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t
> >>>>>> *pgd, unsigned long address) 
> >>>>>>  	if (pmd_none(*pmd) || pmd_page(*pmd) !=
> >>>>>> pmd_page(*pmd_ref)) BUG();
> >>>>>> +	if (pmd_large(*pmd))
> >>>>>> +		return 0;
> >>>>>>  
> >>>>>>  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >>>>>>  	if (!pte_present(*pte_ref))
> >>>>>>       
> >>>>>
> >>>>> I'm confused. Assuming the purpose of that patch is to exclude
> >>>>> huge I/O mappings from pte pinning, why does the changes to the
> >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent
> >>>>> such pinning, while the x86_64 version does not?    
> >>>>
> >>>> No the purpose is to include them just like they were before.
> >>>> vanilla vmalloc_sync_one just must not be called on huge mappings
> >>>> because it can't handle them. The patch is supposed to make the
> >>>> function return successfully, stopping early when huge pages are
> >>>> detected.
> >>>>
> >>>> It changes the implementation of both x86_32 and x86_64.
> >>>>     
> >>>
> >>> Sorry, your answer confuses me even more. vmalloc_sync_one()
> >>> _does_ the pinning, by copying over the kernel mapping, early in
> >>> the course of the routine for x86_64, late for x86_32.
> >>>
> >>> Please explain why your changes prevent huge I/O mappings from
> >>> being pinned into the current page directory in the x86_32
> >>> implementation, but still allow this to be done in the x86_64
> >>> version. The section of code you patched in the latter case is
> >>> basically a series of sanity checks done after the pinning took
> >>> place, not before.
> >>>
> >>> On a more general note, a better approach would be to filter out
> >>> calls to vmalloc_sync_one() for huge pages directly from
> >>> __ipipe_pin_mapping_globally().
> >>
> >> Since the ioremap/vmalloc range is based on huge pages or not,
> >> globally (since the processes mapping are copied from the kernel
> >> mapping), we can probably even avoid the call to
> >> __ipipe_pin_mapping for huge page mappings. After all, since the
> >> mainline kernel is able to avoid calls to vmalloc_sync_one() for
> >> huge page mappings, the I-pipe should be able to do the same.  
> > 
> > As said, my patch preserves the old ipipe behaviour, making it able
> > to deal with huge pages. Whether or not the pinning is required and
> > on which regions, is a valid but totally different question. Down
> > in these low-level functions you can not tell why the pinning was
> > requested. It has to be dealt with at a higher level.
> > But you are right, the kernel never ends up in vmalloc_sync_one for
> > these regions. Hence we can probably expect that an ioremapped
> > region is always mapped anyway, since vmalloc_sync_one will
> > ultimately also be used when handling a #PF.
> > 
> > I just traced the change in lib/ioremap.c to
> > d41282baf7c1f2bb517d6df95571e7da865f5e37
> > Not too helpful, at least to me.  
> 
> So why did you trace it?

I do not want to keep working on that patch just to hear in the next
round that it is not required. Gilles suggested it might not be, and you
are the one who introduced the pinning.

Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine?




* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 16:38                 ` Henning Schild
@ 2016-02-02 16:39                   ` Jan Kiszka
  2016-02-02 19:26                   ` Gilles Chanteperdrix
  1 sibling, 0 replies; 37+ messages in thread
From: Jan Kiszka @ 2016-02-02 16:39 UTC (permalink / raw)
  To: Henning Schild, Philippe Gerum; +Cc: Gilles Chanteperdrix, Xenomai

On 2016-02-02 17:38, Henning Schild wrote:
> Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine?

No, no, that's a needless restriction. We need to fix I-pipe here.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-01-15  8:32 ` Philippe Gerum
  2016-01-15 12:34   ` Jan Kiszka
@ 2016-02-02 17:43   ` Henning Schild
  2016-02-03 15:07     ` Philippe Gerum
  1 sibling, 1 reply; 37+ messages in thread
From: Henning Schild @ 2016-02-02 17:43 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Gilles Chanteperdrix, Xenomai

On Fri, 15 Jan 2016 09:32:38 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 01/14/2016 06:34 PM, Henning Schild wrote:
> > Hey,
> > 
> > the 4.1 kernel supports mapping IO memory using huge pages.
> > 0f616be120c632c818faaea9adcb8f05a7a8601f ..
> > 6b6378355b925050eb6fa966742d8c2d65ff0d83
> > 
> > In ipipe memory that gets ioremapped will get pinned using
> > __ipipe_pin_mapping_globally, however in the x86_64 case that
> > function uses vmalloc_sync_one which must only be used on 4k pages.
> > 
> > We found the problem when using the kernel in a VBox VM, where the
> > paravirtualized PCI device has enough iomem to cause huge page
> > mappings. When loading the device driver you will get a BUG caused
> > by __ipipe_pin_mapping_globally.
> > 
> > I will work on a fix for the problem. But i would also like to
> > understand the initial purpose of the pinning. Is it even supposed
> > to work for io memory as well? It looks like a way to commit
> > address space changes right down into the page tables, to avoid
> > page-faults in the kernel address space. Probably for more
> > predictable timing ... 
> 
> This is for pinning the page table entries referencing kernel
> mappings, so that we don't get minor faults when treading over kernel
> memory, unless the fault fixup code is compatible with primary domain
> execution, and cheaper than tracking the pgds.

Looking at both users of the pinning, vmalloc and ioremap, it does not
seem to me like anything is done lazily here. The complete page tables
are allocated and filled.
Maybe I am reading it wrong, maybe the kernel has changed since the
pinning function was introduced, or something else. Could you please
explain which minor faults we are talking about?

Faults on the actual content, or faults on the page tables themselves?
After all, they need to be mapped in order to be read/changed.

Henning



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 16:38                 ` Henning Schild
  2016-02-02 16:39                   ` Jan Kiszka
@ 2016-02-02 19:26                   ` Gilles Chanteperdrix
  2016-02-03 11:35                     ` Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2016-02-02 19:26 UTC (permalink / raw)
  To: Henning Schild; +Cc: Jan Kiszka, Xenomai

On Tue, Feb 02, 2016 at 05:38:49PM +0100, Henning Schild wrote:
> I do not want to keep working on that patch just to hear that it is not
> required in the next round. Gilles suggested it might not be and you
> are the one that introduced the pinning.
> 
> Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine?

I did not expect you to use my remark as an excuse for dropping the
ball. What I meant is that maybe, there is a way to skip the call to
__ipipe_pin_mapping_globally() entirely in the upper layers, for the
"huge pages" case. And if there is, this should be preferred over
patching vmalloc_sync_one, a function not meant to be used for huge
mappings anyway.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-02 19:26                   ` Gilles Chanteperdrix
@ 2016-02-03 11:35                     ` Henning Schild
  0 siblings, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-03 11:35 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Tue, 2 Feb 2016 20:26:54 +0100
Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:

> On Tue, Feb 02, 2016 at 05:38:49PM +0100, Henning Schild wrote:
> > I do not want to keep working on that patch just to hear that it is
> > not required in the next round. Gilles suggested it might not be
> > and you are the one that introduced the pinning.
> > 
> > Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine?  
> 
> I did not expect you to use my remark as an excuse for dropping the
> ball. What I meant is that maybe, there is a way to skip the call to
> __ipipe_pin_mapping_globally() entirely in the upper layers, for the
> "huge pages" case. And if there is, this should be preferred over
> patching vmalloc_sync_one, a function not meant to be used for huge
> mappings anyway.

Please do not get me wrong here, I am not about to drop the ball!
Making the two things mutually exclusive is a valid solution with
little complexity. Keeping the patch small might, after all, be more
important than catering for questionable features with more patching.
Hence the suggestion.

The x86_64 portion of my patch is fine, but the 32-bit version is indeed
pretty wrong. I just carried over what I did for 64-bit without
thinking ...

Henning



* [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-27 13:41   ` [Xenomai] [PATCH v2] " Henning Schild
  2016-01-28 10:53     ` Philippe Gerum
@ 2016-02-03 12:59     ` Henning Schild
  2016-02-03 14:24       ` Gilles Chanteperdrix
  2016-02-08  8:44       ` Henning Schild
  1 sibling, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-03 12:59 UTC (permalink / raw)
  To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix

In 4.1 huge page mapping of io memory was introduced, enable ipipe to
handle that when pinning kernel memory.

change that introduced the feature
0f616be120c632c818faaea9adcb8f05a7a8601f

Signed-off-by: Henning Schild <henning.schild@siemens.com>
---
 arch/x86/mm/fault.c | 6 ++++++
 1 file changed, 6 insertions(+)

This version has the 32-bit case changed according to the problems found in
the review. The patch should now finally preserve the old behaviour. But we
should still discuss whether the old behaviour was correct and should be kept.
If the pinning is not required, it should be removed instead of being made
more complex.

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd5bbcc..e78bd09 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pud_k = pud_offset(pgd_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
+	if (pud_large(*pud))
+		return NULL;
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
@@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
 		BUG();
+	if (pud_large(*pud))
+		return 0;
 
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
@@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
 		BUG();
+	if (pmd_large(*pmd))
+		return 0;
 
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
-- 
2.4.10
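For readers following along, the control flow the patch restores can be modeled outside the kernel. The sketch below uses invented names (`struct entry`, `sync_one()`) to mimic how vmalloc_sync_one() walks from an upper-level entry down to a leaf, and why a large (leaf) entry at an upper level must terminate the walk early; it is an illustration under simplifying assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the early exits discussed above: a two-level "page
 * table" where an upper-level entry is either absent, a pointer to a
 * lower-level entry, or a large (leaf) mapping. All names are invented
 * for illustration. */
enum { ENTRY_NONE, ENTRY_TABLE, ENTRY_LARGE };

struct entry {
	int kind;		/* ENTRY_NONE / ENTRY_TABLE / ENTRY_LARGE */
	struct entry *next;	/* lower-level entry when kind == ENTRY_TABLE */
};

/* Copy the kernel ("ref") view into a per-process ("cur") view, the
 * way vmalloc_sync_one() pins kernel mappings. Returns the level at
 * which the walk stopped: 0 = nothing mapped, 1 = stopped at a large
 * upper-level entry (the pud_large()/pmd_large() case), 2 = synced
 * the lower-level leaf. */
static int sync_one(struct entry *cur, const struct entry *ref)
{
	if (ref->kind == ENTRY_NONE)
		return 0;
	if (ref->kind == ENTRY_LARGE) {
		/* A large entry is itself the final translation; there
		 * is no lower-level table to descend into, so walking
		 * further would dereference garbage: return early. */
		cur->kind = ENTRY_LARGE;
		return 1;
	}
	cur->kind = ENTRY_TABLE;
	if (cur->next->kind == ENTRY_NONE)
		*cur->next = *ref->next;	/* pin the leaf entry */
	return 2;
}
```

The point of the toy model is only that the walk must stop as soon as it meets a leaf, whatever level that leaf sits at.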




* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 12:59     ` [Xenomai] [PATCH v3] " Henning Schild
@ 2016-02-03 14:24       ` Gilles Chanteperdrix
  2016-02-03 14:31         ` Jan Kiszka
  2016-02-04 11:53         ` Henning Schild
  2016-02-08  8:44       ` Henning Schild
  1 sibling, 2 replies; 37+ messages in thread
From: Gilles Chanteperdrix @ 2016-02-03 14:24 UTC (permalink / raw)
  To: Henning Schild; +Cc: Jan Kiszka, Xenomai

On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> handle that when pinning kernel memory.
> 
> change that introduced the feature
> 0f616be120c632c818faaea9adcb8f05a7a8601f

Could we have an assessment of whether avoiding the call to
__ipipe_pin_range_mapping in upper layers makes the patch simpler?

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 14:24       ` Gilles Chanteperdrix
@ 2016-02-03 14:31         ` Jan Kiszka
  2016-02-03 14:38           ` Gilles Chanteperdrix
  2016-02-04 11:53         ` Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2016-02-03 14:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix, Henning Schild; +Cc: Xenomai

On 2016-02-03 15:24, Gilles Chanteperdrix wrote:
> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
>> handle that when pinning kernel memory.
>>
>> change that introduced the feature
>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> 
> Could we have an assessment of whether avoiding the call to
> __ipipe_pin_range_mapping in upper layers makes the patch simpler?

For that, we first of all need to recapitulate how
__ipipe_pin_range_globally is/was supposed to work at all.

I tried to recover the details, but I'm specifically failing regarding
that "globally". To my understanding, the vmalloc_sync_one of x86
transfers mappings at best from init_mm to the current mm, but not to
all mms in the system.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 14:31         ` Jan Kiszka
@ 2016-02-03 14:38           ` Gilles Chanteperdrix
  2016-02-03 14:51             ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2016-02-03 14:38 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote:
> On 2016-02-03 15:24, Gilles Chanteperdrix wrote:
> > On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
> >> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> >> handle that when pinning kernel memory.
> >>
> >> change that introduced the feature
> >> 0f616be120c632c818faaea9adcb8f05a7a8601f
> > 
> > Could we have an assessment of whether avoiding the call to
> > __ipipe_pin_range_mapping in upper layers makes the patch simpler?
> 
> For that, we first of all need to recapitulate how
> __ipipe_pin_range_globally is/was supposed to work at all.
> 
> I tried to restore details but I'm specifically failing regarding that
> "globally". To my understanding, the vmalloc_sync_one of x86 transfers
> at best mappings from init_mm to the current mm one, but not to all mm
> in the systems.

__ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in
the system. This is pretty ugly, not very scalable, but if you do
not do that, each access to an ioremap/vmalloc area in, say, an
interrupt handler, causes a fault for processes that do not have the
mapping in their page tables. Such processes exist if they were
created before the call to ioremap/vmalloc. Another possible fix to
this issue is to allow handling that kind of fault over the head
domain without switching to the secondary domain.
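The scheme described here can be illustrated with a small standalone model. Everything below (`mm_table`, `map_kernel_slot()`, `pin_range_globally()`, `fork_mm()`) is invented for the sketch; it only mirrors the idea that pinning copies the new kernel entry into every pre-existing mm, while mms created later inherit it from init_mm at creation time.

```c
#include <assert.h>
#include <string.h>

#define NPROC  4
#define NSLOTS 8

/* Each "mm" is just an array of kernel-mapping slots; row 0 plays the
 * role of init_mm, the kernel reference table. Names are illustrative. */
static int mm_table[NPROC][NSLOTS];

/* Like ioremap/vmalloc: install the new mapping in init_mm only. */
static void map_kernel_slot(int slot, int val)
{
	mm_table[0][slot] = val;
}

/* ...then "pin it globally": copy the entry into every existing mm,
 * so no process takes a minor fault on first access from the head
 * domain. This is the per-pgd loop being discussed. */
static void pin_range_globally(int slot)
{
	for (int mm = 1; mm < NPROC; mm++)
		mm_table[mm][slot] = mm_table[0][slot];
}

/* New processes copy the kernel part of init_mm at fork time, so they
 * automatically inherit mappings created before they existed. */
static void fork_mm(int new_mm)
{
	memcpy(mm_table[new_mm], mm_table[0], sizeof mm_table[0]);
}
```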

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 14:38           ` Gilles Chanteperdrix
@ 2016-02-03 14:51             ` Jan Kiszka
  2016-02-03 15:02               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kiszka @ 2016-02-03 14:51 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2016-02-03 15:38, Gilles Chanteperdrix wrote:
> On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote:
>> On 2016-02-03 15:24, Gilles Chanteperdrix wrote:
>>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
>>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
>>>> handle that when pinning kernel memory.
>>>>
>>>> change that introduced the feature
>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>
>>> Could we have an assessment of whether avoiding the call to
>>> __ipipe_pin_range_mapping in upper layers makes the patch simpler?
>>
>> For that, we first of all need to recapitulate how
>> __ipipe_pin_range_globally is/was supposed to work at all.
>>
>> I tried to restore details but I'm specifically failing regarding that
>> "globally". To my understanding, the vmalloc_sync_one of x86 transfers
>> at best mappings from init_mm to the current mm one, but not to all mm
>> in the systems.
> 
> __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in
> the system. This is pretty ugly, not very scalable, but if you do

How are future pgds accounted for?

> not do that, each access to an ioremap/vmalloc area in, say, an
> interrupt handler, causes a fault for processes that do not have the
> mapping in their page tables. Such processes exist if they were
> created before the call to ioremap/vmalloc. Another possible fix to
> this issue is to allow handling that kind of faults over the head
> domain without switching to secondary domain.

OK, that makes more sense again.

But then Henning is definitely on the right path, because you can't tell
from 'start' and 'end' or the pgd pointer if there were only huge pages
added or maybe also some small pages. IOW, we do have to walk the page
table trees and therefore have to be prepared to hit some huge pages
along that path.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 14:51             ` Jan Kiszka
@ 2016-02-03 15:02               ` Gilles Chanteperdrix
  2016-02-03 15:14                 ` Jan Kiszka
  0 siblings, 1 reply; 37+ messages in thread
From: Gilles Chanteperdrix @ 2016-02-03 15:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On Wed, Feb 03, 2016 at 03:51:34PM +0100, Jan Kiszka wrote:
> On 2016-02-03 15:38, Gilles Chanteperdrix wrote:
> > On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote:
> >> On 2016-02-03 15:24, Gilles Chanteperdrix wrote:
> >>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
> >>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> >>>> handle that when pinning kernel memory.
> >>>>
> >>>> change that introduced the feature
> >>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
> >>>
> >>> Could we have an assessment of whether avoiding the call to
> >>> __ipipe_pin_range_mapping in upper layers makes the patch simpler?
> >>
> >> For that, we first of all need to recapitulate how
> >> __ipipe_pin_range_globally is/was supposed to work at all.
> >>
> >> I tried to restore details but I'm specifically failing regarding that
> >> "globally". To my understanding, the vmalloc_sync_one of x86 transfers
> >> at best mappings from init_mm to the current mm one, but not to all mm
> >> in the systems.
> > 
> > __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in
> > the system. This is pretty ugly, not very scalable, but if you do
> 
> How are future pgds accounted for?

vmalloc and ioremap add their mappings to init_mm; all processes
created after that copy their kernel mappings from init_mm, and so
inherit the vmalloc/ioremap mappings.

> 
> > not do that, each access to an ioremap/vmalloc area in, say, an
> > interrupt handler, causes a fault for processes that do not have the
> > mapping in their page tables. Such processes exist if they were
> > created before the call to ioremap/vmalloc. Another possible fix to
> > this issue is to allow handling that kind of faults over the head
> > domain without switching to secondary domain.
> 
> OK, that makes more sense again.
> 
> But then Henning is definitely on the right path, because you can't tell
> from 'start' and 'end' or the pgd pointer if there were only huge pages
> added or maybe also some small pages. IOW, we do have to walk the page
> table trees and therefore have to be prepared to hit some huge pages
> along that path.

The function which creates the mappings has to know that it creates
a huge mapping; what I believe needs to be checked is whether that
information can easily be made available to the
__ipipe_pin_range_globally call sites. Because if it can, we can
simply skip the call, and the patch will be simpler than what is
currently proposed by Henning.

-- 
					    Gilles.
https://click-hack.org



* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-02-02 17:43   ` Henning Schild
@ 2016-02-03 15:07     ` Philippe Gerum
  2016-02-03 15:48       ` Philippe Gerum
  2016-02-04 11:43       ` Henning Schild
  0 siblings, 2 replies; 37+ messages in thread
From: Philippe Gerum @ 2016-02-03 15:07 UTC (permalink / raw)
  To: Henning Schild; +Cc: Gilles Chanteperdrix, Xenomai

On 02/02/2016 06:43 PM, Henning Schild wrote:
> On Fri, 15 Jan 2016 09:32:38 +0100
> Philippe Gerum <rpm@xenomai.org> wrote:
> 
>> On 01/14/2016 06:34 PM, Henning Schild wrote:
>>> Hey,
>>>
>>> the 4.1 kernel supports mapping IO memory using huge pages.
>>> 0f616be120c632c818faaea9adcb8f05a7a8601f ..
>>> 6b6378355b925050eb6fa966742d8c2d65ff0d83
>>>
>>> In ipipe memory that gets ioremapped will get pinned using
>>> __ipipe_pin_mapping_globally, however in the x86_64 case that
>>> function uses vmalloc_sync_one which must only be used on 4k pages.
>>>
>>> We found the problem when using the kernel in a VBox VM, where the
>>> paravirtualized PCI device has enough iomem to cause huge page
>>> mappings. When loading the device driver you will get a BUG caused
>>> by __ipipe_pin_mapping_globally.
>>>
>>> I will work on a fix for the problem. But i would also like to
>>> understand the initial purpose of the pinning. Is it even supposed
>>> to work for io memory as well? It looks like a way to commit
>>> address space changes right down into the page tables, to avoid
>>> page-faults in the kernel address space. Probably for more
>>> predictable timing ... 
>>
>> This is for pinning the page table entries referencing kernel
>> mappings, so that we don't get minor faults when treading over kernel
>> memory, unless the fault fixup code is compatible with primary domain
>> execution, and cheaper than tracking the pgds.
> 
> Looking at both users of the pinning vmalloc and ioremap it does not
> seem to me like anything is done lazy here. The complete pagetables are
> alloced and filled.
> Maybe i am reading it wrong, maybe the kernel changed since the pinning
> function was introduced, or something else. Could you please explain
> what minor faults we are talking about?
> 
> Faults on the actual content or faults on the PTs? After all they need
> to be mapped in order to read/change them.

minor faults: MMU traps occurring when the page is in-core, but not
indexed by the pgd/TLB, due to a lazy/on-demand mapping scheme or lack of
resources. This mechanism is typically used with vmalloc'ed memory,
which underlies kernel modules.

1. A Xenomai activity preempts whatever linux context, borrowing the
current mm
2. That activity refers to some memory which is not mapped into the
current mm.
3. Minor fault

Now, whether a minor fault is acceptable or not latency-wise depends on
what has to be done for fixing up the current context: specifically we
must be able to handle the trap immediately without having to wait for
reentering the regular linux context.

On x86, it's not acceptable, so we have to pin, into every mm, those
mappings an rt activity might tread on. Usually, TLB miss handlers for
ppc32/ppc64 can be specifically "ironed", so that we don't have to
downgrade to linux mode for handling those traps. Some ARM families such
as imx6 look ok too these days, hence the recent dropping of pte pinning
for kernel mappings there. Likewise for arm64.
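The lazy alternative being contrasted with pinning can be sketched the same way: fix the missing kernel entry up at fault time by copying it over from init_mm, which is roughly what a mainline vmalloc-style fault fixup does. The names below (`init_mm`, `current_mm`, `access_slot()`) are invented for this illustration.

```c
#include <assert.h>

#define NSLOTS 8
static int init_mm[NSLOTS];	/* kernel reference mapping */
static int current_mm[NSLOTS];	/* the mm borrowed by the preempting activity */
static int fault_count;

/* Access through the current mm; on a "minor fault" (entry missing
 * here although init_mm has it), fix up by copying the kernel entry
 * over. This is the trap-time work that pinning avoids on x86, where
 * the fixup cannot safely run from the primary domain. */
static int access_slot(int slot)
{
	if (current_mm[slot] == 0 && init_mm[slot] != 0) {
		fault_count++;			/* the minor fault */
		current_mm[slot] = init_mm[slot];
	}
	return current_mm[slot];
}
```

Only the first access through a given mm faults; once the entry is copied, later accesses go straight through, which is exactly the behavior pinning front-loads for every mm.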

Since you mentioned a patch dating back to 2007, here is a discussion
illustrating the issue from the same period:

https://xenomai.org/pipermail/xenomai/2007-February/007383.html

-- 
Philippe.



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 15:02               ` Gilles Chanteperdrix
@ 2016-02-03 15:14                 ` Jan Kiszka
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kiszka @ 2016-02-03 15:14 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On 2016-02-03 16:02, Gilles Chanteperdrix wrote:
> On Wed, Feb 03, 2016 at 03:51:34PM +0100, Jan Kiszka wrote:
>> On 2016-02-03 15:38, Gilles Chanteperdrix wrote:
>>> On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote:
>>>> On 2016-02-03 15:24, Gilles Chanteperdrix wrote:
>>>>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
>>>>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
>>>>>> handle that when pinning kernel memory.
>>>>>>
>>>>>> change that introduced the feature
>>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f
>>>>>
>>>>> Could we have an assessment of whether avoiding the call to
>>>>> __ipipe_pin_range_mapping in upper layers makes the patch simpler?
>>>>
>>>> For that, we first of all need to recapitulate how
>>>> __ipipe_pin_range_globally is/was supposed to work at all.
>>>>
>>>> I tried to restore details but I'm specifically failing regarding that
>>>> "globally". To my understanding, the vmalloc_sync_one of x86 transfers
>>>> at best mappings from init_mm to the current mm one, but not to all mm
>>>> in the systems.
>>>
>>> __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in
>>> the system. This is pretty ugly, not very scalable, but if you do
>>
>> How are future pgds accounted for?
> 
> vmalloc and ioremap add their mappings to init_mm, all processes
> created after that copy their kernel mapping from init_mm, so
> inherit the vmalloc/ioremap mappings.
> 
>>
>>> not do that, each access to an ioremap/vmalloc area in, say, an
>>> interrupt handler, causes a fault for processes that do not have the
>>> mapping in their page tables. Such processes exist if they were
>>> created before the call to ioremap/vmalloc. Another possible fix to
>>> this issue is to allow handling that kind of faults over the head
>>> domain without switching to secondary domain.
>>
>> OK, that makes more sense again.
>>
>> But then Henning is definitely on the right path, because you can't tell
>> from 'start' and 'end' or the pgd pointer if there were only huge pages
>> added or maybe also some small pages. IOW, we do have to walk the page
>> table trees and therefore have to be prepared to hit some huge pages
>> along that path.
> 
> The function which creates the mappings has to know that it creates
> huge mapping,
> what I believe needs to be checked is whether that
> information can easily be made available to
> __ipipe_pin_range_globally call sites. Because if it can, we can
> simply skip the call, and the patch will be simpler that what is
> currently proposed by Henning.

Nope, this is an internal optimization of the mapping functions: If the
requested range is large enough for a huge page and the arch supports
that size, it will be used (because that's faster). So, if we want to
use vmalloc_sync_one for our purposes, we need to enhance it. The
alternative is to reimplement it...
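The point that the mapping code decides per sub-range whether a huge mapping is usable can be made concrete with a rough model of the size/alignment check (a sketch of the kind of policy in lib/ioremap.c, with invented names and simplified to a single huge-page level):

```c
#include <assert.h>

#define PMD_SIZE  (2UL << 20)	/* 2 MiB, the x86 pmd-level granule */
#define PAGE_SIZE 4096UL

/* Rough model of the per-sub-range decision: use a huge mapping when
 * the arch supports it and address and remaining size allow it,
 * otherwise fall back to 4k pages. Returns the number of huge
 * mappings used; illustrative only. */
static int map_range(unsigned long addr, unsigned long size,
		     int huge_capable, int *n_4k)
{
	int n_huge = 0;

	*n_4k = 0;
	while (size) {
		if (huge_capable && (addr & (PMD_SIZE - 1)) == 0 &&
		    size >= PMD_SIZE) {
			n_huge++;		/* one pmd-level leaf */
			addr += PMD_SIZE;
			size -= PMD_SIZE;
		} else {
			(*n_4k)++;		/* one pte-level leaf */
			addr += PAGE_SIZE;
			size -= PAGE_SIZE;
		}
	}
	return n_huge;
}
```

From the caller's (start, end) alone one cannot tell which branch was taken for which sub-range, which is why a pinning walk over the resulting page tables has to be prepared to meet both huge and 4k leaves.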

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-02-03 15:07     ` Philippe Gerum
@ 2016-02-03 15:48       ` Philippe Gerum
  2016-02-04 11:43       ` Henning Schild
  1 sibling, 0 replies; 37+ messages in thread
From: Philippe Gerum @ 2016-02-03 15:48 UTC (permalink / raw)
  To: Henning Schild; +Cc: Gilles Chanteperdrix, Xenomai

On 02/03/2016 04:07 PM, Philippe Gerum wrote:
Some ARM families such
> as imx6 look ok too these days, hence the recent dropping of pte pinning

To be precise, the criterion is rather the CPU architecture, v6k and v7
are fine with direct handling of minor faults on trap entry.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Xenomai] ipipe x86_64 huge page ioremap
  2016-02-03 15:07     ` Philippe Gerum
  2016-02-03 15:48       ` Philippe Gerum
@ 2016-02-04 11:43       ` Henning Schild
  1 sibling, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-04 11:43 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: KISZKA, JAN, Gilles Chanteperdrix, Xenomai

On Wed, 3 Feb 2016 16:07:57 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 02/02/2016 06:43 PM, Henning Schild wrote:
> > On Fri, 15 Jan 2016 09:32:38 +0100
> > Philippe Gerum <rpm@xenomai.org> wrote:
> >   
> >> On 01/14/2016 06:34 PM, Henning Schild wrote:  
> >>> Hey,
> >>>
> >>> the 4.1 kernel supports mapping IO memory using huge pages.
> >>> 0f616be120c632c818faaea9adcb8f05a7a8601f ..
> >>> 6b6378355b925050eb6fa966742d8c2d65ff0d83
> >>>
> >>> In ipipe memory that gets ioremapped will get pinned using
> >>> __ipipe_pin_mapping_globally, however in the x86_64 case that
> >>> function uses vmalloc_sync_one which must only be used on 4k
> >>> pages.
> >>>
> >>> We found the problem when using the kernel in a VBox VM, where the
> >>> paravirtualized PCI device has enough iomem to cause huge page
> >>> mappings. When loading the device driver you will get a BUG caused
> >>> by __ipipe_pin_mapping_globally.
> >>>
> >>> I will work on a fix for the problem. But i would also like to
> >>> understand the initial purpose of the pinning. Is it even supposed
> >>> to work for io memory as well? It looks like a way to commit
> >>> address space changes right down into the page tables, to avoid
> >>> page-faults in the kernel address space. Probably for more
> >>> predictable timing ...   
> >>
> >> This is for pinning the page table entries referencing kernel
> >> mappings, so that we don't get minor faults when treading over
> >> kernel memory, unless the fault fixup code is compatible with
> >> primary domain execution, and cheaper than tracking the pgds.  
> > 
> > Looking at both users of the pinning, vmalloc and ioremap, it does not
> > seem to me like anything is done lazily here. The complete page tables
> > are allocated and filled.
> > Maybe I am reading it wrong, maybe the kernel changed since the
> > pinning function was introduced, or something else. Could you
> > please explain what minor faults we are talking about?
> > 
> > Faults on the actual content or faults on the PTs? After all they
> > need to be mapped in order to read/change them.  
> 
> minor faults: MMU traps occurring when the page is in-core, but not
> indexed by the pgd/TLB, due to a lazy/ondemand mapping scheme or lack
> of resources. This mechanism is typically used with vmalloc'ed memory,
> which underlies kernel modules.
> 
> 1. A Xenomai activity preempts whatever linux context, borrowing the
> current mm
> 2. That activity refers to some memory which is not mapped into the
> current mm.
> 3. Minor fault
> 
> Now, whether a minor fault is acceptable or not latency-wise depends
> on what has to be done for fixing up the current context:
> specifically we must be able to handle the trap immediately without
> having to wait for reentering the regular linux context.
> 
> On x86, it's not acceptable, so we have to pin into every mm those
> mappings an rt activity might tread on. Usually, TLB miss handlers for
> ppc32/ppc64 can be specifically "ironed", so that we don't have to
> downgrade to linux mode for handling those traps. Some ARM families
> such as imx6 look ok too these days, hence the recent dropping of pte
> pinning for kernel mappings there. Likewise for arm64.
> 
> Since you mentioned a patch dating back to 2007, here is a discussion
> illustrating the issue from the same period:
> 
> https://xenomai.org/pipermail/xenomai/2007-February/007383.html

Thanks for the explanation; the pinning is clear to me now, both how it
is done and why we need it.

Henning



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 14:24       ` Gilles Chanteperdrix
  2016-02-03 14:31         ` Jan Kiszka
@ 2016-02-04 11:53         ` Henning Schild
  1 sibling, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-02-04 11:53 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai

On Wed, 3 Feb 2016 15:24:48 +0100
Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:

> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote:
> > In 4.1 huge page mapping of io memory was introduced, enable ipipe
> > to handle that when pinning kernel memory.
> > 
> > change that introduced the feature
> > 0f616be120c632c818faaea9adcb8f05a7a8601f  
> 
> Could we have an assessment of whether avoiding the call to
> __ipipe_pin_range_mapping in upper layers makes the patch simpler?

I will have a look at that, specifically at why mainline does not run
into this. Given the lazy nature of the mappings when it comes to other
mms, I still do not see why the vmalloc_fault path does not deal with
huge pages.

Henning



* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-03 12:59     ` [Xenomai] [PATCH v3] " Henning Schild
  2016-02-03 14:24       ` Gilles Chanteperdrix
@ 2016-02-08  8:44       ` Henning Schild
  2016-03-07  7:58         ` Henning Schild
  1 sibling, 1 reply; 37+ messages in thread
From: Henning Schild @ 2016-02-08  8:44 UTC (permalink / raw)
  To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix

Hey,

I discussed the issue with the developer who introduced huge ioremap. He
agreed that mainline also needs to handle huge pages in vmalloc_fault
and came up with a very similar patch (the discussion can partially be
found in this list's archive).
I assume it will soon hit mainline and will get backported to 4.1. So I
guess we can just wait for that and use "nohugeiomap" until the issue
is fixed in mainline and merged into our 4.1.y branch.

Henning

On Wed, 3 Feb 2016 13:59:25 +0100
Henning Schild <henning.schild@siemens.com> wrote:

> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> handle that when pinning kernel memory.
> 
> change that introduced the feature
> 0f616be120c632c818faaea9adcb8f05a7a8601f
> 
> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> ---
>  arch/x86/mm/fault.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> This version has the 32-bit case changed according to the problems
> found in the review; this patch should finally keep the old
> behaviour. But we should still discuss whether the old behaviour was
> correct and should be kept. If the pinning is not required, it should
> be removed instead of being made more complex.
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fd5bbcc..e78bd09 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  	pud_k = pud_offset(pgd_k, address);
>  	if (!pud_present(*pud_k))
>  		return NULL;
> +	if (pud_large(*pud))
> +		return NULL;
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_k = pmd_offset(pud_k, address);
> @@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
>  		BUG();
> +	if (pud_large(*pud))
> +		return 0;
>  
>  	pmd = pmd_offset(pud, address);
>  	pmd_ref = pmd_offset(pud_ref, address);
> @@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
>  
>  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
>  		BUG();
> +	if (pmd_large(*pmd))
> +		return 0;
>  
>  	pte_ref = pte_offset_kernel(pmd_ref, address);
>  	if (!pte_present(*pte_ref))




* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning
  2016-02-08  8:44       ` Henning Schild
@ 2016-03-07  7:58         ` Henning Schild
  0 siblings, 0 replies; 37+ messages in thread
From: Henning Schild @ 2016-03-07  7:58 UTC (permalink / raw)
  To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix

Hey,

the patch went into 4.1.19, please rebase/merge so ipipe-4.1.y contains
it.

Henning

On Mon, 8 Feb 2016 09:44:28 +0100
Henning Schild <henning.schild@siemens.com> wrote:

> Hey,
> 
> I discussed the issue with the developer who introduced huge ioremap.
> He agreed that mainline also needs to handle huge pages in
> vmalloc_fault and came up with a very similar patch (the discussion
> can partially be found in this list's archive).
> I assume it will soon hit mainline and will get backported to 4.1. So
> I guess we can just wait for that and use "nohugeiomap" until the
> issue is fixed in mainline and merged into our 4.1.y branch.
> 
> Henning
> 
> On Wed, 3 Feb 2016 13:59:25 +0100
> Henning Schild <henning.schild@siemens.com> wrote:
> 
> > In 4.1 huge page mapping of io memory was introduced, enable ipipe
> > to handle that when pinning kernel memory.
> > 
> > change that introduced the feature
> > 0f616be120c632c818faaea9adcb8f05a7a8601f
> > 
> > Signed-off-by: Henning Schild <henning.schild@siemens.com>
> > ---
> >  arch/x86/mm/fault.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > This version has the 32-bit case changed according to the problems
> > found in the review; this patch should finally keep the old
> > behaviour. But we should still discuss whether the old behaviour was
> > correct and should be kept. If the pinning is not required, it should
> > be removed instead of being made more complex.
> > 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fd5bbcc..e78bd09 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
> >  	pud_k = pud_offset(pgd_k, address);
> >  	if (!pud_present(*pud_k))
> >  		return NULL;
> > +	if (pud_large(*pud))
> > +		return NULL;
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_k = pmd_offset(pud_k, address);
> > @@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
> >  
> >  	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
> >  		BUG();
> > +	if (pud_large(*pud))
> > +		return 0;
> >  
> >  	pmd = pmd_offset(pud, address);
> >  	pmd_ref = pmd_offset(pud_ref, address);
> > @@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
> >  
> >  	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
> >  		BUG();
> > +	if (pmd_large(*pmd))
> > +		return 0;
> >  
> >  	pte_ref = pte_offset_kernel(pmd_ref, address);
> >  	if (!pte_present(*pte_ref))
> 
> 
> _______________________________________________
> Xenomai mailing list
> Xenomai@xenomai.org
> http://xenomai.org/mailman/listinfo/xenomai




end of thread, other threads:[~2016-03-07  7:58 UTC | newest]

Thread overview: 37+ messages
2016-01-14 17:34 [Xenomai] ipipe x86_64 huge page ioremap Henning Schild
2016-01-15  8:32 ` Philippe Gerum
2016-01-15 12:34   ` Jan Kiszka
2016-02-02 17:43   ` Henning Schild
2016-02-03 15:07     ` Philippe Gerum
2016-02-03 15:48       ` Philippe Gerum
2016-02-04 11:43       ` Henning Schild
2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
2016-01-26 20:18   ` Jan Kiszka
2016-01-27  9:54     ` Henning Schild
2016-01-27 10:31       ` Jan Kiszka
2016-01-27 10:44         ` Philippe Gerum
2016-01-27 10:46           ` Philippe Gerum
2016-01-27 13:41   ` [Xenomai] [PATCH v2] " Henning Schild
2016-01-28 10:53     ` Philippe Gerum
2016-01-28 20:53       ` Henning Schild
2016-01-29 17:11         ` Philippe Gerum
2016-01-29 18:39           ` Gilles Chanteperdrix
2016-02-02 12:08             ` Henning Schild
2016-02-02 13:58               ` Philippe Gerum
2016-02-02 16:38                 ` Henning Schild
2016-02-02 16:39                   ` Jan Kiszka
2016-02-02 19:26                   ` Gilles Chanteperdrix
2016-02-03 11:35                     ` Henning Schild
2016-02-02 14:18               ` Philippe Gerum
2016-02-02 16:30                 ` Henning Schild
2016-02-02 11:41           ` Henning Schild
2016-02-03 12:59     ` [Xenomai] [PATCH v3] " Henning Schild
2016-02-03 14:24       ` Gilles Chanteperdrix
2016-02-03 14:31         ` Jan Kiszka
2016-02-03 14:38           ` Gilles Chanteperdrix
2016-02-03 14:51             ` Jan Kiszka
2016-02-03 15:02               ` Gilles Chanteperdrix
2016-02-03 15:14                 ` Jan Kiszka
2016-02-04 11:53         ` Henning Schild
2016-02-08  8:44       ` Henning Schild
2016-03-07  7:58         ` Henning Schild
