* [Xenomai] ipipe x86_64 huge page ioremap
@ 2016-01-14 17:34 Henning Schild
  2016-01-15  8:32 ` Philippe Gerum
  2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
  0 siblings, 2 replies; 37+ messages in thread
From: Henning Schild @ 2016-01-14 17:34 UTC (permalink / raw)
To: Xenomai; +Cc: Gilles Chanteperdrix

Hey,

the 4.1 kernel supports mapping IO memory using huge pages, see
0f616be120c632c818faaea9adcb8f05a7a8601f ..
6b6378355b925050eb6fa966742d8c2d65ff0d83

In ipipe, memory that gets ioremapped is pinned using
__ipipe_pin_mapping_globally; however, in the x86_64 case that function
uses vmalloc_sync_one, which must only be used on 4k pages.

We found the problem when running the kernel in a VirtualBox VM, where
the paravirtualized PCI device has enough iomem to cause huge page
mappings. When loading the device driver you will get a BUG triggered
from __ipipe_pin_mapping_globally.

I will work on a fix for the problem. But I would also like to
understand the initial purpose of the pinning. Is it even supposed to
work for IO memory as well? It looks like a way to commit address space
changes right down into the page tables, to avoid page faults in the
kernel address space. Probably for more predictable timing ...

regards,
Henning

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Philippe Gerum @ 2016-01-15  8:32 UTC
To: Henning Schild, Xenomai; +Cc: Gilles Chanteperdrix

On 01/14/2016 06:34 PM, Henning Schild wrote:
> Hey,
>
> the 4.1 kernel supports mapping IO memory using huge pages.
> 0f616be120c632c818faaea9adcb8f05a7a8601f ..
> 6b6378355b925050eb6fa966742d8c2d65ff0d83
>
> In ipipe memory that gets ioremapped will get pinned using
> __ipipe_pin_mapping_globally, however in the x86_64 case that function
> uses vmalloc_sync_one which must only be used on 4k pages.
>
> We found the problem when using the kernel in a VBox VM, where the
> paravirtualized PCI device has enough iomem to cause huge page
> mappings. When loading the device driver you will get a BUG caused by
> __ipipe_pin_mapping_globally.
>
> I will work on a fix for the problem. But i would also like to
> understand the initial purpose of the pinning. Is it even supposed to
> work for io memory as well? It looks like a way to commit address space
> changes right down into the page tables, to avoid page-faults in the
> kernel address space. Probably for more predictable timing ...

This is for pinning the page table entries referencing kernel mappings,
so that we don't get minor faults when treading over kernel memory,
unless the fault fixup code is compatible with primary domain execution,
and cheaper than tracking the pgds.

-- 
Philippe.
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Jan Kiszka @ 2016-01-15 12:34 UTC
To: Philippe Gerum, Henning Schild, Xenomai; +Cc: Gilles Chanteperdrix

On 2016-01-15 09:32, Philippe Gerum wrote:
> On 01/14/2016 06:34 PM, Henning Schild wrote:
>> [...]
>
> This is for pinning the page table entries referencing kernel mappings,
> so that we don't get minor faults when treading over kernel memory,
> unless the fault fixup code is compatible with primary domain execution,
> and cheaper than tracking the pgds.

I suppose a critical scenario would already be a real-time driver that
accesses io regions in its interrupt handler or in the context of some
real-time task.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Henning Schild @ 2016-02-02 17:43 UTC
To: Philippe Gerum; +Cc: Gilles Chanteperdrix, Xenomai

On Fri, 15 Jan 2016 09:32:38 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> On 01/14/2016 06:34 PM, Henning Schild wrote:
>> [...]
>
> This is for pinning the page table entries referencing kernel
> mappings, so that we don't get minor faults when treading over kernel
> memory, unless the fault fixup code is compatible with primary domain
> execution, and cheaper than tracking the pgds.

Looking at both users of the pinning, vmalloc and ioremap, it does not
seem to me like anything is done lazily here. The complete page tables
are allocated and filled. Maybe I am reading it wrong, maybe the kernel
changed since the pinning function was introduced, or something else.

Could you please explain what minor faults we are talking about? Faults
on the actual content or faults on the PTs? After all they need to be
mapped in order to read/change them.

Henning
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Philippe Gerum @ 2016-02-03 15:07 UTC
To: Henning Schild; +Cc: Gilles Chanteperdrix, Xenomai

On 02/02/2016 06:43 PM, Henning Schild wrote:
> [...]
>
> Looking at both users of the pinning vmalloc and ioremap it does not
> seem to me like anything is done lazy here. The complete pagetables are
> alloced and filled.
> Maybe i am reading it wrong, maybe the kernel changed since the pinning
> function was introduced, or something else. Could you please explain
> what minor faults we are talking about?
>
> Faults on the actual content or faults on the PTs? After all they need
> to be mapped in order to read/change them.

minor faults: MMU traps occurring when the page is in-core, but not
indexed by the pgd/TLB, due to a lazy/on-demand mapping scheme or lack
of resources. This mechanism is typically used with vmalloc'ed memory,
which underlies kernel modules.

1. A Xenomai activity preempts whatever linux context, borrowing the
   current mm.
2. That activity refers to some memory which is not mapped into the
   current mm.
3. Minor fault.

Now, whether a minor fault is acceptable or not latency-wise depends on
what has to be done for fixing up the current context: specifically, we
must be able to handle the trap immediately, without having to wait for
reentering the regular linux context.

On x86, it's not acceptable, so we have to pin those mappings an rt
activity might tread on into every mm. Usually, TLB miss handlers for
ppc32/ppc64 can be specifically "ironed", so that we don't have to
downgrade to linux mode for handling those traps. Some ARM families such
as imx6 look ok too these days, hence the recent dropping of pte pinning
for kernel mappings there. Likewise for arm64.

Since you mentioned a patch dating back to 2007, here is a discussion
illustrating the issue from the same period:

https://xenomai.org/pipermail/xenomai/2007-February/007383.html

-- 
Philippe.
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Philippe Gerum @ 2016-02-03 15:48 UTC
To: Henning Schild; +Cc: Gilles Chanteperdrix, Xenomai

On 02/03/2016 04:07 PM, Philippe Gerum wrote:
> Some ARM families such as imx6 look ok too these days, hence the
> recent dropping of pte pinning

To be precise, the criterion is rather the CPU architecture: v6k and v7
are fine with direct handling of minor faults on trap entry.

-- 
Philippe.
* Re: [Xenomai] ipipe x86_64 huge page ioremap
From: Henning Schild @ 2016-02-04 11:43 UTC
To: Philippe Gerum; +Cc: KISZKA, JAN, Gilles Chanteperdrix, Xenomai

On Wed, 3 Feb 2016 16:07:57 +0100
Philippe Gerum <rpm@xenomai.org> wrote:

> [...]
>
> minor faults: MMU traps occurring when the page is in-core, but not
> indexed by the pgd/TLB, due to a lazy/ondemand mapping scheme or lack
> of resources. This mechanism is typically used with vmalloc'ed memory,
> which underlies kernel modules.
>
> [...]
>
> Since you mentioned a patch dating back to 2007, here is a discussion
> illustrating the issue from the same period:
>
> https://xenomai.org/pipermail/xenomai/2007-February/007383.html

Thanks for the explanation, now the pinning is clear to me: how it is
done and why we need it.

Henning
* [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
  2016-01-26 20:18 ` Jan Kiszka
  2016-01-27 13:41 ` [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning Henning Schild
From: Henning Schild @ 2016-01-26 15:20 UTC
To: Xenomai

In 4.1, huge page mapping of IO memory was introduced; enable ipipe to
handle that when pinning kernel memory.

Change that introduced the feature:
0f616be120c632c818faaea9adcb8f05a7a8601f

Signed-off-by: Henning Schild <henning.schild@siemens.com>
---
 arch/x86/mm/fault.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd5bbcc..5585610 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -211,11 +211,19 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pud_k = pud_offset(pgd_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pud_large(*pud))
+		return pud_k;
+#endif
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
 	if (!pmd_present(*pmd_k))
 		return NULL;
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pmd_large(*pmd))
+		return pmd_k;
+#endif
 
 	if (!pmd_present(*pmd))
 		set_pmd(pmd, *pmd_k);
@@ -400,6 +408,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
 		BUG();
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pud_large(*pud))
+		return 0;
+#endif
 
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
@@ -408,6 +420,10 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 
 	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
 		BUG();
+#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
+	if (pmd_large(*pmd))
+		return 0;
+#endif
 
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
-- 
2.4.10
* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
From: Jan Kiszka @ 2016-01-26 20:18 UTC
To: Henning Schild, Xenomai

On 2016-01-26 16:20, Henning Schild wrote:
> In 4.1 huge page mapping of io memory was introduced, enable ipipe to
> handle that when pinning kernel memory.
>
> change that introduced the feature
> 0f616be120c632c818faaea9adcb8f05a7a8601f
>
> Signed-off-by: Henning Schild <henning.schild@siemens.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> [...]

Do we need the #ifdefs? If not, maybe just leave comments to establish
the relationship to I-pipe.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
From: Henning Schild @ 2016-01-27 9:54 UTC
To: Jan Kiszka; +Cc: Xenomai

On Tue, 26 Jan 2016 21:18:53 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2016-01-26 16:20, Henning Schild wrote:
>> [...]
>
> Do we need the #ifdefs? If not, maybe just leave comments to establish
> the relationship to I-pipe.

We do not; without ipipe the function is never called from a context
where that check is required. If it were, this change or something
similar would already be in mainline.

I see the ifdefs as a kind of comment that establishes the relationship
to ipipe and the huge_vmap feature. Plus, in all other cases a pointless
check is not even compiled in. If comments should be used instead of
ifdefs, I will change that.

Henning
* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
From: Jan Kiszka @ 2016-01-27 10:31 UTC
To: Henning Schild, Philippe Gerum; +Cc: Xenomai

On 2016-01-27 10:54, Henning Schild wrote:
> [...]
>
> We do not, without ipipe the function is never called from a
> context where that check is required. If it was, this change or
> something similar would be in mainline.
> I see the ifdefs as kind of comments that establish the relationship to
> ipipe and the huge_vmap feature. Plus in all other cases a pointless
> check is not even compiled in.
> If comments should be used instead of ifdefs i will change that.

The optimal way of documenting the motivation for these changes is in a
changelog. Provided the patch will survive future I-pipe versions and
not be folded into a bigger patch (Philippe, please confirm), let's go
for that. Plan B is inline comments. But #ifdefs for documentation
purposes are not desirable.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
From: Philippe Gerum @ 2016-01-27 10:44 UTC
To: Jan Kiszka, Henning Schild; +Cc: Xenomai

On 01/27/2016 11:31 AM, Jan Kiszka wrote:
> [...]
>
> The optimal way of documenting the motivation for these changes is in a
> changelog. Provided the patch will survive future I-pipe versions and
> not be folded into a bigger patch (Philippe, please confirm),

In the current workflow, only raw/* branches may be rebased, so folding
patches within a maintenance branch can't happen.

> let's go for that. Plan B is inline comments. But #ifdefs for
> documentation purposes are not desirable.

Documentation is best when close to the documented code, so to me plan B
is actually plan A, although I agree that complex/tricky/non-obvious
matters addressed by any given patch should be described accurately in
its changelog.

-- 
Philippe.
* Re: [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning
From: Philippe Gerum @ 2016-01-27 10:46 UTC
To: Jan Kiszka, Henning Schild; +Cc: Xenomai

On 01/27/2016 11:44 AM, Philippe Gerum wrote:
> [...]
>
> Documentation is best when close to the documented code, so to me plan B
> is actually plan A, although I agree that complex/tricky/non-obvious
> matters addressed by any given patch should be described accurately in
> its changelog.

This said, I agree that #ifdefery should be reduced by factoring out
code when possible.

-- 
Philippe.
^ permalink raw reply [flat|nested] 37+ messages in thread
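The factoring Philippe alludes to could look like the sketch below: one helper carries the single #if, and the call sites stay #ifdef-free. The pud_t/pud_large() stand-ins make the sketch self-contained; in the real patch these would be the kernel's own types, so this is an illustration of the idea, not the merged code:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the sketch compiles on its own; in arch/x86/mm/fault.c
 * these are the kernel's pud_t and pud_large(). */
typedef struct { int large; } pud_t;
static bool pud_large(pud_t p) { return p.large != 0; }

#define CONFIG_IPIPE 1
#define CONFIG_HAVE_ARCH_HUGE_VMAP 1

/* One helper instead of four #ifdef blocks: with either option off it
 * is a constant false, so the checks compile away and the call sites
 * still read as plain C, e.g. "if (pud_is_huge_vmap(*pud)) return pud_k;". */
#if defined(CONFIG_IPIPE) && defined(CONFIG_HAVE_ARCH_HUGE_VMAP)
static bool pud_is_huge_vmap(pud_t pud) { return pud_large(pud); }
#else
static bool pud_is_huge_vmap(pud_t pud) { (void)pud; return false; }
#endif
```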
* [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild 2016-01-26 20:18 ` Jan Kiszka @ 2016-01-27 13:41 ` Henning Schild 2016-01-28 10:53 ` Philippe Gerum 2016-02-03 12:59 ` [Xenomai] [PATCH v3] " Henning Schild 1 sibling, 2 replies; 37+ messages in thread From: Henning Schild @ 2016-01-27 13:41 UTC (permalink / raw) To: Xenomai; +Cc: Jan Kiszka In 4.1 huge page mapping of io memory was introduced, enable ipipe to handle that when pinning kernel memory. change that introduced the feature 0f616be120c632c818faaea9adcb8f05a7a8601f Signed-off-by: Henning Schild <henning.schild@siemens.com> --- arch/x86/mm/fault.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd5bbcc..ca1e75b 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); if (!pud_present(*pud_k)) return NULL; + if (pud_large(*pud)) + return pud_k; pmd = pmd_offset(pud, address); pmd_k = pmd_offset(pud_k, address); if (!pmd_present(*pmd_k)) return NULL; + if (pmd_large(*pmd)) + return pmd_k; if (!pmd_present(*pmd)) set_pmd(pmd, *pmd_k); @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address) if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref)) BUG(); + if (pud_large(*pud)) + return 0; pmd = pmd_offset(pud, address); pmd_ref = pmd_offset(pud_ref, address); @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address) if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) BUG(); + if (pmd_large(*pmd)) + return 0; pte_ref = pte_offset_kernel(pmd_ref, address); if (!pte_present(*pte_ref)) -- 2.4.10 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-27 13:41 ` [Xenomai] [PATCH v2] " Henning Schild @ 2016-01-28 10:53 ` Philippe Gerum 2016-01-28 20:53 ` Henning Schild 2016-02-03 12:59 ` [Xenomai] [PATCH v3] " Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Philippe Gerum @ 2016-01-28 10:53 UTC (permalink / raw) To: Henning Schild, Xenomai; +Cc: Jan Kiszka On 01/27/2016 02:41 PM, Henning Schild wrote: > In 4.1 huge page mapping of io memory was introduced, enable ipipe to > handle that when pinning kernel memory. > > change that introduced the feature > 0f616be120c632c818faaea9adcb8f05a7a8601f > > Signed-off-by: Henning Schild <henning.schild@siemens.com> > --- > arch/x86/mm/fault.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index fd5bbcc..ca1e75b 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address) > pud_k = pud_offset(pgd_k, address); > if (!pud_present(*pud_k)) > return NULL; > + if (pud_large(*pud)) > + return pud_k; > > pmd = pmd_offset(pud, address); > pmd_k = pmd_offset(pud_k, address); > if (!pmd_present(*pmd_k)) > return NULL; > + if (pmd_large(*pmd)) > + return pmd_k; > > if (!pmd_present(*pmd)) > set_pmd(pmd, *pmd_k); > @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address) > > if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref)) > BUG(); > + if (pud_large(*pud)) > + return 0; > > pmd = pmd_offset(pud, address); > pmd_ref = pmd_offset(pud_ref, address); > @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address) > > if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) > BUG(); > + if (pmd_large(*pmd)) > + return 0; > > pte_ref = pte_offset_kernel(pmd_ref, address); > if (!pte_present(*pte_ref)) > I'm confused. 
Assuming the purpose of that patch is to exclude huge I/O mappings from pte pinning, why do the changes to the x86_32 version of the vmalloc_sync_one() helper actually prevent such pinning, while the x86_64 version does not? Could you explain the logic behind the changes? -- Philippe. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-28 10:53 ` Philippe Gerum @ 2016-01-28 20:53 ` Henning Schild 2016-01-29 17:11 ` Philippe Gerum 0 siblings, 1 reply; 37+ messages in thread From: Henning Schild @ 2016-01-28 20:53 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, Xenomai On Thu, 28 Jan 2016 11:53:08 +0100 Philippe Gerum <rpm@xenomai.org> wrote: > On 01/27/2016 02:41 PM, Henning Schild wrote: > > In 4.1 huge page mapping of io memory was introduced, enable ipipe > > to handle that when pinning kernel memory. > > > > change that introduced the feature > > 0f616be120c632c818faaea9adcb8f05a7a8601f > > > > Signed-off-by: Henning Schild <henning.schild@siemens.com> > > --- > > arch/x86/mm/fault.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > > index fd5bbcc..ca1e75b 100644 > > --- a/arch/x86/mm/fault.c > > +++ b/arch/x86/mm/fault.c > > @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t > > *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); > > if (!pud_present(*pud_k)) > > return NULL; > > + if (pud_large(*pud)) > > + return pud_k; > > > > pmd = pmd_offset(pud, address); > > pmd_k = pmd_offset(pud_k, address); > > if (!pmd_present(*pmd_k)) > > return NULL; > > + if (pmd_large(*pmd)) > > + return pmd_k; > > > > if (!pmd_present(*pmd)) > > set_pmd(pmd, *pmd_k); > > @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > > unsigned long address) > > if (pud_none(*pud) || pud_page_vaddr(*pud) != > > pud_page_vaddr(*pud_ref)) BUG(); > > + if (pud_large(*pud)) > > + return 0; > > > > pmd = pmd_offset(pud, address); > > pmd_ref = pmd_offset(pud_ref, address); > > @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > > unsigned long address) > > if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) > > BUG(); > > + if (pmd_large(*pmd)) > > + return 0; > > > > pte_ref = 
pte_offset_kernel(pmd_ref, address); > > if (!pte_present(*pte_ref)) > > > > I'm confused. Assuming the purpose of that patch is to exclude huge > I/O mappings from pte pinning, why does the changes to the x86_32 > version of the vmalloc_sync_one() helper actually prevent such > pinning, while the x86_64 version does not? No, the purpose is to include them just as they were before. Vanilla vmalloc_sync_one() just must not be called on huge mappings because it can't handle them. The patch is supposed to make the function return successfully, stopping early when huge pages are detected. It changes the implementation of both x86_32 and x86_64. > Could you explain the logic behind the changes? ^ permalink raw reply [flat|nested] 37+ messages in thread
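Henning's point — that the walker must stop as soon as it meets a large entry, because a huge leaf has no lower level to descend into — can be sketched with a standalone toy model (not the kernel code; the struct and helper names here are made up for illustration):

```c
#include <assert.h>

/* Toy two-level "page table": an entry is either absent, a large
 * ("huge") leaf covering the whole range, or a pointer to a
 * next-level table of 4k leaves. */
struct entry {
    int present;
    int large;           /* huge leaf: no lower level exists */
    struct entry *next;  /* lower-level table when !large */
    unsigned long frame; /* leaf payload */
};

/* Mirror of the patched vmalloc_sync_one() shape: copy the missing
 * upper-level entry from the reference (kernel) table, then return
 * early on a large leaf instead of descending -- descending further
 * is what the vanilla code would wrongly attempt. */
static int sync_one(struct entry *dst, const struct entry *ref)
{
    if (!ref->present)
        return -1;               /* nothing to pin */
    if (!dst->present)
        *dst = *ref;             /* pin the upper-level entry */
    if (dst->large)
        return 0;                /* huge mapping: stop here, success */
    if (!ref->next->present)     /* otherwise descend one level */
        return -1;
    if (!dst->next->present)
        *dst->next = *ref->next; /* pin the 4k leaf */
    return 0;
}
```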
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-28 20:53 ` Henning Schild @ 2016-01-29 17:11 ` Philippe Gerum 2016-01-29 18:39 ` Gilles Chanteperdrix 2016-02-02 11:41 ` Henning Schild 0 siblings, 2 replies; 37+ messages in thread From: Philippe Gerum @ 2016-01-29 17:11 UTC (permalink / raw) To: Henning Schild; +Cc: Jan Kiszka, Xenomai On 01/28/2016 09:53 PM, Henning Schild wrote: > On Thu, 28 Jan 2016 11:53:08 +0100 > Philippe Gerum <rpm@xenomai.org> wrote: > >> On 01/27/2016 02:41 PM, Henning Schild wrote: >>> In 4.1 huge page mapping of io memory was introduced, enable ipipe >>> to handle that when pinning kernel memory. >>> >>> change that introduced the feature >>> 0f616be120c632c818faaea9adcb8f05a7a8601f >>> >>> Signed-off-by: Henning Schild <henning.schild@siemens.com> >>> --- >>> arch/x86/mm/fault.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c >>> index fd5bbcc..ca1e75b 100644 >>> --- a/arch/x86/mm/fault.c >>> +++ b/arch/x86/mm/fault.c >>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t >>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); >>> if (!pud_present(*pud_k)) >>> return NULL; >>> + if (pud_large(*pud)) >>> + return pud_k; >>> >>> pmd = pmd_offset(pud, address); >>> pmd_k = pmd_offset(pud_k, address); >>> if (!pmd_present(*pmd_k)) >>> return NULL; >>> + if (pmd_large(*pmd)) >>> + return pmd_k; >>> >>> if (!pmd_present(*pmd)) >>> set_pmd(pmd, *pmd_k); >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, >>> unsigned long address) >>> if (pud_none(*pud) || pud_page_vaddr(*pud) != >>> pud_page_vaddr(*pud_ref)) BUG(); >>> + if (pud_large(*pud)) >>> + return 0; >>> >>> pmd = pmd_offset(pud, address); >>> pmd_ref = pmd_offset(pud_ref, address); >>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, >>> unsigned long address) >>> if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) 
>>> BUG(); >>> + if (pmd_large(*pmd)) >>> + return 0; >>> >>> pte_ref = pte_offset_kernel(pmd_ref, address); >>> if (!pte_present(*pte_ref)) >>> >> >> I'm confused. Assuming the purpose of that patch is to exclude huge >> I/O mappings from pte pinning, why does the changes to the x86_32 >> version of the vmalloc_sync_one() helper actually prevent such >> pinning, while the x86_64 version does not? > > No the purpose is to include them just like they were before. > vanilla vmalloc_sync_one just must not be called on huge mappings > because it cant handle them. The patch is supposed to make the function > return successfully, stopping early when huge pages are detected. > > It changes the implementation of both x86_32 and x86_64. > > Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ the pinning, by copying over the kernel mapping, early in the course of the routine for x86_64, late for x86_32. Please explain why your changes prevent huge I/O mappings from being pinned into the current page directory in the x86_32 implementation, but still allow this to be done in the x86_64 version. The section of code you patched in the latter case is basically a series of sanity checks done after the pinning took place, not before. On a more general note, a better approach would be to filter out calls to vmalloc_sync_one() for huge pages directly from __ipipe_pin_mapping_globally(). -- Philippe. ^ permalink raw reply [flat|nested] 37+ messages in thread
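Philippe's alternative — filtering at the caller instead of teaching the walker about large entries — could look roughly like the toy below, where `range_is_huge_mapped()` stands in for a real pud_large()/pmd_large() lookup on the kernel tables (the names, loop shape, and 2M threshold are illustrative assumptions, not the actual I-pipe code):

```c
#include <assert.h>

#define NPROC 4
#define PAGE_SIZE 0x1000UL
#define HUGE_BASE 0x200000UL /* toy: everything from here up is 2M-mapped */

static int sync_calls; /* counts invocations of the 4k walker */

/* Stand-in for vmalloc_sync_one(): must never see a huge mapping. */
static void sync_one_4k(int proc, unsigned long addr)
{
    (void)proc; (void)addr;
    sync_calls++;
}

/* Stand-in for a pud_large()/pmd_large() check on the kernel tables. */
static int range_is_huge_mapped(unsigned long addr)
{
    return addr >= HUGE_BASE;
}

/* __ipipe_pin_mapping_globally()-style loop: walk every process
 * directory, but skip the per-pte sync for ranges backed by huge
 * pages, so the unmodified walker is only used where it is valid. */
static void pin_mapping_globally(unsigned long start, unsigned long end)
{
    for (unsigned long addr = start; addr < end; addr += PAGE_SIZE) {
        if (range_is_huge_mapped(addr))
            continue; /* filtered out before reaching the walker */
        for (int p = 0; p < NPROC; p++)
            sync_one_4k(p, addr);
    }
}
```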
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-29 17:11 ` Philippe Gerum @ 2016-01-29 18:39 ` Gilles Chanteperdrix 2016-02-02 12:08 ` Henning Schild 2016-02-02 11:41 ` Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Gilles Chanteperdrix @ 2016-01-29 18:39 UTC (permalink / raw) To: Philippe Gerum; +Cc: Xenomai, Jan Kiszka On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: > On 01/28/2016 09:53 PM, Henning Schild wrote: > > On Thu, 28 Jan 2016 11:53:08 +0100 > > Philippe Gerum <rpm@xenomai.org> wrote: > > > >> On 01/27/2016 02:41 PM, Henning Schild wrote: > >>> In 4.1 huge page mapping of io memory was introduced, enable ipipe > >>> to handle that when pinning kernel memory. > >>> > >>> change that introduced the feature > >>> 0f616be120c632c818faaea9adcb8f05a7a8601f > >>> > >>> Signed-off-by: Henning Schild <henning.schild@siemens.com> > >>> --- > >>> arch/x86/mm/fault.c | 8 ++++++++ > >>> 1 file changed, 8 insertions(+) > >>> > >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > >>> index fd5bbcc..ca1e75b 100644 > >>> --- a/arch/x86/mm/fault.c > >>> +++ b/arch/x86/mm/fault.c > >>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t > >>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); > >>> if (!pud_present(*pud_k)) > >>> return NULL; > >>> + if (pud_large(*pud)) > >>> + return pud_k; > >>> > >>> pmd = pmd_offset(pud, address); > >>> pmd_k = pmd_offset(pud_k, address); > >>> if (!pmd_present(*pmd_k)) > >>> return NULL; > >>> + if (pmd_large(*pmd)) > >>> + return pmd_k; > >>> > >>> if (!pmd_present(*pmd)) > >>> set_pmd(pmd, *pmd_k); > >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > >>> unsigned long address) > >>> if (pud_none(*pud) || pud_page_vaddr(*pud) != > >>> pud_page_vaddr(*pud_ref)) BUG(); > >>> + if (pud_large(*pud)) > >>> + return 0; > >>> > >>> pmd = pmd_offset(pud, address); > >>> pmd_ref = pmd_offset(pud_ref, address); > 
>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > >>> unsigned long address) > >>> if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) > >>> BUG(); > >>> + if (pmd_large(*pmd)) > >>> + return 0; > >>> > >>> pte_ref = pte_offset_kernel(pmd_ref, address); > >>> if (!pte_present(*pte_ref)) > >>> > >> > >> I'm confused. Assuming the purpose of that patch is to exclude huge > >> I/O mappings from pte pinning, why does the changes to the x86_32 > >> version of the vmalloc_sync_one() helper actually prevent such > >> pinning, while the x86_64 version does not? > > > > No the purpose is to include them just like they were before. > > vanilla vmalloc_sync_one just must not be called on huge mappings > > because it cant handle them. The patch is supposed to make the function > > return successfully, stopping early when huge pages are detected. > > > > It changes the implementation of both x86_32 and x86_64. > > > > Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ the > pinning, by copying over the kernel mapping, early in the course of the > routine for x86_64, late for x86_32. > > Please explain why your changes prevent huge I/O mappings from being > pinned into the current page directory in the x86_32 implementation, but > still allow this to be done in the x86_64 version. The section of code > you patched in the latter case is basically a series of sanity checks > done after the pinning took place, not before. > > On a more general note, a better approach would be to filter out calls > to vmalloc_sync_one() for huge pages directly from __ipipe_pin_mapping > globally(). Since the ioremap/vmalloc range is based on huge pages or not, globally (since the processes mapping are copied from the kernel mapping), we can probably even avoid the call to __ipipe_pin_mapping for hug page mappings. 
After all, since the mainline kernel is able to avoid calls to vmalloc_sync_one() for huge page mappings, the I-pipe should be able to do the same. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-29 18:39 ` Gilles Chanteperdrix @ 2016-02-02 12:08 ` Henning Schild 2016-02-02 13:58 ` Philippe Gerum 2016-02-02 14:18 ` Philippe Gerum 0 siblings, 2 replies; 37+ messages in thread From: Henning Schild @ 2016-02-02 12:08 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai On Fri, 29 Jan 2016 19:39:48 +0100 Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: > > On 01/28/2016 09:53 PM, Henning Schild wrote: > > > On Thu, 28 Jan 2016 11:53:08 +0100 > > > Philippe Gerum <rpm@xenomai.org> wrote: > > > > > >> On 01/27/2016 02:41 PM, Henning Schild wrote: > > >>> In 4.1 huge page mapping of io memory was introduced, enable > > >>> ipipe to handle that when pinning kernel memory. > > >>> > > >>> change that introduced the feature > > >>> 0f616be120c632c818faaea9adcb8f05a7a8601f > > >>> > > >>> Signed-off-by: Henning Schild <henning.schild@siemens.com> > > >>> --- > > >>> arch/x86/mm/fault.c | 8 ++++++++ > > >>> 1 file changed, 8 insertions(+) > > >>> > > >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > > >>> index fd5bbcc..ca1e75b 100644 > > >>> --- a/arch/x86/mm/fault.c > > >>> +++ b/arch/x86/mm/fault.c > > >>> @@ -211,11 +211,15 @@ static inline pmd_t > > >>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = > > >>> pud_offset(pgd_k, address); if (!pud_present(*pud_k)) > > >>> return NULL; > > >>> + if (pud_large(*pud)) > > >>> + return pud_k; > > >>> > > >>> pmd = pmd_offset(pud, address); > > >>> pmd_k = pmd_offset(pud_k, address); > > >>> if (!pmd_present(*pmd_k)) > > >>> return NULL; > > >>> + if (pmd_large(*pmd)) > > >>> + return pmd_k; > > >>> > > >>> if (!pmd_present(*pmd)) > > >>> set_pmd(pmd, *pmd_k); > > >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t > > >>> *pgd, unsigned long address) > > >>> if (pud_none(*pud) || pud_page_vaddr(*pud) != 
> > >>> pud_page_vaddr(*pud_ref)) BUG(); > > >>> + if (pud_large(*pud)) > > >>> + return 0; > > >>> > > >>> pmd = pmd_offset(pud, address); > > >>> pmd_ref = pmd_offset(pud_ref, address); > > >>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t > > >>> *pgd, unsigned long address) > > >>> if (pmd_none(*pmd) || pmd_page(*pmd) != > > >>> pmd_page(*pmd_ref)) BUG(); > > >>> + if (pmd_large(*pmd)) > > >>> + return 0; > > >>> > > >>> pte_ref = pte_offset_kernel(pmd_ref, address); > > >>> if (!pte_present(*pte_ref)) > > >>> > > >> > > >> I'm confused. Assuming the purpose of that patch is to exclude > > >> huge I/O mappings from pte pinning, why does the changes to the > > >> x86_32 version of the vmalloc_sync_one() helper actually prevent > > >> such pinning, while the x86_64 version does not? > > > > > > No the purpose is to include them just like they were before. > > > vanilla vmalloc_sync_one just must not be called on huge mappings > > > because it cant handle them. The patch is supposed to make the > > > function return successfully, stopping early when huge pages are > > > detected. > > > > > > It changes the implementation of both x86_32 and x86_64. > > > > > > > Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ > > the pinning, by copying over the kernel mapping, early in the > > course of the routine for x86_64, late for x86_32. > > > > Please explain why your changes prevent huge I/O mappings from being > > pinned into the current page directory in the x86_32 > > implementation, but still allow this to be done in the x86_64 > > version. The section of code you patched in the latter case is > > basically a series of sanity checks done after the pinning took > > place, not before. > > > > On a more general note, a better approach would be to filter out > > calls to vmalloc_sync_one() for huge pages directly from > > __ipipe_pin_mapping globally(). 
> > Since the ioremap/vmalloc range is based on huge pages or not, > globally (since the processes mapping are copied from the kernel > mapping), we can probably even avoid the call to __ipipe_pin_mapping > for hug page mappings. After all, since the mainline kernel is able > to avoid calls to vmalloc_sync_one() for huge page mappings, the > I-pipe should be able to do the same. As said, my patch preserves the old ipipe behaviour, making it able to deal with huge pages. Whether or not the pinning is required and on which regions, is a valid but totally different question. Down in these low-level functions you cannot tell why the pinning was requested. It has to be dealt with at a higher level. But you are right, the kernel never ends up in vmalloc_sync_one for these regions. Hence we can probably expect that an ioremapped region is always mapped anyway, since vmalloc_sync_one() will ultimately also be used when handling a #PF. I just traced the change in lib/ioremap.c to d41282baf7c1f2bb517d6df95571e7da865f5e37 Not too helpful, at least to me. Henning ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 12:08 ` Henning Schild @ 2016-02-02 13:58 ` Philippe Gerum 2016-02-02 16:38 ` Henning Schild 2016-02-02 14:18 ` Philippe Gerum 1 sibling, 1 reply; 37+ messages in thread From: Philippe Gerum @ 2016-02-02 13:58 UTC (permalink / raw) To: Henning Schild, Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai On 02/02/2016 01:08 PM, Henning Schild wrote: > On Fri, 29 Jan 2016 19:39:48 +0100 > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: >>> On 01/28/2016 09:53 PM, Henning Schild wrote: >>>> On Thu, 28 Jan 2016 11:53:08 +0100 >>>> Philippe Gerum <rpm@xenomai.org> wrote: >>>> >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote: >>>>>> In 4.1 huge page mapping of io memory was introduced, enable >>>>>> ipipe to handle that when pinning kernel memory. >>>>>> >>>>>> change that introduced the feature >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f >>>>>> >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com> >>>>>> --- >>>>>> arch/x86/mm/fault.c | 8 ++++++++ >>>>>> 1 file changed, 8 insertions(+) >>>>>> >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c >>>>>> index fd5bbcc..ca1e75b 100644 >>>>>> --- a/arch/x86/mm/fault.c >>>>>> +++ b/arch/x86/mm/fault.c >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k)) >>>>>> return NULL; >>>>>> + if (pud_large(*pud)) >>>>>> + return pud_k; >>>>>> >>>>>> pmd = pmd_offset(pud, address); >>>>>> pmd_k = pmd_offset(pud_k, address); >>>>>> if (!pmd_present(*pmd_k)) >>>>>> return NULL; >>>>>> + if (pmd_large(*pmd)) >>>>>> + return pmd_k; >>>>>> >>>>>> if (!pmd_present(*pmd)) >>>>>> set_pmd(pmd, *pmd_k); >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t >>>>>> *pgd, unsigned long address) >>>>>> if (pud_none(*pud) || 
pud_page_vaddr(*pud) != >>>>>> pud_page_vaddr(*pud_ref)) BUG(); >>>>>> + if (pud_large(*pud)) >>>>>> + return 0; >>>>>> >>>>>> pmd = pmd_offset(pud, address); >>>>>> pmd_ref = pmd_offset(pud_ref, address); >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t >>>>>> *pgd, unsigned long address) >>>>>> if (pmd_none(*pmd) || pmd_page(*pmd) != >>>>>> pmd_page(*pmd_ref)) BUG(); >>>>>> + if (pmd_large(*pmd)) >>>>>> + return 0; >>>>>> >>>>>> pte_ref = pte_offset_kernel(pmd_ref, address); >>>>>> if (!pte_present(*pte_ref)) >>>>>> >>>>> >>>>> I'm confused. Assuming the purpose of that patch is to exclude >>>>> huge I/O mappings from pte pinning, why does the changes to the >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent >>>>> such pinning, while the x86_64 version does not? >>>> >>>> No the purpose is to include them just like they were before. >>>> vanilla vmalloc_sync_one just must not be called on huge mappings >>>> because it cant handle them. The patch is supposed to make the >>>> function return successfully, stopping early when huge pages are >>>> detected. >>>> >>>> It changes the implementation of both x86_32 and x86_64. >>>> >>> >>> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ >>> the pinning, by copying over the kernel mapping, early in the >>> course of the routine for x86_64, late for x86_32. >>> >>> Please explain why your changes prevent huge I/O mappings from being >>> pinned into the current page directory in the x86_32 >>> implementation, but still allow this to be done in the x86_64 >>> version. The section of code you patched in the latter case is >>> basically a series of sanity checks done after the pinning took >>> place, not before. >>> >>> On a more general note, a better approach would be to filter out >>> calls to vmalloc_sync_one() for huge pages directly from >>> __ipipe_pin_mapping globally(). 
>> >> Since the ioremap/vmalloc range is based on huge pages or not, >> globally (since the processes mapping are copied from the kernel >> mapping), we can probably even avoid the call to __ipipe_pin_mapping >> for hug page mappings. After all, since the mainline kernel is able >> to avoid calls to vmalloc_sync_one() for huge page mappings, the >> I-pipe should be able to do the same. > > As said, my patch preserves the old ipipe behaviour, making it able to > deal with huge pages. Whether or not the pinning is required and on > which regions, is a valid but totally different question. Down in these > low-level functions you can not tell why the pinning was requested. It > has to be dealt with at a higher level. > But you are right, the kernel never ends up in vmalloc_sync_one for > these regions. Hence we can probably expect that in ioremapped region > is always mapped anyways. Since vmalloc_sync_one will ultimately also > be used when handling a #PF. > > I just traced the change in lib/ioremap.c to > d41282baf7c1f2bb517d6df95571e7da865f5e37 > Not too helpful, at least to me. So why did you trace it? -- Philippe. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 13:58 ` Philippe Gerum @ 2016-02-02 16:38 ` Henning Schild 2016-02-02 16:39 ` Jan Kiszka 2016-02-02 19:26 ` Gilles Chanteperdrix 0 siblings, 2 replies; 37+ messages in thread From: Henning Schild @ 2016-02-02 16:38 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, Gilles Chanteperdrix, Xenomai On Tue, 2 Feb 2016 14:58:41 +0100 Philippe Gerum <rpm@xenomai.org> wrote: > On 02/02/2016 01:08 PM, Henning Schild wrote: > > On Fri, 29 Jan 2016 19:39:48 +0100 > > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > > > >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: > >>> On 01/28/2016 09:53 PM, Henning Schild wrote: > >>>> On Thu, 28 Jan 2016 11:53:08 +0100 > >>>> Philippe Gerum <rpm@xenomai.org> wrote: > >>>> > >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote: > >>>>>> In 4.1 huge page mapping of io memory was introduced, enable > >>>>>> ipipe to handle that when pinning kernel memory. 
> >>>>>> > >>>>>> change that introduced the feature > >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f > >>>>>> > >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com> > >>>>>> --- > >>>>>> arch/x86/mm/fault.c | 8 ++++++++ > >>>>>> 1 file changed, 8 insertions(+) > >>>>>> > >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > >>>>>> index fd5bbcc..ca1e75b 100644 > >>>>>> --- a/arch/x86/mm/fault.c > >>>>>> +++ b/arch/x86/mm/fault.c > >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t > >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = > >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k)) > >>>>>> return NULL; > >>>>>> + if (pud_large(*pud)) > >>>>>> + return pud_k; > >>>>>> > >>>>>> pmd = pmd_offset(pud, address); > >>>>>> pmd_k = pmd_offset(pud_k, address); > >>>>>> if (!pmd_present(*pmd_k)) > >>>>>> return NULL; > >>>>>> + if (pmd_large(*pmd)) > >>>>>> + return pmd_k; > >>>>>> > >>>>>> if (!pmd_present(*pmd)) > >>>>>> set_pmd(pmd, *pmd_k); > >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t > >>>>>> *pgd, unsigned long address) > >>>>>> if (pud_none(*pud) || pud_page_vaddr(*pud) != > >>>>>> pud_page_vaddr(*pud_ref)) BUG(); > >>>>>> + if (pud_large(*pud)) > >>>>>> + return 0; > >>>>>> > >>>>>> pmd = pmd_offset(pud, address); > >>>>>> pmd_ref = pmd_offset(pud_ref, address); > >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t > >>>>>> *pgd, unsigned long address) > >>>>>> if (pmd_none(*pmd) || pmd_page(*pmd) != > >>>>>> pmd_page(*pmd_ref)) BUG(); > >>>>>> + if (pmd_large(*pmd)) > >>>>>> + return 0; > >>>>>> > >>>>>> pte_ref = pte_offset_kernel(pmd_ref, address); > >>>>>> if (!pte_present(*pte_ref)) > >>>>>> > >>>>> > >>>>> I'm confused. Assuming the purpose of that patch is to exclude > >>>>> huge I/O mappings from pte pinning, why does the changes to the > >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent > >>>>> such pinning, while the x86_64 version does not? 
> >>>> > >>>> No the purpose is to include them just like they were before. > >>>> vanilla vmalloc_sync_one just must not be called on huge mappings > >>>> because it cant handle them. The patch is supposed to make the > >>>> function return successfully, stopping early when huge pages are > >>>> detected. > >>>> > >>>> It changes the implementation of both x86_32 and x86_64. > >>>> > >>> > >>> Sorry, your answer confuses me even more. vmalloc_sync_one() > >>> _does_ the pinning, by copying over the kernel mapping, early in > >>> the course of the routine for x86_64, late for x86_32. > >>> > >>> Please explain why your changes prevent huge I/O mappings from > >>> being pinned into the current page directory in the x86_32 > >>> implementation, but still allow this to be done in the x86_64 > >>> version. The section of code you patched in the latter case is > >>> basically a series of sanity checks done after the pinning took > >>> place, not before. > >>> > >>> On a more general note, a better approach would be to filter out > >>> calls to vmalloc_sync_one() for huge pages directly from > >>> __ipipe_pin_mapping globally(). > >> > >> Since the ioremap/vmalloc range is based on huge pages or not, > >> globally (since the processes mapping are copied from the kernel > >> mapping), we can probably even avoid the call to > >> __ipipe_pin_mapping for hug page mappings. After all, since the > >> mainline kernel is able to avoid calls to vmalloc_sync_one() for > >> huge page mappings, the I-pipe should be able to do the same. > > > > As said, my patch preserves the old ipipe behaviour, making it able > > to deal with huge pages. Whether or not the pinning is required and > > on which regions, is a valid but totally different question. Down > > in these low-level functions you can not tell why the pinning was > > requested. It has to be dealt with at a higher level. > > But you are right, the kernel never ends up in vmalloc_sync_one for > > these regions. 
Hence we can probably expect that in ioremapped > > region is always mapped anyways. Since vmalloc_sync_one will > > ultimately also be used when handling a #PF. > > > > I just traced the change in lib/ioremap.c to > > d41282baf7c1f2bb517d6df95571e7da865f5e37 > > Not too helpful, at least to me. > > So why did you trace it? I do not want to keep working on that patch just to hear that it is not required in the next round. Gilles suggested it might not be and you are the one that introduced the pinning. Maybe making HAVE_ARCH_HUGE_VMAP depend on !IPIPE would also be fine? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 16:38 ` Henning Schild @ 2016-02-02 16:39 ` Jan Kiszka 2016-02-02 19:26 ` Gilles Chanteperdrix 1 sibling, 0 replies; 37+ messages in thread From: Jan Kiszka @ 2016-02-02 16:39 UTC (permalink / raw) To: Henning Schild, Philippe Gerum; +Cc: Gilles Chanteperdrix, Xenomai On 2016-02-02 17:38, Henning Schild wrote: > Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine? No, no, that's a needless restriction. We need to fix I-pipe here. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 16:38 ` Henning Schild 2016-02-02 16:39 ` Jan Kiszka @ 2016-02-02 19:26 ` Gilles Chanteperdrix 2016-02-03 11:35 ` Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Gilles Chanteperdrix @ 2016-02-02 19:26 UTC (permalink / raw) To: Henning Schild; +Cc: Jan Kiszka, Xenomai On Tue, Feb 02, 2016 at 05:38:49PM +0100, Henning Schild wrote: > I do not want to keep working on that patch just to hear that it is not > required in the next round. Gilles suggested it might not be and you > are the one that introduced the pinning. > > Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine? I did not expect you to use my remark as an excuse for dropping the ball. What I meant is that maybe, there is a way to skip the call to __ipipe_pin_mapping_globally() entirely in the upper layers, for the "huge pages" case. And if there is, this should be preferred over patching vmalloc_sync_one, a function not meant to be used for huge mappings anyway. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 37+ messages in thread
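The alternative Gilles outlines — deciding in the upper layers whether the new range is a huge mapping and skipping the pinning call entirely in that case — can be sketched in a few lines. All names below (`is_huge`, `pin_range_globally()`, `map_io_range()`) are hypothetical stand-ins, not the actual I-pipe interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sketch of skipping the pin for huge ranges: if the caller
 * already knows the mapping was built from huge pages, it never
 * invokes the global pinning path at all. */
static int pin_calls;

static void pin_range_globally(void)
{
	pin_calls++;	/* would run vmalloc_sync_one() on every pgd */
}

static void map_io_range(bool is_huge)
{
	/* ... set up the mapping in init_mm ... */
	if (!is_huge)
		pin_range_globally();	/* only 4k mappings get pinned */
}
```

This is the "filter at the call site" design rather than teaching vmalloc_sync_one about huge entries; whether the huge/4k information is actually visible at the call sites is exactly the open question in the thread.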
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 19:26 ` Gilles Chanteperdrix @ 2016-02-03 11:35 ` Henning Schild 0 siblings, 0 replies; 37+ messages in thread From: Henning Schild @ 2016-02-03 11:35 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai On Tue, 2 Feb 2016 20:26:54 +0100 Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > On Tue, Feb 02, 2016 at 05:38:49PM +0100, Henning Schild wrote: > > I do not want to keep working on that patch just to hear that it is > > not required in the next round. Gilles suggested it might not be > > and you are the one that introduced the pinning. > > > > Maybe a HAVE_ARCH_HUGE_VMAP depends on !IPIPE would also be fine? > > I did not expect you to use my remark as an excuse for dropping the > ball. What I meant is that maybe, there is a way to skip the call to > __ipipe_pin_mapping_globally() entirely in the upper layers, for the > "huge pages" case. And if there is, this should be preferred over > patching vmalloc_sync_one, a function not meant to be used for huge > mappings anyway. Please do not get me wrong here. I am not about to drop the ball! Making the two things mutually exclusive is a valid solution with little complexity. Keeping the patch small might after all be more important than catering for questionable features with more patching. Hence the suggestion. The x86_64 portion of my patch is fine, but the 32bit version is indeed pretty wrong. I just took over what I did for 64bit without thinking ... Henning ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 12:08 ` Henning Schild 2016-02-02 13:58 ` Philippe Gerum @ 2016-02-02 14:18 ` Philippe Gerum 2016-02-02 16:30 ` Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Philippe Gerum @ 2016-02-02 14:18 UTC (permalink / raw) To: Henning Schild, Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai On 02/02/2016 01:08 PM, Henning Schild wrote: > On Fri, 29 Jan 2016 19:39:48 +0100 > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: >>> On 01/28/2016 09:53 PM, Henning Schild wrote: >>>> On Thu, 28 Jan 2016 11:53:08 +0100 >>>> Philippe Gerum <rpm@xenomai.org> wrote: >>>> >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote: >>>>>> In 4.1 huge page mapping of io memory was introduced, enable >>>>>> ipipe to handle that when pinning kernel memory. >>>>>> >>>>>> change that introduced the feature >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f >>>>>> >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com> >>>>>> --- >>>>>> arch/x86/mm/fault.c | 8 ++++++++ >>>>>> 1 file changed, 8 insertions(+) >>>>>> >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c >>>>>> index fd5bbcc..ca1e75b 100644 >>>>>> --- a/arch/x86/mm/fault.c >>>>>> +++ b/arch/x86/mm/fault.c >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k)) >>>>>> return NULL; >>>>>> + if (pud_large(*pud)) >>>>>> + return pud_k; >>>>>> >>>>>> pmd = pmd_offset(pud, address); >>>>>> pmd_k = pmd_offset(pud_k, address); >>>>>> if (!pmd_present(*pmd_k)) >>>>>> return NULL; >>>>>> + if (pmd_large(*pmd)) >>>>>> + return pmd_k; >>>>>> >>>>>> if (!pmd_present(*pmd)) >>>>>> set_pmd(pmd, *pmd_k); >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t >>>>>> *pgd, unsigned long address) >>>>>> if (pud_none(*pud) || 
pud_page_vaddr(*pud) != >>>>>> pud_page_vaddr(*pud_ref)) BUG(); >>>>>> + if (pud_large(*pud)) >>>>>> + return 0; >>>>>> >>>>>> pmd = pmd_offset(pud, address); >>>>>> pmd_ref = pmd_offset(pud_ref, address); >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t >>>>>> *pgd, unsigned long address) >>>>>> if (pmd_none(*pmd) || pmd_page(*pmd) != >>>>>> pmd_page(*pmd_ref)) BUG(); >>>>>> + if (pmd_large(*pmd)) >>>>>> + return 0; >>>>>> >>>>>> pte_ref = pte_offset_kernel(pmd_ref, address); >>>>>> if (!pte_present(*pte_ref)) >>>>>> >>>>> >>>>> I'm confused. Assuming the purpose of that patch is to exclude >>>>> huge I/O mappings from pte pinning, why does the changes to the >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent >>>>> such pinning, while the x86_64 version does not? >>>> >>>> No the purpose is to include them just like they were before. >>>> vanilla vmalloc_sync_one just must not be called on huge mappings >>>> because it cant handle them. The patch is supposed to make the >>>> function return successfully, stopping early when huge pages are >>>> detected. >>>> >>>> It changes the implementation of both x86_32 and x86_64. >>>> >>> >>> Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ >>> the pinning, by copying over the kernel mapping, early in the >>> course of the routine for x86_64, late for x86_32. >>> >>> Please explain why your changes prevent huge I/O mappings from being >>> pinned into the current page directory in the x86_32 >>> implementation, but still allow this to be done in the x86_64 >>> version. The section of code you patched in the latter case is >>> basically a series of sanity checks done after the pinning took >>> place, not before. >>> >>> On a more general note, a better approach would be to filter out >>> calls to vmalloc_sync_one() for huge pages directly from >>> __ipipe_pin_mapping globally(). 
>> >> Since the ioremap/vmalloc range is based on huge pages or not, >> globally (since the processes' mappings are copied from the kernel >> mapping), we can probably even avoid the call to __ipipe_pin_mapping >> for huge page mappings. After all, since the mainline kernel is able >> to avoid calls to vmalloc_sync_one() for huge page mappings, the >> I-pipe should be able to do the same. > > As said, my patch preserves the old ipipe behaviour, making it able to > deal with huge pages. No, it does not, because it creates an early exit path for x86_32 which did not exist before, preventing the page directory from being updated, which is at odds with the implementation for x86_64 of the very same routine, where your changes do not filter out huge pages from pinning. So your patch now prevents PTEs for huge pages from being pinned in the pgd for x86_32, which departs from the current behavior. However, it does not for x86_64, which is already the current behavior. So the question remains: why this difference? -- Philippe. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-02-02 14:18 ` Philippe Gerum @ 2016-02-02 16:30 ` Henning Schild 0 siblings, 0 replies; 37+ messages in thread From: Henning Schild @ 2016-02-02 16:30 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, Gilles Chanteperdrix, Xenomai On Tue, 2 Feb 2016 15:18:15 +0100 Philippe Gerum <rpm@xenomai.org> wrote: > On 02/02/2016 01:08 PM, Henning Schild wrote: > > On Fri, 29 Jan 2016 19:39:48 +0100 > > Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > > > >> On Fri, Jan 29, 2016 at 06:11:07PM +0100, Philippe Gerum wrote: > >>> On 01/28/2016 09:53 PM, Henning Schild wrote: > >>>> On Thu, 28 Jan 2016 11:53:08 +0100 > >>>> Philippe Gerum <rpm@xenomai.org> wrote: > >>>> > >>>>> On 01/27/2016 02:41 PM, Henning Schild wrote: > >>>>>> In 4.1 huge page mapping of io memory was introduced, enable > >>>>>> ipipe to handle that when pinning kernel memory. > >>>>>> > >>>>>> change that introduced the feature > >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f > >>>>>> > >>>>>> Signed-off-by: Henning Schild <henning.schild@siemens.com> > >>>>>> --- > >>>>>> arch/x86/mm/fault.c | 8 ++++++++ > >>>>>> 1 file changed, 8 insertions(+) > >>>>>> > >>>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > >>>>>> index fd5bbcc..ca1e75b 100644 > >>>>>> --- a/arch/x86/mm/fault.c > >>>>>> +++ b/arch/x86/mm/fault.c > >>>>>> @@ -211,11 +211,15 @@ static inline pmd_t > >>>>>> *vmalloc_sync_one(pgd_t *pgd, unsigned long address) pud_k = > >>>>>> pud_offset(pgd_k, address); if (!pud_present(*pud_k)) > >>>>>> return NULL; > >>>>>> + if (pud_large(*pud)) > >>>>>> + return pud_k; > >>>>>> > >>>>>> pmd = pmd_offset(pud, address); > >>>>>> pmd_k = pmd_offset(pud_k, address); > >>>>>> if (!pmd_present(*pmd_k)) > >>>>>> return NULL; > >>>>>> + if (pmd_large(*pmd)) > >>>>>> + return pmd_k; Are you talking about this change? Now looking at it it indeed looks wrong and should not be there. 
> >>>>>> > >>>>>> if (!pmd_present(*pmd)) > >>>>>> set_pmd(pmd, *pmd_k); > >>>>>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t > >>>>>> *pgd, unsigned long address) > >>>>>> if (pud_none(*pud) || pud_page_vaddr(*pud) != > >>>>>> pud_page_vaddr(*pud_ref)) BUG(); > >>>>>> + if (pud_large(*pud)) > >>>>>> + return 0; > >>>>>> > >>>>>> pmd = pmd_offset(pud, address); > >>>>>> pmd_ref = pmd_offset(pud_ref, address); > >>>>>> @@ -408,6 +414,8 @@ static inline int vmalloc_sync_one(pgd_t > >>>>>> *pgd, unsigned long address) > >>>>>> if (pmd_none(*pmd) || pmd_page(*pmd) != > >>>>>> pmd_page(*pmd_ref)) BUG(); > >>>>>> + if (pmd_large(*pmd)) > >>>>>> + return 0; > >>>>>> > >>>>>> pte_ref = pte_offset_kernel(pmd_ref, address); > >>>>>> if (!pte_present(*pte_ref)) > >>>>>> > >>>>> > >>>>> I'm confused. Assuming the purpose of that patch is to exclude > >>>>> huge I/O mappings from pte pinning, why does the changes to the > >>>>> x86_32 version of the vmalloc_sync_one() helper actually prevent > >>>>> such pinning, while the x86_64 version does not? > >>>> > >>>> No the purpose is to include them just like they were before. > >>>> vanilla vmalloc_sync_one just must not be called on huge mappings > >>>> because it cant handle them. The patch is supposed to make the > >>>> function return successfully, stopping early when huge pages are > >>>> detected. > >>>> > >>>> It changes the implementation of both x86_32 and x86_64. > >>>> > >>> > >>> Sorry, your answer confuses me even more. vmalloc_sync_one() > >>> _does_ the pinning, by copying over the kernel mapping, early in > >>> the course of the routine for x86_64, late for x86_32. > >>> > >>> Please explain why your changes prevent huge I/O mappings from > >>> being pinned into the current page directory in the x86_32 > >>> implementation, but still allow this to be done in the x86_64 > >>> version. 
> >>> The section of code you patched in the latter case is > >>> basically a series of sanity checks done after the pinning took > >>> place, not before. > >>> > >>> On a more general note, a better approach would be to filter out > >>> calls to vmalloc_sync_one() for huge pages directly from > >>> __ipipe_pin_mapping_globally(). > >> > >> Since the ioremap/vmalloc range is based on huge pages or not, > >> globally (since the processes' mappings are copied from the kernel > >> mapping), we can probably even avoid the call to > >> __ipipe_pin_mapping for huge page mappings. After all, since the > >> mainline kernel is able to avoid calls to vmalloc_sync_one() for > >> huge page mappings, the I-pipe should be able to do the same. > > > > As said, my patch preserves the old ipipe behaviour, making it able > > to deal with huge pages. > > No, it does not, because it creates an early exit path for x86_32 which > did not exist before, preventing the page directory from being updated, which is at > odds with the implementation for x86_64 of the very same routine, where > your changes do not filter out huge pages from pinning. > > So your patch now prevents PTEs for huge pages from being pinned in > the pgd for x86_32, which departs from the current behavior. However, > it does not for x86_64, which is already the current behavior. > > So the question remains: why this difference? We are still not on the same page, and I still do not know exactly what difference you are talking about. Let us please write inline when talking about concrete code. The return values are different between 32 and 64 because that is just how mainline works. The large check is done on pmd and pud because the huge ioremap patch has the potential to make these two levels large. The 32bit patch is derived from 64bit and I might have to check that in more detail. Henning ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v2] ipipe x86 mm: handle huge pages in memory pinning 2016-01-29 17:11 ` Philippe Gerum 2016-01-29 18:39 ` Gilles Chanteperdrix @ 2016-02-02 11:41 ` Henning Schild 1 sibling, 0 replies; 37+ messages in thread From: Henning Schild @ 2016-02-02 11:41 UTC (permalink / raw) To: Philippe Gerum; +Cc: Jan Kiszka, Xenomai On Fri, 29 Jan 2016 18:11:07 +0100 Philippe Gerum <rpm@xenomai.org> wrote: > On 01/28/2016 09:53 PM, Henning Schild wrote: > > On Thu, 28 Jan 2016 11:53:08 +0100 > > Philippe Gerum <rpm@xenomai.org> wrote: > > > >> On 01/27/2016 02:41 PM, Henning Schild wrote: > >>> In 4.1 huge page mapping of io memory was introduced, enable ipipe > >>> to handle that when pinning kernel memory. > >>> > >>> change that introduced the feature > >>> 0f616be120c632c818faaea9adcb8f05a7a8601f > >>> > >>> Signed-off-by: Henning Schild <henning.schild@siemens.com> > >>> --- > >>> arch/x86/mm/fault.c | 8 ++++++++ > >>> 1 file changed, 8 insertions(+) > >>> > >>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > >>> index fd5bbcc..ca1e75b 100644 > >>> --- a/arch/x86/mm/fault.c > >>> +++ b/arch/x86/mm/fault.c > >>> @@ -211,11 +211,15 @@ static inline pmd_t *vmalloc_sync_one(pgd_t > >>> *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); > >>> if (!pud_present(*pud_k)) > >>> return NULL; > >>> + if (pud_large(*pud)) > >>> + return pud_k; > >>> > >>> pmd = pmd_offset(pud, address); > >>> pmd_k = pmd_offset(pud_k, address); > >>> if (!pmd_present(*pmd_k)) > >>> return NULL; > >>> + if (pmd_large(*pmd)) > >>> + return pmd_k; > >>> > >>> if (!pmd_present(*pmd)) > >>> set_pmd(pmd, *pmd_k); > >>> @@ -400,6 +404,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > >>> unsigned long address) > >>> if (pud_none(*pud) || pud_page_vaddr(*pud) != > >>> pud_page_vaddr(*pud_ref)) BUG(); > >>> + if (pud_large(*pud)) > >>> + return 0; > >>> > >>> pmd = pmd_offset(pud, address); > >>> pmd_ref = pmd_offset(pud_ref, address); > >>> @@ -408,6 +414,8 @@ 
static inline int vmalloc_sync_one(pgd_t *pgd, > >>> unsigned long address) > >>> if (pmd_none(*pmd) || pmd_page(*pmd) != > >>> pmd_page(*pmd_ref)) BUG(); > >>> + if (pmd_large(*pmd)) > >>> + return 0; > >>> > >>> pte_ref = pte_offset_kernel(pmd_ref, address); > >>> if (!pte_present(*pte_ref)) > >>> > >> > >> I'm confused. Assuming the purpose of that patch is to exclude huge > >> I/O mappings from pte pinning, why do the changes to the x86_32 > >> version of the vmalloc_sync_one() helper actually prevent such > >> pinning, while the x86_64 version does not? > > > > No, the purpose is to include them just like they were before. > > vanilla vmalloc_sync_one just must not be called on huge mappings > > because it can't handle them. The patch is supposed to make the > > function return successfully, stopping early when huge pages are > > detected. > > > > It changes the implementation of both x86_32 and x86_64. > > > > Sorry, your answer confuses me even more. vmalloc_sync_one() _does_ > the pinning, by copying over the kernel mapping, early in the course > of the routine for x86_64, late for x86_32. > > Please explain why your changes prevent huge I/O mappings from being > pinned into the current page directory in the x86_32 implementation, > but still allow this to be done in the x86_64 version. The section of > code you patched in the latter case is basically a series of sanity > checks done after the pinning took place, not before. There is no difference between 32 and 64 bits. After the patch the memory will get pinned like it was before. The "sanity checks" are required when you want to call vmalloc_sync_one on a range that contains huge pages. They actually make sure that the function does not dig deeper, treating the huge pages as pagetables. The initial problem is that the huge page itself was accessed as if it was a pagetable. An offset into it was dereferenced, which caused a #PF. 
The upstream kernel seems to never take this path for areas that contain huge pages, but we do. That is why we have to introduce these checks in the pagetable walker. > On a more general note, a better approach would be to filter out calls > to vmalloc_sync_one() for huge pages directly from > __ipipe_pin_mapping_globally(). > ^ permalink raw reply [flat|nested] 37+ messages in thread
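The behaviour Henning describes — the walker must stop at a large entry instead of dereferencing the huge page's contents as if they were a next-level table — can be shown with a toy userspace model. The enum and struct below are illustrative stand-ins, not the kernel's pud/pmd types:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative page-table entry: absent, pointing at a next-level
 * table, or a large (huge) mapping that terminates the walk. */
enum ent_kind { ENT_NONE, ENT_TABLE, ENT_LARGE };

struct ent {
	enum ent_kind kind;
	struct ent *next;	/* next-level table when kind == ENT_TABLE */
};

/* One level of the walk, mimicking the fix: a large entry ends the
 * walk cleanly instead of being treated as another page table. */
static struct ent *walk_level(struct ent *e)
{
	if (e->kind == ENT_NONE)
		return NULL;	/* nothing mapped at this level */
	if (e->kind == ENT_LARGE)
		return NULL;	/* huge mapping: done, do not descend */
	return e->next;		/* regular table: descend one level */
}
```

The unpatched walker corresponds to always taking the descend branch; for a large entry that means reading an offset into the huge page itself, which is the #PF described above.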
* [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-01-27 13:41 ` [Xenomai] [PATCH v2] " Henning Schild 2016-01-28 10:53 ` Philippe Gerum @ 2016-02-03 12:59 ` Henning Schild 2016-02-03 14:24 ` Gilles Chanteperdrix 2016-02-08 8:44 ` Henning Schild 1 sibling, 2 replies; 37+ messages in thread From: Henning Schild @ 2016-02-03 12:59 UTC (permalink / raw) To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix In 4.1 huge page mapping of io memory was introduced; enable ipipe to handle that when pinning kernel memory. change that introduced the feature 0f616be120c632c818faaea9adcb8f05a7a8601f Signed-off-by: Henning Schild <henning.schild@siemens.com> --- arch/x86/mm/fault.c | 6 ++++++ 1 file changed, 6 insertions(+) This version has the 32bit case changed according to the problems found in the review. That patch finally should keep the old behaviour. But we should still discuss whether the old behaviour was correct and should be kept. If the pinning is not required it should be removed instead of being made more complex. 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd5bbcc..e78bd09 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pud_k = pud_offset(pgd_k, address);
 	if (!pud_present(*pud_k))
 		return NULL;
+	if (pud_large(*pud))
+		return NULL;
 
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
@@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	if (pud_none(*pud) ||
 	    pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
 		BUG();
+	if (pud_large(*pud))
+		return 0;
 
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
@@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	if (pmd_none(*pmd) ||
 	    pmd_page(*pmd) != pmd_page(*pmd_ref))
 		BUG();
+	if (pmd_large(*pmd))
+		return 0;
 
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
-- 
2.4.10

^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 12:59 ` [Xenomai] [PATCH v3] " Henning Schild @ 2016-02-03 14:24 ` Gilles Chanteperdrix 2016-02-03 14:31 ` Jan Kiszka 2016-02-04 11:53 ` Henning Schild 2016-02-08 8:44 ` Henning Schild 1 sibling, 2 replies; 37+ messages in thread From: Gilles Chanteperdrix @ 2016-02-03 14:24 UTC (permalink / raw) To: Henning Schild; +Cc: Jan Kiszka, Xenomai On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: > In 4.1 huge page mapping of io memory was introduced, enable ipipe to > handle that when pinning kernel memory. > > change that introduced the feature > 0f616be120c632c818faaea9adcb8f05a7a8601f Could we have an assessment of whether avoiding the call to __ipipe_pin_range_mapping in upper layers makes the patch simpler? -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 14:24 ` Gilles Chanteperdrix @ 2016-02-03 14:31 ` Jan Kiszka 2016-02-03 14:38 ` Gilles Chanteperdrix 2016-02-04 11:53 ` Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Jan Kiszka @ 2016-02-03 14:31 UTC (permalink / raw) To: Gilles Chanteperdrix, Henning Schild; +Cc: Xenomai On 2016-02-03 15:24, Gilles Chanteperdrix wrote: > On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: >> In 4.1 huge page mapping of io memory was introduced, enable ipipe to >> handle that when pinning kernel memory. >> >> change that introduced the feature >> 0f616be120c632c818faaea9adcb8f05a7a8601f > > Could we have an assessment of whether avoiding the call to > __ipipe_pin_range_mapping in upper layers makes the patch simpler? For that, we first of all need to recapitulate how __ipipe_pin_range_globally is/was supposed to work at all. I tried to restore details but I'm specifically failing regarding that "globally". To my understanding, the vmalloc_sync_one of x86 transfers at best mappings from init_mm to the current mm one, but not to all mm in the systems. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 14:31 ` Jan Kiszka @ 2016-02-03 14:38 ` Gilles Chanteperdrix 2016-02-03 14:51 ` Jan Kiszka 0 siblings, 1 reply; 37+ messages in thread From: Gilles Chanteperdrix @ 2016-02-03 14:38 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote: > On 2016-02-03 15:24, Gilles Chanteperdrix wrote: > > On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: > >> In 4.1 huge page mapping of io memory was introduced, enable ipipe to > >> handle that when pinning kernel memory. > >> > >> change that introduced the feature > >> 0f616be120c632c818faaea9adcb8f05a7a8601f > > > > Could we have an assessment of whether avoiding the call to > > __ipipe_pin_range_mapping in upper layers makes the patch simpler? > > For that, we first of all need to recapitulate how > __ipipe_pin_range_globally is/was supposed to work at all. > > I tried to restore details but I'm specifically failing regarding that > "globally". To my understanding, the vmalloc_sync_one of x86 transfers > at best mappings from init_mm to the current mm one, but not to all mm > in the systems. __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in the system. This is pretty ugly, not very scalable, but if you do not do that, each access to an ioremap/vmalloc area in, say, an interrupt handler, causes a fault for processes that do not have the mapping in their page tables. Such processes exist if they were created before the call to ioremap/vmalloc. Another possible fix to this issue is to allow handling that kind of faults over the head domain without switching to secondary domain. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 37+ messages in thread
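The mechanism Gilles describes can be condensed into a toy model: pinning copies the kernel entry into every existing page directory up front, so no process takes a minor fault on first access later. The arrays and function names below are simplified analogues, not the real vmalloc_sync_one() / __ipipe_pin_range_globally() code:

```c
#include <assert.h>

#define NPROC 4	/* toy number of existing processes */
#define NKERN 8	/* toy size of the kernel half of a page directory */

/* Arrays stand in for real page directories: init_mm holds the
 * authoritative kernel mappings, each process has its own copy. */
static long init_mm_pgd[NKERN];
static long proc_pgd[NPROC][NKERN];

/* analogue of vmalloc_sync_one(): sync one entry into one pgd */
static void sync_one(long *pgd, int idx)
{
	if (pgd[idx] == 0)
		pgd[idx] = init_mm_pgd[idx];
}

/* analogue of __ipipe_pin_range_globally(): sync into all pgds */
static void pin_globally(int idx)
{
	int p;

	for (p = 0; p < NPROC; p++)
		sync_one(proc_pgd[p], idx);
}
```

Without the global loop, a process whose pgd predates the mapping would only pick the entry up through a vmalloc fault — which is exactly what must not happen over the head domain.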
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 14:38 ` Gilles Chanteperdrix @ 2016-02-03 14:51 ` Jan Kiszka 2016-02-03 15:02 ` Gilles Chanteperdrix 0 siblings, 1 reply; 37+ messages in thread From: Jan Kiszka @ 2016-02-03 14:51 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai On 2016-02-03 15:38, Gilles Chanteperdrix wrote: > On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote: >> On 2016-02-03 15:24, Gilles Chanteperdrix wrote: >>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: >>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to >>>> handle that when pinning kernel memory. >>>> >>>> change that introduced the feature >>>> 0f616be120c632c818faaea9adcb8f05a7a8601f >>> >>> Could we have an assessment of whether avoiding the call to >>> __ipipe_pin_range_mapping in upper layers makes the patch simpler? >> >> For that, we first of all need to recapitulate how >> __ipipe_pin_range_globally is/was supposed to work at all. >> >> I tried to restore details but I'm specifically failing regarding that >> "globally". To my understanding, the vmalloc_sync_one of x86 transfers >> at best mappings from init_mm to the current mm one, but not to all mm >> in the systems. > > __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in > the system. This is pretty ugly, not very scalable, but if you do How are future pgds accounted for? > not do that, each access to an ioremap/vmalloc area in, say, an > interrupt handler, causes a fault for processes that do not have the > mapping in their page tables. Such processes exist if they were > created before the call to ioremap/vmalloc. Another possible fix to > this issue is to allow handling that kind of faults over the head > domain without switching to secondary domain. OK, that makes more sense again. 
But then Henning is definitely on the right path, because you can't tell from 'start' and 'end' or the pgd pointer if there were only huge pages added or maybe also some small pages. IOW, we do have to walk the page table trees and therefore have to be prepared to hit some huge pages along that path. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 14:51 ` Jan Kiszka @ 2016-02-03 15:02 ` Gilles Chanteperdrix 2016-02-03 15:14 ` Jan Kiszka 0 siblings, 1 reply; 37+ messages in thread From: Gilles Chanteperdrix @ 2016-02-03 15:02 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai On Wed, Feb 03, 2016 at 03:51:34PM +0100, Jan Kiszka wrote: > On 2016-02-03 15:38, Gilles Chanteperdrix wrote: > > On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote: > >> On 2016-02-03 15:24, Gilles Chanteperdrix wrote: > >>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: > >>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to > >>>> handle that when pinning kernel memory. > >>>> > >>>> change that introduced the feature > >>>> 0f616be120c632c818faaea9adcb8f05a7a8601f > >>> > >>> Could we have an assessment of whether avoiding the call to > >>> __ipipe_pin_range_mapping in upper layers makes the patch simpler? > >> > >> For that, we first of all need to recapitulate how > >> __ipipe_pin_range_globally is/was supposed to work at all. > >> > >> I tried to restore details but I'm specifically failing regarding that > >> "globally". To my understanding, the vmalloc_sync_one of x86 transfers > >> at best mappings from init_mm to the current mm one, but not to all mm > >> in the systems. > > > > __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in > > the system. This is pretty ugly, not very scalable, but if you do > > How are future pgds accounted for? vmalloc and ioremap add their mappings to init_mm, all processes created after that copy their kernel mapping from init_mm, so inherit the vmalloc/ioremap mappings. > > > not do that, each access to an ioremap/vmalloc area in, say, an > > interrupt handler, causes a fault for processes that do not have the > > mapping in their page tables. Such processes exist if they were > > created before the call to ioremap/vmalloc. 
Another possible fix to > > this issue is to allow handling that kind of fault over the head > > domain without switching to secondary domain. > > OK, that makes more sense again. > > But then Henning is definitely on the right path, because you can't tell > from 'start' and 'end' or the pgd pointer if there were only huge pages > added or maybe also some small pages. IOW, we do have to walk the page > table trees and therefore have to be prepared to hit some huge pages > along that path. The function which creates the mappings has to know that it creates huge mappings; what I believe needs to be checked is whether that information can easily be made available to __ipipe_pin_range_globally call sites. Because if it can, we can simply skip the call, and the patch will be simpler than what is currently proposed by Henning. -- Gilles. https://click-hack.org ^ permalink raw reply [flat|nested] 37+ messages in thread
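The inheritance Gilles explained earlier in this exchange — vmalloc/ioremap install their entries only into init_mm, and every mm created afterwards copies the kernel half from init_mm — can be sketched in a toy model (all names below are hypothetical, not kernel API):

```c
#include <assert.h>
#include <string.h>

#define NKERN 8	/* toy size of the kernel half of a page directory */

/* Hypothetical model: only init_mm receives new kernel mappings; an
 * mm created afterwards inherits them by copying, while an mm created
 * before misses them -- the gap the pinning compensates for. */
static long init_mm_kernel_pgd[NKERN];

static void ioremap_install(int idx, long mapping)
{
	init_mm_kernel_pgd[idx] = mapping;	/* only init_mm is updated */
}

static void mm_create(long *kernel_pgd)
{
	/* kernel mappings are copied from init_mm at creation time */
	memcpy(kernel_pgd, init_mm_kernel_pgd, sizeof(init_mm_kernel_pgd));
}
```

This is why "future pgds" need no special handling: they inherit the mapping at creation, and only pre-existing pgds have to be walked by the pinning code.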
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 15:02 ` Gilles Chanteperdrix @ 2016-02-03 15:14 ` Jan Kiszka 0 siblings, 0 replies; 37+ messages in thread From: Jan Kiszka @ 2016-02-03 15:14 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai On 2016-02-03 16:02, Gilles Chanteperdrix wrote: > On Wed, Feb 03, 2016 at 03:51:34PM +0100, Jan Kiszka wrote: >> On 2016-02-03 15:38, Gilles Chanteperdrix wrote: >>> On Wed, Feb 03, 2016 at 03:31:10PM +0100, Jan Kiszka wrote: >>>> On 2016-02-03 15:24, Gilles Chanteperdrix wrote: >>>>> On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: >>>>>> In 4.1 huge page mapping of io memory was introduced, enable ipipe to >>>>>> handle that when pinning kernel memory. >>>>>> >>>>>> change that introduced the feature >>>>>> 0f616be120c632c818faaea9adcb8f05a7a8601f >>>>> >>>>> Could we have an assessment of whether avoiding the call to >>>>> __ipipe_pin_range_mapping in upper layers makes the patch simpler? >>>> >>>> For that, we first of all need to recapitulate how >>>> __ipipe_pin_range_globally is/was supposed to work at all. >>>> >>>> I tried to restore details but I'm specifically failing regarding that >>>> "globally". To my understanding, the vmalloc_sync_one of x86 transfers >>>> at best mappings from init_mm to the current mm one, but not to all mm >>>> in the systems. >>> >>> __ipipe_pin_range_globally calls vmalloc_sync_one for all pgds in >>> the system. This is pretty ugly, not very scalable, but if you do >> >> How are future pgds accounted for? > > vmalloc and ioremap add their mappings to init_mm, all processes > created after that copy their kernel mapping from init_mm, so > inherit the vmalloc/ioremap mappings. > >> >>> not do that, each access to an ioremap/vmalloc area in, say, an >>> interrupt handler, causes a fault for processes that do not have the >>> mapping in their page tables. 
Such processes exist if they were >>> created before the call to ioremap/vmalloc. Another possible fix to >>> this issue is to allow handling that kind of faults over the head >>> domain without switching to secondary domain. >> >> OK, that makes more sense again. >> >> But then Henning is definitely on the right path, because you can't tell >> from 'start' and 'end' or the pgd pointer if there were only huge pages >> added or maybe also some small pages. IOW, we do have to walk the page >> table trees and therefore have to be prepared to hit some huge pages >> along that path. > > The function which creates the mappings has to know that it creates > huge mapping, > what I believe needs to be checked is whether that > information can easily be made available to > __ipipe_pin_range_globally call sites. Because if it can, we can > simply skip the call, and the patch will be simpler that what is > currently proposed by Henning. Nope, this is an internal optimization of the mapping functions: If the requested range is large enough for a huge page and the arch supports that size, it will be used (because that's faster). So, if we want to use vmalloc_sync_one for our purposes, we need to enhance it. The alternative is to reimplement it... Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 14:24 ` Gilles Chanteperdrix 2016-02-03 14:31 ` Jan Kiszka @ 2016-02-04 11:53 ` Henning Schild 1 sibling, 0 replies; 37+ messages in thread From: Henning Schild @ 2016-02-04 11:53 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Xenomai On Wed, 3 Feb 2016 15:24:48 +0100 Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > On Wed, Feb 03, 2016 at 01:59:25PM +0100, Henning Schild wrote: > > In 4.1 huge page mapping of io memory was introduced, enable ipipe > > to handle that when pinning kernel memory. > > > > change that introduced the feature > > 0f616be120c632c818faaea9adcb8f05a7a8601f > > Could we have an assessment of whether avoiding the call to > __ipipe_pin_range_mapping in upper layers makes the patch simpler? I will have a look at that, specifically why mainline does not run into it. Given the lazy nature of the mappings when it comes to other mms, I still do not see why the vmalloc_fault path does not deal with huge pages. Henning ^ permalink raw reply [flat|nested] 37+ messages in thread
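[Editorial illustration] The lazy scheme discussed in the thread (kernel mappings live in init_mm; other mms copy entries on demand, either at fault time or eagerly via pinning) can be modelled with a toy two-level table in userspace C. All names are hypothetical; the key property is that a huge leaf has no lower level to copy, which is what the pud_large()/pmd_large() cases correspond to:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: an upper-level entry is absent, a pointer to a lower
 * table, or a "huge" leaf covering the whole range. */
enum entry_kind { ABSENT, TABLE, HUGE_LEAF };

struct upper_entry {
    enum entry_kind kind;
    int lower[4];          /* lower-level entries, 0 = not present */
};

/* Copy one upper-level slot from the reference table (init_mm in the
 * real kernel) into a process table.  A huge leaf terminates the copy
 * at the upper level -- there is no lower table to walk. */
static bool sync_one(struct upper_entry *proc, const struct upper_entry *ref)
{
    if (ref->kind == ABSENT)
        return false;                   /* nothing to sync */
    if (proc->kind == ABSENT)
        proc->kind = ref->kind;
    if (ref->kind == HUGE_LEAF)
        return true;                    /* leaf: stop at this level */
    for (int i = 0; i < 4; i++)
        if (!proc->lower[i])
            proc->lower[i] = ref->lower[i];
    return true;
}
```

In this model, a process created before an ioremap simply lacks the upper entry; copying it on the first fault (or eagerly for all processes, as __ipipe_pin_range_globally does) resolves the miss either way.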
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-03 12:59 ` [Xenomai] [PATCH v3] " Henning Schild 2016-02-03 14:24 ` Gilles Chanteperdrix @ 2016-02-08 8:44 ` Henning Schild 2016-03-07 7:58 ` Henning Schild 1 sibling, 1 reply; 37+ messages in thread From: Henning Schild @ 2016-02-08 8:44 UTC (permalink / raw) To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix Hey, I discussed the issue with the developer who introduced huge ioremap. He agreed that mainline also needs to handle huge pages in vmalloc_fault and came up with a very similar patch. (Part of the discussion can be found in this list's archive.) I assume it will soon hit mainline and will get backported to 4.1. So I guess we can just wait for that and use "nohugeiomap" until the issue is fixed in mainline and merged into our 4.1.y branch. Henning On Wed, 3 Feb 2016 13:59:25 +0100 Henning Schild <henning.schild@siemens.com> wrote: > In 4.1 huge page mapping of io memory was introduced, enable ipipe to > handle that when pinning kernel memory. > > change that introduced the feature > 0f616be120c632c818faaea9adcb8f05a7a8601f > > Signed-off-by: Henning Schild <henning.schild@siemens.com> > --- > arch/x86/mm/fault.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > This version has the 32bit case changed according to the problems > found in the review. This version should finally keep the old > behaviour. But we should still discuss whether the old behaviour was > correct and should be kept. If the pinning is not required it should > be removed instead of being made more complex. 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index fd5bbcc..e78bd09 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, > unsigned long address) pud_k = pud_offset(pgd_k, address); > if (!pud_present(*pud_k)) > return NULL; > + if (pud_large(*pud)) > + return NULL; > > pmd = pmd_offset(pud, address); > pmd_k = pmd_offset(pud_k, address); > @@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > unsigned long address) > if (pud_none(*pud) || pud_page_vaddr(*pud) != > pud_page_vaddr(*pud_ref)) BUG(); > + if (pud_large(*pud)) > + return 0; > > pmd = pmd_offset(pud, address); > pmd_ref = pmd_offset(pud_ref, address); > @@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > unsigned long address) > if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) > BUG(); > + if (pmd_large(*pmd)) > + return 0; > > pte_ref = pte_offset_kernel(pmd_ref, address); > if (!pte_present(*pte_ref)) ^ permalink raw reply [flat|nested] 37+ messages in thread
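[Editorial illustration] The effect of the added pud_large()/pmd_large() checks can be modelled as a walk that terminates cleanly at a large leaf instead of descending to the next level. This is a userspace sketch with invented names, not the kernel code; the three-level structure mirrors the pud/pmd/pte levels touched by the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Each level's entry is either not present, a large (huge-page)
 * leaf, or a link to the next level. */
enum lvl_state { NOT_PRESENT, LARGE_LEAF, NEXT_LEVEL };

struct walk { enum lvl_state pud, pmd, pte; };

/* Returns true when the walk terminates cleanly (leaf reached or pte
 * present), false when an entry is missing -- the NULL/BUG() cases in
 * vmalloc_sync_one.  Without the LARGE_LEAF checks, the walk would
 * wrongly descend into a huge-page leaf as if it were a table. */
static bool walk_ok(const struct walk *w)
{
    if (w->pud == NOT_PRESENT) return false;
    if (w->pud == LARGE_LEAF)  return true;   /* patch: pud_large() */
    if (w->pmd == NOT_PRESENT) return false;
    if (w->pmd == LARGE_LEAF)  return true;   /* patch: pmd_large() */
    return w->pte != NOT_PRESENT;             /* pte must be present */
}
```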
* Re: [Xenomai] [PATCH v3] ipipe x86 mm: handle huge pages in memory pinning 2016-02-08 8:44 ` Henning Schild @ 2016-03-07 7:58 ` Henning Schild 0 siblings, 0 replies; 37+ messages in thread From: Henning Schild @ 2016-03-07 7:58 UTC (permalink / raw) To: Xenomai; +Cc: Jan Kiszka, Gilles Chanteperdrix Hey, the patch went into 4.1.19, please rebase/merge so ipipe-4.1.y contains it. Henning On Mon, 8 Feb 2016 09:44:28 +0100 Henning Schild <henning.schild@siemens.com> wrote: > Hey, > > I discussed the issue with the developer who introduced huge ioremap. > He agreed that mainline also needs to handle huge pages in vmalloc_fault > and came up with a very similar patch. (Part of the discussion can > be found in this list's archive.) > I assume it will soon hit mainline and will get backported to 4.1. So > I guess we can just wait for that and use "nohugeiomap" until the > issue is fixed in mainline and merged into our 4.1.y branch. > > Henning > > On Wed, 3 Feb 2016 13:59:25 +0100 > Henning Schild <henning.schild@siemens.com> wrote: > > > In 4.1 huge page mapping of io memory was introduced, enable ipipe > > to handle that when pinning kernel memory. > > > > change that introduced the feature > > 0f616be120c632c818faaea9adcb8f05a7a8601f > > > > Signed-off-by: Henning Schild <henning.schild@siemens.com> > > --- > > arch/x86/mm/fault.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > This version has the 32bit case changed according to the problems > > found in the review. This version should finally keep the old > > behaviour. But we should still discuss whether the old behaviour was > > correct and should be kept. If the pinning is not required it should > > be removed instead of being made more complex. 
> > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > > index fd5bbcc..e78bd09 100644 > > --- a/arch/x86/mm/fault.c > > +++ b/arch/x86/mm/fault.c > > @@ -211,6 +211,8 @@ static inline pmd_t *vmalloc_sync_one(pgd_t > > *pgd, unsigned long address) pud_k = pud_offset(pgd_k, address); > > if (!pud_present(*pud_k)) > > return NULL; > > + if (pud_large(*pud)) > > + return NULL; > > > > pmd = pmd_offset(pud, address); > > pmd_k = pmd_offset(pud_k, address); > > @@ -400,6 +402,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > > unsigned long address) > > if (pud_none(*pud) || pud_page_vaddr(*pud) != > > pud_page_vaddr(*pud_ref)) BUG(); > > + if (pud_large(*pud)) > > + return 0; > > > > pmd = pmd_offset(pud, address); > > pmd_ref = pmd_offset(pud_ref, address); > > @@ -408,6 +412,8 @@ static inline int vmalloc_sync_one(pgd_t *pgd, > > unsigned long address) > > if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref)) > > BUG(); > > + if (pmd_large(*pmd)) > > + return 0; > > > > pte_ref = pte_offset_kernel(pmd_ref, address); > > if (!pte_present(*pte_ref)) > > > _______________________________________________ > Xenomai mailing list > Xenomai@xenomai.org > http://xenomai.org/mailman/listinfo/xenomai ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2016-03-07 7:58 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-14 17:34 [Xenomai] ipipe x86_64 huge page ioremap Henning Schild 2016-01-15 8:32 ` Philippe Gerum 2016-01-15 12:34 ` Jan Kiszka 2016-02-02 17:43 ` Henning Schild 2016-02-03 15:07 ` Philippe Gerum 2016-02-03 15:48 ` Philippe Gerum 2016-02-04 11:43 ` Henning Schild 2016-01-26 15:20 ` [Xenomai] [PATCH] ipipe x86 mm: handle huge pages in memory pinning Henning Schild 2016-01-26 20:18 ` Jan Kiszka 2016-01-27 9:54 ` Henning Schild 2016-01-27 10:31 ` Jan Kiszka 2016-01-27 10:44 ` Philippe Gerum 2016-01-27 10:46 ` Philippe Gerum 2016-01-27 13:41 ` [Xenomai] [PATCH v2] " Henning Schild 2016-01-28 10:53 ` Philippe Gerum 2016-01-28 20:53 ` Henning Schild 2016-01-29 17:11 ` Philippe Gerum 2016-01-29 18:39 ` Gilles Chanteperdrix 2016-02-02 12:08 ` Henning Schild 2016-02-02 13:58 ` Philippe Gerum 2016-02-02 16:38 ` Henning Schild 2016-02-02 16:39 ` Jan Kiszka 2016-02-02 19:26 ` Gilles Chanteperdrix 2016-02-03 11:35 ` Henning Schild 2016-02-02 14:18 ` Philippe Gerum 2016-02-02 16:30 ` Henning Schild 2016-02-02 11:41 ` Henning Schild 2016-02-03 12:59 ` [Xenomai] [PATCH v3] " Henning Schild 2016-02-03 14:24 ` Gilles Chanteperdrix 2016-02-03 14:31 ` Jan Kiszka 2016-02-03 14:38 ` Gilles Chanteperdrix 2016-02-03 14:51 ` Jan Kiszka 2016-02-03 15:02 ` Gilles Chanteperdrix 2016-02-03 15:14 ` Jan Kiszka 2016-02-04 11:53 ` Henning Schild 2016-02-08 8:44 ` Henning Schild 2016-03-07 7:58 ` Henning Schild