* [patch 0/2] x86/mm/pti: Robustness updates @ 2019-08-28 14:24 Thomas Gleixner 2019-08-28 14:24 ` [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() Thomas Gleixner ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Thomas Gleixner @ 2019-08-28 14:24 UTC (permalink / raw) To: LKML Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel Following up on the discussions around the patch Song submitted to 'cure' an iTLB-related performance regression, I picked up Song's patch which makes clone_page_tables() more robust by handling unaligned addresses properly, and added one which prevents calling into the PTI code when PTI is enabled at compile time but disabled at runtime (command line or CPU not affected). There is no point in calling into those PTI functions unconditionally. The resulting page tables are the same before and after the change, which makes sense as the code clones the kernel page table into the secondary page table space, which is available but not used when PTI is boot-time disabled. But even if it does no damage today, this could have nasty side effects when the PTI code is changed, extended etc. later. Thanks, tglx 8<---------------- pti.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 14:24 [patch 0/2] x86/mm/pti: Robustness updates Thomas Gleixner @ 2019-08-28 14:24 ` Thomas Gleixner 2019-08-28 15:46 ` Dave Hansen ` (2 more replies) 2019-08-28 14:24 ` [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled Thomas Gleixner 2019-08-28 16:03 ` [patch 0/2] x86/mm/pti: Robustness updates Peter Zijlstra 2 siblings, 3 replies; 26+ messages in thread From: Thomas Gleixner @ 2019-08-28 14:24 UTC (permalink / raw) To: LKML Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel From: Song Liu <songliubraving@fb.com> pti_clone_pmds() assumes that the supplied address is either: - properly PUD/PMD aligned or - the address is actually mapped which means that independent of the mapping level (PUD/PMD/PTE) the next higher mapping exist. If that's not the case the unaligned address can be incremented by PUD or PMD size wrongly. All callers supply mapped and/or aligned addresses, but for robustness sake, it's better to handle that case proper and to emit a warning. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190820202314.1083149-1-songliubraving@fb.com --- arch/x86/mm/pti.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, u pud = pud_offset(p4d, addr); if (pud_none(*pud)) { - addr += PUD_SIZE; + WARN_ON_ONCE(addr & PUD_MASK); + addr = round_up(addr + 1, PUD_SIZE); continue; } pmd = pmd_offset(pud, addr); if (pmd_none(*pmd)) { - addr += PMD_SIZE; + WARN_ON_ONCE(addr & PMD_MASK); + addr = round_up(addr + 1, PMD_SIZE); continue; } ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 14:24 ` [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() Thomas Gleixner @ 2019-08-28 15:46 ` Dave Hansen 2019-08-28 15:51 ` Thomas Gleixner 2019-08-28 18:58 ` [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() Ingo Molnar 2019-08-28 20:54 ` [patch V2 " Thomas Gleixner 2 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2019-08-28 15:46 UTC (permalink / raw) To: Thomas Gleixner, LKML Cc: x86, Song Liu, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel On 8/28/19 7:24 AM, Thomas Gleixner wrote: > From: Song Liu <songliubraving@fb.com> > > pti_clone_pmds() assumes that the supplied address is either: > > - properly PUD/PMD aligned > or > - the address is actually mapped which means that independent > of the mapping level (PUD/PMD/PTE) the next higher mapping > exist. > > If that's not the case the unaligned address can be incremented by PUD or > PMD size wrongly. All callers supply mapped and/or aligned addresses, but > for robustness sake, it's better to handle that case proper and to emit a > warning. Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Song, did you ever root-cause the performance regression? I thought there were still some mysteries there. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 15:46 ` Dave Hansen @ 2019-08-28 15:51 ` Thomas Gleixner 2019-08-28 17:58 ` Song Liu 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-28 15:51 UTC (permalink / raw) To: Dave Hansen Cc: LKML, x86, Song Liu, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel On Wed, 28 Aug 2019, Dave Hansen wrote: > On 8/28/19 7:24 AM, Thomas Gleixner wrote: > > From: Song Liu <songliubraving@fb.com> > > > > pti_clone_pmds() assumes that the supplied address is either: > > > > - properly PUD/PMD aligned > > or > > - the address is actually mapped which means that independent > > of the mapping level (PUD/PMD/PTE) the next higher mapping > > exist. > > > > If that's not the case the unaligned address can be incremented by PUD or > > PMD size wrongly. All callers supply mapped and/or aligned addresses, but > > for robustness sake, it's better to handle that case proper and to emit a > > warning. > > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> > > Song, did you ever root-cause the performance regression? I thought > there were still some mysteries there. See Peter's series to rework the ftrace code patching ... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 15:51 ` Thomas Gleixner @ 2019-08-28 17:58 ` Song Liu 2019-08-28 20:05 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Song Liu @ 2019-08-28 17:58 UTC (permalink / raw) To: Thomas Gleixner Cc: Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel > On Aug 28, 2019, at 8:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, 28 Aug 2019, Dave Hansen wrote: >> On 8/28/19 7:24 AM, Thomas Gleixner wrote: >>> From: Song Liu <songliubraving@fb.com> >>> >>> pti_clone_pmds() assumes that the supplied address is either: >>> >>> - properly PUD/PMD aligned >>> or >>> - the address is actually mapped which means that independent >>> of the mapping level (PUD/PMD/PTE) the next higher mapping >>> exist. >>> >>> If that's not the case the unaligned address can be incremented by PUD or >>> PMD size wrongly. All callers supply mapped and/or aligned addresses, but >>> for robustness sake, it's better to handle that case proper and to emit a >>> warning. >> >> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> >> >> Song, did you ever root-cause the performance regression? I thought >> there were still some mysteries there. > > See Peter's series to rework the ftrace code patching ... Thanks Thomas. Yes, in summary, enabling ftrace or kprobe-on-ftrace causes the kernel to split PMDs in kernel text mapping. Related question: while Peter's patches fix it for 5.3 kernel, they don't apply cleanly over 5.2 kernel (which we are using). So I wonder what is the best solution for 5.2 kernel. My patch also fixes the issue: https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/ How about we apply this patch to upstream 5.2 kernel? Thanks, Song ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 17:58 ` Song Liu @ 2019-08-28 20:05 ` Thomas Gleixner 2019-08-28 20:32 ` Song Liu 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-28 20:05 UTC (permalink / raw) To: Song Liu Cc: Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel On Wed, 28 Aug 2019, Song Liu wrote: > > On Aug 28, 2019, at 8:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > On Wed, 28 Aug 2019, Dave Hansen wrote: > >> On 8/28/19 7:24 AM, Thomas Gleixner wrote: > >>> From: Song Liu <songliubraving@fb.com> > >>> > >>> pti_clone_pmds() assumes that the supplied address is either: > >>> > >>> - properly PUD/PMD aligned > >>> or > >>> - the address is actually mapped which means that independent > >>> of the mapping level (PUD/PMD/PTE) the next higher mapping > >>> exist. > >>> > >>> If that's not the case the unaligned address can be incremented by PUD or > >>> PMD size wrongly. All callers supply mapped and/or aligned addresses, but > >>> for robustness sake, it's better to handle that case proper and to emit a > >>> warning. > >> > >> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> > >> > >> Song, did you ever root-cause the performance regression? I thought > >> there were still some mysteries there. > > > > See Peter's series to rework the ftrace code patching ... > > Thanks Thomas. > > Yes, in summary, enabling ftrace or kprobe-on-ftrace causes the kernel > to split PMDs in kernel text mapping. > > Related question: while Peter's patches fix it for 5.3 kernel, they don't > apply cleanly over 5.2 kernel (which we are using). So I wonder what is > the best solution for 5.2 kernel. May patch also fixes the issue: > > https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/ > > How about we apply this patch to upstream 5.2 kernel? That's not how it works. 
We fix stuff upstream and it gets backported to all affected kernels not just to the one you care about. Aside of that I really disagree with that hack. You completely fail to explain why that commit in question broke it and instead of fixing the underlying issue you create a horrible workaround. It took me ~10 minutes to analyze the root cause and I'm just booting the test box with a proper fix which can be actually tagged for stable and can be removed from upstream again once ftrace got moved over to text poke. I'll post it once it's confirmed to work and I wrote a comprehensible changelog. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() 2019-08-28 20:05 ` Thomas Gleixner @ 2019-08-28 20:32 ` Song Liu 2019-08-28 22:31 ` [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Song Liu @ 2019-08-28 20:32 UTC (permalink / raw) To: Thomas Gleixner Cc: Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel > On Aug 28, 2019, at 1:05 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, 28 Aug 2019, Song Liu wrote: >>> On Aug 28, 2019, at 8:51 AM, Thomas Gleixner <tglx@linutronix.de> wrote: >>> >>> On Wed, 28 Aug 2019, Dave Hansen wrote: >>>> On 8/28/19 7:24 AM, Thomas Gleixner wrote: >>>>> From: Song Liu <songliubraving@fb.com> >>>>> >>>>> pti_clone_pmds() assumes that the supplied address is either: >>>>> >>>>> - properly PUD/PMD aligned >>>>> or >>>>> - the address is actually mapped which means that independent >>>>> of the mapping level (PUD/PMD/PTE) the next higher mapping >>>>> exist. >>>>> >>>>> If that's not the case the unaligned address can be incremented by PUD or >>>>> PMD size wrongly. All callers supply mapped and/or aligned addresses, but >>>>> for robustness sake, it's better to handle that case proper and to emit a >>>>> warning. >>>> >>>> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> >>>> >>>> Song, did you ever root-cause the performance regression? I thought >>>> there were still some mysteries there. >>> >>> See Peter's series to rework the ftrace code patching ... >> >> Thanks Thomas. >> >> Yes, in summary, enabling ftrace or kprobe-on-ftrace causes the kernel >> to split PMDs in kernel text mapping. >> >> Related question: while Peter's patches fix it for 5.3 kernel, they don't >> apply cleanly over 5.2 kernel (which we are using). So I wonder what is >> the best solution for 5.2 kernel. 
May patch also fixes the issue: >> >> https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/ >> >> How about we apply this patch to upstream 5.2 kernel? > > That's not how it works. We fix stuff upstream and it gets backported to > all affected kernels not just to the one you care about. Agreed. I am trying to back port Peter's patch set to 5.2 kernel. There are 9 dependencies and some manual changes. > > Aside of that I really disagree with that hack. You completely fail to > explain why that commit in question broke it and instead of fixing the > underlying issue you create a horrible workaround. > > It took me ~10 minutes to analyze the root cause and I'm just booting the > test box with a proper fix which can be actually tagged for stable and can > be removed from upstream again once ftrace got moved over to text poke. > > I'll post it once it's confirmed to work and I wrote a comprehensible > changelog. This sounds great. Thanks! Song ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text 2019-08-28 20:32 ` Song Liu @ 2019-08-28 22:31 ` Thomas Gleixner 2019-08-28 23:03 ` Song Liu ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Thomas Gleixner @ 2019-08-28 22:31 UTC (permalink / raw) To: Song Liu Cc: Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel, Steven Rostedt ftrace does not use text_poke() for enabling trace functionality. It uses its own mechanism and flips the whole kernel text to RW and back to RO. The CPA rework removed a loop based check of 4k pages which tried to preserve a large page by checking each 4k page whether the change would actually cover all pages in the large page. This resulted in endless loops for nothing as in testing it turned out that it actually never preserved anything. Of course testing missed to include ftrace, which is the one and only case which benefitted from the 4k loop. As a consequence enabling function tracing or ftrace based kprobes results in a full 4k split of the kernel text, which affects iTLB performance. The kernel RO protection is the only valid case where this can actually preserve large pages. All other static protections (RO data, data NX, PCI, BIOS) are truly static. So a conflict with those protections which results in a split should only ever happen when a change of memory next to a protected region is attempted. But these conflicts are rightfully splitting the large page to preserve the protected regions. In fact a change to the protected regions itself is a bug and is warned about. Add an exception for the static protection check for kernel text RO when the to be changed region spawns a full large page which allows to preserve the large mappings. This also prevents the syslog to be spammed about CPA violations when ftrace is used. The exception needs to be removed once ftrace switched over to text_poke() which avoids the whole issue. 
Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org --- arch/x86/mm/pageattr.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -516,7 +516,7 @@ static inline void check_conflict(int wa */ static inline pgprot_t static_protections(pgprot_t prot, unsigned long start, unsigned long pfn, unsigned long npg, - int warnlvl) + unsigned long lpsize, int warnlvl) { pgprotval_t forbidden, res; unsigned long end; @@ -535,9 +535,17 @@ static inline pgprot_t static_protection check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX"); forbidden = res; - res = protect_kernel_text_ro(start, end); - check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); - forbidden |= res; + /* + * Special case to preserve a large page. If the change spawns the + * full large page mapping then there is no point to split it + * up. Happens with ftrace and is going to be removed once ftrace + * switched to text_poke(). + */ + if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) { + res = protect_kernel_text_ro(start, end); + check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); + forbidden |= res; + } /* Check the PFN directly */ res = protect_pci_bios(pfn, pfn + npg - 1); @@ -819,7 +827,7 @@ static int __should_split_large_page(pte * extra conditional required here. */ chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages, - CPA_CONFLICT); + psize, CPA_CONFLICT); if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) { /* @@ -855,7 +863,7 @@ static int __should_split_large_page(pte * protection requirement in the large page. */ new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages, - CPA_DETECT); + psize, CPA_DETECT); /* * If there is a conflict, split the large page. 
@@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_dat if (!cpa->force_static_prot) goto set; - prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT); + /* Hand in lpsize = 0 to enforce the protection mechanism */ + prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT); if (pgprot_val(prot) == pgprot_val(ref_prot)) goto set; @@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa pgprot_val(new_prot) |= pgprot_val(cpa->mask_set); cpa_inc_4k_install(); - new_prot = static_protections(new_prot, address, pfn, 1, + /* Hand in lpsize = 0 to enforce the protection mechanism */ + new_prot = static_protections(new_prot, address, pfn, 1, 0, CPA_PROTECT); new_prot = pgprot_clear_protnone_bits(new_prot); ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text 2019-08-28 22:31 ` [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text Thomas Gleixner @ 2019-08-28 23:03 ` Song Liu 2019-08-29 13:01 ` Peter Zijlstra 2019-08-29 18:55 ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner 2 siblings, 0 replies; 26+ messages in thread From: Song Liu @ 2019-08-28 23:03 UTC (permalink / raw) To: Thomas Gleixner Cc: Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Peter Zijlstra, Rik van Riel, Steven Rostedt > On Aug 28, 2019, at 3:31 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > > ftrace does not use text_poke() for enabling trace functionality. It uses > its own mechanism and flips the whole kernel text to RW and back to RO. > > The CPA rework removed a loop based check of 4k pages which tried to > preserve a large page by checking each 4k page whether the change would > actually cover all pages in the large page. > > This resulted in endless loops for nothing as in testing it turned out that > it actually never preserved anything. Of course testing missed to include > ftrace, which is the one and only case which benefitted from the 4k loop. > > As a consequence enabling function tracing or ftrace based kprobes results > in a full 4k split of the kernel text, which affects iTLB performance. > > The kernel RO protection is the only valid case where this can actually > preserve large pages. > > All other static protections (RO data, data NX, PCI, BIOS) are truly > static. So a conflict with those protections which results in a split > should only ever happen when a change of memory next to a protected region > is attempted. But these conflicts are rightfully splitting the large page > to preserve the protected regions. In fact a change to the protected > regions itself is a bug and is warned about. 
> > Add an exception for the static protection check for kernel text RO when > the to be changed region spawns a full large page which allows to preserve > the large mappings. This also prevents the syslog to be spammed about CPA > violations when ftrace is used. > > The exception needs to be removed once ftrace switched over to text_poke() > which avoids the whole issue. > > Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely") > Reported-by: Song Liu <songliubraving@fb.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Cc: stable@vger.kernel.org This looks great. Much cleaner than my workaround. Thanks! Reviewed-and-tested-by: Song Liu <songliubraving@fb.com> We need this for v4.20 to v5.3 (assuming Peter's patches will land in 5.4). > --- > arch/x86/mm/pageattr.c | 26 ++++++++++++++++++-------- > 1 file changed, 18 insertions(+), 8 deletions(-) > > --- a/arch/x86/mm/pageattr.c > +++ b/arch/x86/mm/pageattr.c > @@ -516,7 +516,7 @@ static inline void check_conflict(int wa > */ > static inline pgprot_t static_protections(pgprot_t prot, unsigned long start, > unsigned long pfn, unsigned long npg, > - int warnlvl) > + unsigned long lpsize, int warnlvl) > { > pgprotval_t forbidden, res; > unsigned long end; > @@ -535,9 +535,17 @@ static inline pgprot_t static_protection > check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX"); > forbidden = res; > > - res = protect_kernel_text_ro(start, end); > - check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); > - forbidden |= res; > + /* > + * Special case to preserve a large page. If the change spawns the > + * full large page mapping then there is no point to split it > + * up. Happens with ftrace and is going to be removed once ftrace > + * switched to text_poke(). 
> + */ > + if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) { > + res = protect_kernel_text_ro(start, end); > + check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); > + forbidden |= res; > + } > > /* Check the PFN directly */ > res = protect_pci_bios(pfn, pfn + npg - 1); > @@ -819,7 +827,7 @@ static int __should_split_large_page(pte > * extra conditional required here. > */ > chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages, > - CPA_CONFLICT); > + psize, CPA_CONFLICT); > > if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) { > /* > @@ -855,7 +863,7 @@ static int __should_split_large_page(pte > * protection requirement in the large page. > */ > new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages, > - CPA_DETECT); > + psize, CPA_DETECT); > > /* > * If there is a conflict, split the large page. > @@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_dat > if (!cpa->force_static_prot) > goto set; > > - prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT); > + /* Hand in lpsize = 0 to enforce the protection mechanism */ > + prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT); > > if (pgprot_val(prot) == pgprot_val(ref_prot)) > goto set; > @@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa > pgprot_val(new_prot) |= pgprot_val(cpa->mask_set); > > cpa_inc_4k_install(); > - new_prot = static_protections(new_prot, address, pfn, 1, > + /* Hand in lpsize = 0 to enforce the protection mechanism */ > + new_prot = static_protections(new_prot, address, pfn, 1, 0, > CPA_PROTECT); > > new_prot = pgprot_clear_protnone_bits(new_prot); ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text 2019-08-28 22:31 ` [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text Thomas Gleixner 2019-08-28 23:03 ` Song Liu @ 2019-08-29 13:01 ` Peter Zijlstra 2019-08-29 18:55 ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner 2 siblings, 0 replies; 26+ messages in thread From: Peter Zijlstra @ 2019-08-29 13:01 UTC (permalink / raw) To: Thomas Gleixner Cc: Song Liu, Dave Hansen, LKML, x86, Joerg Roedel, Andy Lutomirski, Rik van Riel, Steven Rostedt On Thu, Aug 29, 2019 at 12:31:34AM +0200, Thomas Gleixner wrote: > arch/x86/mm/pageattr.c | 26 ++++++++++++++++++-------- > 1 file changed, 18 insertions(+), 8 deletions(-) > > --- a/arch/x86/mm/pageattr.c > +++ b/arch/x86/mm/pageattr.c > @@ -516,7 +516,7 @@ static inline void check_conflict(int wa > */ > static inline pgprot_t static_protections(pgprot_t prot, unsigned long start, > unsigned long pfn, unsigned long npg, > - int warnlvl) > + unsigned long lpsize, int warnlvl) > { > pgprotval_t forbidden, res; > unsigned long end; > @@ -535,9 +535,17 @@ static inline pgprot_t static_protection > check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX"); > forbidden = res; > > - res = protect_kernel_text_ro(start, end); > - check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); > - forbidden |= res; > + /* > + * Special case to preserve a large page. If the change spawns the > + * full large page mapping then there is no point to split it > + * up. Happens with ftrace and is going to be removed once ftrace > + * switched to text_poke(). > + */ > + if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) { > + res = protect_kernel_text_ro(start, end); > + check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); > + forbidden |= res; > + } Right, so this allows the RW (doesn't enforce RO) and thereby doesn't force split, when it is a whole large page. 
> > /* Check the PFN directly */ > res = protect_pci_bios(pfn, pfn + npg - 1); > @@ -819,7 +827,7 @@ static int __should_split_large_page(pte > * extra conditional required here. > */ > chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages, > - CPA_CONFLICT); > + psize, CPA_CONFLICT); > > if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) { > /* > @@ -855,7 +863,7 @@ static int __should_split_large_page(pte > * protection requirement in the large page. > */ > new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages, > - CPA_DETECT); > + psize, CPA_DETECT); > > /* > * If there is a conflict, split the large page. And these are the callsites in __should_split_large_page(), and you provide psize, and therefore we allow RW to preserve the large pages on the kernel text. > @@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_dat > if (!cpa->force_static_prot) > goto set; > > - prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT); > + /* Hand in lpsize = 0 to enforce the protection mechanism */ > + prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT); This is when we've already decided to split, in which case we might as well enforce the normal rules, and .lpsize=0 does just that. > > if (pgprot_val(prot) == pgprot_val(ref_prot)) > goto set; > @@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa > pgprot_val(new_prot) |= pgprot_val(cpa->mask_set); > > cpa_inc_4k_install(); > - new_prot = static_protections(new_prot, address, pfn, 1, > + /* Hand in lpsize = 0 to enforce the protection mechanism */ > + new_prot = static_protections(new_prot, address, pfn, 1, 0, > CPA_PROTECT); And here we check the protections of a single 4k page, in which case large pages are irrelevant and again .lpsize=0 disables the new code. > > new_prot = pgprot_clear_protnone_bits(new_prot); That all seems OK I suppose. 
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [tip: x86/urgent] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text 2019-08-28 22:31 ` [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text Thomas Gleixner 2019-08-28 23:03 ` Song Liu 2019-08-29 13:01 ` Peter Zijlstra @ 2019-08-29 18:55 ` tip-bot2 for Thomas Gleixner 2 siblings, 0 replies; 26+ messages in thread From: tip-bot2 for Thomas Gleixner @ 2019-08-29 18:55 UTC (permalink / raw) To: linux-tip-commits Cc: Song Liu, Thomas Gleixner, Peter Zijlstra (Intel), stable, Ingo Molnar, Borislav Petkov, linux-kernel The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 7af0145067bc429a09ac4047b167c0971c9f0dc7 Gitweb: https://git.kernel.org/tip/7af0145067bc429a09ac4047b167c0971c9f0dc7 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 29 Aug 2019 00:31:34 +02:00 Committer: Thomas Gleixner <tglx@linutronix.de> CommitterDate: Thu, 29 Aug 2019 20:48:44 +02:00 x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text ftrace does not use text_poke() for enabling trace functionality. It uses its own mechanism and flips the whole kernel text to RW and back to RO. The CPA rework removed a loop based check of 4k pages which tried to preserve a large page by checking each 4k page whether the change would actually cover all pages in the large page. This resulted in endless loops for nothing as in testing it turned out that it actually never preserved anything. Of course testing missed to include ftrace, which is the one and only case which benefitted from the 4k loop. As a consequence enabling function tracing or ftrace based kprobes results in a full 4k split of the kernel text, which affects iTLB performance. The kernel RO protection is the only valid case where this can actually preserve large pages. All other static protections (RO data, data NX, PCI, BIOS) are truly static. 
So a conflict with those protections which results in a split should only ever happen when a change of memory next to a protected region is attempted. But these conflicts are rightfully splitting the large page to preserve the protected regions. In fact a change to the protected regions itself is a bug and is warned about. Add an exception for the static protection check for kernel text RO when the to be changed region spawns a full large page which allows to preserve the large mappings. This also prevents the syslog to be spammed about CPA violations when ftrace is used. The exception needs to be removed once ftrace switched over to text_poke() which avoids the whole issue. Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de --- arch/x86/mm/pageattr.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 6a9a77a..e14e95e 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -516,7 +516,7 @@ static inline void check_conflict(int warnlvl, pgprot_t prot, pgprotval_t val, */ static inline pgprot_t static_protections(pgprot_t prot, unsigned long start, unsigned long pfn, unsigned long npg, - int warnlvl) + unsigned long lpsize, int warnlvl) { pgprotval_t forbidden, res; unsigned long end; @@ -535,9 +535,17 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long start, check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX"); forbidden = res; - res = protect_kernel_text_ro(start, end); - check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); - forbidden |= 
res; + /* + * Special case to preserve a large page. If the change spawns the + * full large page mapping then there is no point to split it + * up. Happens with ftrace and is going to be removed once ftrace + * switched to text_poke(). + */ + if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) { + res = protect_kernel_text_ro(start, end); + check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO"); + forbidden |= res; + } /* Check the PFN directly */ res = protect_pci_bios(pfn, pfn + npg - 1); @@ -819,7 +827,7 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address, * extra conditional required here. */ chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages, - CPA_CONFLICT); + psize, CPA_CONFLICT); if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) { /* @@ -855,7 +863,7 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address, * protection requirement in the large page. */ new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages, - CPA_DETECT); + psize, CPA_DETECT); /* * If there is a conflict, split the large page. @@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn, if (!cpa->force_static_prot) goto set; - prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT); + /* Hand in lpsize = 0 to enforce the protection mechanism */ + prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT); if (pgprot_val(prot) == pgprot_val(ref_prot)) goto set; @@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa_data *cpa, int primary) pgprot_val(new_prot) |= pgprot_val(cpa->mask_set); cpa_inc_4k_install(); - new_prot = static_protections(new_prot, address, pfn, 1, + /* Hand in lpsize = 0 to enforce the protection mechanism */ + new_prot = static_protections(new_prot, address, pfn, 1, 0, CPA_PROTECT); new_prot = pgprot_clear_protnone_bits(new_prot); ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Ingo Molnar @ 2019-08-28 18:58 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

* Thomas Gleixner <tglx@linutronix.de> wrote:

> From: Song Liu <songliubraving@fb.com>
>
> pti_clone_pmds() assumes that the supplied address is either:
>
>  - properly PUD/PMD aligned
> or
>  - the address is actually mapped which means that independent
>    of the mapping level (PUD/PMD/PTE) the next higher mapping
>    exist.

s/independent /independently
s/exist /exists

> If that's not the case the unaligned address can be incremented by PUD or
> PMD size wrongly. All callers supply mapped and/or aligned addresses, but
> for robustness sake, it's better to handle that case proper and to emit a
> warning.

s/wrongly /incorrectly
s/robustness sake /robustness's sake
s/proper /properly

With that:

> 		pud = pud_offset(p4d, addr);
> 		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			WARN_ON_ONCE(addr & PUD_MASK);
> +			addr = round_up(addr + 1, PUD_SIZE);
> 			continue;
> 		}
>
> 		pmd = pmd_offset(pud, addr);
> 		if (pmd_none(*pmd)) {
> -			addr += PMD_SIZE;
> +			WARN_ON_ONCE(addr & PMD_MASK);
> +			addr = round_up(addr + 1, PMD_SIZE);

So given that PUD_MASK and PMD_MASK are masking out the *offset*:

  arch/x86/include/asm/pgtable_64_types.h:#define PMD_MASK	(~(PMD_SIZE - 1))

Didn't we want something like:

	WARN_ON_ONCE(addr & ~PUD_MASK);

	WARN_ON_ONCE(addr & ~PMD_MASK);

to warn about an unaligned 'addr', or am I misreading the intent here?

Thanks,

	Ingo
* Re: [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Thomas Gleixner @ 2019-08-28 19:45 UTC
To: Ingo Molnar
Cc: LKML, x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

On Wed, 28 Aug 2019, Ingo Molnar wrote:

> > 		pmd = pmd_offset(pud, addr);
> > 		if (pmd_none(*pmd)) {
> > -			addr += PMD_SIZE;
> > +			WARN_ON_ONCE(addr & PMD_MASK);
> > +			addr = round_up(addr + 1, PMD_SIZE);
>
> So given that PUD_MASK and PMD_MASK are masking out the *offset*:
>
>   arch/x86/include/asm/pgtable_64_types.h:#define PMD_MASK	(~(PMD_SIZE - 1))
>
> Didn't we want something like:
>
> 	WARN_ON_ONCE(addr & ~PUD_MASK);
>
> 	WARN_ON_ONCE(addr & ~PMD_MASK);
>
> to warn about an unaligned 'addr', or am I misreading the intent here?

Bah, right you are...
* [patch V2 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Thomas Gleixner @ 2019-08-28 20:54 UTC
To: LKML
Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

From: Song Liu <songliubraving@fb.com>

pti_clone_pmds() assumes that the supplied address is either:

 - properly PUD/PMD aligned
or
 - the address is actually mapped which means that independent
   of the mapping level (PUD/PMD/PTE) the next higher mapping
   exist.

If that's not the case the unaligned address can be incremented by PUD or
PMD size wrongly. All callers supply mapped and/or aligned addresses, but
for robustness sake, it's better to handle that case proper and to emit a
warning.

[ tglx: Rewrote changelog and added WARN_ON_ONCE() ]

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Negate P[UM]D_MASK for checking whether the offset part is 0
---
 arch/x86/mm/pti.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, u

 		pud = pud_offset(p4d, addr);
 		if (pud_none(*pud)) {
-			addr += PUD_SIZE;
+			WARN_ON_ONCE(addr & ~PUD_MASK);
+			addr = round_up(addr + 1, PUD_SIZE);
 			continue;
 		}

 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
-			addr += PMD_SIZE;
+			WARN_ON_ONCE(addr & ~PMD_MASK);
+			addr = round_up(addr + 1, PMD_SIZE);
 			continue;
 		}
* Re: [patch V2 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Thomas Gleixner @ 2019-08-28 21:52 UTC
To: LKML
Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

On Wed, 28 Aug 2019, Thomas Gleixner wrote:

> From: Song Liu <songliubraving@fb.com>
>
> pti_clone_pmds() assumes that the supplied address is either:
>
>  - properly PUD/PMD aligned
> or
>  - the address is actually mapped which means that independent

Bah. I'm a moron. Forgot to fix the spell checker issues.
* [patch V3 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Thomas Gleixner @ 2019-08-28 21:54 UTC
To: LKML
Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

From: Song Liu <songliubraving@fb.com>

pti_clone_pmds() assumes that the supplied address is either:

 - properly PUD/PMD aligned
or
 - the address is actually mapped which means that independently
   of the mapping level (PUD/PMD/PTE) the next higher mapping
   exists.

If that's not the case the unaligned address can be incremented by PUD or
PMD size incorrectly. All callers supply mapped and/or aligned addresses,
but for the sake of robustness it's better to handle that case properly and
to emit a warning.

[ tglx: Rewrote changelog and added WARN_ON_ONCE() ]

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Negate P[UM]D_MASK for checking whether the offset part is 0
V3: Fix changelog
---
 arch/x86/mm/pti.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, u

 		pud = pud_offset(p4d, addr);
 		if (pud_none(*pud)) {
-			addr += PUD_SIZE;
+			WARN_ON_ONCE(addr & ~PUD_MASK);
+			addr = round_up(addr + 1, PUD_SIZE);
 			continue;
 		}

 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
-			addr += PMD_SIZE;
+			WARN_ON_ONCE(addr & ~PMD_MASK);
+			addr = round_up(addr + 1, PMD_SIZE);
 			continue;
 		}
* Re: [patch V3 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Ingo Molnar @ 2019-08-28 23:22 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

* Thomas Gleixner <tglx@linutronix.de> wrote:

> From: Song Liu <songliubraving@fb.com>
>
> pti_clone_pmds() assumes that the supplied address is either:
>
>  - properly PUD/PMD aligned
> or
>  - the address is actually mapped which means that independently
>    of the mapping level (PUD/PMD/PTE) the next higher mapping
>    exists.
>
> If that's not the case the unaligned address can be incremented by PUD or
> PMD size incorrectly. All callers supply mapped and/or aligned addresses,
> but for the sake of robustness it's better to handle that case properly and
> to emit a warning.
>
> [ tglx: Rewrote changelog and added WARN_ON_ONCE() ]
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Negate P[UM]D_MASK for checking whether the offset part is 0
> V3: Fix changelog
> ---
>  arch/x86/mm/pti.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, u
>
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			WARN_ON_ONCE(addr & ~PUD_MASK);
> +			addr = round_up(addr + 1, PUD_SIZE);
>  			continue;
>  		}
>
>  		pmd = pmd_offset(pud, addr);
>  		if (pmd_none(*pmd)) {
> -			addr += PMD_SIZE;
> +			WARN_ON_ONCE(addr & ~PMD_MASK);
> +			addr = round_up(addr + 1, PMD_SIZE);
>  			continue;
>  		}

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo
* [tip: x86/pti] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: tip-bot2 for Song Liu @ 2019-08-29 19:02 UTC
To: linux-tip-commits
Cc: Song Liu, Thomas Gleixner, Ingo Molnar, Peter Zijlstra (Intel), Borislav Petkov, linux-kernel

The following commit has been merged into the x86/pti branch of tip:

Commit-ID:     825d0b73cd7526b0bb186798583fae810091cbac
Gitweb:        https://git.kernel.org/tip/825d0b73cd7526b0bb186798583fae810091cbac
Author:        Song Liu <songliubraving@fb.com>
AuthorDate:    Wed, 28 Aug 2019 23:54:55 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 29 Aug 2019 20:52:52 +02:00

x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()

pti_clone_pmds() assumes that the supplied address is either:

 - properly PUD/PMD aligned
or
 - the address is actually mapped which means that independently
   of the mapping level (PUD/PMD/PTE) the next higher mapping
   exists.

If that's not the case the unaligned address can be incremented by PUD or
PMD size incorrectly. All callers supply mapped and/or aligned addresses,
but for the sake of robustness it's better to handle that case properly and
to emit a warning.

[ tglx: Rewrote changelog and added WARN_ON_ONCE() ]

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282352470.1938@nanos.tec.linutronix.de
---
 arch/x86/mm/pti.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b196524..a24487b 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, unsigned long end,

 		pud = pud_offset(p4d, addr);
 		if (pud_none(*pud)) {
-			addr += PUD_SIZE;
+			WARN_ON_ONCE(addr & ~PUD_MASK);
+			addr = round_up(addr + 1, PUD_SIZE);
 			continue;
 		}

 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
-			addr += PMD_SIZE;
+			WARN_ON_ONCE(addr & ~PMD_MASK);
+			addr = round_up(addr + 1, PMD_SIZE);
 			continue;
 		}
* Re: [patch V3 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
From: Joerg Roedel @ 2019-08-30 10:24 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

On Wed, Aug 28, 2019 at 11:54:55PM +0200, Thomas Gleixner wrote:

> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,13 +330,15 @@ pti_clone_pgtable(unsigned long start, u
>
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			WARN_ON_ONCE(addr & ~PUD_MASK);
> +			addr = round_up(addr + 1, PUD_SIZE);
>  			continue;
>  		}
>
>  		pmd = pmd_offset(pud, addr);
>  		if (pmd_none(*pmd)) {
> -			addr += PMD_SIZE;
> +			WARN_ON_ONCE(addr & ~PMD_MASK);
> +			addr = round_up(addr + 1, PMD_SIZE);
>  			continue;
>  		}

Reviewed-by: Joerg Roedel <jroedel@suse.de>
* [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: Thomas Gleixner @ 2019-08-28 14:24 UTC
To: LKML
Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

When PTI is disabled at boot time either because the CPU is not affected or
PTI has been disabled on the command line, the boot code still calls into
pti_finalize() which then unconditionally invokes:

 pti_clone_entry_text()
 pti_clone_kernel_text()

pti_clone_kernel_text() was called unconditionally before the 32bit support
was added and 32bit added the call to pti_clone_entry_text().

The call has no side effects as cloning the page tables into the available
second one, which was allocated for PTI does not create damage. But it does
not make sense either and in case that this functionality would be extended
later this might actually lead to hard to diagnose issue.

Neither function should be called when PTI is runtime disabled. Make the
invocation conditional.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/pti.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -668,6 +668,8 @@ void __init pti_init(void)
  */
 void pti_finalize(void)
 {
+	if (!boot_cpu_has(X86_FEATURE_PTI))
+		return;
 	/*
 	 * We need to clone everything (again) that maps parts of the
 	 * kernel image.
* Re: [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: Dave Hansen @ 2019-08-28 15:47 UTC
To: Thomas Gleixner, LKML
Cc: x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

On 8/28/19 7:24 AM, Thomas Gleixner wrote:

> Neither function should be called when PTI is runtime disabled. Make the
> invocation conditional.

Thanks for sending that out. My impressions from a look through it matched
your changelog.

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
* Re: [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: Song Liu @ 2019-08-28 17:49 UTC
To: Thomas Gleixner
Cc: LKML, X86 ML, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

> On Aug 28, 2019, at 7:24 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> When PTI is disabled at boot time either because the CPU is not affected or
> PTI has been disabled on the command line, the boot code still calls into
> pti_finalize() which then unconditionally invokes:
>
>  pti_clone_entry_text()
>  pti_clone_kernel_text()
>
> pti_clone_kernel_text() was called unconditionally before the 32bit support
> was added and 32bit added the call to pti_clone_entry_text().
>
> The call has no side effects as cloning the page tables into the available
> second one, which was allocated for PTI does not create damage. But it does
> not make sense either and in case that this functionality would be extended
> later this might actually lead to hard to diagnose issue.
>
> Neither function should be called when PTI is runtime disabled. Make the
> invocation conditional.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Acked-by: Song Liu <songliubraving@fb.com>
* Re: [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: Ingo Molnar @ 2019-08-28 19:00 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

* Thomas Gleixner <tglx@linutronix.de> wrote:

> When PTI is disabled at boot time either because the CPU is not affected or
> PTI has been disabled on the command line, the boot code still calls into
> pti_finalize() which then unconditionally invokes:
>
>  pti_clone_entry_text()
>  pti_clone_kernel_text()
>
> pti_clone_kernel_text() was called unconditionally before the 32bit support
> was added and 32bit added the call to pti_clone_entry_text().
>
> The call has no side effects as cloning the page tables into the available
> second one, which was allocated for PTI does not create damage. But it does
> not make sense either and in case that this functionality would be extended
> later this might actually lead to hard to diagnose issue.

s/issue/issues

> Neither function should be called when PTI is runtime disabled. Make the
> invocation conditional.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/mm/pti.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -668,6 +668,8 @@ void __init pti_init(void)
>   */
>  void pti_finalize(void)
>  {
> +	if (!boot_cpu_has(X86_FEATURE_PTI))
> +		return;
>  	/*
>  	 * We need to clone everything (again) that maps parts of the
>  	 * kernel image.

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo
* [tip: x86/pti] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: tip-bot2 for Thomas Gleixner @ 2019-08-29 19:02 UTC
To: linux-tip-commits
Cc: Thomas Gleixner, Dave Hansen, Ingo Molnar, Song Liu, Peter Zijlstra (Intel), Borislav Petkov, linux-kernel

The following commit has been merged into the x86/pti branch of tip:

Commit-ID:     990784b57731192b7d90c8d4049e6318d81e887d
Gitweb:        https://git.kernel.org/tip/990784b57731192b7d90c8d4049e6318d81e887d
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 28 Aug 2019 16:24:47 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 29 Aug 2019 20:52:53 +02:00

x86/mm/pti: Do not invoke PTI functions when PTI is disabled

When PTI is disabled at boot time either because the CPU is not affected or
PTI has been disabled on the command line, the boot code still calls into
pti_finalize() which then unconditionally invokes:

 pti_clone_entry_text()
 pti_clone_kernel_text()

pti_clone_kernel_text() was called unconditionally before the 32bit support
was added and 32bit added the call to pti_clone_entry_text().

The call has no side effects as cloning the page tables into the available
second one, which was allocated for PTI does not create damage. But it does
not make sense either and in case that this functionality would be extended
later this might actually lead to hard to diagnose issues.

Neither function should be called when PTI is runtime disabled. Make the
invocation conditional.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190828143124.063353972@linutronix.de
---
 arch/x86/mm/pti.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index a24487b..7f21404 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -668,6 +668,8 @@ void __init pti_init(void)
  */
 void pti_finalize(void)
 {
+	if (!boot_cpu_has(X86_FEATURE_PTI))
+		return;
 	/*
 	 * We need to clone everything (again) that maps parts of the
 	 * kernel image.
* Re: [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled
From: Joerg Roedel @ 2019-08-30 10:25 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Rik van Riel

On Wed, Aug 28, 2019 at 04:24:47PM +0200, Thomas Gleixner wrote:

> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -668,6 +668,8 @@ void __init pti_init(void)
>   */
>  void pti_finalize(void)
>  {
> +	if (!boot_cpu_has(X86_FEATURE_PTI))
> +		return;
>  	/*
>  	 * We need to clone everything (again) that maps parts of the
>  	 * kernel image.

Reviewed-by: Joerg Roedel <jroedel@suse.de>
* Re: [patch 0/2] x86/mm/pti: Robustness updates
From: Peter Zijlstra @ 2019-08-28 16:03 UTC
To: Thomas Gleixner
Cc: LKML, x86, Song Liu, Joerg Roedel, Dave Hansen, Andy Lutomirski, Rik van Riel

On Wed, Aug 28, 2019 at 04:24:45PM +0200, Thomas Gleixner wrote:

> Following up on the discussions around the patch Song submitted to 'cure' a
> iTLB related performance regression, I picked up Song's patch which makes
> clone_page_tables() more robust by handling unaligned addresses proper and
> added one which prevents calling into the PTI code when PTI is enabled
> compile time, but disabled at runtime (command line or CPU not affected).
>
> There is no point in calling into those PTI functions unconditionally. The
> resulting page tables are the same before and after the change which makes
> sense as the code clones the kernel page table into the secondary page
> table space which is available but not used when PTI is boot time disabled.
>
> But even if it does not do damage today, this could have nasty side effect
> when the PTI code is changed, extended etc. later.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
end of thread, other threads: [~2019-08-30 10:25 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)

2019-08-28 14:24 [patch 0/2] x86/mm/pti: Robustness updates Thomas Gleixner
2019-08-28 14:24 ` [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() Thomas Gleixner
2019-08-28 15:46   ` Dave Hansen
2019-08-28 15:51     ` Thomas Gleixner
2019-08-28 17:58       ` Song Liu
2019-08-28 20:05         ` Thomas Gleixner
2019-08-28 20:32           ` Song Liu
2019-08-28 22:31             ` [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text Thomas Gleixner
2019-08-28 23:03               ` Song Liu
2019-08-29 13:01               ` Peter Zijlstra
2019-08-29 18:55               ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2019-08-28 18:58   ` [patch 1/2] x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() Ingo Molnar
2019-08-28 19:45     ` Thomas Gleixner
2019-08-28 20:54   ` [patch V2 " Thomas Gleixner
2019-08-28 21:52     ` Thomas Gleixner
2019-08-28 21:54     ` [patch V3 " Thomas Gleixner
2019-08-28 23:22       ` Ingo Molnar
2019-08-29 19:02       ` [tip: x86/pti] " tip-bot2 for Song Liu
2019-08-30 10:24       ` [patch V3 1/2] " Joerg Roedel
2019-08-28 14:24 ` [patch 2/2] x86/mm/pti: Do not invoke PTI functions when PTI is disabled Thomas Gleixner
2019-08-28 15:47   ` Dave Hansen
2019-08-28 17:49   ` Song Liu
2019-08-28 19:00   ` Ingo Molnar
2019-08-29 19:02   ` [tip: x86/pti] " tip-bot2 for Thomas Gleixner
2019-08-30 10:25   ` [patch 2/2] " Joerg Roedel
2019-08-28 16:03 ` [patch 0/2] x86/mm/pti: Robustness updates Peter Zijlstra