From: Song Liu <songliubraving@fb.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>, Joerg Roedel <jroedel@suse.de>,
"Andy Lutomirski" <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Rik van Riel" <riel@surriel.com>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
Date: Wed, 28 Aug 2019 23:03:03 +0000 [thread overview]
Message-ID: <704BDFE2-E6E7-4B34-8C94-01152B5C9CCD@fb.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de>
> On Aug 28, 2019, at 3:31 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> ftrace does not use text_poke() for enabling trace functionality. It uses
> its own mechanism and flips the whole kernel text to RW and back to RO.
>
> The CPA rework removed a loop-based check of 4k pages which tried to
> preserve a large page by checking, for each 4k page, whether the change
> would actually cover all pages in the large page.
>
> This resulted in endless loops for nothing, as testing showed that it
> never actually preserved anything. Of course, the testing failed to
> include ftrace, which is the one and only case that benefitted from the
> 4k loop.
>
> As a consequence, enabling function tracing or ftrace-based kprobes
> results in a full 4k split of the kernel text, which hurts iTLB
> performance.
>
> The kernel RO protection is the only valid case where this can actually
> preserve large pages.
>
> All other static protections (RO data, data NX, PCI, BIOS) are truly
> static. So a conflict with those protections which results in a split
> should only ever happen when a change of memory next to a protected region
> is attempted. But these conflicts are rightfully splitting the large page
> to preserve the protected regions. In fact a change to the protected
> regions itself is a bug and is warned about.
>
> Add an exception to the static protection check for kernel text RO when
> the region to be changed spans a full large page, which allows the
> large mappings to be preserved. This also prevents the syslog from being
> spammed with CPA violations when ftrace is used.
>
> The exception needs to be removed once ftrace has switched over to
> text_poke(), which avoids the whole issue.
>
> Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
> Reported-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
This looks great. Much cleaner than my workaround.
Thanks!
Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>
We need this for v4.20 to v5.3 (assuming Peter's patches will land in 5.4).
> ---
> arch/x86/mm/pageattr.c | 26 ++++++++++++++++++--------
> 1 file changed, 18 insertions(+), 8 deletions(-)
>
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -516,7 +516,7 @@ static inline void check_conflict(int wa
> */
> static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
> unsigned long pfn, unsigned long npg,
> - int warnlvl)
> + unsigned long lpsize, int warnlvl)
> {
> pgprotval_t forbidden, res;
> unsigned long end;
> @@ -535,9 +535,17 @@ static inline pgprot_t static_protection
> check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX");
> forbidden = res;
>
> - res = protect_kernel_text_ro(start, end);
> - check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
> - forbidden |= res;
> + /*
> + * Special case to preserve a large page. If the change spans the
> + * full large page mapping then there is no point to split it
> + * up. Happens with ftrace and is going to be removed once ftrace
> + * switched to text_poke().
> + */
> + if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) {
> + res = protect_kernel_text_ro(start, end);
> + check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
> + forbidden |= res;
> + }
>
> /* Check the PFN directly */
> res = protect_pci_bios(pfn, pfn + npg - 1);
> @@ -819,7 +827,7 @@ static int __should_split_large_page(pte
> * extra conditional required here.
> */
> chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages,
> - CPA_CONFLICT);
> + psize, CPA_CONFLICT);
>
> if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) {
> /*
> @@ -855,7 +863,7 @@ static int __should_split_large_page(pte
> * protection requirement in the large page.
> */
> new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
> - CPA_DETECT);
> + psize, CPA_DETECT);
>
> /*
> * If there is a conflict, split the large page.
> @@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_dat
> if (!cpa->force_static_prot)
> goto set;
>
> - prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
> + /* Hand in lpsize = 0 to enforce the protection mechanism */
> + prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT);
>
> if (pgprot_val(prot) == pgprot_val(ref_prot))
> goto set;
> @@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa
> pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
>
> cpa_inc_4k_install();
> - new_prot = static_protections(new_prot, address, pfn, 1,
> + /* Hand in lpsize = 0 to enforce the protection mechanism */
> + new_prot = static_protections(new_prot, address, pfn, 1, 0,
> CPA_PROTECT);
>
> new_prot = pgprot_clear_protnone_bits(new_prot);