linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH] x86: use the right protections for split-up pagetables
Date: Fri, 20 Feb 2009 08:57:22 -0500 (EST)	[thread overview]
Message-ID: <alpine.DEB.2.00.0902200856330.29217@gandalf.stny.rr.com> (raw)
In-Reply-To: <20090220072915.GB28085@elte.hu>


On Fri, 20 Feb 2009, Ingo Molnar wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > So the whole
> > 
> >         ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
> >         pgprot_val(ref_prot) |= _PAGE_PRESENT;
> >         __set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
> > 
> > sequence is utter crap, I think. The whole "ref_prot" there 
> > should be just _pgprot(_KERNPG_TABLE), I think. I don't think 
> > there is any other valid value.
> 
> Agreed, split_large_page() was just plain confused here - there 
> was no hidden reason for this logic. It makes no sense to bring 
> any pte level protection information to the PMD level because a 
> pmd entry covers a set of 512 ptes so there's no singular 
> protection attribute that can be carried to it.
> 
> The right solution is what you suggested: to use the most 
> permissive protection bits for the pmd, i.e. _KERNPG_TABLE. 
> Since the protection bits get combined, this makes the pte 
> protections control the final behavior of the mapping - so 
> subsequent code patching and similar activities will work fine.
> 
> The bug was mostly harmless until Steve hacked his kernel to 
> have the right (large) size of readonly, text and data areas. I 
> never hit such an ftrace hang even with allyesconfig bzImage 
> bootups [which has obscenely large text and data sections], so i 
> think something in Steve's tree was also needed to trigger it: 
> an unusually large readonly data section.
> 
> I've queued up the fix below in tip:x86/urgent and will send a 
> pull request later today if it passes testing. Steve, does this 
> solve the bug you've hit?

Yep, I've already tried this fix. It works fine.

-- Steve

> 
> With this fix i dont think the other bits from Steve's series 
> (patch 1-4) are needed at all - those patches expose PMD details 
> in various places that iterate over ptes - that's ugly and 
> unnecessary as well if the PMD's protection is permissive.
> 
> [ Also, since you suggested the fix i've added your Acked-by, 
>   let me know if you dont agree with any aspect of the fix. ]
> 
> 	Ingo
> 
> ---------------->
> >From f07eb4c47d5d4a70dc8eb8e2c158741cd6c69948 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 20 Feb 2009 08:04:13 +0100
> Subject: [PATCH] x86: use the right protections for split-up pagetables
> 
> Steven Rostedt found a bug in where in his modified kernel
> ftrace was unable to modify the kernel text, due to the PMD
> itself having been marked read-only as well in
> split_large_page().
> 
> The fix, suggested by Linus, is to not try to 'clone' the
> reference protection of a huge-page, but to use the standard
> (and permissive) page protection bits of KERNPG_TABLE.
> 
> The 'cloning' makes sense for the ptes but it's a confused and
> incorrect concept at the page table level - because the
> pagetable entry is a set of all ptes and hence cannot
> 'clone' any single protection attribute - the ptes can be any
> mixture of protections.
> 
> With the permissive KERNPG_TABLE, even if the pte protections
> get changed after this point (due to ftrace doing code-patching
> or other similar activities like kprobes), the resulting combined
> protections will still be correct and the pte's restrictive
> (or permissive) protections will control it.
> 
> Also update the comment.
> 
> This bug was there for a long time but has not caused visible
> problems before as it needs a rather large read-only area to
> trigger. Steve possibly hacked his kernel with some really
> large arrays or so. Anyway, the bug is definitely worth fixing.
> 
> [ Huang Ying also experienced problems in this area when writing
>   the EFI code, but the real bug in split_large_page() was not
>   realized back then. ]
> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Reported-by: Huang Ying <ying.huang@intel.com>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/mm/pageattr.c |   15 +++++----------
>  1 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 8ca0d85..17d5d1a 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -508,18 +508,13 @@ static int split_large_page(pte_t *kpte, unsigned long address)
>  #endif
>  
>  	/*
> -	 * Install the new, split up pagetable. Important details here:
> +	 * Install the new, split up pagetable.
>  	 *
> -	 * On Intel the NX bit of all levels must be cleared to make a
> -	 * page executable. See section 4.13.2 of Intel 64 and IA-32
> -	 * Architectures Software Developer's Manual).
> -	 *
> -	 * Mark the entry present. The current mapping might be
> -	 * set to not present, which we preserved above.
> +	 * We use the standard kernel pagetable protections for the new
> +	 * pagetable protections, the actual ptes set above control the
> +	 * primary protection behavior:
>  	 */
> -	ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
> -	pgprot_val(ref_prot) |= _PAGE_PRESENT;
> -	__set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
> +	__set_pmd_pte(kpte, address, mk_pte(base, _pgprot(_KERNPG_TABLE)));
>  	base = NULL;
>  
>  out_unlock:
> 
> 

  parent reply	other threads:[~2009-02-20 13:57 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-20  1:13 [git pull] changes for tip, and a nasty x86 page table bug Steven Rostedt
2009-02-20  1:13 ` [PATCH 1/6] x86: check PMD in spurious_fault handler Steven Rostedt
2009-02-20  1:13 ` [PATCH 2/6] x86: keep pmd rw bit set when creating 4K level pages Steven Rostedt
2009-02-20  1:13 ` [PATCH 3/6] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
2009-02-20  1:13 ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-20  1:32   ` Andrew Morton
2009-02-20  1:44     ` Steven Rostedt
2009-02-20  2:05       ` [PATCH][git pull] update to tip/tracing/ftrace Steven Rostedt
2009-02-22 17:50   ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Andi Kleen
2009-02-22 22:53     ` Steven Rostedt
2009-02-23  0:29       ` Andi Kleen
2009-02-23  2:33       ` Mathieu Desnoyers
2009-02-23  4:29         ` Steven Rostedt
2009-02-23  4:53           ` Mathieu Desnoyers
2009-02-23 14:48             ` Steven Rostedt
2009-02-23 15:42               ` Mathieu Desnoyers
2009-02-23 15:51                 ` Steven Rostedt
2009-02-23 15:55                   ` Steven Rostedt
2009-02-23 16:13                   ` Mathieu Desnoyers
2009-02-23 16:48                     ` Steven Rostedt
2009-02-23 17:31                       ` Mathieu Desnoyers
2009-02-23 18:17                         ` Steven Rostedt
2009-02-23 18:34                           ` Mathieu Desnoyers
2009-02-27 17:52                           ` Masami Hiramatsu
2009-02-27 18:07                             ` Mathieu Desnoyers
2009-02-27 18:34                               ` Masami Hiramatsu
2009-02-27 18:53                                 ` Mathieu Desnoyers
2009-02-27 20:57                                   ` Masami Hiramatsu
2009-03-02 17:01                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-02 17:19                                       ` Mathieu Desnoyers
2009-03-02 22:15                                         ` Masami Hiramatsu
2009-03-02 22:22                                           ` Ingo Molnar
2009-03-02 22:55                                             ` Masami Hiramatsu
2009-03-02 23:09                                               ` Ingo Molnar
2009-03-02 23:38                                                 ` Masami Hiramatsu
2009-03-02 23:49                                                   ` Ingo Molnar
2009-03-03  0:00                                                     ` Mathieu Desnoyers
2009-03-03  0:00                                                     ` [PATCH] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2009-03-03  0:32                                                       ` Ingo Molnar
2009-03-03  0:39                                                         ` Mathieu Desnoyers
2009-03-03  1:30                                                         ` [PATCH] Text Edit Lock - Architecture Independent Code (v2) Mathieu Desnoyers
2009-03-03  1:31                                                         ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Mathieu Desnoyers
2009-03-03  9:27                                                           ` Ingo Molnar
2009-03-03 12:06                                                             ` Ananth N Mavinakayanahalli
2009-03-03 14:28                                                               ` Mathieu Desnoyers
2009-03-03 14:33                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v3) Mathieu Desnoyers
2009-03-03 14:53                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Ingo Molnar
2009-03-03  0:01                                                     ` [PATCH] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2009-03-03  0:10                                                       ` Masami Hiramatsu
2009-03-03  0:05                                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-03  0:22                                                       ` Ingo Molnar
2009-03-03  0:31                                                         ` Masami Hiramatsu
2009-03-03 16:31                                                           ` [PATCH] x86: make text_poke() atomic using fixmap Masami Hiramatsu
2009-03-03 17:08                                                             ` Mathieu Desnoyers
2009-03-05 10:38                                                             ` Ingo Molnar
2009-03-06 14:06                                                               ` Ingo Molnar
2009-03-06 14:49                                                                 ` Masami Hiramatsu
2009-03-02 18:28                                       ` [RFC][PATCH] x86: make text_poke() atomic Arjan van de Ven
2009-03-02 18:36                                         ` Mathieu Desnoyers
2009-03-02 18:55                                           ` Arjan van de Ven
2009-03-02 19:13                                             ` Masami Hiramatsu
2009-03-02 19:23                                               ` H. Peter Anvin
2009-03-02 19:47                                             ` Mathieu Desnoyers
2009-03-02 18:42                                         ` Linus Torvalds
2009-03-03  4:54                                       ` Nick Piggin
2009-02-23 18:23                         ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-23  9:02         ` Ingo Molnar
2009-02-27 21:08     ` Pavel Machek
2009-02-28 16:56       ` Andi Kleen
2009-02-28 22:08         ` Pavel Machek
     [not found]           ` <87wsba1a9f.fsf@basil.nowhere.org>
2009-02-28 22:19             ` Pavel Machek
2009-02-28 23:52               ` Andi Kleen
2009-02-20  1:13 ` [PATCH 5/6] ftrace: immediately stop code modification if failure is detected Steven Rostedt
2009-02-20  1:13 ` [PATCH 6/6] ftrace: break out modify loop immediately on detection of error Steven Rostedt
2009-02-20  2:00 ` [git pull] changes for tip, and a nasty x86 page table bug Linus Torvalds
2009-02-20  2:08   ` Steven Rostedt
2009-02-20  3:44     ` Linus Torvalds
2009-02-20  4:00       ` Steven Rostedt
2009-02-20  4:17         ` Linus Torvalds
2009-02-20  4:34           ` Steven Rostedt
2009-02-20  5:02           ` Huang Ying
2009-02-20  7:29       ` [PATCH] x86: use the right protections for split-up pagetables Ingo Molnar
2009-02-20  7:39         ` [PATCH, v2] " Ingo Molnar
2009-02-20  8:02           ` Ingo Molnar
2009-02-20 10:24             ` Ingo Molnar
2009-02-20 13:57         ` Steven Rostedt [this message]
2009-02-20 15:40         ` [PATCH] " Linus Torvalds
2009-02-20 16:59           ` Ingo Molnar
2009-02-20 18:33           ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.0902200856330.29217@gandalf.stny.rr.com \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).