linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Huang Ying <ying.huang@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH] x86: use the right protections for split-up pagetables
Date: Fri, 20 Feb 2009 08:29:15 +0100	[thread overview]
Message-ID: <20090220072915.GB28085@elte.hu> (raw)
In-Reply-To: <alpine.LFD.2.00.0902191923300.21686@localhost.localdomain>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So the whole
> 
>         ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
>         pgprot_val(ref_prot) |= _PAGE_PRESENT;
>         __set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
> 
> sequence is utter crap, I think. The whole "ref_prot" there 
> should be just _pgprot(_KERNPG_TABLE), I think. I don't think 
> there is any other valid value.

Agreed, split_large_page() was just plain confused here - there 
was no hidden reason for this logic. It makes no sense to bring 
any pte level protection information to the PMD level because a 
pmd entry covers a set of 512 ptes so there's no singular 
protection attribute that can be carried to it.

The right solution is what you suggested: to use the most 
permissive protection bits for the pmd, i.e. _KERNPG_TABLE. 
Since the protection bits get combined, this makes the pte 
protections control the final behavior of the mapping - so 
subsequent code patching and similar activities will work fine.

The bug was mostly harmless until Steve hacked his kernel to 
have the right (large) size of readonly, text and data areas. I 
never hit such an ftrace hang even with allyesconfig bzImage 
bootups [which has obscenely large text and data sections], so i 
think something in Steve's tree was also needed to trigger it: 
an unusually large readonly data section.

I've queued up the fix below in tip:x86/urgent and will send a 
pull request later today if it passes testing. Steve, does this 
solve the bug you've hit?

With this fix i dont think the other bits from Steve's series 
(patch 1-4) are needed at all - those patches expose PMD details 
in various places that iterate over ptes - that's ugly and 
unnecessary as well if the PMD's protection is permissive.

[ Also, since you suggested the fix i've added your Acked-by, 
  let me know if you dont agree with any aspect of the fix. ]

	Ingo

---------------->
>From f07eb4c47d5d4a70dc8eb8e2c158741cd6c69948 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 20 Feb 2009 08:04:13 +0100
Subject: [PATCH] x86: use the right protections for split-up pagetables

Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().

The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.

The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.

With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.

Also update the comment.

This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.

[ Huang Ying also experienced problems in this area when writing
  the EFI code, but the real bug in split_large_page() was not
  realized back then. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/pageattr.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8ca0d85..17d5d1a 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -508,18 +508,13 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 #endif
 
 	/*
-	 * Install the new, split up pagetable. Important details here:
+	 * Install the new, split up pagetable.
 	 *
-	 * On Intel the NX bit of all levels must be cleared to make a
-	 * page executable. See section 4.13.2 of Intel 64 and IA-32
-	 * Architectures Software Developer's Manual).
-	 *
-	 * Mark the entry present. The current mapping might be
-	 * set to not present, which we preserved above.
+	 * We use the standard kernel pagetable protections for the new
+	 * pagetable protections, the actual ptes set above control the
+	 * primary protection behavior:
 	 */
-	ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
-	pgprot_val(ref_prot) |= _PAGE_PRESENT;
-	__set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
+	__set_pmd_pte(kpte, address, mk_pte(base, _pgprot(_KERNPG_TABLE)));
 	base = NULL;
 
 out_unlock:

  parent reply	other threads:[~2009-02-20  7:30 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-20  1:13 [git pull] changes for tip, and a nasty x86 page table bug Steven Rostedt
2009-02-20  1:13 ` [PATCH 1/6] x86: check PMD in spurious_fault handler Steven Rostedt
2009-02-20  1:13 ` [PATCH 2/6] x86: keep pmd rw bit set when creating 4K level pages Steven Rostedt
2009-02-20  1:13 ` [PATCH 3/6] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
2009-02-20  1:13 ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-20  1:32   ` Andrew Morton
2009-02-20  1:44     ` Steven Rostedt
2009-02-20  2:05       ` [PATCH][git pull] update to tip/tracing/ftrace Steven Rostedt
2009-02-22 17:50   ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Andi Kleen
2009-02-22 22:53     ` Steven Rostedt
2009-02-23  0:29       ` Andi Kleen
2009-02-23  2:33       ` Mathieu Desnoyers
2009-02-23  4:29         ` Steven Rostedt
2009-02-23  4:53           ` Mathieu Desnoyers
2009-02-23 14:48             ` Steven Rostedt
2009-02-23 15:42               ` Mathieu Desnoyers
2009-02-23 15:51                 ` Steven Rostedt
2009-02-23 15:55                   ` Steven Rostedt
2009-02-23 16:13                   ` Mathieu Desnoyers
2009-02-23 16:48                     ` Steven Rostedt
2009-02-23 17:31                       ` Mathieu Desnoyers
2009-02-23 18:17                         ` Steven Rostedt
2009-02-23 18:34                           ` Mathieu Desnoyers
2009-02-27 17:52                           ` Masami Hiramatsu
2009-02-27 18:07                             ` Mathieu Desnoyers
2009-02-27 18:34                               ` Masami Hiramatsu
2009-02-27 18:53                                 ` Mathieu Desnoyers
2009-02-27 20:57                                   ` Masami Hiramatsu
2009-03-02 17:01                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-02 17:19                                       ` Mathieu Desnoyers
2009-03-02 22:15                                         ` Masami Hiramatsu
2009-03-02 22:22                                           ` Ingo Molnar
2009-03-02 22:55                                             ` Masami Hiramatsu
2009-03-02 23:09                                               ` Ingo Molnar
2009-03-02 23:38                                                 ` Masami Hiramatsu
2009-03-02 23:49                                                   ` Ingo Molnar
2009-03-03  0:00                                                     ` Mathieu Desnoyers
2009-03-03  0:00                                                     ` [PATCH] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2009-03-03  0:32                                                       ` Ingo Molnar
2009-03-03  0:39                                                         ` Mathieu Desnoyers
2009-03-03  1:30                                                         ` [PATCH] Text Edit Lock - Architecture Independent Code (v2) Mathieu Desnoyers
2009-03-03  1:31                                                         ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Mathieu Desnoyers
2009-03-03  9:27                                                           ` Ingo Molnar
2009-03-03 12:06                                                             ` Ananth N Mavinakayanahalli
2009-03-03 14:28                                                               ` Mathieu Desnoyers
2009-03-03 14:33                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v3) Mathieu Desnoyers
2009-03-03 14:53                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Ingo Molnar
2009-03-03  0:01                                                     ` [PATCH] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2009-03-03  0:10                                                       ` Masami Hiramatsu
2009-03-03  0:05                                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-03  0:22                                                       ` Ingo Molnar
2009-03-03  0:31                                                         ` Masami Hiramatsu
2009-03-03 16:31                                                           ` [PATCH] x86: make text_poke() atomic using fixmap Masami Hiramatsu
2009-03-03 17:08                                                             ` Mathieu Desnoyers
2009-03-05 10:38                                                             ` Ingo Molnar
2009-03-06 14:06                                                               ` Ingo Molnar
2009-03-06 14:49                                                                 ` Masami Hiramatsu
2009-03-02 18:28                                       ` [RFC][PATCH] x86: make text_poke() atomic Arjan van de Ven
2009-03-02 18:36                                         ` Mathieu Desnoyers
2009-03-02 18:55                                           ` Arjan van de Ven
2009-03-02 19:13                                             ` Masami Hiramatsu
2009-03-02 19:23                                               ` H. Peter Anvin
2009-03-02 19:47                                             ` Mathieu Desnoyers
2009-03-02 18:42                                         ` Linus Torvalds
2009-03-03  4:54                                       ` Nick Piggin
2009-02-23 18:23                         ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-23  9:02         ` Ingo Molnar
2009-02-27 21:08     ` Pavel Machek
2009-02-28 16:56       ` Andi Kleen
2009-02-28 22:08         ` Pavel Machek
     [not found]           ` <87wsba1a9f.fsf@basil.nowhere.org>
2009-02-28 22:19             ` Pavel Machek
2009-02-28 23:52               ` Andi Kleen
2009-02-20  1:13 ` [PATCH 5/6] ftrace: immediately stop code modification if failure is detected Steven Rostedt
2009-02-20  1:13 ` [PATCH 6/6] ftrace: break out modify loop immediately on detection of error Steven Rostedt
2009-02-20  2:00 ` [git pull] changes for tip, and a nasty x86 page table bug Linus Torvalds
2009-02-20  2:08   ` Steven Rostedt
2009-02-20  3:44     ` Linus Torvalds
2009-02-20  4:00       ` Steven Rostedt
2009-02-20  4:17         ` Linus Torvalds
2009-02-20  4:34           ` Steven Rostedt
2009-02-20  5:02           ` Huang Ying
2009-02-20  7:29       ` Ingo Molnar [this message]
2009-02-20  7:39         ` [PATCH, v2] x86: use the right protections for split-up pagetables Ingo Molnar
2009-02-20  8:02           ` Ingo Molnar
2009-02-20 10:24             ` Ingo Molnar
2009-02-20 13:57         ` [PATCH] " Steven Rostedt
2009-02-20 15:40         ` Linus Torvalds
2009-02-20 16:59           ` Ingo Molnar
2009-02-20 18:33           ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090220072915.GB28085@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).