All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Don Zickus <dzickus@redhat.com>, Andi Kleen <ak@linux.intel.com>,
	dave.hansen@linux.intel.com,
	Stephane Eranian <eranian@google.com>,
	jmario@redhat.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>
Subject: Re: [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip()
Date: Wed, 23 Oct 2013 09:44:47 +0200	[thread overview]
Message-ID: <20131023074447.GB9916@gmail.com> (raw)
In-Reply-To: <20131022211237.GH2490@laptop.programming.kicks-ass.net>


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Oct 17, 2013 at 03:27:48PM -0700, Linus Torvalds wrote:
> > On Thu, Oct 17, 2013 at 3:01 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Oh wait,.. now that Steven fixed being able to take faults from NMI
> > > context; we could actually try copy_from_user_inatomic(). Being able to
> > > directly access userspace would make the whole deal a lot easier again.
> > 
> > Careful! There is one magic piece of state that you need to
> > save-and-restore if you do this, namely %cr2. Taking a page fault
> > always writes to %cr2, and we must *not* corrupt it in the NMI
> > handler.
> 
> It looks like this is already dealt with (a similar thing is done 
> for i386).

> commit 7fbb98c5cb07563d3ee08714073a8e5452a96be2
> Author: Steven Rostedt <srostedt@redhat.com>
> Date:   Thu Jun 7 10:21:21 2012 -0400
> 
>     x86: Save cr2 in NMI in case NMIs take a page fault
>     
>     Avi Kivity reported that page faults in NMIs could cause havic if
>     the NMI preempted another page fault handler:
>     
>        The recent changes to NMI allow exceptions to take place in NMI
>        handlers, but I think that a #PF (say, due to access to vmalloc space)
>        is still problematic.  Consider the sequence
>     
>         #PF  (cr2 set by processor)
>           NMI
>             ...
>             #PF (cr2 clobbered)
>               do_page_fault()
>               IRET
>             ...
>             IRET
>           do_page_fault()
>             address = read_cr2()
>     
>        The last line reads the overwritten cr2 value.
>     
>     Originally I wrote a patch to solve this by saving the cr2 on the stack.
>     Brian Gerst suggested to save it in the r12 register as both r12 and rbx
>     are saved by the do_nmi handler as required by the C standard. But rbx
>     is already used for saving if swapgs needs to be run on exit of the NMI
>     handler.
>     
>     Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
>     Link: http://lkml.kernel.org/r/1337763411.13348.140.camel@gandalf.stny.rr.com
>     
>     Reported-by: Avi Kivity <avi@redhat.com>
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Cc: H. Peter Anvin <hpa@zytor.com>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Suggested-by: Brian Gerst <brgerst@gmail.com>
>     Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 7d65133..111f6bb 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1758,10 +1758,30 @@ ENTRY(nmi)
>  	 */
>  	call save_paranoid
>  	DEFAULT_FRAME 0
> +
> +	/*
> +	 * Save off the CR2 register. If we take a page fault in the NMI then
> +	 * it could corrupt the CR2 value. If the NMI preempts a page fault
> +	 * handler before it was able to read the CR2 register, and then the
> +	 * NMI itself takes a page fault, the page fault that was preempted
> +	 * will read the information from the NMI page fault and not the
> +	 * origin fault. Save it off and restore it if it changes.
> +	 * Use the r12 callee-saved register.
> +	 */
> +	movq %cr2, %r12

> +	/* Did the NMI take a page fault? Restore cr2 if it did */
> +	movq %cr2, %rcx
> +	cmpq %rcx, %r12
> +	je 1f
> +	movq %r12, %cr2
> +1:
> +	

Btw., depending on how expensive cr2 is to read this could be 
optimized a bit, by matching on the race window RIP range instead of 
saving cr2 unconditionally.

It would take a bit of restructuring of the page fault entry path to 
get the race window into a single RIP range, but that should be 
fairly trivial and might even speed up the page fault handler as the 
CPU can start the cr2 load sooner in the page fault handling 
sequence.

It's all a function of how expensive a cr2 read is. If it's 2 cycles 
then it doesn't matter much. If it's 20 then it might be worthwile.

Thanks,

	Ingo

  parent reply	other threads:[~2013-10-23  7:44 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-14 20:35 x86, perf: throttling issues with long nmi latencies Don Zickus
2013-10-14 21:28 ` Andi Kleen
2013-10-15 10:14 ` Peter Zijlstra
2013-10-15 13:02   ` Peter Zijlstra
2013-10-15 14:32     ` Peter Zijlstra
2013-10-15 15:07       ` Peter Zijlstra
2013-10-15 15:41         ` Don Zickus
2013-10-16 10:57           ` [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip() Peter Zijlstra
2013-10-16 12:46             ` Don Zickus
2013-10-16 13:31               ` Peter Zijlstra
2013-10-16 13:54                 ` Don Zickus
2013-10-17 11:21                 ` Peter Zijlstra
2013-10-17 13:33                 ` Peter Zijlstra
2013-10-29 14:07                   ` [tip:perf/urgent] perf/x86: Fix NMI measurements tip-bot for Peter Zijlstra
2013-10-16 20:52             ` [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip() Andi Kleen
2013-10-16 21:03               ` Peter Zijlstra
2013-10-16 23:07                 ` Peter Zijlstra
2013-10-17  9:41                   ` Peter Zijlstra
2013-10-17 16:00                     ` Don Zickus
2013-10-17 16:04                       ` Don Zickus
2013-10-17 16:30                         ` Peter Zijlstra
2013-10-17 18:26                           ` Linus Torvalds
2013-10-17 21:08                             ` Peter Zijlstra
2013-10-17 21:11                               ` Peter Zijlstra
2013-10-17 22:01                             ` Peter Zijlstra
2013-10-17 22:27                               ` Linus Torvalds
2013-10-22 21:12                                 ` Peter Zijlstra
2013-10-23  7:09                                   ` Linus Torvalds
2013-10-23 20:48                                     ` Peter Zijlstra
2013-10-24 10:52                                       ` Peter Zijlstra
2013-10-24 13:47                                         ` Don Zickus
2013-10-24 14:06                                           ` Peter Zijlstra
2013-10-25 16:33                                         ` Don Zickus
2013-10-25 17:03                                           ` Peter Zijlstra
2013-10-26 10:36                                           ` Ingo Molnar
2013-10-28 13:19                                             ` Don Zickus
2013-10-29 14:08                                         ` [tip:perf/core] perf/x86: Further optimize copy_from_user_nmi() tip-bot for Peter Zijlstra
2013-10-23  7:44                                   ` Ingo Molnar [this message]
2013-10-17 14:49             ` [PATCH] perf, x86: Optimize intel_pmu_pebs_fixup_ip() Don Zickus
2013-10-17 14:51               ` Peter Zijlstra
2013-10-17 15:03                 ` Don Zickus
2013-10-17 15:09                   ` Peter Zijlstra
2013-10-17 15:11                     ` Peter Zijlstra
2013-10-17 16:50             ` [tip:perf/core] perf/x86: " tip-bot for Peter Zijlstra
2013-10-15 16:22         ` x86, perf: throttling issues with long nmi latencies Don Zickus
2013-10-15 14:36     ` Don Zickus
2013-10-15 14:39       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131023074447.GB9916@gmail.com \
    --to=mingo@kernel.org \
    --cc=acme@infradead.org \
    --cc=ak@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dzickus@redhat.com \
    --cc=eranian@google.com \
    --cc=jmario@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.