All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Shuah Khan <shuahkh@osg.samsung.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>,
	khorenko@virtuozzo.com, X86 ML <x86@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	xemul@virtuozzo.com, linux-kselftest@vger.kernel.org,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
Date: Mon, 25 Apr 2016 17:16:38 +0200	[thread overview]
Message-ID: <20160425151638.GL3430@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CALCETrXXe3Rc5UQPxZyD==EV1vBZ+ei3qtqhkTMTyuXnF9Jazw@mail.gmail.com>

On Thu, Apr 21, 2016 at 04:27:19PM -0700, Andy Lutomirski wrote:
> > Did that help? Or did I confuse you moar?
> >
> 
> I think I'm starting to get it.  What if we rearrange slightly, like this:
> 
> perf_sample_data already has a struct perf_regs in it.  We could add a
> flags field to the first chunk of perf_sample_data:
> 
> u64 sample_flags;

I actually considered that for another problem. Didn't like it then, but
seeing how I still haven't figured out a better way and you're now
proposing this too, maybe...

Part of the problem is that this will completely exhaust that first
cacheline :/

> perf_sample_data_init sets sample_flags to zero.

And while we're on struct perf_sample_data, that thing has gotten
insanely large. We carry it on-stack!

It should be fairly easy to take regs_user_copy out and use a per-cpu
array of them things for this I think, see below.

> Now we rename perf_sample_regs_user to __perf_sample_regs_user and
> make it non-static.  We also teach it to set do data->sample_flags |=
> PERF_SAMPLE_FLAGS_HAS_REGS_USER.  We add:
> 
> static void perf_fetch_regs_user(struct perf_sample_data *data, struct
> pt_regs *interrupt_regs)
> {
>   if (data->sample_flags & PERF_SAMPLE_FLAGS_HAS_REGS_USER)
>     return;
> 
>   __perf_sample_regs_user(&data->regs_user, interrupt_regs,
> &data->regs_user_copy);
> }

I meant to change perf_prepare_sample() to do:

	u64 sample_type = event->attr.sample_type & ~data.sample_type;

or something similar, such that we can override/avoid some of the work
there.

> (Hmm.  This only really works well if we can guarantee that
> interrupt_regs remains valid for the life of the perf_sample_data
> object.  Could we perhaps move the interrupt_regs pointer *into*
> perf_sample_data and stop passing it all over the place?)

So the problem with that is that we'll now overflow the one cacheline,
and the last time I really looked at this that made samples that much
slower.

It might be time to re-evaluate this stuff, since pretty much everything
will eventually write into perf_sample_data::ip etc.. which is the
second line anyway.

Also, looking at it, we actually have a pointer in there for this,
perf_sample_data::regs_intr::regs, but its at the very tail of this
monster, 4 cachelines off the normal path.

> We change all the callers of perf_sample_regs_user to use
> perf_fetch_regs_user instead.

There's only the one site currently, but yeah.

> What do you think?  If you like it, I can probably find some time to
> give it a shot, but I don't guarantee that I won't miss some subtlety
> in its interaction with the rest of the event output code.

Sure give it a go, I'll stomp on it to fix the pebs-time issue (we need
to skip perf_prepare_sample's PERF_SAMPLE_TIME branch for that).

> On a vaguely related note, why is the big prebs-to-pt_regs copy
> conditional on (sample_type & PERF_SAMPLE_REGS_INTR)?  I bet it would
> be faster to make it unconditional, because you could avoid copying
> over the entire pt_regs struct if PERF_SAMPLE_REGS_INTR isn't set.

Hmm, yes.. that code did move about a bit, not sure what it looked like
originally.

In any case, That fully copy is overkill in the simple case as well, I
think that could get away with only copying cs,flags.


Compile tested only..

---
Subject: perf: Replace perf_sample_data::regs_user_copy with per-cpu storage

struct perf_sample_data is immense, and we carry it on stack, shrink it
some.

struct perf_sample_data {
        /* size: 384, cachelines: 6, members: 19 */
}

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h |  2 --
 kernel/events/core.c       | 23 +++++++++++++++++------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 85749ae8cb5f..dd2cab6c5bbb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -795,8 +795,6 @@ struct perf_sample_data {
 	 * on arch details.
 	 */
 	struct perf_regs		regs_user;
-	struct pt_regs			regs_user_copy;
-
 	struct perf_regs		regs_intr;
 	u64				stack_user_size;
 } ____cacheline_aligned;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index eabeb2aec00f..72754607d2cd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5146,15 +5146,27 @@ perf_output_sample_regs(struct perf_output_handle *handle,
 	}
 }
 
-static void perf_sample_regs_user(struct perf_regs *regs_user,
-				  struct pt_regs *regs,
-				  struct pt_regs *regs_user_copy)
+static DEFINE_PER_CPU(struct pt_regs, __regs_user[4]);
+
+static struct pt_regs *regs_user_ptr(void)
+{
+	if (in_nmi())
+		return this_cpu_ptr(&__regs_user[0]);
+	if (in_interrupt())
+		return this_cpu_ptr(&__regs_user[1]);
+	if (in_serving_softirq())
+		return this_cpu_ptr(&__regs_user[2]);
+	return this_cpu_ptr(&__regs_user[3]);
+}
+
+static void
+perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
 		regs_user->abi = perf_reg_abi(current);
 		regs_user->regs = regs;
 	} else if (current->mm) {
-		perf_get_regs_user(regs_user, regs, regs_user_copy);
+		perf_get_regs_user(regs_user, regs, regs_user_ptr());
 	} else {
 		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
 		regs_user->regs = NULL;
@@ -5638,8 +5650,7 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 
 	if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER))
-		perf_sample_regs_user(&data->regs_user, regs,
-				      &data->regs_user_copy);
+		perf_sample_regs_user(&data->regs_user, regs);
 
 	if (sample_type & PERF_SAMPLE_REGS_USER) {
 		/* regs dump ABI info */

  parent reply	other threads:[~2016-04-25 15:17 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-06 16:29 [PATCH 0/2] x86: add arch_prctl to switch between native/compat modes Dmitry Safonov
2016-04-06 16:29 ` [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode Dmitry Safonov
2016-04-06 18:04   ` Andy Lutomirski
2016-04-06 18:49     ` Andy Lutomirski
2016-04-07 12:11     ` Dmitry Safonov
2016-04-07 12:21       ` Cyrill Gorcunov
2016-04-07 12:35         ` Dmitry Safonov
2016-04-07 14:39       ` Andy Lutomirski
2016-04-07 15:18         ` Dmitry Safonov
2016-04-08 13:50         ` Dmitry Safonov
2016-04-08 15:56           ` Andy Lutomirski
2016-04-08 16:18             ` Dmitry Safonov
2016-04-08 20:44               ` Andy Lutomirski
2016-04-09  8:06                 ` Dmitry Safonov
2016-04-13 16:55                 ` Dmitry Safonov
2016-04-14 18:27                   ` Andy Lutomirski
2016-04-20 11:04                     ` Peter Zijlstra
2016-04-20 15:40                       ` Andy Lutomirski
2016-04-20 19:05                         ` Peter Zijlstra
2016-04-21 19:39                           ` Andy Lutomirski
2016-04-21 20:12                             ` Peter Zijlstra
2016-04-21 23:27                               ` Andy Lutomirski
2016-04-21 23:46                                 ` Andy Lutomirski
2016-04-25 15:16                                 ` Peter Zijlstra [this message]
2016-04-25 16:50                                   ` Andy Lutomirski
2016-04-06 16:29 ` [PATCH 2/2] x86/tools/testing: add test for ARCH_SET_COMPAT Dmitry Safonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160425151638.GL3430@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dsafonov@virtuozzo.com \
    --cc=gorcunov@openvz.org \
    --cc=hpa@zytor.com \
    --cc=khorenko@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=shuahkh@osg.samsung.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.