All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Joerg Roedel <joro@8bytes.org>
Cc: Avi Kivity <avi@redhat.com>,
	Robert Richter <robert.richter@amd.com>,
	Benjamin Block <bebl@mageta.org>,
	Hans Rosenfeld <hans.rosenfeld@amd.com>,
	hpa@zytor.com, tglx@linutronix.de, suresh.b.siddha@intel.com,
	eranian@google.com, brgerst@gmail.com, Andreas.Herrmann3@amd.com,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Benjamin Block <benjamin.block@amd.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC 4/5] x86, perf: implements lwp-perf-integration (rc1)
Date: Tue, 20 Dec 2011 19:40:04 +0100	[thread overview]
Message-ID: <20111220184004.GE8408@elte.hu> (raw)
In-Reply-To: <20111220152758.GA30127@8bytes.org>


* Joerg Roedel <joro@8bytes.org> wrote:

> Hi Ingo,
> 
> On Tue, Dec 20, 2011 at 11:09:17AM +0100, Ingo Molnar wrote:
> > >
> > > No, it's worse.  They are ring 3 writeable, and ring 3 
> > > configurable.
> > 
> > Avi, i know that very well.
> 
> So you agree that your ideas presented in this thread of 
> integrating LWP into perf have serious security implications?

No, i do not agree at all - you are drastically misrepresending 
my position.

> > > btw, that means that the intended use case - 
> > > self-monitoring with no kernel support - cannot be done. 
> > > [...]
> > 
> > Arguably many years ago the hardware was designed for 
> > brain-dead instrumentation abstractions.
> 
> The point of LWP design is, that it doesn't require 
> abstractions except for the threshold interrupt.
> 
> I am fine with integrating LWP into perf as long as it makes 
> sense and does not break the intended usage scenario for LWP.

That's the wrong way around - in reality we'll integrate LWP 
upstream only once it makes sense and works well with the 
primary instrumentation abstraction we have in the kernel.

Otherwise my "sorry, it's not convincing enough yet" NAK against 
the new feature stands.

In fact as per Linus's rules about new kernel features, 
maintainers don't even have to justify NAK's by offering an 
implementation roadmap that would make the feature acceptable.

Me or PeterZ could just say "this feature is too limited and not 
convincing enough yet, sorry".

*You* who are pushing the feature have to convince the objecting 
maintainer that the feature is worth integrating.

But i'm being nice and helpful here by giving you a rough 
technical outline of how you could overcome my "sorry, this is 
not convincing in its current form yet" rejection of the current 
LWP patches.

> 	[ Because LWP is a user-space feature and designed as such,
> 	  forcing it into an abstraction makes software that uses LWP
> 	  unportable. ]
> 
> But Ingo, the ideas you presented in this thread are clearly 
> no-gos.

Nonsense.

> Having a shared per-cpu buffer for LWP data that is read by 
> perf obviously has very bad security implications, as Avi 
> already pointed out. [...]

Stop this stupidity already!

There's no "security implications" whatsoever. LWP is a ring-3 
hw feature and it can do nothing that the user-space app could 
not already do ...

> [...] It also destroys the intended use-case for LWP because 
> it disturbs any process that is doing self-profiling with LWP.

Why would it destroy that? Self-profiling can install events 
just fine, the kernel will arbitrate the resource.

The 'intended usecase' is meaningless to me - it was done in 
some closed process apparently not talking to anyone who knows a 
bit about Linux instrumentation. If you want this code upstream 
then you need to convince me that the feature makes sense in the 
general and current scheme of things.

I've outlined the (rather workable) technical roadmap for that.

> > Note that as i said user-space *can* acccess the area if it 
> > thinks it can do it better than the kernel (and we could 
> > export that information in a well defined way - we could do 
> > the same for PEBS as well) - i have no particular strong 
> > feelings about allowing that other than i think it's an 
> > obviously inferior model - *as long* as proper, generic, 
> > usable support is added.
> 
> LWP can't be compared in any serious way with PEBS. The only 
> common thing is the hardware-managed ring-buffer. [...]

Which ring-buffer is actually happens to be one of the main 
things that has to be managed ...

> [...] But PEBS is an addition to MSR based performance 
> monitoring resources (for which a kernel abstraction makes a 
> lot of sense) and can only be controlled from ring 0 while LWP 
> is a complete user-space controlled PMU which has no link at 
> all to the MSR-based, ring 0 controlled PMU.

It's a ring-3 controlled PMU feature, not a user-space PMU 
feature. It *can* be controlled by user-space - but it obviously 
can also (and i argue, it should be) - managed by the kernel, 
under Linux.

The kernel is running ring-3 code as well, and it's managing 
ring-3 accessible resources as well, there's nothing new about 
that.

> > From my perspective there's really just one realistic option 
> > to accept this feature: if it's properly fit into existing, 
> > modern instrumentation abstractions. I made that abundantly 
> > clear in my feedback so far.
> 
> The threshold interrupt fits well into the perf-abstraction 
> layer. Even self-monitoring of processes does, and Hans posted 
> patches from Benjamin for that.  What do you think about this 
> approach?

As as i said it's a promising first step - although the 
discussion here convinced me that it needs to be even more 
feature complete, i don't really see that you guys understand 
how such things should be implemented.

You seem to be dead set on supporting a weird special case 
'intended workload' while forgetting the *much* more common 
profilin workloads we have under Linux.

I don't mind supporting weird stuff as well, but you have to 
keep the common case in mind ...

I'd like to see the ring-buffer and the events managed by the 
kernel too, at least so that perf record works fine with this 
PMU feature.

> > User-space can smash it and make it not profile or profile 
> > the wrong thing or into the wrong buffer - but LWP itself 
> > runs with ring3 privileges so it won't do anything the user 
> > couldnt do already.
> 
> The point is, if user-space re-programs LWP it will continue 
> to write its samples to the new ring-buffer virtual-address 
> set up by user-space. It will still use that virtual address 
> in another address-space after a task-switch. This allows 
> processes to corrupt memory of other processes. [...]

That's nonsense. As i said it my previous mail the LWPC should 
be per task and switched on task switch - just like the DS/PEBS 
context is.

> [...] There are ways to hack around that but these have a 
> serious impact on task-switch costs so this is also no way to 
> go.

We are seeing no problems with this approach under PEBS.

> > Lack of protection against self-misconfiguration-damage is a 
> > benign hardware mis-feature - something for LWP v2 to 
> > specify i guess.
> 
> So what you are saying is (not just here, also in other emails 
> in this thread) that every hardware not designed for perf is 
> crap?

No - PMU hardware designed to not allow the profiling of the 
kernel is obviously a crappy aspect of it. Also, PMU hardware 
that does not allow 100% encapsulation by the kernel is 
obviously not very wisely done either.

Those limitations are not a big problem for usable Linux support 
- and future iterations of the LWP hardware can trivially 
address those shortcomings.

> > get_unmapped_area() + install_special_mapping() is probably 
> > better, but yeah.
> 
> get_unmapped_area() only works on current. [...]

Which is a perfectly fine first step to support the
'perf record' inheritance-tree case - which is a very
common profiling method.

> [...] So it can't be used for that purpose too. [...]

Hey, i wrote bits of get_unmapped_area(), way back. I had code 
on my machine that inserted vmas into other tasks's address 
spaces and can confirm that it works. Do you take my word for it 
that it's possible?

Firstly, the perf record case - which is an important, primary 
workflow - can work with the code as-is just fine.

Secondly, for system-wide profiling vmas can be inserted into 
another task's mm context just fine as well: technically we do 
that all the time, when a threaded program is running.

Inserting a vma into another task's mm where that mm is not ours 
is indeed not typical, but not unprecedented either, UML patches 
did that a couple of years ago. (In fact the upcoming uprobes 
patches are doing something far more intrusive.)

The VM modification is trivial AFAICS: an 'mm' parameter has to 
be added to a new do_mmap() variant, that's all - the code is 
already SMP-safe, due to the threaded case.

Otherwise using another task's mm is safe if you acquire it via 
get_task_mm()/mmput().

[ Sidenote: as a bonus this would put infrastructure in place to 
  have user-space accessible trace buffers, insertable via 
  the LWPINS instruction and recoverable via the regular kernel 
  perf event processing facilities. LWP has more potential than
  just self-profiling, if we use the right abstractions... ]

Thanks,

	Ingo

  reply	other threads:[~2011-12-20 18:42 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-29 12:41 [PATCH 0/9] rework of extended state handling, LWP support Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 1/9] x86, xsave: warn on #NM exceptions caused by the kernel Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 2/9] x86, xsave: cleanup fpu/xsave support Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 3/9] x86, xsave: cleanup fpu/xsave signal frame setup Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 4/9] x86, xsave: rework fpu/xsave support Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 5/9] x86, xsave: remove unused code Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 6/9] x86, xsave: more cleanups Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 7/9] x86, xsave: remove lazy allocation of xstate area Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 8/9] x86, xsave: add support for non-lazy xstates Hans Rosenfeld
2011-11-29 12:41 ` [PATCH 9/9] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP) Hans Rosenfeld
2011-11-29 21:31 ` [PATCH 0/9] rework of extended state handling, LWP support Andi Kleen
2011-11-30 17:37   ` Hans Rosenfeld
2011-11-30 21:52     ` Andi Kleen
2011-12-01 20:36       ` Hans Rosenfeld
2011-12-02  2:01         ` H. Peter Anvin
2011-12-02 11:20           ` Hans Rosenfeld
2011-12-07 19:57             ` Hans Rosenfeld
2011-12-07 20:00               ` [PATCH 7/8] x86, xsave: add support for non-lazy xstates Hans Rosenfeld
2011-12-07 20:00                 ` [PATCH 8/8] x86, xsave: add kernel support for AMDs Lightweight Profiling (LWP) Hans Rosenfeld
2011-12-05 10:22 ` [PATCH 0/9] rework of extended state handling, LWP support Ingo Molnar
2011-12-16 16:07   ` Hans Rosenfeld
2011-12-16 16:12     ` [RFC 1/5] x86, perf: Implement software-activation of lwp Hans Rosenfeld
2011-12-16 16:12       ` [RFC 2/5] perf: adds prototype for a new perf-context-type Hans Rosenfeld
2011-12-16 16:12       ` [RFC 3/5] perf: adds a new pmu-initialization-call Hans Rosenfeld
2011-12-16 16:12       ` [RFC 4/5] x86, perf: implements lwp-perf-integration (rc1) Hans Rosenfeld
2011-12-18  8:04         ` Ingo Molnar
2011-12-18 15:22           ` Benjamin Block
2011-12-18 23:43             ` Ingo Molnar
2011-12-19  9:09               ` Robert Richter
2011-12-19 10:54                 ` Ingo Molnar
2011-12-19 11:12                   ` Avi Kivity
2011-12-19 11:40                     ` Ingo Molnar
2011-12-19 11:58                       ` Avi Kivity
2011-12-19 18:13                         ` Benjamin
2011-12-20  8:56                           ` Ingo Molnar
2011-12-20  9:15                         ` Ingo Molnar
2011-12-20  9:47                           ` Avi Kivity
2011-12-20 10:09                             ` Ingo Molnar
2011-12-20 15:27                               ` Joerg Roedel
2011-12-20 18:40                                 ` Ingo Molnar [this message]
2011-12-21  0:07                                   ` Joerg Roedel
2011-12-21 12:34                                     ` Ingo Molnar
2011-12-21 12:44                                       ` Avi Kivity
2011-12-21 13:22                                         ` Ingo Molnar
2011-12-21 22:49                                           ` Joerg Roedel
2011-12-23 10:53                                             ` Ingo Molnar
2011-12-21 11:46                                   ` Gleb Natapov
2011-12-23 10:56                                     ` Ingo Molnar
2011-12-20 15:48                           ` Vince Weaver
2011-12-20 18:27                             ` Ingo Molnar
2011-12-20 22:47                               ` Vince Weaver
2011-12-21 12:00                                 ` Ingo Molnar
2011-12-21 13:55                                   ` Vince Weaver
2011-12-16 16:12       ` [RFC 5/5] x86, perf: adds support for the LWP threshold-int Hans Rosenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111220184004.GE8408@elte.hu \
    --to=mingo@elte.hu \
    --cc=Andreas.Herrmann3@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=avi@redhat.com \
    --cc=bebl@mageta.org \
    --cc=benjamin.block@amd.com \
    --cc=brgerst@gmail.com \
    --cc=eranian@google.com \
    --cc=hans.rosenfeld@amd.com \
    --cc=hpa@zytor.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robert.richter@amd.com \
    --cc=suresh.b.siddha@intel.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.