All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Shuah Khan <shuahkh@osg.samsung.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>,
	khorenko@virtuozzo.com, X86 ML <x86@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	xemul@virtuozzo.com, linux-kselftest@vger.kernel.org,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
Date: Thu, 21 Apr 2016 12:39:42 -0700	[thread overview]
Message-ID: <CALCETrXXRrDpC3u=4tQj8iTTP1dE44YNKSNTYEPdRWW-E3i3nw@mail.gmail.com> (raw)
In-Reply-To: <20160420190507.GF3408@twins.programming.kicks-ass.net>

On Wed, Apr 20, 2016 at 12:05 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Apr 20, 2016 at 08:40:23AM -0700, Andy Lutomirski wrote:
>> Do LBR, PEBS, and similar report user regs or do they merely want to
>> know the instruction format?  If the latter, I could whip up a tiny
>> function to do just that (like perf_get_regs_user but just for ABI --
>> it would be simpler).
>
> Just the instruction format, nothing else.
>
>> >> Peter, I got lost in the code that calls this.  Are regs coming from
>> >> the overflow interrupt's regs, current_pt_regs(), or
>> >> perf_get_regs_user?
>> >
>> > So get_perf_callchain() will get regs from:
>> >
>> >  - interrupt/NMI regs
>> >  - perf_arch_fetch_caller_regs()
>> >
>> > And when user && !user_mode(), we'll use:
>> >
>> >  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())
>>
>> Could you point me to this bit of the code?
>
> kernel/events/callchain.c:198

But that only applies to the callchain code, right?  AFAICS the PEBS
code is invoked through the x86_pmu NMI handler and always gets the
IRQ regs.  Except for this case:

static inline void intel_pmu_drain_pebs_buffer(void)
{
    struct pt_regs regs;

    x86_pmu.drain_pebs(&regs);
}

which seems a bit confused.

I don't suppose we could arrange to pass something consistent into the
PEBS handlers...

Or is the PEBS code being called from the callchain code somehow?

I haven't dug in to the LBR code much.

>>
>> One call to perf_get_user_regs per interrupt shouldn't be too bad --
>> certainly much better then one per PEBS record.  One call to get user
>> ABI per overflow would be even less bad, but at that point, folding it
>> in to the PEBS code wouldn't be so bad either.
>
> Right; although note that the whole fixup_ip() thing requires a single
> record per interrupt (for we need the LBR state for each record in order
> to rewind).

So do earlier PEBS events not get rewound?  Or so we just program the
thing to only ever give us one event at a time?

>
> Also, HSW+ PEBS doesn't do the fixup anymore.
>
>> If I'm understanding this right (a big, big if), if we get a PEBS
>> overflow while running in user mode, we'll dump out the user regs (and
>> call perf_get_regs_user) and all the PEBS entries (subject to
>> exclude_kernel and with all the decoding magic).  So, in that case, we
>> call perf_get_user_regs.
>
> We only dump user regs if PERF_SAMPLE_REGS_USER, and in case we hit
> userspace userspace with the interrupt we use the interrupt regs; see
> perf_sample_regs_user().
>
>> If we get a PEBS overflow while running in kernel mode, we'll report
>> the kernel regs (if !exclude_kernel) and report the PEBS data as well.
>> If any of those records are in user mode, then, ideally, we'd invoke
>> perf_get_regs_user or similar *once* to get the ABI.  Although, if we
>> can get the user ABI efficiently enough, then maybe we don't care if
>> we call it once per PEBS record.
>
> Right, if we interrupt kernel mode, we'll call perf_get_regs_user() if
> PERF_SAMPLE_REGS_USER (| PERF_SAMPLE_STACK_USER).

But not in get_perf_callchain.  So we'll show the correct user *regs*
but not the current user callchain under some conditions, AFAICS.

>
> The problem here is that the overflow stuff is designed for a single
> 'event' per interrupt, so passing it data for multiple events is
> somewhat icky.

It also seems that there's a certain amount of confusion as to exactly
what "regs" means in various contexts.  Or at least I'm confused by
it.

--Andy

  reply	other threads:[~2016-04-21 19:40 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-06 16:29 [PATCH 0/2] x86: add arch_prctl to switch between native/compat modes Dmitry Safonov
2016-04-06 16:29 ` [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode Dmitry Safonov
2016-04-06 18:04   ` Andy Lutomirski
2016-04-06 18:49     ` Andy Lutomirski
2016-04-07 12:11     ` Dmitry Safonov
2016-04-07 12:21       ` Cyrill Gorcunov
2016-04-07 12:35         ` Dmitry Safonov
2016-04-07 14:39       ` Andy Lutomirski
2016-04-07 15:18         ` Dmitry Safonov
2016-04-08 13:50         ` Dmitry Safonov
2016-04-08 15:56           ` Andy Lutomirski
2016-04-08 16:18             ` Dmitry Safonov
2016-04-08 20:44               ` Andy Lutomirski
2016-04-09  8:06                 ` Dmitry Safonov
2016-04-13 16:55                 ` Dmitry Safonov
2016-04-14 18:27                   ` Andy Lutomirski
2016-04-20 11:04                     ` Peter Zijlstra
2016-04-20 15:40                       ` Andy Lutomirski
2016-04-20 19:05                         ` Peter Zijlstra
2016-04-21 19:39                           ` Andy Lutomirski [this message]
2016-04-21 20:12                             ` Peter Zijlstra
2016-04-21 23:27                               ` Andy Lutomirski
2016-04-21 23:46                                 ` Andy Lutomirski
2016-04-25 15:16                                 ` Peter Zijlstra
2016-04-25 16:50                                   ` Andy Lutomirski
2016-04-06 16:29 ` [PATCH 2/2] x86/tools/testing: add test for ARCH_SET_COMPAT Dmitry Safonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrXXRrDpC3u=4tQj8iTTP1dE44YNKSNTYEPdRWW-E3i3nw@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dsafonov@virtuozzo.com \
    --cc=gorcunov@openvz.org \
    --cc=hpa@zytor.com \
    --cc=khorenko@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=shuahkh@osg.samsung.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.