From: Andy Lutomirski <luto@amacapital.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Shuah Khan <shuahkh@osg.samsung.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>,
	khorenko@virtuozzo.com, X86 ML <x86@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	xemul@virtuozzo.com, linux-kselftest@vger.kernel.org,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
Date: Wed, 20 Apr 2016 08:40:23 -0700
Message-ID: <CALCETrUxiT5Y9GMf0pRFXoh2wMACMdGbbBeT2zoy38idS-fC5g@mail.gmail.com>
In-Reply-To: <20160420110402.GY3408@twins.programming.kicks-ass.net>

On Wed, Apr 20, 2016 at 4:04 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Apr 14, 2016 at 11:27:35AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 13, 2016 at 9:55 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> > On 04/08/2016 11:44 PM, Andy Lutomirski wrote:
>> >>
>> >> Feel free to ask for help on some of these details.  user_64bit_mode
>> >> will be helpful too.
>> >
>> > Hello again,
>> >
>> > here are some questions on TIF_IA32 removal:
>> > - in function intel_pmu_pebs_fixup_ip: we need to know
>> > whether the process was in native or compat mode so that the
>> > instruction interpreter can do the IP + one instruction fixup.
>> > There are registers, but they are from PEBS, which does not
>> > contain segment descriptors (even for PEBSv3). Other values
>> > are from interrupt regs (look at setup_pebs_sample_data).
>> > So, I guess, we may use user_64bit_mode on the interrupt
>> > register set, which will be racy with changing the task's mode,
>> > but quite ok?
>>
>> Here's my understanding:
>>
>> We don't actually know the mode, and there's no way we could get it
>> exactly.  User code could have changed the mode between when the PEBS
>> event was written and when we got the interrupt, and there's no way
>> for us to tell.
>>
>> The regs passed to the interrupt aren't particularly helpful -- if we
>> get the overflow event from kernel mode, the regs will be kernel regs,
>> not user regs.
>>
>> What we can do is to use the regs returned by perf_get_regs_user,
>> which I imagine perf is already doing.  Peter, is this the case?
>
> *confused*, how is perf_get_regs_user() connected to the PEBS fixup?
>
> Ah, you want to use perf_get_regs_user() instead of task_pt_regs()
> because of how an NMI during interrupt entry would mess up the
> task_pt_regs() contents.
>
> At that point you can use regs_user->abi, right?

Yes, exactly.

Do LBR, PEBS, and similar report user regs or do they merely want to
know the instruction format?  If the latter, I could whip up a tiny
function to do just that (like perf_get_regs_user but just for ABI --
it would be simpler).
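
Purely as an illustration of that idea, here is a userspace-compilable
sketch of what an ABI-only helper might look like.  Everything in it is
hypothetical: the struct, the enum, and the function names are
stand-ins invented for this example, not kernel definitions.  The
selector values are assumed to match x86-64's __USER_CS and
__USER32_CS, and a real version would work on whatever regs
perf_get_regs_user would have returned rather than on a mock struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; not the kernel's real types. */
enum sketch_abi { SKETCH_ABI_NONE, SKETCH_ABI_32, SKETCH_ABI_64 };

struct sketch_regs {
	unsigned long ip;
	unsigned long cs;
};

/* Assumed to mirror x86-64's __USER_CS and __USER32_CS selectors. */
#define SKETCH_USER_CS   0x33UL
#define SKETCH_USER32_CS 0x23UL

/* User mode means the code segment selector requests privilege level 3. */
static int sketch_user_mode(const struct sketch_regs *regs)
{
	return (regs->cs & 3) == 3;
}

/*
 * Report only the instruction format of the user context, without
 * copying a whole register set.  A NULL pointer models a task that has
 * no user regs at all (for example a kernel thread).
 */
static enum sketch_abi sketch_user_abi(const struct sketch_regs *user_regs)
{
	if (user_regs == NULL)
		return SKETCH_ABI_NONE;

	/* The caller is expected to pass user regs, never kernel regs. */
	assert(sketch_user_mode(user_regs));

	return user_regs->cs == SKETCH_USER_CS ? SKETCH_ABI_64 : SKETCH_ABI_32;
}
```

The point of the sketch is only that the answer depends on a single
field, which is why an ABI-only variant could be simpler than
fetching the full user regs.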

[merging some emails]

>> Peter, I got lost in the code that calls this.  Are regs coming from
>> the overflow interrupt's regs, current_pt_regs(), or
>> perf_get_regs_user?
>
> So get_perf_callchain() will get regs from:
>
>  - interrupt/NMI regs
>  - perf_arch_fetch_caller_regs()
>
> And when user && !user_mode(), we'll use:
>
>  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())

Could you point me to this bit of the code?

>
> to call perf_callchain_user(), which then ends up calling
> perf_callchain_user32() which is expected to NO-OP for 64bit userspace.
>
>> If it's the perf_get_regs_user, then this should be okay, but passing
>> in the ABI field directly would be even nicer.  If they're coming from
>> the overflow interrupt's regs or current_pt_regs(), could we change
>> that?
>>
>> It might also be nice to make sure that we call perf_get_regs_user
>> exactly once per overflow interrupt -- i.e. we could push it into the
>> main code rather than the regs sampling code.
>
> The risk there is that we might not need the user regs at all to handle
> the overflow thingy, so doing it unconditionally would be unwanted.

One call to perf_get_regs_user per interrupt shouldn't be too bad --
certainly much better than one per PEBS record.  One call to get user
ABI per overflow would be even less bad, but at that point, folding it
into the PEBS code wouldn't be so bad either.

If I'm understanding this right (a big, big if), if we get a PEBS
overflow while running in user mode, we'll dump out the user regs (and
call perf_get_regs_user) and all the PEBS entries (subject to
exclude_kernel and with all the decoding magic).  So, in that case, we
call perf_get_regs_user.

If we get a PEBS overflow while running in kernel mode, we'll report
the kernel regs (if !exclude_kernel) and report the PEBS data as well.
If any of those records are in user mode, then, ideally, we'd invoke
perf_get_regs_user or similar *once* to get the ABI.  Although, if we
can get the user ABI efficiently enough, then maybe we don't care if
we call it once per PEBS record.
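
To make the "once per overflow" idea concrete, here is a small
userspace sketch.  It is not the perf code: the record struct, the
lookup function, and the kernel-address test are all hypothetical
stand-ins, and the lookup simply returns a fixed value while counting
how often it is called.  The only thing it shows is the lazy, cached
lookup: nothing is fetched if every record is a kernel one, and at
most one fetch happens no matter how many user records are drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the kernel's real types. */
enum sketch_abi { SKETCH_ABI_NONE, SKETCH_ABI_32, SKETCH_ABI_64 };

struct sketch_pebs_record {
	uint64_t ip;
};

/* Counts how many times the (pretend) expensive lookup ran. */
static unsigned int sketch_abi_lookups;

/* Stand-in for "perf_get_regs_user or similar"; always reports 64-bit. */
static enum sketch_abi sketch_lookup_user_abi(void)
{
	sketch_abi_lookups++;
	return SKETCH_ABI_64;
}

/* Assumption for the sketch: upper-half addresses belong to the kernel. */
static int sketch_is_kernel_ip(uint64_t ip)
{
	return (ip >> 63) != 0;
}

/*
 * Walk all records from one overflow.  The user ABI is looked up lazily
 * on the first user-mode record and reused for the rest.  Returns the
 * number of user-mode records; abi_out[i] receives the ABI used for
 * record i (SKETCH_ABI_NONE for kernel records).
 */
static size_t sketch_drain_pebs(const struct sketch_pebs_record *recs,
				size_t n, enum sketch_abi *abi_out)
{
	enum sketch_abi cached = SKETCH_ABI_NONE;
	int have_abi = 0;
	size_t user_records = 0;

	assert(recs != NULL && abi_out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (sketch_is_kernel_ip(recs[i].ip)) {
			abi_out[i] = SKETCH_ABI_NONE;
			continue;
		}
		if (!have_abi) {
			cached = sketch_lookup_user_abi();
			have_abi = 1;
		}
		abi_out[i] = cached;
		user_records++;
	}
	return user_records;
}
```

If the lookup turns out to be cheap enough, the caching could be
dropped and the lookup done per record, as noted above.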

On x86, the only weird cases are NMIs or MCEs that land in the
syscall, syscall32, and sysenter prologues (easy to handle fully
correctly if we care because the IP that we interrupted tells us the
ABI) and the bullshit SYSENTER+TF thing.  Even the latter isn't so
hard to get right.

--Andy
