linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
       [not found] ` <7DF88F22-0310-40C9-9DA6-5EBCB4877933@amacapital.net>
@ 2020-08-24 21:10   ` Andy Lutomirski
  2020-08-24 23:52     ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-24 21:10 UTC (permalink / raw)
  To: Robert O'Callahan
  Cc: Bae, Chang Seok, Kyle Huey, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Sat, Aug 22, 2020 at 6:19 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
> We can give you a new ptrace operation to load the selector and deduce the base from the descriptor table if it would help.

Concretely, we could add one of these:

PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.

PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.

PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
You would use this to populate the base fields.

or perhaps a ptrace SETREGS variant that tries to preserve the old
base semantics and magically sets the bases to match the selectors if
the selectors are nonzero.

Do any of these choices sound preferable to any of you?

--Andhy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-24 21:10   ` [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system Andy Lutomirski
@ 2020-08-24 23:52     ` H. Peter Anvin
  2020-08-25  0:30       ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2020-08-24 23:52 UTC (permalink / raw)
  To: Andy Lutomirski, Robert O'Callahan
  Cc: Bae, Chang Seok, Kyle Huey, Thomas Gleixner, Ingo Molnar,
	Andi Kleen, Shankar, Ravi V, LKML, Hansen, Dave

On 2020-08-24 14:10, Andy Lutomirski wrote:
> 
> PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
> 
> PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.
> 
> PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
> You would use this to populate the base fields.
> 
> or perhaps a ptrace SETREGS variant that tries to preserve the old
> base semantics and magically sets the bases to match the selectors if
> the selectors are nonzero.
> 
> Do any of these choices sound preferable to any of you?
> 

My suggestion would be to export the GDT and LDT as a (readonly or mostly
readonly) regset(s) rather than adding entirely new operations. We could allow
the LDT and the per-thread GDT entries to be written, subject to the same
limitations as the corresponding system calls.

	-hpa


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-24 23:52     ` H. Peter Anvin
@ 2020-08-25  0:30       ` Andy Lutomirski
  2020-08-25  0:46         ` Kyle Huey
  2020-08-25 15:13         ` hpa
  0 siblings, 2 replies; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-25  0:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Robert O'Callahan, Bae, Chang Seok,
	Kyle Huey, Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar,
	Ravi V, LKML, Hansen, Dave

On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 2020-08-24 14:10, Andy Lutomirski wrote:
> >
> > PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
> >
> > PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.
> >
> > PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
> > You would use this to populate the base fields.
> >
> > or perhaps a ptrace SETREGS variant that tries to preserve the old
> > base semantics and magically sets the bases to match the selectors if
> > the selectors are nonzero.
> >
> > Do any of these choices sound preferable to any of you?
> >
>
> My suggestion would be to export the GDT and LDT as a (readonly or mostly
> readonly) regset(s) rather than adding entirely new operations. We could allow
> the LDT and the per-thread GDT entries to be written, subject to the same
> limitations as the corresponding system calls.
>

That seems useful, although we'd want to do some extensive
sanitization of the GDT.  But maybe it's obnoxious to ask Kyle and
Robert to parse the GDT, LDT, and selector just to emulate the
demented pre-5.9 ptrace() behavior.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25  0:30       ` Andy Lutomirski
@ 2020-08-25  0:46         ` Kyle Huey
  2020-08-25 16:12           ` Andy Lutomirski
  2020-08-25 15:13         ` hpa
  1 sibling, 1 reply; 18+ messages in thread
From: Kyle Huey @ 2020-08-25  0:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Robert O'Callahan, Bae, Chang Seok,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Mon, Aug 24, 2020 at 5:31 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > On 2020-08-24 14:10, Andy Lutomirski wrote:
> > >
> > > PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
> > >
> > > PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.
> > >
> > > PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
> > > You would use this to populate the base fields.
> > >
> > > or perhaps a ptrace SETREGS variant that tries to preserve the old
> > > base semantics and magically sets the bases to match the selectors if
> > > the selectors are nonzero.
> > >
> > > Do any of these choices sound preferable to any of you?
> > >
> >
> > My suggestion would be to export the GDT and LDT as a (readonly or mostly
> > readonly) regset(s) rather than adding entirely new operations. We could allow
> > the LDT and the per-thread GDT entries to be written, subject to the same
> > limitations as the corresponding system calls.
> >
>
> That seems useful, although we'd want to do some extensive
> sanitization of the GDT.  But maybe it's obnoxious to ask Kyle and
> Robert to parse the GDT, LDT, and selector just to emulate the
> demented pre-5.9 ptrace() behavior.
>
> --Andy

We've already addressed the main issue on rr's side[0]. The only
outstanding issue is that if you record a trace with 32 bit programs
on a pre-5.9 64 bit kernel and then try to replay it on 5.9 it won't
work. If you hit this case rr will print an error message telling you
to boot your 5.9 kernel with nofsgsbase if you want to replay the
trace. I think that's probably sufficient. 32 bit is legacy stuff we
don't care that much about anyways, replaying traces on a different
kernel/machine has always been a bit dicey, and if you absolutely must
do it there is a workaround. I'm not inclined to do much work to
support the narrow remaining case.

- Kyle

[0] Namely, we're tracking fs/gsbase for 32 bit tracees on 64 bit
kernels where the fs/gsbase instructions work in new recordings now:
https://github.com/mozilla/rr/commit/c3292c75dbd8c9ce5256496108965c0442424eef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25  0:30       ` Andy Lutomirski
  2020-08-25  0:46         ` Kyle Huey
@ 2020-08-25 15:13         ` hpa
  1 sibling, 0 replies; 18+ messages in thread
From: hpa @ 2020-08-25 15:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Robert O'Callahan, Bae, Chang Seok, Kyle Huey,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On August 24, 2020 5:30:56 PM PDT, Andy Lutomirski <luto@kernel.org> wrote:
>On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> On 2020-08-24 14:10, Andy Lutomirski wrote:
>> >
>> > PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
>> >
>> > PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base
>accordingly.
>> >
>> > PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base
>out.
>> > You would use this to populate the base fields.
>> >
>> > or perhaps a ptrace SETREGS variant that tries to preserve the old
>> > base semantics and magically sets the bases to match the selectors
>if
>> > the selectors are nonzero.
>> >
>> > Do any of these choices sound preferable to any of you?
>> >
>>
>> My suggestion would be to export the GDT and LDT as a (readonly or
>mostly
>> readonly) regset(s) rather than adding entirely new operations. We
>could allow
>> the LDT and the per-thread GDT entries to be written, subject to the
>same
>> limitations as the corresponding system calls.
>>
>
>That seems useful, although we'd want to do some extensive
>sanitization of the GDT.  But maybe it's obnoxious to ask Kyle and
>Robert to parse the GDT, LDT, and selector just to emulate the
>demented pre-5.9 ptrace() behavior.
>
>--Andy

We only want to allow the same access that user space gets, that's exactly the sanitization we need.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25  0:46         ` Kyle Huey
@ 2020-08-25 16:12           ` Andy Lutomirski
  2020-08-25 16:32             ` Kyle Huey
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-25 16:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Andy Lutomirski, H. Peter Anvin, Robert O'Callahan, Bae,
	Chang Seok, Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar,
	Ravi V, LKML, Hansen, Dave

> On Aug 24, 2020, at 5:46 PM, Kyle Huey <me@kylehuey.com> wrote:
>
> On Mon, Aug 24, 2020 at 5:31 PM Andy Lutomirski <luto@kernel.org> wrote:
>>
>>> On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> On 2020-08-24 14:10, Andy Lutomirski wrote:
>>>>
>>>> PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
>>>>
>>>> PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base accordingly.
>>>>
>>>> PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base out.
>>>> You would use this to populate the base fields.
>>>>
>>>> or perhaps a ptrace SETREGS variant that tries to preserve the old
>>>> base semantics and magically sets the bases to match the selectors if
>>>> the selectors are nonzero.
>>>>
>>>> Do any of these choices sound preferable to any of you?
>>>>
>>>
>>> My suggestion would be to export the GDT and LDT as a (readonly or mostly
>>> readonly) regset(s) rather than adding entirely new operations. We could allow
>>> the LDT and the per-thread GDT entries to be written, subject to the same
>>> limitations as the corresponding system calls.
>>>
>>
>> That seems useful, although we'd want to do some extensive
>> sanitization of the GDT.  But maybe it's obnoxious to ask Kyle and
>> Robert to parse the GDT, LDT, and selector just to emulate the
>> demented pre-5.9 ptrace() behavior.
>>
>> --Andy
>
> We've already addressed the main issue on rr's side[0]. The only
> outstanding issue is that if you record a trace with 32 bit programs
> on a pre-5.9 64 bit kernel and then try to replay it on 5.9 it won't
> work. If you hit this case rr will print an error message telling you
> to boot your 5.9 kernel with nofsgsbase if you want to replay the
> trace. I think that's probably sufficient. 32 bit is legacy stuff we
> don't care that much about anyways, replaying traces on a different
> kernel/machine has always been a bit dicey, and if you absolutely must
> do it there is a workaround. I'm not inclined to do much work to
> support the narrow remaining case.
>
> - Kyle
>
> [0] Namely, we're tracking fs/gsbase for 32 bit tracees on 64 bit
> kernels where the fs/gsbase instructions work in new recordings now:
> https://github.com/mozilla/rr/commit/c3292c75dbd8c9ce5256496108965c0442424eef


I don’t like this at all. Your behavior really shouldn’t depend on
whether the new instructions are available.  Also, some day I would
like to change Linux to have the new behavior even if FSGSBASE
instructions are not available, and this will break rr again.  (The
current !FSGSBASE behavior is an ugly optimization of dubious value.
I would not go so far as to describe it as correct.)

I would suggest you do one of the following things:

1. Use int $0x80 directly to load 32-bit regs into a child.  This
might dramatically simplify your code and should just do the right
thing.

2. Something like your patch but make it unconditional.

3. Ask for, and receive, real kernel support for setting FS and GS in
the way that 32-bit code expects.

Also, for other x86 kernel folks playing along, WTF is
task_user_regset_vew() about?  It has eight callers, in two groups,
and as far as I can tell every single call returns &user_x86_64_view
because it's only called in the 64-bit and x32 syscall paths, and
those are only reachable using the 64-bit SYSCALL instruction.  I
suppose the exception is if someone ptraces the ptracer and changes CS
at syscall entry.  In any case, if task_user_regset_view() ever
returns anything else, the code will malfunction.  I'll send a patch
to get rid of it.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 16:12           ` Andy Lutomirski
@ 2020-08-25 16:32             ` Kyle Huey
  2020-08-25 16:46               ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Huey @ 2020-08-25 16:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, H. Peter Anvin, Robert O'Callahan, Bae,
	Chang Seok, Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar,
	Ravi V, LKML, Hansen, Dave

On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> I don’t like this at all. Your behavior really shouldn’t depend on
> whether the new instructions are available.  Also, some day I would
> like to change Linux to have the new behavior even if FSGSBASE
> instructions are not available, and this will break rr again.  (The
> current !FSGSBASE behavior is an ugly optimization of dubious value.
> I would not go so far as to describe it as correct.)

Ok.

> I would suggest you do one of the following things:
>
> 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> might dramatically simplify your code and should just do the right
> thing.

I don't know what that means.

> 2. Something like your patch but make it unconditional.
>
> 3. Ask for, and receive, real kernel support for setting FS and GS in
> the way that 32-bit code expects.

I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
like operation that operates on the regsets according to the
*tracee*'s bitness (rather than the tracer, as it works currently).
Does that sound workable?

- Kyle

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 16:32             ` Kyle Huey
@ 2020-08-25 16:46               ` Andy Lutomirski
  2020-08-25 17:31                 ` Kyle Huey
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-25 16:46 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Andy Lutomirski, H. Peter Anvin, Robert O'Callahan, Bae,
	Chang Seok, Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar,
	Ravi V, LKML, Hansen, Dave

On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@kylehuey.com> wrote:
>
> On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > I don’t like this at all. Your behavior really shouldn’t depend on
> > whether the new instructions are available.  Also, some day I would
> > like to change Linux to have the new behavior even if FSGSBASE
> > instructions are not available, and this will break rr again.  (The
> > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > I would not go so far as to describe it as correct.)
>
> Ok.
>
> > I would suggest you do one of the following things:
> >
> > 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> > might dramatically simplify your code and should just do the right
> > thing.
>
> I don't know what that means.

This is untested, but what I mean is:

static int ptrace32(int req, pid_t pid, int addr, int data) {
   int ret;
   /* new enough kernels won't clobber r8, etc. */
   asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
(req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
"r11");
   return ret;
}

with a handful of caveats:

 - This won't compile with -fPIC, I think.  Instead you'll need to
write a little bit of asm to set up and restore ebx yourself.  gcc is
silly like this.

 - Note that addr is an int.  You'll need to mmap(..., MAP_32BIT, ...)
to get a buffer that can be pointed to with an int.

The advantage is that this should work on all kernels that support
32-bit mode at all.

>
> > 2. Something like your patch but make it unconditional.
> >
> > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > the way that 32-bit code expects.
>
> I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> like operation that operates on the regsets according to the
> *tracee*'s bitness (rather than the tracer, as it works currently).
> Does that sound workable?
>

Strictly speaking, on Linux, there is no unified concept of a task's
bitness, so "set all these registers according to the target's
bitness" is not well defined.  We could easily give you a
PTRACE_SETREGS_X86_32, etc, though.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 16:46               ` Andy Lutomirski
@ 2020-08-25 17:31                 ` Kyle Huey
  2020-08-25 18:50                   ` Kyle Huey
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Huey @ 2020-08-25 17:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Robert O'Callahan, Bae, Chang Seok,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > > I don’t like this at all. Your behavior really shouldn’t depend on
> > > whether the new instructions are available.  Also, some day I would
> > > like to change Linux to have the new behavior even if FSGSBASE
> > > instructions are not available, and this will break rr again.  (The
> > > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > > I would not go so far as to describe it as correct.)
> >
> > Ok.
> >
> > > I would suggest you do one of the following things:
> > >
> > > 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> > > might dramatically simplify your code and should just do the right
> > > thing.
> >
> > I don't know what that means.
>
> This is untested, but what I mean is:
>
> static int ptrace32(int req, pid_t pid, int addr, int data) {
>    int ret;
>    /* new enough kernels won't clobber r8, etc. */
>    asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
> (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
> "r11");
>    return ret;
> }
>
> with a handful of caveats:
>
>  - This won't compile with -fPIC, I think.  Instead you'll need to
> write a little bit of asm to set up and restore ebx yourself.  gcc is
> silly like this.
>
>  - Note that addr is an int.  You'll need to mmap(..., MAP_32BIT, ...)
> to get a buffer that can be pointed to with an int.
>
> The advantage is that this should work on all kernels that support
> 32-bit mode at all.
>
> >
> > > 2. Something like your patch but make it unconditional.
> > >
> > > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > > the way that 32-bit code expects.
> >
> > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> > like operation that operates on the regsets according to the
> > *tracee*'s bitness (rather than the tracer, as it works currently).
> > Does that sound workable?
> >
>
> Strictly speaking, on Linux, there is no unified concept of a task's
> bitness, so "set all these registers according to the target's
> bitness" is not well defined.  We could easily give you a
> PTRACE_SETREGS_X86_32, etc, though.

In the process of responding to this I spent some time doing code
inspection and discovered a subtlety in the ptrace API that I was
previously unaware of. PTRACE_GET/SETREGS use the regset views
corresponding to the tracer but PTRACE_GET/SETREGSET use the regset
views corresponding to the tracee. This means it is possible for us
today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee
combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of
PTRACE_SETREGS.

- Kyle

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 17:31                 ` Kyle Huey
@ 2020-08-25 18:50                   ` Kyle Huey
  2020-08-25 19:32                     ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Huey @ 2020-08-25 18:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Robert O'Callahan, Bae, Chang Seok,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey <me@kylehuey.com> wrote:
>
> On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@kylehuey.com> wrote:
> > >
> > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > > > I don’t like this at all. Your behavior really shouldn’t depend on
> > > > whether the new instructions are available.  Also, some day I would
> > > > like to change Linux to have the new behavior even if FSGSBASE
> > > > instructions are not available, and this will break rr again.  (The
> > > > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > > > I would not go so far as to describe it as correct.)
> > >
> > > Ok.
> > >
> > > > I would suggest you do one of the following things:
> > > >
> > > > 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> > > > might dramatically simplify your code and should just do the right
> > > > thing.
> > >
> > > I don't know what that means.
> >
> > This is untested, but what I mean is:
> >
> > static int ptrace32(int req, pid_t pid, int addr, int data) {
> >    int ret;
> >    /* new enough kernels won't clobber r8, etc. */
> >    asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
> > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
> > "r11");
> >    return ret;
> > }
> >
> > with a handful of caveats:
> >
> >  - This won't compile with -fPIC, I think.  Instead you'll need to
> > write a little bit of asm to set up and restore ebx yourself.  gcc is
> > silly like this.
> >
> >  - Note that addr is an int.  You'll need to mmap(..., MAP_32BIT, ...)
> > to get a buffer that can be pointed to with an int.
> >
> > The advantage is that this should work on all kernels that support
> > 32-bit mode at all.
> >
> > >
> > > > 2. Something like your patch but make it unconditional.
> > > >
> > > > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > > > the way that 32-bit code expects.
> > >
> > > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> > > like operation that operates on the regsets according to the
> > > *tracee*'s bitness (rather than the tracer, as it works currently).
> > > Does that sound workable?
> > >
> >
> > Strictly speaking, on Linux, there is no unified concept of a task's
> > bitness, so "set all these registers according to the target's
> > bitness" is not well defined.  We could easily give you a
> > PTRACE_SETREGS_X86_32, etc, though.
>
> In the process of responding to this I spent some time doing code
> inspection and discovered a subtlety in the ptrace API that I was
> previously unaware of. PTRACE_GET/SETREGS use the regset views
> corresponding to the tracer but PTRACE_GET/SETREGSET use the regset
> views corresponding to the tracee. This means it is possible for us
> today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee
> combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of
> PTRACE_SETREGS.

Alright I reverted the previous changes and switched us to use
PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again.
I assume this behavior will remain unchanged indefinitely even when
the fs/gsbase manipulation instructions are not available since the 32
bit user_regs_struct can't grow?

- Kyle

[0] https://github.com/mozilla/rr/commit/5c12d5f9ab77e526f852cbca82f454a42e3a6e30#diff-b509a7939392c11bb3c517b00da4526fL1447
(most of the rest of this commit is fixing our *emulation* of
PTRACE_GETREGSET/PTRACE_SETREGSET which was not respecting the
tracee's 32 vs 64 bit stauts).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 18:50                   ` Kyle Huey
@ 2020-08-25 19:32                     ` Andy Lutomirski
  2020-08-25 20:03                       ` Kyle Huey
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-25 19:32 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Andy Lutomirski, H. Peter Anvin, Robert O'Callahan, Bae,
	Chang Seok, Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar,
	Ravi V, LKML, Hansen, Dave

On Tue, Aug 25, 2020 at 11:50 AM Kyle Huey <me@kylehuey.com> wrote:
>
> On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@kylehuey.com> wrote:
> > > >
> > > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > > > > I don’t like this at all. Your behavior really shouldn’t depend on
> > > > > whether the new instructions are available.  Also, some day I would
> > > > > like to change Linux to have the new behavior even if FSGSBASE
> > > > > instructions are not available, and this will break rr again.  (The
> > > > > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > > > > I would not go so far as to describe it as correct.)
> > > >
> > > > Ok.
> > > >
> > > > > I would suggest you do one of the following things:
> > > > >
> > > > > 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> > > > > might dramatically simplify your code and should just do the right
> > > > > thing.
> > > >
> > > > I don't know what that means.
> > >
> > > This is untested, but what I mean is:
> > >
> > > static int ptrace32(int req, pid_t pid, int addr, int data) {
> > >    int ret;
> > >    /* new enough kernels won't clobber r8, etc. */
> > >    asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
> > > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
> > > "r11");
> > >    return ret;
> > > }
> > >
> > > with a handful of caveats:
> > >
> > >  - This won't compile with -fPIC, I think.  Instead you'll need to
> > > write a little bit of asm to set up and restore ebx yourself.  gcc is
> > > silly like this.
> > >
> > >  - Note that addr is an int.  You'll need to mmap(..., MAP_32BIT, ...)
> > > to get a buffer that can be pointed to with an int.
> > >
> > > The advantage is that this should work on all kernels that support
> > > 32-bit mode at all.
> > >
> > > >
> > > > > 2. Something like your patch but make it unconditional.
> > > > >
> > > > > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > > > > the way that 32-bit code expects.
> > > >
> > > > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> > > > like operation that operates on the regsets according to the
> > > > *tracee*'s bitness (rather than the tracer, as it works currently).
> > > > Does that sound workable?
> > > >
> > >
> > > Strictly speaking, on Linux, there is no unified concept of a task's
> > > bitness, so "set all these registers according to the target's
> > > bitness" is not well defined.  We could easily give you a
> > > PTRACE_SETREGS_X86_32, etc, though.
> >
> > In the process of responding to this I spent some time doing code
> > inspection and discovered a subtlety in the ptrace API that I was
> > previously unaware of. PTRACE_GET/SETREGS use the regset views
> > corresponding to the tracer but PTRACE_GET/SETREGSET use the regset
> > views corresponding to the tracee. This means it is possible for us
> > today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee
> > combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of
> > PTRACE_SETREGS.
>
> Alright I reverted the previous changes and switched us to use
> PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again.
> I assume this behavior will remain unchanged indefinitely even when
> the fs/gsbase manipulation instructions are not available since the 32
> bit user_regs_struct can't grow?

Correct.

I think it would be reasonable to add new, less erratic PTRACE_SETxyz
modes, but the old ones will stay as is.

Strictly speaking, the issue you had wasn't a ptrace change.  It was a
context switching change.  Before the change, you were programming
garbage into the tracee state, but the context switch code wasn't
capable of restoring the garbage correctly and instead happened to
restore the state you actually wanted.  With the new code, the context
switch actually worked correctly (for some value of correctly), and
the tracee crashed.  Not that this distinction really matters.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-25 19:32                     ` Andy Lutomirski
@ 2020-08-25 20:03                       ` Kyle Huey
  0 siblings, 0 replies; 18+ messages in thread
From: Kyle Huey @ 2020-08-25 20:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Robert O'Callahan, Bae, Chang Seok,
	Thomas Gleixner, Ingo Molnar, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Tue, Aug 25, 2020 at 12:32 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Aug 25, 2020 at 11:50 AM Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey <me@kylehuey.com> wrote:
> > >
> > > On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <me@kylehuey.com> wrote:
> > > > >
> > > > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <luto@amacapital.net> wrote:
> > > > > > I don’t like this at all. Your behavior really shouldn’t depend on
> > > > > > whether the new instructions are available.  Also, some day I would
> > > > > > like to change Linux to have the new behavior even if FSGSBASE
> > > > > > instructions are not available, and this will break rr again.  (The
> > > > > > current !FSGSBASE behavior is an ugly optimization of dubious value.
> > > > > > I would not go so far as to describe it as correct.)
> > > > >
> > > > > Ok.
> > > > >
> > > > > > I would suggest you do one of the following things:
> > > > > >
> > > > > > 1. Use int $0x80 directly to load 32-bit regs into a child.  This
> > > > > > might dramatically simplify your code and should just do the right
> > > > > > thing.
> > > > >
> > > > > I don't know what that means.
> > > >
> > > > This is untested, but what I mean is:
> > > >
> > > > static int ptrace32(int req, pid_t pid, int addr, int data) {
> > > >    int ret;
> > > >    /* new enough kernels won't clobber r8, etc. */
> > > >    asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b"
> > > > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10",
> > > > "r11");
> > > >    return ret;
> > > > }
> > > >
> > > > with a handful of caveats:
> > > >
> > > >  - This won't compile with -fPIC, I think.  Instead you'll need to
> > > > write a little bit of asm to set up and restore ebx yourself.  gcc is
> > > > silly like this.
> > > >
> > > >  - Note that addr is an int.  You'll need to mmap(..., MAP_32BIT, ...)
> > > > to get a buffer that can be pointed to with an int.
> > > >
> > > > The advantage is that this should work on all kernels that support
> > > > 32-bit mode at all.
> > > >
> > > > >
> > > > > > 2. Something like your patch but make it unconditional.
> > > > > >
> > > > > > 3. Ask for, and receive, real kernel support for setting FS and GS in
> > > > > > the way that 32-bit code expects.
> > > > >
> > > > > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET
> > > > > like operation that operates on the regsets according to the
> > > > > *tracee*'s bitness (rather than the tracer, as it works currently).
> > > > > Does that sound workable?
> > > > >
> > > >
> > > > Strictly speaking, on Linux, there is no unified concept of a task's
> > > > bitness, so "set all these registers according to the target's
> > > > bitness" is not well defined.  We could easily give you a
> > > > PTRACE_SETREGS_X86_32, etc, though.
> > >
> > > In the process of responding to this I spent some time doing code
> > > inspection and discovered a subtlety in the ptrace API that I was
> > > previously unaware of. PTRACE_GET/SETREGS use the regset views
> > > corresponding to the tracer but PTRACE_GET/SETREGSET use the regset
> > > views corresponding to the tracee. This means it is possible for us
> > > today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee
> > > combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of
> > > PTRACE_SETREGS.
> >
> > Alright I reverted the previous changes and switched us to use
> > PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again.
> > I assume this behavior will remain unchanged indefinitely even when
> > the fs/gsbase manipulation instructions are not available since the 32
> > bit user_regs_struct can't grow?
>
> Correct.
>
> I think it would be reasonable to add new, less erratic PTRACE_SETxyz
> modes, but the old ones will stay as is.
>
> Strictly speaking, the issue you had wasn't a ptrace change.  It was a
> context switching change.  Before the change, you were programming
> garbage into the tracee state, but the context switch code wasn't
> capable of restoring the garbage correctly and instead happened to
> restore the state you actually wanted.  With the new code, the context
> switch actually worked correctly (for some value of correctly), and
> the tracee crashed.  Not that this distinction really matters.

Yes. We've been feeding the kernel trash for years. You were just
letting us get away with it before ;)

- Kyle

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-22  2:53     ` Andy Lutomirski
@ 2020-08-22  3:03       ` Kyle Huey
  0 siblings, 0 replies; 18+ messages in thread
From: Kyle Huey @ 2020-08-22  3:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bae, Chang Seok, Robert O'Callahan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Hansen, Dave

On Fri, Aug 21, 2020 at 7:53 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
> > On Aug 21, 2020, at 2:33 PM, Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 1:08 PM Bae, Chang Seok
> > <chang.seok.bae@intel.com> wrote:
> >>
> >>
> >>>> On Aug 20, 2020, at 21:41, Kyle Huey <me@kylehuey.com> wrote:
> >>>
> >>> On the x86-64 5.9-rc1 TLS is completely broken in 32 bit tracees when
> >>> running under rr[0]. Booting the kernel with `nofsgsbase` fixes it and
> >>> I bisected to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=673903495c85137791d5820d690229efe09c8f7b.
> >>>
> >>> STR:
> >>> 1. Build rr from source by
> >>> a. git clone https://github.com/mozilla/rr
> >>> b. mkdir rr/obj
> >>> c. cd rr/obj
> >>> d. cmake ..
> >>> e. make -j16
> >>> 2. Run the simple 32 bit tracee outside of rr with `./bin/simple_32`.
> >>> It should print a message and exit cleanly.
> >>> 3. Run it under rr with `./bin/rr ./bin/simple_32`.
> >>>
> >>> It should behave the same way, but with fsgsbase enabled it will
> >>> segfault. The `simple_32` binary is a simple "hello world" type
> >>> program but it does link to pthreads, so pre-main code attempts to
> >>> access TLS variables.
> >>>
> >>> The interplay between 32 bit and 64 bit TLS is dark magic to me
> >>> unfortunately so this is all the useful information I have.
> >>
> >> As I run it and collect the ptrace logs, it starts to set FSBASE with
> >> some numbers, e.g. 140632147826496, and then later attempts to set GS
> >> with 99 through putreg(), not putreg32().
> >>
> >> With FSGSBASE, the FS/GS base is decoupled from the selector. Andy
> >> made putreg32() to retain the old behavior, fetching FS/GS base
> >> according to the index, but the putreg() does not do. So, rr probably
> >> relies on the old behavior as observed to setting the GS index only.
> >> But it was previously considered to be okay with this comment [1]:
> >>
> >>   "Our modifications to fs/gs/fs_base/gs_base are always either a)
> >>    setting values that the kernel set during recording to make them
> >>    happen during replay or b) emulating PTRACE_SET_REGS that a tracee
> >>    ptracer tried to set on another tracee. Either way I think the
> >>    effects are going to be the same as what would happen if the
> >>    program were run without rr."
> >>
> >> It is not straightforward to go into the rr internal to me. Robert,
> >> any thought?
> >
> > Hmm. When we are running a 32 bit tracee in a 64 bit build of rr we
> > internally convert between the 32 bit and 64 bit user_regs_structs[0]
> > at the ptrace boundary. This does not preserve the fs/gsbase (because
> > there is no fs/gsbase in the 32 bit user_regs_struct, of course).
> >
> > 40c45904f818c1f6555294ca27afc5fda4f09e68 added magic for a 32 bit
> > tracer tracing a 32 bit tracee on a 64 bit kernel, but it looks like a
> > 64 bit tracer tracing a 32 bit tracee on a 64 bit kernel *is* now
> > expected to preserve the fs/gsbase values (or die, in our case).
> >
> > Is that correct?
>
> I was certainly not expecting rr to do this, and I thought I had asked in advance.  What exact ptrace() calls are you doing here?  Is this POKEUSER or something else?  Breaking rr is at least impolite, and I’d like to fix this.

I believe we are PTRACE_GETREGSing and later PTRACE_SETREGSing, but
doing the latter with garbage for fs/gs_base for a 32 bit tracee. That
didn't used to matter (because those values were completely ignored
for a 32 bit tracee) but now it does.

There's a good case that that's our fault and I'm happy to spend my
"don't break userspace" points somewhere else ;)

- Kyle

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-21 21:32   ` Kyle Huey
  2020-08-21 21:46     ` Bae, Chang Seok
@ 2020-08-22  2:53     ` Andy Lutomirski
  2020-08-22  3:03       ` Kyle Huey
  1 sibling, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2020-08-22  2:53 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Bae, Chang Seok, Robert O'Callahan, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andi Kleen,
	Shankar, Ravi V, LKML, Hansen, Dave



> On Aug 21, 2020, at 2:33 PM, Kyle Huey <me@kylehuey.com> wrote:
> 
> On Fri, Aug 21, 2020 at 1:08 PM Bae, Chang Seok
> <chang.seok.bae@intel.com> wrote:
>> 
>> 
>>>> On Aug 20, 2020, at 21:41, Kyle Huey <me@kylehuey.com> wrote:
>>> 
>>> On the x86-64 5.9-rc1 TLS is completely broken in 32 bit tracees when
>>> running under rr[0]. Booting the kernel with `nofsgsbase` fixes it and
>>> I bisected to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=673903495c85137791d5820d690229efe09c8f7b.
>>> 
>>> STR:
>>> 1. Build rr from source by
>>> a. git clone https://github.com/mozilla/rr
>>> b. mkdir rr/obj
>>> c. cd rr/obj
>>> d. cmake ..
>>> e. make -j16
>>> 2. Run the simple 32 bit tracee outside of rr with `./bin/simple_32`.
>>> It should print a message and exit cleanly.
>>> 3. Run it under rr with `./bin/rr ./bin/simple_32`.
>>> 
>>> It should behave the same way, but with fsgsbase enabled it will
>>> segfault. The `simple_32` binary is a simple "hello world" type
>>> program but it does link to pthreads, so pre-main code attempts to
>>> access TLS variables.
>>> 
>>> The interplay between 32 bit and 64 bit TLS is dark magic to me
>>> unfortunately so this is all the useful information I have.
>> 
>> As I run it and collect the ptrace logs, it starts to set FSBASE with
>> some numbers, e.g. 140632147826496, and then later attempts to set GS
>> with 99 through putreg(), not putreg32().
>> 
>> With FSGSBASE, the FS/GS base is decoupled from the selector. Andy
>> made putreg32() to retain the old behavior, fetching FS/GS base
>> according to the index, but the putreg() does not do. So, rr probably
>> relies on the old behavior as observed to setting the GS index only.
>> But it was previously considered to be okay with this comment [1]:
>> 
>>   "Our modifications to fs/gs/fs_base/gs_base are always either a)
>>    setting values that the kernel set during recording to make them
>>    happen during replay or b) emulating PTRACE_SET_REGS that a tracee
>>    ptracer tried to set on another tracee. Either way I think the
>>    effects are going to be the same as what would happen if the
>>    program were run without rr."
>> 
>> It is not straightforward to go into the rr internal to me. Robert,
>> any thought?
> 
> Hmm. When we are running a 32 bit tracee in a 64 bit build of rr we
> internally convert between the 32 bit and 64 bit user_regs_structs[0]
> at the ptrace boundary. This does not preserve the fs/gsbase (because
> there is no fs/gsbase in the 32 bit user_regs_struct, of course).
> 
> 40c45904f818c1f6555294ca27afc5fda4f09e68 added magic for a 32 bit
> tracer tracing a 32 bit tracee on a 64 bit kernel, but it looks like a
> 64 bit tracer tracing a 32 bit tracee on a 64 bit kernel *is* now
> expected to preserve the fs/gsbase values (or die, in our case).
> 
> Is that correct?

I was certainly not expecting rr to do this, and I thought I had asked in advance.  What exact ptrace() calls are you doing here?  Is this POKEUSER or something else?  Breaking rr is at least impolite, and I’d like to fix this.

> 
> - Kyle
> 
> [0] https://github.com/mozilla/rr/blob/fcd2a26680a3fc2bda5f40d732d0c72b9628357b/src/Registers.cc#L519

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-21 21:32   ` Kyle Huey
@ 2020-08-21 21:46     ` Bae, Chang Seok
  2020-08-22  2:53     ` Andy Lutomirski
  1 sibling, 0 replies; 18+ messages in thread
From: Bae, Chang Seok @ 2020-08-21 21:46 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Robert O'Callahan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave


> On Aug 21, 2020, at 14:32, Kyle Huey <me@kylehuey.com> wrote:
> 
> 40c45904f818c1f6555294ca27afc5fda4f09e68 added magic for a 32 bit
> tracer tracing a 32 bit tracee on a 64 bit kernel, but it looks like a
> 64 bit tracer tracing a 32 bit tracee on a 64 bit kernel *is* now
> expected to preserve the fs/gsbase values (or die, in our case).
> 
> Is that correct?

Correct.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-21 20:08 ` Bae, Chang Seok
@ 2020-08-21 21:32   ` Kyle Huey
  2020-08-21 21:46     ` Bae, Chang Seok
  2020-08-22  2:53     ` Andy Lutomirski
  0 siblings, 2 replies; 18+ messages in thread
From: Kyle Huey @ 2020-08-21 21:32 UTC (permalink / raw)
  To: Bae, Chang Seok
  Cc: Robert O'Callahan, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Andi Kleen, Shankar, Ravi V, LKML,
	Hansen, Dave

On Fri, Aug 21, 2020 at 1:08 PM Bae, Chang Seok
<chang.seok.bae@intel.com> wrote:
>
>
> > On Aug 20, 2020, at 21:41, Kyle Huey <me@kylehuey.com> wrote:
> >
> > On the x86-64 5.9-rc1 TLS is completely broken in 32 bit tracees when
> > running under rr[0]. Booting the kernel with `nofsgsbase` fixes it and
> > I bisected to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=673903495c85137791d5820d690229efe09c8f7b.
> >
> > STR:
> > 1. Build rr from source by
> >  a. git clone https://github.com/mozilla/rr
> >  b. mkdir rr/obj
> >  c. cd rr/obj
> >  d. cmake ..
> >  e. make -j16
> > 2. Run the simple 32 bit tracee outside of rr with `./bin/simple_32`.
> > It should print a message and exit cleanly.
> > 3. Run it under rr with `./bin/rr ./bin/simple_32`.
> >
> > It should behave the same way, but with fsgsbase enabled it will
> > segfault. The `simple_32` binary is a simple "hello world" type
> > program but it does link to pthreads, so pre-main code attempts to
> > access TLS variables.
> >
> > The interplay between 32 bit and 64 bit TLS is dark magic to me
> > unfortunately so this is all the useful information I have.
>
> As I run it and collect the ptrace logs, it starts to set FSBASE with
> some numbers, e.g. 140632147826496, and then later attempts to set GS
> with 99 through putreg(), not putreg32().
>
> With FSGSBASE, the FS/GS base is decoupled from the selector. Andy
> made putreg32() to retain the old behavior, fetching FS/GS base
> according to the index, but the putreg() does not do. So, rr probably
> relies on the old behavior as observed to setting the GS index only.
> But it was previously considered to be okay with this comment [1]:
>
>    "Our modifications to fs/gs/fs_base/gs_base are always either a)
>     setting values that the kernel set during recording to make them
>     happen during replay or b) emulating PTRACE_SET_REGS that a tracee
>     ptracer tried to set on another tracee. Either way I think the
>     effects are going to be the same as what would happen if the
>     program were run without rr."
>
> It is not straightforward to go into the rr internal to me. Robert,
> any thought?

Hmm. When we are running a 32 bit tracee in a 64 bit build of rr we
internally convert between the 32 bit and 64 bit user_regs_structs[0]
at the ptrace boundary. This does not preserve the fs/gsbase (because
there is no fs/gsbase in the 32 bit user_regs_struct, of course).

40c45904f818c1f6555294ca27afc5fda4f09e68 added magic for a 32 bit
tracer tracing a 32 bit tracee on a 64 bit kernel, but it looks like a
64 bit tracer tracing a 32 bit tracee on a 64 bit kernel *is* now
expected to preserve the fs/gsbase values (or die, in our case).

Is that correct?

- Kyle

[0] https://github.com/mozilla/rr/blob/fcd2a26680a3fc2bda5f40d732d0c72b9628357b/src/Registers.cc#L519

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
  2020-08-21  4:41 Kyle Huey
@ 2020-08-21 20:08 ` Bae, Chang Seok
  2020-08-21 21:32   ` Kyle Huey
  0 siblings, 1 reply; 18+ messages in thread
From: Bae, Chang Seok @ 2020-08-21 20:08 UTC (permalink / raw)
  To: Kyle Huey, Robert O'Callahan
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Shankar, Ravi V, LKML, Hansen, Dave


> On Aug 20, 2020, at 21:41, Kyle Huey <me@kylehuey.com> wrote:
> 
> On the x86-64 5.9-rc1 TLS is completely broken in 32 bit tracees when
> running under rr[0]. Booting the kernel with `nofsgsbase` fixes it and
> I bisected to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=673903495c85137791d5820d690229efe09c8f7b.
> 
> STR:
> 1. Build rr from source by
>  a. git clone https://github.com/mozilla/rr
>  b. mkdir rr/obj
>  c. cd rr/obj
>  d. cmake ..
>  e. make -j16
> 2. Run the simple 32 bit tracee outside of rr with `./bin/simple_32`.
> It should print a message and exit cleanly.
> 3. Run it under rr with `./bin/rr ./bin/simple_32`.
> 
> It should behave the same way, but with fsgsbase enabled it will
> segfault. The `simple_32` binary is a simple "hello world" type
> program but it does link to pthreads, so pre-main code attempts to
> access TLS variables.
> 
> The interplay between 32 bit and 64 bit TLS is dark magic to me
> unfortunately so this is all the useful information I have.

As I run it and collect the ptrace logs, it starts to set FSBASE with
some numbers, e.g. 140632147826496, and then later attempts to set GS
with 99 through putreg(), not putreg32().

With FSGSBASE, the FS/GS base is decoupled from the selector. Andy
made putreg32() to retain the old behavior, fetching FS/GS base
according to the index, but the putreg() does not do. So, rr probably
relies on the old behavior as observed to setting the GS index only.
But it was previously considered to be okay with this comment [1]:

   "Our modifications to fs/gs/fs_base/gs_base are always either a)
    setting values that the kernel set during recording to make them
    happen during replay or b) emulating PTRACE_SET_REGS that a tracee
    ptracer tried to set on another tracee. Either way I think the
    effects are going to be the same as what would happen if the
    program were run without rr."

It is not straightforward to go into the rr internal to me. Robert, 
any thought?

[1] https://mail.mozilla.org/pipermail/rr-dev/2018-March/000615.html

> - Kyle
> 
> [0] https://rr-project.org/




^ permalink raw reply	[flat|nested] 18+ messages in thread

* [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system
@ 2020-08-21  4:41 Kyle Huey
  2020-08-21 20:08 ` Bae, Chang Seok
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Huey @ 2020-08-21  4:41 UTC (permalink / raw)
  To: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Andi Kleen
  Cc: Robert O'Callahan, Ravi Shankar, LKML

On the x86-64 5.9-rc1 TLS is completely broken in 32 bit tracees when
running under rr[0]. Booting the kernel with `nofsgsbase` fixes it and
I bisected to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=673903495c85137791d5820d690229efe09c8f7b.

STR:
1. Build rr from source by
  a. git clone https://github.com/mozilla/rr
  b. mkdir rr/obj
  c. cd rr/obj
  d. cmake ..
  e. make -j16
2. Run the simple 32 bit tracee outside of rr with `./bin/simple_32`.
It should print a message and exit cleanly.
3. Run it under rr with `./bin/rr ./bin/simple_32`.

It should behave the same way, but with fsgsbase enabled it will
segfault. The `simple_32` binary is a simple "hello world" type
program but it does link to pthreads, so pre-main code attempts to
access TLS variables.

The interplay between 32 bit and 64 bit TLS is dark magic to me
unfortunately so this is all the useful information I have.

- Kyle

[0] https://rr-project.org/

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-08-25 20:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAOp6jLYrwMqV=7hmxgdZUdDZ2aeUB27TTHm=j6cQT7C10Muhww@mail.gmail.com>
     [not found] ` <7DF88F22-0310-40C9-9DA6-5EBCB4877933@amacapital.net>
2020-08-24 21:10   ` [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system Andy Lutomirski
2020-08-24 23:52     ` H. Peter Anvin
2020-08-25  0:30       ` Andy Lutomirski
2020-08-25  0:46         ` Kyle Huey
2020-08-25 16:12           ` Andy Lutomirski
2020-08-25 16:32             ` Kyle Huey
2020-08-25 16:46               ` Andy Lutomirski
2020-08-25 17:31                 ` Kyle Huey
2020-08-25 18:50                   ` Kyle Huey
2020-08-25 19:32                     ` Andy Lutomirski
2020-08-25 20:03                       ` Kyle Huey
2020-08-25 15:13         ` hpa
2020-08-21  4:41 Kyle Huey
2020-08-21 20:08 ` Bae, Chang Seok
2020-08-21 21:32   ` Kyle Huey
2020-08-21 21:46     ` Bae, Chang Seok
2020-08-22  2:53     ` Andy Lutomirski
2020-08-22  3:03       ` Kyle Huey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).