linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
       [not found]   ` <CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com>
@ 2021-05-04  8:39     ` Peter Zijlstra
  2021-05-04 15:35       ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2021-05-04  8:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Thomas Gleixner, Stefan Metzmacher, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains


+ linux-toolchains

On Mon, May 03, 2021 at 12:14:45PM -0700, Linus Torvalds wrote:
> On Mon, May 3, 2021 at 9:05 AM Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > Linus, what is the actual effect of allowing gdb to attach these threads?  Can we instead make all the regset ops do:
> >
> > if (not actually a user thread) return -EINVAL;
> 
> I don't think it matters - the end result ends up  being the same, ie
> gdb gets confused about whether the (parent) thread is a 32-bit or
> 64-bit one.
> 
> So the basic issue is
> 
>  (a) we want the IO threads to look exactly like normal user threads
> as far as the kernel is concerned, because we had way too many bugs
> due to special cases.
> 
>  (b) but that means that they are also visible to user space, and then
> gdb has this odd thing where it takes the 64-bit vs 32-bit data for
> the whole process from one thread, and picks the worst possible thread
> to do it (ie explicitly not even the main thread, so usually the IO
> thread!)
> 
> That (a) ended up really being critical. The issues with special cases
> were just horrendous, both for security issues (ie "make them kernel
> threads but carry user credentials" just caused lots of problems) but
> also for various just random other state handling issues (signal state
> in particular).
> 
> So generally, the IO threads are now 100% normal threads - it's
> literally just that they never return to user space because they are
> always just doing the IO offload on the kernel side.
> 
> That part is lovely, but part of the "100% IO threads" really is that
> they share the signal struct too, which in turn means that they very
> much show up as normal threads. Again, not a problem: they really
> _are_ normal threads for all intents and purposes.
> 
> But then that (b) issue means that gdb gets confused by them. I
> personally think that's just a pure gdb mis-feature, but I also think
> that "hey, if we just make the register state look like the main
> thread, and unconfuse gdb that way, problem solved".
> 
> So I'd actually rather not make these non-special threads any more
> special at all. And I strongly suspect that making ptrace() not work
> on them will just confuse gdb even more - so it would make them just
> unnecessarily special in the kernel, for no actual gain.
> 
> Is the right thing to do to fix gdb to not look at irrelevant thread B
> when deciding whether thread A is 64-bit or not? Yeah, that seems like
> obviously the RightThing(tm) to me.
> 
> But at the same time, this is arguably about "regression", although at
> the same time it's "gdb doesn't understand new user programs that use
> new features, film at 11", so I think that argument is partly bogus
> too.
> 
> So my personal preference would be:
> 
>  - make those threads look even more like user threads, even if that
> means giving them pointless user segment data that the threads
> themselves will never use
> 
>    So I think Stefan's patch is reasonable, if not pretty. Literally
> becasue of that "make these threads look even more normal"
> 
>  - ALSO fix gdb that is doing obviously garbage stupid things
> 
> But I'm obviously not involved in that "ALSO fix gdb" part, and
> arguably the kernel hack then makes it more likely that gdb will
> continue doing its insane broken thing.

Anybody on toolchains that can help get GDB fixed?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-04  8:39     ` [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads Peter Zijlstra
@ 2021-05-04 15:35       ` Borislav Petkov
  2021-05-04 15:55         ` Simon Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2021-05-04 15:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner,
	Stefan Metzmacher, Jens Axboe, Linux Kernel Mailing List,
	io-uring, the arch/x86 maintainers, linux-toolchains

On Tue, May 04, 2021 at 10:39:23AM +0200, Peter Zijlstra wrote:
> Anybody on toolchains that can help get GDB fixed?

In the meantime, Tom is looking at fixing this, in case people wanna try
gdb patches or give him a test case or so...

https://sourceware.org/bugzilla/show_bug.cgi?id=27822

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-04 15:35       ` Borislav Petkov
@ 2021-05-04 15:55         ` Simon Marchi
  2021-05-05 11:29           ` Stefan Metzmacher
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Marchi @ 2021-05-04 15:55 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner,
	Stefan Metzmacher, Jens Axboe, Linux Kernel Mailing List,
	io-uring, the arch/x86 maintainers, linux-toolchains

On 2021-05-04 11:35 a.m., Borislav Petkov wrote:
> On Tue, May 04, 2021 at 10:39:23AM +0200, Peter Zijlstra wrote:
>> Anybody on toolchains that can help get GDB fixed?
> 
> In the meantime, Tom is looking at fixing this, in case people wanna try
> gdb patches or give him a test case or so...
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=27822

Yes, please provide reproducing steps in that bug.  Unlike what was said
in this thread, some people do work on gdb and are willing to fix
things, but they can only do so if they know about the problem.

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-04 15:55         ` Simon Marchi
@ 2021-05-05 11:29           ` Stefan Metzmacher
  2021-05-05 21:59             ` Simon Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Metzmacher @ 2021-05-05 11:29 UTC (permalink / raw)
  To: Simon Marchi, Borislav Petkov, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains


Am 04.05.21 um 17:55 schrieb Simon Marchi:
> On 2021-05-04 11:35 a.m., Borislav Petkov wrote:
>> On Tue, May 04, 2021 at 10:39:23AM +0200, Peter Zijlstra wrote:
>>> Anybody on toolchains that can help get GDB fixed?
>>
>> In the meantime, Tom is looking at fixing this, in case people wanna try
>> gdb patches or give him a test case or so...
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=27822
> 
> Yes, please provide reproducing steps in that bug.  Unlike what was said
> in this thread, some people do work on gdb and are willing to fix
> things, but they can only do so if they know about the problem.

See https://lore.kernel.org/io-uring/0375b37f-2e1e-7999-53b8-c567422aa181@samba.org/
and https://lore.kernel.org/io-uring/20210411152705.2448053-1-metze@samba.org/T/#m461f280e8c3d32a49bc7da7bb5e214e90d97cf65

The question is why does inferior_ptid doesn't represent the thread
that was specified by 'gdb --pid PIDVAL'

https://www.samba.org/~metze/strace-uring-fail.txt
used "gdb --pid 1396" and does the following ptrace calls:

# grep ptrace strace-uring-fail.txt

> 18:46:35.319925 ptrace(PTRACE_ATTACH, 1396) = 0 <0.000048>
> 18:46:35.321622 ptrace(PTRACE_ATTACH, 1397) = 0 <0.000059>
> 18:46:35.322813 ptrace(PTRACE_ATTACH, 1398) = 0 <0.003052>
> 18:46:35.327287 ptrace(PTRACE_ATTACH, 1399) = 0 <0.000028>
> 18:46:35.334920 ptrace(PTRACE_GETREGS, 1396, NULL, 0x7ffed6173ea0) = 0 <0.000067>
> 18:46:35.341506 ptrace(PTRACE_SETOPTIONS, 1410, NULL, PTRACE_O_TRACESYSGOOD) = 0 <0.000056>
> 18:46:35.341681 ptrace(PTRACE_SETOPTIONS, 1410, NULL, PTRACE_O_TRACEFORK) = 0 <0.000051>
> 18:46:35.341816 ptrace(PTRACE_SETOPTIONS, 1410, NULL, PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORKDONE) = 0 <0.000054>
> 18:46:35.341957 ptrace(PTRACE_CONT, 1410, NULL, 0) = 0 <0.000056>
> 18:46:35.345568 ptrace(PTRACE_GETEVENTMSG, 1410, NULL, [1411]) = 0 <0.000081>
> 18:46:35.350541 ptrace(PTRACE_SETOPTIONS, 1410, NULL, PTRACE_O_EXITKILL) = 0 <0.000019>
> 18:46:35.354010 ptrace(PTRACE_SETOPTIONS, 1397, NULL, PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORK|PTRACE_O_TRACECLONE|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEVFORKDONE) = 0 <0.000019>
> 18:46:35.415730 ptrace(PTRACE_SETOPTIONS, 1396, NULL, PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORK|PTRACE_O_TRACECLONE|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEVFORKDONE) = 0 <0.000076>
> 18:46:35.421076 ptrace(PTRACE_GETREGS, 1412, NULL, 0x7ffed6174980) = 0 <0.000088>
> 18:46:35.429498 ptrace(PTRACE_PEEKUSER, 1397, 8*CS, [NULL]) = 0 <0.000022>
> 18:46:35.429632 ptrace(PTRACE_PEEKUSER, 1397, 8*SS + 24, [NULL]) = 0 <0.000019>
> 18:46:35.429732 ptrace(PTRACE_GETREGSET, 1397, NT_X86_XSTATE, [{iov_base=0x7ffed6174780, iov_len=576}]) = 0 <0.000030>
> 18:46:35.435507 ptrace(PTRACE_GETREGS, 1397, NULL, 0x7ffed6173cb0) = 0 <0.000019>
> 18:46:35.445877 ptrace(PTRACE_PEEKTEXT, 1397, 0x56357e99de00, [0x7f49d572b160]) = 0 <0.000057>
> 18:46:35.446043 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572b168, [0x7f49d572b190]) = 0 <0.000049>
> 18:46:35.447192 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572bbf0, [0x64762d78756e696c]) = 0 <0.000060>
> 18:46:35.447368 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572bbf0, [0x64762d78756e696c]) = 0 <0.000075>
> 18:46:35.447571 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572bbf8, [0x312e6f732e6f73]) = 0 <0.000070>
> 18:46:35.447762 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572bbf8, [0x312e6f732e6f73]) = 0 <0.000067>
> 18:46:35.448658 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be10, [0x3638782f62696c2f]) = 0 <0.000076>
> 18:46:35.448917 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be10, [0x3638782f62696c2f]) = 0 <0.000050>
> 18:46:35.449051 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be18, [0x756e696c2d34365f]) = 0 <0.000045>
> 18:46:35.449173 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be18, [0x756e696c2d34365f]) = 0 <0.000043>
> 18:46:35.449292 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be20, [0x696c2f756e672d78]) = 0 <0.000042>
> 18:46:35.449414 ptrace(PTRACE_PEEKTEXT, 1397, 0x7f49d572be20, [0x696c2f756e672d78]) = 0 <0.000048>

ptrace(PTRACE_GETREGS, 1396, ... looks expected to me, but
starting with ptrace(PTRACE_PEEKUSER, 1397, 8*CS, [NULL]) (which triggers the actual problem)
it's unexpected to me why 1397 is used instead of 1396.

1397 is the iou-mgr-1396 iothread.

I hope that helps!
metze

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 11:29           ` Stefan Metzmacher
@ 2021-05-05 21:59             ` Simon Marchi
  2021-05-05 22:11               ` Andy Lutomirski
  2021-05-05 22:21               ` Stefan Metzmacher
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Marchi @ 2021-05-05 21:59 UTC (permalink / raw)
  To: Stefan Metzmacher, Borislav Petkov, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

On 2021-05-05 7:29 a.m., Stefan Metzmacher wrote:
> See https://lore.kernel.org/io-uring/0375b37f-2e1e-7999-53b8-c567422aa181@samba.org/
> and https://lore.kernel.org/io-uring/20210411152705.2448053-1-metze@samba.org/T/#m461f280e8c3d32a49bc7da7bb5e214e90d97cf65
> 
> The question is why does inferior_ptid doesn't represent the thread
> that was specified by 'gdb --pid PIDVAL'

Hi Stefan,

When you attach to PIDVAL (assuming that PIDVAL is a thread-group
leader), GDB attaches to all the threads of that thread group.  The
inferior_ptid global variable is "the thread we are currently working
with", and changes whenever GDB wants to deal with a different thread.

After attaching to all threads, GDB wants to know more about that
process' architecture (that read_description call mentioned in [1]).
The way this is implemented varies from arch to arch, but as you found
out, for x86-64 it consists of peeking into a thread's CS/DS to
determine whether the process is x86-64, x32 or i386.  The thread used
for this - the inferior_ptid value - just happens to be the latest
thread we switched inferior_ptid to (presumably because we iterated on
the thread list to do something and that was the last thread in the
list).  And up to now, this was working under the assumption that
picking any thread of the process would yield the same values for that
purpose.  I don't think it was intentionally done this way, but it
worked and didn't cause any trouble until now.

With io threads, that assumption doesn't hold anymore.  From what I
understand, your v1 patch made it so that the kernel puts the CS/DS
values GDB expects in the io threads (even though they are never
actually used otherwise).  I don't understand how your v2 patch
addresses the problem though.

I don't think it would be a problem on the GDB-side to make sure to
fetch these values from a "standard" thread.  Most likely the thread
group leader, like Tom has proposed in [1].

Thanks,

Simon

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=27822#c0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 21:59             ` Simon Marchi
@ 2021-05-05 22:11               ` Andy Lutomirski
  2021-05-05 23:12                 ` Borislav Petkov
                                   ` (2 more replies)
  2021-05-05 22:21               ` Stefan Metzmacher
  1 sibling, 3 replies; 14+ messages in thread
From: Andy Lutomirski @ 2021-05-05 22:11 UTC (permalink / raw)
  To: Simon Marchi
  Cc: Stefan Metzmacher, Borislav Petkov, Peter Zijlstra,
	Linus Torvalds, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

I'm not holding my breath, but:

On Wed, May 5, 2021 at 2:59 PM Simon Marchi <simon.marchi@polymtl.ca> wrote:
>
> On 2021-05-05 7:29 a.m., Stefan Metzmacher wrote:
> > See https://lore.kernel.org/io-uring/0375b37f-2e1e-7999-53b8-c567422aa181@samba.org/
> > and https://lore.kernel.org/io-uring/20210411152705.2448053-1-metze@samba.org/T/#m461f280e8c3d32a49bc7da7bb5e214e90d97cf65
> >
> > The question is why does inferior_ptid doesn't represent the thread
> > that was specified by 'gdb --pid PIDVAL'
>
> Hi Stefan,
>
> When you attach to PIDVAL (assuming that PIDVAL is a thread-group
> leader), GDB attaches to all the threads of that thread group.  The
> inferior_ptid global variable is "the thread we are currently working
> with", and changes whenever GDB wants to deal with a different thread.
>
> After attaching to all threads, GDB wants to know more about that
> process' architecture (that read_description call mentioned in [1]).

^^^^^^

For what it's worth, this is already fundamentally incorrect.  On
x86_64 Linux, a process *does* *not* *have* an architecture.  Every
task on an x86_64 Linux host has a full 64-bit register state.  The
task can, and sometimes does, change CS using far transfers or other
bizarre techniques, and neither the kernel nor GDB will be notified or
have a chance to take any action in response.  ELF files can be
32-bit, CS:rIP can point at 32-bit code, and system calls can be
32-bit (even from 64-bit code), but *tasks* are not 32-bit.

Now I realize that the ptrace() API is awful and makes life difficult
in several respects for no good reason but, if gdb is ever interested
in fixing its ideas about architecture to understand that all tasks,
even those that think of themselves as "compat", have full 64-bit
state, I would be more than willing to improve the ptrace() API as
needed to make this work well.

Since I'm not holding my breath, please at least keep in mind that
anything you do here is merely a heuristic, cannot be fully correct,
and then whenever gdb determines that a thread group or a thread is
"32-bit", gdb is actually deciding to operate in a degraded mode for
that task, is not accurately representing the task state, and is at
risk of crashing, malfunctioning, or crashing the inferior due to its
incorrect assumptions.  If you have ever attached gdb to QEMU's
gdbserver and tried to debug the early boot process of a 64-bit Linux
kernel, you may have encountered this class of bugs.  gdb works very,
very poorly for this use case.

(To avoid confusion, this is not a universal property of Linux.  arm64
and arm32 tasks on an arm64 Linux host are different and cannot
arbitrarily switch modes.)

--Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 21:59             ` Simon Marchi
  2021-05-05 22:11               ` Andy Lutomirski
@ 2021-05-05 22:21               ` Stefan Metzmacher
  2021-05-05 23:15                 ` Simon Marchi
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Metzmacher @ 2021-05-05 22:21 UTC (permalink / raw)
  To: Simon Marchi, Borislav Petkov, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains


Hi Simon,

> When you attach to PIDVAL (assuming that PIDVAL is a thread-group
> leader), GDB attaches to all the threads of that thread group.  The
> inferior_ptid global variable is "the thread we are currently working
> with", and changes whenever GDB wants to deal with a different thread.
> 
> After attaching to all threads, GDB wants to know more about that
> process' architecture (that read_description call mentioned in [1]).
> The way this is implemented varies from arch to arch, but as you found
> out, for x86-64 it consists of peeking into a thread's CS/DS to
> determine whether the process is x86-64, x32 or i386.  The thread used
> for this - the inferior_ptid value - just happens to be the latest
> thread we switched inferior_ptid to (presumably because we iterated on
> the thread list to do something and that was the last thread in the
> list).  And up to now, this was working under the assumption that
> picking any thread of the process would yield the same values for that
> purpose.  I don't think it was intentionally done this way, but it
> worked and didn't cause any trouble until now.
> 
> With io threads, that assumption doesn't hold anymore.

Yes, in 5.12

> From what I understand, your v1 patch made it so that the kernel puts the CS/DS
> values GDB expects in the io threads (even though they are never
> actually used otherwise).  I don't understand how your v2 patch
> addresses the problem though.

v1 did clear everything and tried to keep some selected registers:

  'memset(childregs, 0, sizeof(struct pt_regs));'
  childregs->cs = currenttrgs->cs;
  childregs->ss = currenttrgs->ss;
  childregs->ds = currenttrgs->ds;
  childregs->es = currenttrgs->es;

v2 copies everything and only clears 3 selected registers (I added the last two for
   the PF_IO_WORKER case:

  *childregs = *current_pt_regs();
  childregs->ax = 0;
  ...
  childregs->ip = 0;
  childregs->sp = 0;

So regarding childregs->cs and childregs->ds the effect is the same.

(Also note that on x86_64 ds in handled by savesegment(ds, p->thread.ds);
already instead so the effective problem as only childregs->cs which
is cleared in 5.12, but will be kept with both of my patches.

> I don't think it would be a problem on the GDB-side to make sure to
> fetch these values from a "standard" thread.  Most likely the thread
> group leader, like Tom has proposed in [1].

Ok.

Is it clear now?
metze

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 22:11               ` Andy Lutomirski
@ 2021-05-05 23:12                 ` Borislav Petkov
  2021-05-05 23:22                   ` Andy Lutomirski
  2021-05-06  1:04                 ` Simon Marchi
  2021-05-06  9:47                 ` David Laight
  2 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2021-05-05 23:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Simon Marchi, Stefan Metzmacher, Peter Zijlstra, Linus Torvalds,
	Thomas Gleixner, Jens Axboe, Linux Kernel Mailing List, io-uring,
	the arch/x86 maintainers, linux-toolchains

On Wed, May 05, 2021 at 03:11:18PM -0700, Andy Lutomirski wrote:
> Since I'm not holding my breath, please at least keep in mind that
> anything you do here is merely a heuristic, cannot be fully correct,
> and then whenever gdb determines that a thread group or a thread is
> "32-bit", gdb is actually deciding to operate in a degraded mode for
> that task, is not accurately representing the task state, and is at
> risk of crashing, malfunctioning, or crashing the inferior due to its
> incorrect assumptions.  If you have ever attached gdb to QEMU's
> gdbserver and tried to debug the early boot process of a 64-bit Linux
> kernel, you may have encountered this class of bugs.  gdb works very,
> very poorly for this use case.

So we were talking about this with toolchain folks today and they gave
me this example:

Imagine you've stopped the target this way:

	<insn><-- stopped here
	<insn>
	<mode changing insn>
	<insn>
	<insn>
	...

now, if you dump rIP and say, rIP + the 10 following insns at the place
you've stopped it, gdb cannot know that 2 insns further into the stream
a

<mode changing insn>

is coming and it should change the disassembly of the insns after that
<mode changing insn> to the new mode. Unless it goes and inspects all
further instructions and disassembles them and analyzes the flow...

So what you can do is 

(gdb) set arch ...

at the <mode changing insn> to the mode you're changing to.

Dunno, maybe I'm missing something but this sounds like without user
help gdb can only assume things.

Good night and good luck.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 22:21               ` Stefan Metzmacher
@ 2021-05-05 23:15                 ` Simon Marchi
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Marchi @ 2021-05-05 23:15 UTC (permalink / raw)
  To: Stefan Metzmacher, Borislav Petkov, Peter Zijlstra
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

> Is it clear now?
> metze

Yes, thanks.  I had missed that the point of the 2nd patch is to not do
the memset for these threads.  Makes sense now.

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 23:12                 ` Borislav Petkov
@ 2021-05-05 23:22                   ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2021-05-05 23:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Simon Marchi, Stefan Metzmacher, Peter Zijlstra,
	Linus Torvalds, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

On Wed, May 5, 2021 at 4:12 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, May 05, 2021 at 03:11:18PM -0700, Andy Lutomirski wrote:
> > Since I'm not holding my breath, please at least keep in mind that
> > anything you do here is merely a heuristic, cannot be fully correct,
> > and then whenever gdb determines that a thread group or a thread is
> > "32-bit", gdb is actually deciding to operate in a degraded mode for
> > that task, is not accurately representing the task state, and is at
> > risk of crashing, malfunctioning, or crashing the inferior due to its
> > incorrect assumptions.  If you have ever attached gdb to QEMU's
> > gdbserver and tried to debug the early boot process of a 64-bit Linux
> > kernel, you may have encountered this class of bugs.  gdb works very,
> > very poorly for this use case.
>
> So we were talking about this with toolchain folks today and they gave
> me this example:
>
> Imagine you've stopped the target this way:
>
>         <insn><-- stopped here
>         <insn>
>         <mode changing insn>
>         <insn>
>         <insn>
>         ...
>
> now, if you dump rIP and say, rIP + the 10 following insns at the place
> you've stopped it, gdb cannot know that 2 insns further into the stream
> a
>
> <mode changing insn>
>
> is coming and it should change the disassembly of the insns after that
> <mode changing insn> to the new mode. Unless it goes and inspects all
> further instructions and disassembles them and analyzes the flow...

That's fine.  x86 machine code is like this.  You also can't
disassemble instructions before rIP accurately either.

>
> So what you can do is
>
> (gdb) set arch ...
>
> at the <mode changing insn> to the mode you're changing to.
>
> Dunno, maybe I'm missing something but this sounds like without user
> help gdb can only assume things.

In the tools/testing/x86/selftests directory, edited slightly for brevity:

$ gdb ./test_syscall_vdso_32
(gdb) b call64_from_32
Breakpoint 1 at 0x80499ef: file thunks_32.S, line 19.
(gdb) display/i $pc
1: x/i $pc
<error: No registers.>
(gdb) r
Starting program:
/home/luto/apps/linux/tools/testing/selftests/x86/test_syscall_vdso_32
...
[RUN]    Executing 6-argument 32-bit syscall via VDSO

Breakpoint 1, call64_from_32 () at thunks_32.S:19
19        mov    4(%esp), %eax
1: x/i $pc
=> 0x80499ef <call64_from_32>:    mov    0x4(%esp),%eax
(gdb) si
22        push    %ecx
1: x/i $pc
=> 0x80499f3 <call64_from_32+4>:    push   %ecx
(gdb)
call64_from_32 () at thunks_32.S:23
23        push    %edx
1: x/i $pc
=> 0x80499f4 <call64_from_32+5>:    push   %edx
(gdb)
call64_from_32 () at thunks_32.S:24
24        push    %esi
1: x/i $pc
=> 0x80499f5 <call64_from_32+6>:    push   %esi
(gdb)
call64_from_32 () at thunks_32.S:25
25        push    %edi
1: x/i $pc
=> 0x80499f6 <call64_from_32+7>:    push   %edi
(gdb)
call64_from_32 () at thunks_32.S:28
28        jmp    $0x33,$1f
1: x/i $pc
=> 0x80499f7 <call64_from_32+8>:    ljmp   $0x33,$0x80499fe
(gdb) info registers
eax            0x80492e8           134517480
ecx            0x3f                63
edx            0x1                 1
ebx            0xf7fc8550          -134445744
esp            0xffffc57c          0xffffc57c
ebp            0xffffc5e8          0xffffc5e8
esi            0x0                 0
edi            0x8049180           134517120
eip            0x80499f7           0x80499f7 <call64_from_32+8>
eflags         0x292               [ AF SF IF ]
cs             0x23                35
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x0                 0
gs             0x63                99
(gdb) si
32        call    *%rax
1: x/i $pc
=> 0x80499fe <call64_from_32+15>:    call   *%eax
(gdb) info registers
eax            0x80492e8           134517480

^^^ Should be rax

ecx            0x3f                63
edx            0x1                 1
ebx            0xf7fc8550          -134445744
esp            0xffffc57c          0xffffc57c
ebp            0xffffc5e8          0xffffc5e8
esi            0x0                 0
edi            0x8049180           134517120
eip            0x80499fe           0x80499fe <call64_from_32+15>

^^^ r8, etc are all missing

eflags         0x292               [ AF SF IF ]
cs             0x33                51

^^^ 64-bit!

ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x0                 0
gs             0x63                99
(gdb) si
poison_regs64 () at test_syscall_vdso.c:35
35    long syscall_addr;
1: x/i $pc
=> 0x80492e8 <poison_regs64>:    dec    %ecx
(gdb) si
36    long get_syscall(char **envp)
1: x/i $pc
=> 0x80492ef <poison_regs64+7>:    dec    %ecx
(gdb) set arch i386:x86-64
warning: Selected architecture i386:x86-64 is not compatible with
reported target architecture i386
Architecture `i386:x86-64' not recognized.
The target architecture is set to "auto" (currently "i386").
(gdb) set arch i386:x86-64:intel
warning: Selected architecture i386:x86-64:intel is not compatible
with reported target architecture i386
Architecture `i386:x86-64:intel' not recognized.
The target architecture is set to "auto" (currently "i386").

I don't know enough about gdb internals to know precisely what failed
here, but this did not work the way it should have.

Sure, ptrace should provide a nice API to figure out that CS == 0x33
means long mode, but gdb could do a whole lot better here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 22:11               ` Andy Lutomirski
  2021-05-05 23:12                 ` Borislav Petkov
@ 2021-05-06  1:04                 ` Simon Marchi
  2021-05-06 15:11                   ` Andy Lutomirski
  2021-05-06  9:47                 ` David Laight
  2 siblings, 1 reply; 14+ messages in thread
From: Simon Marchi @ 2021-05-06  1:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stefan Metzmacher, Borislav Petkov, Peter Zijlstra,
	Linus Torvalds, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

On 2021-05-05 6:11 p.m., Andy Lutomirski wrote:
> For what it's worth, this is already fundamentally incorrect.  On
> x86_64 Linux, a process *does* *not* *have* an architecture.  Every
> task on an x86_64 Linux host has a full 64-bit register state.  The
> task can, and sometimes does, change CS using far transfers or other
> bizarre techniques, and neither the kernel nor GDB will be notified or
> have a chance to take any action in response.  ELF files can be
> 32-bit, CS:rIP can point at 32-bit code, and system calls can be
> 32-bit (even from 64-bit code), but *tasks* are not 32-bit.

Thanks for that explanation: I didn't know that "32-bit" tasks had the
same register state as any other task.  I never really looked into it,
but I was assuming that tasks were either 32-bit or 64-bit (based on the
ELF header of the file they exec'd or something) and that 32-bit tasks
had the register state as a task on an i386 machine would have.  And
that PEEKUSER would return the 64-bit register state for 64-bit tasks,
and 32-bit register state for 32-bit tasks.

I looked at how GDB reads registers from a "64-bit" task and a "32-bit"
task (I have to quote now, since I now know it's an abuse of
terminology) side by side.  And indeed, GDB reads a full 64-bit state in
both cases.  For the 32-bit case, it picks the 32-bit values from that
buffer.  For example, to get the eax value it picks the low 4 bytes of
rax (well, ax in user_regs_struct).

So I suppose that if GDB wanted to tell nothing but the truth, it would
present the full 64-bit register state to the user even when debugging a
32-bit program.  But at the end of the day, the typical user debugging a
32-bit program on a 64-bit probably just wants the illusion that they
are on i386.

> Now I realize that the ptrace() API is awful and makes life difficult
> in several respects for no good reason but, if gdb is ever interested
> in fixing its ideas about architecture to understand that all tasks,
> even those that think of themselves as "compat", have full 64-bit
> state, I would be more than willing to improve the ptrace() API as
> needed to make this work well.

Just wondering, do you have specific ptrace shortcomings in mind when
saying this?  As I found above, ptrace lets us read the whole 64-bit
register state.  After that it's up to us to analyze the state of the
program based on its registers and memory.  What more could ptrace give
us?

> Since I'm not holding my breath, please at least keep in mind that
> anything you do here is merely a heuristic, cannot be fully correct,
> and then whenever gdb determines that a thread group or a thread is
> "32-bit", gdb is actually deciding to operate in a degraded mode for
> that task, is not accurately representing the task state, and is at
> risk of crashing, malfunctioning, or crashing the inferior due to its
> incorrect assumptions.  If you have ever attached gdb to QEMU's
> gdbserver and tried to debug the early boot process of a 64-bit Linux
> kernel, you may have encountered this class of bugs.  gdb works very,
> very poorly for this use case.

Yes, that QEMU case comes up often.  I wish that things were better, but
the reality is that this is an edge case, it would require somebody with
that particular itch to scratch to work on GDB to improve that use case.
So as you said, don't hold your breath :).

I completely understand that GDB putting processes in the "32-bit" or
"64-bit" bin is not the right thing to do in general, from a kernel
perspective.  But it converged to this because it's enough for and
useful to the 99.9% of users who debug programs that don't do funky
things.  At least, it's good to know about it in case problems related
to this arise in the future.

Simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-05 22:11               ` Andy Lutomirski
  2021-05-05 23:12                 ` Borislav Petkov
  2021-05-06  1:04                 ` Simon Marchi
@ 2021-05-06  9:47                 ` David Laight
  2021-05-06  9:53                   ` David Laight
  2 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2021-05-06  9:47 UTC (permalink / raw)
  To: 'Andy Lutomirski', Simon Marchi
  Cc: Stefan Metzmacher, Borislav Petkov, Peter Zijlstra,
	Linus Torvalds, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

> (To avoid confusion, this is not a universal property of Linux.  arm64
> and arm32 tasks on an arm64 Linux host are different and cannot
> arbitrarily switch modes.)

Although there are patches lurking to change that.
(not from me).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-06  9:47                 ` David Laight
@ 2021-05-06  9:53                   ` David Laight
  0 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2021-05-06  9:53 UTC (permalink / raw)
  To: 'Andy Lutomirski', 'Simon Marchi'
  Cc: 'Stefan Metzmacher', 'Borislav Petkov',
	'Peter Zijlstra', 'Linus Torvalds',
	'Thomas Gleixner', 'Jens Axboe',
	'Linux Kernel Mailing List', 'io-uring',
	'the arch/x86 maintainers',
	'linux-toolchains@vger.kernel.org'

> > (To avoid confusion, this is not a universal property of Linux.  arm64
> > and arm32 tasks on an arm64 Linux host are different and cannot
> > arbitrarily switch modes.)
> 
> Although there are patches lurking to change that.
> (not from me).

Actually they may be just to allow 64bit tasks make 32bit system calls.
The code is almost certainly still 54bit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads
  2021-05-06  1:04                 ` Simon Marchi
@ 2021-05-06 15:11                   ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2021-05-06 15:11 UTC (permalink / raw)
  To: Simon Marchi
  Cc: Andy Lutomirski, Stefan Metzmacher, Borislav Petkov,
	Peter Zijlstra, Linus Torvalds, Thomas Gleixner, Jens Axboe,
	Linux Kernel Mailing List, io-uring, the arch/x86 maintainers,
	linux-toolchains

On Wed, May 5, 2021 at 6:04 PM Simon Marchi <simon.marchi@polymtl.ca> wrote:
>
> On 2021-05-05 6:11 p.m., Andy Lutomirski wrote:

> I looked at how GDB reads registers from a "64-bit" task and a "32-bit"
> task (I have to quote now, since I now know it's an abuse of
> terminology) side by side.  And indeed, GDB reads a full 64-bit state in
> both cases.  For the 32-bit case, it picks the 32-bit values from that
> buffer.  For example, to get the eax value it picks the low 4 bytes of
> rax (well, ax in user_regs_struct).
>
> So I suppose that if GDB wanted to tell nothing but the truth, it would
> present the full 64-bit register state to the user even when debugging a
> 32-bit program.  But at the end of the day, the typical user debugging a
> 32-bit program on a 64-bit probably just wants the illusion that they
> are on i386.

True.  I see no reason, especially by default, to show the extra
registers.  On the other hand, if the program switches modes, having
gdb notice would be nice.  And, if gdb handled this correctly, all
this io_uring stuff would be entirely moot.  The made-up register
state of the io_uring thread would have no bearing on the debugging of
other threads.

>
> > Now I realize that the ptrace() API is awful and makes life difficult
> > in several respects for no good reason but, if gdb is ever interested
> > in fixing its ideas about architecture to understand that all tasks,
> > even those that think of themselves as "compat", have full 64-bit
> > state, I would be more than willing to improve the ptrace() API as
> > needed to make this work well.
>
> Just wondering, do you have specific ptrace shortcomings in mind when
> saying this?  As I found above, ptrace lets us read the whole 64-bit
> register state.  After that it's up to us to analyze the state of the
> program based on its registers and memory.  What more could ptrace give
> us?

Two specific issues come to mind:

1. PTRACE_GETREGSET and PTRACE_SETREGSET are terminally broken.  See
the comment above task_user_regset_view() in arch/x86/kernel/ptrace.c.
We need a new version of those APIs that takes an e_machine parameter.
(I don't even see how you can call these APIs safely at all, short of
allocating a buffer with a guard page or intentionally over-allocating
and calculating the maximum possible size of buffer that could be used
in case of a screwup.)

2. There should be an API to either read the descriptor table or to
look up a specific descriptor.  How else are you supposed to know
whether CS.L is set?  (Keep in mind that 0x33 is not necessarily the
only long mode segment that gets used.  Linux on Xen PV has an extra
one.)

--Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-05-06 15:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <8735v3ex3h.ffs@nanos.tec.linutronix.de>
     [not found] ` <3C41339D-29A2-4AB1-958F-19DB0A92D8D7@amacapital.net>
     [not found]   ` <CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com>
2021-05-04  8:39     ` [PATCH] io_thread/x86: don't reset 'cs', 'ss', 'ds' and 'es' registers for io_threads Peter Zijlstra
2021-05-04 15:35       ` Borislav Petkov
2021-05-04 15:55         ` Simon Marchi
2021-05-05 11:29           ` Stefan Metzmacher
2021-05-05 21:59             ` Simon Marchi
2021-05-05 22:11               ` Andy Lutomirski
2021-05-05 23:12                 ` Borislav Petkov
2021-05-05 23:22                   ` Andy Lutomirski
2021-05-06  1:04                 ` Simon Marchi
2021-05-06 15:11                   ` Andy Lutomirski
2021-05-06  9:47                 ` David Laight
2021-05-06  9:53                   ` David Laight
2021-05-05 22:21               ` Stefan Metzmacher
2021-05-05 23:15                 ` Simon Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).