Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB
diff mbox series

Message ID CANFwon0oLO+qCtpewc=BxKBOm05aBMpV=yG0CxwW1isWHfnZqw@mail.gmail.com
State New, archived
Headers show
Series
  • Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB
Related show

Commit Message

Hui Zhu April 21, 2014, 4:19 p.m. UTC
#cat gdb.base/interrupt.c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

#ifdef SIGNALS
#include <signal.h>

static void
sigint_handler (int signo)
{
}
#endif

int
main ()
{
  char x;
  int nbytes;
#ifdef SIGNALS
  signal (SIGINT, sigint_handler);
#endif
  printf ("talk to me baby\n");
  while (1)
    {
      nbytes = read (0, &x, 1);
      if (nbytes < 0)
{
#ifdef EINTR
 if (errno != EINTR)
#endif
   {
     perror ("");
     return 1;
   }
}
      else if (nbytes == 0)
{
 printf ("end of file\n");
 exit (0);
}
      else
write (1, &x, 1);
    }
  return 0;
}

int
func1 ()
{
  return 4;
}
#gcc -g -m32 gdb.base/interrupt.c
#gdb ./a.out
(gdb) r
Starting program: /home/teawater/gdb/binutils-gdb/gdb/testsuite/a.out
talk to me baby
data
data
^C
Program received signal SIGINT, Interrupt.
0xf7ffd430 in __kernel_vsyscall ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xf7ffd430 in __kernel_vsyscall ()
(gdb) p func1()
$1 = 4
(gdb) c
Continuing.
Unknown error 512
[Inferior 1 (process 7953) exited with code 01]

      nbytes = read (0, &x, 1);
      if (nbytes < 0)
{
#ifdef EINTR
 if (errno != EINTR)
#endif
After GDB call a function "func1()" by hands, "read" will get
errno 512(ERESTARTSYS) that should handled by Linux kernel.

The root cause of this issue is:
When user use ctrl-c stop the inferior, the signal will be handled in
Linux kernel function "do_signal" in arch/x86/kernel/signal.c.
The inferior will be stoped by function "ptrace_stop".  The call trace is:
#0  freezable_schedule () at include/linux/freezer.h:172
#1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
    clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
    at kernel/signal.c:1920
#2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
    at kernel/signal.c:2157
#3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
    return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
    cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
#4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
    at arch/x86/kernel/signal.c:696
#5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
    unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
#6  <signal handler called>
#7  0x0000000000000000 in irq_stack_union ()

When GDB "call func1()", to control inferior execute the function func1()
and go back to old ip.  GDB need set all the registers by GDB function
"amd64_collect_native_gregset" that will zero-extend most of 32 bits registers
to 64 bits and set to inferior.
And execute from ptrace_stop and got back to do_signal.
current_thread_info()->status TS_COMPAT will be clean by function
"int_with_check" when it return to user space.

When GDB "continue", inferior will execute from ptrace_stop and got back
to do_signal again.
Because this signal interrupt a syscall, go back to function do_signal
will use function "syscall_get_error" check if this is a syscall and got
error:
static inline long syscall_get_error(struct task_struct *task,
    struct pt_regs *regs)
{
unsigned long error = regs->ax;
#ifdef CONFIG_IA32_EMULATION
/*
* TS_COMPAT is set for 32-bit syscall entries and then
* remains set until we return to user mode.
*/
if (task_thread_info(task)->status & TS_COMPAT)
/*
* Sign-extend the value so (int)-EFOO becomes (long)-EFOO
* and will match correctly in comparisons.
*/
error = (long) (int) error;
#endif
return IS_ERR_VALUE(error) ? error : 0;
}
Now ax is in 32 bits now, need sign-extend to 64 bits.  But
current_thread_info()->status TS_COMPAT is cleared when GDB call "call func1()".
Linux kernel don't know this is a 32 bits task and will not extend it.
Then -ERESTARTSYS is not be handled and go back to user space.

Then the syscall "read" get a errno in ERESTARTSYS.

To fix this issue, I tried to add a local variable to "do_signal" but
it is not works.  The stack is cleared before GDB "continue".
so I make a patch that add "test_thread_flag (TIF_IA32)" to syscall_get_error.

Signed-off-by: Hui Zhu <hui@codesourcery.com>
---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

H. Peter Anvin April 21, 2014, 4:33 p.m. UTC | #1
On 04/21/2014 09:19 AM, Hui Zhu wrote:
> }
> Now ax is in 32 bits now, need sign-extend to 64 bits.  But
> current_thread_info()->status TS_COMPAT is cleared when GDB call "call func1()".
> Linux kernel don't know this is a 32 bits task and will not extend it.
> Then -ERESTARTSYS is not be handled and go back to user space.
> 
> Then the syscall "read" get a errno in ERESTARTSYS.
> 
> To fix this issue, I tried to add a local variable to "do_signal" but
> it is not works.  The stack is cleared before GDB "continue".
> so I make a patch that add "test_thread_flag (TIF_IA32)" to syscall_get_error.
> 
> Signed-off-by: Hui Zhu <hui@codesourcery.com>
> ---
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -48,7 +48,8 @@ static inline long syscall_get_error(str
>   * TS_COMPAT is set for 32-bit syscall entries and then
>   * remains set until we return to user mode.
>   */
> - if (task_thread_info(task)->status & TS_COMPAT)
> + if ((task_thread_info(task)->status & TS_COMPAT)
> +    || test_thread_flag (TIF_IA32))
>   /*
>   * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
>   * and will match correctly in comparisons.
> 

No, this is definitely not the right fix.  Your description is
incredibly hard to follow, but I feel pretty strongly that the above is
at the very best a last resort fix.  TS_COMPAT is a local property
whereas TIF_IA32 is global; it is important to keep their respective
uses correct.  Mixing them is almost guaranteed to be just plain wrong.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Hui Zhu April 30, 2014, 3:44 a.m. UTC | #2
On Tue, Apr 22, 2014 at 12:33 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/21/2014 09:19 AM, Hui Zhu wrote:
>> }
>> Now ax is in 32 bits now, need sign-extend to 64 bits.  But
>> current_thread_info()->status TS_COMPAT is cleared when GDB call "call func1()".
>> Linux kernel don't know this is a 32 bits task and will not extend it.
>> Then -ERESTARTSYS is not be handled and go back to user space.
>>
>> Then the syscall "read" get a errno in ERESTARTSYS.
>>
>> To fix this issue, I tried to add a local variable to "do_signal" but
>> it is not works.  The stack is cleared before GDB "continue".
>> so I make a patch that add "test_thread_flag (TIF_IA32)" to syscall_get_error.
>>
>> Signed-off-by: Hui Zhu <hui@codesourcery.com>
>> ---
>> --- a/arch/x86/include/asm/syscall.h
>> +++ b/arch/x86/include/asm/syscall.h
>> @@ -48,7 +48,8 @@ static inline long syscall_get_error(str
>>   * TS_COMPAT is set for 32-bit syscall entries and then
>>   * remains set until we return to user mode.
>>   */
>> - if (task_thread_info(task)->status & TS_COMPAT)
>> + if ((task_thread_info(task)->status & TS_COMPAT)
>> +    || test_thread_flag (TIF_IA32))
>>   /*
>>   * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
>>   * and will match correctly in comparisons.
>>
>
> No, this is definitely not the right fix.  Your description is
> incredibly hard to follow, but I feel pretty strongly that the above is
> at the very best a last resort fix.  TS_COMPAT is a local property
> whereas TIF_IA32 is global; it is important to keep their respective
> uses correct.  Mixing them is almost guaranteed to be just plain wrong.
>
>         -hpa
>

I am sorry that the root cause of issue has something wrong.
The right root cause is:
When inferior call 32 bits syscall "read", Linux kernel function
"ia32_cstar_target" will set TS_COMPAT to current_thread_info->status.

syscall read is interrupt by ctrl-c.   Then the $rax will be set to
errno -512 in 64 bits.
And the inferior will be stopped by Linux kernel function ptrace_stop,
the call trace is:
#0  freezable_schedule () at include/linux/freezer.h:172
#1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
    clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
    at kernel/signal.c:1920
#2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
    at kernel/signal.c:2157
#3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
    return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
    cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
#4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
    at arch/x86/kernel/signal.c:696
#5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
    unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
#6  <signal handler called>
#7  0x0000000000000000 in irq_stack_union ()

After that, GDB can control the stopped inferior.
To call function "func1()" of inferior, GDB need:
Step 1, save current values of registers ($rax 0xfffffffffffffe00(64 bits -512)
is cut to 0xfffffe00(32 bits -512) because inferior is a 32 bits program).
Step 2, change the values of registers.
Step 3, Push a dummy frame to stack.
Step 4, set a breakpint in the return address.

When GDB resume the inferior, it will keep execut from ptrace_stop
with new values of registers that set by GDB.
And TS_COMPAT inside current_thread_info->status will be cleared when
inferior switch back to user space.

When function "func1()" return, inferior will be stoped by breakpoint
inferior will be stopped by Linux kernel function "ptrace_stop" again.
current_thread_info->status will not set TS_COMPAT when inferior swith
from user space to kernel space because breakpoint handler "int3" doesn't
has code for that.

GDB begin to set saved values of registers back to inferior that use
function "amd64_collect_native_gregset".  Because this function just
zero-extend each 32 bits value to 64 bits value before put them to inferior.
$rax's value is set to 0xfffffe00(32 bits -512) but not
0xfffffffffffffe00(64 bits -512).

When GDB continue syscall "read" that is interrupted by "ctrl-c", it will
keep execute from ptrace_stop without "TS_COMPAT".
Then in Linux kernel function "syscall_get_error", current_thread_info->status
doesn't have TS_COMPAT and $rax is 0xfffffe00(32 bits -512).  Then in
function do_signal will not handle this -ERESTARTSYS.

-ERESTARTSYS will be return back to inferior, that is why inferior got a
errno -ERESTARTSYS.

I made a new patch that before call do_notify_resume(will call do_signal)
in the int3 handler, set TS_COMPAT to status if this task is TIF_IA32.
Then after GDB call a function of inferior, it will still has TS_COMPAT.

Signed-off-by: Hui Zhu <hui@codesourcery.com>
---
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1545,7 +1545,14 @@ paranoid_userspace:
  ENABLE_INTERRUPTS(CLBR_NONE)
  xorl %esi,%esi /* arg2: oldset */
  movq %rsp,%rdi /* arg1: &pt_regs */
+        testl $_TIF_IA32,%ebx
+        jnz call_do_notify_resume
+        GET_THREAD_INFO(%rcx)
+        orl $TS_COMPAT,TI_status(%rcx)
+call_do_notify_resume:
  call do_notify_resume
+        GET_THREAD_INFO(%rcx)
+        andl    $~TS_COMPAT,TI_status(%rcx)
  DISABLE_INTERRUPTS(CLBR_NONE)
  TRACE_IRQS_OFF
  jmp paranoid_userspace
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
H. Peter Anvin April 30, 2014, 4:50 a.m. UTC | #3
On 04/29/2014 08:44 PM, Hui Zhu wrote:
> 
> I am sorry that the root cause of issue has something wrong.
> The right root cause is:
> When inferior call 32 bits syscall "read", Linux kernel function
> "ia32_cstar_target" will set TS_COMPAT to current_thread_info->status.
> 
> syscall read is interrupt by ctrl-c.   Then the $rax will be set to
> errno -512 in 64 bits.
> And the inferior will be stopped by Linux kernel function ptrace_stop,
> the call trace is:
> #0  freezable_schedule () at include/linux/freezer.h:172
> #1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
>     clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
>     at kernel/signal.c:1920
> #2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
>     at kernel/signal.c:2157
> #3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
>     return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
>     cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
> #4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
>     at arch/x86/kernel/signal.c:696
> #5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
>     unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
> #6  <signal handler called>
> #7  0x0000000000000000 in irq_stack_union ()
> 
> After that, GDB can control the stopped inferior.
> To call function "func1()" of inferior, GDB need:
> Step 1, save current values of registers ($rax 0xfffffffffffffe00(64 bits -512)
> is cut to 0xfffffe00(32 bits -512) because inferior is a 32 bits program).

So gdb just corrupted the system state.

> Step 2, change the values of registers.
> Step 3, Push a dummy frame to stack.
> Step 4, set a breakpint in the return address.
> 
> When GDB resume the inferior, it will keep execut from ptrace_stop
> with new values of registers that set by GDB.

> And TS_COMPAT inside current_thread_info->status will be cleared when
> inferior switch back to user space.

As it should, because TS_COMPAT *only is meaningful while a system call
is executing*.

> When function "func1()" return, inferior will be stoped by breakpoint
> inferior will be stopped by Linux kernel function "ptrace_stop" again.
> current_thread_info->status will not set TS_COMPAT when inferior swith
> from user space to kernel space because breakpoint handler "int3" doesn't
> has code for that.

As it shouldn't, because there is no system call entry involved.

> GDB begin to set saved values of registers back to inferior that use
> function "amd64_collect_native_gregset".  Because this function just
> zero-extend each 32 bits value to 64 bits value before put them to inferior.
> $rax's value is set to 0xfffffe00(32 bits -512) but not
> 0xfffffffffffffe00(64 bits -512).
>
> When GDB continue syscall "read" that is interrupted by "ctrl-c", it will
> keep execute from ptrace_stop without "TS_COMPAT".

gdb has corrupted the state, and it fails to execute.

I'm wondering if we need to add additional state here, to carry the
TS_COMPAT bit.  We have talked about this kind of issues in the past.

> Then in Linux kernel function "syscall_get_error", current_thread_info->status
> doesn't have TS_COMPAT and $rax is 0xfffffe00(32 bits -512).  Then in
> function do_signal will not handle this -ERESTARTSYS.
> 
> -ERESTARTSYS will be return back to inferior, that is why inferior got a
> errno -ERESTARTSYS.
> 
> I made a new patch that before call do_notify_resume(will call do_signal)
> in the int3 handler, set TS_COMPAT to status if this task is TIF_IA32.
> Then after GDB call a function of inferior, it will still has TS_COMPAT.

I'm not sure if I want to label this a gdb bug or not (my main feeling
is that gdb should save and restore the register set presented to it,
and that truncating values to 32 bits is the root of the problem), but
the above is definitely a hack which doesn't really address the real
problem.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Andrew Pinski April 30, 2014, 5:08 a.m. UTC | #4
On Tue, Apr 29, 2014 at 9:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/29/2014 08:44 PM, Hui Zhu wrote:
>>
>> I am sorry that the root cause of issue has something wrong.
>> The right root cause is:
>> When inferior call 32 bits syscall "read", Linux kernel function
>> "ia32_cstar_target" will set TS_COMPAT to current_thread_info->status.
>>
>> syscall read is interrupt by ctrl-c.   Then the $rax will be set to
>> errno -512 in 64 bits.
>> And the inferior will be stopped by Linux kernel function ptrace_stop,
>> the call trace is:
>> #0  freezable_schedule () at include/linux/freezer.h:172
>> #1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
>>     clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
>>     at kernel/signal.c:1920
>> #2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
>>     at kernel/signal.c:2157
>> #3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
>>     return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
>>     cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
>> #4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
>>     at arch/x86/kernel/signal.c:696
>> #5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
>>     unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
>> #6  <signal handler called>
>> #7  0x0000000000000000 in irq_stack_union ()
>>
>> After that, GDB can control the stopped inferior.
>> To call function "func1()" of inferior, GDB need:
>> Step 1, save current values of registers ($rax 0xfffffffffffffe00(64 bits -512)
>> is cut to 0xfffffe00(32 bits -512) because inferior is a 32 bits program).
>
> So gdb just corrupted the system state.

Except GDB in 32bit mode does not know the registers are full 64bits so ...

>
>> Step 2, change the values of registers.
>> Step 3, Push a dummy frame to stack.
>> Step 4, set a breakpint in the return address.
>>
>> When GDB resume the inferior, it will keep execut from ptrace_stop
>> with new values of registers that set by GDB.
>
>> And TS_COMPAT inside current_thread_info->status will be cleared when
>> inferior switch back to user space.
>
> As it should, because TS_COMPAT *only is meaningful while a system call
> is executing*.
>
>> When function "func1()" return, inferior will be stoped by breakpoint
>> inferior will be stopped by Linux kernel function "ptrace_stop" again.
>> current_thread_info->status will not set TS_COMPAT when inferior swith
>> from user space to kernel space because breakpoint handler "int3" doesn't
>> has code for that.
>
> As it shouldn't, because there is no system call entry involved.
>
>> GDB begin to set saved values of registers back to inferior that use
>> function "amd64_collect_native_gregset".  Because this function just
>> zero-extend each 32 bits value to 64 bits value before put them to inferior.
>> $rax's value is set to 0xfffffe00(32 bits -512) but not
>> 0xfffffffffffffe00(64 bits -512).
>>
>> When GDB continue syscall "read" that is interrupted by "ctrl-c", it will
>> keep execute from ptrace_stop without "TS_COMPAT".
>
> gdb has corrupted the state, and it fails to execute.
>
> I'm wondering if we need to add additional state here, to carry the
> TS_COMPAT bit.  We have talked about this kind of issues in the past.
>
>> Then in Linux kernel function "syscall_get_error", current_thread_info->status
>> doesn't have TS_COMPAT and $rax is 0xfffffe00(32 bits -512).  Then in
>> function do_signal will not handle this -ERESTARTSYS.
>>
>> -ERESTARTSYS will be return back to inferior, that is why inferior got a
>> errno -ERESTARTSYS.
>>
>> I made a new patch that before call do_notify_resume(will call do_signal)
>> in the int3 handler, set TS_COMPAT to status if this task is TIF_IA32.
>> Then after GDB call a function of inferior, it will still has TS_COMPAT.
>
> I'm not sure if I want to label this a gdb bug or not (my main feeling
> is that gdb should save and restore the register set presented to it,
> and that truncating values to 32 bits is the root of the problem), but
> the above is definitely a hack which doesn't really address the real
> problem.

restoring the values is hard since even the ptrace interface does not
allow for that.

Thanks,
Andrew Pinski

>
>         -hpa
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
H. Peter Anvin April 30, 2014, 5:10 a.m. UTC | #5
On 04/29/2014 10:08 PM, Andrew Pinski wrote:
> 
> restoring the values is hard since even the ptrace interface does not
> allow for that.
> 

So that begs the ultimate question, which is: given the fact that there
is *state missing* from the state vector (this is the core of the
problem), is there a way we can add that state so that gdb will be able
to save and restore it?

	-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mark Kettenis April 30, 2014, 1:35 p.m. UTC | #6
> Date: Tue, 29 Apr 2014 22:10:15 -0700
> From: "H. Peter Anvin" <hpa@zytor.com>
> 
> On 04/29/2014 10:08 PM, Andrew Pinski wrote:
> > 
> > restoring the values is hard since even the ptrace interface does not
> > allow for that.
> > 
> 
> So that begs the ultimate question, which is: given the fact that there
> is *state missing* from the state vector (this is the core of the
> problem), is there a way we can add that state so that gdb will be able
> to save and restore it?

Carrying around additional state in GDB is complicated; I'd rather
avoid it.

arch/x86/kernel/ptrace.c:putreg32() has this bit of code:

        case offsetof(struct user32, regs.orig_eax):
                /*
                 * A 32-bit debugger setting orig_eax means to restore
                 * the state of the task restarting a 32-bit syscall.
                 * Make sure we interpret the -ERESTART* codes correctly
                 * in case the task is not actually still sitting at the
                 * exit from a 32-bit syscall with TS_COMPAT still set.
                 */
                regs->orig_ax = value;
                if (syscall_get_nr(child, regs) >= 0)
                        task_thread_info(child)->status |= TS_COMPAT;
                break;

which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
should be added to putreg() if the child is a 32-bit process?

If (and only if) the goal of that TS_COMPAT flag solely is to trigger
the error code sign-extension in arch/x86/asm/syscall.h:syscall_get_error(),
we could work around to problem in GDB by checking "orig_ax" to see if
we're continuing an interrupted system call and sign extend the error
code in the real "eax" register if we are.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Hui Zhu April 30, 2014, 4:28 p.m. UTC | #7
On Wed, Apr 30, 2014 at 9:35 PM, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>> Date: Tue, 29 Apr 2014 22:10:15 -0700
>> From: "H. Peter Anvin" <hpa@zytor.com>
>>
>> On 04/29/2014 10:08 PM, Andrew Pinski wrote:
>> >
>> > restoring the values is hard since even the ptrace interface does not
>> > allow for that.
>> >
>>
>> So that begs the ultimate question, which is: given the fact that there
>> is *state missing* from the state vector (this is the core of the
>> problem), is there a way we can add that state so that gdb will be able
>> to save and restore it?
>
> Carrying around additional state in GDB is complicated; I'd rather
> avoid it.
>
> arch/x86/kernel/ptrace.c:putreg32() has this bit of code:
>
>         case offsetof(struct user32, regs.orig_eax):
>                 /*
>                  * A 32-bit debugger setting orig_eax means to restore
>                  * the state of the task restarting a 32-bit syscall.
>                  * Make sure we interpret the -ERESTART* codes correctly
>                  * in case the task is not actually still sitting at the
>                  * exit from a 32-bit syscall with TS_COMPAT still set.
>                  */
>                 regs->orig_ax = value;
>                 if (syscall_get_nr(child, regs) >= 0)
>                         task_thread_info(child)->status |= TS_COMPAT;
>                 break;
>
> which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
> should be added to putreg() if the child is a 32-bit process?

Make a new patch that add following code to putreg():
case offsetof(struct user_regs_struct, orig_ax):
/*
* A 64-bit debugger setting orig_ax of a 32-bit inferior
* means to restore the state of the task restarting a
* 32-bit syscall.
* Make sure we interpret the -ERESTART* codes correctly
* in case the task is not actually still sitting at the
* exit from a 32-bit syscall with TS_COMPAT still set.
*/
if (test_ti_thread_flag(task_thread_info(child), TIF_IA32)) {
struct pt_regs *regs = task_pt_regs(child);
regs->orig_ax = value;
if (syscall_get_nr(child, regs) >= 0)
task_thread_info(child)->status |= TS_COMPAT;
return 0;
}

>
> If (and only if) the goal of that TS_COMPAT flag solely is to trigger
> the error code sign-extension in arch/x86/asm/syscall.h:syscall_get_error(),
> we could work around to problem in GDB by checking "orig_ax" to see if
> we're continuing an interrupted system call and sign extend the error
> code in the real "eax" register if we are.

I will update patch in
https://sourceware.org/ml/gdb-patches/2014-04/msg00429.html
according to this comments.

Thanks,
Hui

Signed-off-by: Hui Zhu <hui@codesourcery.com>
---
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -452,6 +452,23 @@ static int putreg(struct task_struct *ch
  if (child->thread.gs != value)
  return do_arch_prctl(child, ARCH_SET_GS, value);
  return 0;
+ case offsetof(struct user_regs_struct, orig_ax):
+ /*
+ * A 64-bit debugger setting orig_ax of a 32-bit inferior
+ * means to restore the state of the task restarting a
+ * 32-bit syscall.
+ * Make sure we interpret the -ERESTART* codes correctly
+ * in case the task is not actually still sitting at the
+ * exit from a 32-bit syscall with TS_COMPAT still set.
+ */
+ if (test_ti_thread_flag(task_thread_info(child), TIF_IA32)) {
+ struct pt_regs *regs = task_pt_regs(child);
+ regs->orig_ax = value;
+ if (syscall_get_nr(child, regs) >= 0)
+ task_thread_info(child)->status |= TS_COMPAT;
+ return 0;
+ }
+ break;
 #endif
  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
H. Peter Anvin April 30, 2014, 4:35 p.m. UTC | #8
On 04/30/2014 06:35 AM, Mark Kettenis wrote:
> 
> If (and only if) the goal of that TS_COMPAT flag solely is to trigger
> the error code sign-extension in arch/x86/asm/syscall.h:syscall_get_error(),
> we could work around to problem in GDB by checking "orig_ax" to see if
> we're continuing an interrupted system call and sign extend the error
> code in the real "eax" register if we are.
> 

Well... this "if and only if" clause is false, although it might work
specifically in this context.  I need to do a writeup about this... I
should try to do that today.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Hui Zhu April 30, 2014, 4:35 p.m. UTC | #9
On 05/01/14 00:28, Hui Zhu wrote:
> On Wed, Apr 30, 2014 at 9:35  PM, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
 >>> Date: Tue, 29 Apr 2014 22:10:15 -0700
 >>> From: "H. Peter Anvin" <hpa@zytor.com>
 >>>
 >>> On 04/29/2014 10:08 PM, Andrew Pinski wrote:
 >>>>
 >>>> restoring the values is hard since even the ptrace interface does not
 >>>> allow for that.
 >>>>
 >>>
 >>> So that begs the ultimate question, which is: given the fact that there
 >>> is *state missing* from the state vector (this is the core of the
 >>> problem), is there a way we can add that state so that gdb will be able
 >>> to save and restore it?
 >>
 >> Carrying around additional state in GDB is complicated; I'd rather
 >> avoid it.
 >>
 >> arch/x86/kernel/ptrace.c:putreg32() has this bit of code:
 >>
 >>         case offsetof(struct user32, regs.orig_eax):
 >>                 /*
 >>                  * A 32-bit debugger setting orig_eax means to restore
 >>                  * the state of the task restarting a 32-bit syscall.
 >>                  * Make sure we interpret the -ERESTART* codes correctly
 >>                  * in case the task is not actually still sitting at the
 >>                  * exit from a 32-bit syscall with TS_COMPAT still set.
 >>                  */
 >>                 regs->orig_ax = value;
 >>                 if (syscall_get_nr(child, regs) >= 0)
 >> task_thread_info(child)->status |= TS_COMPAT;
 >>                 break;
 >>
 >> which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
 >> should be added to putreg() if the child is a 32-bit process?
 >
 > Make a new patch that add following code to putreg():
 > case offsetof(struct user_regs_struct, orig_ax):
 > /*
 > * A 64-bit debugger setting orig_ax of a 32-bit inferior
 > * means to restore the state of the task restarting a
 > * 32-bit syscall.
 > * Make sure we interpret the -ERESTART* codes correctly
 > * in case the task is not actually still sitting at the
 > * exit from a 32-bit syscall with TS_COMPAT still set.
 > */
 > if (test_ti_thread_flag(task_thread_info(child), TIF_IA32)) {
 > struct pt_regs *regs = task_pt_regs(child);
 > regs->orig_ax = value;
 > if (syscall_get_nr(child, regs) >= 0)
 > task_thread_info(child)->status |= TS_COMPAT;
 > return 0;
 > }
 >
 >>
 >> If (and only if) the goal of that TS_COMPAT flag solely is to trigger
 >> the error code sign-extension in 
arch/x86/asm/syscall.h:syscall_get_error(),
 >> we could work around to problem in GDB by checking "orig_ax" to see if
 >> we're continuing an interrupted system call and sign extend the error
 >> code in the real "eax" register if we are.
 >
 > I will update patch in
 > https://sourceware.org/ml/gdb-patches/2014-04/msg00429.html
 > according to this comments.
 >
 > Thanks,
 > Hui
 >

I sorry that previous patch has some format issue, post a new one.

Signed-off-by: Hui Zhu <hui@codesourcery.com>
---
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -452,6 +452,23 @@ static int putreg(struct task_struct *ch
          if (child->thread.gs != value)
              return do_arch_prctl(child, ARCH_SET_GS, value);
          return 0;
+    case offsetof(struct user_regs_struct, orig_ax):
+        /*
+         * A 64-bit debugger setting orig_ax of a 32-bit inferior
+         * means to restore the state of the task restarting a
+         * 32-bit syscall.
+         * Make sure we interpret the -ERESTART* codes correctly
+         * in case the task is not actually still sitting at the
+         * exit from a 32-bit syscall with TS_COMPAT still set.
+         */
+        if (test_ti_thread_flag(task_thread_info(child), TIF_IA32)) {
+            struct pt_regs *regs = task_pt_regs(child);
+            regs->orig_ax = value;
+            if (syscall_get_nr(child, regs) >= 0)
+                task_thread_info(child)->status |= TS_COMPAT;
+            return 0;
+        }
+        break;
  #endif
      }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Pedro Alves April 30, 2014, 5:49 p.m. UTC | #10
On 04/30/2014 02:35 PM, Mark Kettenis wrote:
>> Date: Tue, 29 Apr 2014 22:10:15 -0700
>> From: "H. Peter Anvin" <hpa@zytor.com>
>>
>> On 04/29/2014 10:08 PM, Andrew Pinski wrote:
>>>
>>> restoring the values is hard since even the ptrace interface does not
>>> allow for that.
>>>
>>
>> So that begs the ultimate question, which is: given the fact that there
>> is *state missing* from the state vector (this is the core of the
>> problem), is there a way we can add that state so that gdb will be able
>> to save and restore it?
> 
> Carrying around additional state in GDB is complicated; I'd rather
> avoid it.

Agreed very much.

> arch/x86/kernel/ptrace.c:putreg32() has this bit of code:
> 
>         case offsetof(struct user32, regs.orig_eax):
>                 /*
>                  * A 32-bit debugger setting orig_eax means to restore
>                  * the state of the task restarting a 32-bit syscall.
>                  * Make sure we interpret the -ERESTART* codes correctly
>                  * in case the task is not actually still sitting at the
>                  * exit from a 32-bit syscall with TS_COMPAT still set.
>                  */
>                 regs->orig_ax = value;
>                 if (syscall_get_nr(child, regs) >= 0)
>                         task_thread_info(child)->status |= TS_COMPAT;
>                 break;
> 
> which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
> should be added to putreg() if the child is a 32-bit process?

Sounds like the right fix to me.
H. Peter Anvin April 30, 2014, 8:43 p.m. UTC | #11
On 04/30/2014 09:35 AM, Hui Zhu wrote:
> 
> I sorry that previous patch has some format issue, post a new one.
> 
> Signed-off-by: Hui Zhu <hui@codesourcery.com>
> ---
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -452,6 +452,23 @@ static int putreg(struct task_struct *ch
>          if (child->thread.gs != value)
>              return do_arch_prctl(child, ARCH_SET_GS, value);
>          return 0;
> +    case offsetof(struct user_regs_struct, orig_ax):
> +        /*
> +         * A 64-bit debugger setting orig_ax of a 32-bit inferior
> +         * means to restore the state of the task restarting a
> +         * 32-bit syscall.
> +         * Make sure we interpret the -ERESTART* codes correctly
> +         * in case the task is not actually still sitting at the
> +         * exit from a 32-bit syscall with TS_COMPAT still set.
> +         */
> +        if (test_ti_thread_flag(task_thread_info(child), TIF_IA32)) {
> +            struct pt_regs *regs = task_pt_regs(child);
> +            regs->orig_ax = value;
> +            if (syscall_get_nr(child, regs) >= 0)
> +                task_thread_info(child)->status |= TS_COMPAT;
> +            return 0;
> +        }
> +        break;
>  #endif
>      }

You still seems to have botched whitespace, and no patch description in
the same email so it can be automated.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
H. Peter Anvin April 30, 2014, 8:44 p.m. UTC | #12
On 04/30/2014 06:35 AM, Mark Kettenis wrote:
> 
> arch/x86/kernel/ptrace.c:putreg32() has this bit of code:
> 
>         case offsetof(struct user32, regs.orig_eax):
>                 /*
>                  * A 32-bit debugger setting orig_eax means to restore
>                  * the state of the task restarting a 32-bit syscall.
>                  * Make sure we interpret the -ERESTART* codes correctly
>                  * in case the task is not actually still sitting at the
>                  * exit from a 32-bit syscall with TS_COMPAT still set.
>                  */
>                 regs->orig_ax = value;
>                 if (syscall_get_nr(child, regs) >= 0)
>                         task_thread_info(child)->status |= TS_COMPAT;
>                 break;
> 
> which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
> should be added to putreg() if the child is a 32-bit process?
> 

This seems a lot saner although I haven't thought about some of the
consequences.

> If (and only if) the goal of that TS_COMPAT flag solely is to trigger
> the error code sign-extension in arch/x86/asm/syscall.h:syscall_get_error(),
> we could work around to problem in GDB by checking "orig_ax" to see if
> we're continuing an interrupted system call and sign extend the error
> code in the real "eax" register if we are.

So to clarify the meaning of all these flags, here is the first cut of
my writeup on the topic:

The x86-64 Linux kernel supports three types of processes: legacy i386
32-bit processes, x32 32-bit processes, or x86-64 64-bit processes.
It is a common question how to know what kind of process is currently
executing.  However, it turns out this question is not as
straightforward as it first seems, and it is important to keep the
various aspects of this in mind, or it is often easy to make wrong
design choices.

Mixing and matching these three different meanings is harmful.


1. The initial execution mode

The initial execution mode is determined by the type of initial
executable (usually, but not always, ELF).  Inside the kernel, this is
represented by the flags TIF_IA32, TIF_X32 and TIF_ADDR32.
Specifically, for an i386 process TIF_IA32 and TIF_ADDR32 will be set,
for an x32 process TIF_X32 and TIF_ADDR32 will be set, and for an
x86-64 process none will be set.

These flags control the format of signal stack frames, core dumps, and
(TIF_ADDR32) whether or not the kernel will allocate address space
above 4 GiB on behalf of the process.

Currently, these cannot be changed, in the future it may be possible
for the process to request a change at runtime.

The use of this bit for signal stack frames is a bit unfortunate.
There may be a better solution in the future like remembering the type
of system call that configured the stack frame.  In general as little
as possible should depend on this mode.


2. The current execution mode

The actual execution mode (16, 32, or 64 bits) of the processor is
changeable in user space simply by executing a far transfer
instruction (to get into 16-bit mode, the modify_ldt system call needs
to enable suitable segments).  At first instruction, an i386 process
will run in 32-bit mode and an x32 or x86-64 process will run in
64-bit mode, but there is no restriction preventing the process from
changing its execution mode.

Therefore, the execution mode may very well be different from process
entry and also different from the last time the process went through
the kernel.

Currently the kernel does not allow an LDT segment to be a 64-bit
segment.  This means that the user space process will have %cs ==
USER_CS (currently 0x33) whenever it is running in 64-bit mode.


3. The system call type

i386 processes will execute system calls via int $0x80, syscall32 or
sysenter32, depending on the processor.  x32 and x86-64 processes both
execute system calls via the syscall64 instruction.

Because i386 system call numbers are unrelated to and overlap x86-64
system call numbers, they are distinguished via the type of entry,
which is recorded in the form of the TS_COMPAT flag.  That is, to
uniquely know what system call has been executed, you need both the
status of the TS_COMPAT flag and the system call number recorded in
the orig_ax field of pt_regs.

x32 and x86-64 processes both run in 64-bit mode and both use the
syscall64 instruction, so x32 processes are distinguished by setting
bit 30 in the system call number.  Sometimes the base system call
number is also different for x32.  It is not currently illegal, but
not generally useful, for bit 30 to not match the system call number
for a system call number that is not shared between x32 and x86-64.
This may be made illegal in the future, if the need arises.

The type of system call (i386, x32 or x86-64) determines not just the
meaning of the system call number, but also changes the semantics of
some system calls, for example, a read() of certain files in the input
system will return different values, and some system calls use
different data structures in different modes.

It is worth noting that there is absolutely nothing that prevents a
64-bit process to execute int $0x80 and execute an i386 compatibility
system call.  Such a system call will run with TS_COMPAT set and will
use i386 data structures and semantics.

The TS_COMPAT bit has no meaning (and should always be clear) when no
system call is executing.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -48,7 +48,8 @@  static inline long syscall_get_error(str
  * TS_COMPAT is set for 32-bit syscall entries and then
  * remains set until we return to user mode.
  */
- if (task_thread_info(task)->status & TS_COMPAT)
+ if ((task_thread_info(task)->status & TS_COMPAT)
+    || test_thread_flag (TIF_IA32))
  /*
  * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
  * and will match correctly in comparisons.