* [PATCH] signal/x86: Delay calling signals in atomic
@ 2022-02-14 20:19 Sebastian Andrzej Siewior
  2022-03-28 14:25 ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-14 20:19 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Oleg Nesterov, H. Peter Anvin, Andy Lutomirski, Ben Segall,
	Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Vincent Guittot

From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 14 Jul 2015 14:26:34 +0200

On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per-CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack, corrupting it and crashing the kernel
on return.

When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping locks,
and one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spinlock_t lock that has been converted to a
sleeping lock. If this happens, the stack corruption described above
becomes possible.

Instead of sending the signal right away, on PREEMPT_RT and x86 the
signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then, on exit from the trap, the signal resume
code sends the signal once preemption is enabled again.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
  ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
[bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/signal.h | 13 +++++++++++++
 include/linux/sched.h         |  3 +++
 kernel/entry/common.c         |  9 +++++++++
 kernel/signal.c               | 28 ++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 2dfb5fea13aff..fc03f4f7ed84c 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -28,6 +28,19 @@ typedef struct {
 #define SA_IA32_ABI	0x02000000u
 #define SA_X32_ABI	0x01000000u
 
+/*
+ * Because some traps use the IST stack, we must keep preemption
+ * disabled while calling do_trap(), but do_trap() may call
+ * force_sig_info() which will grab the signal spin_locks for the
+ * task, which in PREEMPT_RT are mutexes.  By defining
+ * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
+ * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
+ * trap.
+ */
+#if defined(CONFIG_PREEMPT_RT)
+#define ARCH_RT_DELAYS_SIGNAL_SEND
+#endif
+
 #ifndef CONFIG_COMPAT
 #define compat_sigset_t compat_sigset_t
 typedef sigset_t compat_sigset_t;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248b..0514237cee3fc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,9 @@ struct task_struct {
 	/* Restored if set_restore_sigmask() was used: */
 	sigset_t			saved_sigmask;
 	struct sigpending		pending;
+#ifdef CONFIG_PREEMPT_RT
+	struct				kernel_siginfo forced_info;
+#endif
 	unsigned long			sas_ss_sp;
 	size_t				sas_ss_size;
 	unsigned int			sas_ss_flags;
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e3..216dbf46e05f5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
+#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
+		if (unlikely(current->forced_info.si_signo)) {
+			struct task_struct *t = current;
+
+			force_sig_info(&t->forced_info);
+			t->forced_info.si_signo = 0;
+		}
+#endif
+
 		if (ti_work & _TIF_UPROBE)
 			uprobe_notify_resume(regs);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 9b04631acde8f..cb2b28c17c0a5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	struct k_sigaction *action;
 	int sig = info->si_signo;
 
+	/*
+	 * On some archs, PREEMPT_RT has to delay sending a signal from a trap
+	 * since it can not enable preemption, and the signal code's spin_locks
+	 * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
+	 * send the signal on exit of the trap.
+	 */
+#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
+	if (in_atomic()) {
+		struct task_struct *t = current;
+
+		if (WARN_ON_ONCE(t->forced_info.si_signo))
+			return 0;
+
+		if (is_si_special(info)) {
+			WARN_ON_ONCE(info != SEND_SIG_PRIV);
+			t->forced_info.si_signo = info->si_signo;
+			t->forced_info.si_errno = 0;
+			t->forced_info.si_code = SI_KERNEL;
+			t->forced_info.si_pid = 0;
+			t->forced_info.si_uid = 0;
+		} else {
+			t->forced_info = *info;
+		}
+
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+		return 0;
+	}
+#endif
 	spin_lock_irqsave(&t->sighand->siglock, flags);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
-- 
2.34.1


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-02-14 20:19 [PATCH] signal/x86: Delay calling signals in atomic Sebastian Andrzej Siewior
@ 2022-03-28 14:25 ` Eric W. Biederman
  2022-03-28 14:41   ` Eric W. Biederman
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Eric W. Biederman @ 2022-03-28 14:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 14 Jul 2015 14:26:34 +0200
>
> On x86_64 we must disable preemption before we enable interrupts
> for stack faults, int3 and debugging, because the current task is using
> a per CPU debug stack defined by the IST. If we schedule out, another task
> can come in and use the same stack and cause the stack to be corrupted
> and crash the kernel on return.
>
> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
> one of these is the spin lock used in signal handling.
>
> Some of the debug code (int3) causes do_trap() to send a signal.
> This function calls a spinlock_t lock that has been converted to a
> sleeping lock. If this happens, the above issues with the corrupted
> stack is possible.
>
> Instead of calling the signal right away, for PREEMPT_RT and x86,
> the signal information is stored on the stacks task_struct and
> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
> code will send the signal when preemption is enabled.

Folks I really would have appreciated being copied on a signal handling
patch like this.

It is too late to nack, but I think this buggy patch deserved one.  Can
we please fix PREEMPT_RT instead?

As far as I can tell this violates all of the rules for
implementing/maintaining the RT kernel.  Instead of coming up with new
abstractions that make sense and can be used by everyone, this introduces
a hack only for PREEMPT_RT, and a pretty horrible one at that.

This talks about int3, but the code looks for in_atomic(), which means
that essentially every call of force_sig will take this path, as they
almost all come from exception handlers.  That is the nature of signals
that report faults: an exception is raised and the kernel reports it
to userspace with a fault signal (aka force_sig_xxx).

Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
flag to set to enter exit_to_usermode_loop.  TIF_NOTIFY_RESUME is
about what happens after signal handling.  This very much needs to be
TIF_SIGPENDING, with recalc_sigpending and friends updated to know about
"task->forced_info".

Does someone own this problem?  Can that person please fix this
properly?

I really don't think it is going to be maintainable for PREEMPT_RT to
maintain a separate signal delivery path for faults from the rest of
linux.


Eric

> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>   ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/signal.h | 13 +++++++++++++
>  include/linux/sched.h         |  3 +++
>  kernel/entry/common.c         |  9 +++++++++
>  kernel/signal.c               | 28 ++++++++++++++++++++++++++++
>  4 files changed, 53 insertions(+)
>
> diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
> index 2dfb5fea13aff..fc03f4f7ed84c 100644
> --- a/arch/x86/include/asm/signal.h
> +++ b/arch/x86/include/asm/signal.h
> @@ -28,6 +28,19 @@ typedef struct {
>  #define SA_IA32_ABI	0x02000000u
>  #define SA_X32_ABI	0x01000000u
>  
> +/*
> + * Because some traps use the IST stack, we must keep preemption
> + * disabled while calling do_trap(), but do_trap() may call
> + * force_sig_info() which will grab the signal spin_locks for the
> + * task, which in PREEMPT_RT are mutexes.  By defining
> + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
> + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
> + * trap.
> + */
> +#if defined(CONFIG_PREEMPT_RT)
> +#define ARCH_RT_DELAYS_SIGNAL_SEND
> +#endif
> +
>  #ifndef CONFIG_COMPAT
>  #define compat_sigset_t compat_sigset_t
>  typedef sigset_t compat_sigset_t;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 75ba8aa60248b..0514237cee3fc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1087,6 +1087,9 @@ struct task_struct {
>  	/* Restored if set_restore_sigmask() was used: */
>  	sigset_t			saved_sigmask;
>  	struct sigpending		pending;
> +#ifdef CONFIG_PREEMPT_RT
> +	struct				kernel_siginfo forced_info;
> +#endif
>  	unsigned long			sas_ss_sp;
>  	size_t				sas_ss_size;
>  	unsigned int			sas_ss_flags;
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index bad713684c2e3..216dbf46e05f5 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  		if (ti_work & _TIF_NEED_RESCHED)
>  			schedule();
>  
> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
> +		if (unlikely(current->forced_info.si_signo)) {
> +			struct task_struct *t = current;
> +
> +			force_sig_info(&t->forced_info);
> +			t->forced_info.si_signo = 0;
> +		}
> +#endif
> +
>  		if (ti_work & _TIF_UPROBE)
>  			uprobe_notify_resume(regs);
>  
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9b04631acde8f..cb2b28c17c0a5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>  	struct k_sigaction *action;
>  	int sig = info->si_signo;
>  
> +	/*
> +	 * On some archs, PREEMPT_RT has to delay sending a signal from a trap
> +	 * since it can not enable preemption, and the signal code's spin_locks
> +	 * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
> +	 * send the signal on exit of the trap.
> +	 */
> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
> +	if (in_atomic()) {
> +		struct task_struct *t = current;
> +
> +		if (WARN_ON_ONCE(t->forced_info.si_signo))
> +			return 0;
> +
> +		if (is_si_special(info)) {
> +			WARN_ON_ONCE(info != SEND_SIG_PRIV);
> +			t->forced_info.si_signo = info->si_signo;
> +			t->forced_info.si_errno = 0;
> +			t->forced_info.si_code = SI_KERNEL;
> +			t->forced_info.si_pid = 0;
> +			t->forced_info.si_uid = 0;
> +		} else {
> +			t->forced_info = *info;
> +		}
> +
> +		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
> +		return 0;
> +	}
> +#endif
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>  	action = &t->sighand->action[sig-1];
>  	ignored = action->sa.sa_handler == SIG_IGN;


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 14:25 ` Eric W. Biederman
@ 2022-03-28 14:41   ` Eric W. Biederman
  2022-03-28 16:32     ` Sebastian Andrzej Siewior
  2022-03-28 14:48   ` Eric W. Biederman
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-03-28 14:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Tue, 14 Jul 2015 14:26:34 +0200
>>
>> On x86_64 we must disable preemption before we enable interrupts
>> for stack faults, int3 and debugging, because the current task is using
>> a per CPU debug stack defined by the IST. If we schedule out, another task
>> can come in and use the same stack and cause the stack to be corrupted
>> and crash the kernel on return.
>>
>> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
>> one of these is the spin lock used in signal handling.
>>
>> Some of the debug code (int3) causes do_trap() to send a signal.
>> This function calls a spinlock_t lock that has been converted to a
>> sleeping lock. If this happens, the above issues with the corrupted
>> stack is possible.
>>
>> Instead of calling the signal right away, for PREEMPT_RT and x86,
>> the signal information is stored on the stacks task_struct and
>> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
>> code will send the signal when preemption is enabled.
>
> Folks I really would have appreciated being copied on a signal handling
> patch like this.
>
> It is too late to nack, but I think this buggy patch deserved one.  Can
> we please fix PREEMPT_RT instead?
>
> As far as I can tell this violates all of rules from
> implementing/maintaining the RT kernel.  Instead of coming up with new
> abstractions that makes sense and can use by everyone this introduces
> a hack only for PREEMPT_RT and a pretty horrible one at that.
>
> This talks about int3, but the code looks for in_atomic().  Which means
> that essentially every call of force_sig will take this path as they
> almost all come from exception handlers.  It is the nature of signals
> that report on faults.  An exception is raised and the kernel reports it
> to userspace with a fault signal (aka force_sig_xxx).
>
> Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
> flag to set to enter into exit_to_usermode_loop.  TIF_NOTIFY_RESUME is
> about that happens after signal handling.  This very much needs to be
> TIF_SIGPENDING with recalc_sigpending and friends updated to know about
> "task->force_info".
>
> Does someone own this problem?  Can that person please fix this
> properly?
>
> I really don't think it is going to be maintainable for PREEMPT_RT to
> maintain a separate signal delivery path for faults from the rest of
> linux.

I want to say the patch below looks like it was a perfectly fine debug
patch to see if what someone thinks is the issue is the issue.  It is
not a good final solution for the reasons I have already mentioned.

May I ask where the rest of the conversation was?  I can only find the
single posting of this patch on linux-kernel without any conversation,
and the description indicates this change has seen several rounds of
development.

Eric

>> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>>   ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
>> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
>>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>>  arch/x86/include/asm/signal.h | 13 +++++++++++++
>>  include/linux/sched.h         |  3 +++
>>  kernel/entry/common.c         |  9 +++++++++
>>  kernel/signal.c               | 28 ++++++++++++++++++++++++++++
>>  4 files changed, 53 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
>> index 2dfb5fea13aff..fc03f4f7ed84c 100644
>> --- a/arch/x86/include/asm/signal.h
>> +++ b/arch/x86/include/asm/signal.h
>> @@ -28,6 +28,19 @@ typedef struct {
>>  #define SA_IA32_ABI	0x02000000u
>>  #define SA_X32_ABI	0x01000000u
>>  
>> +/*
>> + * Because some traps use the IST stack, we must keep preemption
>> + * disabled while calling do_trap(), but do_trap() may call
>> + * force_sig_info() which will grab the signal spin_locks for the
>> + * task, which in PREEMPT_RT are mutexes.  By defining
>> + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
>> + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
>> + * trap.
>> + */
>> +#if defined(CONFIG_PREEMPT_RT)
>> +#define ARCH_RT_DELAYS_SIGNAL_SEND
>> +#endif
>> +
>>  #ifndef CONFIG_COMPAT
>>  #define compat_sigset_t compat_sigset_t
>>  typedef sigset_t compat_sigset_t;
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 75ba8aa60248b..0514237cee3fc 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1087,6 +1087,9 @@ struct task_struct {
>>  	/* Restored if set_restore_sigmask() was used: */
>>  	sigset_t			saved_sigmask;
>>  	struct sigpending		pending;
>> +#ifdef CONFIG_PREEMPT_RT
>> +	struct				kernel_siginfo forced_info;
>> +#endif
>>  	unsigned long			sas_ss_sp;
>>  	size_t				sas_ss_size;
>>  	unsigned int			sas_ss_flags;
>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> index bad713684c2e3..216dbf46e05f5 100644
>> --- a/kernel/entry/common.c
>> +++ b/kernel/entry/common.c
>> @@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>>  		if (ti_work & _TIF_NEED_RESCHED)
>>  			schedule();
>>  
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> +		if (unlikely(current->forced_info.si_signo)) {
>> +			struct task_struct *t = current;
>> +
>> +			force_sig_info(&t->forced_info);
>> +			t->forced_info.si_signo = 0;
>> +		}
>> +#endif
>> +
>>  		if (ti_work & _TIF_UPROBE)
>>  			uprobe_notify_resume(regs);
>>  
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9b04631acde8f..cb2b28c17c0a5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>>  	struct k_sigaction *action;
>>  	int sig = info->si_signo;
>>  
>> +	/*
>> +	 * On some archs, PREEMPT_RT has to delay sending a signal from a trap
>> +	 * since it can not enable preemption, and the signal code's spin_locks
>> +	 * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
>> +	 * send the signal on exit of the trap.
>> +	 */
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> +	if (in_atomic()) {
>> +		struct task_struct *t = current;
>> +
>> +		if (WARN_ON_ONCE(t->forced_info.si_signo))
>> +			return 0;
>> +
>> +		if (is_si_special(info)) {
>> +			WARN_ON_ONCE(info != SEND_SIG_PRIV);
>> +			t->forced_info.si_signo = info->si_signo;
>> +			t->forced_info.si_errno = 0;
>> +			t->forced_info.si_code = SI_KERNEL;
>> +			t->forced_info.si_pid = 0;
>> +			t->forced_info.si_uid = 0;
>> +		} else {
>> +			t->forced_info = *info;
>> +		}
>> +
>> +		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
>> +		return 0;
>> +	}
>> +#endif
>>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>>  	action = &t->sighand->action[sig-1];
>>  	ignored = action->sa.sa_handler == SIG_IGN;


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 14:25 ` Eric W. Biederman
  2022-03-28 14:41   ` Eric W. Biederman
@ 2022-03-28 14:48   ` Eric W. Biederman
  2022-03-28 16:17   ` Sebastian Andrzej Siewior
  2022-03-28 16:28   ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 2022-03-28 14:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

"Eric W. Biederman" <ebiederm@xmission.com> writes:

>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9b04631acde8f..cb2b28c17c0a5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>>  	struct k_sigaction *action;
>>  	int sig = info->si_signo;
        ^^^^^^^^^^^^^^^^^^^^^^^^^
See info->si_signo is unconditionally dereferenced.
>>  
>> +	/*
>> +	 * On some archs, PREEMPT_RT has to delay sending a signal from a trap
>> +	 * since it can not enable preemption, and the signal code's spin_locks
>> +	 * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
>> +	 * send the signal on exit of the trap.
>> +	 */
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> +	if (in_atomic()) {
>> +		struct task_struct *t = current;
>> +
>> +		if (WARN_ON_ONCE(t->forced_info.si_signo))
>> +			return 0;
>> +
>> +		if (is_si_special(info)) {
                    ^^^^^^^^^^^^^^^^^^^
This tests to see if info is either 0 or 1 cast to a "kernel_siginfo *".

Which is a long way of saying that this code already guarantees
is_si_special() is false, so most of this code is completely
unnecessary.


>> +			WARN_ON_ONCE(info != SEND_SIG_PRIV);
>> +			t->forced_info.si_signo = info->si_signo;
>> +			t->forced_info.si_errno = 0;
>> +			t->forced_info.si_code = SI_KERNEL;
>> +			t->forced_info.si_pid = 0;
>> +			t->forced_info.si_uid = 0;
>> +		} else {
>> +			t->forced_info = *info;
>> +		}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All of that can just be:

		copy_siginfo(&t->forced_info, info);

Using a structure copy here is wrong: gcc is allowed to not copy the
bytes in the padding, which allows an information leak to userspace.
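
Putting those two remarks together, the body of the in_atomic() branch
could presumably be reduced to something like the following (an untested
sketch, reusing the names from the patch, not a submitted fix):

		if (WARN_ON_ONCE(t->forced_info.si_signo))
			return 0;

		/*
		 * info->si_signo was already dereferenced above, so info
		 * cannot be one of the special values here.  copy_siginfo()
		 * is a full memcpy() of the structure, so the padding bytes
		 * are copied as well instead of being left undefined.
		 */
		copy_siginfo(&t->forced_info, info);
		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
		return 0;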


>> +		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
>> +		return 0;
>> +	}
>> +#endif
>>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>>  	action = &t->sighand->action[sig-1];
>>  	ignored = action->sa.sa_handler == SIG_IGN;

Eric


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 14:25 ` Eric W. Biederman
  2022-03-28 14:41   ` Eric W. Biederman
  2022-03-28 14:48   ` Eric W. Biederman
@ 2022-03-28 16:17   ` Sebastian Andrzej Siewior
  2022-03-28 22:07     ` Eric W. Biederman
  2022-03-28 16:28   ` Thomas Gleixner
  3 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-28 16:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 2022-03-28 09:25:06 [-0500], Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> Folks I really would have appreciated being copied on a signal handling
> patch like this.

Sorry for that. There is no maintainer listed for the whole ptrace/signal
part, and I got the feeling that Oleg knows these bits.

> It is too late to nack, but I think this buggy patch deserved one.  Can
> we please fix PREEMPT_RT instead?

Sure.

> As far as I can tell this violates all of rules from
> implementing/maintaining the RT kernel.  Instead of coming up with new
> abstractions that makes sense and can use by everyone this introduces
> a hack only for PREEMPT_RT and a pretty horrible one at that.
>
> This talks about int3, but the code looks for in_atomic().  Which means
> that essentially every call of force_sig will take this path as they
> almost all come from exception handlers.  It is the nature of signals
> that report on faults.  An exception is raised and the kernel reports it
> to userspace with a fault signal (aka force_sig_xxx).

int3 is invoked with interrupts disabled. There are also a few
other paths which explicitly disable interrupts or take a
raw_spinlock_t, and those lead to an atomic section on PREEMPT_RT. Call
chains with a spinlock_t or a rwlock_t don't lead to an atomic section on
PREEMPT_RT. Therefore I don't think that "essentially every call of
force_sig" is going to use that path.

> Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
> flag to set to enter into exit_to_usermode_loop.  TIF_NOTIFY_RESUME is
> about that happens after signal handling.  This very much needs to be
> TIF_SIGPENDING with recalc_sigpending and friends updated to know about
> "task->force_info".
> 
> Does someone own this problem?  Can that person please fix this
> properly?

Sure. Instead of setting TIF_NOTIFY_RESUME you want the code updated to
use recalc_sigpending() only. Or do you have other suggestions regarding
fixing this properly?

> I really don't think it is going to be maintainable for PREEMPT_RT to
> maintain a separate signal delivery path for faults from the rest of
> linux.

Okay.

> Eric

Sebastian


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 14:25 ` Eric W. Biederman
                     ` (2 preceding siblings ...)
  2022-03-28 16:17   ` Sebastian Andrzej Siewior
@ 2022-03-28 16:28   ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2022-03-28 16:28 UTC (permalink / raw)
  To: Eric W. Biederman, Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Vincent Guittot

On Mon, Mar 28 2022 at 09:25, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
> flag to set to enter into exit_to_usermode_loop.  TIF_NOTIFY_RESUME is
> about that happens after signal handling.  This very much needs to be
> TIF_SIGPENDING with recalc_sigpending and friends updated to know about
> "task->force_info".
>
> Does someone own this problem?  Can that person please fix this
> properly?
>
> I really don't think it is going to be maintainable for PREEMPT_RT to
> maintain a separate signal delivery path for faults from the rest of
> linux.

Fair enough. Let me queue a revert and go back to the drawing board.

Thanks,

        tglx


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 14:41   ` Eric W. Biederman
@ 2022-03-28 16:32     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-28 16:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 2022-03-28 09:41:37 [-0500], Eric W. Biederman wrote:
> I want to say the patch below looks like it was a perfectly fine debug
> patch to see if what someone thinks is the issue is the issue.  It is
> not a good final solution for the reasons I have already mentioned.
> 
> May I ask where the rest of the conversation was?  I can only find the
> single posting of this patch on linux-kernel without any conversation,
> and the description indicates this change has seen several rounds of
> development.

There was no feedback based on what has been posted to lkml.
What ended up as this patch was originally posted by Steven as
	https://lore.kernel.org/linux-rt-users/20120124191454.345715521@goodmis.org/

a few iterations later we got to v4 in
	https://lore.kernel.org/linux-rt-users/20120203183041.427463295@goodmis.org/

which got reviewed by Oleg and then became Oleg's in v5
	https://lore.kernel.org/linux-rt-users/20120208005850.233662427@goodmis.org/

and was part of the RT queue since v3.0.20-rt36.

> Eric

Sebastian


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 16:17   ` Sebastian Andrzej Siewior
@ 2022-03-28 22:07     ` Eric W. Biederman
  2022-03-29  9:31       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-03-28 22:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-03-28 09:25:06 [-0500], Eric W. Biederman wrote:
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>> 
>> Folks I really would have appreciated being copied on a signal handling
>> patch like this.
>
> Sorry for that. For the whole ptrace/signal part is no maintainer listed
> and I got the feeling that Oleg knows these bits.

Oleg and I, and maybe a couple of others, have been doing the work.

I happen to know the fine differences between TIF_SIGPENDING and
TIF_NOTIFY_RESUME because I have just been doing some cleanup work in
that area.  Plus cleanup in the force_sig_info_to_task area.

>> It is too late to nack, but I think this buggy patch deserved one.  Can
>> we please fix PREEMPT_RT instead?
>
> Sure.
>
>> As far as I can tell this violates all of rules from
>> implementing/maintaining the RT kernel.  Instead of coming up with new
>> abstractions that makes sense and can use by everyone this introduces
>> a hack only for PREEMPT_RT and a pretty horrible one at that.
>>
>> This talks about int3, but the code looks for in_atomic().  Which means
>> that essentially every call of force_sig will take this path as they
>> almost all come from exception handlers.  It is the nature of signals
>> that report on faults.  An exception is raised and the kernel reports it
>> to userspace with a fault signal (aka force_sig_xxx).
>
> The int3 is invoked with disabled interrupts. There are also a few
> others path which are explicit with disabled interrupts or with a
> raw_spinlock_t which lead to an atomic section on PREEMPT_RT. Call
> chains with spinlock_t or a rwlock_t don't lead to a atomic section on
> PREEMPT_RT. Therefore I don't think this is "essentially every call of
> force_sig" that is going to use that.

I was assuming that exception handlers would be like int3 and would in
general be places that are atomic and don't allow scheduling.  I admit
I don't know the PREEMPT_RT rules though.


>> Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
>> flag to set to enter into exit_to_usermode_loop.  TIF_NOTIFY_RESUME is
>> about that happens after signal handling.  This very much needs to be
>> TIF_SIGPENDING with recalc_sigpending and friends updated to know about
>> "task->force_info".
>> 
>> Does someone own this problem?  Can that person please fix this
>> properly?
>
> Sure. Instead setting TIF_NOTIFY_RESUME you want the code updated to use
> recalc_sigpending() only. Or do you have other suggestions regarding
> fixing this properly?

The rule with TIF_SIGPENDING (which should be used if this technique is
used at all) is that recalc_sigpending needs to know how to test whether
the signal is still pending.
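
Concretely, that would mean teaching recalc_sigpending_tsk() about the
stashed signal, roughly like this (a sketch only, not the actual
upstream code; it assumes the forced_info field added by the patch and
would sit near the top of recalc_sigpending_tsk()):

#ifdef CONFIG_PREEMPT_RT
	/*
	 * A trap signal stashed for delayed delivery must keep
	 * TIF_SIGPENDING set until force_sig_info() finally runs on
	 * the way back to user space.
	 */
	if (t->forced_info.si_signo) {
		set_tsk_thread_flag(t, TIF_SIGPENDING);
		return true;
	}
#endif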

>> I really don't think it is going to be maintainable for PREEMPT_RT to
>> maintain a separate signal delivery path for faults from the rest of
>> linux.
>
> Okay.

We have a few other cases where we deliver signals from interrupts.
Off the top of my head there is SAK and magic sysrq, but I think there
are more.  So I am also not convinced that all signals you care about
will go through force_sig_info_to_task.

What I really don't know and this is essentially a PREEMPT_RT question
is what makes int3 special?  Why don't other faults have this problem?

I remember there was a change where we had threaded interrupt handlers
to get around this for interrupt service routines.  I wonder if we need
to do something similar with faults.  Have a fast part and a threaded
part that runs in a schedulable way.  Although given that for a fault
you need to be fundamentally bound to the task/thread you faulted from
it probably just means having a way to switch to a kernel stack that you
can schedule from, and not use a reserved per cpu stack.  The
task_struct would certainly need to stay the same for all of the pieces.

Or maybe for PREEMPT_RT you pick the i386 mechanism.  How does PREEMPT_RT
deal with page faults, or general protection faults?

This is my long-winded way of saying that I rather expect that if
PREEMPT_RT is going to call code it has modified to be sleeping, it
would also make it safe for that code to sleep.

Further (and this is probably my ignorance) I just don't see what makes
any of this specific to just int3.  Why aren't other faults affected?

Eric



* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-28 22:07     ` Eric W. Biederman
@ 2022-03-29  9:31       ` Sebastian Andrzej Siewior
  2022-03-30 18:10         ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-29  9:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 2022-03-28 17:07:25 [-0500], Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> We have a few other cases where we deliver signals from interrupts.
> Off the top of my head there is SAK and magic sysrq, but I think there
> are more.  So I am also not convinced that all signals you care about
> will go through force_sig_info_to_task.
> 
> What I really don't know and this is essentially a PREEMPT_RT question
> is what makes int3 special?  Why don't other faults have this problem?

int3 on x86 is delivered from the debug interrupt and at this point
interrupts are in general disabled even on PREEMPT_RT.
If you are in a section which disables interrupts via
spin_lock_irqsave() then interrupts are not disabled on PREEMPT_RT.
If you are in an interrupt handler (as per request_irq(), not the
special vector that is used for int3 handling) then interrupts are also
not disabled because the interrupt handler is threaded by default.

In both cases in_atomic() reports 0 on PREEMPT_RT. So the exceptions are:
- explicit usage of local_irq_{save|disable}() or preempt_disable().
- usage of raw_spinlock_t locks.
- interrupt vectors which are not threaded (like int3 or the SMP IPI
  function call).
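
As a rough illustration of that distinction (the lock and function names
below are made up; this only assumes the usual PREEMPT_RT lock semantics):

static DEFINE_SPINLOCK(lock);          /* spinlock_t: sleeping rtmutex on RT */
static DEFINE_RAW_SPINLOCK(raw_lock);  /* raw_spinlock_t: real spinning lock */

static void example(void)
{
	unsigned long flags;

	spin_lock_irqsave(&lock, flags);
	/* RT: may sleep, IRQs stay enabled, in_atomic() == 0 */
	spin_unlock_irqrestore(&lock, flags);

	raw_spin_lock_irqsave(&raw_lock, flags);
	/* RT: truly atomic, IRQs off, in_atomic() != 0 */
	raw_spin_unlock_irqrestore(&raw_lock, flags);

	preempt_disable();
	/* explicit atomic section, in_atomic() != 0 on any kernel */
	preempt_enable();
}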

> I remember there was a change where we had threaded interrupt handlers
> to get around this for interrupt service routines.  I wonder if we need
> to do something similar with faults.  Have a fast part and a threaded
> part that runs in a schedulable way.  Although given that for a fault
> you need to be fundamentally bound to the task/thread you faulted from
> it probably just means having a way to switch to a kernel stack that you
> can schedule from, and not use a reserved per cpu stack.  The
> task_struct would certainly need to stay the same for all of the pieces.
> 
> Or maybe for PREMPT_RT you pick the i386 mechanism.  How does PREEMPT_RT
> deal with page faults, or general protection faults?

An in-kernel stack overflow will panic() with interrupts disabled.
An in-kernel NULL-pointer dereference is also entered with interrupts
disabled and later complains about sleeping locks in do_exit(). I do
remember that the arch code conditionally enables interrupts based on
the IRQ flags saved on the stack.

> This is my long winded way of saying that I rather expect that if
> PREEMPT_RT is going to call code it has modified to be sleeping that it
> would also make it safe for that code to sleep.
> 
> Further (and this is probably my ignorance) I just don't see what makes
> any of this specific to just int3.  Why aren't other faults affected?

That in-kernel NULL-pointer case doesn't look good. If you have a
test case (i.e. a "do this to reproduce it" recipe) then I can
definitely look into it in case more is missed.

> Eric

Sebastian


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-29  9:31       ` Sebastian Andrzej Siewior
@ 2022-03-30 18:10         ` Eric W. Biederman
  2022-04-01 11:45           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2022-03-30 18:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-03-28 17:07:25 [-0500], Eric W. Biederman wrote:
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>> 
>> We have a few other cases where we deliver signals from interrupts.
>> Off the top of my head there is SAK and magic sysrq, but I think there
>> are more.  So I am also not convinced that all signals you care about
>> will go through force_sig_info_to_task.
>> 
>> What I really don't know and this is essentially a PREEMPT_RT question
>> is what makes int3 special?  Why don't other faults have this problem?
>
> int3 on x86 is delivered from the debug interrupt and at this point
> interrupts are in general disabled even on PREEMPT_RT.
> If you are in a section which disables interrupts via
> spin_lock_irqsave() then interrupts are not disabled on PREEMPT_RT.
> If you are in an interrupt handler (as per request_irq(), not the
> special vector that is used for int3 handling) then interrupts are also
> not disabled because the interrupt handler is threaded by default.
>
> In both cases in_atomic() reports 0 on PREEMPT_RT. So the exception is:
> - explicit usage of local_irq_{save|disable}, preempt_disable().
> - usage of raw_spinlock_t locks.
> - interrupts vectors which are not threaded (like int3 or the SMP IPI
>   function call).
>
>> I remember there was a change where we had threaded interrupt handlers
>> to get around this for interrupt service routines.  I wonder if we need
>> to do something similar with faults.  Have a fast part and a threaded
>> part that runs in a schedulable way.  Although given that for a fault
>> you need to be fundamentally bound to the task/thread you faulted from
>> it probably just means having a way to switch to a kernel stack that you
>> can schedule from, and not use a reserved per cpu stack.  The
>> task_struct would certainly need to stay the same for all of the pieces.
>> 
>> Or maybe for PREMPT_RT you pick the i386 mechanism.  How does PREEMPT_RT
>> deal with page faults, or general protection faults?
>
> An in-kernel stack overflow will panic() with interrupts disabled.
> An in-kernel NULL-pointer is also entered with disabled interrupts and
> complains later about sleeping locks in do_exit(). I do remember that
> the arch code conditionally enabled interrupts based on IRQ-flags on
> stack.
>
>> This is my long winded way of saying that I rather expect that if
>> PREEMPT_RT is going to call code it has modified to be sleeping that it
>> would also make it safe for that code to sleep.
>> 
>> Further (and this is probably my ignorance) I just don't see what makes
>> any of this specific to just int3.  Why aren't other faults affected?
>
> That NULL-pointer in kernel doesn't look good. If you have a test-case
> (like do this) then I can definitely look into it in case more is
> missed.

The Linux kernel dump test module, aka drivers/misc/lkdtm, should have
that and several other nasty ways to die as test cases.

So I am confused.  The call path that is trying to be fixed is:

idtentry
  idtentry_body
    exc_int3
      do_int3_user
        do_trap
          force_sig

There are multiple "am I from userspace" checks and I did not verify
that they are all equal, so I may have missed something subtle.

But it looks like if we are coming from userspace then we use the same
stack as any other time we would come from userspace.  AKA a stack
that allows the kernel to sleep.

So I don't see what the problem is that is trying to be fixed.

I know that code has been changed over the years; perhaps this is
something that was fixed upstream and the real-time tree didn't realize
there was no longer a need to fix anything?

Or am I missing something subtle when reading the idtentry assembly?

Eric


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-03-30 18:10         ` Eric W. Biederman
@ 2022-04-01 11:45           ` Sebastian Andrzej Siewior
  2022-04-04 14:29             ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-01 11:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

On 2022-03-30 13:10:05 [-0500], Eric W. Biederman wrote:
> But it looks like if we are coming from userspace then we use the same
> stack as any other time we would come from userspace.  AKA a stack
> that allows the kernel to sleep.
> 
> So I don't see what the problem is that is trying to be fixed.

It is not only the stack. In atomic context / with disabled interrupts it
is not possible to acquire a spinlock_t (sighand_struct::siglock), which
is done later.

> I know that code has been changed over the years, perhaps this is
> something that was fixed upstream and the real time tree didn't realize
> there was no longer a need to fix anything?
> 
> Or am I missing something subtle when reading the idtentry assembly?

It certainly is true that the code changed over the years. The per-CPU
stack is one problem, the siglock in atomic context is the other one.
Thank you for the input. Let me digest the information I have here and
get back.

> Eric

Sebastian


* Re: [PATCH] signal/x86: Delay calling signals in atomic
  2022-04-01 11:45           ` Sebastian Andrzej Siewior
@ 2022-04-04 14:29             ` Eric W. Biederman
  0 siblings, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 2022-04-04 14:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: x86, linux-kernel, Oleg Nesterov, H. Peter Anvin,
	Andy Lutomirski, Ben Segall, Borislav Petkov,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Vincent Guittot

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-03-30 13:10:05 [-0500], Eric W. Biederman wrote:
>> But it looks like if we are coming from userspace then we use the same
>> stack as any other time we would come from userspace.  AKA a stack
>> that allows the kernel to sleep.
>> 
>> So I don't see what the problem is that is trying to be fixed.
>
> It is not only the stack. In atomic context / disabled interrupts it is
> not possible to acquire a spinlock_t (sighand_struct::siglock) which is
> done later.

Looking at do_int3_user, interrupts must be enabled.
>
>> I know that code has been changed over the years, perhaps this is
>> something that was fixed upstream and the real time tree didn't realize
>> there was no longer a need to fix anything?
>> 
>> Or am I missing something subtle when reading the idtentry assembly?
>
> It certainly is true that the code changed over the years. The per-CPU
> stack is one problem, the siglock in atomic context is the other one.
> Thank you for the input. Let me digest the informations I have here and
> get back.

Certainly.  In case it helps, this is the relevant bit of code:

static void do_int3_user(struct pt_regs *regs)
{
	if (do_int3(regs))
		return;

	cond_local_irq_enable(regs);
	do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, 0, 0, NULL);
	cond_local_irq_disable(regs);
}

The signal delivery where siglock is taken happens inside of do_trap.  If
we are coming from kernel mode, only do_int3 is called.

Coming from user_mode we switch to the task stack and
enable interrupts.

Unless I am misreading the code, cond_local_irq_{enable/disable} can
correctly be replaced with local_irq_{enable/disable}, as interrupts are
always enabled when coming from user mode.

Unless I am misreading cond_local_irq_enable: if for some reason
cond_local_irq_enable doesn't enable interrupts when coming from user
mode, fixing that appears to be the fix that is needed.
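
For reference, cond_local_irq_enable() in arch/x86/kernel/traps.c is
roughly just the following, and user mode always runs with EFLAGS.IF set,
so when coming from user space it really does enable interrupts:

static nokprobe_inline void cond_local_irq_enable(struct pt_regs *regs)
{
	if (regs->flags & X86_EFLAGS_IF)
		local_irq_enable();
}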


Eric



Thread overview: 12+ messages
2022-02-14 20:19 [PATCH] signal/x86: Delay calling signals in atomic Sebastian Andrzej Siewior
2022-03-28 14:25 ` Eric W. Biederman
2022-03-28 14:41   ` Eric W. Biederman
2022-03-28 16:32     ` Sebastian Andrzej Siewior
2022-03-28 14:48   ` Eric W. Biederman
2022-03-28 16:17   ` Sebastian Andrzej Siewior
2022-03-28 22:07     ` Eric W. Biederman
2022-03-29  9:31       ` Sebastian Andrzej Siewior
2022-03-30 18:10         ` Eric W. Biederman
2022-04-01 11:45           ` Sebastian Andrzej Siewior
2022-04-04 14:29             ` Eric W. Biederman
2022-03-28 16:28   ` Thomas Gleixner
