IO-Uring Archive on lore.kernel.org
 help / color / Atom feed
* [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL
@ 2020-10-08 15:27 Jens Axboe
  2020-10-08 15:27 ` [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() Jens Axboe
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-08 15:27 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx

Hi,

The goal is this patch series is to decouple TWA_SIGNAL based task_work
from real signals and signal delivery. The motivation is speeding up
TWA_SIGNAL based task_work, particularly for threaded setups where
->sighand is shared across threads. See the last patch for numbers.

v4 is nicely reduced, thanks to feedback from Oleg, dropping two of the
core patches and resulting in something that is easier to adopt in other
archs as well.

 arch/alpha/kernel/signal.c         |  1 -
 arch/arc/kernel/signal.c           |  2 +-
 arch/arm/kernel/signal.c           |  1 -
 arch/arm64/kernel/signal.c         |  1 -
 arch/c6x/kernel/signal.c           |  4 +--
 arch/csky/kernel/signal.c          |  1 -
 arch/h8300/kernel/signal.c         |  4 +--
 arch/hexagon/kernel/process.c      |  1 -
 arch/ia64/kernel/process.c         |  2 +-
 arch/m68k/kernel/signal.c          |  2 +-
 arch/microblaze/kernel/signal.c    |  2 +-
 arch/mips/kernel/signal.c          |  1 -
 arch/nds32/kernel/signal.c         |  4 +--
 arch/nios2/kernel/signal.c         |  2 +-
 arch/openrisc/kernel/signal.c      |  1 -
 arch/parisc/kernel/signal.c        |  4 +--
 arch/powerpc/kernel/signal.c       |  1 -
 arch/riscv/kernel/signal.c         |  4 +--
 arch/s390/kernel/signal.c          |  1 -
 arch/sh/kernel/signal_32.c         |  4 +--
 arch/sparc/kernel/signal_32.c      |  4 +--
 arch/sparc/kernel/signal_64.c      |  4 +--
 arch/um/kernel/process.c           |  2 +-
 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           |  5 +++-
 arch/xtensa/kernel/signal.c        |  2 +-
 include/linux/entry-common.h       |  6 ++++-
 include/linux/entry-kvm.h          |  4 +--
 include/linux/sched/signal.h       | 20 ++++++++++++---
 include/linux/tracehook.h          | 31 ++++++++++++++++++++--
 kernel/entry/common.c              |  3 +--
 kernel/entry/kvm.c                 |  7 ++---
 kernel/events/uprobes.c            |  2 +-
 kernel/signal.c                    |  8 +++---
 kernel/task_work.c                 | 41 +++++++++++++++++++++---------
 35 files changed, 113 insertions(+), 71 deletions(-)

Also find the changes here:

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

Changes since v3:

- Drop not needed io_uring change
- Drop syscall restart split, handle TIF_NOTIFY_SIGNAL from the arch
  signal handling, using task_sigpending() to see if we need to care
  about real signals.
- Fix a few over-zelaous task_sigpending() changes
- Cleanup WARN_ON() in restore_saved_sigmask_unless()

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
  2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
@ 2020-10-08 15:27 ` Jens Axboe
  2020-10-13 23:35   ` Thomas Gleixner
  2020-10-08 15:27 ` [PATCH 2/4] kernel: add task_sigpending() helper Jens Axboe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2020-10-08 15:27 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe

All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 arch/alpha/kernel/signal.c      | 1 -
 arch/arc/kernel/signal.c        | 2 +-
 arch/arm/kernel/signal.c        | 1 -
 arch/arm64/kernel/signal.c      | 1 -
 arch/c6x/kernel/signal.c        | 4 +---
 arch/csky/kernel/signal.c       | 1 -
 arch/h8300/kernel/signal.c      | 4 +---
 arch/hexagon/kernel/process.c   | 1 -
 arch/ia64/kernel/process.c      | 2 +-
 arch/m68k/kernel/signal.c       | 2 +-
 arch/microblaze/kernel/signal.c | 2 +-
 arch/mips/kernel/signal.c       | 1 -
 arch/nds32/kernel/signal.c      | 4 +---
 arch/nios2/kernel/signal.c      | 2 +-
 arch/openrisc/kernel/signal.c   | 1 -
 arch/parisc/kernel/signal.c     | 4 +---
 arch/powerpc/kernel/signal.c    | 1 -
 arch/riscv/kernel/signal.c      | 4 +---
 arch/s390/kernel/signal.c       | 1 -
 arch/sh/kernel/signal_32.c      | 4 +---
 arch/sparc/kernel/signal_32.c   | 4 +---
 arch/sparc/kernel/signal_64.c   | 4 +---
 arch/um/kernel/process.c        | 2 +-
 arch/xtensa/kernel/signal.c     | 2 +-
 include/linux/tracehook.h       | 4 ++--
 kernel/entry/common.c           | 1 -
 kernel/entry/kvm.c              | 4 +---
 27 files changed, 18 insertions(+), 46 deletions(-)

diff --git a/arch/alpha/kernel/signal.c b/arch/alpha/kernel/signal.c
index 15bc9d1e79f4..3739efce1ec0 100644
--- a/arch/alpha/kernel/signal.c
+++ b/arch/alpha/kernel/signal.c
@@ -531,7 +531,6 @@ do_work_pending(struct pt_regs *regs, unsigned long thread_flags,
 				do_signal(regs, r0, r19);
 				r0 = 0;
 			} else {
-				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
 			}
 		}
diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
index 8222f8c54690..2be55fb96d87 100644
--- a/arch/arc/kernel/signal.c
+++ b/arch/arc/kernel/signal.c
@@ -394,6 +394,6 @@ void do_notify_resume(struct pt_regs *regs)
 	 * ASM glue gaurantees that this is only called when returning to
 	 * user mode
 	 */
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 }
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index c1892f733f20..585edbfccf6d 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -669,7 +669,6 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 			} else if (thread_flags & _TIF_UPROBE) {
 				uprobe_notify_resume(regs);
 			} else {
-				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
 			}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 3b4f31f35e45..4a6e1dc480c1 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -936,7 +936,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				do_signal(regs);
 
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
-				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
 			}
diff --git a/arch/c6x/kernel/signal.c b/arch/c6x/kernel/signal.c
index d05c78eace1b..a3f15b9a79da 100644
--- a/arch/c6x/kernel/signal.c
+++ b/arch/c6x/kernel/signal.c
@@ -316,8 +316,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, u32 thread_info_flags,
 	if (thread_info_flags & (1 << TIF_SIGPENDING))
 		do_signal(regs, syscall);
 
-	if (thread_info_flags & (1 << TIF_NOTIFY_RESUME)) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & (1 << TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index 970895df75ec..8b068cf37447 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -261,7 +261,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 		do_signal(regs);
 
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 		rseq_handle_notify_resume(NULL, regs);
 	}
diff --git a/arch/h8300/kernel/signal.c b/arch/h8300/kernel/signal.c
index 69e68949787f..75d9b7e626b2 100644
--- a/arch/h8300/kernel/signal.c
+++ b/arch/h8300/kernel/signal.c
@@ -282,8 +282,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, u32 thread_info_flags)
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index dfd322c5ce83..5a0a95d93ddb 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -180,7 +180,6 @@ int do_work_pending(struct pt_regs *regs, u32 thread_info_flags)
 	}
 
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 		return 1;
 	}
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index f19cb97c0098..8f96cdda2f09 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -189,7 +189,7 @@ do_notify_resume_user(sigset_t *unused, struct sigscratch *scr, long in_syscall)
 		ia64_do_signal(scr, in_syscall);
 	}
 
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME)) {
+	if (test_thread_flag(TIF_NOTIFY_RESUME)) {
 		local_irq_enable();	/* force interrupt enable */
 		tracehook_notify_resume(&scr->pt);
 	}
diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index a98fca977073..29e174a80bf6 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -1134,6 +1134,6 @@ void do_notify_resume(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SIGPENDING))
 		do_signal(regs);
 
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 }
diff --git a/arch/microblaze/kernel/signal.c b/arch/microblaze/kernel/signal.c
index 4a96b59f0bee..f11a0ccccabc 100644
--- a/arch/microblaze/kernel/signal.c
+++ b/arch/microblaze/kernel/signal.c
@@ -316,6 +316,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, int in_syscall)
 	if (test_thread_flag(TIF_SIGPENDING))
 		do_signal(regs, in_syscall);
 
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 }
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index a0262729cd4c..77d40126b8a9 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -901,7 +901,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
 		do_signal(regs);
 
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 		rseq_handle_notify_resume(NULL, regs);
 	}
diff --git a/arch/nds32/kernel/signal.c b/arch/nds32/kernel/signal.c
index 36e25a410bb0..2acb94812af9 100644
--- a/arch/nds32/kernel/signal.c
+++ b/arch/nds32/kernel/signal.c
@@ -379,8 +379,6 @@ do_notify_resume(struct pt_regs *regs, unsigned int thread_flags)
 	if (thread_flags & _TIF_SIGPENDING)
 		do_signal(regs);
 
-	if (thread_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c
index d8a087cf2b42..cf2dca2ac7c3 100644
--- a/arch/nios2/kernel/signal.c
+++ b/arch/nios2/kernel/signal.c
@@ -317,7 +317,7 @@ asmlinkage int do_notify_resume(struct pt_regs *regs)
 			 */
 			return restart;
 		}
-	} else if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	} else if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 
 	return 0;
diff --git a/arch/openrisc/kernel/signal.c b/arch/openrisc/kernel/signal.c
index c779364f0cd0..af66f968dd45 100644
--- a/arch/openrisc/kernel/signal.c
+++ b/arch/openrisc/kernel/signal.c
@@ -311,7 +311,6 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 				}
 				syscall = 0;
 			} else {
-				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
 			}
 		}
diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
index 3c037fc96038..9f43eaeb0b0a 100644
--- a/arch/parisc/kernel/signal.c
+++ b/arch/parisc/kernel/signal.c
@@ -606,8 +606,6 @@ void do_notify_resume(struct pt_regs *regs, long in_syscall)
 	if (test_thread_flag(TIF_SIGPENDING))
 		do_signal(regs, in_syscall);
 
-	if (test_thread_flag(TIF_NOTIFY_RESUME)) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index d15a98c758b8..74a94a125f0d 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -327,7 +327,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 	}
 
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 		rseq_handle_notify_resume(NULL, regs);
 	}
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index e996e08f1061..bc6841867b51 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -313,8 +313,6 @@ asmlinkage __visible void do_notify_resume(struct pt_regs *regs,
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index b295090e2ce6..9e900a8977bd 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -535,7 +535,6 @@ void do_signal(struct pt_regs *regs)
 
 void do_notify_resume(struct pt_regs *regs)
 {
-	clear_thread_flag(TIF_NOTIFY_RESUME);
 	tracehook_notify_resume(regs);
 	rseq_handle_notify_resume(NULL, regs);
 }
diff --git a/arch/sh/kernel/signal_32.c b/arch/sh/kernel/signal_32.c
index 4fe3f00137bc..1add47fd31f6 100644
--- a/arch/sh/kernel/signal_32.c
+++ b/arch/sh/kernel/signal_32.c
@@ -502,8 +502,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, unsigned int save_r0,
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs, save_r0);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 }
diff --git a/arch/sparc/kernel/signal_32.c b/arch/sparc/kernel/signal_32.c
index d0e0025ee3ba..741d0701003a 100644
--- a/arch/sparc/kernel/signal_32.c
+++ b/arch/sparc/kernel/signal_32.c
@@ -523,10 +523,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0,
 {
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs, orig_i0);
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 }
 
 asmlinkage int do_sys_sigstack(struct sigstack __user *ssptr,
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 255264bcb46a..f7ef7edcd5c1 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -551,10 +551,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
 		uprobe_notify_resume(regs);
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs, orig_i0);
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-	}
 	user_enter();
 }
 
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 26b5e243d3fc..3bed09538dd9 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -101,7 +101,7 @@ void interrupt_end(void)
 		schedule();
 	if (test_thread_flag(TIF_SIGPENDING))
 		do_signal(regs);
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 }
 
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index b3b17d6c50f0..1fb1047f905c 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -501,6 +501,6 @@ void do_notify_resume(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SIGPENDING))
 		do_signal(regs);
 
-	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
+	if (test_thread_flag(TIF_NOTIFY_RESUME))
 		tracehook_notify_resume(regs);
 }
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 36fb3bbed6b2..b480e1a07ed8 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -178,9 +178,9 @@ static inline void set_notify_resume(struct task_struct *task)
  */
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
+	clear_thread_flag(TIF_NOTIFY_RESUME);
 	/*
-	 * The caller just cleared TIF_NOTIFY_RESUME. This barrier
-	 * pairs with task_work_add()->set_notify_resume() after
+	 * This barrier pairs with task_work_add()->set_notify_resume() after
 	 * hlist_add_head(task->task_works);
 	 */
 	smp_mb__after_atomic();
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 6fdb6105e6d6..d20ab4ac7183 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -161,7 +161,6 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 			arch_do_signal(regs);
 
 		if (ti_work & _TIF_NOTIFY_RESUME) {
-			clear_thread_flag(TIF_NOTIFY_RESUME);
 			tracehook_notify_resume(regs);
 			rseq_handle_notify_resume(NULL, regs);
 		}
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index eb1a8a4c867c..b6678a5e3cf6 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -16,10 +16,8 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
-			clear_thread_flag(TIF_NOTIFY_RESUME);
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(NULL);
-		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/4] kernel: add task_sigpending() helper
  2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
  2020-10-08 15:27 ` [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() Jens Axboe
@ 2020-10-08 15:27 ` Jens Axboe
  2020-10-13 23:36   ` Thomas Gleixner
  2020-10-08 15:27 ` [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL Jens Axboe
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2020-10-08 15:27 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe

This is in preparation for maintaining signal_pending() as the decider
of whether or not a schedule() loop should be broken, or continue
sleeping. This is different than the core signal use cases, where we
really want to know if an actual signal is pending or not.
task_sigpending() returns non-zero if TIF_SIGPENDING is set.

Only core kernel use cases should care about the distinction between
the two, make sure those use the task_sigpending() helper.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/sched/signal.h | 9 +++++++--
 kernel/events/uprobes.c      | 2 +-
 kernel/signal.c              | 8 ++++----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1bad18a1d8ba..404145dc536e 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -353,11 +353,16 @@ static inline int restart_syscall(void)
 	return -ERESTARTNOINTR;
 }
 
-static inline int signal_pending(struct task_struct *p)
+static inline int task_sigpending(struct task_struct *p)
 {
 	return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
 }
 
+static inline int signal_pending(struct task_struct *p)
+{
+	return task_sigpending(p);
+}
+
 static inline int __fatal_signal_pending(struct task_struct *p)
 {
 	return unlikely(sigismember(&p->pending.signal, SIGKILL));
@@ -365,7 +370,7 @@ static inline int __fatal_signal_pending(struct task_struct *p)
 
 static inline int fatal_signal_pending(struct task_struct *p)
 {
-	return signal_pending(p) && __fatal_signal_pending(p);
+	return task_sigpending(p) && __fatal_signal_pending(p);
 }
 
 static inline int signal_pending_state(long state, struct task_struct *p)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0e18aaf23a7b..8bb26a338e06 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1973,7 +1973,7 @@ bool uprobe_deny_signal(void)
 
 	WARN_ON_ONCE(utask->state != UTASK_SSTEP);
 
-	if (signal_pending(t)) {
+	if (task_sigpending(t)) {
 		spin_lock_irq(&t->sighand->siglock);
 		clear_tsk_thread_flag(t, TIF_SIGPENDING);
 		spin_unlock_irq(&t->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..9f86246a8637 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -983,7 +983,7 @@ static inline bool wants_signal(int sig, struct task_struct *p)
 	if (task_is_stopped_or_traced(p))
 		return false;
 
-	return task_curr(p) || !signal_pending(p);
+	return task_curr(p) || !task_sigpending(p);
 }
 
 static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
@@ -2822,7 +2822,7 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 		/* Remove the signals this thread can handle. */
 		sigandsets(&retarget, &retarget, &t->blocked);
 
-		if (!signal_pending(t))
+		if (!task_sigpending(t))
 			signal_wake_up(t, 0);
 
 		if (sigisemptyset(&retarget))
@@ -2856,7 +2856,7 @@ void exit_signals(struct task_struct *tsk)
 
 	cgroup_threadgroup_change_end(tsk);
 
-	if (!signal_pending(tsk))
+	if (!task_sigpending(tsk))
 		goto out;
 
 	unblocked = tsk->blocked;
@@ -2900,7 +2900,7 @@ long do_no_restart_syscall(struct restart_block *param)
 
 static void __set_task_blocked(struct task_struct *tsk, const sigset_t *newset)
 {
-	if (signal_pending(tsk) && !thread_group_empty(tsk)) {
+	if (task_sigpending(tsk) && !thread_group_empty(tsk)) {
 		sigset_t newblocked;
 		/* A set of now blocked but previously unblocked signals. */
 		sigandnsets(&newblocked, newset, &current->blocked);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
  2020-10-08 15:27 ` [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() Jens Axboe
  2020-10-08 15:27 ` [PATCH 2/4] kernel: add task_sigpending() helper Jens Axboe
@ 2020-10-08 15:27 ` Jens Axboe
  2020-10-09 14:43   ` Oleg Nesterov
  2020-10-13 23:42   ` Thomas Gleixner
  2020-10-08 15:27 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe
  2020-10-09 14:30 ` [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Oleg Nesterov
  4 siblings, 2 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-08 15:27 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe

This adds TIF_NOTIFY_SIGNAL handling in the generic code, which if set,
will return true if signal_pending() is used in a wait loop. That causes
an exit of the loop so that notify_signal tracehooks can be run. If the
wait loop is currently inside a system call, the system call is restarted
once task_work has been processed.

x86 is using the generic entry code, add the necessary TIF_NOTIFY_SIGNAL
definitions for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           |  5 ++++-
 include/linux/entry-common.h       |  6 +++++-
 include/linux/entry-kvm.h          |  4 ++--
 include/linux/sched/signal.h       | 11 ++++++++++-
 include/linux/tracehook.h          | 27 +++++++++++++++++++++++++++
 kernel/entry/common.c              |  2 +-
 kernel/entry/kvm.c                 |  3 +++
 8 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..86ade67f21b7 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -93,6 +93,7 @@ struct thread_info {
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_SLD			18	/* Restore split lock detection on context switch */
+#define TIF_NOTIFY_SIGNAL	19	/* signal notifications exist */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
@@ -123,6 +124,7 @@ struct thread_info {
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_SLD		(1 << TIF_SLD)
+#define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index be0d7d4152ec..cd140bbf8520 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -808,7 +808,10 @@ void arch_do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 
-	if (get_signal(&ksig)) {
+	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+		tracehook_notify_signal();
+
+	if (task_sigpending(current) && get_signal(&ksig)) {
 		/* Whee! Actually deliver the signal.  */
 		handle_signal(&ksig, regs);
 		return;
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 159c7476b11b..f4234aaac36c 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -37,6 +37,10 @@
 # define _TIF_UPROBE			(0)
 #endif
 
+#ifndef _TIF_NOTIFY_SIGNAL
+# define _TIF_NOTIFY_SIGNAL		(0)
+#endif
+
 /*
  * TIF flags handled in syscall_enter_from_usermode()
  */
@@ -69,7 +73,7 @@
 
 #define EXIT_TO_USER_MODE_WORK						\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
-	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |			\
+	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |	\
 	 ARCH_EXIT_TO_USER_MODE_WORK)
 
 /**
diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
index 0cef17afb41a..9b93f8584ff7 100644
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -11,8 +11,8 @@
 # define ARCH_XFER_TO_GUEST_MODE_WORK	(0)
 #endif
 
-#define XFER_TO_GUEST_MODE_WORK					\
-	(_TIF_NEED_RESCHED | _TIF_SIGPENDING |			\
+#define XFER_TO_GUEST_MODE_WORK						\
+	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
 	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
 
 struct kvm_vcpu;
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 404145dc536e..9bc13ade2ff9 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -360,6 +360,15 @@ static inline int task_sigpending(struct task_struct *p)
 
 static inline int signal_pending(struct task_struct *p)
 {
+#ifdef TIF_NOTIFY_SIGNAL
+	/*
+	 * TIF_NOTIFY_SIGNAL isn't really a signal, but it requires the same
+	 * behavior in terms of ensuring that we break out of wait loops
+	 * so that notify signal callbacks can be processed.
+	 */
+	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
+		return 1;
+#endif
 	return task_sigpending(p);
 }
 
@@ -507,7 +516,7 @@ extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
 static inline void restore_saved_sigmask_unless(bool interrupted)
 {
 	if (interrupted)
-		WARN_ON(!test_thread_flag(TIF_SIGPENDING));
+		WARN_ON(!signal_pending(current));
 	else
 		restore_saved_sigmask();
 }
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index b480e1a07ed8..bec952f51439 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -198,4 +198,31 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 	blkcg_maybe_throttle_current();
 }
 
+/*
+ * called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL. This
+ * is currently used by TWA_SIGNAL based task_work, which requires breaking
+ * wait loops to ensure that task_work is noticed and run.
+ */
+static inline void tracehook_notify_signal(void)
+{
+#ifdef TIF_NOTIFY_SIGNAL
+	clear_thread_flag(TIF_NOTIFY_SIGNAL);
+	smp_mb__after_atomic();
+	if (current->task_works)
+		task_work_run();
+#endif
+}
+
+/*
+ * Called when we have work to process from exit_to_user_mode_loop()
+ */
+static inline void set_notify_signal(struct task_struct *task)
+{
+#ifdef TIF_NOTIFY_SIGNAL
+	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
+	    !wake_up_state(task, TASK_INTERRUPTIBLE))
+		kick_process(task);
+#endif
+}
+
 #endif	/* <linux/tracehook.h> */
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index d20ab4ac7183..89a068252897 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -157,7 +157,7 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & _TIF_SIGPENDING)
+		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 			arch_do_signal(regs);
 
 		if (ti_work & _TIF_NOTIFY_RESUME) {
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index b6678a5e3cf6..49972ee99aff 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -8,6 +8,9 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 	do {
 		int ret;
 
+		if (ti_work & _TIF_NOTIFY_SIGNAL)
+			tracehook_notify_signal();
+
 		if (ti_work & _TIF_SIGPENDING) {
 			kvm_handle_signal_exit(vcpu);
 			return -EINTR;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available
  2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
                   ` (2 preceding siblings ...)
  2020-10-08 15:27 ` [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL Jens Axboe
@ 2020-10-08 15:27 ` Jens Axboe
  2020-10-13 23:50   ` Thomas Gleixner
  2020-10-09 14:30 ` [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Oleg Nesterov
  4 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2020-10-08 15:27 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe, Roman Gershman

If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads, but it's also lighter weight on non-shared cases.

io_uring is a heavy consumer of TWA_SIGNAL based task_work. On my test
box, even just using 16 threads shows a nice improvement running an
io_uring based echo server.

stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU

1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
 for 5.7.16 or later it achieves only 1M qps and the system cpu is is
 at ~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16, since we need it to be able to solve an
inability to run task_work if the application is waiting in the kernel
already on an event that needs task_work run to be satisfied. Also
see commit 0ba9c9edcd15.

Reported-by: Roman Gershman <romger@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 kernel/task_work.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 613b2d634af8..95604e57af46 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,34 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+/*
+ * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
+ * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
+ * shared for threads, and can cause contention on sighand->lock. Even for
+ * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
+ * or IRQ disabling is involved for notification (or running) purposes.
+ */
+static void task_work_notify_signal(struct task_struct *task)
+{
+#ifdef TIF_NOTIFY_SIGNAL
+	set_notify_signal(task);
+#else
+	unsigned long flags;
+
+	/*
+	 * Only grab the sighand lock if we don't already have some
+	 * task_work pending. This pairs with the smp_store_mb()
+	 * in get_signal(), see comment there.
+	 */
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+	    lock_task_sighand(task, &flags)) {
+		task->jobctl |= JOBCTL_TASK_WORK;
+		signal_wake_up(task, 0);
+		unlock_task_sighand(task, &flags);
+	}
+#endif
+}
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -28,7 +56,6 @@ int
 task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 {
 	struct callback_head *head;
-	unsigned long flags;
 
 	do {
 		head = READ_ONCE(task->task_works);
@@ -42,17 +69,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		/*
-		 * Only grab the sighand lock if we don't already have some
-		 * task_work pending. This pairs with the smp_store_mb()
-		 * in get_signal(), see comment there.
-		 */
-		if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-		    lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
+		task_work_notify_signal(task);
 		break;
 	}
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL
  2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
                   ` (3 preceding siblings ...)
  2020-10-08 15:27 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe
@ 2020-10-09 14:30 ` Oleg Nesterov
  2020-10-09 15:16   ` Jens Axboe
  4 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2020-10-09 14:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, io-uring, peterz, tglx

On 10/08, Jens Axboe wrote:
>
> Changes since v3:
> 
> - Drop not needed io_uring change
> - Drop syscall restart split, handle TIF_NOTIFY_SIGNAL from the arch
>   signal handling, using task_sigpending() to see if we need to care
>   about real signals.
> - Fix a few over-zelaous task_sigpending() changes
> - Cleanup WARN_ON() in restore_saved_sigmask_unless()

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

but let me comment 3/4...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-08 15:27 ` [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL Jens Axboe
@ 2020-10-09 14:43   ` Oleg Nesterov
  2020-10-09 15:13     ` Jens Axboe
  2020-10-13 23:42   ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2020-10-09 14:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, io-uring, peterz, tglx

Once again, I am fine with this patch, just a minor comment...

On 10/08, Jens Axboe wrote:
>
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -808,7 +808,10 @@ void arch_do_signal(struct pt_regs *regs)
>  {
>  	struct ksignal ksig;
>
> -	if (get_signal(&ksig)) {
> +	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
> +		tracehook_notify_signal();
> +
> +	if (task_sigpending(current) && get_signal(&ksig)) {

I suggested to change arch_do_signal() because somehow I like it this way ;)

And because we can easily pass the "ti_work" mask to arch_do_signal() and
avoid test_thread_flag/task_sigpending.

Hmm. I just noticed that only x86 uses arch_do_signal(), so perhaps you can
add this change to this patch right now? Up to you.



On the other hand, we could add

	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
		tracehook_notify_signal();

	if (!task_sigpending(current))
		return 0;

at the start of get_signal() instead. Somehow I don't really like this, but
this way we do need less changes in arch-dependant code. Again, up to you.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-09 14:43   ` Oleg Nesterov
@ 2020-10-09 15:13     ` Jens Axboe
  2020-10-13 23:45       ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2020-10-09 15:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz, tglx

On 10/9/20 8:43 AM, Oleg Nesterov wrote:
> Once again, I am fine with this patch, just a minor comment...
> 
> On 10/08, Jens Axboe wrote:
>>
>> --- a/arch/x86/kernel/signal.c
>> +++ b/arch/x86/kernel/signal.c
>> @@ -808,7 +808,10 @@ void arch_do_signal(struct pt_regs *regs)
>>  {
>>  	struct ksignal ksig;
>>
>> -	if (get_signal(&ksig)) {
>> +	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
>> +		tracehook_notify_signal();
>> +
>> +	if (task_sigpending(current) && get_signal(&ksig)) {
> 
> I suggested to change arch_do_signal() because somehow I like it this way ;)
> 
> And because we can easily pass the "ti_work" mask to arch_do_signal() and
> avoid test_thread_flag/task_sigpending.
> 
> Hmm. I just noticed that only x86 uses arch_do_signal(), so perhaps you can
> add this change to this patch right now? Up to you.

Sure, we can do that. Incremental on top then looks like the below. I don't
feel that strongly about it, but it is nice to avoid re-testing the flags.


diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index cd140bbf8520..ec6490e53dc3 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -804,14 +804,14 @@ static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
  * want to handle. Thus you cannot kill init even with a SIGKILL even by
  * mistake.
  */
-void arch_do_signal(struct pt_regs *regs)
+void arch_do_signal(struct pt_regs *regs, unsigned long ti_work)
 {
 	struct ksignal ksig;
 
-	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+	if (ti_work & _TIF_NOTIFY_SIGNAL)
 		tracehook_notify_signal();
 
-	if (task_sigpending(current) && get_signal(&ksig)) {
+	if ((ti_work & _TIF_SIGPENDING) && get_signal(&ksig)) {
 		/* Whee! Actually deliver the signal.  */
 		handle_signal(&ksig, regs);
 		return;
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f4234aaac36c..0360b7e4e39a 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -265,10 +265,11 @@ static __always_inline void arch_exit_to_user_mode(void) { }
 /**
  * arch_do_signal -  Architecture specific signal delivery function
  * @regs:	Pointer to currents pt_regs
+ * @ti_work	task thread info work flags
  *
  * Invoked from exit_to_user_mode_loop().
  */
-void arch_do_signal(struct pt_regs *regs);
+void arch_do_signal(struct pt_regs *regs, unsigned long ti_work);
 
 /**
  * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit()
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 89a068252897..bd3cf6279e94 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -135,7 +135,7 @@ static __always_inline void exit_to_user_mode(void)
 }
 
 /* Workaround to allow gradual conversion of architecture code */
-void __weak arch_do_signal(struct pt_regs *regs) { }
+void __weak arch_do_signal(struct pt_regs *regs, unsigned long ti_work) { }
 
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
@@ -158,7 +158,7 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 			klp_update_patch_state(current);
 
 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
-			arch_do_signal(regs);
+			arch_do_signal(regs, ti_work);
 
 		if (ti_work & _TIF_NOTIFY_RESUME) {
 			tracehook_notify_resume(regs);

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL
  2020-10-09 14:30 ` [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Oleg Nesterov
@ 2020-10-09 15:16   ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-09 15:16 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz, tglx

On 10/9/20 8:30 AM, Oleg Nesterov wrote:
> On 10/08, Jens Axboe wrote:
>>
>> Changes since v3:
>>
>> - Drop not needed io_uring change
>> - Drop syscall restart split, handle TIF_NOTIFY_SIGNAL from the arch
>>   signal handling, using task_sigpending() to see if we need to care
>>   about real signals.
>> - Fix a few over-zelaous task_sigpending() changes
>> - Cleanup WARN_ON() in restore_saved_sigmask_unless()
> 
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Thanks, added.

> but let me comment 3/4...

Updated for that one too, this is the current patch:

https://git.kernel.dk/cgit/linux-block/commit/?h=tif-task_work&id=b6d5da9ba8e31f7b222172c1626cfd0f5d035083

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
  2020-10-08 15:27 ` [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() Jens Axboe
@ 2020-10-13 23:35   ` Thomas Gleixner
  2020-10-14  1:13     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-10-13 23:35 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, io-uring; +Cc: peterz, oleg, Jens Axboe

On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
> All the callers currently do this, clean it up and move the clearing
> into tracehook_notify_resume() instead.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Nice cleanup!

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] kernel: add task_sigpending() helper
  2020-10-08 15:27 ` [PATCH 2/4] kernel: add task_sigpending() helper Jens Axboe
@ 2020-10-13 23:36   ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2020-10-13 23:36 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, io-uring; +Cc: peterz, oleg, Jens Axboe

On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
> This is in preparation for maintaining signal_pending() as the decider
> of whether or not a schedule() loop should be broken, or continue
> sleeping. This is different than the core signal use cases, where we
> really want to know if an actual signal is pending or not.
> task_sigpending() returns non-zero if TIF_SIGPENDING is set.
>
> Only core kernel use cases should care about the distinction between
> the two, make sure those use the task_sigpending() helper.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-08 15:27 ` [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL Jens Axboe
  2020-10-09 14:43   ` Oleg Nesterov
@ 2020-10-13 23:42   ` Thomas Gleixner
  2020-10-13 23:45     ` Jens Axboe
  1 sibling, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-10-13 23:42 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, io-uring; +Cc: peterz, oleg, Jens Axboe

On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
> This adds TIF_NOTIFY_SIGNAL handling in the generic code, which if set,
> will return true if signal_pending() is used in a wait loop. That causes
> an exit of the loop so that notify_signal tracehooks can be run. If the
> wait loop is currently inside a system call, the system call is restarted
> once task_work has been processed.
>
> x86 is using the generic entry code, add the necessary TIF_NOTIFY_SIGNAL
> definitions for it.

Can you please split that into core changes and a patch which adds
support for x86?

>  static inline int signal_pending(struct task_struct *p)
>  {
> +#ifdef TIF_NOTIFY_SIGNAL

As I said in the other thread, plase make this

#if defined(CONFIG_GENERIC_ENTRY) && defined(TIF_NOTIFY_SIGNAL)

> +/*
> + * called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL. This
> + * is currently used by TWA_SIGNAL based task_work, which requires breaking
> + * wait loops to ensure that task_work is noticed and run.
> + */
> +static inline void tracehook_notify_signal(void)
> +{
> +#ifdef TIF_NOTIFY_SIGNAL

Ditto.

> +	clear_thread_flag(TIF_NOTIFY_SIGNAL);
> +	smp_mb__after_atomic();
> +	if (current->task_works)
> +		task_work_run();
> +#endif
> +}
> +
> +/*
> + * Called when we have work to process from exit_to_user_mode_loop()
> + */
> +static inline void set_notify_signal(struct task_struct *task)
> +{
> +#ifdef TIF_NOTIFY_SIGNAL

And this one.

Other than that, this approach of using arch_do_signal() addresses my
earlier concerns about the syscall restart machinery.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-13 23:42   ` Thomas Gleixner
@ 2020-10-13 23:45     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-13 23:45 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, io-uring; +Cc: peterz, oleg

On 10/13/20 5:42 PM, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
>> This adds TIF_NOTIFY_SIGNAL handling in the generic code, which if set,
>> will return true if signal_pending() is used in a wait loop. That causes
>> an exit of the loop so that notify_signal tracehooks can be run. If the
>> wait loop is currently inside a system call, the system call is restarted
>> once task_work has been processed.
>>
>> x86 is using the generic entry code, add the necessary TIF_NOTIFY_SIGNAL
>> definitions for it.
> 
> Can you please split that into core changes and a patch which adds
> support for x86?

Sure, I'll split it into generic and then x86 after that.

>>  static inline int signal_pending(struct task_struct *p)
>>  {
>> +#ifdef TIF_NOTIFY_SIGNAL
> 
> As I said in the other thread, plase make this
> 
> #if defined(CONFIG_GENERIC_ENTRY) && defined(TIF_NOTIFY_SIGNAL)

Thanks, I'll make those tweaks.

> Other than that, this approach of using arch_do_signal() addresses my
> earlier concerns about the syscall restart machinery.

Great! Thanks for taking a look.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-09 15:13     ` Jens Axboe
@ 2020-10-13 23:45       ` Thomas Gleixner
  2020-10-13 23:54         ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-10-13 23:45 UTC (permalink / raw)
  To: Jens Axboe, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On Fri, Oct 09 2020 at 09:13, Jens Axboe wrote:
>> Hmm. I just noticed that only x86 uses arch_do_signal(), so perhaps you can
>> add this change to this patch right now? Up to you.
>
> Sure, we can do that. Incremental on top then looks like the below. I don't
> feel that strongly about it, but it is nice to avoid re-testing the
> flags.

Yes it makes sense. Can you please make the signature change of
arch_do_signal() in a preparatory patch and only make use of it when you
add the TIF bit to x86?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available
  2020-10-08 15:27 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe
@ 2020-10-13 23:50   ` Thomas Gleixner
  2020-10-13 23:55     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2020-10-13 23:50 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel, io-uring
  Cc: peterz, oleg, Jens Axboe, Roman Gershman

On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
> +/*
> + * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
> + * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
> + * shared for threads, and can cause contention on sighand->lock. Even for
> + * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
> + * or IRQ disabling is involved for notification (or running) purposes.
> + */
> +static void task_work_notify_signal(struct task_struct *task)
> +{
> +#ifdef TIF_NOTIFY_SIGNAL
> +	set_notify_signal(task);
> +#else
> +	unsigned long flags;
> +
> +	/*
> +	 * Only grab the sighand lock if we don't already have some
> +	 * task_work pending. This pairs with the smp_store_mb()
> +	 * in get_signal(), see comment there.
> +	 */
> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
> +	    lock_task_sighand(task, &flags)) {
> +		task->jobctl |= JOBCTL_TASK_WORK;
> +		signal_wake_up(task, 0);
> +		unlock_task_sighand(task, &flags);
> +	}
> +#endif

Same #ifdeffery comment as before.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL
  2020-10-13 23:45       ` Thomas Gleixner
@ 2020-10-13 23:54         ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-13 23:54 UTC (permalink / raw)
  To: Thomas Gleixner, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On 10/13/20 5:45 PM, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 09:13, Jens Axboe wrote:
>>> Hmm. I just noticed that only x86 uses arch_do_signal(), so perhaps you can
>>> add this change to this patch right now? Up to you.
>>
>> Sure, we can do that. Incremental on top then looks like the below. I don't
>> feel that strongly about it, but it is nice to avoid re-testing the
>> flags.
> 
> Yes it makes sense. Can you please make the signature change of
> arch_do_signal() in a preparatory patch and only make use of it when you
> add the TIF bit to x86?

Ala:

https://git.kernel.dk/cgit/linux-block/commit/?h=tif-task_work&id=6a150da501727f6bdca4786a02c0a57160e13079

with the pre generic patch being:

https://git.kernel.dk/cgit/linux-block/commit/?h=tif-task_work&id=add2e252ae8481350239c3cb893aed490c05c397

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available
  2020-10-13 23:50   ` Thomas Gleixner
@ 2020-10-13 23:55     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-13 23:55 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, io-uring; +Cc: peterz, oleg, Roman Gershman

On 10/13/20 5:50 PM, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
>> +/*
>> + * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
>> + * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
>> + * shared for threads, and can cause contention on sighand->lock. Even for
>> + * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
>> + * or IRQ disabling is involved for notification (or running) purposes.
>> + */
>> +static void task_work_notify_signal(struct task_struct *task)
>> +{
>> +#ifdef TIF_NOTIFY_SIGNAL
>> +	set_notify_signal(task);
>> +#else
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * Only grab the sighand lock if we don't already have some
>> +	 * task_work pending. This pairs with the smp_store_mb()
>> +	 * in get_signal(), see comment there.
>> +	 */
>> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
>> +	    lock_task_sighand(task, &flags)) {
>> +		task->jobctl |= JOBCTL_TASK_WORK;
>> +		signal_wake_up(task, 0);
>> +		unlock_task_sighand(task, &flags);
>> +	}
>> +#endif
> 
> Same #ifdeffery comment as before.

Fixed up.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
  2020-10-13 23:35   ` Thomas Gleixner
@ 2020-10-14  1:13     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-14  1:13 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel, io-uring; +Cc: peterz, oleg

On 10/13/20 5:35 PM, Thomas Gleixner wrote:
> On Thu, Oct 08 2020 at 09:27, Jens Axboe wrote:
>> All the callers currently do this, clean it up and move the clearing
>> into tracehook_notify_resume() instead.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> Nice cleanup!
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

Thanks! Added.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available
  2020-10-26 20:32 [PATCHSET v6a 0/4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
@ 2020-10-26 20:32 ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-26 20:32 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe, Roman Gershman

If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads, but it's also lighter weight on non-shared cases.

io_uring is a heavy consumer of TWA_SIGNAL based task_work. On my test
box, even just using 16 threads shows a nice improvement running an
io_uring based echo server.

stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU

1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
 for 5.7.16 or later it achieves only 1M qps and the system cpu is is
 at ~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16, since we need it to be able to solve an
inability to run task_work if the application is waiting in the kernel
already on an event that needs task_work run to be satisfied. Also
see commit 0ba9c9edcd15.

Reported-by: Roman Gershman <romger@amazon.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 kernel/task_work.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 8d6e1217c451..15b087286bea 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,34 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+/*
+ * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
+ * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
+ * shared for threads, and can cause contention on sighand->lock. Even for
+ * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
+ * or IRQ disabling is involved for notification (or running) purposes.
+ */
+static void task_work_notify_signal(struct task_struct *task)
+{
+#if defined(TIF_NOTIFY_SIGNAL)
+	set_notify_signal(task);
+#else
+	unsigned long flags;
+
+	/*
+	 * Only grab the sighand lock if we don't already have some
+	 * task_work pending. This pairs with the smp_store_mb()
+	 * in get_signal(), see comment there.
+	 */
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+	    lock_task_sighand(task, &flags)) {
+		task->jobctl |= JOBCTL_TASK_WORK;
+		signal_wake_up(task, 0);
+		unlock_task_sighand(task, &flags);
+	}
+#endif
+}
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -33,7 +61,6 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		  enum task_work_notify_mode notify)
 {
 	struct callback_head *head;
-	unsigned long flags;
 
 	do {
 		head = READ_ONCE(task->task_works);
@@ -49,17 +76,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		/*
-		 * Only grab the sighand lock if we don't already have some
-		 * task_work pending. This pairs with the smp_store_mb()
-		 * in get_signal(), see comment there.
-		 */
-		if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-		    lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
+		task_work_notify_signal(task);
 		break;
 	default:
 		WARN_ON_ONCE(1);
-- 
2.29.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available
  2020-10-16 15:45 [PATCHSET v6] " Jens Axboe
@ 2020-10-16 15:45 ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2020-10-16 15:45 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe, Roman Gershman

If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads, but it's also lighter weight on non-shared cases.

io_uring is a heavy consumer of TWA_SIGNAL based task_work. On my test
box, even just using 16 threads shows a nice improvement running an
io_uring based echo server.

stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU

1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
 for 5.7.16 or later it achieves only 1M qps and the system cpu is is
 at ~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16, since we need it to be able to solve an
inability to run task_work if the application is waiting in the kernel
already on an event that needs task_work run to be satisfied. Also
see commit 0ba9c9edcd15.

Reported-by: Roman Gershman <romger@amazon.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 kernel/task_work.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 613b2d634af8..ae058893913c 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,34 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+/*
+ * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
+ * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
+ * shared for threads, and can cause contention on sighand->lock. Even for
+ * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
+ * or IRQ disabling is involved for notification (or running) purposes.
+ */
+static void task_work_notify_signal(struct task_struct *task)
+{
+#if defined(TIF_NOTIFY_SIGNAL)
+	set_notify_signal(task);
+#else
+	unsigned long flags;
+
+	/*
+	 * Only grab the sighand lock if we don't already have some
+	 * task_work pending. This pairs with the smp_store_mb()
+	 * in get_signal(), see comment there.
+	 */
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+	    lock_task_sighand(task, &flags)) {
+		task->jobctl |= JOBCTL_TASK_WORK;
+		signal_wake_up(task, 0);
+		unlock_task_sighand(task, &flags);
+	}
+#endif
+}
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -28,7 +56,6 @@ int
 task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 {
 	struct callback_head *head;
-	unsigned long flags;
 
 	do {
 		head = READ_ONCE(task->task_works);
@@ -42,17 +69,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		/*
-		 * Only grab the sighand lock if we don't already have some
-		 * task_work pending. This pairs with the smp_store_mb()
-		 * in get_signal(), see comment there.
-		 */
-		if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-		    lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
+		task_work_notify_signal(task);
 		break;
 	}
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-08 15:27 [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
2020-10-08 15:27 ` [PATCH 1/4] tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() Jens Axboe
2020-10-13 23:35   ` Thomas Gleixner
2020-10-14  1:13     ` Jens Axboe
2020-10-08 15:27 ` [PATCH 2/4] kernel: add task_sigpending() helper Jens Axboe
2020-10-13 23:36   ` Thomas Gleixner
2020-10-08 15:27 ` [PATCH 3/4] kernel: add support for TIF_NOTIFY_SIGNAL Jens Axboe
2020-10-09 14:43   ` Oleg Nesterov
2020-10-09 15:13     ` Jens Axboe
2020-10-13 23:45       ` Thomas Gleixner
2020-10-13 23:54         ` Jens Axboe
2020-10-13 23:42   ` Thomas Gleixner
2020-10-13 23:45     ` Jens Axboe
2020-10-08 15:27 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe
2020-10-13 23:50   ` Thomas Gleixner
2020-10-13 23:55     ` Jens Axboe
2020-10-09 14:30 ` [PATCHSET v4] Add support for TIF_NOTIFY_SIGNAL Oleg Nesterov
2020-10-09 15:16   ` Jens Axboe
2020-10-16 15:45 [PATCHSET v6] " Jens Axboe
2020-10-16 15:45 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe
2020-10-26 20:32 [PATCHSET v6a 0/4] Add support for TIF_NOTIFY_SIGNAL Jens Axboe
2020-10-26 20:32 ` [PATCH 4/4] task_work: use TIF_NOTIFY_SIGNAL if available Jens Axboe

IO-Uring Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/io-uring/0 io-uring/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 io-uring io-uring/ https://lore.kernel.org/io-uring \
		io-uring@vger.kernel.org
	public-inbox-index io-uring

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.io-uring


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git