* [PATCH RT 00/13] Linux 5.10.78-rt56-rc3
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat


Dear RT Folks,

This is the RT stable review cycle of patch 5.10.78-rt56-rc3.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository; only the
final release will be.

If all goes well, this patch will be converted to the next main release
on 11/26/2021.

Enjoy,

-- Steve


To build 5.10.78-rt56-rc3 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz

  http://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.78.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/5.10/patch-5.10.78-rt56-rc3.patch.xz

You can also build from 5.10.78-rt55 by applying the incremental patch:

  http://www.kernel.org/pub/linux/kernel/projects/rt/5.10/incr/patch-5.10.78-rt55-rt56-rc3.patch.xz


Changes from 5.10.78-rt55:

---


Mike Galbraith (1):
      mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t

Sebastian Andrzej Siewior (11):
      sched: Fix get_push_task() vs migrate_disable()
      sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
      preempt: Move preempt_enable_no_resched() to the RT block
      mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
      fscache: Use only one fscache_object_cong_wait.
      fscache: Use only one fscache_object_cong_wait.
      locking: Drop might_resched() from might_sleep_no_state_check()
      drm/i915/gt: Queue and wait for the irq_work item.
      irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
      irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
      irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT

Steven Rostedt (VMware) (1):
      Linux 5.10.78-rt56-rc3

----
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |   5 +-
 fs/fscache/internal.h                       |   1 -
 fs/fscache/main.c                           |   6 --
 fs/fscache/object.c                         |  13 +--
 include/linux/irq_work.h                    |  31 ++++--
 include/linux/kernel.h                      |   2 +-
 include/linux/preempt.h                     |   6 +-
 init/Kconfig                                |   2 +-
 kernel/irq_work.c                           | 143 +++++++++++++++++++++-------
 kernel/sched/core.c                         |   2 +-
 kernel/sched/sched.h                        |   3 +
 kernel/time/timer.c                         |   2 -
 localversion-rt                             |   2 +-
 mm/zsmalloc.c                               |  12 +--
 14 files changed, 155 insertions(+), 75 deletions(-)


* [PATCH RT 01/13] mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	stable-rt, Mike Galbraith

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Mike Galbraith <efault@gmx.de>

local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe.

Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.
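
As an illustration only (this snippet is not part of the patch; the
demo_* names are invented), the nesting that triggers the lockdep splat
looks roughly like this, since a local_lock_t maps to a spinlock_t on
PREEMPT_RT:

#include <linux/local_lock.h>
#include <linux/mutex.h>
#include <linux/percpu.h>

struct demo_pcpu {
	local_lock_t llock;
};

static DEFINE_PER_CPU(struct demo_pcpu, demo_pcpu) = {
	.llock = INIT_LOCAL_LOCK(llock),
};
static DEFINE_MUTEX(demo_mutex);

static void demo_invalid_nesting(void)
{
	local_lock(&demo_pcpu.llock);	/* behaves like a spinlock_t on PREEMPT_RT */
	mutex_lock(&demo_mutex);	/* sleeping lock inside it -> "Invalid wait context" */
	mutex_unlock(&demo_mutex);
	local_unlock(&demo_pcpu.llock);
}

The patch below avoids this by turning the inner lock into a spinlock_t,
which may legitimately nest inside a local_lock.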

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 mm/zsmalloc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 277d426c881f..3595c1644135 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -82,7 +82,7 @@
 
 struct zsmalloc_handle {
 	unsigned long addr;
-	struct mutex lock;
+	spinlock_t lock;
 };
 
 #define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle))
@@ -370,7 +370,7 @@ static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
 	if (p) {
 		struct zsmalloc_handle *zh = p;
 
-		mutex_init(&zh->lock);
+		spin_lock_init(&zh->lock);
 	}
 #endif
 	return (unsigned long)p;
@@ -930,7 +930,7 @@ static inline int testpin_tag(unsigned long handle)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_is_locked(&zh->lock);
+	return spin_is_locked(&zh->lock);
 #else
 	return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -941,7 +941,7 @@ static inline int trypin_tag(unsigned long handle)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_trylock(&zh->lock);
+	return spin_trylock(&zh->lock);
 #else
 	return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -952,7 +952,7 @@ static void pin_tag(unsigned long handle) __acquires(bitlock)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_lock(&zh->lock);
+	return spin_lock(&zh->lock);
 #else
 	bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
@@ -963,7 +963,7 @@ static void unpin_tag(unsigned long handle) __releases(bitlock)
 #ifdef CONFIG_PREEMPT_RT
 	struct zsmalloc_handle *zh = zs_get_pure_handle(handle);
 
-	return mutex_unlock(&zh->lock);
+	return spin_unlock(&zh->lock);
 #else
 	bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
 #endif
-- 
2.33.0


* [PATCH RT 02/13] sched: Fix get_push_task() vs migrate_disable()
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	stable-rt, Peter Zijlstra (Intel)

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU.

The current task is retrieved via get_push_task() which only checks for
nr_cpus_allowed == 1, but does not check whether the task has migration
disabled and therefore cannot be moved either. The consequence is a
pointless invocation of the migration thread which correctly observes
that the task cannot be moved.

Return NULL if the task has migration disabled and cannot be moved to
another CPU.

Cc: stable-rt@vger.kernel.org
Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/sched.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 826ea17e144d..c2c9c386456d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1949,6 +1949,9 @@ static inline struct task_struct *get_push_task(struct rq *rq)
 	if (p->nr_cpus_allowed == 1)
 		return NULL;
 
+	if (p->migration_disabled)
+		return NULL;
+
 	rq->push_busy = true;
 	return get_task_struct(p);
 }
-- 
2.33.0


* [PATCH RT 03/13] sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	stable-rt

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

With PREEMPT_RT enabled all hrtimer callbacks will be invoked in
softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD.
During boot kthread_bind() is used for the creation of per-CPU threads
and then hangs in wait_task_inactive() if ksoftirqd is not yet up and
running.
The hang disappeared with commit
   26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()")
but enabling function tracing on boot reliably leads to the freeze-on-boot
behaviour again.
The timer in wait_task_inactive() cannot be used directly from a user
interface to abuse it and trigger a mass wake-up of several tasks at the
same time, which would otherwise lead to long sections with disabled
interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f638d9420553..54fa3bb1b7c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2734,7 +2734,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 			ktime_t to = NSEC_PER_SEC / HZ;
 
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_hrtimeout(&to, HRTIMER_MODE_REL);
+			schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);
 			continue;
 		}
 
-- 
2.33.0


* [PATCH RT 04/13] preempt: Move preempt_enable_no_resched() to the RT block
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	stable-rt

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so that nobody plays preempt tricks by enabling
preemption without checking the need-resched flag.

This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() to preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/preempt.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index af39859f02ee..7b5b2ed55531 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -208,12 +208,12 @@ do { \
 	preempt_count_dec(); \
 } while (0)
 
-#ifdef CONFIG_PREEMPT_RT
+#ifndef CONFIG_PREEMPT_RT
 # define preempt_enable_no_resched() sched_preempt_enable_no_resched()
-# define preempt_check_resched_rt() preempt_check_resched()
+# define preempt_check_resched_rt() barrier();
 #else
 # define preempt_enable_no_resched() preempt_enable()
-# define preempt_check_resched_rt() barrier();
+# define preempt_check_resched_rt() preempt_check_resched()
 #endif
 
 #define preemptible()	(preempt_count() == 0 && !irqs_disabled())
-- 
2.33.0


* [PATCH RT 05/13] mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	stable-rt, Mel Gorman

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a critical
memory region is not THP-aligned and a non-RT buffer is located in the same
hugepage-aligned region. It's also possible for an unrelated thread to migrate
pages belonging to an RT task incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled.

Regular HUGEPAGEs are not affected by this and can be used.

NUMA_BALANCING:
There is a non-deterministic delay when marking PTEs PROT_NONE to gather NUMA
fault samples, increased page faults for regions even if they are mlocked, and
non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description].

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 7ba2b602b707..9bfc60e7eead 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -861,7 +861,7 @@ config NUMA_BALANCING
 	bool "Memory placement aware NUMA scheduler"
 	depends on ARCH_SUPPORTS_NUMA_BALANCING
 	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
-	depends on SMP && NUMA && MIGRATION
+	depends on SMP && NUMA && MIGRATION && !PREEMPT_RT
 	help
 	  This option adds support for automatic NUMA aware memory/task placement.
 	  The mechanism is quite primitive and is based on migrating memory when
-- 
2.33.0


* [PATCH RT 06/13] fscache: Use only one fscache_object_cong_wait.
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	Gregor Beck, stable-rt

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

In the commit mentioned below, fscache was converted from slow-work to
workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed()
did not use a per-CPU waitqueue. They chose between two global waitqueues
depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was
always the same waitqueue.

I can't find out how it is ensured that a waiter on a certain CPU is woken
up by the other side. My guess is that the timeout in schedule_timeout()
ensures that it does not wait forever (or it relies on a random wake-up).

fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is
bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed
that only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa2fffb ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 fs/fscache/internal.h |  1 -
 fs/fscache/main.c     |  6 ------
 fs/fscache/object.c   | 11 +++++------
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 64aa552b296d..7dae569dafb9 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -95,7 +95,6 @@ extern unsigned fscache_debug;
 extern struct kobject *fscache_root;
 extern struct workqueue_struct *fscache_object_wq;
 extern struct workqueue_struct *fscache_op_wq;
-DECLARE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
 
 extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
 
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 4207f98e405f..85f8cf3a323d 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -41,8 +41,6 @@ struct kobject *fscache_root;
 struct workqueue_struct *fscache_object_wq;
 struct workqueue_struct *fscache_op_wq;
 
-DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);
-
 /* these values serve as lower bounds, will be adjusted in fscache_init() */
 static unsigned fscache_object_max_active = 4;
 static unsigned fscache_op_max_active = 2;
@@ -138,7 +136,6 @@ unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
 static int __init fscache_init(void)
 {
 	unsigned int nr_cpus = num_possible_cpus();
-	unsigned int cpu;
 	int ret;
 
 	fscache_object_max_active =
@@ -161,9 +158,6 @@ static int __init fscache_init(void)
 	if (!fscache_op_wq)
 		goto error_op_wq;
 
-	for_each_possible_cpu(cpu)
-		init_waitqueue_head(&per_cpu(fscache_object_cong_wait, cpu));
-
 	ret = fscache_proc_init();
 	if (ret < 0)
 		goto error_proc;
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index cb2146e02cd5..55158f30d093 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -807,6 +807,8 @@ void fscache_object_destroy(struct fscache_object *object)
 }
 EXPORT_SYMBOL(fscache_object_destroy);
 
+static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
+
 /*
  * enqueue an object for metadata-type processing
  */
@@ -815,12 +817,10 @@ void fscache_enqueue_object(struct fscache_object *object)
 	_enter("{OBJ%x}", object->debug_id);
 
 	if (fscache_get_object(object, fscache_obj_get_queue) >= 0) {
-		wait_queue_head_t *cong_wq =
-			&get_cpu_var(fscache_object_cong_wait);
 
 		if (queue_work(fscache_object_wq, &object->work)) {
 			if (fscache_object_congested())
-				wake_up(cong_wq);
+				wake_up(&fscache_object_cong_wait);
 		} else
 			fscache_put_object(object, fscache_obj_put_queue);
 
@@ -842,16 +842,15 @@ void fscache_enqueue_object(struct fscache_object *object)
  */
 bool fscache_object_sleep_till_congested(signed long *timeoutp)
 {
-	wait_queue_head_t *cong_wq = this_cpu_ptr(&fscache_object_cong_wait);
 	DEFINE_WAIT(wait);
 
 	if (fscache_object_congested())
 		return true;
 
-	add_wait_queue_exclusive(cong_wq, &wait);
+	add_wait_queue_exclusive(&fscache_object_cong_wait, &wait);
 	if (!fscache_object_congested())
 		*timeoutp = schedule_timeout(*timeoutp);
-	finish_wait(cong_wq, &wait);
+	finish_wait(&fscache_object_cong_wait, &wait);
 
 	return fscache_object_congested();
 }
-- 
2.33.0


* [PATCH RT 07/13] fscache: Use only one fscache_object_cong_wait (2)
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This is an update of the original patch, removing a put_cpu_var() call
which was overlooked in the initial patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 fs/fscache/object.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index 55158f30d093..fb9794dce721 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -823,8 +823,6 @@ void fscache_enqueue_object(struct fscache_object *object)
 				wake_up(&fscache_object_cong_wait);
 		} else
 			fscache_put_object(object, fscache_obj_put_queue);
-
-		put_cpu_var(fscache_object_cong_wait);
 	}
 }
 
-- 
2.33.0


* [PATCH RT 08/13] locking: Drop might_resched() from might_sleep_no_state_check()
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state because the state is preserved.

That state is preserved in the locking slow path, so we must not schedule
at the beginning of the locking function because the state would be lost
before it is preserved.

Remove might_resched() from might_sleep_no_state_check() to avoid losing the
state before it is preserved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/kernel.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2cff7554395d..6eb0ab994f4c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -222,7 +222,7 @@ extern void __cant_migrate(const char *file, int line);
 	do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
 
 # define might_sleep_no_state_check() \
-	do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+	do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)
 
 /**
  * cant_sleep - annotation for functions that cannot sleep
-- 
2.33.0


* [PATCH RT 09/13] drm/i915/gt: Queue and wait for the irq_work item.
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	Clark Williams, Maarten Lankhorst

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users take spinlock_t locks in their callback functions.
These locks are turned into sleeping locks which cannot be acquired
with interrupts disabled.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 0040b4765a54..3f4f854786f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -342,10 +342,9 @@ void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	/* Kick the work once more to drain the signalers */
 	irq_work_sync(&b->irq_work);
 	while (unlikely(READ_ONCE(b->irq_armed))) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
+		irq_work_queue(&b->irq_work);
 		cond_resched();
+		irq_work_sync(&b->irq_work);
 	}
 	GEM_BUG_ON(!list_empty(&b->signalers));
 }
-- 
2.33.0


* [PATCH RT 10/13] irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	Peter Zijlstra (Intel)

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

irq_work triggers an interrupt instantly if supported by the
architecture. Otherwise the work will be processed on the next timer
tick. In the worst case irq_work_sync() could spin for up to a jiffy.

irq_work_sync() is usually used in tear-down context which is fully
preemptible. Based on review, irq_work_sync() is invoked from preemptible
context and there is only one waiter at a time. This qualifies it to use
rcuwait for synchronisation.

Let irq_work_sync() synchronize with rcuwait if the architecture
processes irqwork via the timer tick.
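
For readers unfamiliar with the API, the rcuwait pairing relied on here
follows roughly this pattern (an illustrative sketch only; the demo_*
names are invented and not part of the patch):

#include <linux/rcuwait.h>
#include <linux/sched.h>

struct demo_work {
	struct rcuwait	irqwait;
	bool		busy;
};

static void demo_init(struct demo_work *w)
{
	rcuwait_init(&w->irqwait);
	w->busy = true;
}

/* Completion side, e.g. at the end of the work callback. */
static void demo_complete(struct demo_work *w)
{
	WRITE_ONCE(w->busy, false);
	rcuwait_wake_up(&w->irqwait);
}

/* Waiting side, must run in preemptible context. */
static void demo_sync(struct demo_work *w)
{
	rcuwait_wait_event(&w->irqwait, !READ_ONCE(w->busy),
			   TASK_UNINTERRUPTIBLE);
}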

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/irq_work.h | 10 +++++++++-
 kernel/irq_work.c        | 10 ++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index f941f2d7d71c..3c6d3a96bca0 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -3,6 +3,7 @@
 #define _LINUX_IRQ_WORK_H
 
 #include <linux/smp_types.h>
+#include <linux/rcuwait.h>
 
 /*
  * An entry can be in one of four states:
@@ -22,6 +23,7 @@ struct irq_work {
 		};
 	};
 	void (*func)(struct irq_work *);
+	struct rcuwait irqwait;
 };
 
 static inline
@@ -29,13 +31,19 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 {
 	atomic_set(&work->flags, 0);
 	work->func = func;
+	rcuwait_init(&work->irqwait);
 }
 
 #define DEFINE_IRQ_WORK(name, _f) struct irq_work name = {	\
 		.flags = ATOMIC_INIT(0),			\
-		.func  = (_f)					\
+		.func  = (_f),					\
+		.irqwait = __RCUWAIT_INITIALIZER(irqwait),	\
 }
 
+static inline bool irq_work_is_busy(struct irq_work *work)
+{
+	return atomic_read(&work->flags) & IRQ_WORK_BUSY;
+}
 
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 8183d30e1bb1..8969aff790e2 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -165,6 +165,9 @@ void irq_work_single(void *arg)
 	 */
 	flags &= ~IRQ_WORK_PENDING;
 	(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
+
+	if (!arch_irq_work_has_interrupt())
+		rcuwait_wake_up(&work->irqwait);
 }
 
 static void irq_work_run_list(struct llist_head *list)
@@ -231,6 +234,13 @@ void irq_work_tick_soft(void)
 void irq_work_sync(struct irq_work *work)
 {
 	lockdep_assert_irqs_enabled();
+	might_sleep();
+
+	if (!arch_irq_work_has_interrupt()) {
+		rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
+				   TASK_UNINTERRUPTIBLE);
+		return;
+	}
 
 	while (atomic_read(&work->flags) & IRQ_WORK_BUSY)
 		cpu_relax();
-- 
2.33.0


* [PATCH RT 11/13] irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	Peter Zijlstra (Intel)

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The irq_work callback is invoked in hard IRQ context. By default all
callbacks are scheduled for invocation right away (if supported by
the architecture) except for the ones marked IRQ_WORK_LAZY which are
delayed until the next timer-tick.

A review of the callbacks shows that some of them acquire locks
(spinlock_t, rwlock_t) which are transformed into sleeping locks on
PREEMPT_RT and therefore must not be acquired in hardirq context.
Changing those locks into locks which can be acquired in this context
would lead to other problems such as increased latencies if everything
in the chain has IRQ-off locks. Nor would it solve all the issues: one
callback has been observed to invoke kref_put(), whose release callback
invokes kfree(), which cannot be called from hardirq context.

Some callbacks are required to be invoked in hardirq context even on
PREEMPT_RT to work properly. This includes for instance the NO_HZ
callback which needs to be able to observe the idle context.

The callbacks which are required to run in hardirq context have already
been marked. Use this information to split the callbacks onto the two lists
on PREEMPT_RT:
- lazy_list
  Work items which are not marked with IRQ_WORK_HARD_IRQ will be added
  to this list. Callbacks on this list will be invoked from a per-CPU
  thread.
  The handler here may acquire sleeping locks such as spinlock_t and
  invoke kfree().

- raised_list
  Work items which are marked with IRQ_WORK_HARD_IRQ will be added to
  this list. They will be invoked in hardirq context and must not
  acquire any sleeping locks.

The wake-up of the per-CPU thread occurs from the irq_work handler in
hardirq context. The thread runs with the lowest RT priority to ensure
it runs before any SCHED_OTHER tasks do.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
	  hard and soft variant. Collected fixes over time from Steven
	  Rostedt and Mike Galbraith. Move to per-CPU threads instead of
	  softirq as suggested by PeterZ.]
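
As a usage sketch (not part of the patch; the demo_* names are
invented), a caller selects where its callback runs on PREEMPT_RT via
the initializers introduced below:

#include <linux/irq_work.h>

static void demo_cb(struct irq_work *work)
{
	/*
	 * When run from the per-CPU irq_work thread this may take a
	 * spinlock_t or call kfree(); in hardirq context it must not.
	 */
}

/* Not marked HARD: queued on lazy_list, run by the per-CPU irq_work thread. */
static struct irq_work demo_soft = IRQ_WORK_INIT(demo_cb);

/* Marked HARD: stays on raised_list and runs in hardirq context. */
static struct irq_work demo_hard = IRQ_WORK_INIT_HARD(demo_cb);

Both are queued the same way, e.g. irq_work_queue(&demo_soft); only the
execution context differs.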

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/irq_work.h |  16 +++--
 kernel/irq_work.c        | 131 ++++++++++++++++++++++++++++-----------
 kernel/time/timer.c      |   2 -
 3 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 3c6d3a96bca0..f551ba9c99d4 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -40,6 +40,16 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 		.irqwait = __RCUWAIT_INITIALIZER(irqwait),	\
 }
 
+#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){	\
+	.flags = ATOMIC_INIT(_flags),				\
+	.func = (_func),					\
+	.irqwait = __RCUWAIT_INITIALIZER(irqwait),		\
+}
+
+#define IRQ_WORK_INIT(_func) __IRQ_WORK_INIT(_func, 0)
+#define IRQ_WORK_INIT_LAZY(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_LAZY)
+#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)
+
 static inline bool irq_work_is_busy(struct irq_work *work)
 {
 	return atomic_read(&work->flags) & IRQ_WORK_BUSY;
@@ -63,10 +73,4 @@ static inline void irq_work_run(void) { }
 static inline void irq_work_single(void *arg) { }
 #endif
 
-#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
-void irq_work_tick_soft(void);
-#else
-static inline void irq_work_tick_soft(void) { }
-#endif
-
 #endif /* _LINUX_IRQ_WORK_H */
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 8969aff790e2..03d09d779ee1 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -18,12 +18,37 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/smp.h>
+#include <linux/smpboot.h>
 #include <linux/interrupt.h>
 #include <asm/processor.h>
 
 
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
+static DEFINE_PER_CPU(struct task_struct *, irq_workd);
+
+static void wake_irq_workd(void)
+{
+	struct task_struct *tsk = __this_cpu_read(irq_workd);
+
+	if (!llist_empty(this_cpu_ptr(&lazy_list)) && tsk)
+		wake_up_process(tsk);
+}
+
+#ifdef CONFIG_SMP
+static void irq_work_wake(struct irq_work *entry)
+{
+	wake_irq_workd();
+}
+
+static DEFINE_PER_CPU(struct irq_work, irq_work_wakeup) =
+	IRQ_WORK_INIT_HARD(irq_work_wake);
+#endif
+
+static int irq_workd_should_run(unsigned int cpu)
+{
+	return !llist_empty(this_cpu_ptr(&lazy_list));
+}
 
 /*
  * Claim the entry so that no one else will poke at it.
@@ -54,20 +79,28 @@ void __weak arch_irq_work_raise(void)
 static void __irq_work_queue_local(struct irq_work *work)
 {
 	struct llist_head *list;
-	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT);
-
-	lazy_work = atomic_read(&work->flags) & IRQ_WORK_LAZY;
-
-	/* If the work is "lazy", handle it from next tick if any */
-	if (lazy_work || (realtime && !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)))
+	bool rt_lazy_work = false;
+	bool lazy_work = false;
+	int work_flags;
+
+	work_flags = atomic_read(&work->flags);
+	if (work_flags & IRQ_WORK_LAZY)
+		lazy_work = true;
+	else if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+		 !(work_flags & IRQ_WORK_HARD_IRQ))
+		rt_lazy_work = true;
+
+	if (lazy_work || rt_lazy_work)
 		list = this_cpu_ptr(&lazy_list);
 	else
 		list = this_cpu_ptr(&raised_list);
 
-	if (llist_add(&work->llnode, list)) {
-		if (!lazy_work || tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	}
+	if (!llist_add(&work->llnode, list))
+		return;
+
+	/* If the work is "lazy", handle it from next tick if any */
+	if (!lazy_work || tick_nohz_tick_stopped())
+		arch_irq_work_raise();
 }
 
 /* Enqueue the irq work @work on the current CPU */
@@ -110,15 +143,27 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 		/* Arch remote IPI send/receive backend aren't NMI safe */
 		WARN_ON_ONCE(in_nmi());
 
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)) {
-			if (llist_add(&work->llnode, &per_cpu(lazy_list, cpu)))
-				arch_send_call_function_single_ipi(cpu);
-		} else {
-			__smp_call_single_queue(cpu, &work->llnode);
+		/*
+		 * On PREEMPT_RT the items which are not marked as
+		 * IRQ_WORK_HARD_IRQ are added to the lazy list and a HARD work
+		 * item is used on the remote CPU to wake the thread.
+		 */
+		if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+		    !(atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ)) {
+
+			if (!llist_add(&work->llnode, &per_cpu(lazy_list, cpu)))
+				goto out;
+
+			work = &per_cpu(irq_work_wakeup, cpu);
+			if (!irq_work_claim(work))
+				goto out;
 		}
+
+		__smp_call_single_queue(cpu, &work->llnode);
 	} else {
 		__irq_work_queue_local(work);
 	}
+out:
 	preempt_enable();
 
 	return true;
@@ -175,12 +220,13 @@ static void irq_work_run_list(struct llist_head *list)
 	struct irq_work *work, *tmp;
 	struct llist_node *llnode;
 
-#ifndef CONFIG_PREEMPT_RT
 	/*
-	 * nort: On RT IRQ-work may run in SOFTIRQ context.
+	 * On PREEMPT_RT IRQ-work which is not marked as HARD will be processed
+	 * in a per-CPU thread in preemptible context. Only the items which are
+	 * marked as IRQ_WORK_HARD_IRQ will be processed in hardirq context.
 	 */
-	BUG_ON(!irqs_disabled());
-#endif
+	BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT));
+
 	if (llist_empty(list))
 		return;
 
@@ -196,16 +242,10 @@ static void irq_work_run_list(struct llist_head *list)
 void irq_work_run(void)
 {
 	irq_work_run_list(this_cpu_ptr(&raised_list));
-	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-		/*
-		 * NOTE: we raise softirq via IPI for safety,
-		 * and execute in irq_work_tick() to move the
-		 * overhead from hard to soft irq context.
-		 */
-		if (!llist_empty(this_cpu_ptr(&lazy_list)))
-			raise_softirq(TIMER_SOFTIRQ);
-	} else
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	else
+		wake_irq_workd();
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
@@ -218,15 +258,10 @@ void irq_work_tick(void)
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	else
+		wake_irq_workd();
 }
 
-#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
-void irq_work_tick_soft(void)
-{
-	irq_work_run_list(this_cpu_ptr(&lazy_list));
-}
-#endif
-
 /*
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
@@ -246,3 +281,29 @@ void irq_work_sync(struct irq_work *work)
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
+
+static void run_irq_workd(unsigned int cpu)
+{
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
+}
+
+static void irq_workd_setup(unsigned int cpu)
+{
+	sched_set_fifo_low(current);
+}
+
+static struct smp_hotplug_thread irqwork_threads = {
+	.store                  = &irq_workd,
+	.setup			= irq_workd_setup,
+	.thread_should_run      = irq_workd_should_run,
+	.thread_fn              = run_irq_workd,
+	.thread_comm            = "irq_work/%u",
+};
+
+static __init int irq_work_init_threads(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		BUG_ON(smpboot_register_percpu_thread(&irqwork_threads));
+	return 0;
+}
+early_initcall(irq_work_init_threads);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index af3daf03c917..cd67ee6d2634 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1767,8 +1767,6 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
-	irq_work_tick_soft();
-
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
-- 
2.33.0


* [PATCH RT 12/13] irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat,
	Peter Zijlstra (Intel)

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

On PREEMPT_RT most items are processed as LAZY via softirq context.
Avoid spin-waiting for them because the task in irq_work_sync() could
have a higher priority and never allow the irq_work to complete.

Additionally, use rcuwait to wait for !IRQ_WORK_HARD_IRQ irq_work items
on PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/irq_work.h | 5 +++++
 kernel/irq_work.c        | 6 ++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index f551ba9c99d4..2c0059340871 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -55,6 +55,11 @@ static inline bool irq_work_is_busy(struct irq_work *work)
 	return atomic_read(&work->flags) & IRQ_WORK_BUSY;
 }
 
+static inline bool irq_work_is_hard(struct irq_work *work)
+{
+	return atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ;
+}
+
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);
 
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 03d09d779ee1..cbec10c32ead 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -211,7 +211,8 @@ void irq_work_single(void *arg)
 	flags &= ~IRQ_WORK_PENDING;
 	(void)atomic_cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
 
-	if (!arch_irq_work_has_interrupt())
+	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
+	    !arch_irq_work_has_interrupt())
 		rcuwait_wake_up(&work->irqwait);
 }
 
@@ -271,7 +272,8 @@ void irq_work_sync(struct irq_work *work)
 	lockdep_assert_irqs_enabled();
 	might_sleep();
 
-	if (!arch_irq_work_has_interrupt()) {
+	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
+	    !arch_irq_work_has_interrupt()) {
 		rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
 				   TASK_UNINTERRUPTIBLE);
 		return;
-- 
2.33.0


* [PATCH RT 13/13] Linux 5.10.78-rt56-rc3
From: Steven Rostedt @ 2021-11-24 18:03 UTC
  To: linux-kernel, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, Sebastian Andrzej Siewior,
	John Kacur, Daniel Wagner, Tom Zanussi, Srivatsa S. Bhat

5.10.78-rt56-rc3 stable review patch.
If anyone has any objections, please let me know.

------------------

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 51b05e9abe6f..dfd6a5d63829 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt55
+-rt56-rc3
-- 
2.33.0
