* [ANNOUNCE] 3.14.23-rt20
@ 2014-10-31 21:03 Steven Rostedt
  2014-11-02  7:30 ` Mike Galbraith
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
  0 siblings, 2 replies; 29+ messages in thread
From: Steven Rostedt @ 2014-10-31 21:03 UTC (permalink / raw)
  To: LKML, linux-rt-users
  Cc: Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams


Dear RT Folks,

I'm pleased to announce the 3.14.23-rt20 stable release.

This is the first 3.14-rt release in the stable-rt series. Normally I
wait till the next development release is out before I pull in a new
one. That is, I would pull in 3.14-rt when 3.16-rt or later was
released. But because development is now moving at a "hobbyist rate" 
 (read http://lwn.net/Articles/617140/ for details)
and 3.14-rt is no longer being developed against, I figured it was time
to put it under the "stable-rt" umbrella.

This release is just an update to the new stable 3.14.23 version;
no RT-specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v3.14-rt
  Head SHA1: 98a86c73c8828ba724eafaef832d71e3c909fb73


Or to build 3.14.23-rt20 directly, the following patches should be
applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.14.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.14.23.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.14/patch-3.14.23-rt20.patch.xz




Enjoy,

-- Steve



* Re: [ANNOUNCE] 3.14.23-rt20
  2014-10-31 21:03 [ANNOUNCE] 3.14.23-rt20 Steven Rostedt
@ 2014-11-02  7:30 ` Mike Galbraith
       [not found]   ` <CADLDEKvFvw7Aa98tVtM3z5JV8oocKqAdV4PqOMoZH2mXZ6x2jg@mail.gmail.com>
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
  1 sibling, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
> Dear RT Folks,
> 
> I'm pleased to announce the 3.14.23-rt20 stable release.
> 
> This is the first 3.14-rt release in the stable-rt series. Normally I
> wait till the next development release is out before I pull in a new
> one. That is, I would pull in 3.14-rt when 3.16-rt or later was
> released. But because development is now moving at a "hobbyist rate" 
> (read http://lwn.net/Articles/617140/ for details)
> and 3.14-rt is no longer being developed against, I figured it was time
> to put it under the "stable-rt" umbrella.

I piddled about with it yesterday and found that you can't change the
cpufreq governor IFF the tree is configured as rt, though it works fine
with voluntary preempt.  I'll poke about for the entertainment value.
Having no personal need/use for rt detracts from its hobby value
somewhat, but rt problems do have a tendency to be 'entertaining'.

I'll follow up with a few patches that folks can apply to their trees if
they so desire.  There being no devel tree to submit against, I can't do
a proper submission (rules), and some of them you surely don't want :) 

	-Mike




* 3.14.23-rt20 - sched/numa: Fix task_numa_free() lockdep splat
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
@ 2014-11-02  7:30   ` Mike Galbraith
  2014-11-02  7:31   ` 3.14.23-rt20 - wwmutex: fix __ww_mutex_lock_interruptible lockdep annotation Mike Galbraith
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

From 60e69eed85bb7b5198ef70643b5895c26ad76ef7 Mon Sep 17 00:00:00 2001
From: Mike Galbraith <bitbucket@online.de>
Date: Mon, 7 Apr 2014 10:55:15 +0200
Subject: sched/numa: Fix task_numa_free() lockdep splat

Sasha reported that lockdep claims that the following commit made
numa_group.lock interrupt unsafe:

  156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")

While I don't see how that could be, given that the commit in question
moved task_numa_free() from one irq-enabled region to another, the
change below does make both the gripes, and the lockups that accompany
them with numa=fake=4, go away.
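
For reference, the generic inversion such a gripe guards against looks
like this (illustrative sketch, not taken from Sasha's report):

  CPU0
  ----
  spin_lock(&grp->lock);      <- taken with irqs enabled
    <interrupt>
      spin_lock(&grp->lock);  <- deadlock, the lock is already held

Making every acquisition of the lock use the _irq/_irqsave variants
closes that window.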

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Cc: mgorman@suse.com
Cc: akpm@linux-foundation.org
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

---
 kernel/sched/fair.c  |   13 +++++++------
 kernel/sched/sched.h |    9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1371,7 +1371,7 @@ static void task_numa_placement(struct t
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1432,7 +1432,7 @@ static void task_numa_placement(struct t
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1532,7 +1532,8 @@ static void task_numa_group(struct task_
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < 2*nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults[i];
@@ -1546,7 +1547,7 @@ static void task_numa_group(struct task_
 	grp->nr_tasks++;
 
 	spin_unlock(&my_grp->lock);
-	spin_unlock(&grp->lock);
+	spin_unlock_irq(&grp->lock);
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1565,14 +1566,14 @@ void task_numa_free(struct task_struct *
 	void *numa_faults = p->numa_faults;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < 2*nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1402,6 +1402,15 @@ static inline void double_lock(spinlock_
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)








* 3.14.23-rt20 - wwmutex: fix __ww_mutex_lock_interruptible lockdep annotation
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
  2014-11-02  7:30   ` 3.14.23-rt20 - sched/numa: Fix task_numa_free() lockdep splat Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2014-11-02  7:31   ` 3.14.23-rt20 - mm,memcg: make refill_stock()/consume_stock() use get_cpu_light() Mike Galbraith
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

Adding mutex_acquire_nest() as used in __ww_mutex_lock() fixes the
lockdep splat below.  Remove a superfluous line break in
__ww_mutex_lock() as well.

[   56.999063] =============================================
[   56.999063] [ INFO: possible recursive locking detected ]
[   56.999065] 3.14.4-rt5 #26 Not tainted
[   56.999065] ---------------------------------------------
[   56.999067] Xorg/4298 is trying to acquire lock:
[   56.999107]  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999107] 
               but task is already holding lock:
[   56.999130]  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999131] 
               other info that might help us debug this:
[   56.999131]  Possible unsafe locking scenario:
               
[   56.999132]        CPU0
[   56.999132]        ----
[   56.999133]   lock(reservation_ww_class_mutex);
[   56.999134]   lock(reservation_ww_class_mutex);
[   56.999134] 
                *** DEADLOCK ***
               
[   56.999135]  May be due to missing lock nesting notation
               
[   56.999136] 3 locks held by Xorg/4298:
[   56.999160]  #0:  (&cli->mutex){+.+.+.}, at: [<ffffffffa02b597b>] nouveau_abi16_get+0x2b/0x100 [nouveau]
[   56.999174]  #1:  (reservation_ww_class_acquire){+.+...}, at: [<ffffffffa0160cd2>] drm_ioctl+0x4d2/0x610 [drm]
[   56.999197]  #2:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999198] 
               stack backtrace:
[   56.999200] CPU: 1 PID: 4298 Comm: Xorg Not tainted 3.14.4-rt5 #26
[   56.999200] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
[   56.999203]  ffffffff820309b0 ffff8802217a9ac8 ffffffff8157e53e ffffffff820309b0
[   56.999205]  ffff8802217a9b90 ffffffff810964ce ffff8802217a9b40 0000000000000246
[   56.999207]  0000000000000003 ffffffff8204f130 0001e09300000000 0001e093347523a9
[   56.999207] Call Trace:
[   56.999212]  [<ffffffff8157e53e>] dump_stack+0x4f/0x9d
[   56.999215]  [<ffffffff810964ce>] __lock_acquire+0x169e/0x1a50
[   56.999218]  [<ffffffff81588da9>] ? preempt_count_sub+0x49/0x50
[   56.999219]  [<ffffffff81584541>] ? _raw_spin_unlock+0x31/0x60
[   56.999221]  [<ffffffff8109701d>] lock_acquire+0x8d/0x140
[   56.999243]  [<ffffffffa02b4270>] ? nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999246]  [<ffffffff8158313d>] __ww_mutex_lock_interruptible+0x3d/0xf0
[   56.999268]  [<ffffffffa02b4270>] ? nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999290]  [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
[   56.999298]  [<ffffffffa0160cd2>] ? drm_ioctl+0x4d2/0x610 [drm]
[   56.999308]  [<ffffffffa0160cd2>] drm_ioctl+0x4d2/0x610 [drm]
[   56.999311]  [<ffffffff81588da9>] ? preempt_count_sub+0x49/0x50
[   56.999313]  [<ffffffff8107cd60>] ? migrate_enable+0xe0/0x1e0
[   56.999335]  [<ffffffffa02aaece>] nouveau_drm_ioctl+0x4e/0x90 [nouveau]
[   56.999338]  [<ffffffff81185ff0>] do_vfs_ioctl+0x300/0x530
[   56.999340]  [<ffffffff8158d33b>] ? sysret_check+0x1b/0x56
[   56.999342]  [<ffffffff81094a8d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[   56.999344]  [<ffffffff81186261>] SyS_ioctl+0x41/0x80
[   56.999345]  [<ffffffff8158d316>] system_call_fastpath+0x1a/0x1f
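
For readers not versed in ww_mutex lockdep annotations: an acquire
context legitimately holds several mutexes of the same lock class at
once, which a plain mutex_acquire() makes look like recursive locking.
Passing the context's dep_map as the nest lock tells lockdep that the
acquisitions are serialized by the wound/wait protocol.  A minimal
sketch of the usage pattern involved (hypothetical objects obj_a/obj_b,
mainline ww_mutex API, error/backoff handling elided):

  struct ww_acquire_ctx ctx;

  ww_acquire_init(&ctx, &reservation_ww_class);
  ww_mutex_lock(&obj_a->resv, &ctx);               /* same class... */
  ww_mutex_lock_interruptible(&obj_b->resv, &ctx); /* ...needs the nest annotation */
  ww_acquire_done(&ctx);
  /* ... use the buffers ... */
  ww_mutex_unlock(&obj_b->resv);
  ww_mutex_unlock(&obj_a->resv);
  ww_acquire_fini(&ctx);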

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 kernel/locking/rtmutex.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1964,7 +1964,7 @@ __ww_mutex_lock_interruptible(struct ww_
 
 	might_sleep();
 
-	mutex_acquire(&lock->base.dep_map, 0, 0, _RET_IP_);
+	mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map, _RET_IP_);
 	ret = rt_mutex_slowlock(&lock->base.lock, TASK_INTERRUPTIBLE, NULL, 0, ww_ctx);
 	if (ret)
 		mutex_release(&lock->base.dep_map, 1, _RET_IP_);
@@ -1982,8 +1982,7 @@ __ww_mutex_lock(struct ww_mutex *lock, s
 
 	might_sleep();
 
-	mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map,
-			_RET_IP_);
+	mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map, _RET_IP_);
 	ret = rt_mutex_slowlock(&lock->base.lock, TASK_UNINTERRUPTIBLE, NULL, 0, ww_ctx);
 	if (ret)
 		mutex_release(&lock->base.dep_map, 1, _RET_IP_);







* 3.14.23-rt20 - mm,memcg: make refill_stock()/consume_stock() use get_cpu_light()
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
  2014-11-02  7:30   ` 3.14.23-rt20 - sched/numa: Fix task_numa_free() lockdep splat Mike Galbraith
  2014-11-02  7:31   ` 3.14.23-rt20 - wwmutex: fix __ww_mutex_lock_interruptible lockdep annotation Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2014-11-02  7:31   ` 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock Mike Galbraith
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

Nikita reported the following memcg scheduling-while-atomic bug:

Call Trace:
[e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable)
[e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0
[e22d5ae0] [c060b9ec] __schedule+0x530/0x550
[e22d5bf0] [c060bacc] schedule+0x30/0xbc
[e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c
[e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4
[e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98
[e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac
[e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70
[e22d5d90] [c0117284] __do_fault+0x38c/0x510
[e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858
[e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc
[e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80

What happens:

   refill_stock()
      get_cpu_var()
      drain_stock()
         res_counter_uncharge()
            res_counter_uncharge_until()
               spin_lock() <== boom

Fix it by replacing get/put_cpu_var() with get/put_cpu_light().
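
In isolation, the pattern swap looks like this (sketch; on -rt,
get_cpu_light() maps to migrate_disable(), which pins the task to its
CPU but leaves the section preemptible, so sleeping locks taken inside
it remain legal):

  int cpu = get_cpu_light();      /* preemptible on -rt, just can't migrate */
  struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

  /* ... may reach res_counter_uncharge() -> spin_lock() safely ... */

  put_cpu_light();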

Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 mm/memcontrol.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2438,16 +2438,18 @@ static bool consume_stock(struct mem_cgr
 {
 	struct memcg_stock_pcp *stock;
 	bool ret = true;
+	int cpu;
 
 	if (nr_pages > CHARGE_BATCH)
 		return false;
 
-	stock = &get_cpu_var(memcg_stock);
+	cpu = get_cpu_light();
+	stock = &per_cpu(memcg_stock, cpu);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages)
 		stock->nr_pages -= nr_pages;
 	else /* need to call res_counter_charge */
 		ret = false;
-	put_cpu_var(memcg_stock);
+	put_cpu_light();
 	return ret;
 }
 
@@ -2497,14 +2499,17 @@ static void __init memcg_stock_init(void
  */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
+	struct memcg_stock_pcp *stock;
+	int cpu = get_cpu_light();
+
+	stock = &per_cpu(memcg_stock, cpu);
 
 	if (stock->cached != memcg) { /* reset if necessary */
 		drain_stock(stock);
 		stock->cached = memcg;
 	}
 	stock->nr_pages += nr_pages;
-	put_cpu_var(memcg_stock);
+	put_cpu_light();
 }
 
 /*







* 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
                     ` (2 preceding siblings ...)
  2014-11-02  7:31   ` 3.14.23-rt20 - mm,memcg: make refill_stock()/consume_stock() use get_cpu_light() Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2015-02-17 11:56     ` Sebastian Andrzej Siewior
  2014-11-02  7:31   ` 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat Mike Galbraith
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

Subject: fs,btrfs: fix rt deadlock on extent_buffer->lock
From: Mike Galbraith <mgalbraith@suse.de>
Date: Sat Jul 14 12:30:41 CEST 2012

A trivially repeatable deadlock is cured by enabling the lockdep code in
btrfs_clear_path_blocking(), as suggested by Chris Mason.  He also
suggested restricting the blocking reader count to one, and not allowing
a spinning reader while a blocking reader exists.  This has proven to
be unnecessary; the strict lock order enforcement is enough... or
rather, that's my box's opinion after long hours of hard pounding.

Note: extent-tree.c bit is additional recommendation from Chris
      Mason, split into a separate patch after discussion.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Chris Mason <chris.mason@fusionio.com>
---
 fs/btrfs/ctree.c       |    4 ++--
 fs/btrfs/extent-tree.c |    8 --------
 2 files changed, 2 insertions(+), 10 deletions(-)

--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -80,7 +80,7 @@ noinline void btrfs_clear_path_blocking(
 {
 	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	/* lockdep really cares that we take all of these spinlocks
 	 * in the right order.  If any of the locks in the path are not
 	 * currently blocking, it is going to complain.  So, make really
@@ -107,7 +107,7 @@ noinline void btrfs_clear_path_blocking(
 		}
 	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	if (held)
 		btrfs_clear_lock_blocking_rw(held, held_rw);
 #endif
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6938,14 +6938,6 @@ use_block_rsv(struct btrfs_trans_handle
 		goto again;
 	}
 
-	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
-		static DEFINE_RATELIMIT_STATE(_rs,
-				DEFAULT_RATELIMIT_INTERVAL * 10,
-				/*DEFAULT_RATELIMIT_BURST*/ 1);
-		if (__ratelimit(&_rs))
-			WARN(1, KERN_DEBUG
-				"BTRFS: block rsv returned %d\n", ret);
-	}
 try_reserve:
 	ret = reserve_metadata_bytes(root, block_rsv, blocksize,
 				     BTRFS_RESERVE_NO_FLUSH);







* 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
                     ` (3 preceding siblings ...)
  2014-11-02  7:31   ` 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2015-02-17 12:53     ` Sebastian Andrzej Siewior
  2014-11-02  7:31   ` 3.14.23-rt20 - x86, UV: raw_spinlock conversion Mike Galbraith
  2014-11-02  7:31   ` 3.14.23-rt20 - softirq: resurrect softirq threads Mike Galbraith
  6 siblings, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

(not pretty)

[  172.743098] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:768
[  172.743116] in_atomic(): 1, irqs_disabled(): 0, pid: 26, name: rcuos/2
[  172.743117] 2 locks held by rcuos/2/26:
[  172.743128]  #0:  (rcu_callback){.+.+..}, at: [<ffffffff810b1a12>] rcu_nocb_kthread+0x1e2/0x380
[  172.743135]  #1:  (rcu_read_lock_sched){.+.+..}, at: [<ffffffff812acd26>] percpu_ref_kill_rcu+0xa6/0x1c0
[  172.743138] Preemption disabled at:[<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
[  172.743138]
[  172.743142] CPU: 0 PID: 26 Comm: rcuos/2 Not tainted 3.14.4-rt5 #31
[  172.743143] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
[  172.743148]  ffff8802231aa190 ffff8802231a5d08 ffffffff81582e9e 0000000000000000
[  172.743151]  ffff8802231a5d28 ffffffff81077aeb ffff880209f68140 ffff880209f681c0
[  172.743154]  ffff8802231a5d48 ffffffff81589304 ffff880209f68000 ffff880209f68000
[  172.743155] Call Trace:
[  172.743160]  [<ffffffff81582e9e>] dump_stack+0x4e/0x9c
[  172.743163]  [<ffffffff81077aeb>] __might_sleep+0xfb/0x170
[  172.743167]  [<ffffffff81589304>] rt_spin_lock+0x24/0x70
[  172.743171]  [<ffffffff811c5790>] free_ioctx_users+0x30/0x130
[  172.743174]  [<ffffffff812ace34>] percpu_ref_kill_rcu+0x1b4/0x1c0
[  172.743177]  [<ffffffff812acd26>] ? percpu_ref_kill_rcu+0xa6/0x1c0
[  172.743180]  [<ffffffff812acc80>] ? percpu_ref_kill_and_confirm+0x70/0x70
[  172.743183]  [<ffffffff810b1a93>] rcu_nocb_kthread+0x263/0x380
[  172.743185]  [<ffffffff810b1a12>] ? rcu_nocb_kthread+0x1e2/0x380
[  172.743189]  [<ffffffff810b1830>] ? rcu_report_exp_rnp.isra.52+0xc0/0xc0
[  172.743192]  [<ffffffff8106e046>] kthread+0xd6/0xf0
[  172.743194]  [<ffffffff8158900c>] ? _raw_spin_unlock_irq+0x2c/0x70
[  172.743197]  [<ffffffff8106df70>] ? __kthread_parkme+0x70/0x70
[  172.743200]  [<ffffffff81591eec>] ret_from_fork+0x7c/0xb0
[  172.743203]  [<ffffffff8106df70>] ? __kthread_parkme+0x70/0x70

crash> gdb list *percpu_ref_kill_rcu+0x1b4
0xffffffff812ace34 is in percpu_ref_kill_rcu (include/linux/percpu-refcount.h:169).
164             pcpu_count = ACCESS_ONCE(ref->pcpu_count);
165
166             if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
167                     __this_cpu_dec(*pcpu_count);
168             else if (unlikely(atomic_dec_and_test(&ref->count)))
169                     ref->release(ref);
170
171             rcu_read_unlock_sched();
172     }
173

Ok, so ->release() can't do anything that may take a sleeping lock,
but in an -rt kernel, it does exactly that.

Convert struct kioctx ctx_lock/completion_lock to raw_spinlock_t, and
defer final free to a time when we're not under rcu_read_lock_sched().
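
Stripped of the aio specifics, the deferral has this generic shape
(sketch with a hypothetical struct foo, not the patch itself):

  struct foo {
          struct rcu_head rcu;
          /* ... */
  };

  static void foo_free_rcu(struct rcu_head *rcu)
  {
          struct foo *f = container_of(rcu, struct foo, rcu);

          kfree(f);       /* no rcu_read_lock_sched() held here */
  }

  /* ->release() runs under rcu_read_lock_sched(), so it must not sleep */
  static void foo_release(struct foo *f)
  {
          call_rcu(&f->rcu, foo_free_rcu);
  }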

runltp -f ltp-aio-stress.part1 now runs free of both kernel and LTP gripes.

INFO: ltp-pan reported all tests PASS
LTP Version: 20140422

       ###############################################################

            Done executing testcases.
            LTP Version:  20140422
       ###############################################################


Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 fs/aio.c |   65 ++++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 19 deletions(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -130,7 +130,7 @@ struct kioctx {
 	} ____cacheline_aligned_in_smp;
 
 	struct {
-		spinlock_t	ctx_lock;
+		raw_spinlock_t	ctx_lock;
 		struct list_head active_reqs;	/* used for cancellation */
 	} ____cacheline_aligned_in_smp;
 
@@ -142,13 +142,16 @@ struct kioctx {
 	struct {
 		unsigned	tail;
 		unsigned	completed_events;
-		spinlock_t	completion_lock;
+		raw_spinlock_t	completion_lock;
 	} ____cacheline_aligned_in_smp;
 
 	struct page		*internal_pages[AIO_RING_PAGES];
 	struct file		*aio_ring_file;
 
 	unsigned		id;
+#ifdef CONFIG_PREEMPT_RT_BASE
+	struct rcu_head		rcu;
+#endif
 };
 
 /*------ sysctl variables----*/
@@ -340,11 +343,11 @@ static int aio_migratepage(struct addres
 	 * while the old page is copied to the new.  This prevents new
 	 * events from being lost.
 	 */
-	spin_lock_irqsave(&ctx->completion_lock, flags);
+	raw_spin_lock_irqsave(&ctx->completion_lock, flags);
 	migrate_page_copy(new, old);
 	BUG_ON(ctx->ring_pages[idx] != old);
 	ctx->ring_pages[idx] = new;
-	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+	raw_spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 	/* The old page is no longer accessible. */
 	put_page(old);
@@ -467,14 +470,14 @@ void kiocb_set_cancel_fn(struct kiocb *r
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ctx->ctx_lock, flags);
+	raw_spin_lock_irqsave(&ctx->ctx_lock, flags);
 
 	if (!req->ki_list.next)
 		list_add(&req->ki_list, &ctx->active_reqs);
 
 	req->ki_cancel = cancel;
 
-	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+	raw_spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
@@ -499,6 +502,7 @@ static int kiocb_cancel(struct kioctx *c
 	return cancel(kiocb);
 }
 
+#ifndef CONFIG_PREEMPT_RT_BASE
 static void free_ioctx(struct work_struct *work)
 {
 	struct kioctx *ctx = container_of(work, struct kioctx, free_work);
@@ -509,6 +513,18 @@ static void free_ioctx(struct work_struc
 	free_percpu(ctx->cpu);
 	kmem_cache_free(kioctx_cachep, ctx);
 }
+#else
+static void free_ioctx_rcu(struct rcu_head *rcu)
+{
+	struct kioctx *ctx = container_of(rcu, struct kioctx, rcu);
+
+	pr_debug("freeing %p\n", ctx);
+
+	aio_free_ring(ctx);
+	free_percpu(ctx->cpu);
+	kmem_cache_free(kioctx_cachep, ctx);
+}
+#endif
 
 static void free_ioctx_reqs(struct percpu_ref *ref)
 {
@@ -518,8 +534,19 @@ static void free_ioctx_reqs(struct percp
 	if (ctx->requests_done)
 		complete(ctx->requests_done);
 
+#ifndef CONFIG_PREEMPT_RT_BASE
 	INIT_WORK(&ctx->free_work, free_ioctx);
 	schedule_work(&ctx->free_work);
+#else
+	/*
+	 * We're in ->release() under rcu_read_lock_sched(), and can't do
+	 * anything that requires taking a sleeping lock, so ->release()
+	 * becomes a two stage rcu process for -rt.  We've now done the
+	 * rcu work that we can under locks made raw to get us this far.
+	 * Defer the freeing bit until we're again allowed to schedule().
+	 */
+	call_rcu(&ctx->rcu, free_ioctx_rcu);
+#endif
 }
 
 /*
@@ -532,7 +559,7 @@ static void free_ioctx_users(struct perc
 	struct kioctx *ctx = container_of(ref, struct kioctx, users);
 	struct kiocb *req;
 
-	spin_lock_irq(&ctx->ctx_lock);
+	raw_spin_lock_irq(&ctx->ctx_lock);
 
 	while (!list_empty(&ctx->active_reqs)) {
 		req = list_first_entry(&ctx->active_reqs,
@@ -542,7 +569,7 @@ static void free_ioctx_users(struct perc
 		kiocb_cancel(ctx, req);
 	}
 
-	spin_unlock_irq(&ctx->ctx_lock);
+	raw_spin_unlock_irq(&ctx->ctx_lock);
 
 	percpu_ref_kill(&ctx->reqs);
 	percpu_ref_put(&ctx->reqs);
@@ -655,8 +682,8 @@ static struct kioctx *ioctx_alloc(unsign
 
 	ctx->max_reqs = nr_events;
 
-	spin_lock_init(&ctx->ctx_lock);
-	spin_lock_init(&ctx->completion_lock);
+	raw_spin_lock_init(&ctx->ctx_lock);
+	raw_spin_lock_init(&ctx->completion_lock);
 	mutex_init(&ctx->ring_lock);
 	/* Protect against page migration throughout kiotx setup by keeping
 	 * the ring_lock mutex held until setup is complete. */
@@ -925,7 +952,7 @@ static void refill_reqs_available(struct
  */
 static void user_refill_reqs_available(struct kioctx *ctx)
 {
-	spin_lock_irq(&ctx->completion_lock);
+	raw_spin_lock_irq(&ctx->completion_lock);
 	if (ctx->completed_events) {
 		struct aio_ring *ring;
 		unsigned head;
@@ -946,7 +973,7 @@ static void user_refill_reqs_available(s
 		refill_reqs_available(ctx, head, ctx->tail);
 	}
 
-	spin_unlock_irq(&ctx->completion_lock);
+	raw_spin_unlock_irq(&ctx->completion_lock);
 }
 
 /* aio_get_req
@@ -1041,9 +1068,9 @@ void aio_complete(struct kiocb *iocb, lo
 	if (iocb->ki_list.next) {
 		unsigned long flags;
 
-		spin_lock_irqsave(&ctx->ctx_lock, flags);
+		raw_spin_lock_irqsave(&ctx->ctx_lock, flags);
 		list_del(&iocb->ki_list);
-		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+		raw_spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 	}
 
 	/*
@@ -1051,7 +1078,7 @@ void aio_complete(struct kiocb *iocb, lo
 	 * ctx->completion_lock to prevent other code from messing with the tail
 	 * pointer since we might be called from irq context.
 	 */
-	spin_lock_irqsave(&ctx->completion_lock, flags);
+	raw_spin_lock_irqsave(&ctx->completion_lock, flags);
 
 	tail = ctx->tail;
 	pos = tail + AIO_EVENTS_OFFSET;
@@ -1090,7 +1117,7 @@ void aio_complete(struct kiocb *iocb, lo
 	ctx->completed_events++;
 	if (ctx->completed_events > 1)
 		refill_reqs_available(ctx, head, tail);
-	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+	raw_spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 	pr_debug("added to ring %p at [%u]\n", iocb, tail);
 
@@ -1631,7 +1658,7 @@ static struct kiocb *lookup_kiocb(struct
 {
 	struct list_head *pos;
 
-	assert_spin_locked(&ctx->ctx_lock);
+	assert_raw_spin_locked(&ctx->ctx_lock);
 
 	if (key != KIOCB_KEY)
 		return NULL;
@@ -1671,7 +1698,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t
 	if (unlikely(!ctx))
 		return -EINVAL;
 
-	spin_lock_irq(&ctx->ctx_lock);
+	raw_spin_lock_irq(&ctx->ctx_lock);
 
 	kiocb = lookup_kiocb(ctx, iocb, key);
 	if (kiocb)
@@ -1679,7 +1706,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t
 	else
 		ret = -EINVAL;
 
-	spin_unlock_irq(&ctx->ctx_lock);
+	raw_spin_unlock_irq(&ctx->ctx_lock);
 
 	if (!ret) {
 		/*






* 3.14.23-rt20 - x86, UV: raw_spinlock conversion
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
                     ` (4 preceding siblings ...)
  2014-11-02  7:31   ` 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2015-02-17 13:02     ` Sebastian Andrzej Siewior
  2014-11-02  7:31   ` 3.14.23-rt20 - softirq: resurrect softirq threads Mike Galbraith
  6 siblings, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

Shrug.  Lots of hobbyists have a beast in their basement, right?

From: Mike Galbraith <mgalbraith@suse.de>
Date: Mon Dec  5 10:01:47 CET 2011
Subject: x86, UV: raw_spinlock conversion

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
---
 arch/x86/include/asm/uv/uv_bau.h   |   14 +++++++-------
 arch/x86/include/asm/uv/uv_hub.h   |    2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c |    2 +-
 arch/x86/platform/uv/tlb_uv.c      |   26 +++++++++++++-------------
 arch/x86/platform/uv/uv_time.c     |   21 +++++++++++++--------
 5 files changed, 35 insertions(+), 30 deletions(-)

--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -611,9 +611,9 @@ struct bau_control {
 	cycles_t		send_message;
 	cycles_t		period_end;
 	cycles_t		period_time;
-	spinlock_t		uvhub_lock;
-	spinlock_t		queue_lock;
-	spinlock_t		disable_lock;
+	raw_spinlock_t		uvhub_lock;
+	raw_spinlock_t		queue_lock;
+	raw_spinlock_t		disable_lock;
 	/* tunables */
 	int			max_concurr;
 	int			max_concurr_const;
@@ -773,15 +773,15 @@ static inline int atom_asr(short i, stru
  * to be lowered below the current 'v'.  atomic_add_unless can only stop
  * on equal.
  */
-static inline int atomic_inc_unless_ge(spinlock_t *lock, atomic_t *v, int u)
+static inline int atomic_inc_unless_ge(raw_spinlock_t *lock, atomic_t *v, int u)
 {
-	spin_lock(lock);
+	raw_spin_lock(lock);
 	if (atomic_read(v) >= u) {
-		spin_unlock(lock);
+		raw_spin_unlock(lock);
 		return 0;
 	}
 	atomic_inc(v);
-	spin_unlock(lock);
+	raw_spin_unlock(lock);
 	return 1;
 }
 
--- a/arch/x86/include/asm/uv/uv_hub.h
+++ b/arch/x86/include/asm/uv/uv_hub.h
@@ -502,7 +502,7 @@ struct uv_blade_info {
 	unsigned short	nr_online_cpus;
 	unsigned short	pnode;
 	short		memory_nid;
-	spinlock_t	nmi_lock;	/* obsolete, see uv_hub_nmi */
+	raw_spinlock_t	nmi_lock;	/* obsolete, see uv_hub_nmi */
 	unsigned long	nmi_count;	/* obsolete, see uv_hub_nmi */
 };
 extern struct uv_blade_info *uv_blade_info;
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -911,7 +911,7 @@ void __init uv_system_init(void)
 			uv_blade_info[blade].pnode = pnode;
 			uv_blade_info[blade].nr_possible_cpus = 0;
 			uv_blade_info[blade].nr_online_cpus = 0;
-			spin_lock_init(&uv_blade_info[blade].nmi_lock);
+			raw_spin_lock_init(&uv_blade_info[blade].nmi_lock);
 			min_pnode = min(pnode, min_pnode);
 			max_pnode = max(pnode, max_pnode);
 			blade++;
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -719,9 +719,9 @@ static void destination_plugged(struct b
 
 		quiesce_local_uvhub(hmaster);
 
-		spin_lock(&hmaster->queue_lock);
+		raw_spin_lock(&hmaster->queue_lock);
 		reset_with_ipi(&bau_desc->distribution, bcp);
-		spin_unlock(&hmaster->queue_lock);
+		raw_spin_unlock(&hmaster->queue_lock);
 
 		end_uvhub_quiesce(hmaster);
 
@@ -741,9 +741,9 @@ static void destination_timeout(struct b
 
 		quiesce_local_uvhub(hmaster);
 
-		spin_lock(&hmaster->queue_lock);
+		raw_spin_lock(&hmaster->queue_lock);
 		reset_with_ipi(&bau_desc->distribution, bcp);
-		spin_unlock(&hmaster->queue_lock);
+		raw_spin_unlock(&hmaster->queue_lock);
 
 		end_uvhub_quiesce(hmaster);
 
@@ -764,7 +764,7 @@ static void disable_for_period(struct ba
 	cycles_t tm1;
 
 	hmaster = bcp->uvhub_master;
-	spin_lock(&hmaster->disable_lock);
+	raw_spin_lock(&hmaster->disable_lock);
 	if (!bcp->baudisabled) {
 		stat->s_bau_disabled++;
 		tm1 = get_cycles();
@@ -777,7 +777,7 @@ static void disable_for_period(struct ba
 			}
 		}
 	}
-	spin_unlock(&hmaster->disable_lock);
+	raw_spin_unlock(&hmaster->disable_lock);
 }
 
 static void count_max_concurr(int stat, struct bau_control *bcp,
@@ -840,7 +840,7 @@ static void record_send_stats(cycles_t t
  */
 static void uv1_throttle(struct bau_control *hmaster, struct ptc_stats *stat)
 {
-	spinlock_t *lock = &hmaster->uvhub_lock;
+	raw_spinlock_t *lock = &hmaster->uvhub_lock;
 	atomic_t *v;
 
 	v = &hmaster->active_descriptor_count;
@@ -972,7 +972,7 @@ static int check_enable(struct bau_contr
 	struct bau_control *hmaster;
 
 	hmaster = bcp->uvhub_master;
-	spin_lock(&hmaster->disable_lock);
+	raw_spin_lock(&hmaster->disable_lock);
 	if (bcp->baudisabled && (get_cycles() >= bcp->set_bau_on_time)) {
 		stat->s_bau_reenabled++;
 		for_each_present_cpu(tcpu) {
@@ -984,10 +984,10 @@ static int check_enable(struct bau_contr
 				tbcp->period_giveups = 0;
 			}
 		}
-		spin_unlock(&hmaster->disable_lock);
+		raw_spin_unlock(&hmaster->disable_lock);
 		return 0;
 	}
-	spin_unlock(&hmaster->disable_lock);
+	raw_spin_unlock(&hmaster->disable_lock);
 	return -1;
 }
 
@@ -1895,9 +1895,9 @@ static void __init init_per_cpu_tunables
 		bcp->cong_reps			= congested_reps;
 		bcp->disabled_period =		sec_2_cycles(disabled_period);
 		bcp->giveup_limit =		giveup_limit;
-		spin_lock_init(&bcp->queue_lock);
-		spin_lock_init(&bcp->uvhub_lock);
-		spin_lock_init(&bcp->disable_lock);
+		raw_spin_lock_init(&bcp->queue_lock);
+		raw_spin_lock_init(&bcp->uvhub_lock);
+		raw_spin_lock_init(&bcp->disable_lock);
 	}
 }
 
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct clock_event
 
 /* There is one of these allocated per node */
 struct uv_rtc_timer_head {
-	spinlock_t	lock;
+	raw_spinlock_t	lock;
 	/* next cpu waiting for timer, local node relative: */
 	int		next_cpu;
 	/* number of cpus on this node: */
@@ -178,7 +178,7 @@ static __init int uv_rtc_allocate_timers
 				uv_rtc_deallocate_timers();
 				return -ENOMEM;
 			}
-			spin_lock_init(&head->lock);
+			raw_spin_lock_init(&head->lock);
 			head->ncpus = uv_blade_nr_possible_cpus(bid);
 			head->next_cpu = -1;
 			blade_info[bid] = head;
@@ -232,7 +232,7 @@ static int uv_rtc_set_timer(int cpu, u64
 	unsigned long flags;
 	int next_cpu;
 
-	spin_lock_irqsave(&head->lock, flags);
+	raw_spin_lock_irqsave(&head->lock, flags);
 
 	next_cpu = head->next_cpu;
 	*t = expires;
@@ -244,12 +244,12 @@ static int uv_rtc_set_timer(int cpu, u64
 		if (uv_setup_intr(cpu, expires)) {
 			*t = ULLONG_MAX;
 			uv_rtc_find_next_timer(head, pnode);
-			spin_unlock_irqrestore(&head->lock, flags);
+			raw_spin_unlock_irqrestore(&head->lock, flags);
 			return -ETIME;
 		}
 	}
 
-	spin_unlock_irqrestore(&head->lock, flags);
+	raw_spin_unlock_irqrestore(&head->lock, flags);
 	return 0;
 }
 
@@ -268,7 +268,7 @@ static int uv_rtc_unset_timer(int cpu, i
 	unsigned long flags;
 	int rc = 0;
 
-	spin_lock_irqsave(&head->lock, flags);
+	raw_spin_lock_irqsave(&head->lock, flags);
 
 	if ((head->next_cpu == bcpu && uv_read_rtc(NULL) >= *t) || force)
 		rc = 1;
@@ -280,7 +280,7 @@ static int uv_rtc_unset_timer(int cpu, i
 			uv_rtc_find_next_timer(head, pnode);
 	}
 
-	spin_unlock_irqrestore(&head->lock, flags);
+	raw_spin_unlock_irqrestore(&head->lock, flags);
 
 	return rc;
 }
@@ -300,13 +300,18 @@ static int uv_rtc_unset_timer(int cpu, i
 static cycle_t uv_read_rtc(struct clocksource *cs)
 {
 	unsigned long offset;
+	cycle_t cycles;
 
+	migrate_disable();
 	if (uv_get_min_hub_revision_id() == 1)
 		offset = 0;
 	else
 		offset = (uv_blade_processor_id() * L1_CACHE_BYTES) % PAGE_SIZE;
 
-	return (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
+	cycles = (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
+	migrate_enable();
+
+	return cycles;
 }
 
 /*





* 3.14.23-rt20 - softirq: resurrect softirq threads
       [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
                     ` (5 preceding siblings ...)
  2014-11-02  7:31   ` 3.14.23-rt20 - x86, UV: raw_spinlock conversion Mike Galbraith
@ 2014-11-02  7:31   ` Mike Galbraith
  2015-02-17 13:05     ` Sebastian Andrzej Siewior
  6 siblings, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-02  7:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

(sirqs suck, this makes them suck less for some boxen/loads)

Subject: softirq: resurrect softirq threads
From: Mike Galbraith <mgalbraith@suse.de>
Date: Mon Jan  6 08:42:11 CET 2014

Some loads cannot tolerate the jitter induced by all softirqs being processed
at the same priority.  Let the user prioritize them again.
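
Concretely, booting an -rt kernel with threadsirqs splits ksoftirqd
into per-softirq threads (sirq-high/N, sirq-timer/N, sirq-net-rx/N,
...) that can be prioritized individually, e.g. (sketch, with a
hypothetical pid looked up by the admin):

  struct sched_param sp = { .sched_priority = 60 };

  /* boost net-rx softirq processing on CPU0 over the other softirqs */
  sched_setscheduler(sirq_net_rx_0_pid, SCHED_FIFO, &sp);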

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
---
 Documentation/kernel-parameters.txt |    3 
 include/linux/interrupt.h           |    9 -
 include/linux/sched.h               |    6 +
 kernel/sched/cputime.c              |    4 
 kernel/softirq.c                    |  182 +++++++++++++++++++++++++++++++-----
 5 files changed, 173 insertions(+), 31 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3187,6 +3187,9 @@ bytes respectively. Such letter suffixes
 			Force threading of all interrupt handlers except those
 			marked explicitly IRQF_NO_THREAD.
 
+	threadsirqs	[KNL]
+			Enable or disable threading of all softirqs for -rt.
+
 	tmem		[KNL,XEN]
 			Enable the Transcendent memory driver if built-in.
 
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -425,6 +425,7 @@ struct softirq_action
 asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 static inline void thread_do_softirq(void) { do_softirq(); }
+#define NR_SOFTIRQ_THREADS 1
 #ifdef __ARCH_HAS_DO_SOFTIRQ
 void do_softirq_own_stack(void);
 #else
@@ -435,6 +436,7 @@ static inline void do_softirq_own_stack(
 #endif
 #else
 extern void thread_do_softirq(void);
+#define NR_SOFTIRQ_THREADS NR_SOFTIRQS
 #endif
 
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
@@ -445,12 +447,7 @@ extern void raise_softirq_irqoff(unsigne
 extern void raise_softirq(unsigned int nr);
 extern void softirq_check_pending_idle(void);
 
-DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
-
-static inline struct task_struct *this_cpu_ksoftirqd(void)
-{
-	return this_cpu_read(ksoftirqd);
-}
+DECLARE_PER_CPU(struct task_struct * [NR_SOFTIRQ_THREADS], ksoftirqd);
 
 /* Tasklets --- multithreaded analogue of BHs.
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1263,6 +1263,7 @@ struct task_struct {
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
 	unsigned sched_contributes_to_load:1;
+	unsigned sched_is_softirqd:1;
 
 	pid_t pid;
 	pid_t tgid;
@@ -1678,6 +1679,11 @@ static inline struct pid *task_tgid(stru
 	return task->group_leader->pids[PIDTYPE_PID].pid;
 }
 
+static inline bool task_is_softirqd(struct task_struct *task)
+{
+	return task->sched_is_softirqd;
+}
+
 /*
  * Without tasklist or rcu lock it is not safe to dereference
  * the result of task_pgrp/task_session even if task == current,
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -68,7 +68,7 @@ void irqtime_account_irq(struct task_str
 	 */
 	if (hardirq_count())
 		__this_cpu_add(cpu_hardirq_time, delta);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+	else if (in_serving_softirq() && !task_is_softirqd(curr))
 		__this_cpu_add(cpu_softirq_time, delta);
 
 	irq_time_write_end();
@@ -342,7 +342,7 @@ static void irqtime_account_process_tick
 		cpustat[CPUTIME_IRQ] += cputime;
 	} else if (irqtime_account_si_update()) {
 		cpustat[CPUTIME_SOFTIRQ] += cputime;
-	} else if (this_cpu_ksoftirqd() == p) {
+	} else if (task_is_softirqd(p)) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
 		 * So, we have to handle it separately here.
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -56,7 +56,14 @@ EXPORT_SYMBOL(irq_stat);
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
-DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct * [NR_SOFTIRQ_THREADS], ksoftirqd);
+
+static unsigned int __read_mostly threadsirqs;
+
+static struct task_struct *__this_cpu_ksoftirqd(int nr)
+{
+	return  __this_cpu_read(ksoftirqd[nr && threadsirqs ? nr : 0]);
+}
 
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -161,10 +168,10 @@ static inline void softirq_clr_runner(un
  * to the pending events, so lets the scheduler to balance
  * the softirq load for us.
  */
-static void wakeup_softirqd(void)
+static void wakeup_softirqd(int nr)
 {
 	/* Interrupts are disabled: no need to stop preemption */
-	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
+	struct task_struct *tsk =  __this_cpu_ksoftirqd(nr);
 
 	if (tsk && tsk->state != TASK_RUNNING)
 		wake_up_process(tsk);
@@ -413,7 +420,7 @@ asmlinkage void __do_softirq(void)
 		    --max_restart)
 			goto restart;
 
-		wakeup_softirqd();
+		wakeup_softirqd(0);
 	}
 
 	lockdep_softirq_end(in_hardirq);
@@ -458,7 +465,7 @@ void raise_softirq_irqoff(unsigned int n
 	 * schedule the softirq soon.
 	 */
 	if (!in_interrupt())
-		wakeup_softirqd();
+		wakeup_softirqd(0);
 }
 
 void __raise_softirq_irqoff(unsigned int nr)
@@ -469,8 +476,18 @@ void __raise_softirq_irqoff(unsigned int
 
 static inline void local_bh_disable_nort(void) { local_bh_disable(); }
 static inline void _local_bh_enable_nort(void) { _local_bh_enable(); }
-static void ksoftirqd_set_sched_params(unsigned int cpu) { }
-static void ksoftirqd_clr_sched_params(unsigned int cpu, bool online) { }
+static void ksoftirqd_set_sched_params(unsigned int cpu)
+{
+	local_irq_disable();
+	current->sched_is_softirqd = 1;
+	local_irq_enable();
+}
+static void ksoftirqd_clr_sched_params(unsigned int cpu, bool online)
+{
+	local_irq_disable();
+	current->sched_is_softirqd = 0;
+	local_irq_enable();
+}
 
 #else /* !PREEMPT_RT_FULL */
 
@@ -656,15 +673,15 @@ static void do_raise_softirq_irqoff(unsi
 
 	if (!in_irq() && current->softirq_nestcnt)
 		current->softirqs_raised |= (1U << nr);
-	else if (__this_cpu_read(ksoftirqd))
-		__this_cpu_read(ksoftirqd)->softirqs_raised |= (1U << nr);
+	else if (__this_cpu_ksoftirqd(nr))
+		__this_cpu_ksoftirqd(nr)->softirqs_raised |= (1U << nr);
 }
 
 void __raise_softirq_irqoff(unsigned int nr)
 {
 	do_raise_softirq_irqoff(nr);
 	if (!in_irq() && !current->softirq_nestcnt)
-		wakeup_softirqd();
+		wakeup_softirqd(nr);
 }
 
 /*
@@ -691,7 +708,7 @@ void raise_softirq_irqoff(unsigned int n
 	 *
 	 */
 	if (!current->softirq_nestcnt)
-		wakeup_softirqd();
+		wakeup_softirqd(nr);
 }
 
 static inline int ksoftirqd_softirq_pending(void)
@@ -709,6 +726,7 @@ static inline void ksoftirqd_set_sched_p
 	sched_setscheduler(current, SCHED_FIFO, &param);
 	/* Take over all pending softirqs when starting */
 	local_irq_disable();
+	current->sched_is_softirqd = 1;
 	current->softirqs_raised = local_softirq_pending();
 	local_irq_enable();
 }
@@ -717,9 +735,26 @@ static inline void ksoftirqd_clr_sched_p
 {
 	struct sched_param param = { .sched_priority = 0 };
 
+	local_irq_disable();
+	current->sched_is_softirqd = 0;
+	current->softirqs_raised = 0;
+	local_irq_enable();
 	sched_setscheduler(current, SCHED_NORMAL, &param);
 }
 
+static int __init threadsoftirqs(char *str)
+{
+	int thread = 0;
+
+	if (!get_option(&str, &thread))
+		thread = 1;
+
+	threadsirqs = !!thread;
+
+	return 0;
+}
+
+early_param("threadsirqs", threadsoftirqs);
 #endif /* PREEMPT_RT_FULL */
 /*
  * Enter an interrupt context.
@@ -760,15 +795,25 @@ static inline void invoke_softirq(void)
 		do_softirq_own_stack();
 #endif
 	} else {
-		wakeup_softirqd();
+		wakeup_softirqd(0);
 	}
 #else /* PREEMPT_RT_FULL */
+	struct task_struct *tsk;
 	unsigned long flags;
+	u32 pending, nr;
 
 	local_irq_save(flags);
-	if (__this_cpu_read(ksoftirqd) &&
-			__this_cpu_read(ksoftirqd)->softirqs_raised)
-		wakeup_softirqd();
+	pending = local_softirq_pending();
+
+	while (pending) {
+		nr = __ffs(pending);
+		tsk = __this_cpu_ksoftirqd(nr);
+		if (tsk && tsk->softirqs_raised)
+			wakeup_softirqd(nr);
+		if (!threadsirqs)
+			break;
+		pending &= ~(1U << nr);
+	}
 	local_irq_restore(flags);
 #endif
 }
@@ -1201,20 +1246,111 @@ static struct notifier_block cpu_nfb = {
 	.notifier_call = cpu_callback
 };
 
-static struct smp_hotplug_thread softirq_threads = {
-	.store			= &ksoftirqd,
-	.setup			= ksoftirqd_set_sched_params,
-	.cleanup		= ksoftirqd_clr_sched_params,
-	.thread_should_run	= ksoftirqd_should_run,
-	.thread_fn		= run_ksoftirqd,
-	.thread_comm		= "ksoftirqd/%u",
+static struct smp_hotplug_thread softirq_threads[] = {
+	{
+		.store			= &ksoftirqd[0],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "ksoftirqd/%u",
+	},
+#ifdef CONFIG_PREEMPT_RT_FULL
+	{
+		.store			= &ksoftirqd[HI_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-high/%u",
+	},
+	{
+		.store			= &ksoftirqd[TIMER_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-timer/%u",
+	},
+	{
+		.store			= &ksoftirqd[NET_TX_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-net-tx/%u",
+	},
+	{
+		.store			= &ksoftirqd[NET_RX_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-net-rx/%u",
+	},
+	{
+		.store			= &ksoftirqd[BLOCK_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-blk/%u",
+	},
+	{
+		.store			= &ksoftirqd[BLOCK_IOPOLL_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-blk-pol/%u",
+	},
+	{
+		.store			= &ksoftirqd[TASKLET_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-tasklet/%u",
+	},
+	{
+		.store			= &ksoftirqd[SCHED_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-sched/%u",
+	},
+	{
+		.store			= &ksoftirqd[HRTIMER_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-hrtimer/%u",
+	},
+	{
+		.store			= &ksoftirqd[RCU_SOFTIRQ],
+		.setup			= ksoftirqd_set_sched_params,
+		.cleanup		= ksoftirqd_clr_sched_params,
+		.thread_should_run	= ksoftirqd_should_run,
+		.thread_fn		= run_ksoftirqd,
+		.thread_comm		= "sirq-rcu/%u",
+	},
+#endif
 };
 
 static __init int spawn_ksoftirqd(void)
 {
+	struct smp_hotplug_thread *t = &softirq_threads[threadsirqs];
+	int i, threads = NR_SOFTIRQ_THREADS;
+
 	register_cpu_notifier(&cpu_nfb);
 
-	BUG_ON(smpboot_register_percpu_thread(&softirq_threads));
+	for (i = 0; i < threads; i++, t++) {
+		BUG_ON(smpboot_register_percpu_thread(t));
+		if (!threadsirqs)
+			break;
+	}
 
 	return 0;
 }





* Re: [ANNOUNCE] 3.14.23-rt20
       [not found]   ` <CADLDEKvFvw7Aa98tVtM3z5JV8oocKqAdV4PqOMoZH2mXZ6x2jg@mail.gmail.com>
@ 2014-11-05 14:04     ` Juerg Haefliger
  2014-11-05 14:27     ` Steven Rostedt
  1 sibling, 0 replies; 29+ messages in thread
From: Juerg Haefliger @ 2014-11-05 14:04 UTC (permalink / raw)
  To: LKML, linux-rt-users

Resending to the list due to mailer/html issues.


On Sun, Nov 2, 2014 at 8:30 AM, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
>
> On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
> > Dear RT Folks,
> >
> > I'm pleased to announce the 3.14.23-rt20 stable release.
> >
> > This is the first 3.14-rt release in the stable-rt series. Normally I
> > wait till the next development release is out before I pull in a new
> > one. That is, I would pull in 3.14-rt when 3.16-rt or later was
> > released. But because development is now moving at a "hobbyist rate"
> > (read http://lwn.net/Articles/617140/ for details)
> > > and 3.14-rt is no longer being developed against, I figured it was time
> > > to put it under the "stable-rt" umbrella.
>
> I piddled about with it yesterday and found that you can't change the
> cpufreq governor IFF the tree is configured as rt, though it works fine
> with voluntary preempt.

The problem seems to be this patch: https://lkml.org/lkml/2014/4/8/584

The cpufreq code does nested down_read_trylocks and only the first one succeeds:

drivers/cpufreq/cpufreq.c:
store
  down_read_trylock(cpufreq_rwsem)  <- succeeds
  store_scaling_governor
    cpufreq_get_policy
      cpufreq_cpu_get
        down_read_trylock(cpufreq_rwsem)  <-- fails

Reverting the above patch 'fixes' the problem. I don't understand
Steven's commit comment that readers of rwsem are not allowed to nest
in mainline since this works just fine in mainline.

...Juerg


>  I'll poke about for the entertainment value.  Having no
> personal need/use for rt detracts from its hobby value somewhat, but rt
> problems do have a tendency to be 'entertaining'.
>
> I'll follow up with a few patches that folks can apply to their trees if
> they so desire.  There being no devel tree to submit against, I can't do
> a proper submission (rules), and some of them you surely don't want :)
>
>         -Mike


* Re: [ANNOUNCE] 3.14.23-rt20
       [not found]   ` <CADLDEKvFvw7Aa98tVtM3z5JV8oocKqAdV4PqOMoZH2mXZ6x2jg@mail.gmail.com>
  2014-11-05 14:04     ` Juerg Haefliger
@ 2014-11-05 14:27     ` Steven Rostedt
  2014-11-05 14:43       ` Juerg Haefliger
  2014-11-05 16:07       ` Thomas Gleixner
  1 sibling, 2 replies; 29+ messages in thread
From: Steven Rostedt @ 2014-11-05 14:27 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: Mike Galbraith, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014 14:50:41 +0100
Juerg Haefliger <juergh@gmail.com> wrote:

> On Sun, Nov 2, 2014 at 8:30 AM, Mike Galbraith <umgwanakikbuti@gmail.com>
> wrote:
> >
> > On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
> > > Dear RT Folks,
> > >
> > > I'm pleased to announce the 3.14.23-rt20 stable release.
> > >
> > > This is the first 3.14-rt release in the stable-rt series. Normally I
> > > wait till the next development release is out before I pull in a new
> > > one. That is, I would pull in 3.14-rt when 3.16-rt or later was
> > > released. But because development is now moving at a "hobbyist rate"
> > > (read http://lwn.net/Articles/617140/ for details)
> > > and 3.14-rt is no longer being developed against, I figured it was time
> > > to put it under the "stable-rt" umbrella.
> >
> > I piddled about with it yesterday and found that you can't change the
> > cpufreq governor IFF the tree is configured as rt, though it works fine
> > with voluntary preempt.
> 
> The problem seems to be this patch: https://lkml.org/lkml/2014/4/8/584
> 
> The cpufreq code does nested down_read_trylocks and only the first one
> succeeds:
> 
> drivers/cpufreq/cpufreq.c:
> store
>   down_read_trylock(cpufreq_rwsem)  <- succeeds
>   store_scaling_governor
>     cpufreq_get_policy
>       cpufreq_cpu_get
>         down_read_trylock(cpufreq_rwsem)  <-- fails
> 
> Reverting the above patch 'fixes' the problem. I don't understand Steven's
> commit comment that readers of rwsem are not allowed to nest in mainline
> since this works just fine in mainline.

When we allow multiple readers, this will be allowed. But even in
mainline, if a writer were to come in and block between those two
down_read_trylocks(), the second trylock would fail.

PREEMPT_RT just has that fail all the time as we only allow an rwsem to
be held by a single reader.
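
To make that concrete, here is the cpufreq path collapsed into one
function (sketch, not the real code):

  static int store_sketch(void)
  {
          if (!down_read_trylock(&cpufreq_rwsem))         /* 1st: succeeds */
                  return -EINVAL;

          /* store_scaling_governor() -> cpufreq_cpu_get() then does: */
          if (!down_read_trylock(&cpufreq_rwsem)) {       /* 2nd try */
                  up_read(&cpufreq_rwsem);
                  return -EINVAL;
          }

          up_read(&cpufreq_rwsem);
          up_read(&cpufreq_rwsem);
          return 0;
  }

In mainline the second trylock only fails if a writer queued up in
between; with the single-reader rwsem on PREEMPT_RT it can never
succeed while the first is held, so the governor change always bails
out.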


-- Steve



> 
> ...Juerg
> 
> 
> 
> >  I'll poke about for the entertainment value.  Having no
> > personal need/use for rt detracts from its hobby value somewhat, but rt
> > problems do have a tendency to be 'entertaining'.
> >
> > I'll follow up with a few patches that folks can apply to their trees if
> > they so desire.  There being no devel tree to submit against, I can't do
> > a proper submission (rules), and some of them you surely don't want :)
> >
> >         -Mike
> >
> >
> >
> >



* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 14:27     ` Steven Rostedt
@ 2014-11-05 14:43       ` Juerg Haefliger
  2014-11-05 15:00         ` Steven Rostedt
  2014-11-05 16:07       ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Juerg Haefliger @ 2014-11-05 14:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mike Galbraith, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, Nov 5, 2014 at 3:27 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 5 Nov 2014 14:50:41 +0100
> Juerg Haefliger <juergh@gmail.com> wrote:
>
>> On Sun, Nov 2, 2014 at 8:30 AM, Mike Galbraith <umgwanakikbuti@gmail.com>
>> wrote:
>> >
>> > On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
>> > > Dear RT Folks,
>> > >
>> > > I'm pleased to announce the 3.14.23-rt20 stable release.
>> > >
>> > > This is the first 3.14-rt release in the stable-rt series. Normally I
>> > > wait till the next development release is out before I pull in a new
>> > > one. That is, I would pull in 3.14-rt when 3.16-rt or later was
>> > > released. But because development is now moving at a "hobbyist rate"
>> > > (read http://lwn.net/Articles/617140/ for details)
>> > > and 3.14-rt is no longer being developed against, I figured it was time
>> > > to put it under the "stable-rt" umbrella.
>> >
>> > I piddled about with it yesterday and found that you can't change the
>> > cpufreq governor IFF the tree is configured as rt, though it works fine
>> > with voluntary preempt.
>>
>> The problem seems to be this patch: https://lkml.org/lkml/2014/4/8/584
>>
>> The cpufreq code does nested down_read_trylocks and only the first one
>> succeeds:
>>
>> drivers/cpufreq/cpufreq.c:
>> store
>>   down_read_trylock(cpufreq_rwsem)  <- succeeds
>>   store_scaling_governor
>>     cpufreq_get_policy
>>       cpufreq_cpu_get
>>         down_read_trylock(cpufreq_rwsem)  <-- fails
>>
>> Reverting the above patch 'fixes' the problem. I don't understand Steven's
>> commit comment that readers of rwsem are not allowed to nest in mainline
>> since this works just fine in mainline.
>
> When we allow multiple readers, this will be allowed. But even in
> mainline, if a writer were to come in and block between those two
> down_read_trylocks(), the second trylock would fail.

Thanks for the explanation. So is this considered a temporary failure
until multiple readers are allowed or does cpufreq need fixing or
something else? Just trying to figure out what to do next.

...Juerg


> PREEMPT_RT just has that fail all the time as we only allow an rwsem to
> be held by a single reader.
>
>
> -- Steve
>
>
>
>>
>> ...Juerg
>>
>>
>>
>> >  I'll poke about for the entertainment value.  Having no
>> > personal need/use for rt detracts from its hobby value somewhat, but rt
>> > problems do have a tendency to be 'entertaining'.
>> >
>> > I'll follow up with a few patches that folks can apply to their trees if
>> > they so desire.  There being no devel tree to submit against, I can't do
>> > a proper submission (rules), and some of them you surely don't want :)
>> >
>> >         -Mike
>> >
>> >
>> >
>> >
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 14:43       ` Juerg Haefliger
@ 2014-11-05 15:00         ` Steven Rostedt
  2014-11-05 15:35           ` Harry van Haaren
  2014-11-05 15:44           ` Mike Galbraith
  0 siblings, 2 replies; 29+ messages in thread
From: Steven Rostedt @ 2014-11-05 15:00 UTC (permalink / raw)
  To: Juerg Haefliger
  Cc: Mike Galbraith, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014 15:43:24 +0100
Juerg Haefliger <juergh@gmail.com> wrote:

> Thanks for the explanation. So is this considered a temporary failure
> until multiple readers are allowed or does cpufreq need fixing or
> something else? Just trying to figure out what to do next.

Good question. Unfortunately I don't know the answer to that.

-- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 15:00         ` Steven Rostedt
@ 2014-11-05 15:35           ` Harry van Haaren
  2014-11-05 15:44           ` Mike Galbraith
  1 sibling, 0 replies; 29+ messages in thread
From: Harry van Haaren @ 2014-11-05 15:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juerg Haefliger, Mike Galbraith, LKML, linux-rt-users,
	Thomas Gleixner, Carsten Emde, John Kacur,
	Sebastian Andrzej Siewior, Clark Williams

On Wed, 5 Nov 2014 14:50:41 +0100 Juerg Haefliger <juergh@gmail.com> wrote:
> The cpufreq code does nested down_read_trylocks and only the first one succeeds:
>drivers/cpufreq/cpufreq.c:
>store
>  down_read_trylock(cpufreq_rwsem)  <- succeeds
>  store_scaling_governor
>    cpufreq_get_policy
>      cpufreq_cpu_get
>        down_read_trylock(cpufreq_rwsem)  <-- fails

On a related note: I think this patch/issue may be the cause of the
-rt CPU frequency scaling bug I reported a couple of months ago.
http://comments.gmane.org/gmane.linux.rt.user/12472

I'm currently using the performance governor by default as a
workaround; thanks to JackWinter for packaging in the ArchAudio repos.

Cheers, -Harry

-- 

http://www.openavproductions.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 15:00         ` Steven Rostedt
  2014-11-05 15:35           ` Harry van Haaren
@ 2014-11-05 15:44           ` Mike Galbraith
  2014-11-05 16:07             ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2014-11-05 15:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juerg Haefliger, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 2014-11-05 at 10:00 -0500, Steven Rostedt wrote: 
> On Wed, 5 Nov 2014 15:43:24 +0100
> Juerg Haefliger <juergh@gmail.com> wrote:
> 
> > Thanks for the explanation. So is this considered a temporary failure
> > until multiple readers are allowed or does cpufreq need fixing or
> > something else? Just trying to figure out what to do next.
> 
> Good question. Unfortunately I don't know the answer to that.

[RFC PATCH RT] rwsem: The return of multi-reader PI rwsems

The above springs to mind.

-Mike


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 14:27     ` Steven Rostedt
  2014-11-05 14:43       ` Juerg Haefliger
@ 2014-11-05 16:07       ` Thomas Gleixner
  2014-11-05 22:29         ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2014-11-05 16:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juerg Haefliger, Mike Galbraith, LKML, linux-rt-users,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014, Steven Rostedt wrote:
> On Wed, 5 Nov 2014 14:50:41 +0100
> Juerg Haefliger <juergh@gmail.com> wrote:
> 
> > On Sun, Nov 2, 2014 at 8:30 AM, Mike Galbraith <umgwanakikbuti@gmail.com>
> > wrote:
> > >
> > > On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
> > > > Dear RT Folks,
> > > >
> > > > I'm pleased to announce the 3.14.23-rt20 stable release.
> > > >
> > > > This is the first 3.14-rt release in the stable-rt series. Normally I
> > > > wait till the next development release is out before I pull in a new
> > > > one. That is, I would pull in 3.14-rt when 3.16-rt or later was
> > > > released. But because development is now moving at a "hobbyist rate"
> > > > (read http://lwn.net/Articles/617140/ for details)
> > > > and 3.14-rt is no longer being developed against, I figured it was time
> > > > to put it under the "stable-rt" umbrella.
> > >
> > > I piddled about with it yesterday, found that you can't change cpufreq
> > > governor IFF the tree is configured as rt, but works fine as voluntary
> > > preempt.
> > 
> > The problem seems to be this patch: https://lkml.org/lkml/2014/4/8/584
> > 
> > The cpufreq code does nested down_read_trylocks and only the first one
> > succeeds:
> > 
> > drivers/cpufreq/cpufreq.c:
> > store
> >   down_read_trylock(cpufreq_rwsem)  <- succeeds
> >   store_scaling_governor
> >     cpufreq_get_policy
> >       cpufreq_cpu_get
> >         down_read_trylock(cpufreq_rwsem)  <-- fails
> > 
> > Reverting the above patch 'fixes' the problem. I don't understand Steven's
> > commit comment that readers of rwsem are not allowed to nest in mainline
> > since this works just fine in mainline.
> 
> When we allow multiple readers, this will be allowed. But even in
> mainline, if a writer were to come in and block between those two
> down_read_trylocks(), the second trylock would fail.
> 
> PREEMPT_RT just has that fail all the time as we only allow an rwsem to
> be held by a single reader.

Errm. The reader holds the sem already. So that's a recursive read
lock which should always succeed. And rt_read_trylock() has that
implemented.
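
For reference, the rwlock trylock path does roughly the following
(abridged sketch; the real rt_read_trylock() also does the lockdep
annotation and migrate_disable()):

	int rt_read_trylock(rwlock_t *rwlock)
	{
		struct rt_mutex *lock = &rwlock->lock;
		int ret = 1;

		/* Recursive read locks succeed when current owns the lock. */
		if (rt_mutex_owner(lock) != current)
			ret = rt_mutex_trylock(lock);
		else if (!rwlock->read_depth)
			ret = 0;	/* current holds it as a writer */

		if (ret)
			rwlock->read_depth++;

		return ret;
	}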

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 15:44           ` Mike Galbraith
@ 2014-11-05 16:07             ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2014-11-05 16:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Juerg Haefliger, LKML, linux-rt-users,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014, Mike Galbraith wrote:
> On Wed, 2014-11-05 at 10:00 -0500, Steven Rostedt wrote: 
> > On Wed, 5 Nov 2014 15:43:24 +0100
> > Juerg Haefliger <juergh@gmail.com> wrote:
> > 
> > > Thanks for the explanation. So is this considered a temporary failure
> > > until multiple readers are allowed or does cpufreq need fixing or
> > > something else? Just trying to figure out what to do next.
> > 
> > Good question. Unfortunately I don't know the answer to that.
> 
> [RFC PATCH RT] rwsem: The return of multi-reader PI rwsems
> 
> The above springs to mind.

Shudder!


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 16:07       ` Thomas Gleixner
@ 2014-11-05 22:29         ` Thomas Gleixner
  2014-11-05 22:48           ` Steven Rostedt
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2014-11-05 22:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juerg Haefliger, Mike Galbraith, LKML, linux-rt-users,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014, Thomas Gleixner wrote:
> On Wed, 5 Nov 2014, Steven Rostedt wrote:
> > On Wed, 5 Nov 2014 14:50:41 +0100
> > Juerg Haefliger <juergh@gmail.com> wrote:
> > 
> > > On Sun, Nov 2, 2014 at 8:30 AM, Mike Galbraith <umgwanakikbuti@gmail.com>
> > > wrote:
> > > >
> > > > On Fri, 2014-10-31 at 17:03 -0400, Steven Rostedt wrote:
> > > > > Dear RT Folks,
> > > > >
> > > > > I'm pleased to announce the 3.14.23-rt20 stable release.
> > > > >
> > > > > This is the first 3.14-rt release in the stable-rt series. Normally I
> > > > > wait till the next development release is out before I pull in a new
> > > > > one. That is, I would pull in 3.14-rt when 3.16-rt or later was
> > > > > released. But because development is now moving at a "hobbyist rate"
> > > > > (read http://lwn.net/Articles/617140/ for details)
> > > > > and 3.14-rt is no longer being developed against, I figured it was time
> > > > > to put it under the "stable-rt" umbrella.
> > > >
> > > > I piddled about with it yesterday, found that you can't change cpufreq
> > > > governor IFF the tree is configured as rt, but works fine as voluntary
> > > > preempt.
> > > 
> > > The problem seems to be this patch: https://lkml.org/lkml/2014/4/8/584
> > > 
> > > The cpufreq code does nested down_read_trylocks and only the first one
> > > succeeds:
> > > 
> > > drivers/cpufreq/cpufreq.c:
> > > store
> > >   down_read_trylock(cpufreq_rwsem)  <- succeeds
> > >   store_scaling_governor
> > >     cpufreq_get_policy
> > >       cpufreq_cpu_get
> > >         down_read_trylock(cpufreq_rwsem)  <-- fails
> > > 
> > > Reverting the above patch 'fixes' the problem. I don't understand Steven's
> > > commit comment that readers of rwsem are not allowed to nest in mainline
> > > since this works just fine in mainline.
> > 
> > When we allow multiple readers, this will be allowed. But even in
> > mainline, if a writer were to come in and block between those two
> > down_read_trylocks(), the second trylock would fail.
> > 
> > PREEMPT_RT just has that fail all the time as we only allow an rwsem to
> > be held by a single reader.
> 
> Errm. The reader holds the sem already. So that's a recursive read
> lock which should always succeed. And rt_read_trylock() has that
> implemented.

Bah. That's the rwlock path. Untested patch below should fix the issue.

Thanks,

	tglx

------------------------>

diff --git a/include/linux/rwsem_rt.h b/include/linux/rwsem_rt.h
index 0065b08fbb7a..924c2d274ab5 100644
--- a/include/linux/rwsem_rt.h
+++ b/include/linux/rwsem_rt.h
@@ -20,6 +20,7 @@
 
 struct rw_semaphore {
 	struct rt_mutex		lock;
+	int			read_depth;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
diff --git a/kernel/locking/rt.c b/kernel/locking/rt.c
index 90b8ba03e2a4..a48bff77e2a8 100644
--- a/kernel/locking/rt.c
+++ b/kernel/locking/rt.c
@@ -321,8 +321,11 @@ EXPORT_SYMBOL(rt_up_write);
 
 void  rt_up_read(struct rw_semaphore *rwsem)
 {
-	rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
-	rt_mutex_unlock(&rwsem->lock);
+	/* Release the lock only when read_depth is down to 0 */
+	if (--rwsem->read_depth == 0) {
+		rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
+		rt_mutex_unlock(&rwsem->lock);
+	}
 }
 EXPORT_SYMBOL(rt_up_read);
 
@@ -332,7 +335,9 @@ EXPORT_SYMBOL(rt_up_read);
  */
 void  rt_downgrade_write(struct rw_semaphore *rwsem)
 {
-	BUG_ON(rt_mutex_owner(&rwsem->lock) != current);
+	BUG_ON(rt_mutex_owner(&rwsem->lock) != current ||
+	       rwsem->read_depth != 0);
+	rwsem->read_depth++;
 }
 EXPORT_SYMBOL(rt_downgrade_write);
 
@@ -370,11 +375,21 @@ EXPORT_SYMBOL(rt_down_write_nested_lock);
 
 int  rt_down_read_trylock(struct rw_semaphore *rwsem)
 {
-	int ret;
+	int ret = 1;
+
+	/*
+	 * recursive read locks succeed when current owns the lock
+	 */
+	if (rt_mutex_owner(&rwsem->lock) != current) {
+		ret = rt_mutex_trylock(&rwsem->lock);
+		if (ret)
+			rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
+	} else if (!rwsem->read_depth) {
+		ret = 0;
+	}
 
-	ret = rt_mutex_trylock(&rwsem->lock);
 	if (ret)
-		rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
+		rwsem->read_depth++;
 
 	return ret;
 }
@@ -382,8 +397,14 @@ EXPORT_SYMBOL(rt_down_read_trylock);
 
 static void __rt_down_read(struct rw_semaphore *rwsem, int subclass)
 {
-	rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
-	rt_mutex_lock(&rwsem->lock);
+	/*
+	 * recursive read locks succeed when current owns the lock
+	 */
+	if (rt_mutex_owner(&rwsem->lock) != current) {
+		rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
+		rt_mutex_lock(&rwsem->lock);
+	}
+	rwsem->read_depth++;
 }
 
 void  rt_down_read(struct rw_semaphore *rwsem)
@@ -408,6 +429,7 @@ void  __rt_rwsem_init(struct rw_semaphore *rwsem, const char *name,
 	debug_check_no_locks_freed((void *)rwsem, sizeof(*rwsem));
 	lockdep_init_map(&rwsem->dep_map, name, key, 0);
 #endif
+	rwsem->read_depth = 0;
 	rwsem->lock.save_state = 0;
 }
 EXPORT_SYMBOL(__rt_rwsem_init);
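
With this applied, the nested trylock in the cpufreq path should go
through. Schematically (hypothetical call sequence showing the
read_depth bookkeeping, not driver code):

	down_read_trylock(&sem);  /* owner == NULL:    takes the rt_mutex, read_depth = 1 */
	down_read_trylock(&sem);  /* owner == current: read_depth = 2, succeeds */
	up_read(&sem);            /* read_depth = 1, lock still held */
	up_read(&sem);            /* read_depth = 0, rt_mutex released */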



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 22:29         ` Thomas Gleixner
@ 2014-11-05 22:48           ` Steven Rostedt
  2014-11-05 23:11             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2014-11-05 22:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juerg Haefliger, Mike Galbraith, LKML, linux-rt-users,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014 23:29:32 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> > > When we allow multiple readers, this will be allowed. But even in
> > > mainline, if a writer were to come in and block between those two
> > > down_read_trylocks(), the second trylock would fail.
> > > 
> > > PREEMPT_RT just has that fail all the time as we only allow an rwsem to
> > > be held by a single reader.
> > 
> > Errm. The reader holds the sem already. So that's a recursive read
> > lock which should always succeed. And rt_read_trylock() has that
> > implemented.
> 
> Bah. That's the rwlock path. Untested patch below should fix the issue.

This is basically a revert of my patch that removed rwsems as being
recursive, because they are not recursive in mainline.

If you have the following:

	down_read(&A);

				down_write(&A);

	down_read(&A);

in mainline, you will have a deadlock. With this change, it will not
deadlock on -rt.

This is probably why the cpufreq governor code uses down_read_trylock();
otherwise, it would have suffered from possible deadlock scenarios.

But, for a quick solution, just use this patch. I'll start working on
the multi rwsem readers again, and then I'll have to revert this to do
that. But when we have multi readers again, we won't need to have this
hack.

-- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [ANNOUNCE] 3.14.23-rt20
  2014-11-05 22:48           ` Steven Rostedt
@ 2014-11-05 23:11             ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2014-11-05 23:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juerg Haefliger, Mike Galbraith, LKML, linux-rt-users,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Clark Williams

On Wed, 5 Nov 2014, Steven Rostedt wrote:
> On Wed, 5 Nov 2014 23:29:32 +0100 (CET)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > > > When we allow multiple readers, this will be allowed. But even in
> > > > mainline, if a writer were to come in and block between those two
> > > > down_read_trylocks(), the second trylock would fail.
> > > > 
> > > > PREEMPT_RT just has that fail all the time as we only allow an rwsem to
> > > > be held by a single reader.
> > > 
> > > Errm. The reader holds the sem already. So that's a recursive read
> > > lock which should always succeed. And rt_read_trylock() has that
> > > implemented.
> > 
> > Bah. That's the rwlock path. Untested patch below should fix the issue.
> 
> This is basically a revert of my patch that removed rwsems as being
> recursive, because they are not recursive in mainline.

Well, they are. Just not in the presence of a writer.

Which is silly but, since we do not track the readers, unavoidable. If
the reader already holds the sem, there is no real reason why it should
not succeed with the recursive lock.

If anything relies on that behaviour, then we have a way bigger problem
somewhere else.

> If you have the following:
> 
> 	down_read(&A);
> 
> 				down_write(&A);
> 
> 	down_read(&A);
> 
> in mainline, you will have a deadlock. With this change, it will not
> deadlock on -rt.

And that's fine on RT in the face of a single reader.

> This is probably why the cpufreq governor code uses down_read_trylock();
> otherwise, it would have suffered from possible deadlock scenarios.

Right.

> But, for a quick solution, just use this patch. I'll start working on
> the multi rwsem readers again, and then I'll have to revert this to do
> that. But when we have multi readers again, we won't need to have this
> hack.

And how exactly are multi-reader rwsems solving that problem?

Not at all. You still need to track recursion of an already 'owning'
reader unless you want to add it twice on the 'owner' list.

Thanks,

	tglx







^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock
  2014-11-02  7:31   ` 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock Mike Galbraith
@ 2015-02-17 11:56     ` Sebastian Andrzej Siewior
  2015-02-17 12:23       ` Mike Galbraith
  2015-02-18 10:47       ` Mike Galbraith
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-17 11:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

* Mike Galbraith | 2014-11-02 08:31:18 [+0100]:

>--- a/fs/btrfs/ctree.c
>+++ b/fs/btrfs/ctree.c
>@@ -80,7 +80,7 @@ noinline void btrfs_clear_path_blocking(
> {
> 	int i;
> 
>-#ifdef CONFIG_DEBUG_LOCK_ALLOC
>+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> 	/* lockdep really cares that we take all of these spinlocks
> 	 * in the right order.  If any of the locks in the path are not
> 	 * currently blocking, it is going to complain.  So, make really

This is gone since commit f82c458 ("btrfs: fix lockups from
btrfs_clear_path_blocking")

>@@ -107,7 +107,7 @@ noinline void btrfs_clear_path_blocking(
> 		}
> 	}
> 
>-#ifdef CONFIG_DEBUG_LOCK_ALLOC
>+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> 	if (held)
> 		btrfs_clear_lock_blocking_rw(held, held_rw);
> #endif
>--- a/fs/btrfs/extent-tree.c
>+++ b/fs/btrfs/extent-tree.c
>@@ -6938,14 +6938,6 @@ use_block_rsv(struct btrfs_trans_handle
> 		goto again;
> 	}
> 
>-	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
>-		static DEFINE_RATELIMIT_STATE(_rs,
>-				DEFAULT_RATELIMIT_INTERVAL * 10,
>-				/*DEFAULT_RATELIMIT_BURST*/ 1);
>-		if (__ratelimit(&_rs))
>-			WARN(1, KERN_DEBUG
>-				"BTRFS: block rsv returned %d\n", ret);
>-	}
> try_reserve:
> 	ret = reserve_metadata_bytes(root, block_rsv, blocksize,
> 				     BTRFS_RESERVE_NO_FLUSH);
>
and this looks like it just suppresses a warning that fires with debugging enabled.
May I drop this patch?

Sebastian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock
  2015-02-17 11:56     ` Sebastian Andrzej Siewior
@ 2015-02-17 12:23       ` Mike Galbraith
  2015-02-18 10:47       ` Mike Galbraith
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2015-02-17 12:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

On Tue, 2015-02-17 at 12:56 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-11-02 08:31:18 [+0100]:
> 
> >--- a/fs/btrfs/ctree.c
> >+++ b/fs/btrfs/ctree.c
> >@@ -80,7 +80,7 @@ noinline void btrfs_clear_path_blocking(
> > {
> > 	int i;
> > 
> >-#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> > 	/* lockdep really cares that we take all of these spinlocks
> > 	 * in the right order.  If any of the locks in the path are not
> > 	 * currently blocking, it is going to complain.  So, make really
> 
> This is gone since commit f82c458 ("btrfs: fix lockups from
> btrfs_clear_path_blocking")
> 
> >@@ -107,7 +107,7 @@ noinline void btrfs_clear_path_blocking(
> > 		}
> > 	}
> > 
> >-#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> > 	if (held)
> > 		btrfs_clear_lock_blocking_rw(held, held_rw);
> > #endif
> >--- a/fs/btrfs/extent-tree.c
> >+++ b/fs/btrfs/extent-tree.c
> >@@ -6938,14 +6938,6 @@ use_block_rsv(struct btrfs_trans_handle
> > 		goto again;
> > 	}
> > 
> >-	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
> >-		static DEFINE_RATELIMIT_STATE(_rs,
> >-				DEFAULT_RATELIMIT_INTERVAL * 10,
> >-				/*DEFAULT_RATELIMIT_BURST*/ 1);
> >-		if (__ratelimit(&_rs))
> >-			WARN(1, KERN_DEBUG
> >-				"BTRFS: block rsv returned %d\n", ret);
> >-	}
> > try_reserve:
> > 	ret = reserve_metadata_bytes(root, block_rsv, blocksize,
> > 				     BTRFS_RESERVE_NO_FLUSH);
> >
> and this looks like it just suppresses a warning that fires with debugging enabled.
> May I drop this patch?

Hm, I'll try to find some time to beat on filesystems again.

	-Mike


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat
  2014-11-02  7:31   ` 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat Mike Galbraith
@ 2015-02-17 12:53     ` Sebastian Andrzej Siewior
  2015-02-17 14:01       ` Mike Galbraith
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-17 12:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

* Mike Galbraith | 2014-11-02 08:31:28 [+0100]:

>(not pretty)

You do the RCU thingy and replace locking. Is this required after the
patch I've sent in "Re: [RFC PATCH V2] rt/aio: fix rcu garbage
collection might_sleep() splat"? From a quick browse I've seen that
ffs_epfile_aio_write() invokes kiocb_set_cancel_fn(, ffs_aio_cancel) and
the function does hold a sleeping lock.

Sebastian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - x86, UV: raw_spinlock conversion
  2014-11-02  7:31   ` 3.14.23-rt20 - x86, UV: raw_spinlock conversion Mike Galbraith
@ 2015-02-17 13:02     ` Sebastian Andrzej Siewior
  2015-02-17 14:11       ` Mike Galbraith
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-17 13:02 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

* Mike Galbraith | 2014-11-02 08:31:37 [+0100]:

>Shrug.  Lots of hobbyists have a beast in their basement, right?

I can take this as is if you want.

>--- a/arch/x86/platform/uv/uv_time.c
>+++ b/arch/x86/platform/uv/uv_time.c
>@@ -300,13 +300,18 @@ static int uv_rtc_unset_timer(int cpu, i
> static cycle_t uv_read_rtc(struct clocksource *cs)
> {
> 	unsigned long offset;
>+	cycle_t cycles;
> 
>+	migrate_disable();
> 	if (uv_get_min_hub_revision_id() == 1)
> 		offset = 0;
> 	else
> 		offset = (uv_blade_processor_id() * L1_CACHE_BYTES) % PAGE_SIZE;
> 
>-	return (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
>+	cycles = (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
>+	migrate_enable();
>+
>+	return cycles;
> }

but do you really want a migrate_disable() in here? The only problem I
can imagine is that you switch CPUs between uv_blade_processor_id()
and the actual read. I recommend preempt_disable() and sending this
upstream as well since it is not limited to -RT.
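
I.e. something like this (untested sketch of the preempt_disable()
variant of the hunk above):

	static cycle_t uv_read_rtc(struct clocksource *cs)
	{
		unsigned long offset;
		cycle_t cycles;

		preempt_disable();
		if (uv_get_min_hub_revision_id() == 1)
			offset = 0;
		else
			offset = (uv_blade_processor_id() * L1_CACHE_BYTES) % PAGE_SIZE;

		/* Read the node-local RTC MMR on the CPU the offset was computed for */
		cycles = (cycle_t)uv_read_local_mmr(UVH_RTC | offset);
		preempt_enable();

		return cycles;
	}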

Sebastian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - softirq: resurrect softirq threads
  2014-11-02  7:31   ` 3.14.23-rt20 - softirq: resurrect softirq threads Mike Galbraith
@ 2015-02-17 13:05     ` Sebastian Andrzej Siewior
  2015-02-17 14:00       ` Mike Galbraith
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-17 13:05 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

* Mike Galbraith | 2014-11-02 08:31:47 [+0100]:

>(sirqs suck, this makes them suck less for some boxen/loads)
>
>Subject: softirq: resurrect softirq threads
>From: Mike Galbraith <mgalbraith@suse.de>
>Date: Mon Jan  6 08:42:11 CET 2014
>
>Some loads cannot tolerate the jitter induced by all softirqs being processed
>at the same priority.  Let the user prioritize them again.
>
>Signed-off-by: Mike Galbraith <mgalbraith@suse.de>

I'm going to postpone this. While it might make sense in general, I'm
going to wait for tglx, Steven, and others to see if this is what we want
or if there are plans for redoing the softirq handling.

Sebastian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - softirq: resurrect softirq threads
  2015-02-17 13:05     ` Sebastian Andrzej Siewior
@ 2015-02-17 14:00       ` Mike Galbraith
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2015-02-17 14:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

On Tue, 2015-02-17 at 14:05 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-11-02 08:31:47 [+0100]:
> 
> >(sirqs suck, this makes them suck less for some boxen/loads)
> >
> >Subject: softirq: resurrect softirq threads
> >From: Mike Galbraith <mgalbraith@suse.de>
> >Date: Mon Jan  6 08:42:11 CET 2014
> >
> >Some loads cannot tolerate the jitter induced by all softirqs being processed
> >at the same priority.  Let the user prioritize them again.
> >
> >Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
> 
> I'm going to postpone this. While it might make sense in general I'm
> going to wait for tglx, Steven and others to see if this is what we want
> or if there are some plans redoing the softirq handling.

You can postpone this one forever; they don't want it.  I posted it for
folks who may want the option.

	-Mike


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat
  2015-02-17 12:53     ` Sebastian Andrzej Siewior
@ 2015-02-17 14:01       ` Mike Galbraith
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2015-02-17 14:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

On Tue, 2015-02-17 at 13:53 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-11-02 08:31:28 [+0100]:
> 
> >(not pretty)
> 
> You do the RCU thingy and replace locking. Is this required after the
> patch I've sent in "Re: [RFC PATCH V2] rt/aio: fix rcu garbage
> collection might_sleep() splat"? From a quick browse I've seen that
> ffs_epfile_aio_write() invokes kiocb_set_cancel_fn(, ffs_aio_cancel) and
> the function does hold a sleeping lock.

Yeah, I don't think that's changed; it should still gripe.

	-Mike


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - x86, UV: raw_spinlock conversion
  2015-02-17 13:02     ` Sebastian Andrzej Siewior
@ 2015-02-17 14:11       ` Mike Galbraith
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2015-02-17 14:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

On Tue, 2015-02-17 at 14:02 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-11-02 08:31:37 [+0100]:
> 
> >Shrug.  Lots of hobbyists have a beast in their basement, right?
> 
> I can take this as is if you want.

Or you can trash it, since SGI didn't seem to care.  This patch was born
while getting 2.6.33-rt running on a UV100 box; I've been carrying it
ever since.

	-Mike



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock
  2015-02-17 11:56     ` Sebastian Andrzej Siewior
  2015-02-17 12:23       ` Mike Galbraith
@ 2015-02-18 10:47       ` Mike Galbraith
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2015-02-18 10:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, Thomas Gleixner,
	Carsten Emde, John Kacur, Clark Williams

On Tue, 2015-02-17 at 12:56 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2014-11-02 08:31:18 [+0100]:
> 
> >--- a/fs/btrfs/ctree.c
> >+++ b/fs/btrfs/ctree.c
> >@@ -80,7 +80,7 @@ noinline void btrfs_clear_path_blocking(
> > {
> > 	int i;
> > 
> >-#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> > 	/* lockdep really cares that we take all of these spinlocks
> > 	 * in the right order.  If any of the locks in the path are not
> > 	 * currently blocking, it is going to complain.  So, make really
> 
> This is gone since commit f82c458 ("btrfs: fix lockups from
> btrfs_clear_path_blocking")

Goody, BTRFS took a modest workout without a whimper.  When that patch
was born, you didn't have to try; deadlock was immediate gratification.

> >@@ -107,7 +107,7 @@ noinline void btrfs_clear_path_blocking(
> > 		}
> > 	}
> > 
> >-#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
> > 	if (held)
> > 		btrfs_clear_lock_blocking_rw(held, held_rw);
> > #endif
> >--- a/fs/btrfs/extent-tree.c
> >+++ b/fs/btrfs/extent-tree.c
> >@@ -6938,14 +6938,6 @@ use_block_rsv(struct btrfs_trans_handle
> > 		goto again;
> > 	}
> > 
> >-	if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
> >-		static DEFINE_RATELIMIT_STATE(_rs,
> >-				DEFAULT_RATELIMIT_INTERVAL * 10,
> >-				/*DEFAULT_RATELIMIT_BURST*/ 1);
> >-		if (__ratelimit(&_rs))
> >-			WARN(1, KERN_DEBUG
> >-				"BTRFS: block rsv returned %d\n", ret);
> >-	}
> > try_reserve:
> > 	ret = reserve_metadata_bytes(root, block_rsv, blocksize,
> > 				     BTRFS_RESERVE_NO_FLUSH);
> >
> and this looks like it just suppresses a warning that fires with debugging enabled.
> May I drop this patch?

Yup, that was a cosmetic thing, also suggested by Chris Mason.  BTRFS
now seems to just work.  Hopefully it'll keep on just working; I don't
ever want to go back there :)

	-Mike


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-02-18 10:48 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-31 21:03 [ANNOUNCE] 3.14.23-rt20 Steven Rostedt
2014-11-02  7:30 ` Mike Galbraith
     [not found]   ` <CADLDEKvFvw7Aa98tVtM3z5JV8oocKqAdV4PqOMoZH2mXZ6x2jg@mail.gmail.com>
2014-11-05 14:04     ` Juerg Haefliger
2014-11-05 14:27     ` Steven Rostedt
2014-11-05 14:43       ` Juerg Haefliger
2014-11-05 15:00         ` Steven Rostedt
2014-11-05 15:35           ` Harry van Haaren
2014-11-05 15:44           ` Mike Galbraith
2014-11-05 16:07             ` Thomas Gleixner
2014-11-05 16:07       ` Thomas Gleixner
2014-11-05 22:29         ` Thomas Gleixner
2014-11-05 22:48           ` Steven Rostedt
2014-11-05 23:11             ` Thomas Gleixner
     [not found] ` <1414910967.5380.81.camel@marge.simpson.net>
2014-11-02  7:30   ` 3.14.23-rt20 - sched/numa: Fix task_numa_free() lockdep splat Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - wwmutex: fix __ww_mutex_lock_interruptible lockdep annotation Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - mm,memcg: make refill_stock()/consume_stock() use get_cpu_light() Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - fs,btrfs: fix rt deadlock on extent_buffer->lock Mike Galbraith
2015-02-17 11:56     ` Sebastian Andrzej Siewior
2015-02-17 12:23       ` Mike Galbraith
2015-02-18 10:47       ` Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - aio: fix rcu garbage collection might_sleep() splat Mike Galbraith
2015-02-17 12:53     ` Sebastian Andrzej Siewior
2015-02-17 14:01       ` Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - x86, UV: raw_spinlock conversion Mike Galbraith
2015-02-17 13:02     ` Sebastian Andrzej Siewior
2015-02-17 14:11       ` Mike Galbraith
2014-11-02  7:31   ` 3.14.23-rt20 - softirq: resurrect softirq threads Mike Galbraith
2015-02-17 13:05     ` Sebastian Andrzej Siewior
2015-02-17 14:00       ` Mike Galbraith
