linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures
@ 2013-09-14  0:19 John Stultz
  2013-09-14  0:19 ` [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: John Stultz @ 2013-09-14  0:19 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Mathieu Desnoyers, Li Zefan, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner

Currently seqlocks and seqcounts don't support lockdep.

After running across a seqcount-related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.

Since seqcounts are used in the vdso gettimeofday code, where lockdep
cannot be used, I've provided non-lockdep accessors.

I've also handled one case where there were nested seqlock writers;
there may be more edge cases.

Comments and feedback would be appreciated!

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2:
 * Update to new simplified lockdep.h
 * vdso accessor simplifications
 * removed needless preempt_disable
 * removed unneeded ifdefs

 arch/x86/vdso/vclock_gettime.c |  8 ++---
 fs/dcache.c                    |  4 +--
 fs/fs_struct.c                 |  2 +-
 include/linux/init_task.h      |  8 ++---
 include/linux/lockdep.h        |  8 +++--
 include/linux/seqlock.h        | 79 ++++++++++++++++++++++++++++++++++++++----
 mm/filemap_xip.c               |  2 +-
 7 files changed, 91 insertions(+), 20 deletions(-)

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 72074d5..2ada505 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		ts->tv_sec = gtod->wall_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin(&gtod->seq);
+		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
 		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/fs/dcache.c b/fs/dcache.c
index ca8e9cd..5e59bd3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2396,7 +2396,7 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
 	dentry_lock_for_move(dentry, target);
 
 	write_seqcount_begin(&dentry->d_seq);
-	write_seqcount_begin(&target->d_seq);
+	write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
 	/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
 
@@ -2528,7 +2528,7 @@ static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
 	dentry_lock_for_move(anon, dentry);
 
 	write_seqcount_begin(&dentry->d_seq);
-	write_seqcount_begin(&anon->d_seq);
+	write_seqcount_begin_nested(&anon->d_seq, DENTRY_D_LOCK_NESTED);
 
 	dparent = dentry->d_parent;
 
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index d8ac61d..7dca743 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -161,6 +161,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
 	.users		= 1,
 	.lock		= __SPIN_LOCK_UNLOCKED(init_fs.lock),
-	.seq		= SEQCNT_ZERO,
+	.seq		= SEQCNT_ZERO(init_fs.seq),
 	.umask		= 0022,
 };
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 5cd0f09..b0ed422 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -32,10 +32,10 @@ extern struct fs_struct init_fs;
 #endif
 
 #ifdef CONFIG_CPUSETS
-#define INIT_CPUSET_SEQ							\
-	.mems_allowed_seq = SEQCNT_ZERO,
+#define INIT_CPUSET_SEQ(tsk)							\
+	.mems_allowed_seq = SEQCNT_ZERO(tsk.mems_allowed_seq),
 #else
-#define INIT_CPUSET_SEQ
+#define INIT_CPUSET_SEQ(tsk)
 #endif
 
 #define INIT_SIGNALS(sig) {						\
@@ -220,7 +220,7 @@ extern struct task_group root_task_group;
 	INIT_FTRACE_GRAPH						\
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
-	INIT_CPUSET_SEQ							\
+	INIT_CPUSET_SEQ(tsk)						\
 	INIT_VTIME(tsk)							\
 }
 
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index cfc2f11..92b1bfc 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -497,6 +497,10 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define rwlock_acquire_read(l, s, t, i)		lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define rwlock_release(l, n, i)			lock_release(l, n, i)
 
+#define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
+#define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define seqcount_release(l, n, i)		lock_release(l, n, i)
+
 #define mutex_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define mutex_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
 #define mutex_release(l, n, i)			lock_release(l, n, i)
@@ -504,11 +508,11 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
 #define rwsem_acquire_read(l, s, t, i)		lock_acquire_shared(l, s, t, NULL, i)
-# define rwsem_release(l, n, i)			lock_release(l, n, i)
+#define rwsem_release(l, n, i)			lock_release(l, n, i)
 
 #define lock_map_acquire(l)			lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_read(l)		lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
-# define lock_map_release(l)			lock_release(l, 1, _THIS_IP_)
+#define lock_map_release(l)			lock_release(l, 1, _THIS_IP_)
 
 #ifdef CONFIG_PROVE_LOCKING
 # define might_lock(lock) 						\
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 1829905..c633b5d 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -28,6 +28,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/preempt.h>
+#include <linux/lockdep.h>
 #include <asm/processor.h>
 
 /*
@@ -38,10 +39,50 @@
  */
 typedef struct seqcount {
 	unsigned sequence;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
 } seqcount_t;
 
-#define SEQCNT_ZERO { 0 }
-#define seqcount_init(x)	do { *(x) = (seqcount_t) SEQCNT_ZERO; } while (0)
+static inline void __seqcount_init(seqcount_t *s, const char *name,
+					  struct lock_class_key *key)
+{
+	/*
+	 * Make sure we are not reinitializing a held lock:
+	 */
+	lockdep_init_map(&s->dep_map, name, key, 0);
+	s->sequence = 0;
+}
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define SEQCOUNT_DEP_MAP_INIT(lockname) \
+		.dep_map = { .name = #lockname } \
+
+# define seqcount_init(s)				\
+	do {						\
+		static struct lock_class_key __key;	\
+		__seqcount_init((s), #s, &__key);	\
+	} while (0)
+
+static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
+{
+	seqcount_t *l = (seqcount_t *)s;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	seqcount_acquire_read(&l->dep_map, 0, 0, _RET_IP_);
+	seqcount_release(&l->dep_map, 1, _RET_IP_);
+	local_irq_restore(flags);
+}
+
+#else
+# define SEQCOUNT_DEP_MAP_INIT(lockname)
+# define seqcount_init(s) __seqcount_init(s, NULL, NULL)
+# define seqcount_lockdep_reader_access(x)
+#endif
+
+#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+
 
 /**
  * __read_seqcount_begin - begin a seq-read critical section (without barrier)
@@ -70,6 +111,22 @@ repeat:
 }
 
 /**
+ * read_seqcount_begin_no_lockdep - start seq-read critical section w/o lockdep
+ * @s: pointer to seqcount_t
+ * Returns: count to be passed to read_seqcount_retry
+ *
+ * read_seqcount_begin_no_lockdep opens a read critical section of the given
+ * seqcount, but without any lockdep checking. Validity of the critical
+ * section is tested by checking read_seqcount_retry function.
+ */
+static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
+{
+	unsigned ret = __read_seqcount_begin(s);
+	smp_rmb();
+	return ret;
+}
+
+/**
  * read_seqcount_begin - begin a seq-read critical section
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
@@ -80,9 +137,8 @@ repeat:
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
-	unsigned ret = __read_seqcount_begin(s);
-	smp_rmb();
-	return ret;
+	seqcount_lockdep_reader_access(s);
+	return read_seqcount_begin_no_lockdep(s);
 }
 
 /**
@@ -102,6 +158,8 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = ACCESS_ONCE(s->sequence);
+
+	seqcount_lockdep_reader_access(s);
 	smp_rmb();
 	return ret & ~1;
 }
@@ -150,10 +208,19 @@ static inline void write_seqcount_begin(seqcount_t *s)
 {
 	s->sequence++;
 	smp_wmb();
+	seqcount_acquire(&s->dep_map, 0, 0, _RET_IP_);
+}
+
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+	s->sequence++;
+	smp_wmb();
+	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
 static inline void write_seqcount_end(seqcount_t *s)
 {
+	seqcount_release(&s->dep_map, 1, _RET_IP_);
 	smp_wmb();
 	s->sequence++;
 }
@@ -182,7 +249,7 @@ typedef struct {
  */
 #define __SEQLOCK_UNLOCKED(lockname)			\
 	{						\
-		.seqcount = SEQCNT_ZERO,		\
+		.seqcount = SEQCNT_ZERO(lockname),	\
 		.lock =	__SPIN_LOCK_UNLOCKED(lockname)	\
 	}
 
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index 28fe26b..d8d9fe3 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -26,7 +26,7 @@
  * of ZERO_PAGE(), such as /dev/zero
  */
 static DEFINE_MUTEX(xip_sparse_mutex);
-static seqcount_t xip_sparse_seq = SEQCNT_ZERO;
+static seqcount_t xip_sparse_seq = SEQCNT_ZERO(xip_sparse_seq);
 static struct page *__xip_sparse_page;
 
 /* called under xip_sparse_mutex */
-- 
1.8.1.2



* [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed
  2013-09-14  0:19 [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures John Stultz
@ 2013-09-14  0:19 ` John Stultz
  2013-09-14  9:04   ` Li Zefan
  2013-09-14  8:42 ` [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures Li Zefan
  2013-09-17  0:23 ` John Stultz
  2 siblings, 1 reply; 6+ messages in thread
From: John Stultz @ 2013-09-14  0:19 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Li Zefan, Mathieu Desnoyers, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner

After adding lockdep support to seqlock/seqcount structures,
I started seeing the following warning:

[    1.070907] ======================================================
[    1.072015] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[    1.073181] 3.11.0+ #67 Not tainted
[    1.073801] ------------------------------------------------------
[    1.074882] kworker/u4:2/708 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[    1.076088]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff81187d7f>] new_slab+0x5f/0x280
[    1.077572]
[    1.077572] and this task is already holding:
[    1.078593]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff81339f03>] blk_execute_rq_nowait+0x53/0xf0
[    1.080042] which would create a new lock dependency:
[    1.080042]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[    1.080042]
[    1.080042] but this new dependency connects a SOFTIRQ-irq-safe lock:
[    1.080042]  (&(&q->__queue_lock)->rlock){..-...}
[    1.080042] ... which became SOFTIRQ-irq-safe at:
[    1.080042]   [<ffffffff810ec179>] __lock_acquire+0x5b9/0x1db0
[    1.080042]   [<ffffffff810edfe5>] lock_acquire+0x95/0x130
[    1.080042]   [<ffffffff818968a1>] _raw_spin_lock+0x41/0x80
[    1.080042]   [<ffffffff81560c9e>] scsi_device_unbusy+0x7e/0xd0
[    1.080042]   [<ffffffff8155a612>] scsi_finish_command+0x32/0xf0
[    1.080042]   [<ffffffff81560e91>] scsi_softirq_done+0xa1/0x130
[    1.080042]   [<ffffffff8133b0f3>] blk_done_softirq+0x73/0x90
[    1.080042]   [<ffffffff81095dc0>] __do_softirq+0x110/0x2f0
[    1.080042]   [<ffffffff81095fcd>] run_ksoftirqd+0x2d/0x60
[    1.080042]   [<ffffffff810bc506>] smpboot_thread_fn+0x156/0x1e0
[    1.080042]   [<ffffffff810b3916>] kthread+0xd6/0xe0
[    1.080042]   [<ffffffff818980ac>] ret_from_fork+0x7c/0xb0
[    1.080042]
[    1.080042] to a SOFTIRQ-irq-unsafe lock:
[    1.080042]  (&p->mems_allowed_seq){+.+...}
[    1.080042] ... which became SOFTIRQ-irq-unsafe at:
[    1.080042] ...  [<ffffffff810ec1d3>] __lock_acquire+0x613/0x1db0
[    1.080042]   [<ffffffff810edfe5>] lock_acquire+0x95/0x130
[    1.080042]   [<ffffffff810b3df2>] kthreadd+0x82/0x180
[    1.080042]   [<ffffffff818980ac>] ret_from_fork+0x7c/0xb0
[    1.080042]
[    1.080042] other info that might help us debug this:
[    1.080042]
[    1.080042]  Possible interrupt unsafe locking scenario:
[    1.080042]
[    1.080042]        CPU0                    CPU1
[    1.080042]        ----                    ----
[    1.080042]   lock(&p->mems_allowed_seq);
[    1.080042]                                local_irq_disable();
[    1.080042]                                lock(&(&q->__queue_lock)->rlock);
[    1.080042]                                lock(&p->mems_allowed_seq);
[    1.080042]   <Interrupt>
[    1.080042]     lock(&(&q->__queue_lock)->rlock);
[    1.080042]
[    1.080042]  *** DEADLOCK ***

The issue stems from the kthreadd() function calling set_mems_allowed()
with irqs enabled. While the actual deadlock is probably unlikely
to trigger, the fix is fairly simple: disable irqs before taking the
mems_allowed_seq lock.

Let me know if you have any other suggestions or alternative fixes you'd
prefer.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/cpuset.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index cc1b01c..3fe661f 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -110,10 +110,14 @@ static inline bool put_mems_allowed(unsigned int seq)
 
 static inline void set_mems_allowed(nodemask_t nodemask)
 {
+	unsigned long flags;
+
 	task_lock(current);
+	local_irq_save(flags);
 	write_seqcount_begin(&current->mems_allowed_seq);
 	current->mems_allowed = nodemask;
 	write_seqcount_end(&current->mems_allowed_seq);
+	local_irq_restore(flags);
 	task_unlock(current);
 }
 
-- 
1.8.1.2



* Re: [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures
  2013-09-14  0:19 [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures John Stultz
  2013-09-14  0:19 ` [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed John Stultz
@ 2013-09-14  8:42 ` Li Zefan
  2013-09-17  0:23 ` John Stultz
  2 siblings, 0 replies; 6+ messages in thread
From: Li Zefan @ 2013-09-14  8:42 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Mathieu Desnoyers, Steven Rostedt, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner

On 2013/9/14 8:19, John Stultz wrote:
> Currently seqlocks and seqcounts don't support lockdep.
> 
> After running across a seqcount-related deadlock in the timekeeping
> code, I used a less-refined and more focused variant of this patch
> to narrow down the cause of the issue.
> 
> This is a first-pass attempt to properly enable lockdep functionality
> on seqlocks and seqcounts.
> 
> Since seqcounts are used in the vdso gettimeofday code, where lockdep
> cannot be used, I've provided non-lockdep accessors.
> 
> I've also handled one case where there were nested seqlock writers;
> there may be more edge cases.
> 
> Comments and feedback would be appreciated!
> 

Could you describe how seqlocks/seqcounts can lead to deadlock in the
changelog?

> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> v2:
>  * Update to new simplified lockdep.h
>  * vdso accessor simplifications
>  * removed needless preempt_disable
>  * removed unneeded ifdefs



* Re: [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed
  2013-09-14  0:19 ` [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed John Stultz
@ 2013-09-14  9:04   ` Li Zefan
  0 siblings, 0 replies; 6+ messages in thread
From: Li Zefan @ 2013-09-14  9:04 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Mathieu Desnoyers, Steven Rostedt, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Mel Gorman

Cc Mel, who added seqcount to cpuset.

On 2013/9/14 8:19, John Stultz wrote:
> After adding lockdep support to seqlock/seqcount structures,
> I started seeing the following warning:
> 
> [    1.070907] ======================================================
> [    1.072015] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
> [    1.073181] 3.11.0+ #67 Not tainted
> [    1.073801] ------------------------------------------------------
> [    1.074882] kworker/u4:2/708 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [    1.076088]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff81187d7f>] new_slab+0x5f/0x280
> [    1.077572]
> [    1.077572] and this task is already holding:
> [    1.078593]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff81339f03>] blk_execute_rq_nowait+0x53/0xf0
> [    1.080042] which would create a new lock dependency:
> [    1.080042]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
> [    1.080042]
> [    1.080042] but this new dependency connects a SOFTIRQ-irq-safe lock:
> [    1.080042]  (&(&q->__queue_lock)->rlock){..-...}
> [    1.080042] ... which became SOFTIRQ-irq-safe at:
> [    1.080042]   [<ffffffff810ec179>] __lock_acquire+0x5b9/0x1db0
> [    1.080042]   [<ffffffff810edfe5>] lock_acquire+0x95/0x130
> [    1.080042]   [<ffffffff818968a1>] _raw_spin_lock+0x41/0x80
> [    1.080042]   [<ffffffff81560c9e>] scsi_device_unbusy+0x7e/0xd0
> [    1.080042]   [<ffffffff8155a612>] scsi_finish_command+0x32/0xf0
> [    1.080042]   [<ffffffff81560e91>] scsi_softirq_done+0xa1/0x130
> [    1.080042]   [<ffffffff8133b0f3>] blk_done_softirq+0x73/0x90
> [    1.080042]   [<ffffffff81095dc0>] __do_softirq+0x110/0x2f0
> [    1.080042]   [<ffffffff81095fcd>] run_ksoftirqd+0x2d/0x60
> [    1.080042]   [<ffffffff810bc506>] smpboot_thread_fn+0x156/0x1e0
> [    1.080042]   [<ffffffff810b3916>] kthread+0xd6/0xe0
> [    1.080042]   [<ffffffff818980ac>] ret_from_fork+0x7c/0xb0
> [    1.080042]
> [    1.080042] to a SOFTIRQ-irq-unsafe lock:
> [    1.080042]  (&p->mems_allowed_seq){+.+...}
> [    1.080042] ... which became SOFTIRQ-irq-unsafe at:
> [    1.080042] ...  [<ffffffff810ec1d3>] __lock_acquire+0x613/0x1db0
> [    1.080042]   [<ffffffff810edfe5>] lock_acquire+0x95/0x130
> [    1.080042]   [<ffffffff810b3df2>] kthreadd+0x82/0x180
> [    1.080042]   [<ffffffff818980ac>] ret_from_fork+0x7c/0xb0
> [    1.080042]
> [    1.080042] other info that might help us debug this:
> [    1.080042]
> [    1.080042]  Possible interrupt unsafe locking scenario:
> [    1.080042]
> [    1.080042]        CPU0                    CPU1
> [    1.080042]        ----                    ----
> [    1.080042]   lock(&p->mems_allowed_seq);
> [    1.080042]                                local_irq_disable();
> [    1.080042]                                lock(&(&q->__queue_lock)->rlock);
> [    1.080042]                                lock(&p->mems_allowed_seq);
> [    1.080042]   <Interrupt>
> [    1.080042]     lock(&(&q->__queue_lock)->rlock);
> [    1.080042]
> [    1.080042]  *** DEADLOCK ***
> 
> The issue stems from the kthreadd() function calling set_mems_allowed()
> with irqs enabled. While the actual deadlock is probably unlikely
> to trigger, the fix is fairly simple: disable irqs before taking the
> mems_allowed_seq lock.
> 

Now I get it. I'm fine with this change.

Acked-by: Li Zefan <lizefan@huawei.com>

> Let me know if you have any other suggestions or alternative fixes you'd
> prefer.
> 
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/cpuset.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index cc1b01c..3fe661f 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -110,10 +110,14 @@ static inline bool put_mems_allowed(unsigned int seq)
>  
>  static inline void set_mems_allowed(nodemask_t nodemask)
>  {
> +	unsigned long flags;
> +
>  	task_lock(current);
> +	local_irq_save(flags);
>  	write_seqcount_begin(&current->mems_allowed_seq);
>  	current->mems_allowed = nodemask;
>  	write_seqcount_end(&current->mems_allowed_seq);
> +	local_irq_restore(flags);
>  	task_unlock(current);
>  }
>  
> 



* Re: [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures
  2013-09-14  0:19 [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures John Stultz
  2013-09-14  0:19 ` [PATCH 2/2] [RFC] cpuset: Fix potential deadlock w/ set_mems_allowed John Stultz
  2013-09-14  8:42 ` [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures Li Zefan
@ 2013-09-17  0:23 ` John Stultz
  2013-09-17  8:28   ` Ingo Molnar
  2 siblings, 1 reply; 6+ messages in thread
From: John Stultz @ 2013-09-17  0:23 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Mathieu Desnoyers, Li Zefan, Steven Rostedt,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner

On 09/13/2013 05:19 PM, John Stultz wrote:
> Currently seqlocks and seqcounts don't support lockdep.
>
> After running across a seqcount-related deadlock in the timekeeping
> code, I used a less-refined and more focused variant of this patch
> to narrow down the cause of the issue.
>
> This is a first-pass attempt to properly enable lockdep functionality
> on seqlocks and seqcounts.
>
> Since seqcounts are used in the vdso gettimeofday code, where lockdep
> cannot be used, I've provided non-lockdep accessors.
>
> I've also handled one case where there were nested seqlock writers;
> there may be more edge cases.

Oof.

So I just noticed there's a bunch of places in the network code that use
a fairly deeply embedded seqcount: u64_stats_sync. There's almost never
an explicit initialization, since they assume the structure is zeroed
when allocated, but this causes trouble with the lockdep key
initialization.

I'll have to go through each of these (about 25 cases) and make them
call seqcount_init(), but since I'm heading to plumbers tomorrow I might
not get to it until next week.

Anyway, let me know if you have any other thoughts on the patches.

thanks
-john



* Re: [PATCH 1/2] [RFC v2] seqcount: Add lockdep functionality to seqcount/seqlock structures
  2013-09-17  0:23 ` John Stultz
@ 2013-09-17  8:28   ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2013-09-17  8:28 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Mathieu Desnoyers, Li Zefan, Steven Rostedt,
	Peter Zijlstra, Thomas Gleixner


* John Stultz <john.stultz@linaro.org> wrote:

> On 09/13/2013 05:19 PM, John Stultz wrote:
> > Currently seqlocks and seqcounts don't support lockdep.
> >
> > After running across a seqcount-related deadlock in the timekeeping
> > code, I used a less-refined and more focused variant of this patch
> > to narrow down the cause of the issue.
> >
> > This is a first-pass attempt to properly enable lockdep functionality
> > on seqlocks and seqcounts.
> >
> > Since seqcounts are used in the vdso gettimeofday code, where lockdep
> > cannot be used, I've provided non-lockdep accessors.
> >
> > I've also handled one case where there were nested seqlock writers;
> > there may be more edge cases.
> 
> Oof.
> 
> So I just noticed there's a bunch of places in the network code that use
> a fairly deeply embedded seqcount: u64_stats_sync. There's almost never
> an explicit initialization, since they assume the structure is zeroed
> when allocated, but this causes trouble with the lockdep key
> initialization.
> 
> I'll have to go through each of these (about 25 cases) and make them
> call seqcount_init(), but since I'm heading to plumbers tomorrow I might
> not get to it until next week.
> 
> Anyway, let me know if you have any other thoughts on the patches.

Explicit initialization is generally a bonus for readability,
debuggability and ease of development; we enforce that for spinlocks as
well.

Thanks,

	Ingo
