* [RFC 00/12] lockdep: Implement crossrelease feature
@ 2016-06-20  4:55 Byungchul Park
  2016-06-20  4:55 ` [RFC 01/12] lockdep: Refactor lookup_chain_cache() Byungchul Park
                   ` (13 more replies)
  0 siblings, 14 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

The crossrelease feature uses the term "crosslock" for a lock that
can be released by a context different from the one that acquired it.
For a crosslock, every lock held in the context that releases the
crosslock, from the moment the crosslock was acquired until it is
eventually released, has a dependency on the crosslock. That is the
key idea behind the crossrelease feature.

Crossrelease feature introduces 2 new data structures.

1. pend_lock (== plock)

	This keeps locks that are waiting to be committed, so that
	an actual dependency chain can be built when committing a
	crosslock.

	Every task_struct has an array of these pending locks to
	keep such locks. Pending locks are added whenever
	lock_acquire() is called for a normal (non-crosslock) lock
	and are flushed (committed) at the proper time.

2. cross_lock (== xlock)

	This keeps some additional data, only for crosslocks. There
	is one cross_lock per lockdep_map used as a crosslock.
	lockdep_init_map_crosslock() should be used instead of
	lockdep_init_map() to use a lock as a crosslock.

Acquiring and releasing sequence for crossrelease feature:

1. Acquire

	All validation checks are performed for all locks.

	1) For non-crosslock (normal lock)

		The hlock is added not only to held_locks of the
		current task_struct, but also to the task_struct's
		pend_lock array, so that a dependency chain can be
		built with the lock at commit time.

	2) For crosslock

		The hlock is added only to the cross_lock of the
		lock's lockdep_map, instead of held_locks, so that
		a dependency chain can be built with the lock at
		commit time. The lock is also added to the
		xlocks_head list.

2. Commit (only for crosslock)

	This establishes dependencies between the crosslock being
	released and all locks that have been held in the releasing
	context since the crosslock was acquired, while trying to
	avoid building chains unnecessarily as far as possible.

3. Release

	1) For non-crosslock (normal lock)

		No change.

	2) For crosslock

		Just remove the lock from the xlocks_head list. The
		release operation should be used together with the
		commit operation for a crosslock, in order to build
		a dependency chain properly.

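To make the sequence above concrete, here is a minimal usage sketch built
on the API this series introduces (lockdep_init_map_crosslock(),
lock_acquire_exclusive(), lock_commit_crosslock(), lock_release()),
mirroring the completion hooks added in patch 06. The "my_barrier"
primitive and its blocking details are hypothetical:

/*
 * Hedged sketch only: "my_barrier" and its wait/wake logic are
 * hypothetical; the lockdep calls follow the sequence above.
 */
struct my_barrier {
	struct lockdep_map	map;
	struct cross_lock	xlock;
	/* ... actual wait/wake state ... */
};

static void my_barrier_init(struct my_barrier *b)
{
	static struct lock_class_key key;

	/* Crosslock initializer, instead of lockdep_init_map(). */
	lockdep_init_map_crosslock(&b->map, &b->xlock, "my_barrier", &key, 0);
}

static void my_barrier_wait(struct my_barrier *b)
{
	/* 1. Acquire: the hlock goes into the cross_lock, not held_locks. */
	lock_acquire_exclusive(&b->map, 0, 0, NULL, _RET_IP_);
	/* ... block until another context signals ... */
	/* 3. Release, once the waiter resumes. */
	lock_release(&b->map, 0, _RET_IP_);
}

static void my_barrier_signal(struct my_barrier *b)
{
	/* 2. Commit: build dependencies on the locks held in this context. */
	lock_commit_crosslock(&b->map);
	/* ... wake up the waiter ... */
}
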
Byungchul Park (12):
  lockdep: Refactor lookup_chain_cache()
  lockdep: Add a function building a chain between two hlocks
  lockdep: Make check_prev_add can use a stack_trace of other context
  lockdep: Make save_trace can copy from other stack_trace
  lockdep: Implement crossrelease feature
  lockdep: Apply crossrelease to completion
  pagemap.h: Remove trailing white space
  lockdep: Apply crossrelease to PG_locked lock
  cifs/file.c: Remove trailing white space
  mm/swap_state.c: Remove trailing white space
  lockdep: Call lock_acquire(release) when accessing PG_locked manually
  x86/dumpstack: Optimize save_stack_trace

 arch/x86/include/asm/stacktrace.h |   1 +
 arch/x86/kernel/dumpstack.c       |   2 +
 arch/x86/kernel/dumpstack_32.c    |   2 +
 arch/x86/kernel/stacktrace.c      |   7 +
 fs/cifs/file.c                    |   6 +-
 include/linux/completion.h        | 121 +++++-
 include/linux/irqflags.h          |  16 +-
 include/linux/lockdep.h           | 139 +++++++
 include/linux/mm_types.h          |   9 +
 include/linux/pagemap.h           | 104 ++++-
 include/linux/sched.h             |   5 +
 kernel/fork.c                     |   4 +
 kernel/locking/lockdep.c          | 846 +++++++++++++++++++++++++++++++++++---
 kernel/sched/completion.c         |  55 +--
 lib/Kconfig.debug                 |  30 ++
 mm/filemap.c                      |  10 +-
 mm/ksm.c                          |   1 +
 mm/migrate.c                      |   1 +
 mm/page_alloc.c                   |   3 +
 mm/shmem.c                        |   2 +
 mm/swap_state.c                   |  12 +-
 mm/vmscan.c                       |   1 +
 22 files changed, 1255 insertions(+), 122 deletions(-)

-- 
1.9.1


* [RFC 01/12] lockdep: Refactor lookup_chain_cache()
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 02/12] lockdep: Add a function building a chain between two hlocks Byungchul Park
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Currently, lookup_chain_cache() provides both "lookup" and "add"
functionalities in a single function. However, each one is useful
individually. Some features, e.g. crossrelease, want to use them
individually. Thus, split these functionalities into two functions.

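For illustration only (not part of the patch), a hedged sketch of a
caller that needs just the lookup half, following the pattern
commit_plock() uses later in this series. The helper name is
hypothetical and assumes lockdep-internal symbols (lock_classes,
iterate_chain_key(), lookup_chain_cache()):

/*
 * Hypothetical helper: check whether a two-class chain is already
 * cached, without taking graph_lock (the lookup walks the hash
 * lock-free, as entries are only ever added).
 */
static int chain_already_cached(struct lock_class *a, struct lock_class *b)
{
	u64 chain_key;

	chain_key = iterate_chain_key((u64)0, a - lock_classes);
	chain_key = iterate_chain_key(chain_key, b - lock_classes);

	return lookup_chain_cache(chain_key) != NULL;
}
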
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 125 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 79 insertions(+), 46 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 716547f..efd001c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2010,15 +2010,9 @@ struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i)
 	return lock_classes + chain_hlocks[chain->base + i];
 }
 
-/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
- */
-static inline int lookup_chain_cache(struct task_struct *curr,
-				     struct held_lock *hlock,
-				     u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+				  struct held_lock *hlock,
+				  u64 chain_key)
 {
 	struct lock_class *class = hlock_class(hlock);
 	struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2027,46 +2021,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	int i, j;
 
 	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
 	 * We might need to take the graph lock, ensure we've got IRQs
 	 * disabled to make this an IRQ-safe lock.. for recursion reasons
 	 * lockdep won't complain about its own locking errors.
 	 */
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return 0;
-	/*
-	 * We can walk it lock-free, because entries only get added
-	 * to the hash:
-	 */
-	hlist_for_each_entry_rcu(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-cache_hit:
-			debug_atomic_inc(chain_lookup_hits);
-			if (very_verbose(class))
-				printk("\nhash chain already cached, key: "
-					"%016Lx tail class: [%p] %s\n",
-					(unsigned long long)chain_key,
-					class->key, class->name);
-			return 0;
-		}
-	}
-	if (very_verbose(class))
-		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
-			(unsigned long long)chain_key, class->key, class->name);
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-	if (!graph_lock())
-		return 0;
-	/*
-	 * We have to walk the chain again locked - to avoid duplicates:
-	 */
-	hlist_for_each_entry(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-			graph_unlock();
-			goto cache_hit;
-		}
-	}
+
 	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
 		if (!debug_locks_off_graph_unlock())
 			return 0;
@@ -2102,6 +2068,72 @@ cache_hit:
 	return 1;
 }
 
+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * We can walk it lock-free, because entries only get added
+	 * to the hash:
+	 */
+	hlist_for_each_entry_rcu(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+			debug_atomic_inc(chain_lookup_hits);
+			return chain;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+					 struct held_lock *hlock,
+					 u64 chain_key)
+{
+	struct lock_class *class = hlock_class(hlock);
+	struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+	if (chain) {
+cache_hit:
+		if (very_verbose(class))
+			printk("\nhash chain already cached, key: "
+					"%016Lx tail class: [%p] %s\n",
+					(unsigned long long)chain_key,
+					class->key, class->name);
+		return 0;
+	}
+
+	if (very_verbose(class))
+		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+			(unsigned long long)chain_key, class->key, class->name);
+
+	if (!graph_lock())
+		return 0;
+
+	/*
+	 * We have to walk the chain again locked - to avoid duplicates:
+	 */
+	chain = lookup_chain_cache(chain_key);
+	if (chain) {
+		graph_unlock();
+		goto cache_hit;
+	}
+
+	if (!add_chain_cache(curr, hlock, chain_key))
+		return 0;
+
+	return 1;
+}
+
 static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		struct held_lock *hlock, int chain_head, u64 chain_key)
 {
@@ -2112,11 +2144,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 	 *
 	 * We look up the chain_key and do the O(N^2) check and update of
 	 * the dependencies only if this is a new dependency chain.
-	 * (If lookup_chain_cache() returns with 1 it acquires
+	 * (If lookup_chain_cache_add() returns with 1 it acquires
 	 * graph_lock for us)
 	 */
 	if (!hlock->trylock && hlock->check &&
-	    lookup_chain_cache(curr, hlock, chain_key)) {
+	    lookup_chain_cache_add(curr, hlock, chain_key)) {
 		/*
 		 * Check whether last held lock:
 		 *
@@ -2147,9 +2179,10 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		if (!chain_head && ret != 2)
 			if (!check_prevs_add(curr, hlock))
 				return 0;
+
 		graph_unlock();
 	} else
-		/* after lookup_chain_cache(): */
+		/* after lookup_chain_cache_add(): */
 		if (unlikely(!debug_locks))
 			return 0;
 
-- 
1.9.1


* [RFC 02/12] lockdep: Add a function building a chain between two hlocks
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
  2016-06-20  4:55 ` [RFC 01/12] lockdep: Refactor lookup_chain_cache() Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 03/12] lockdep: Make check_prev_add can use a stack_trace of other context Byungchul Park
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

add_chain_cache() can only be used by the current context, since it
depends on a task's held_locks, which is not protected by any lock.
However, it would be useful if a dependency chain could be built in
any context. This patch makes the chain building independent of the
calling context.

In particular, the crossrelease feature wants to do this. Crossrelease
introduces an additional dependency chain consisting of two lock
classes, using two hlock instances, to connect dependencies between
different contexts.

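As a hedged sketch of the intended calling convention (mirroring what
commit_plock() does later in this series): compute the two-element
chain key, check the cache, then add the chain. The wrapper name is
hypothetical; graph_lock must already be held and irqs disabled:

/* Hypothetical wrapper; prev/next must be stable hlock instances. */
static int build_chain_between(struct held_lock *prev, struct held_lock *next)
{
	u64 chain_key;

	chain_key = iterate_chain_key((u64)0,
				      hlock_class(prev) - lock_classes);
	chain_key = iterate_chain_key(chain_key,
				      hlock_class(next) - lock_classes);

	if (lookup_chain_cache(chain_key))
		return 1;	/* dependency chain already established */

	/* Caller holds graph_lock and has irqs disabled. */
	return add_chain_cache_2hlocks(prev, next, chain_key);
}
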
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index efd001c..4d51208 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2010,6 +2010,63 @@ struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i)
 	return lock_classes + chain_hlocks[chain->base + i];
 }
 
+/*
+ * This can make it possible to build a chain between just two
+ * specified hlocks rather than between already held locks of
+ * the current task and newly held lock, which can be done by
+ * add_chain_cache().
+ *
+ * add_chain_cache() must be done within the lock owner's context,
+ * however this can be called in any context if two race-free hlock
+ * instances were already taken by the caller. Thus this can be useful
+ * when building a chain between two hlocks regardless of context.
+ */
+static inline int add_chain_cache_2hlocks(struct held_lock *prev,
+					  struct held_lock *next,
+					  u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
+	 * We might need to take the graph lock, ensure we've got IRQs
+	 * disabled to make this an IRQ-safe lock.. for recursion reasons
+	 * lockdep won't complain about its own locking errors.
+	 */
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+		dump_stack();
+		return 0;
+	}
+
+	chain = lock_chains + nr_lock_chains++;
+	chain->chain_key = chain_key;
+	chain->irq_context = next->irq_context;
+	chain->depth = 2;
+	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+		chain->base = nr_chain_hlocks;
+		nr_chain_hlocks += chain->depth;
+		chain_hlocks[chain->base] = prev->class_idx - 1;
+		chain_hlocks[chain->base + 1] = next->class_idx - 1;
+	}
+	hlist_add_head_rcu(&chain->entry, hash_head);
+	debug_atomic_inc(chain_lookup_misses);
+	inc_chains();
+
+	return 1;
+}
+
 static inline int add_chain_cache(struct task_struct *curr,
 				  struct held_lock *hlock,
 				  u64 chain_key)
-- 
1.9.1


* [RFC 03/12] lockdep: Make check_prev_add can use a stack_trace of other context
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
  2016-06-20  4:55 ` [RFC 01/12] lockdep: Refactor lookup_chain_cache() Byungchul Park
  2016-06-20  4:55 ` [RFC 02/12] lockdep: Add a function building a chain between two hlocks Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 04/12] lockdep: Make save_trace can copy from other stack_trace Byungchul Park
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Currently, check_prev_add() can only save the current context's stack
trace. But it would be useful if a separate stack_trace could be
passed in and used by check_prev_add(). The crossrelease feature wants
to call check_prev_add() with another context's stack_trace.

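The two calling conventions, as they end up looking in this series
(the second is the crossrelease commit path added in a later patch):

/* Current-context use, behaviour unchanged: */
if (!check_prev_add(curr, hlock, next, distance, &stack_saved, NULL))
	return 0;

/* Cross-context use, with a stack_trace captured elsewhere: */
if (!check_prev_add(current, &xlock->hlock, &plock->hlock, 1, NULL, &trace))
	return 0;
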
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4d51208..c596bef 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1822,7 +1822,8 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-	       struct held_lock *next, int distance, int *stack_saved)
+	       struct held_lock *next, int distance, int *stack_saved,
+	       struct stack_trace *own_trace)
 {
 	struct lock_list *entry;
 	int ret;
@@ -1883,7 +1884,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 		}
 	}
 
-	if (!*stack_saved) {
+	if (!own_trace && stack_saved && !*stack_saved) {
 		if (!save_trace(&trace))
 			return 0;
 		*stack_saved = 1;
@@ -1895,14 +1896,14 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	 */
 	ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
 			       &hlock_class(prev)->locks_after,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, own_trace ?: &trace);
 
 	if (!ret)
 		return 0;
 
 	ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
 			       &hlock_class(next)->locks_before,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, own_trace ?: &trace);
 	if (!ret)
 		return 0;
 
@@ -1911,7 +1912,8 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	 */
 	if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
 		/* We drop graph lock, so another thread can overwrite trace. */
-		*stack_saved = 0;
+		if (stack_saved)
+			*stack_saved = 0;
 		graph_unlock();
 		printk("\n new dependency: ");
 		print_lock_name(hlock_class(prev));
@@ -1960,8 +1962,8 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
 		 * added:
 		 */
 		if (hlock->read != 2 && hlock->check) {
-			if (!check_prev_add(curr, hlock, next,
-						distance, &stack_saved))
+			if (!check_prev_add(curr, hlock, next, distance,
+						&stack_saved, NULL))
 				return 0;
 			/*
 			 * Stop after the first non-trylock entry,
-- 
1.9.1


* [RFC 04/12] lockdep: Make save_trace can copy from other stack_trace
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (2 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 03/12] lockdep: Make check_prev_add can use a stack_trace of other context Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 05/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Currently, save_trace() can only save the current context's stack
trace. However, it would be useful if it could also save (copy from)
another context's stack trace. In particular, the crossrelease
feature uses this.

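In short, the new second argument selects between capturing and
copying; both uses appear in this series:

struct stack_trace trace;

/* Capture the current context's stack, as before: */
if (!save_trace(&trace, NULL))
	return 0;

/* Copy a trace that was saved earlier in another context: */
if (!save_trace(&trace, &plock->trace))
	return 0;
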
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c596bef..b03014b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -389,7 +389,7 @@ static void print_lockdep_off(const char *bug_msg)
 #endif
 }
 
-static int save_trace(struct stack_trace *trace)
+static int save_trace(struct stack_trace *trace, struct stack_trace *copy)
 {
 	trace->nr_entries = 0;
 	trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
@@ -397,7 +397,13 @@ static int save_trace(struct stack_trace *trace)
 
 	trace->skip = 3;
 
-	save_stack_trace(trace);
+	if (copy) {
+		trace->nr_entries = min(copy->nr_entries, trace->max_entries);
+		trace->skip = copy->skip;
+		memcpy(trace->entries, copy->entries,
+				trace->nr_entries * sizeof(unsigned long));
+	} else
+		save_stack_trace(trace);
 
 	/*
 	 * Some daft arches put -1 at the end to indicate its a full trace.
@@ -1201,7 +1207,7 @@ static noinline int print_circular_bug(struct lock_list *this,
 	if (!debug_locks_off_graph_unlock() || debug_locks_silent)
 		return 0;
 
-	if (!save_trace(&this->trace))
+	if (!save_trace(&this->trace, NULL))
 		return 0;
 
 	depth = get_lock_depth(target);
@@ -1547,13 +1553,13 @@ print_bad_irq_dependency(struct task_struct *curr,
 
 	printk("\nthe dependencies between %s-irq-safe lock", irqclass);
 	printk(" and the holding lock:\n");
-	if (!save_trace(&prev_root->trace))
+	if (!save_trace(&prev_root->trace, NULL))
 		return 0;
 	print_shortest_lock_dependencies(backwards_entry, prev_root);
 
 	printk("\nthe dependencies between the lock to be acquired");
 	printk(" and %s-irq-unsafe lock:\n", irqclass);
-	if (!save_trace(&next_root->trace))
+	if (!save_trace(&next_root->trace, NULL))
 		return 0;
 	print_shortest_lock_dependencies(forwards_entry, next_root);
 
@@ -1885,7 +1891,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	}
 
 	if (!own_trace && stack_saved && !*stack_saved) {
-		if (!save_trace(&trace))
+		if (!save_trace(&trace, NULL))
 			return 0;
 		*stack_saved = 1;
 	}
@@ -2436,7 +2442,7 @@ print_irq_inversion_bug(struct task_struct *curr,
 	lockdep_print_held_locks(curr);
 
 	printk("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
-	if (!save_trace(&root->trace))
+	if (!save_trace(&root->trace, NULL))
 		return 0;
 	print_shortest_lock_dependencies(other, root);
 
@@ -3015,7 +3021,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 
 	hlock_class(this)->usage_mask |= new_mask;
 
-	if (!save_trace(hlock_class(this)->usage_traces + new_bit))
+	if (!save_trace(hlock_class(this)->usage_traces + new_bit, NULL))
 		return 0;
 
 	switch (new_bit) {
-- 
1.9.1


* [RFC 05/12] lockdep: Implement crossrelease feature
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (3 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 04/12] lockdep: Make save_trace can copy from other stack_trace Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-30 13:03   ` Peter Zijlstra
  2016-06-20  4:55 ` [RFC 06/12] lockdep: Apply crossrelease to completion Byungchul Park
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

The crossrelease feature uses the term "crosslock" for a lock that
can be released by a context different from the one that acquired it.
For a crosslock, every lock held in the context that releases the
crosslock, from the moment the crosslock was acquired until it is
eventually released, has a dependency on the crosslock. That is the
key idea behind the crossrelease feature.

Crossrelease feature introduces 2 new data structures.

1. pend_lock (== plock)

	This keeps locks that are waiting to be committed, so that
	an actual dependency chain can be built when committing a
	crosslock.

	Every task_struct has an array of these pending locks to
	keep such locks. Pending locks are added whenever
	lock_acquire() is called for a normal (non-crosslock) lock
	and are flushed (committed) at the proper time.

2. cross_lock (== xlock)

	This keeps some additional data, only for crosslocks. There
	is one cross_lock per lockdep_map used as a crosslock.
	lockdep_init_map_crosslock() should be used instead of
	lockdep_init_map() to use a lock as a crosslock.

Acquiring and releasing sequence for crossrelease feature:

1. Acquire

	All validation checks are performed for all locks.

	1) For non-crosslock (normal lock)

		The hlock is added not only to held_locks of the
		current task_struct, but also to the task_struct's
		pend_lock array, so that a dependency chain can be
		built with the lock at commit time.

	2) For crosslock

		The hlock is added only to the cross_lock of the
		lock's lockdep_map, instead of held_locks, so that
		a dependency chain can be built with the lock at
		commit time. The lock is also added to the
		xlocks_head list.

2. Commit (only for crosslock)

	This establishes dependencies between the crosslock being
	released and all locks that have been held in the releasing
	context since the crosslock was acquired, while trying to
	avoid building chains unnecessarily as far as possible.

3. Release

	1) For non-crosslock (normal lock)

		No change.

	2) For crosslock

		Just remove the lock from the xlocks_head list. The
		release operation should be used together with the
		commit operation for a crosslock, in order to build
		a dependency chain properly.

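One implementation detail worth noting: gen_id counters are allowed to
wrap, so the patch compares them with a wrap-safe helper instead of a
plain '<'. A small illustration (the example values are only for
demonstration):

/* From this patch: wrap-safe "a came before b" for 32-bit counters. */
static inline int before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}

/*
 * Ordering is preserved across a wrap as long as the two values are
 * less than 2^31 apart:
 *
 *   before(5, 10)            == 1
 *   before(0xfffffffeU, 3)   == 1   (the difference wraps negative)
 *   before(10, 5)            == 0
 */
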
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/irqflags.h |  16 +-
 include/linux/lockdep.h  | 139 +++++++++++
 include/linux/sched.h    |   5 +
 kernel/fork.c            |   4 +
 kernel/locking/lockdep.c | 626 +++++++++++++++++++++++++++++++++++++++++++++--
 lib/Kconfig.debug        |  13 +
 6 files changed, 785 insertions(+), 18 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..83eebe1 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,10 +23,18 @@
 # define trace_softirq_context(p)	((p)->softirq_context)
 # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
-# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
-# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter()	do { current->softirq_context++; } while (0)
-# define lockdep_softirq_exit()	do { current->softirq_context--; } while (0)
+# define trace_hardirq_enter()		\
+do {					\
+	current->hardirq_context++;	\
+	crossrelease_hardirq_start();	\
+} while (0)
+# define trace_hardirq_exit()		do { current->hardirq_context--; } while (0)
+# define lockdep_softirq_enter()	\
+do {					\
+	current->softirq_context++;	\
+	crossrelease_softirq_start();	\
+} while (0)
+# define lockdep_softirq_exit()		do { current->softirq_context--; } while (0)
 # define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
 #else
 # define trace_hardirqs_on()		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4dca42f..1bf513e 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -108,6 +108,19 @@ struct lock_class {
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * A flag to check if this lock class is releasable in
+	 * a different context from the context acquiring it.
+	 */
+	int				crosslock;
+
+	/*
+	 * When building a dependency chain, this helps skip any
+	 * classes for which the chain has already been established.
+	 */
+	unsigned int			skip_gen_id;
+#endif
 };
 
 #ifdef CONFIG_LOCK_STAT
@@ -143,6 +156,9 @@ struct lock_class_stats lock_stats(struct lock_class *class);
 void clear_lock_stats(struct lock_class *class);
 #endif
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+struct cross_lock;
+#endif
 /*
  * Map the lock object (the lock instance) to the lock-class object.
  * This is embedded into specific lock instances:
@@ -155,6 +171,15 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	struct cross_lock		*xlock;
+
+	/*
+	 * A flag to check if this lockdep_map is releasable in
+	 * a different context from the context acquiring it.
+	 */
+	int				cross;
+#endif
 };
 
 static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -256,8 +281,90 @@ struct held_lock {
 	unsigned int hardirqs_off:1;
 	unsigned int references:12;					/* 32 bits */
 	unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * This is used to find out the first lock acquired since
+	 * a crosslock was held. In the crossrelease feature, we
+	 * build and use a dependency chain between the crosslock
+	 * and the first normal(non-crosslock) lock acquired
+	 * since the crosslock was held.
+	 */
+	unsigned int gen_id;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_PLOCK_TRACE_ENTRIES		5
+
+/*
+ * This keeps locks that are waiting to be committed,
+ * so that an actual dependency chain can be built
+ * when committing a crosslock.
+ *
+ * Every task_struct has an array of these pending
+ * locks. Pending locks are added whenever
+ * lock_acquire() is called for a normal
+ * (non-crosslock) lock and are flushed (committed)
+ * at the proper time.
+ */
+struct pend_lock {
+	/*
+	 * This is used to find out the first lock acquired since
+	 * a crosslock was held. In the crossrelease feature, we
+	 * build and use a dependency chain between the crosslock
+	 * and the first normal(non-crosslock) lock acquired
+	 * since the crosslock was held.
+	 */
+	unsigned int		prev_gen_id;
+	unsigned int		gen_id;
+
+	int			hardirq_context;
+	int			softirq_context;
+
+	/*
+	 * Whenever an irq occurs, these ids will be updated so that
+	 * we can distinguish each irq context uniquely.
+	 */
+	unsigned int		hardirq_id;
+	unsigned int		softirq_id;
+
+	/*
+	 * Separate stack_trace data. This will be used when
+	 * building a dependency chain for a crosslock, i.e.
+	 * at commit time.
+	 */
+	struct stack_trace	trace;
+	unsigned long		trace_entries[MAX_PLOCK_TRACE_ENTRIES];
+
+	/*
+	 * Separate hlock instance. This will be used when
+	 * building a dependency chain for a crosslock, i.e.
+	 * at commit time.
+	 */
+	struct held_lock	hlock;
+};
+
+/*
+ * One cross_lock per one lockdep_map. For crosslock,
+ * lockdep_init_map_crosslock() should be used instead of
+ * lockdep_init_map(), where the pointer of this cross_lock
+ * instance should be passed as a parameter.
+ */
+struct cross_lock {
+	unsigned int		gen_id;
+	struct list_head	xlock_entry;
+
+	/*
+	 * Separate hlock instance. This will be used when
+	 * building a dependency chain for a crosslock, i.e.
+	 * at commit time.
+	 */
+	struct held_lock	hlock;
+
+	int			ref; /* reference count */
+};
+#endif
+
 /*
  * Initialization, self-test and debugging-output methods:
  */
@@ -280,6 +387,34 @@ extern void lockdep_on(void);
 extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 			     struct lock_class_key *key, int subclass);
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
+				       struct cross_lock *xlock,
+				       const char *name,
+				       struct lock_class_key *key,
+				       int subclass);
+extern void lock_commit_crosslock(struct lockdep_map *lock);
+
+/*
+ * The only member we essentially have to initialize is ref.
+ * Other members will be initialized in add_xlock().
+ */
+#define STATIC_CROSS_LOCK_INIT() \
+	{ .ref = 0,}
+
+#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key, _xlock) \
+	{ .name = (_name), .key = (void *)(_key), .xlock = (_xlock), .cross = 1, }
+
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .name = (_name), .key = (void *)(_key), .xlock = NULL, .cross = 0, }
+
+extern void crossrelease_hardirq_start(void);
+extern void crossrelease_softirq_start(void);
+#else
 /*
  * To initialize a lockdep_map statically use this macro.
  * Note that _name must not be NULL.
@@ -287,6 +422,10 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
 	{ .name = (_name), .key = (void *)(_key), }
 
+static inline void crossrelease_hardirq_start(void) {}
+static inline void crossrelease_softirq_start(void) {}
+#endif
+
 /*
  * Reinitialize a lock key - for cases where there is special locking or
  * special initialization of locks so that the validator gets the scope
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a..6e45761 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1644,6 +1644,11 @@ struct task_struct {
 	struct held_lock held_locks[MAX_LOCK_DEPTH];
 	gfp_t lockdep_reclaim_gfp;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_PLOCKS_NR 1024UL
+	int plock_index;
+	struct pend_lock plocks[MAX_PLOCKS_NR];
+#endif
 #ifdef CONFIG_UBSAN
 	unsigned int in_ubsan;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 2e391c7..eb18c47 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1404,6 +1404,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	p->plock_index = 0;
+	memset(p->plocks, 0x0, sizeof(struct pend_lock) * MAX_PLOCKS_NR);
+#endif
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b03014b..fcbcb2e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -739,6 +739,20 @@ look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
 	return NULL;
 }
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static int cross_class(struct lock_class *class);
+static void init_map_noncrosslock(struct lockdep_map *lock);
+static void init_class_crosslock(struct lock_class *class, int cross);
+static int lock_acquire_crosslock(struct held_lock *hlock);
+static int lock_release_crosslock(struct lockdep_map *lock);
+#else
+static inline int cross_class(struct lock_class *class) { return 0; }
+static inline void init_map_noncrosslock(struct lockdep_map *lock) {}
+static inline void init_class_crosslock(struct lock_class *class, int cross) {}
+static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 0; }
+static inline int lock_release_crosslock(struct lockdep_map *lock) { return 0; }
+#endif
+
 /*
  * Register a lock's class in the hash-table, if the class is not present
  * yet. Otherwise we look it up. We cache the result in the lock object
@@ -807,6 +821,7 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
 	INIT_LIST_HEAD(&class->locks_before);
 	INIT_LIST_HEAD(&class->locks_after);
 	class->name_version = count_matching_names(class);
+	init_class_crosslock(class, lock->cross);
 	/*
 	 * We use RCU's safe list-add method to make
 	 * parallel walking of the hash-list safe:
@@ -1799,6 +1814,9 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
 		if (nest)
 			return 2;
 
+		if (cross_class(hlock_class(prev)))
+			continue;
+
 		return print_deadlock_bug(curr, prev, next);
 	}
 	return 1;
@@ -1964,21 +1982,27 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
 		int distance = curr->lockdep_depth - depth + 1;
 		hlock = curr->held_locks + depth - 1;
 		/*
-		 * Only non-recursive-read entries get new dependencies
-		 * added:
+		 * Only non-crosslock entries get new dependencies added.
+		 * Crosslock entries will be added when committing later:
 		 */
-		if (hlock->read != 2 && hlock->check) {
-			if (!check_prev_add(curr, hlock, next, distance,
-						&stack_saved, NULL))
-				return 0;
+		if (!cross_class(hlock_class(hlock))) {
 			/*
-			 * Stop after the first non-trylock entry,
-			 * as non-trylock entries have added their
-			 * own direct dependencies already, so this
-			 * lock is connected to them indirectly:
+			 * Only non-recursive-read entries get new dependencies
+			 * added:
 			 */
-			if (!hlock->trylock)
-				break;
+			if (hlock->read != 2 && hlock->check) {
+				if (!check_prev_add(curr, hlock, next, distance,
+							&stack_saved, NULL))
+					return 0;
+				/*
+				 * Stop after the first non-trylock entry,
+				 * as non-trylock entries have added their
+				 * own direct dependencies already, so this
+				 * lock is connected to them indirectly:
+				 */
+				if (!hlock->trylock)
+					break;
+			}
 		}
 		depth--;
 		/*
@@ -3064,7 +3088,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
+static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
 		      struct lock_class_key *key, int subclass)
 {
 	int i;
@@ -3122,8 +3146,27 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
+
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	init_map_noncrosslock(lock);
+	__lockdep_init_map(lock, name, key, subclass);
+}
 EXPORT_SYMBOL_GPL(lockdep_init_map);
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static void init_map_crosslock(struct lockdep_map *lock, struct cross_lock *xlock);
+void lockdep_init_map_crosslock(struct lockdep_map *lock,
+		      struct cross_lock *xlock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	init_map_crosslock(lock, xlock);
+	__lockdep_init_map(lock, name, key, subclass);
+}
+EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
+#endif
+
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
 
@@ -3227,7 +3270,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes + 1;
 
-	if (depth) {
+	/* TODO: nest_lock is not implemented for crosslock yet. */
+	if (depth && !cross_class(class)) {
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (hlock->references)
@@ -3308,6 +3352,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
 		return 0;
 
+	if (lock_acquire_crosslock(hlock))
+		return 1;
+
 	curr->curr_chain_key = chain_key;
 	curr->lockdep_depth++;
 	check_chain_key(curr);
@@ -3476,6 +3523,9 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	if (unlikely(!debug_locks))
 		return 0;
 
+	if (lock_release_crosslock(lock))
+		return 1;
+
 	depth = curr->lockdep_depth;
 	/*
 	 * So we're all set to release this lock.. wait what lock? We don't
@@ -4396,3 +4446,551 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	dump_stack();
 }
 EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+
+/*
+ * The crossrelease feature uses the term crosslock for a lock that
+ * can be released by a context different from the one that acquired
+ * it. For a crosslock, every lock held in the context releasing the
+ * crosslock, from the moment the crosslock was acquired until it is
+ * eventually released, has a dependency on the crosslock. That's the
+ * key idea behind the crossrelease feature.
+ *
+ * Crossrelease feature introduces 2 new data structures.
+ *
+ * 1. pend_lock (== plock)
+ *
+ *	This keeps locks that are waiting to be committed, so
+ *	that an actual dependency chain can be built when
+ *	committing a crosslock.
+ *
+ *	Every task_struct has an array of these pending locks to
+ *	keep such locks. Pending locks are added whenever
+ *	lock_acquire() is called for a normal (non-crosslock)
+ *	lock and are flushed (committed) at the proper time.
+ *
+ * 2. cross_lock (== xlock)
+ *
+ *	This keeps some additional data, only for crosslocks. There
+ *	is one cross_lock per lockdep_map used as a crosslock.
+ *	lockdep_init_map_crosslock() should be used instead of
+ *	lockdep_init_map() to use a lock as a crosslock.
+ *
+ * Acquiring and releasing sequence for crossrelease feature:
+ *
+ * 1. Acquire
+ *
+ *	All validation checks are performed for all locks.
+ *
+ *	1) For non-crosslock (normal lock)
+ *
+ *		The hlock is added not only to held_locks of
+ *		the current task_struct, but also to the
+ *		task_struct's pend_lock array, so that a
+ *		dependency chain can be built with the lock
+ *		at commit time.
+ *
+ *	2) For crosslock
+ *
+ *		The hlock is added only to the cross_lock of
+ *		the lock's lockdep_map, instead of held_locks,
+ *		so that a dependency chain can be built with
+ *		the lock at commit time. The lock is also
+ *		added to the xlocks_head list.
+ *
+ * 2. Commit (only for crosslock)
+ *
+ *	This establishes dependencies between the crosslock being
+ *	released and all locks that have been held in the releasing
+ *	context since the crosslock was acquired, while trying to
+ *	avoid building chains unnecessarily as far as possible.
+ *
+ * 3. Release
+ *
+ *	1) For non-crosslock (normal lock)
+ *
+ *		No change.
+ *
+ *	2) For crosslock
+ *
+ *		Just remove the lock from the xlocks_head list.
+ *		The release operation should be used together
+ *		with the commit operation for a crosslock, in
+ *		order to build a dependency chain properly.
+ */
+
+static LIST_HEAD(xlocks_head);
+
+/*
+ * Whenever a crosslock is held, cross_gen_id will be increased.
+ */
+static atomic_t cross_gen_id; /* Can be wrapped */
+
+/* Implement a circular buffer - for internal use */
+#define cir_p(n, i)		((i) ? (i) - 1 : (n) - 1)
+#define cir_n(n, i)		((i) == (n) - 1 ? 0 : (i) + 1)
+#define p_idx_p(i)		cir_p(MAX_PLOCKS_NR, i)
+#define p_idx_n(i)		cir_n(MAX_PLOCKS_NR, i)
+#define p_idx(t)		((t)->plock_index)
+
+/* For easy access to plock */
+#define plock(t, i)		((t)->plocks + (i))
+#define plock_prev(t, p)	plock(t, p_idx_p((p) - (t)->plocks))
+#define plock_curr(t)		plock(t, p_idx(t))
+#define plock_incr(t)		({p_idx(t) = p_idx_n(p_idx(t));})
+
+/*
+ * Crossrelease also needs to distinguish each hardirq context,
+ * not only identify its depth.
+ */
+static DEFINE_PER_CPU(unsigned int, hardirq_id);
+void crossrelease_hardirq_start(void)
+{
+	per_cpu(hardirq_id, smp_processor_id())++;
+}
+
+/*
+ * Crossrelease also needs to distinguish each softirq context,
+ * not only identify its depth.
+ */
+static DEFINE_PER_CPU(unsigned int, softirq_id);
+void crossrelease_softirq_start(void)
+{
+	per_cpu(softirq_id, smp_processor_id())++;
+}
+
+static int cross_class(struct lock_class *class)
+{
+	if (!class)
+		return 0;
+
+	return class->crosslock;
+}
+
+/*
+ * This is needed to decide the ordering relationship between
+ * variables that may wrap.
+ */
+static inline int before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
+
+static inline struct lock_class *plock_class(struct pend_lock *plock)
+{
+	return hlock_class(&plock->hlock);
+}
+
+static inline struct lock_class *xlock_class(struct cross_lock *xlock)
+{
+	return hlock_class(&xlock->hlock);
+}
+
+/*
+ * To find the earliest crosslock among all crosslocks not released yet
+ */
+static unsigned int gen_id_begin(void)
+{
+	struct cross_lock *xlock;
+	unsigned int gen_id;
+	unsigned int min = (unsigned int)atomic_read(&cross_gen_id) + 1;
+
+	list_for_each_entry_rcu(xlock, &xlocks_head, xlock_entry) {
+		gen_id = READ_ONCE(xlock->gen_id);
+		if (before(gen_id, min))
+			min = gen_id;
+	}
+
+	return min;
+}
+
+/*
+ * To find the latest crosslock among all crosslocks already released
+ */
+static inline unsigned int gen_id_done(void)
+{
+	return gen_id_begin() - 1;
+}
+
+/*
+ * Should we check a dependency with previous one?
+ */
+static inline int dep_before(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check && !hlock->trylock;
+}
+
+/*
+ * Should we check a dependency with next one?
+ */
+static inline int dep_after(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check;
+}
+
+/*
+ * Check if the plock has been used at least once since initialization.
+ * Remember that the pend_lock array is implemented as a ring buffer.
+ */
+static inline int plock_used(struct pend_lock *plock)
+{
+	/*
+	 * plock->hlock.class_idx must be !NULL and
+	 * plock->hlock.instance must be !NULL,
+	 * if it was used once.
+	 */
+	return plock->hlock.instance ? 1 : 0;
+}
+
+/*
+ * Get a pend_lock from the pend_lock ring buffer and provide it
+ * to the caller.
+ *
+ * No contention; only disabling irqs is required.
+ */
+static struct pend_lock *alloc_plock(unsigned int gen_id_done)
+{
+	struct task_struct *curr = current;
+	struct pend_lock *plock = plock_curr(curr);
+
+	if (plock_used(plock) && before(gen_id_done, plock->gen_id)) {
+		printk_once("crossrelease: plock pool is full.\n");
+		return NULL;
+	}
+
+	plock_incr(curr);
+	return plock;
+}
+
+/*
+ * No contention; only disabling irqs is required.
+ */
+static void add_plock(struct held_lock *hlock, unsigned int prev_gen_id,
+		unsigned int gen_id_done)
+{
+	struct task_struct *curr = current;
+	int cpu = smp_processor_id();
+	struct pend_lock *plock;
+	unsigned int gen_id = (unsigned int)atomic_read(&cross_gen_id);
+
+	plock = alloc_plock(gen_id_done);
+
+	if (plock) {
+		/* Initialize pend_lock's members here */
+		memcpy(&plock->hlock, hlock, sizeof(struct held_lock));
+		plock->prev_gen_id = prev_gen_id;
+		plock->gen_id = gen_id;
+		plock->hardirq_context = curr->hardirq_context;
+		plock->softirq_context = curr->softirq_context;
+		plock->hardirq_id = per_cpu(hardirq_id, cpu);
+		plock->softirq_id = per_cpu(softirq_id, cpu);
+
+		plock->trace.nr_entries = 0;
+		plock->trace.max_entries = MAX_PLOCK_TRACE_ENTRIES;
+		plock->trace.entries = plock->trace_entries;
+		plock->trace.skip = 3;
+		save_stack_trace(&plock->trace);
+	}
+}
+
+/*
+ * No contention; only disabling irqs is required.
+ */
+static int same_context_plock(struct pend_lock *plock)
+{
+	struct task_struct *curr = current;
+	int cpu = smp_processor_id();
+
+	/* In the case of hardirq context */
+	if (curr->hardirq_context) {
+		if (plock->hardirq_id != per_cpu(hardirq_id, cpu) ||
+		    plock->hardirq_context != curr->hardirq_context)
+			return 0;
+	/* In the case of softirq context */
+	} else if (curr->softirq_context) {
+		if (plock->softirq_id != per_cpu(softirq_id, cpu) ||
+		    plock->softirq_context != curr->softirq_context)
+			return 0;
+	/* In the case of process context */
+	} else {
+		if (plock->hardirq_context != 0 ||
+		    plock->softirq_context != 0)
+			return 0;
+	}
+	return 1;
+}
+
+/*
+ * We try to avoid adding a pend_lock unnecessarily, so that
+ * the overhead of adding pend_locks and of building dependency
+ * chains at commit time can be reduced.
+ */
+static int check_dup_plock(struct held_lock *hlock)
+{
+	struct task_struct *curr = current;
+	struct lock_class *class = hlock_class(hlock);
+	struct pend_lock *plock_c = plock_curr(curr);
+	struct pend_lock *plock = plock_c;
+	unsigned int gen_id = (unsigned int)atomic_read(&cross_gen_id);
+
+	do {
+		plock = plock_prev(curr, plock);
+
+		if (!plock_used(plock))
+			break;
+		if (plock->gen_id != gen_id)
+			break;
+		if (same_context_plock(plock) &&
+		    hlock_class(&plock->hlock) == class)
+			return 1;
+	} while (plock_c != plock);
+
+	return 0;
+}
+
+/*
+ * This will be called when lock_acquire() is called for
+ * a non-crosslock. It should be implemented locklessly as far
+ * as possible since it is called very frequently.
+ */
+static void check_add_plock(struct held_lock *hlock)
+{
+	struct held_lock *prev;
+	struct held_lock *start;
+	struct cross_lock *xlock;
+	struct lock_chain *chain;
+	unsigned int id;
+	unsigned int gen_id;
+	unsigned int gen_id_e;
+	u64 chain_key;
+
+	if (!dep_before(hlock) || check_dup_plock(hlock))
+		return;
+
+	gen_id = (unsigned int)atomic_read(&cross_gen_id);
+	gen_id_e = gen_id_done();
+	start = current->held_locks;
+
+	list_for_each_entry_rcu(xlock, &xlocks_head, xlock_entry) {
+		id = xlock_class(xlock) - lock_classes;
+		chain_key = iterate_chain_key((u64)0, id);
+		id = hlock_class(hlock) - lock_classes;
+		chain_key = iterate_chain_key(chain_key, id);
+		chain = lookup_chain_cache(chain_key);
+
+		if (!chain) {
+			for (prev = hlock - 1; prev >= start &&
+			     !dep_before(prev); prev--);
+
+			if (prev < start)
+				add_plock(hlock, gen_id_e, gen_id_e);
+			else if (prev->gen_id != gen_id)
+				add_plock(hlock, prev->gen_id, gen_id_e);
+
+			break;
+		}
+	}
+}
+
+/*
+ * This will be called when lock_acquire() is called for crosslock.
+ */
+static int add_xlock(struct held_lock *hlock)
+{
+	struct cross_lock *xlock;
+	unsigned int gen_id;
+
+	if (!dep_after(hlock))
+		return 1;
+
+	if (!graph_lock())
+		return 0;
+
+	xlock = hlock->instance->xlock;
+	if (!xlock)
+		goto unlock;
+
+	if (xlock->ref++)
+		goto unlock;
+
+	/*
+	 * We assign class_idx here redundantly, even though
+	 * memcpy() will copy it again soon, to ensure that an
+	 * RCU reader can access class_idx atomically without
+	 * holding a lock.
+	 *
+	 * Here we assume setting a word-sized variable to a value
+	 * is an atomic operation.
+	 */
+	xlock->hlock.class_idx = hlock->class_idx;
+	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
+	WRITE_ONCE(xlock->gen_id, gen_id);
+	memcpy(&xlock->hlock, hlock, sizeof(struct held_lock));
+	INIT_LIST_HEAD(&xlock->xlock_entry);
+	list_add_tail_rcu(&xlock->xlock_entry, &xlocks_head);
+unlock:
+	graph_unlock();
+	return 1;
+}
+
+/*
+ * return 0: Need to do non-crossrelease acquire ops.
+ * return 1: Done. No more acquire ops are needed.
+ */
+static int lock_acquire_crosslock(struct held_lock *hlock)
+{
+	unsigned int gen_id = (unsigned int)atomic_read(&cross_gen_id);
+
+	hlock->gen_id = gen_id;
+
+	if (cross_class(hlock_class(hlock)))
+		return add_xlock(hlock);
+
+	check_add_plock(hlock);
+	return 0;
+}
+
+static int commit_plock(struct cross_lock *xlock, struct pend_lock *plock)
+{
+	struct stack_trace trace;
+	unsigned int id;
+	u64 chain_key;
+
+	id = xlock_class(xlock) - lock_classes;
+	chain_key = iterate_chain_key((u64)0, id);
+	id = plock_class(plock) - lock_classes;
+	chain_key = iterate_chain_key(chain_key, id);
+
+	if (lookup_chain_cache(chain_key))
+		return 1;
+
+	if (!add_chain_cache_2hlocks(&xlock->hlock, &plock->hlock, chain_key))
+		return 0;
+
+	if (!save_trace(&trace, &plock->trace))
+		return 0;
+
+	if (!check_prev_add(current, &xlock->hlock, &plock->hlock, 1,
+			    NULL, &trace))
+		return 0;
+
+	return 1;
+}
+
+static int commit_plocks(struct cross_lock *xlock)
+{
+	struct task_struct *curr = current;
+	struct pend_lock *plock_c = plock_curr(curr);
+	struct pend_lock *plock = plock_c;
+
+	do {
+		plock = plock_prev(curr, plock);
+
+		if (!plock_used(plock))
+			break;
+
+		if (before(plock->gen_id, xlock->gen_id))
+			break;
+
+		if (same_context_plock(plock) &&
+		    before(plock->prev_gen_id, xlock->gen_id) &&
+		    plock_class(plock)->skip_gen_id != xlock->gen_id &&
+		    !commit_plock(xlock, plock))
+			return 0;
+	} while (plock_c != plock);
+
+	return 1;
+}
+
+/* Needs to be protected by the caller using graph_lock() */
+static void mark_skip_classes(struct lock_class *class, unsigned int id)
+{
+	struct lock_list *entry;
+
+	list_for_each_entry(entry, &class->locks_after, entry)
+		entry->class->skip_gen_id = id;
+}
+
+/*
+ * lock_commit_crosslock is a function for doing commit.
+ */
+void lock_commit_crosslock(struct lockdep_map *lock)
+{
+	struct cross_lock *xlock;
+	unsigned long flags;
+
+	if (unlikely(current->lockdep_recursion))
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+	current->lockdep_recursion = 1;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (!graph_lock())
+		return;
+
+	xlock = lock->xlock;
+	if (xlock && xlock->ref > 0) {
+		mark_skip_classes(xlock_class(xlock), xlock->gen_id);
+		if (!commit_plocks(xlock))
+			return;
+	}
+
+	graph_unlock();
+	current->lockdep_recursion = 0;
+	raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_commit_crosslock);
+
+/*
+ * return 0: Need to do non-crossrelease release ops.
+ * return 1: Done. No more release ops are needed.
+ */
+static int lock_release_crosslock(struct lockdep_map *lock)
+{
+	struct cross_lock *xlock;
+	int ret = lock->cross;
+
+	if (!graph_lock())
+		return 0;
+
+	xlock = lock->xlock;
+	if (xlock && !--xlock->ref)
+		list_del_rcu(&xlock->xlock_entry);
+
+	graph_unlock();
+	return ret;
+}
+
+static void init_map_noncrosslock(struct lockdep_map *lock)
+{
+	lock->cross = 0;
+	lock->xlock = NULL;
+}
+
+static void init_map_crosslock(struct lockdep_map *lock, struct cross_lock *xlock)
+{
+	unsigned long flags;
+
+	BUG_ON(!lock || !xlock);
+
+	raw_local_irq_save(flags);
+	if (graph_lock()) {
+		memset(xlock, 0x0, sizeof(struct cross_lock));
+		lock->cross = 1;
+		lock->xlock = xlock;
+		graph_unlock();
+	}
+	raw_local_irq_restore(flags);
+}
+
+static void init_class_crosslock(struct lock_class *class, int cross)
+{
+	class->crosslock = cross;
+	class->skip_gen_id = gen_id_done();
+}
+#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ca2d2ee..37e0136 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1003,6 +1003,19 @@ config BITLOCK_ALLOC
 	 bit-based lock e.g. bit_spin_lock. lockdep_map instance is
 	 necessary for lock correctness checking to be used.
 
+config LOCKDEP_CROSSRELEASE
+	bool "Lock debugging: allow other context to unlock a lock"
+	depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
+	select LOCKDEP
+	select TRACE_IRQFLAGS
+	default n
+	help
+	 This allows a context to unlock a lock held by another context.
+	 Normally a lock must be unlocked by the context holding it.
+	 However, relaxing this constraint allows primitives such as
+	 (un)lock_page() or wait_for_completion() to be checked by the
+	 lockdep lock correctness detector.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1


* [RFC 06/12] lockdep: Apply crossrelease to completion
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (4 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 05/12] lockdep: Implement crossrelease feature Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 07/12] pagemap.h: Remove trailing white space Byungchul Park
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

wait_for_completion() and its family can cause deadlocks.
Nevertheless, they could not use the lock correctness validator,
because complete() is called in a different context from the one
calling wait_for_completion(), which violates lockdep's original
assumption.

However, thanks to CONFIG_LOCKDEP_CROSSRELEASE, the lockdep detector
can now be applied to wait_for_completion() and complete(). Apply it.

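For illustration, a hedged sketch of the kind of deadlock this makes
reportable; the mutex and completion below are hypothetical, and the
comments are a simplification of the dependencies actually recorded:

static DEFINE_MUTEX(m);			/* hypothetical */
static DECLARE_COMPLETION(done);	/* hypothetical */

static void thread_a(void)
{
	mutex_lock(&m);
	wait_for_completion(&done);	/* roughly records m -> done */
	mutex_unlock(&m);
}

static void thread_b(void)
{
	mutex_lock(&m);			/* can block behind thread_a */
	complete(&done);		/* commit roughly records done -> m */
	mutex_unlock(&m);
}

On a run where both paths make progress, the m -> done dependency
recorded in thread_a and the done -> m dependency committed from
thread_b form a cycle that lockdep can now report, instead of the
system silently hanging when the two paths interleave badly.
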
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h | 121 +++++++++++++++++++++++++++++++++++++++++----
 kernel/locking/lockdep.c   |  18 +++++++
 kernel/sched/completion.c  |  55 ++++++++++++---------
 lib/Kconfig.debug          |   8 +++
 4 files changed, 169 insertions(+), 33 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..67a27af 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,53 @@
 struct completion {
 	unsigned int done;
 	wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+	struct lockdep_map map;
+	struct cross_lock xlock;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+	lock_acquire_exclusive(&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+	lock_release(&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+	lock_commit_crosslock(&x->map);
+}
+
+#define init_completion(x)				\
+do {							\
+	static struct lock_class_key __key;		\
+	lockdep_init_map_crosslock(&(x)->map,		\
+			&(x)->xlock,			\
+			"(complete)" #x,		\
+			&__key, 0);			\
+	__init_completion(x);				\
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+	STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work), \
+	&(work).xlock), STATIC_CROSS_LOCK_INIT()}
+#else
 #define COMPLETION_INITIALIZER(work) \
 	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
 	({ init_completion(&work); work; })
@@ -70,7 +116,7 @@ struct completion {
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
 	init_waitqueue_head(&x->wait);
@@ -88,18 +134,75 @@ static inline void reinit_completion(struct completion *x)
 	x->done = 0;
 }
 
-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
 						    unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
 	struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
+
+static inline void wait_for_completion(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion(x);
+	complete_release(x);
+}
+
+static inline void wait_for_completion_io(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion_io(x);
+	complete_release(x);
+}
+
+static inline int wait_for_completion_interruptible(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_interruptible(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline int wait_for_completion_killable(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_killable(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline unsigned long wait_for_completion_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_timeout(x, timeout);
+}
+
+static inline unsigned long wait_for_completion_io_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_io_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_interruptible_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_interruptible_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_killable_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_killable_timeout(x, timeout);
+}
+
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index fcbcb2e..b679116 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4694,6 +4694,12 @@ static void add_plock(struct held_lock *hlock, unsigned int prev_gen_id,
 	}
 }
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static int xlock_might_onstack = 1;
+#else
+static int xlock_might_onstack = 0;
+#endif
+
 /*
  * No contention. Irq disable is only required.
  */
@@ -4768,6 +4774,15 @@ static void check_add_plock(struct held_lock *hlock)
 	if (!dep_before(hlock) || check_dup_plock(hlock))
 		return;
 
+	/*
+	 * If an xlock instance is on the stack, it can be
+	 * overwritten randomly once the stack frame is
+	 * gone, so we cannot walk the RCU-protected list
+	 * without holding the graph lock.
+	 */
+	if (xlock_might_onstack && !graph_lock())
+		return;
+
 	gen_id = (unsigned int)atomic_read(&cross_gen_id);
 	gen_id_e = gen_id_done();
 	start = current->held_locks;
@@ -4791,6 +4806,9 @@ static void check_add_plock(struct held_lock *hlock)
 			break;
 		}
 	}
+
+	if (xlock_might_onstack)
+		graph_unlock();
 }
 
 /*
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 8d0f35d..b695699 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -31,6 +31,11 @@ void complete(struct completion *x)
 	unsigned long flags;
 
 	spin_lock_irqsave(&x->wait.lock, flags);
+	/*
+	 * Actual lock dependency building should be
+	 * performed when complete() is called.
+	 */
+	complete_release_commit(x);
 	x->done++;
 	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
 	spin_unlock_irqrestore(&x->wait.lock, flags);
@@ -108,7 +113,7 @@ wait_for_common_io(struct completion *x, long timeout, int state)
 }
 
 /**
- * wait_for_completion: - waits for completion of a task
+ * __wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
@@ -117,14 +122,14 @@ wait_for_common_io(struct completion *x, long timeout, int state)
  * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
  * and interrupt capability. Also see complete().
  */
-void __sched wait_for_completion(struct completion *x)
+void __sched __wait_for_completion(struct completion *x)
 {
 	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(__wait_for_completion);
 
 /**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -136,28 +141,28 @@ EXPORT_SYMBOL(wait_for_completion);
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(__wait_for_completion_timeout);
 
 /**
- * wait_for_completion_io: - waits for completion of a task
+ * __wait_for_completion_io: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
  * interruptible and there is no timeout. The caller is accounted as waiting
  * for IO (which traditionally means blkio only).
  */
-void __sched wait_for_completion_io(struct completion *x)
+void __sched __wait_for_completion_io(struct completion *x)
 {
 	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(__wait_for_completion_io);
 
 /**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -170,14 +175,14 @@ EXPORT_SYMBOL(wait_for_completion_io);
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(__wait_for_completion_io_timeout);
 
 /**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * __wait_for_completion_interruptible: - waits for completion of a task (w/intr)
  * @x:  holds the state of this particular completion
  *
  * This waits for completion of a specific task to be signaled. It is
@@ -185,17 +190,18 @@ EXPORT_SYMBOL(wait_for_completion_io_timeout);
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched __wait_for_completion_interruptible(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(__wait_for_completion_interruptible);
 
 /**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * __wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -206,15 +212,15 @@ EXPORT_SYMBOL(wait_for_completion_interruptible);
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+__wait_for_completion_interruptible_timeout(struct completion *x,
 					  unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(__wait_for_completion_interruptible_timeout);
 
 /**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * __wait_for_completion_killable: - waits for completion of a task (killable)
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It can be
@@ -222,17 +228,18 @@ EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched __wait_for_completion_killable(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(__wait_for_completion_killable);
 
 /**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * __wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -244,12 +251,12 @@ EXPORT_SYMBOL(wait_for_completion_killable);
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+__wait_for_completion_killable_timeout(struct completion *x,
 				     unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_KILLABLE);
 }
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(__wait_for_completion_killable_timeout);
 
 /**
  *	try_wait_for_completion - try to decrement a completion without blocking
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 37e0136..dd314d3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1016,6 +1016,14 @@ config LOCKDEP_CROSSRELEASE
 	 or wait_for_complete() can use lock correctness detector using
 	 lockdep.
 
+config LOCKDEP_COMPLETE
+	bool "Lock debugging: allow complete to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 A deadlock caused by wait and complete can be detected by lockdep
+	 using crossrelease feature.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 07/12] pagemap.h: Remove trailing white space
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (5 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 06/12] lockdep: Apply crossrelease to completion Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock Byungchul Park
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Trailing whitespace is not accepted by the kernel coding style. Remove
it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/pagemap.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 92395a0..c0049d9 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -513,7 +513,7 @@ static inline void wake_up_page(struct page *page, int bit)
 	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
 }
 
-/* 
+/*
  * Wait for a page to be unlocked.
  *
  * This must be called with the caller "holding" the page,
@@ -526,7 +526,7 @@ static inline void wait_on_page_locked(struct page *page)
 		wait_on_page_bit(compound_head(page), PG_locked);
 }
 
-/* 
+/*
  * Wait for a page to complete writeback
  */
 static inline void wait_on_page_writeback(struct page *page)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (6 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 07/12] pagemap.h: Remove trailing white space Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-30 13:04   ` Peter Zijlstra
  2016-06-20  4:55 ` [RFC 09/12] cifs/file.c: Remove trailing white space Byungchul Park
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

lock_page() and its family can cause deadlock. Nevertheless, it cannot
use the lock correctness validator, because unlock_page() can be called
in a different context from the context calling lock_page(), which
violates the original lockdep assumption.

However, thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can apply the lockdep
detector to lock_page() using PG_locked. Apply it.
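
For reference, a minimal sketch of the page-lock/mutex inversion this
annotation makes visible to lockdep; the mutex and function names are
made up for the example:

	static DEFINE_MUTEX(my_mutex);	/* hypothetical */

	/* Context X: mutex -> PG_locked */
	static void path_x(struct page *page)
	{
		mutex_lock(&my_mutex);
		lock_page(page);
		unlock_page(page);
		mutex_unlock(&my_mutex);
	}

	/* Context Y: PG_locked -> mutex, the inverse order => deadlock */
	static void path_y(struct page *page)
	{
		lock_page(page);
		mutex_lock(&my_mutex);
		mutex_unlock(&my_mutex);
		unlock_page(page);
	}

Since unlock_page() may also run in yet another context, the
crossrelease machinery is what allows lockdep to build the
PG_locked <-> mutex dependency here.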

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h |  9 +++++
 include/linux/pagemap.h  | 95 +++++++++++++++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug        |  9 +++++
 mm/filemap.c             |  4 +-
 mm/page_alloc.c          |  3 ++
 5 files changed, 112 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 624b78b..ab33ee3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -15,6 +15,10 @@
 #include <asm/page.h>
 #include <asm/mmu.h>
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
+
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
 #endif
@@ -215,6 +219,11 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map map;
+	struct cross_lock xlock;
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c0049d9..2fc4af1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,9 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
 
 /*
  * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
@@ -441,26 +444,81 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p)					\
+do {								\
+	static struct lock_class_key __key;			\
+	lockdep_init_map_crosslock(&(p)->map, &(p)->xlock,	\
+			"(PG_locked)" #p, &__key, 0);		\
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+	page = compound_head(page);
+	lock_acquire_exclusive(&page->map, 0, try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+	page = compound_head(page);
+	/*
+	 * For a cross-releasable lock, lock_commit_crosslock()
+	 * must be called before lock_release() when the lock
+	 * is released.
+	 */
+	lock_commit_crosslock(&page->map);
+	lock_release(&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);
 
-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+	lock_page_release(page);
+	do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
 {
 	page = compound_head(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
+static inline int trylock_page(struct page *page)
+{
+	if (do_raw_trylock_page(page)) {
+		lock_page_acquire(page, 1);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock_page may only be called if we have the page's inode pinned.
  */
 static inline void lock_page(struct page *page)
 {
 	might_sleep();
-	if (!trylock_page(page))
+
+	if (!do_raw_trylock_page(page))
 		__lock_page(page);
+	/*
+	 * The acquire annotation must come after the actual lock
+	 * operation for a crossrelease lock, because the release
+	 * operation looks the lock instance up, possibly from another
+	 * context, and two or more acquired instances would confuse it.
+	 */
+	lock_page_acquire(page, 0);
 }
 
 /*
@@ -470,9 +528,22 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
+	int ret;
+
 	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
+
+	if (!do_raw_trylock_page(page)) {
+		ret = __lock_page_killable(page);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * The acquire annotation must come after the actual lock
+	 * operation for a crossrelease lock, because the release
+	 * operation looks the lock instance up, possibly from another
+	 * context, and two or more acquired instances would confuse it.
+	 */
+	lock_page_acquire(page, 0);
 	return 0;
 }
 
@@ -487,7 +558,19 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
+		/*
+		 * The acquire annotation must come after the actual lock
+		 * operation for a crossrelease lock, because the release
+		 * operation looks the lock instance up, possibly from another
+		 * context, and two or more acquired instances would confuse it.
+		 */
+		lock_page_acquire(page, 0);
+		return 1;
+	}
+
+	return 0;
 }
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dd314d3..73942b5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1024,6 +1024,15 @@ config LOCKDEP_COMPLETE
 	 A deadlock caused by wait and complete can be detected by lockdep
 	 using crossrelease feature.
 
+config LOCKDEP_PAGELOCK
+	bool "Lock debugging: allow PG_locked lock to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 PG_locked lock is a kind of cross-released lock. This makes it
+	 possible for the PG_locked lock to use the deadlock detector,
+	 using the crossrelease feature.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
diff --git a/mm/filemap.c b/mm/filemap.c
index 3461d97..47fc5c0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -814,7 +814,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
  * The mb is necessary to enforce ordering between the clear_bit and the read
  * of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked()).
  */
-void unlock_page(struct page *page)
+void do_raw_unlock_page(struct page *page)
 {
 	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -822,7 +822,7 @@ void unlock_page(struct page *page)
 	smp_mb__after_atomic();
 	wake_up_page(page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(do_raw_unlock_page);
 
 /**
  * end_page_writeback - end writeback against a page
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb..17ed9a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4538,6 +4538,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+		lock_page_init(pfn_to_page(pfn));
+#endif
 	}
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 09/12] cifs/file.c: Remove trailing white space
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (7 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 10/12] mm/swap_state.c: " Byungchul Park
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Trailing whitespace is not accepted by the kernel coding style. Remove
it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 fs/cifs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index ff882ae..bcf9ead 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3851,7 +3851,7 @@ void cifs_oplock_break(struct work_struct *work)
  * In the non-cached mode (mount with cache=none), we shunt off direct read and write requests
  * so this method should never be called.
  *
- * Direct IO is not yet supported in the cached mode. 
+ * Direct IO is not yet supported in the cached mode.
  */
 static ssize_t
 cifs_direct_io(struct kiocb *iocb, struct iov_iter *iter, loff_t pos)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 10/12] mm/swap_state.c: Remove trailing white space
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (8 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 09/12] cifs/file.c: Remove trailing white space Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 11/12] lockdep: Call lock_acquire(release) when accessing PG_locked manually Byungchul Park
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Trailing whitespace is not accepted by the kernel coding style. Remove
it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 mm/swap_state.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 69cb246..3fb7013 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -156,7 +156,7 @@ void __delete_from_swap_cache(struct page *page)
  * @page: page we want to move to swap
  *
  * Allocate swap space for the page and add the page to the
- * swap cache.  Caller needs to hold the page lock. 
+ * swap cache.  Caller needs to hold the page lock.
  */
 int add_to_swap(struct page *page, struct list_head *list)
 {
@@ -229,9 +229,9 @@ void delete_from_swap_cache(struct page *page)
 	page_cache_release(page);
 }
 
-/* 
- * If we are the only user, then try to free up the swap cache. 
- * 
+/*
+ * If we are the only user, then try to free up the swap cache.
+ *
  * Its ok to check for PageSwapCache without the page lock
  * here because we are going to recheck again inside
  * try_to_free_swap() _with_ the lock.
@@ -245,7 +245,7 @@ static inline void free_swap_cache(struct page *page)
 	}
 }
 
-/* 
+/*
  * Perform a free_page(), also freeing any swap cache associated with
  * this page if it is the last user of the page.
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 11/12] lockdep: Call lock_acquire(release) when accessing PG_locked manually
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (9 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 10/12] mm/swap_state.c: " Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  4:55 ` [RFC 12/12] x86/dumpstack: Optimize save_stack_trace Byungchul Park
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

The PG_locked bit can also be updated through __SetPageLocked() and
__ClearPageLocked(), not only by lock_page() and unlock_page(). Those
call sites also have to be annotated so that acquiring and releasing
of the PG_locked lock stay balanced.
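
For illustration, the pairing applied at each such site looks like this
(taken from the add_to_page_cache() hunk below; the surrounding code is
elided):

	__SetPageLocked(page);
	lock_page_acquire(page, 1);	/* tell lockdep we "trylocked" it */

	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
	if (unlikely(error)) {
		__ClearPageLocked(page);
		lock_page_release(page);	/* keep lockdep balanced */
	}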

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 fs/cifs/file.c          | 4 ++++
 include/linux/pagemap.h | 5 ++++-
 mm/filemap.c            | 6 ++++--
 mm/ksm.c                | 1 +
 mm/migrate.c            | 1 +
 mm/shmem.c              | 2 ++
 mm/swap_state.c         | 2 ++
 mm/vmscan.c             | 1 +
 8 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index bcf9ead..7b250c1 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3392,12 +3392,14 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 	 * PG_locked without checking it first.
 	 */
 	__SetPageLocked(page);
+	lock_page_acquire(page, 1);
 	rc = add_to_page_cache_locked(page, mapping,
 				      page->index, gfp);
 
 	/* give up if we can't stick it in the cache */
 	if (rc) {
 		__ClearPageLocked(page);
+		lock_page_release(page);
 		return rc;
 	}
 
@@ -3419,8 +3421,10 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 			break;
 
 		__SetPageLocked(page);
+		lock_page_acquire(page, 1);
 		if (add_to_page_cache_locked(page, mapping, page->index, gfp)) {
 			__ClearPageLocked(page);
+			lock_page_release(page);
 			break;
 		}
 		list_move_tail(&page->lru, tmplist);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2fc4af1..f92972c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -760,9 +760,12 @@ static inline int add_to_page_cache(struct page *page,
 	int error;
 
 	__SetPageLocked(page);
+	lock_page_acquire(page, 1);
 	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
-	if (unlikely(error))
+	if (unlikely(error)) {
 		__ClearPageLocked(page);
+		lock_page_release(page);
+	}
 	return error;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 47fc5c0..7acce5e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -690,11 +690,13 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 	int ret;
 
 	__SetPageLocked(page);
+	lock_page_acquire(page, 1);
 	ret = __add_to_page_cache_locked(page, mapping, offset,
 					 gfp_mask, &shadow);
-	if (unlikely(ret))
+	if (unlikely(ret)) {
 		__ClearPageLocked(page);
-	else {
+		lock_page_release(page);
+	} else {
 		/*
 		 * The page might have been evicted from cache only
 		 * recently, in which case it should be activated like
diff --git a/mm/ksm.c b/mm/ksm.c
index ca6d2a0..c89debd 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1869,6 +1869,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 		SetPageDirty(new_page);
 		__SetPageUptodate(new_page);
 		__SetPageLocked(new_page);
+		lock_page_acquire(new_page, 1);
 	}
 
 	return new_page;
diff --git a/mm/migrate.c b/mm/migrate.c
index 3ad0fea..9aab7c4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1773,6 +1773,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 
 	/* Prepare a page as a migration target */
 	__SetPageLocked(new_page);
+	lock_page_acquire(new_page, 1);
 	SetPageSwapBacked(new_page);
 
 	/* anon mapping, we can simply copy page->mapping to the new page: */
diff --git a/mm/shmem.c b/mm/shmem.c
index 440e2a7..da35ca8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1090,6 +1090,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	flush_dcache_page(newpage);
 
 	__SetPageLocked(newpage);
+	lock_page_acquire(newpage, 1);
 	SetPageUptodate(newpage);
 	SetPageSwapBacked(newpage);
 	set_page_private(newpage, swap_index);
@@ -1283,6 +1284,7 @@ repeat:
 
 		__SetPageSwapBacked(page);
 		__SetPageLocked(page);
+		lock_page_acquire(page, 1);
 		if (sgp == SGP_WRITE)
 			__SetPageReferenced(page);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3fb7013..200edbf 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -358,6 +358,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 
 		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
 		__SetPageLocked(new_page);
+		lock_page_acquire(new_page, 1);
 		SetPageSwapBacked(new_page);
 		err = __add_to_swap_cache(new_page, entry);
 		if (likely(!err)) {
@@ -372,6 +373,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		radix_tree_preload_end();
 		ClearPageSwapBacked(new_page);
 		__ClearPageLocked(new_page);
+		lock_page_release(new_page);
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 71b1c29..5baff91 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1199,6 +1199,7 @@ lazyfree:
 		 * waiting on the page lock, because there are no references.
 		 */
 		__ClearPageLocked(page);
+		lock_page_release(page);
 free_it:
 		if (ret == SWAP_LZFREE)
 			count_vm_event(PGLAZYFREED);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (10 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 11/12] lockdep: Call lock_acquire(release) when accessing PG_locked manually Byungchul Park
@ 2016-06-20  4:55 ` Byungchul Park
  2016-06-20  7:29   ` xinhui
  2016-06-23 23:37 ` [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
  2016-07-01  4:15 ` [PATCH] lockdep: Add a document describing " Byungchul Park
  13 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-06-20  4:55 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

Currently, the x86 implementation of save_stack_trace() walks the whole
stack region word by word, regardless of what trace->max_entries is.
However, it is unnecessary to keep walking once the caller's requirement
is already fulfilled, that is, once trace->nr_entries >= trace->max_entries.

For example, the CONFIG_LOCKDEP_CROSSRELEASE implementation calls
save_stack_trace() with max_entries = 5 frequently. I measured its
overhead by printing the sched_clock() difference around each call on my
QEMU x86 machine.

The latency improved by over 70% when trace->max_entries = 5.
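
For context, a sketch of the kind of caller this helps; the values and
local names are illustrative only, but the struct stack_trace usage
matches what such callers do:

	unsigned long entries[5];
	struct stack_trace trace = {
		.nr_entries	= 0,
		.max_entries	= ARRAY_SIZE(entries),
		.entries	= entries,
		.skip		= 2,	/* skip the immediate callers */
	};

	save_stack_trace(&trace);

Without the early exit, the walk continues over the rest of the stack
even after entries[] is already full.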

Before this patch:

[    2.326940] save_stack_trace() takes 83931 ns
[    2.326389] save_stack_trace() takes 62576 ns
[    2.327575] save_stack_trace() takes 58826 ns
[    2.327000] save_stack_trace() takes 88980 ns
[    2.327424] save_stack_trace() takes 59831 ns
[    2.327575] save_stack_trace() takes 58482 ns
[    2.327597] save_stack_trace() takes 87114 ns
[    2.327931] save_stack_trace() takes 121140 ns
[    2.327434] save_stack_trace() takes 64321 ns
[    2.328632] save_stack_trace() takes 84997 ns
[    2.328000] save_stack_trace() takes 115037 ns
[    2.328460] save_stack_trace() takes 72292 ns
[    2.328632] save_stack_trace() takes 61236 ns
[    2.328567] save_stack_trace() takes 76666 ns
[    2.328867] save_stack_trace() takes 79525 ns
[    2.328460] save_stack_trace() takes 64902 ns
[    2.329585] save_stack_trace() takes 58760 ns
[    2.329000] save_stack_trace() takes 91349 ns
[    2.329414] save_stack_trace() takes 60069 ns
[    2.329585] save_stack_trace() takes 61012 ns
[    2.329573] save_stack_trace() takes 76820 ns
[    2.329863] save_stack_trace() takes 62131 ns
[    2.330000] save_stack_trace() takes 99476 ns
[    2.329846] save_stack_trace() takes 62419 ns
[    2.330000] save_stack_trace() takes 88918 ns
[    2.330253] save_stack_trace() takes 73669 ns
[    2.330520] save_stack_trace() takes 67876 ns
[    2.330671] save_stack_trace() takes 75963 ns
[    2.330983] save_stack_trace() takes 95079 ns
[    2.330451] save_stack_trace() takes 62352 ns

After this patch:

[    2.780735] save_stack_trace() takes 19902 ns
[    2.780718] save_stack_trace() takes 20240 ns
[    2.781692] save_stack_trace() takes 45215 ns
[    2.781477] save_stack_trace() takes 20191 ns
[    2.781694] save_stack_trace() takes 20044 ns
[    2.782589] save_stack_trace() takes 20292 ns
[    2.782706] save_stack_trace() takes 20024 ns
[    2.782706] save_stack_trace() takes 19881 ns
[    2.782881] save_stack_trace() takes 24577 ns
[    2.782706] save_stack_trace() takes 19901 ns
[    2.783621] save_stack_trace() takes 24381 ns
[    2.783621] save_stack_trace() takes 20205 ns
[    2.783760] save_stack_trace() takes 19956 ns
[    2.783718] save_stack_trace() takes 20280 ns
[    2.784179] save_stack_trace() takes 20099 ns
[    2.784835] save_stack_trace() takes 20055 ns
[    2.785922] save_stack_trace() takes 20157 ns
[    2.785922] save_stack_trace() takes 20140 ns
[    2.786178] save_stack_trace() takes 20040 ns
[    2.786877] save_stack_trace() takes 20102 ns
[    2.795000] save_stack_trace() takes 21147 ns
[    2.795397] save_stack_trace() takes 20230 ns
[    2.795397] save_stack_trace() takes 31274 ns
[    2.795739] save_stack_trace() takes 19706 ns
[    2.796484] save_stack_trace() takes 20266 ns
[    2.796484] save_stack_trace() takes 20902 ns
[    2.797000] save_stack_trace() takes 38110 ns
[    2.797510] save_stack_trace() takes 20224 ns
[    2.798181] save_stack_trace() takes 20172 ns
[    2.798837] save_stack_trace() takes 20824 ns

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 arch/x86/include/asm/stacktrace.h | 1 +
 arch/x86/kernel/dumpstack.c       | 2 ++
 arch/x86/kernel/dumpstack_32.c    | 2 ++
 arch/x86/kernel/stacktrace.c      | 7 +++++++
 4 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 70bbe39..fc572e7 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -41,6 +41,7 @@ struct stacktrace_ops {
 	/* On negative return stop dumping */
 	int (*stack)(void *data, char *name);
 	walk_stack_t	walk_stack;
+	int (*end_walk)(void *data);
 };
 
 void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 9c30acf..355fe8f 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -115,6 +115,8 @@ print_context_stack(struct thread_info *tinfo,
 			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
 		}
 		stack++;
+		if (ops->end_walk && ops->end_walk(data))
+			break;
 	}
 	return bp;
 }
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 464ffd6..cc51419 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -71,6 +71,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		context = task_thread_info(task);
 		bp = ops->walk_stack(context, stack, bp, ops, data,
 				     end_stack, &graph);
+		if (ops->end_walk && ops->end_walk(data))
+			break;
 
 		/* Stop if not on irq stack */
 		if (!end_stack)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index fdd0c64..9545719 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -43,10 +43,17 @@ save_stack_address_nosched(void *data, unsigned long addr, int reliable)
 	return __save_stack_address(data, addr, reliable, true);
 }
 
+static int save_stack_end(void *data)
+{
+	struct stack_trace *trace = data;
+	return trace->nr_entries >= trace->max_entries;
+}
+
 static const struct stacktrace_ops save_stack_ops = {
 	.stack		= save_stack_stack,
 	.address	= save_stack_address,
 	.walk_stack	= print_context_stack,
+	.end_walk	= save_stack_end,
 };
 
 static const struct stacktrace_ops save_stack_ops_nosched = {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-20  4:55 ` [RFC 12/12] x86/dumpstack: Optimize save_stack_trace Byungchul Park
@ 2016-06-20  7:29   ` xinhui
  2016-06-20  7:50     ` byungchul.park
  0 siblings, 1 reply; 38+ messages in thread
From: xinhui @ 2016-06-20  7:29 UTC (permalink / raw)
  To: Byungchul Park, peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx


On 2016年06月20日 12:55, Byungchul Park wrote:
> Currently, x86 implementation of save_stack_trace() is walking all stack
> region word by word regardless of what the trace->max_entries is.
> However, it's unnecessary to walk after already fulfilling caller's
> requirement, say, if trace->nr_entries >= trace->max_entries is true.
>
> For example, CONFIG_LOCKDEP_CROSSRELEASE implementation calls
> save_stack_trace() with max_entries = 5 frequently. I measured its
> overhead and printed its difference of sched_clock() with my QEMU x86
> machine.
>
> The latency was improved over 70% when trace->max_entries = 5.
>
[snip]

> +static int save_stack_end(void *data)
> +{
> +	struct stack_trace *trace = data;
> +	return trace->nr_entries >= trace->max_entries;
> +}
> +
>   static const struct stacktrace_ops save_stack_ops = {
>   	.stack		= save_stack_stack,
>   	.address	= save_stack_address,
then why not check the return value of ->address()? -1 indicates there is no room to store any pointer.

>   	.walk_stack	= print_context_stack,
> +	.end_walk	= save_stack_end,
>   };
>
>   static const struct stacktrace_ops save_stack_ops_nosched = {
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-20  7:29   ` xinhui
@ 2016-06-20  7:50     ` byungchul.park
  2016-06-29 12:43       ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: byungchul.park @ 2016-06-20  7:50 UTC (permalink / raw)
  To: 'xinhui', peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

> -----Original Message-----
> From: xinhui [mailto:xinhui.pan@linux.vnet.ibm.com]
> Sent: Monday, June 20, 2016 4:29 PM
> To: Byungchul Park; peterz@infradead.org; mingo@kernel.org
> Cc: linux-kernel@vger.kernel.org; npiggin@suse.de; walken@google.com;
> ak@suse.de; tglx@inhelltoy.tec.linutronix.de
> Subject: Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
> 
> 
> On 2016年06月20日 12:55, Byungchul Park wrote:
> > Currently, x86 implementation of save_stack_trace() is walking all stack
> > region word by word regardless of what the trace->max_entries is.
> > However, it's unnecessary to walk after already fulfilling caller's
> > requirement, say, if trace->nr_entries >= trace->max_entries is true.
> >
> > For example, CONFIG_LOCKDEP_CROSSRELEASE implementation calls
> > save_stack_trace() with max_entries = 5 frequently. I measured its
> > overhead and printed its difference of sched_clock() with my QEMU x86
> > machine.
> >
> > The latency was improved over 70% when trace->max_entries = 5.
> >
> [snip]
> 
> > +static int save_stack_end(void *data)
> > +{
> > +	struct stack_trace *trace = data;
> > +	return trace->nr_entries >= trace->max_entries;
> > +}
> > +
> >   static const struct stacktrace_ops save_stack_ops = {
> >   	.stack		= save_stack_stack,
> >   	.address	= save_stack_address,
> then why not check the return value of ->address(), -1 indicate there is
> no room to store any pointer.

Hello,

Indeed. That also looks good to me, even though it has to propagate the
condition between callback functions. I will change it that way if it's
better.
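
To make sure I understand, a rough sketch of that alternative on top of
the existing print_context_stack() walk; this is only a sketch and the
reliable-frame handling is elided here:

	if (__kernel_text_address(addr)) {
		/*
		 * Bail out as soon as the ->address() callback reports
		 * that the trace buffer is full (negative return),
		 * instead of adding a separate ->end_walk() hook.
		 */
		if (ops->address(data, addr, 0) < 0)
			break;
		print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
	}
	stack++;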

Thank you.
Byungchul

> 
> >   	.walk_stack	= print_context_stack,
> > +	.end_walk	= save_stack_end,
> >   };
> >
> >   static const struct stacktrace_ops save_stack_ops_nosched = {
> >

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 00/12] lockdep: Implement crossrelease feature
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (11 preceding siblings ...)
  2016-06-20  4:55 ` [RFC 12/12] x86/dumpstack: Optimize save_stack_trace Byungchul Park
@ 2016-06-23 23:37 ` Byungchul Park
  2016-06-24  7:08   ` Peter Zijlstra
  2016-06-24 11:26   ` Nikolay Borisov
  2016-07-01  4:15 ` [PATCH] lockdep: Add a document describing " Byungchul Park
  13 siblings, 2 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-23 23:37 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx

On Mon, Jun 20, 2016 at 01:55:15PM +0900, Byungchul Park wrote:

Hello,

I have a plan to resend this patchset after reinforcing the
documentation. However, I am wondering what you think about the main
concept of this. A main motivation is to be able to detect several
kinds of problems, which I describe with the examples below.

ex.1)

PROCESS X		PROCESS Y
---------		---------
mutex_lock A
			lock_page B
lock_page B
			mutex_lock A // DEADLOCK
unlock_page B
			mutex_unlock A
mutex_unlock A
			unlock_page B

ex.2)

PROCESS X		PROCESS Y		PROCESS Z
---------		---------		---------
lock_page B		mutex_lock A

			lock_page B
						mutex_lock A // DEADLOCK
						mutex_unlock A
						unlock_page B
			mutex_unlock A

ex.3)

PROCESS X		PROCESS Y
---------		---------
			mutex_lock A
mutex_lock A
mutex_unlock A		wait_for_completion B // DEADLOCK

complete B
			mutex_unlock A

and so on...

Whatever lockdep can detect can also be detected by my implementation,
except an AA deadlock within a single context on a lock releasable by a
different context, which is of course not a deadlock by nature.
Fortunately, the current kernel code is robust enough that nothing has
been detected on my machine, and I am sure this can be a good navigator
for developers.
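
For anyone who wants to try this on their own cross-released primitive,
the annotation pattern is the same one the completion patch uses; a
rough sketch, where struct my_obj and the my_* names are made up for
the example:

	struct my_obj {
		struct lockdep_map	map;
		struct cross_lock	xlock;
		/* ... the real synchronization state ... */
	};

	#define my_obj_init(o)						\
	do {								\
		static struct lock_class_key __key;			\
		lockdep_init_map_crosslock(&(o)->map, &(o)->xlock,	\
					   "(my_obj)" #o, &__key, 0);	\
	} while (0)

	/* waiting side */
	static inline void my_obj_wait(struct my_obj *o)
	{
		lock_acquire_exclusive(&o->map, 0, 0, NULL, _RET_IP_);
		/* ... block until the other side signals ... */
		lock_release(&o->map, 0, _RET_IP_);
	}

	/* releasing side, possibly a different context */
	static inline void my_obj_signal(struct my_obj *o)
	{
		lock_commit_crosslock(&o->map);	/* build the dependency chain */
		/* ... do the actual wakeup ... */
	}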

Thank you.
Byungchul

> Crossrelease feature calls a lock which is releasable by a
> different context from the context having acquired the lock,
> crosslock. For crosslock, all locks having been held in the
> context unlocking the crosslock, until eventually the crosslock
> will be unlocked, have dependency with the crosslock. That's a
> key idea to implement crossrelease feature.
> 
> Crossrelease feature introduces 2 new data structures.
> 
> 1. pend_lock (== plock)
> 
> 	This is for keeping locks waiting to commit those so
> 	that an actual dependency chain is built, when commiting
> 	a crosslock.
> 
> 	Every task_struct has an array of this pending lock to
> 	keep those locks. These pending locks will be added
> 	whenever lock_acquire() is called for normal(non-crosslock)
> 	lock and will be flushed(committed) at proper time.
> 
> 2. cross_lock (== xlock)
> 
> 	This keeps some additional data only for crosslock. There
> 	is one cross_lock per one lockdep_map for crosslock.
> 	lockdep_init_map_crosslock() should be used instead of
> 	lockdep_init_map() to use the lock as a crosslock.
> 
> Acquiring and releasing sequence for crossrelease feature:
> 
> 1. Acquire
> 
> 	All validation check is performed for all locks.
> 
> 	1) For non-crosslock (normal lock)
> 
> 		The hlock will be added not only to held_locks
> 		of the current's task_struct, but also to
> 		pend_lock array of the task_struct, so that
> 		a dependency chain can be built with the lock
> 		when doing commit.
> 
> 	2) For crosslock
> 
> 		The hlock will be added only to the cross_lock
> 		of the lock's lockdep_map instead of held_locks,
> 		so that a dependency chain can be built with
> 		the lock when doing commit. And this lock is
> 		added to the xlocks_head list.
> 
> 2. Commit (only for crosslock)
> 
> 	This establishes a dependency chain between the lock
> 	unlocking it now and all locks having held in the context
> 	unlocking it since the lock was held, even though it tries
> 	to avoid building a chain unnecessarily as far as possible.
> 
> 3. Release
> 
> 	1) For non-crosslock (normal lock)
> 
> 		No change.
> 
> 	2) For crosslock
> 
> 		Just Remove the lock from xlocks_head list. Release
> 		operation should be used with commit operation
> 		together for crosslock, in order to build a
> 		dependency chain properly.
> 
> Byungchul Park (12):
>   lockdep: Refactor lookup_chain_cache()
>   lockdep: Add a function building a chain between two hlocks
>   lockdep: Make check_prev_add can use a stack_trace of other context
>   lockdep: Make save_trace can copy from other stack_trace
>   lockdep: Implement crossrelease feature
>   lockdep: Apply crossrelease to completion
>   pagemap.h: Remove trailing white space
>   lockdep: Apply crossrelease to PG_locked lock
>   cifs/file.c: Remove trailing white space
>   mm/swap_state.c: Remove trailing white space
>   lockdep: Call lock_acquire(release) when accessing PG_locked manually
>   x86/dumpstack: Optimize save_stack_trace
> 
>  arch/x86/include/asm/stacktrace.h |   1 +
>  arch/x86/kernel/dumpstack.c       |   2 +
>  arch/x86/kernel/dumpstack_32.c    |   2 +
>  arch/x86/kernel/stacktrace.c      |   7 +
>  fs/cifs/file.c                    |   6 +-
>  include/linux/completion.h        | 121 +++++-
>  include/linux/irqflags.h          |  16 +-
>  include/linux/lockdep.h           | 139 +++++++
>  include/linux/mm_types.h          |   9 +
>  include/linux/pagemap.h           | 104 ++++-
>  include/linux/sched.h             |   5 +
>  kernel/fork.c                     |   4 +
>  kernel/locking/lockdep.c          | 846 +++++++++++++++++++++++++++++++++++---
>  kernel/sched/completion.c         |  55 +--
>  lib/Kconfig.debug                 |  30 ++
>  mm/filemap.c                      |  10 +-
>  mm/ksm.c                          |   1 +
>  mm/migrate.c                      |   1 +
>  mm/page_alloc.c                   |   3 +
>  mm/shmem.c                        |   2 +
>  mm/swap_state.c                   |  12 +-
>  mm/vmscan.c                       |   1 +
>  22 files changed, 1255 insertions(+), 122 deletions(-)
> 
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 00/12] lockdep: Implement crossrelease feature
  2016-06-23 23:37 ` [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
@ 2016-06-24  7:08   ` Peter Zijlstra
  2016-06-24 11:13     ` Byungchul Park
  2016-06-24 11:26   ` Nikolay Borisov
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2016-06-24  7:08 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Fri, Jun 24, 2016 at 08:37:13AM +0900, Byungchul Park wrote:
> On Mon, Jun 20, 2016 at 01:55:15PM +0900, Byungchul Park wrote:
> 
> Hello,
> 
> I have a plan to resend this patchset after reinforcement of
> documentation. However I am wondering what you think about the
> main concept of this. 

I have not had time to look at this at all..

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 00/12] lockdep: Implement crossrelease feature
  2016-06-24  7:08   ` Peter Zijlstra
@ 2016-06-24 11:13     ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-24 11:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Fri, Jun 24, 2016 at 09:08:44AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 24, 2016 at 08:37:13AM +0900, Byungchul Park wrote:
> > On Mon, Jun 20, 2016 at 01:55:15PM +0900, Byungchul Park wrote:
> > 
> > Hello,
> > 
> > I have a plan to resend this patchset after reinforcement of
> > documentation. However I am wondering what you think about the
> > main concept of this. 
> 
> I have not had time to look at this at all..

I can wait until you become available.

Thank you for letting me know that.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 00/12] lockdep: Implement crossrelease feature
  2016-06-23 23:37 ` [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
  2016-06-24  7:08   ` Peter Zijlstra
@ 2016-06-24 11:26   ` Nikolay Borisov
  2016-06-27  1:34     ` Byungchul Park
  1 sibling, 1 reply; 38+ messages in thread
From: Nikolay Borisov @ 2016-06-24 11:26 UTC (permalink / raw)
  To: Byungchul Park, peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, tglx



On 06/24/2016 02:37 AM, Byungchul Park wrote:
> On Mon, Jun 20, 2016 at 01:55:15PM +0900, Byungchul Park wrote:
> 
> Hello,
> 
> I have a plan to resend this patchset after reinforcement of
> documentation. However I am wondering what you think about the
> main concept of this. A main motivation is to be able to detect
> several problems which I describes with examples below.
> 
> ex.1)
> 
> PROCESS X		PROCESS Y
> ---------		---------
> mutext_lock A
> 			lock_page B
> lock_page B
> 			mutext_lock A // DEADLOCK
> unlock_page B
> 			mutext_unlock A
> mutex_unlock A
> 			unlock_page B
> 
> ex.2)
> 
> PROCESS X		PROCESS Y		PROCESS Z
> ---------		---------		---------
> lock_page B		mutex_lock A
> 
> 			lock_page B
> 						mutext_lock A // DEADLOCK
> 						mutext_unlock A
> 						unlock_page B
> 			mutex_unlock A

Am I correct in assuming that in ex.2 PROCESS Z holds the page B lock?
If so, can you make that a bit more explicit, if this is going to go
into the documentation?

> 
> ex.3)
> 
> PROCESS X		PROCESS Y
> ---------		---------
> 			mutex_lock A
> mutex_lock A
> mutex_unlock A		wait_for_complete B // DEADLOCK
> 
> complete B
> 			mutex_unlock A
> 
> and so on...
> 
> Whatever lockdep can detect can be detected by my implementation
> except AA deadlock in a context, which is of course not a deadlock
> by nature, for locks releasable by difference context. Fortunately,
> current kernel code is robust enough not to be detected on my machine,
> I am sure this can be a good navigator to developers.
> 
> Thank you.
> Byungchul
> 
>> Crossrelease feature calls a lock which is releasable by a
>> different context from the context having acquired the lock,
>> crosslock. For crosslock, all locks having been held in the
>> context unlocking the crosslock, until eventually the crosslock
>> will be unlocked, have dependency with the crosslock. That's a
>> key idea to implement crossrelease feature.
>>
>> Crossrelease feature introduces 2 new data structures.
>>
>> 1. pend_lock (== plock)
>>
>> 	This is for keeping locks waiting to commit those so
>> 	that an actual dependency chain is built, when commiting
>> 	a crosslock.
>>
>> 	Every task_struct has an array of this pending lock to
>> 	keep those locks. These pending locks will be added
>> 	whenever lock_acquire() is called for normal(non-crosslock)
>> 	lock and will be flushed(committed) at proper time.
>>
>> 2. cross_lock (== xlock)
>>
>> 	This keeps some additional data only for crosslock. There
>> 	is one cross_lock per one lockdep_map for crosslock.
>> 	lockdep_init_map_crosslock() should be used instead of
>> 	lockdep_init_map() to use the lock as a crosslock.
>>
>> Acquiring and releasing sequence for crossrelease feature:
>>
>> 1. Acquire
>>
>> 	All validation check is performed for all locks.
>>
>> 	1) For non-crosslock (normal lock)
>>
>> 		The hlock will be added not only to held_locks
>> 		of the current's task_struct, but also to
>> 		pend_lock array of the task_struct, so that
>> 		a dependency chain can be built with the lock
>> 		when doing commit.
>>
>> 	2) For crosslock
>>
>> 		The hlock will be added only to the cross_lock
>> 		of the lock's lockdep_map instead of held_locks,
>> 		so that a dependency chain can be built with
>> 		the lock when doing commit. And this lock is
>> 		added to the xlocks_head list.
>>
>> 2. Commit (only for crosslock)
>>
>> 	This establishes a dependency chain between the lock
>> 	unlocking it now and all locks having held in the context
>> 	unlocking it since the lock was held, even though it tries
>> 	to avoid building a chain unnecessarily as far as possible.
>>
>> 3. Release
>>
>> 	1) For non-crosslock (normal lock)
>>
>> 		No change.
>>
>> 	2) For crosslock
>>
>> 		Just Remove the lock from xlocks_head list. Release
>> 		operation should be used with commit operation
>> 		together for crosslock, in order to build a
>> 		dependency chain properly.
>>
>> Byungchul Park (12):
>>   lockdep: Refactor lookup_chain_cache()
>>   lockdep: Add a function building a chain between two hlocks
>>   lockdep: Make check_prev_add can use a stack_trace of other context
>>   lockdep: Make save_trace can copy from other stack_trace
>>   lockdep: Implement crossrelease feature
>>   lockdep: Apply crossrelease to completion
>>   pagemap.h: Remove trailing white space
>>   lockdep: Apply crossrelease to PG_locked lock
>>   cifs/file.c: Remove trailing white space
>>   mm/swap_state.c: Remove trailing white space
>>   lockdep: Call lock_acquire(release) when accessing PG_locked manually
>>   x86/dumpstack: Optimize save_stack_trace
>>
>>  arch/x86/include/asm/stacktrace.h |   1 +
>>  arch/x86/kernel/dumpstack.c       |   2 +
>>  arch/x86/kernel/dumpstack_32.c    |   2 +
>>  arch/x86/kernel/stacktrace.c      |   7 +
>>  fs/cifs/file.c                    |   6 +-
>>  include/linux/completion.h        | 121 +++++-
>>  include/linux/irqflags.h          |  16 +-
>>  include/linux/lockdep.h           | 139 +++++++
>>  include/linux/mm_types.h          |   9 +
>>  include/linux/pagemap.h           | 104 ++++-
>>  include/linux/sched.h             |   5 +
>>  kernel/fork.c                     |   4 +
>>  kernel/locking/lockdep.c          | 846 +++++++++++++++++++++++++++++++++++---
>>  kernel/sched/completion.c         |  55 +--
>>  lib/Kconfig.debug                 |  30 ++
>>  mm/filemap.c                      |  10 +-
>>  mm/ksm.c                          |   1 +
>>  mm/migrate.c                      |   1 +
>>  mm/page_alloc.c                   |   3 +
>>  mm/shmem.c                        |   2 +
>>  mm/swap_state.c                   |  12 +-
>>  mm/vmscan.c                       |   1 +
>>  22 files changed, 1255 insertions(+), 122 deletions(-)
>>
>> -- 
>> 1.9.1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 00/12] lockdep: Implement crossrelease feature
  2016-06-24 11:26   ` Nikolay Borisov
@ 2016-06-27  1:34     ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-27  1:34 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: peterz, mingo, linux-kernel, npiggin, walken, ak, tglx

On Fri, Jun 24, 2016 at 02:26:45PM +0300, Nikolay Borisov wrote:
> 
> 
> On 06/24/2016 02:37 AM, Byungchul Park wrote:
> > On Mon, Jun 20, 2016 at 01:55:15PM +0900, Byungchul Park wrote:
> > 
> > Hello,
> > 
> > I have a plan to resend this patchset after reinforcement of
> > documentation. However I am wondering what you think about the
> > main concept of this. A main motivation is to be able to detect
> > several problems which I describes with examples below.
> > 
> > ex.1)
> > 
> > PROCESS X		PROCESS Y
> > ---------		---------
> > mutext_lock A
> > 			lock_page B
> > lock_page B
> > 			mutext_lock A // DEADLOCK
> > unlock_page B
> > 			mutext_unlock A
> > mutex_unlock A
> > 			unlock_page B
> > 
> > ex.2)
> > 
> > PROCESS X		PROCESS Y		PROCESS Z
> > ---------		---------		---------
> > lock_page B		mutex_lock A
> > 
> > 			lock_page B
> > 						mutext_lock A // DEADLOCK
> > 						mutext_unlock A
> > 						unlock_page B
> > 			mutex_unlock A
> 
> Am I correct in assuming that in ex2 PROCESS Z holds page B lock? If so

In this example, PROCESS Z does not hold the page B lock. The page B
lock that PROCESS Z unlocks was acquired by PROCESS X.

> can you make it a bit more explicit if this is going to go into the
> documentation, that is.
> 
> > 
> > ex.3)
> > 
> > PROCESS X		PROCESS Y
> > ---------		---------
> > 			mutex_lock A
> > mutex_lock A
> > mutex_unlock A		wait_for_complete B // DEADLOCK
> > 
> > complete B
> > 			mutex_unlock A
> > 
> > and so on...
> > 
> > Whatever lockdep can detect can be detected by my implementation
> > except AA deadlock in a context, which is of course not a deadlock
> > by nature, for locks releasable by difference context. Fortunately,
> > current kernel code is robust enough not to be detected on my machine,
> > I am sure this can be a good navigator to developers.
> > 
> > Thank you.
> > Byungchul
> > 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-20  7:50     ` byungchul.park
@ 2016-06-29 12:43       ` Byungchul Park
  2016-06-30 10:38         ` xinhui
  0 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-06-29 12:43 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak, xinhui.pan

On Mon, Jun 20, 2016 at 04:50:37PM +0900, byungchul.park wrote:
> > -----Original Message-----
> > From: xinhui [mailto:xinhui.pan@linux.vnet.ibm.com]
> > Sent: Monday, June 20, 2016 4:29 PM
> > To: Byungchul Park; peterz@infradead.org; mingo@kernel.org
> > Cc: linux-kernel@vger.kernel.org; npiggin@suse.de; walken@google.com;
> > ak@suse.de; tglx@inhelltoy.tec.linutronix.de
> > Subject: Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
> > 
> > 
> > On 2016年06月20日 12:55, Byungchul Park wrote:
> > > Currently, x86 implementation of save_stack_trace() is walking all stack
> > > region word by word regardless of what the trace->max_entries is.
> > > However, it's unnecessary to walk after already fulfilling caller's
> > > requirement, say, if trace->nr_entries >= trace->max_entries is true.
> > >
> > > For example, CONFIG_LOCKDEP_CROSSRELEASE implementation calls
> > > save_stack_trace() with max_entries = 5 frequently. I measured its
> > > overhead and printed its difference of sched_clock() with my QEMU x86
> > > machine.
> > >
> > > The latency was improved over 70% when trace->max_entries = 5.
> > >
> > [snip]
> > 
> > > +static int save_stack_end(void *data)
> > > +{
> > > +	struct stack_trace *trace = data;
> > > +	return trace->nr_entries >= trace->max_entries;
> > > +}
> > > +
> > >   static const struct stacktrace_ops save_stack_ops = {
> > >   	.stack		= save_stack_stack,
> > >   	.address	= save_stack_address,
> > then why not check the return value of ->address()? -1 indicates there
> > is no room to store any pointer.
> 
> Hello,
> 
> Indeed. It also looks good to me even though it has to propagate the condition
> between callback functions. I will modify it if it's better.

Do you also think it would be better to make it propagate the result of
->address() rather than add a new callback, say, end_walk?
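
To make the two options concrete, below is a small user-space model of
the two termination styles (every name in it is invented purely for
illustration; nothing is taken from the patch). In practice only one of
the two checks in the walk loop would be kept:

/*
 * Toy model: a bounded trace buffer filled by a generic walker that can
 * stop early either via an end_walk() predicate or via the visitor's
 * return value.
 */
#include <stdio.h>

struct trace {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
};

struct walk_ops {
	int (*address)(void *data, unsigned long addr);	/* < 0: no room left */
	int (*end_walk)(void *data);			/* non-zero: stop walking */
};

static int save_address(void *data, unsigned long addr)
{
	struct trace *t = data;

	if (t->nr_entries < t->max_entries) {
		t->entries[t->nr_entries++] = addr;
		return 0;
	}
	return -1;
}

static int save_end(void *data)
{
	struct trace *t = data;

	return t->nr_entries >= t->max_entries;
}

static void walk(unsigned long *stack, int n,
		 const struct walk_ops *ops, void *data)
{
	for (int i = 0; i < n; i++) {
		/* variant A: a separate end_walk() predicate, as in the patch */
		if (ops->end_walk && ops->end_walk(data))
			break;
		/* variant B: honour the visitor's return value, as suggested */
		if (ops->address(data, stack[i]) < 0)
			break;
	}
}

int main(void)
{
	unsigned long stack[16] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	unsigned long buf[5];
	struct trace t = { 0, 5, buf };
	const struct walk_ops ops = { save_address, save_end };

	walk(stack, 16, &ops, &t);
	printf("captured %u entries (max %u)\n", t.nr_entries, t.max_entries);
	return 0;
}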

> 
> Thank you.
> Byungchul
> 
> > 
> > >   	.walk_stack	= print_context_stack,
> > > +	.end_walk	= save_stack_end,
> > >   };
> > >
> > >   static const struct stacktrace_ops save_stack_ops_nosched = {
> > >

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-29 12:43       ` Byungchul Park
@ 2016-06-30 10:38         ` xinhui
  2016-06-30 23:06           ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: xinhui @ 2016-06-30 10:38 UTC (permalink / raw)
  To: Byungchul Park, peterz, mingo; +Cc: linux-kernel, npiggin, walken, ak



On 2016年06月29日 20:43, Byungchul Park wrote:
> On Mon, Jun 20, 2016 at 04:50:37PM +0900, byungchul.park wrote:
>>> -----Original Message-----
>>> From: xinhui [mailto:xinhui.pan@linux.vnet.ibm.com]
>>> Sent: Monday, June 20, 2016 4:29 PM
>>> To: Byungchul Park; peterz@infradead.org; mingo@kernel.org
>>> Cc: linux-kernel@vger.kernel.org; npiggin@suse.de; walken@google.com;
>>> ak@suse.de; tglx@inhelltoy.tec.linutronix.de
>>> Subject: Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
>>>
>>>
>>> On 2016年06月20日 12:55, Byungchul Park wrote:
>>>> Currently, x86 implementation of save_stack_trace() is walking all stack
>>>> region word by word regardless of what the trace->max_entries is.
>>>> However, it's unnecessary to walk after already fulfilling caller's
>>>> requirement, say, if trace->nr_entries >= trace->max_entries is true.
>>>>
>>>> For example, CONFIG_LOCKDEP_CROSSRELEASE implementation calls
>>>> save_stack_trace() with max_entries = 5 frequently. I measured its
>>>> overhead and printed its difference of sched_clock() with my QEMU x86
>>>> machine.
>>>>
>>>> The latency was improved over 70% when trace->max_entries = 5.
>>>>
>>> [snip]
>>>
>>>> +static int save_stack_end(void *data)
>>>> +{
>>>> +	struct stack_trace *trace = data;
>>>> +	return trace->nr_entries >= trace->max_entries;
>>>> +}
>>>> +
>>>>    static const struct stacktrace_ops save_stack_ops = {
>>>>    	.stack		= save_stack_stack,
>>>>    	.address	= save_stack_address,
>>> then why not check the return value of ->address(), -1 indicate there is
>>> no room to store any pointer.
>>
>> Hello,
>>
>> Indeed. It also looks good to me even though it has to propagate the condition
>> between callback functions. I will modify it if it's better.
>
> Do you also think it would be better to make it propagate the result of
> ->address() rather than add a new callback, say, end_walk?
>
It's up to you. In my opinion, end_walk is better for readability.
  
>>
>> Thank you.
>> Byungchul
>>
>>>
>>>>    	.walk_stack	= print_context_stack,
>>>> +	.end_walk	= save_stack_end,
>>>>    };
>>>>
>>>>    static const struct stacktrace_ops save_stack_ops_nosched = {
>>>>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/12] lockdep: Implement crossrelease feature
  2016-06-20  4:55 ` [RFC 05/12] lockdep: Implement crossrelease feature Byungchul Park
@ 2016-06-30 13:03   ` Peter Zijlstra
  2016-06-30 23:28     ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2016-06-30 13:03 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Mon, Jun 20, 2016 at 01:55:20PM +0900, Byungchul Park wrote:
> +struct cross_lock {
> +	unsigned int		gen_id;

4 byte hole

> +	struct list_head	xlock_entry;
> +
> +	/*
> +	 * Seperated hlock instance. This will be used when
> +	 * building a dependency chain for a crosslock, say,
> +	 * commit.
> +	 */
> +	struct held_lock	hlock;
> +
> +	int			ref; /* reference count */

4 byte hole

> +};

A trivial re-arrangement would shrink this structure by 8 bytes.

After which it's still at least 64 bytes.
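
For illustration, the re-arrangement could look like the following. This
is only a sketch built from the fields quoted above; the hole analysis
assumes x86-64 with CONFIG_LOCK_STAT=n:

/* Sketch: pairing the two 4-byte members removes both holes on 64-bit. */
struct cross_lock {
	unsigned int		gen_id;
	int			ref;		/* reference count */

	struct list_head	xlock_entry;

	/*
	 * Separated hlock instance. This will be used when building a
	 * dependency chain for a crosslock, say, commit.
	 */
	struct held_lock	hlock;
};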

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-06-20  4:55 ` [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock Byungchul Park
@ 2016-06-30 13:04   ` Peter Zijlstra
  2016-06-30 23:21     ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2016-06-30 13:04 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Mon, Jun 20, 2016 at 01:55:23PM +0900, Byungchul Park wrote:
> @@ -215,6 +219,11 @@ struct page {
>  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
>  	int _last_cpupid;
>  #endif
> +
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> +	struct lockdep_map map;
> +	struct cross_lock xlock;
> +#endif
>  }

So that's 32+64=96 bytes (CONFIG_LOCK_STAT=n) added to struct page,
really!?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 12/12] x86/dumpstack: Optimize save_stack_trace
  2016-06-30 10:38         ` xinhui
@ 2016-06-30 23:06           ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-30 23:06 UTC (permalink / raw)
  To: xinhui; +Cc: peterz, mingo, linux-kernel, npiggin, walken, ak

On Thu, Jun 30, 2016 at 06:38:47PM +0800, xinhui wrote:
> >>>>+static int save_stack_end(void *data)
> >>>>+{
> >>>>+	struct stack_trace *trace = data;
> >>>>+	return trace->nr_entries >= trace->max_entries;
> >>>>+}
> >>>>+
> >>>>   static const struct stacktrace_ops save_stack_ops = {
> >>>>   	.stack		= save_stack_stack,
> >>>>   	.address	= save_stack_address,
> >>>then why not check the return value of ->address(), -1 indicate there is
> >>>no room to store any pointer.
> >>
> >>Hello,
> >>
> >>Indeed. It also looks good to me even though it has to propagate the condition
> >>between callback functions. I will modify it if it's better.
> >
> >Do you also think it would be better to make it propagate the result of
> >->address() rather than add a new callback, say, end_walk?
> >
> It's up to you. In my opinion, end_walk is better for reading.

I also prefer the way this patch works.

> >>
> >>Thank you.
> >>Byungchul
> >>
> >>>
> >>>>   	.walk_stack	= print_context_stack,
> >>>>+	.end_walk	= save_stack_end,
> >>>>   };
> >>>>
> >>>>   static const struct stacktrace_ops save_stack_ops_nosched = {
> >>>>
> >

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-06-30 13:04   ` Peter Zijlstra
@ 2016-06-30 23:21     ` Byungchul Park
  2016-07-01  8:10       ` Peter Zijlstra
  2016-07-01 11:18       ` Kirill A. Shutemov
  0 siblings, 2 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-30 23:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Thu, Jun 30, 2016 at 03:04:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 20, 2016 at 01:55:23PM +0900, Byungchul Park wrote:
> > @@ -215,6 +219,11 @@ struct page {
> >  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> >  	int _last_cpupid;
> >  #endif
> > +
> > +#ifdef CONFIG_LOCKDEP_PAGELOCK
> > +	struct lockdep_map map;
> > +	struct cross_lock xlock;
> > +#endif
> >  }
> 
> So that's 32+64=96 bytes (CONFIG_LOCK_STAT=n) added to struct page,
> really!?

Yes... I was concerned about it at first, but I thought it would be ok
since CONFIG_LOCKDEP_PAGELOCK is a debug feature. Anyway, I will try to
reduce the size of struct cross_lock, which is the only thing I can do
to reduce it, since we cannot avoid using a lockdep_map if we want to
make lock_page() participate in the lockdep play.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 05/12] lockdep: Implement crossrelease feature
  2016-06-30 13:03   ` Peter Zijlstra
@ 2016-06-30 23:28     ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-06-30 23:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Thu, Jun 30, 2016 at 03:03:57PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 20, 2016 at 01:55:20PM +0900, Byungchul Park wrote:
> > +struct cross_lock {
> > +	unsigned int		gen_id;
> 
> 4 byte hole
> 
> > +	struct list_head	xlock_entry;
> > +
> > +	/*
> > +	 * Seperated hlock instance. This will be used when
> > +	 * building a dependency chain for a crosslock, say,
> > +	 * commit.
> > +	 */
> > +	struct held_lock	hlock;
> > +
> > +	int			ref; /* reference count */
> 
> 4 byte hole
> 
> > +};
> 
> A trivial re-arrangement would shrink this structure by 8 bytes.

Ok. Thank you.

> 
> After which its still at least 64 bytes.

I will try to reduce the size as much as possible. However, I think it's
not a serious problem because this is just a debug feature.

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] lockdep: Add a document describing crossrelease feature
  2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
                   ` (12 preceding siblings ...)
  2016-06-23 23:37 ` [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
@ 2016-07-01  4:15 ` Byungchul Park
  2016-07-01 10:45   ` Peter Zijlstra
  13 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-07-01  4:15 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, walken

The crossrelease feature introduces new concepts and data structures, so
a document that helps people understand them is necessary. Add one.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 Documentation/locking/crossrelease.txt | 276 +++++++++++++++++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..98851ef
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,276 @@
+Crossrelease lock dependency check
+==================================
+
+Started by Byungchul Park <byungchul.park@lge.com>
+
+Contents:
+
+ (*) What is the problem?
+
+     - Original lockdep's assumptions.
+     - Original lockdep's limitation.
+
+ (*) How to solve the problem.
+
+     - What causes deadlock?
+     - Relax the assumptions.
+     - Introduce "crosslock".
+     - Introduce "commit" stage.
+     - Acquire vs commit vs release
+
+ (*) Implementation.
+
+     - Data structures.
+     - Optimizations.
+
+
+===================
+What is the problem
+===================
+
+Can we detect the deadlocks described below with the original lockdep?
+No, we cannot.
+
+Example 1)
+
+	PROCESS X	PROCESS Y
+	--------------	--------------
+	mutex_lock A
+			lock_page B
+	lock_page B
+			mutex_lock A // DEADLOCK
+	unlock_page B
+			mutex_unlock A
+	mutex_unlock A
+			unlock_page B
+
+We currently do not check lock dependencies for lock_page(), which
+provides exclusive access to a page.
+
+Example 2)
+
+	PROCESS X	PROCESS Y	PROCESS Z
+	--------------	--------------	--------------
+			mutex_lock A
+	lock_page B
+			lock_page B
+					mutex_lock A // DEADLOCK
+					mutex_unlock A
+					unlock_page B
+					(B was held by PROCESS X)
+			unlock_page B
+			mutex_unlock A
+
+We cannot detect this kind of deadlock with the original lockdep, even
+if we enable lock dependency checking on lock_page().
+
+Example 3)
+
+	PROCESS X	PROCESS Y
+	--------------	--------------
+			mutex_lock A
+	mutex_lock A
+	mutex_unlock A
+			wait_for_complete B // DEADLOCK
+	complete B
+			mutex_unlock A
+
+wait_for_complete() and complete() can also cause a deadlock; however,
+we cannot detect it with the original lockdep either.
+
+
+Original lockdep's assumptions
+------------------------------
+
+Original lockdep (not crossrelease featured lockdep) assumes that,
+
+1. A lock will be unlocked within the context holding the lock.
+2. A lock has dependency with all locks already held in held_locks.
+2. Acquiring is more important than releasing, to check its dependency.
+
+
+Original lockdep's limitation
+-----------------------------
+
+Therefore, the original lockdep has limitations. It can be applied only
+to typical lock operations, e.g. spin_lock, mutex, semaphore and the
+like. Even though lock_page() can be considered a lock, it cannot be
+used with lockdep because it violates the assumptions of the original
+lockdep. From the original lockdep's point of view, a lock must be
+released within the context that acquired it; however, a lock taken by
+lock_page() can be released by a different context from the one that
+acquired it. wait_for_complete() is the same case by nature, and the
+original lockdep cannot deal with it either.
+
+
+========================
+How to solve the problem
+========================
+
+What causes deadlock
+--------------------
+
+Not only lock operations but also any operation that causes waiting or
+spinning, e.g. any wait for an event, lock_page() and so on, can cause
+a deadlock unless it is eventually released by someone. The most
+important point here is that the waiting or spinning must be *released*
+by someone. In other words, to avoid deadlock we have to focus on
+whether the waiting or spinning can be *released*, rather than on the
+waiting or spinning itself.
+
+
+Relax the assumptions
+---------------------
+
+We can relax the assumptions the original lockdep makes, since they are
+not necessary for checking dependencies and detecting a deadlock.
+
+1. A lock can be unlocked in any context, unless the context itself
+   causes a deadlock e.g. acquiring a lock in irq-safe context before
+   releasing the lock in irq-unsafe context.
+
+2. A lock has dependency with all locks in the releasing context, having
+   been held since the lock was held. Thus we can check the dependency
+   only after we identify the releasing context at first. Of course,
+   if we consider only typical locks, e.g. spin lock, mutex, semaphore
+   and so on, then we can identify the releasing context at the time of
+   acquiring a lock, because the acquiring context is the same as the
+   releasing context for a typical lock. However, in general we have to
+   wait until the lock that has been held is eventually released to
+   identify the releasing context. We can say that the original lockdep
+   is a special case among all the cases this crossrelease feature can
+   deal with.
+
+3. Releasing matters more than acquiring when checking dependencies.
+   Compare this to the third assumption of the original lockdep.
+
+
+Introduce "crosslock"
+---------------------
+
+The crossrelease feature names a lock a "crosslock" if it is releasable
+by a different context from the context that acquired it. All locks
+held in the context unlocking the crosslock, from the time the
+crosslock was acquired until it is eventually unlocked, have a
+dependency with the crosslock. That is the key idea used to implement
+the crossrelease feature.
+
+
+Introduce "commit" stage
+------------------------
+
+The crossrelease feature names the step that checks dependencies and
+builds the dependency tree and chain "commit". In fact, the original
+lockdep already performs the so-called commit at acquire time. Strictly
+speaking, the checking and building must be done in the releasing
+context, as described in the "What causes deadlock" subsection above.
+However, it works no matter which context is used for a typical lock,
+since it is guaranteed that the acquiring context is the same as the
+releasing context, as described above. So we can commit in the
+acquiring context for typical locks.
+
+How the original lockdep works:
+
+	acquire (including commit operation) -> release
+
+What if we consider a crosslock? For a crosslock, the way lockdep works
+must be changed so that the releasing context is considered instead.
+Again, the releasing context matters more than the acquiring context
+when checking dependencies and detecting a deadlock. Thus checking
+dependencies and building the dependency tree and chain, namely the
+commit, must be done in the releasing context for a crosslock.
+
+How the crossrelease lockdep works for crosslock:
+
+	acquire -> (context may be changed) -> commit -> release
+
+
+Acquire vs commit vs release
+----------------------------
+
+What is done when acquiring and releasing a lock changes slightly so
+that lockdep can work even for crosslocks, and an additional stage,
+commit, is placed between acquire and release. A rough pseudo code
+sketch follows the list below.
+
+1. Acquire
+
+	1) For typical lock
+
+		The lock will be added not only to held_locks of the
+		current task's task_struct, but also to an additional
+		structure, so that the commit stage can check dependencies
+		and build the dependency tree and chain with it later.
+
+	2) For crosslock
+
+		The lock will be added to a global linked list so that
+		the commit stage can check dependency and build the
+		dependency tree and chain with that later.
+
+2. Commit
+
+	1) For typical lock
+
+		N/A.
+
+	2) For crosslock
+
+		It checks dependencies and builds the dependency tree and
+		chain with the data saved in the acquire stage. Here, we
+		establish dependencies between the crosslock being
+		unlocked now and all locks that have been held in the
+		unlocking context since the crosslock was acquired. Of
+		course, unnecessary checking and building is avoided as
+		far as possible.
+
+3. Release
+
+	1) For typical lock
+
+		No change.
+
+	2) For crosslock
+
+		Just remove the target crosslock from the global linked
+		list, to which the crosslock was added at the acquire
+		stage. The release operation should be used together with
+		the commit operation for a crosslock, in order to build a
+		dependency chain properly.
+
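+As an example, the overall sequence for a crosslock roughly looks like
+the following pseudo code. The function names lockdep_init_map_crosslock(),
+lock_acquire() and lock_release() are real, but their argument lists are
+elided here, and commit_crosslock() is only a placeholder for whatever
+the real commit entry point is called in the patches.
+
+	/* when the object embedding the crosslock is initialized */
+	lockdep_init_map_crosslock(&obj->map, ...);
+
+	/* context X: acquire stage, recorded in the cross_lock */
+	lock_acquire(&obj->map, ...);
+
+	/* context Y: commit + release stage, possibly another context */
+	commit_crosslock(&obj->map);	/* build dependencies with all
+					   locks held in this context */
+	lock_release(&obj->map, ...);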
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease feature introduces two new data structures.
+
+1. pend_lock (== plock)
+
+	This is for keeping locks waiting to be committed so that the
+	actual dependency tree and chain is built in the commit stage.
+	Every task_struct has a pend_lock array to keep those locks.
+	A pend_lock entry is consumed and filled whenever lock_acquire()
+	is called for a typical lock, and the array is flushed, namely
+	committed, at the proper time.
+
+2. cross_lock (== xlock)
+
+	This keeps some additional data needed only for crosslocks. One
+	cross_lock exists per lockdep_map.
+	lockdep_init_map_crosslock() should be used instead of
+	lockdep_init_map() to use a lock as a crosslock.
+
+
+Optimizations
+-------------
+
+Adding a pend_lock is a very frequent operation because it happens
+whenever a typical lock is acquired. So the operation is implemented
+locklessly using the RCU mechanism, unless the xlock instance can be
+freed or destroyed unpredictably, e.g. when the instance is on a stack.
+
+A chain cache for crosslocks is also used to avoid unnecessary
+dependency checking and building, just as the original lockdep does for
+the same purpose.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-06-30 23:21     ` Byungchul Park
@ 2016-07-01  8:10       ` Peter Zijlstra
  2016-07-01 11:18       ` Kirill A. Shutemov
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2016-07-01  8:10 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, linux-kernel, npiggin, walken, ak, tglx

On Fri, Jul 01, 2016 at 08:21:21AM +0900, Byungchul Park wrote:
> On Thu, Jun 30, 2016 at 03:04:58PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 20, 2016 at 01:55:23PM +0900, Byungchul Park wrote:
> > > @@ -215,6 +219,11 @@ struct page {
> > >  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> > >  	int _last_cpupid;
> > >  #endif
> > > +
> > > +#ifdef CONFIG_LOCKDEP_PAGELOCK
> > > +	struct lockdep_map map;
> > > +	struct cross_lock xlock;
> > > +#endif
> > >  }
> > 
> > So that's 32+64=96 bytes (CONFIG_LOCK_STAT=n) added to struct page,
> > really!?
> 
> Yes... I concerned it at first, but I thought it would be ok since
> CONFIG_LOCKDEP_PAGE is a debug feature.

Right, but still, that's 0.75 GB of memory on my desktop (32GB total)
just for a debug feature. It grows struct page from 1.5% to 3.9% of
total memory, which is immense.

We've avoided doing this for the ptl, which was doable because typically
only a small number of pages end up being page tables.

In any case, I feel PG_locked is special enough to fudge. After all, the
content of all these lockdep_map thingies would basically be the same,
which is a massive waste of space.

I still need to bend my brain around this xlock stuff, that just didn't
want to parse when I looked at it last night.
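
For what it's worth, one way to read that last remark is the sketch
below. It is not from the patchset, the helper names are made up, and it
deliberately gives up per-page state by tracking PG_locked with a single
static map, i.e. one class shared by all pages:

/* Sketch only: a single shared lockdep map for the PG_locked class. */
static struct lock_class_key __page_lock_key;
static struct lockdep_map page_lock_map =
	STATIC_LOCKDEP_MAP_INIT("PG_locked", &__page_lock_key);

static inline void page_lock_acquire(void)	/* hypothetical helper */
{
	lock_acquire(&page_lock_map, 0, 0, 0, 1, NULL, _RET_IP_);
}

static inline void page_lock_release(void)	/* hypothetical helper */
{
	lock_release(&page_lock_map, 0, _RET_IP_);
}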

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-01  4:15 ` [PATCH] lockdep: Add a document describing " Byungchul Park
@ 2016-07-01 10:45   ` Peter Zijlstra
  2016-07-04  6:42     ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2016-07-01 10:45 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, linux-kernel, walken


So I really could not understand your initial changelogs; this text
seems somewhat better, so let me try and comment on this.

On Fri, Jul 01, 2016 at 01:15:38PM +0900, Byungchul Park wrote:

> +++ b/Documentation/locking/crossrelease.txt
> @@ -0,0 +1,276 @@
> +Crossrelease lock dependency check
> +==================================
> +
> +Started by Byungchul Park <byungchul.park@lge.com>
> +
> +Contents:
> +
> + (*) What is a problem?
> +
> +     - Original lockdep's assumptions.
> +     - Original lockdep's limitation.

Their form doesn't make sense if we ever commit this. Nobody knows or
cares about an 'original' lockdep. There is only now.

> +Original lockdep's assumptions
> +------------------------------
> +
> +Original lockdep (not crossrelease featured lockdep) assumes that,
> +
> +1. A lock will be unlocked within the context holding the lock.

This is lock owner semantics; that is, each lock has a clear owner.
Which is a sane assumption, and a hard requirement for PI. Remember, all
this comes from the RT tree.

!owner locks cannot do PI and thus cannot be used for code you want to
provide deterministic behaviour with.

> +2. A lock has dependency with all locks already held in held_locks.

That's not really an assumption, given 1, this is a fact. An owner lock
can only depend on locks currently held. This is a corner stone of
proving things.

> +2. Acquiring is more important than releasing, to check its dependency.

s/2/3/

That's not an assumption; that's a hard requirement. Since the acquire
is the one blocking, you _have_ to check for cycles before you block,
otherwise you'll hit the deadlock and not get a report, because you're
deadlocked.

> +Original lockdep's limitation
> +-----------------------------
> +
> +Therefore, the original lockdep has limitations. It can be applied only
> +to typical lock operations, e.g. spin_lock, mutex, semaphore and the

This is wrong, semaphores are very much not covered by lockdep since
they do not have owner semantics (what you call crossmuck). (this is
the distinction between a binary semaphore and a mutex)

> +What causes deadlock
> +--------------------
> +
> +Not only lock operations, but also any operations causing to wait or
> +spin it e.g. all wait operations for an event, lock_page() and so on
> +can cause deadlock unless it's eventually released by someone. The most
> +important point here is that the waiting or spinning must be *released*
> +by someone. In other words, we have to focus whether the waiting and
> +spinning can be *released* or not to avoid deadlock, rather than
> +waiting or spinning it itself.

But since its the blocking that _is_ the deadlock, you'll never get your
report.

IOW, you rely on future behaviour to tell if now can make forwards
progress. This already implies a well formed program. You're inverting
causality.

> +Relax the assumptions
> +---------------------
> +
> +We can relax the assumtions the original lockdep has, which is not
> +necessary to check dependency and detect a deadlock.
> +
> +1. A lock can be unlocked in any context, unless the context itself
> +   causes a deadlock e.g. acquiring a lock in irq-safe context before
> +   releasing the lock in irq-unsafe context.

You fail to say how this preserves correctness. By relaxing this you
loose the held_lock dependencies and you destroy the entire proof that
currently underpins lockdep.

> +2. A lock has dependency with all locks in the releasing context, having
> +   been held since the lock was held.

But you cannot tell this. The 'since the lock was held' thing fully
depends on timing and is not fundamentally correct.

			lock(A)
			unlock(A)
	lock(A)
	wait_for(B)
	unlock(A)
			wake(B)

Between the wait_for(B) and wake(B), _nothing_ has been held, yet still
there's the deadlock potential.

And note that if the timing was 'right', you would never get to wake(B)
because deadlock, so you'd never establish that there would be a
deadlock.

>                                        Thus we can check the dependency
> +   only after we identify the releasing context at first. Of course,
> +   if we consider only typical lock e.g. spin lock, mutex, semaphore
> +   and so on, then we can identify the releasing context at the time
> +   acquiring a lock because the releasing context is same as the
> +   releasing context for the typical lock. However, generally we have to
> +   wait until the lock having been held will be eventually released to
> +   identify the releasing context. We can say that the original lockdep
> +   is a special case among all cases this crossrelease feature can deal
> +   with.

I'm not sure you can say this at all; you've no proof of correctness
from which this special case flows.

> +3. Releasing is more important than acquiring to check its dependency.
> +   Compare to the third assumption of original lockdep.

Again, you're inverting causality afaict. You depend on the future
happening to say now is correct.

> +Introduce "crosslock"
> +---------------------
> +
> +Crossrelease feature names a lock "crosslock" if it is releasable by a
> +different context from the context having acquired the lock. All locks
> +having been held in the context unlocking the crosslock until
> +eventually the crosslock will be unlocked, have dependency with the
> +crosslock. That's the key idea to implement crossrelease feature.

_all_ locks? That implies infinite storage, which is hardly feasible. If
you limit it, the limit would seem arbitrary and you loose your proof
(in so far as I can see, because you're not actually giving any).


Please, give a coherent, mathematical proof of correctness.

Because I'm not seeing how this thing would work.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-06-30 23:21     ` Byungchul Park
  2016-07-01  8:10       ` Peter Zijlstra
@ 2016-07-01 11:18       ` Kirill A. Shutemov
  2016-07-04  4:30         ` Byungchul Park
  1 sibling, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2016-07-01 11:18 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Peter Zijlstra, mingo, linux-kernel, npiggin, walken, ak, tglx

On Fri, Jul 01, 2016 at 08:21:21AM +0900, Byungchul Park wrote:
> On Thu, Jun 30, 2016 at 03:04:58PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 20, 2016 at 01:55:23PM +0900, Byungchul Park wrote:
> > > @@ -215,6 +219,11 @@ struct page {
> > >  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> > >  	int _last_cpupid;
> > >  #endif
> > > +
> > > +#ifdef CONFIG_LOCKDEP_PAGELOCK
> > > +	struct lockdep_map map;
> > > +	struct cross_lock xlock;
> > > +#endif
> > >  }
> > 
> > So that's 32+64=96 bytes (CONFIG_LOCK_STAT=n) added to struct page,
> > really!?
> 
> Yes... I concerned it at first, but I thought it would be ok since
> CONFIG_LOCKDEP_PAGE is a debug feature. Anyway, I will try to reduce
> the size of struct cross_lock which is only thing I can do to reduce
> it, since we cannot avoid using lockdep_map if we want to make
> lock_page() participate in the lockdep play.

Please use page_ext instead, with a boot-time switch to enable it.

This way we can have this compile-time debug option enabled on more
machines without unnecessary runtime overhead.

And, please, CC linux-mm next time.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock
  2016-07-01 11:18       ` Kirill A. Shutemov
@ 2016-07-04  4:30         ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-07-04  4:30 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Peter Zijlstra, mingo, linux-kernel, walken

On Fri, Jul 01, 2016 at 02:18:46PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jul 01, 2016 at 08:21:21AM +0900, Byungchul Park wrote:
> > On Thu, Jun 30, 2016 at 03:04:58PM +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 20, 2016 at 01:55:23PM +0900, Byungchul Park wrote:
> > > > @@ -215,6 +219,11 @@ struct page {
> > > >  #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> > > >  	int _last_cpupid;
> > > >  #endif
> > > > +
> > > > +#ifdef CONFIG_LOCKDEP_PAGELOCK
> > > > +	struct lockdep_map map;
> > > > +	struct cross_lock xlock;
> > > > +#endif
> > > >  }
> > > 
> > > So that's 32+64=96 bytes (CONFIG_LOCK_STAT=n) added to struct page,
> > > really!?
> > 
> > Yes... I concerned it at first, but I thought it would be ok since
> > CONFIG_LOCKDEP_PAGE is a debug feature. Anyway, I will try to reduce
> > the size of struct cross_lock which is only thing I can do to reduce
> > it, since we cannot avoid using lockdep_map if we want to make
> > lock_page() participate in the lockdep play.
> 
> Please use page_ext instead. With boottime switch to enable.
> 
> This way we can have this compile-time debug option enabled on more
> machines without unnecessary runtime overhead.

Thank you for advice.

I also think it's one of the good candidates, except for the fact that it
would have to depend on page_ext additionally.
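
For reference, the page_ext route might look roughly like the sketch
below. The helper names are hypothetical; only lookup_page_ext(),
struct lockdep_map and struct cross_lock come from existing code and
this patchset:

/* Sketch only: keep the debug state in page_ext, not in struct page. */

/* fields added to struct page_ext under the debug option */
#ifdef CONFIG_LOCKDEP_PAGELOCK
	struct lockdep_map	map;
	struct cross_lock	xlock;
#endif

/* hypothetical lookup helpers used by lock_page()/unlock_page() */
static inline struct lockdep_map *page_lockdep_map(struct page *page)
{
	return &lookup_page_ext(page)->map;
}

static inline struct cross_lock *page_xlock(struct page *page)
{
	return &lookup_page_ext(page)->xlock;
}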

> 
> And, please, CC linux-mm next time.
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-01 10:45   ` Peter Zijlstra
@ 2016-07-04  6:42     ` Byungchul Park
  2016-07-06  0:49       ` Boqun Feng
  0 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-07-04  6:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, walken

On Fri, Jul 01, 2016 at 12:45:21PM +0200, Peter Zijlstra wrote:
> > +Crossrelease lock dependency check
> > +==================================
> > +
> > +Started by Byungchul Park <byungchul.park@lge.com>
> > +
> > +Contents:
> > +
> > + (*) What is a problem?
> > +
> > +     - Original lockdep's assumptions.
> > +     - Original lockdep's limitation.
> 
> Their form doesn't make sense if we ever commit this. Nobody knows or
> cares about an 'original' lockdep. There is only now.

Right. I wonder what word would be proper. Basic? Or just lockdep?

> > +Original lockdep's assumptions
> > +------------------------------
> > +
> > +Original lockdep (not crossrelease featured lockdep) assumes that,
> > +
> > +1. A lock will be unlocked within the context holding the lock.
> 
> This is lock owner semantics; that is, each lock has a clear owner.
> Which is a sane assumption, and a hard requirement for PI. Remember, all
> this comes from the RT tree.

I agree it's a hard requirement for PI.

> 
> !owner locks cannot do PI and thus cannot be used for code you want to
> provide deterministic behaviour with.

I am sorry, but I don't understand this sentence. Could you explain it
a bit more?

> > +2. A lock has dependency with all locks already held in held_locks.
> 
> That's not really an assumption, given 1, this is a fact. An owner lock
> can only depend on locks currently held. This is a corner stone of
> proving things.

Yes, given 1, it's a fact. I need to modify this assumption section.

> 
> > +2. Acquiring is more important than releasing, to check its dependency.
> 
> s/2/3/
> 
> That's not an assumption; that's a hard requirement. Since the acquire
> is the one blocking, you _have_ to check for cycles before you block,
> otherwise you'll hit the deadlock and not get a report, because you're
> deadlocked.

I don't think that's a *hard* requirement, even though it is required
in order to be able to report the deadlock before actually hitting it.
I agree that we can report the actual deadlock only if we check before
the blocking operation is actually executed, which is how the current
lockdep can report it before the actual deadlock happens. That is very
good.

However, there's another valuable thing the lockdep mechanism provides.
It can report a deadlock possibility based on the dependencies it has
built, even though no actual deadlock has happened.

Of course it would be best if it could report both the actual deadlock
and the deadlock possibility. So I also think we should focus on
acquiring rather than releasing for typical locks, since then we can
always report the deadlock.

However, for any other lock which can be released by a context different
from the one it was held by, we cannot detect any deadlock by focusing
on acquiring, because we don't know where it will be released. We can
detect the deadlock possibility, though, if we consider the release so
that we can identify the releasing context, even though this way we
cannot report the actual deadlock for the crosslock. I think detecting
the deadlock possibility is more valuable than doing nothing, as long as
it does not harm the original lockdep's current capability.

> > +Original lockdep's limitation
> > +-----------------------------
> > +
> > +Therefore, the original lockdep has limitations. It can be applied only
> > +to typical lock operations, e.g. spin_lock, mutex, semaphore and the
> 
> This is wrong, semaphores are very much not covered by lockdep since
> they do not have owner semantics (what you call crossmuck). (this is
> the distinction between a binary semaphore and a mutex)

Sorry for missing that. And please don't use words like 'muck'.

> > +What causes deadlock
> > +--------------------
> > +
> > +Not only lock operations, but also any operations causing to wait or
> > +spin it e.g. all wait operations for an event, lock_page() and so on
> > +can cause deadlock unless it's eventually released by someone. The most
> > +important point here is that the waiting or spinning must be *released*
> > +by someone. In other words, we have to focus whether the waiting and
> > +spinning can be *released* or not to avoid deadlock, rather than
> > +waiting or spinning it itself.
> 
> But since its the blocking that _is_ the deadlock, you'll never get your
> report.

Yes, right. So I also think it is better to leave the current
implementation unchanged for typical locks. However, for crosslocks it
is better to detect the deadlock possibility than to do nothing.

> > +Relax the assumptions
> > +---------------------
> > +
> > +We can relax the assumtions the original lockdep has, which is not
> > +necessary to check dependency and detect a deadlock.
> > +
> > +1. A lock can be unlocked in any context, unless the context itself
> > +   causes a deadlock e.g. acquiring a lock in irq-safe context before
> > +   releasing the lock in irq-unsafe context.
> 
> You fail to say how this preserves correctness. By relaxing this you

Yes, I have to add more description about that.

> loose the held_lock dependencies and you destroy the entire proof that
> currently underpins lockdep.

I've never touched the current proof lockdep uses. I just *added* a
capability to detect the deadlock possibility for crosslocks, for which
we currently do nothing.

> > +2. A lock has dependency with all locks in the releasing context, having
> > +   been held since the lock was held.
> 
> But you cannot tell this. The 'since the lock was held' thing fully
> depends on timing and is not fundamentally correct.
> 
> 			lock(A)
> 			unlock(A)
> 	lock(A)
> 	wait_for(B)
> 	unlock(A)
> 			wake(B)
> 
> Between the wait_for(B) and wake(B), _nothing_ has been held, yet still
> there's the deadlock potential.

The crossrelease feature can detect this situation as a deadlock.
wait_for() is not an actual lock, but we can make it detectable by using
acquiring and releasing semantics on wait_for() and wake().

> And note that if the timing was 'right', you would never get to wake(B)
> because deadlock, so you'd never establish that there would be a
> deadlock.

If a deadlock actually happens, then we cannot establish it, as you said.
Remember that the current lockdep does nothing for this situation. But at
least the crossrelease feature can detect this deadlock possibility at the
time the dependency tree (graph) is built, which is better than doing
nothing.

> >                                        Thus we can check the dependency
> > +   only after we identify the releasing context at first. Of course,
> > +   if we consider only typical lock e.g. spin lock, mutex, semaphore
> > +   and so on, then we can identify the releasing context at the time
> > +   acquiring a lock because the releasing context is same as the
> > +   releasing context for the typical lock. However, generally we have to
> > +   wait until the lock having been held will be eventually released to
> > +   identify the releasing context. We can say that the original lockdep
> > +   is a special case among all cases this crossrelease feature can deal
> > +   with.
> 
> I'm not sure you can say this at all; you've no proof of correctness
> from which this special case flows.

I need to modify and reinforce this description so that it does not
confuse you.

> > +Introduce "crosslock"
> > +---------------------
> > +
> > +Crossrelease feature names a lock "crosslock" if it is releasable by a
> > +different context from the context having acquired the lock. All locks
> > +having been held in the context unlocking the crosslock until
> > +eventually the crosslock will be unlocked, have dependency with the
> > +crosslock. That's the key idea to implement crossrelease feature.
> 
> _all_ locks? That implies infinite storage, which is hardly feasible. If

Right. Basically it tries to consider all locks, but the crossrelease
feature builds a chain cache, checks whether a dependency has already been
considered, and only considers the necessary ones. At least on my QEMU
machine it works well without any problem and detects the problematic
deadlock situations I mentioned.

> you limit it, the limit would seem arbitrary and you loose your proof
> (in so far as I can see, because you're not actually giving any).

I will give a more detailed description of the implementation in the next spin.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-04  6:42     ` Byungchul Park
@ 2016-07-06  0:49       ` Boqun Feng
  2016-07-06  2:17         ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Boqun Feng @ 2016-07-06  0:49 UTC (permalink / raw)
  To: Byungchul Park; +Cc: Peter Zijlstra, mingo, linux-kernel, walken


On Mon, Jul 04, 2016 at 03:42:59PM +0900, Byungchul Park wrote:
[snip]
> > > +2. A lock has dependency with all locks in the releasing context, having
> > > +   been held since the lock was held.
> > 
> > But you cannot tell this. The 'since the lock was held' thing fully
> > depends on timing and is not fundamentally correct.
> > 
> > 			lock(A)
> > 			unlock(A)
> > 	lock(A)
> > 	wait_for(B)
> > 	unlock(A)
> > 			wake(B)
> > 
> > Between the wait_for(B) and wake(B), _nothing_ has been held, yet still
> > there's the deadlock potential.
> 
> Crossreleas feature can detect this situation as a deadlock. wait_for()
> is not an actual lock, but we can make it detectable by using acquring and
> releasing semantics on wait_for() and wake().
> 
> > And note that if the timing was 'right', you would never get to wake(B)
> > because deadlock, so you'd never establish that there would be a
> > deadlock.
> 
> If a deadlock actually happens, then we cannot establish it as you said.
> Remind that current lockdep does nothing for this situation. But at least
> crossrelease feature can detect this deadlock possibility at the time the
> dependency tree(graph) is built, which is better than doing nothing.
> 

Confused, how?

Say the sequence of events is as follow:

(two tasks are initially with no lock held)

	Task 1		Task 2
	=============	====================
			lock(A)
			unlock(A)
	lock(A)
	wait_for(B) // acquire
			wake(B) // commit + release
	unlock(A)

By the time the commit is called, the dependency tree will be built,
and we will find there is _no_ lock held before wake(B). Therefore at
the release stage you will end up adding only the dependency chain A->B
to lockdep, right? And it looks like neither Task 1 nor Task 2 will
break the dependency chain A->B. So how can crossrelease detect the
potential deadlock?

It would be better if you could provide some samples that crossrelease
can detect, once you have confirmed this.

Regards,
Boqun


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-06  0:49       ` Boqun Feng
@ 2016-07-06  2:17         ` Byungchul Park
  2016-07-06  5:33           ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-07-06  2:17 UTC (permalink / raw)
  To: Boqun Feng; +Cc: Peter Zijlstra, mingo, linux-kernel, walken

On Wed, Jul 06, 2016 at 08:49:43AM +0800, Boqun Feng wrote:
> On Mon, Jul 04, 2016 at 03:42:59PM +0900, Byungchul Park wrote:
> [snip]
> > > > +2. A lock has dependency with all locks in the releasing context, having
> > > > +   been held since the lock was held.
> > > 
> > > But you cannot tell this. The 'since the lock was held' thing fully
> > > depends on timing and is not fundamentally correct.
> > > 
> > > 			lock(A)
> > > 			unlock(A)
> > > 	lock(A)
> > > 	wait_for(B)
> > > 	unlock(A)
> > > 			wake(B)
> > > 
> > > Between the wait_for(B) and wake(B), _nothing_ has been held, yet still
> > > there's the deadlock potential.

I misunderstood this sentence. However, this does not cause a deadlock
by itself. Of course it is a deadlock if the example below actually
happens.

We should not presume that the sequence below can happen just because
the one above happened, since some dependencies between these contexts
may prevent it. Therefore we decided to check only when the actually
problematic sequence happens, like below.

lock(A)
wait_for(B)
~~~~~~~~~~~~~~~~~~~~~~~~ <- serialized by atomic operation
		lock(A)
		unlock(A)
		wake(B)
unlock(A)

So I meant that crossrelease can detect this deadlock if the
deadlock-causing sequence actually happens at least once. I tried to
avoid false positive detection; in other words, I made lockdep's
detection stronger using only true positives.

Of course I want this crosslock support to be stronger than the current
implementation. However, I have no idea how to make even peterz's
example detectable, considering all dependencies, while avoiding false
positive detection. It should be future work, but it will not be simple,
if it is possible at all.

> > 
> > Crossreleas feature can detect this situation as a deadlock. wait_for()
> > is not an actual lock, but we can make it detectable by using acquring and
> > releasing semantics on wait_for() and wake().
> > 
> > > And note that if the timing was 'right', you would never get to wake(B)
> > > because deadlock, so you'd never establish that there would be a
> > > deadlock.
> > 
> > If a deadlock actually happens, then we cannot establish it as you said.
> > Remind that current lockdep does nothing for this situation. But at least
> > crossrelease feature can detect this deadlock possibility at the time the
> > dependency tree(graph) is built, which is better than doing nothing.
> > 
> 
> Confused, how?

And I am sorry for making you confused.

> 
> Say the sequence of events is as follow:
> 
> (two tasks are initially with no lock held)
> 
> 	Task 1		Task 2
> 	=============	====================
> 			lock(A)
> 			unlock(A)
> 	lock(A)
> 	wait_for(B) // acquire
> 			wake(B) // commit + release
> 	unlock(A)
> 

As you know, this is not an actual deadlock.

> by the time, the commit are called, the dependency tree will be built,
> and we will find there is _no_ lock held before wake(B). Therefore at
> the release stage, you will end up only adding dependency chain A->B in
> the lockdep, right? And it looks like neither Task1 or Task2 will break
> the dependency chain A->B. So how can crossrelease detect the potential
> deadlock?

It's similar to how crossrelease works, but a little bit different.
I will reinforce the document and resend it later.

> 
> It will be better, that you could provide some samples that crossrelease
> can detect after your confirmation.

Yes I will.

Thank you,
Byungchul

> 
> Regards,
> Boqun

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-06  2:17         ` Byungchul Park
@ 2016-07-06  5:33           ` Byungchul Park
  2016-07-06  7:56             ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Byungchul Park @ 2016-07-06  5:33 UTC (permalink / raw)
  To: Boqun Feng; +Cc: Peter Zijlstra, mingo, linux-kernel, walken

On Wed, Jul 06, 2016 at 11:17:10AM +0900, Byungchul Park wrote:
> 
> lock(A)
> wait_for(B)
> ~~~~~~~~~~~~~~~~~~~~~~~~ <- serialized by atomic operation
> 		lock(A)
> 		unlock(A)
> 		wake(B)
> unlock(A)

By the way, I have a question. Is there anyone who could answer it?

I want to serialize two contexts' lock operations, for example,

	context A	context B
	--------------	--------------
	lock A
	lock B		...
	lock C
	atomic_inc_return
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <- serialization
			atomic_read
			lock D
	...		lock E
			lock F

so that we can see these in the order like A -> B -> C -> D -> E -> F.

atomic_inc_return() is used after lock C in context A, and atomic_read()
is used before lock D in context B. I want them to be serialized when
atomic_read() can see the increased value.

Can I use smp_mb__after_atomic() just after atomic_read() or should I use
smp_mb()? I think anyway I have to choose one of them for that ordering.

Thank you,
Byungchul

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-06  5:33           ` Byungchul Park
@ 2016-07-06  7:56             ` Peter Zijlstra
  2016-07-06  8:12               ` Byungchul Park
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2016-07-06  7:56 UTC (permalink / raw)
  To: Byungchul Park; +Cc: Boqun Feng, mingo, linux-kernel, walken

On Wed, Jul 06, 2016 at 02:33:29PM +0900, Byungchul Park wrote:
> On Wed, Jul 06, 2016 at 11:17:10AM +0900, Byungchul Park wrote:
> > 
> > lock(A)
> > wait_for(B)
> > ~~~~~~~~~~~~~~~~~~~~~~~~ <- serialized by atomic operation
> > 		lock(A)
> > 		unlock(A)
> > 		wake(B)
> > unlock(A)
> 
> By the way, I have a question. Is there anyone who could answer it?
> 
> I want to serialize between two context's lock operations, for example,
> 
> 	context A	context B
> 	--------------	--------------
> 	lock A
> 	lock B		...
> 	lock C
> 	atomic_inc_return
> 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <- serialization
> 			atomic_read
> 			lock D
> 	...		lock E
> 			lock F
> 
> so that we can see these in the order like A -> B -> C -> D -> E -> F.
> 
> atomic_inc_return() is used after lock C in context A, and atomic_read()
> is used before lock D in context B. And I want to make it serialized when
> the atomic_read() can see the increased value.
> 
> Can I use smp_mb__after_atomic() just after atomic_read() 

No. atomic_set() and atomic_read() are not RmW operations.

> or should I use
> smp_mb()? I think anyway I have to choose one of them for that ordering.

smp_load_acquire(), if that observes the increment it will ensure D
comes after etc..

Also, atomic_read() _could_ be enough, if its part of a control
dependency, because LOCK very much involves a store, so the load->store
order provided by the control dependency will already order things.
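
Something like the following schematic shows the two options (not
compilable as-is; lock() stands for the real locking calls and the
variable name is made up):

static atomic_t published = ATOMIC_INIT(0);

/* context A */
lock(A); lock(B); lock(C);
atomic_inc_return(&published);		/* value-returning RmW: full barrier */

/* context B, option 1: acquire load */
if (atomic_read_acquire(&published)) {	/* smp_load_acquire() underneath */
	lock(D); lock(E); lock(F);	/* ordered after the observed increment */
}

/* context B, option 2: control dependency */
if (atomic_read(&published)) {
	/*
	 * lock(D) involves a store, so the load->store ordering of the
	 * control dependency already keeps it after the atomic_read().
	 */
	lock(D); lock(E); lock(F);
}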

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] lockdep: Add a document describing crossrelease feature
  2016-07-06  7:56             ` Peter Zijlstra
@ 2016-07-06  8:12               ` Byungchul Park
  0 siblings, 0 replies; 38+ messages in thread
From: Byungchul Park @ 2016-07-06  8:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Boqun Feng, mingo, linux-kernel, walken

On Wed, Jul 06, 2016 at 09:56:08AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 02:33:29PM +0900, Byungchul Park wrote:
> > On Wed, Jul 06, 2016 at 11:17:10AM +0900, Byungchul Park wrote:
> > > 
> > > lock(A)
> > > wait_for(B)
> > > ~~~~~~~~~~~~~~~~~~~~~~~~ <- serialized by atomic operation
> > > 		lock(A)
> > > 		unlock(A)
> > > 		wake(B)
> > > unlock(A)
> > 
> > By the way, I have a question. Is there anyone who could answer it?
> > 
> > I want to serialize between two context's lock operations, for example,
> > 
> > 	context A	context B
> > 	--------------	--------------
> > 	lock A
> > 	lock B		...
> > 	lock C
> > 	atomic_inc_return
> > 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <- serialization
> > 			atomic_read
> > 			lock D
> > 	...		lock E
> > 			lock F
> > 
> > so that we can see these in the order like A -> B -> C -> D -> E -> F.
> > 
> > atomic_inc_return() is used after lock C in context A, and atomic_read()
> > is used before lock D in context B. And I want to make it serialized when
> > the atomic_read() can see the increased value.
> > 
> > Can I use smp_mb__after_atomic() just after atomic_read() 
> 
> No. atomic_set() and atomic_read() are not RmW operations.
> 
> > or should I use
> > smp_mb()? I think anyway I have to choose one of them for that ordering.
> 
> smp_load_acquire(), if that observes the increment it will ensure D
> comes after etc..
> 
> Also, atomic_read() _could_ be enough, if its part of a control
> dependency, because LOCK very much involves a store, so the load->store
> order provided by the control dependency will already order things.

Indeed. Thank you very much.

I can rely on the control dependency if possible. I will check it.

Thank you,
Byungchul

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-07-06  8:14 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-20  4:55 [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
2016-06-20  4:55 ` [RFC 01/12] lockdep: Refactor lookup_chain_cache() Byungchul Park
2016-06-20  4:55 ` [RFC 02/12] lockdep: Add a function building a chain between two hlocks Byungchul Park
2016-06-20  4:55 ` [RFC 03/12] lockdep: Make check_prev_add can use a stack_trace of other context Byungchul Park
2016-06-20  4:55 ` [RFC 04/12] lockdep: Make save_trace can copy from other stack_trace Byungchul Park
2016-06-20  4:55 ` [RFC 05/12] lockdep: Implement crossrelease feature Byungchul Park
2016-06-30 13:03   ` Peter Zijlstra
2016-06-30 23:28     ` Byungchul Park
2016-06-20  4:55 ` [RFC 06/12] lockdep: Apply crossrelease to completion Byungchul Park
2016-06-20  4:55 ` [RFC 07/12] pagemap.h: Remove trailing white space Byungchul Park
2016-06-20  4:55 ` [RFC 08/12] lockdep: Apply crossrelease to PG_locked lock Byungchul Park
2016-06-30 13:04   ` Peter Zijlstra
2016-06-30 23:21     ` Byungchul Park
2016-07-01  8:10       ` Peter Zijlstra
2016-07-01 11:18       ` Kirill A. Shutemov
2016-07-04  4:30         ` Byungchul Park
2016-06-20  4:55 ` [RFC 09/12] cifs/file.c: Remove trailing white space Byungchul Park
2016-06-20  4:55 ` [RFC 10/12] mm/swap_state.c: " Byungchul Park
2016-06-20  4:55 ` [RFC 11/12] lockdep: Call lock_acquire(release) when accessing PG_locked manually Byungchul Park
2016-06-20  4:55 ` [RFC 12/12] x86/dumpstack: Optimize save_stack_trace Byungchul Park
2016-06-20  7:29   ` xinhui
2016-06-20  7:50     ` byungchul.park
2016-06-29 12:43       ` Byungchul Park
2016-06-30 10:38         ` xinhui
2016-06-30 23:06           ` Byungchul Park
2016-06-23 23:37 ` [RFC 00/12] lockdep: Implement crossrelease feature Byungchul Park
2016-06-24  7:08   ` Peter Zijlstra
2016-06-24 11:13     ` Byungchul Park
2016-06-24 11:26   ` Nikolay Borisov
2016-06-27  1:34     ` Byungchul Park
2016-07-01  4:15 ` [PATCH] lockdep: Add a document describing " Byungchul Park
2016-07-01 10:45   ` Peter Zijlstra
2016-07-04  6:42     ` Byungchul Park
2016-07-06  0:49       ` Boqun Feng
2016-07-06  2:17         ` Byungchul Park
2016-07-06  5:33           ` Byungchul Park
2016-07-06  7:56             ` Peter Zijlstra
2016-07-06  8:12               ` Byungchul Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).