* [PATCH v5 00/13] lockdep: Implement crossrelease feature
@ 2017-01-18 13:17 Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache() Byungchul Park
                   ` (13 more replies)
  0 siblings, 14 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

I checked that the crossrelease feature works well on my qemu-i386
machine; I found no problems there. However, I wonder whether the same
holds on other machines, especially large systems. Please let me know
if it does not work on yours, or whether you find the crossrelease
feature useful. Also let me know if you need it backported to another
version and the backport turns out to be non-trivial; in that case I
can provide a backported version once I have done the work.

-----8<-----

Change from v4
	- rebase on vanilla v4.9 tag
	- rename pend_lock(plock) to hist_lock(xhlock)
	- allow overwriting ring buffer for hist_lock
	- unwind ring buffer instead of tagging id for each irq
	- introduce lockdep_map_cross embedding cross_lock
	- make each work of workqueue distinguishable
	- enhance comments
	(I will update the document at the next spin.)

Change from v3
	- revised document

Change from v2
	- rebase on vanilla v4.7 tag
	- move lockdep data for page lock from struct page to page_ext
	- allocate plocks buffer via vmalloc instead of in struct task
	- enhanced comments and document
	- optimize performance
	- make reporting function crossrelease-aware

Change from v1
	- enhanced the document
	- removed save_stack_trace() optimizing patch
	- made this based on the separated save_stack_trace patchset
	  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1182242.html

Can we detect the deadlocks below with the original lockdep?

Example 1)

	PROCESS X	PROCESS Y
	--------------	--------------
	mutex_lock A
			lock_page B
	lock_page B
			mutex_lock A // DEADLOCK
	unlock_page B
			mutext_unlock A
	mutex_unlock A
			unlock_page B

where A and B are different lock classes.

No, we cannot.

Example 2)

	PROCESS X	PROCESS Y	PROCESS Z
	--------------	--------------	--------------
			mutex_lock A
	lock_page B
			lock_page B
					mutex_lock A // DEADLOCK
					mutex_unlock A
					unlock_page B
					(B was held by PROCESS X)
			unlock_page B
			mutex_unlock A

where A and B are different lock classes.

No, we cannot.

Example 3)

	PROCESS X	PROCESS Y
	--------------	--------------
			mutex_lock A
	mutex_lock A
			wait_for_complete B // DEADLOCK
	mutex_unlock A
	complete B
			mutex_unlock A

where A is a lock class and B is a completion variable.

No, we cannot.

Not only lock operations, but any operation that waits or spins for
something can cause a deadlock unless that something is eventually
*released* by someone. The important point here is that the waiting or
spinning must be *released* by someone.

Using the crossrelease feature, we can check dependencies and detect
deadlock possibilities not only for typical locks, but also for
lock_page(), wait_for_xxx() and so on, which might be released in any
context.
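
For a quick feel of what the annotation looks like, here is a rough
sketch modelled on how the completion patch of this series uses the
API. The xxx names are placeholders for a primitive of your own, not
existing kernel API:

	/* One-time setup: mark the embedded map as a crosslock. */
	static struct lock_class_key xxx_key;

	lockdep_init_map_crosslock((struct lockdep_map *)&xxx->map,
				   "xxx", &xxx_key, 0);

	/* Waiting side: */
	lock_acquire_exclusive((struct lockdep_map *)&xxx->map, 0, 0,
			       NULL, _RET_IP_);
	wait_for_xxx(xxx);
	lock_release((struct lockdep_map *)&xxx->map, 0, _RET_IP_);

	/* Releasing side, possibly in a completely different context: */
	lock_commit_crosslock((struct lockdep_map *)&xxx->map);
	release_xxx(xxx);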

See the last patch including the document for more information.

Byungchul Park (13):
  lockdep: Refactor lookup_chain_cache()
  lockdep: Fix wrong condition to print bug msgs for
    MAX_LOCKDEP_CHAIN_HLOCKS
  lockdep: Add a function building a chain between two classes
  lockdep: Refactor save_trace()
  lockdep: Pass a callback arg to check_prev_add() to handle stack_trace
  lockdep: Implement crossrelease feature
  lockdep: Make print_circular_bug() aware of crossrelease
  lockdep: Apply crossrelease to completions
  pagemap.h: Remove trailing white space
  lockdep: Apply crossrelease to PG_locked locks
  lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
  lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
  lockdep: Crossrelease feature documentation

 Documentation/locking/crossrelease.txt | 1053 ++++++++++++++++++++++++++++++++
 include/linux/completion.h             |  118 +++-
 include/linux/irqflags.h               |   24 +-
 include/linux/lockdep.h                |  129 ++++
 include/linux/mm_types.h               |    4 +
 include/linux/page-flags.h             |   43 +-
 include/linux/page_ext.h               |    4 +
 include/linux/pagemap.h                |  124 +++-
 include/linux/sched.h                  |    9 +
 kernel/exit.c                          |    9 +
 kernel/fork.c                          |   23 +
 kernel/locking/lockdep.c               |  763 ++++++++++++++++++++---
 kernel/sched/completion.c              |   54 +-
 kernel/workqueue.c                     |    1 +
 lib/Kconfig.debug                      |   30 +
 mm/filemap.c                           |   76 ++-
 mm/page_ext.c                          |    4 +
 17 files changed, 2324 insertions(+), 144 deletions(-)
 create mode 100644 Documentation/locking/crossrelease.txt

-- 
1.9.1

* [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache()
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-19  9:16   ` Boqun Feng
  2017-01-18 13:17 ` [PATCH v5 02/13] lockdep: Fix wrong condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionality in a single function. However, each part is useful on
its own. So this patch makes lookup_chain_cache() do only the 'lookup'
part and introduces add_chain_cache() to do only the 'add' part, which
is also more readable than before.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 129 +++++++++++++++++++++++++++++------------------
 1 file changed, 81 insertions(+), 48 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4d7ffc0..f37156f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2109,15 +2109,9 @@ static int check_no_collision(struct task_struct *curr,
 	return 1;
 }
 
-/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
- */
-static inline int lookup_chain_cache(struct task_struct *curr,
-				     struct held_lock *hlock,
-				     u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+				  struct held_lock *hlock,
+				  u64 chain_key)
 {
 	struct lock_class *class = hlock_class(hlock);
 	struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2125,49 +2119,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	int i, j;
 
 	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
 	 * We might need to take the graph lock, ensure we've got IRQs
 	 * disabled to make this an IRQ-safe lock.. for recursion reasons
 	 * lockdep won't complain about its own locking errors.
 	 */
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return 0;
-	/*
-	 * We can walk it lock-free, because entries only get added
-	 * to the hash:
-	 */
-	hlist_for_each_entry_rcu(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-cache_hit:
-			debug_atomic_inc(chain_lookup_hits);
-			if (!check_no_collision(curr, hlock, chain))
-				return 0;
 
-			if (very_verbose(class))
-				printk("\nhash chain already cached, key: "
-					"%016Lx tail class: [%p] %s\n",
-					(unsigned long long)chain_key,
-					class->key, class->name);
-			return 0;
-		}
-	}
-	if (very_verbose(class))
-		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
-			(unsigned long long)chain_key, class->key, class->name);
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-	if (!graph_lock())
-		return 0;
-	/*
-	 * We have to walk the chain again locked - to avoid duplicates:
-	 */
-	hlist_for_each_entry(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-			graph_unlock();
-			goto cache_hit;
-		}
-	}
 	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
 		if (!debug_locks_off_graph_unlock())
 			return 0;
@@ -2219,6 +2182,75 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	return 1;
 }
 
+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * We can walk it lock-free, because entries only get added
+	 * to the hash:
+	 */
+	hlist_for_each_entry_rcu(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+			debug_atomic_inc(chain_lookup_hits);
+			return chain;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+					 struct held_lock *hlock,
+					 u64 chain_key)
+{
+	struct lock_class *class = hlock_class(hlock);
+	struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+	if (chain) {
+cache_hit:
+		if (!check_no_collision(curr, hlock, chain))
+			return 0;
+
+		if (very_verbose(class))
+			printk("\nhash chain already cached, key: "
+					"%016Lx tail class: [%p] %s\n",
+					(unsigned long long)chain_key,
+					class->key, class->name);
+		return 0;
+	}
+
+	if (very_verbose(class))
+		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+			(unsigned long long)chain_key, class->key, class->name);
+
+	if (!graph_lock())
+		return 0;
+
+	/*
+	 * We have to walk the chain again locked - to avoid duplicates:
+	 */
+	chain = lookup_chain_cache(chain_key);
+	if (chain) {
+		graph_unlock();
+		goto cache_hit;
+	}
+
+	if (!add_chain_cache(curr, hlock, chain_key))
+		return 0;
+
+	return 1;
+}
+
 static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		struct held_lock *hlock, int chain_head, u64 chain_key)
 {
@@ -2229,11 +2261,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 	 *
 	 * We look up the chain_key and do the O(N^2) check and update of
 	 * the dependencies only if this is a new dependency chain.
-	 * (If lookup_chain_cache() returns with 1 it acquires
+	 * (If lookup_chain_cache_add() returns with 1 it acquires
 	 * graph_lock for us)
 	 */
 	if (!hlock->trylock && hlock->check &&
-	    lookup_chain_cache(curr, hlock, chain_key)) {
+	    lookup_chain_cache_add(curr, hlock, chain_key)) {
 		/*
 		 * Check whether last held lock:
 		 *
@@ -2264,9 +2296,10 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		if (!chain_head && ret != 2)
 			if (!check_prevs_add(curr, hlock))
 				return 0;
+
 		graph_unlock();
 	} else
-		/* after lookup_chain_cache(): */
+		/* after lookup_chain_cache_add(): */
 		if (unlikely(!debug_locks))
 			return 0;
 
-- 
1.9.1

* [PATCH v5 02/13] lockdep: Fix wrong condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache() Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 03/13] lockdep: Add a function building a chain between two classes Byungchul Park
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Bug messages and the stack dump for MAX_LOCKDEP_CHAIN_HLOCKS should be
printed only the first time the limit is hit, but the current check
gets the condition wrong. Fix it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index f37156f..a143eb4 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2166,7 +2166,7 @@ static inline int add_chain_cache(struct task_struct *curr,
 	 * Important for check_no_collision().
 	 */
 	if (unlikely(nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		if (debug_locks_off_graph_unlock())
+		if (!debug_locks_off_graph_unlock())
 			return 0;
 
 		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
-- 
1.9.1

* [PATCH v5 03/13] lockdep: Add a function building a chain between two classes
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache() Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 02/13] lockdep: Fix wrong condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 04/13] lockdep: Refactor save_trace() Byungchul Park
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Currently, add_chain_cache() must be used in the context owning the
hlock, since using an hlock from another context would be racy.
However, crossrelease needs to build a chain between two lock classes
regardless of context. So this patch introduces a new function that
makes this possible.
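
For reference, the intended call pattern looks roughly like this. It
mirrors how the commit step of the later crossrelease patch in this
series uses the new function; xlock/xhlock and their helpers come from
that later patch:

	unsigned int xid, pid;
	u64 chain_key;

	xid = xlock_class(xlock) - lock_classes;
	chain_key = iterate_chain_key((u64)0, xid);
	pid = xhlock_class(xhlock) - lock_classes;
	chain_key = iterate_chain_key(chain_key, pid);

	/* Add the chain <xid> -> <pid> only if it is not cached yet. */
	if (!lookup_chain_cache(chain_key))
		add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
					chain_key);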

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a143eb4..2081c31 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2109,6 +2109,76 @@ static int check_no_collision(struct task_struct *curr,
 	return 1;
 }
 
+/*
+ * This is for building a chain between just two different classes,
+ * instead of adding a new hlock upon current, which is done by
+ * add_chain_cache().
+ *
+ * This can be called in any context with two classes, while
+ * add_chain_cache() must be done within the lock owner's context
+ * since it uses hlock which might be racy in another context.
+ */
+static inline int add_chain_cache_classes(unsigned int prev,
+					  unsigned int next,
+					  unsigned int irq_context,
+					  u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
+	 * We might need to take the graph lock, ensure we've got IRQs
+	 * disabled to make this an IRQ-safe lock.. for recursion reasons
+	 * lockdep won't complain about its own locking errors.
+	 */
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+		dump_stack();
+		return 0;
+	}
+
+	chain = lock_chains + nr_lock_chains++;
+	chain->chain_key = chain_key;
+	chain->irq_context = irq_context;
+	chain->depth = 2;
+	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+		chain->base = nr_chain_hlocks;
+		nr_chain_hlocks += chain->depth;
+		chain_hlocks[chain->base] = prev - 1;
+		chain_hlocks[chain->base + 1] = next - 1;
+	}
+#ifdef CONFIG_DEBUG_LOCKDEP
+	/*
+	 * Important for check_no_collision().
+	 */
+	else {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
+		dump_stack();
+		return 0;
+	}
+#endif
+
+	hlist_add_head_rcu(&chain->entry, hash_head);
+	debug_atomic_inc(chain_lookup_misses);
+	inc_chains();
+
+	return 1;
+}
+
 static inline int add_chain_cache(struct task_struct *curr,
 				  struct held_lock *hlock,
 				  u64 chain_key)
-- 
1.9.1

* [PATCH v5 04/13] lockdep: Refactor save_trace()
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (2 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 03/13] lockdep: Add a function building a chain between two classes Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace Byungchul Park
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Currently, save_trace() allocates space for a stack_trace from the
global buffer and then saves the trace into it. However, it would be
more useful if a separate buffer could be used. In fact, crossrelease
needs separate temporary buffers in which to save stack_traces.
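
For example, after this change a caller inside lockdep can, in
principle, save a trace into its own buffer rather than into the
global stack_trace[]. A minimal sketch, with the buffer size chosen
arbitrarily for the example:

	static unsigned long buf[16];
	struct stack_trace trace;

	/* Save at most 16 entries into 'buf', skipping 3 frames. */
	__save_trace(&trace, buf, 16, 3);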

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2081c31..e63ff97 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -392,13 +392,13 @@ static void print_lockdep_off(const char *bug_msg)
 #endif
 }
 
-static int save_trace(struct stack_trace *trace)
+static unsigned int __save_trace(struct stack_trace *trace, unsigned long *buf,
+				 unsigned long max_nr, int skip)
 {
 	trace->nr_entries = 0;
-	trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
-	trace->entries = stack_trace + nr_stack_trace_entries;
-
-	trace->skip = 3;
+	trace->max_entries = max_nr;
+	trace->entries = buf;
+	trace->skip = skip;
 
 	save_stack_trace(trace);
 
@@ -415,7 +415,15 @@ static int save_trace(struct stack_trace *trace)
 
 	trace->max_entries = trace->nr_entries;
 
-	nr_stack_trace_entries += trace->nr_entries;
+	return trace->nr_entries;
+}
+
+static int save_trace(struct stack_trace *trace)
+{
+	unsigned long *buf = stack_trace + nr_stack_trace_entries;
+	unsigned long max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+
+	nr_stack_trace_entries += __save_trace(trace, buf, max_nr, 3);
 
 	if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
 		if (!debug_locks_off_graph_unlock())
-- 
1.9.1

* [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (3 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 04/13] lockdep: Refactor save_trace() Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-26  7:43   ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Currently, a separate stack_trace instance cannot be used in
check_prev_add(). The simplest way to achieve that would be to save
the trace first and pass the stack_trace instance to check_prev_add()
as an argument, but implemented that way, unnecessary saving can
happen.

The proper solution is to additionally pass a callback function along
with the stack_trace, so that the caller can decide how to save it:
doing nothing, calling save_trace(), or doing something else.

In fact, crossrelease does not need to save the stack_trace of the
current context at all; it only needs to copy stack_traces from
temporary buffers to the global stack_trace[].
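
The resulting calling convention looks roughly like this. The first
two forms are what check_prevs_add() ends up doing in this patch; the
third shows how the later crossrelease patch uses it, with copy_trace()
introduced there:

	/* First dependency in this run: let check_prev_add() save the trace. */
	check_prev_add(curr, hlock, next, distance, &trace, save_trace);

	/* A trace has already been saved in this run: don't save again. */
	check_prev_add(curr, hlock, next, distance, &trace, NULL);

	/* Crossrelease commit step: copy a previously recorded trace. */
	check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
		       &xhlock->trace, copy_trace);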

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e63ff97..75dc14a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1805,20 +1805,13 @@ static inline void inc_chains(void)
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-	       struct held_lock *next, int distance, int *stack_saved)
+	       struct held_lock *next, int distance, struct stack_trace *trace,
+	       int (*save)(struct stack_trace *trace))
 {
 	struct lock_list *entry;
 	int ret;
 	struct lock_list this;
 	struct lock_list *uninitialized_var(target_entry);
-	/*
-	 * Static variable, serialized by the graph_lock().
-	 *
-	 * We use this static variable to save the stack trace in case
-	 * we call into this function multiple times due to encountering
-	 * trylocks in the held lock stack.
-	 */
-	static struct stack_trace trace;
 
 	/*
 	 * Prove that the new <prev> -> <next> dependency would not
@@ -1866,11 +1859,8 @@ static inline void inc_chains(void)
 		}
 	}
 
-	if (!*stack_saved) {
-		if (!save_trace(&trace))
-			return 0;
-		*stack_saved = 1;
-	}
+	if (save && !save(trace))
+		return 0;
 
 	/*
 	 * Ok, all validations passed, add the new lock
@@ -1878,14 +1868,14 @@ static inline void inc_chains(void)
 	 */
 	ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
 			       &hlock_class(prev)->locks_after,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 
 	if (!ret)
 		return 0;
 
 	ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
 			       &hlock_class(next)->locks_before,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 	if (!ret)
 		return 0;
 
@@ -1893,8 +1883,6 @@ static inline void inc_chains(void)
 	 * Debugging printouts:
 	 */
 	if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
-		/* We drop graph lock, so another thread can overwrite trace. */
-		*stack_saved = 0;
 		graph_unlock();
 		printk("\n new dependency: ");
 		print_lock_name(hlock_class(prev));
@@ -1917,8 +1905,10 @@ static inline void inc_chains(void)
 check_prevs_add(struct task_struct *curr, struct held_lock *next)
 {
 	int depth = curr->lockdep_depth;
-	int stack_saved = 0;
 	struct held_lock *hlock;
+	struct stack_trace trace;
+	unsigned long start_nr = nr_stack_trace_entries;
+	int (*save)(struct stack_trace *trace) = save_trace;
 
 	/*
 	 * Debugging checks.
@@ -1944,8 +1934,16 @@ static inline void inc_chains(void)
 		 */
 		if (hlock->read != 2 && hlock->check) {
 			if (!check_prev_add(curr, hlock, next,
-						distance, &stack_saved))
+						distance, &trace, save))
 				return 0;
+
+			/*
+			 * Stop saving stack_trace if save_trace() was
+			 * called at least once:
+			 */
+			if (save && start_nr != nr_stack_trace_entries)
+				save = NULL;
+
 			/*
 			 * Stop after the first non-trylock entry,
 			 * as non-trylock entries have added their
-- 
1.9.1

* [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (4 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-02-28 12:26   ` Peter Zijlstra
                     ` (7 more replies)
  2017-01-18 13:17 ` [PATCH v5 07/13] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
                   ` (7 subsequent siblings)
  13 siblings, 8 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it reports not just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context
in which they were acquired. Synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock. So lockdep should
track these locks as well to do a better job. The "crossrelease"
implementation makes these primitives trackable, too.
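
For a user, declaring a crosslock boils down to one of the two forms
below. This is only a sketch; the completion patch later in this
series uses both forms, and the xxx names are placeholders:

	/* Static initialization: */
	struct lockdep_map_cross xxx_map =
		STATIC_CROSS_LOCKDEP_MAP_INIT("xxx_map", &xxx_map);

	/* Dynamic initialization: */
	static struct lock_class_key xxx_key;

	lockdep_init_map_crosslock((struct lockdep_map *)&xxx_map,
				   "xxx_map", &xxx_key, 0);

Acquisition and release then go through the normal lock_acquire() and
lock_release() paths, with lock_commit_crosslock() called on the
releasing side before the actual release.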

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/irqflags.h |  24 ++-
 include/linux/lockdep.h  | 129 +++++++++++++
 include/linux/sched.h    |   9 +
 kernel/exit.c            |   9 +
 kernel/fork.c            |  23 +++
 kernel/locking/lockdep.c | 482 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/workqueue.c       |   1 +
 lib/Kconfig.debug        |  13 ++
 8 files changed, 665 insertions(+), 25 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..c40af8a 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,10 +23,26 @@
 # define trace_softirq_context(p)	((p)->softirq_context)
 # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
-# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
-# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter()	do { current->softirq_context++; } while (0)
-# define lockdep_softirq_exit()	do { current->softirq_context--; } while (0)
+# define trace_hardirq_enter()		\
+do {					\
+	current->hardirq_context++;	\
+	crossrelease_hardirq_start();	\
+} while (0)
+# define trace_hardirq_exit()		\
+do {					\
+	current->hardirq_context--;	\
+	crossrelease_hardirq_end();	\
+} while (0)
+# define lockdep_softirq_enter()	\
+do {					\
+	current->softirq_context++;	\
+	crossrelease_softirq_start();	\
+} while (0)
+# define lockdep_softirq_exit()		\
+do {					\
+	current->softirq_context--;	\
+	crossrelease_softirq_end();	\
+} while (0)
 # define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
 #else
 # define trace_hardirqs_on()		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c1458fe..f7c6905 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -155,6 +155,12 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * Flag to indicate whether it's a crosslock.
+	 */
+	int				cross;
+#endif
 };
 
 static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -258,9 +264,94 @@ struct held_lock {
 	unsigned int hardirqs_off:1;
 	unsigned int references:12;					/* 32 bits */
 	unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * Generation id.
+	 *
+	 * A value of cross_gen_id will be stored when holding this,
+	 * which is globally increased whenever each crosslock is held.
+	 */
+	unsigned int gen_id;
+#endif
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCK_TRACE_ENTRIES 5
+
+/*
+ * This is for keeping locks waiting for commit so that true dependencies
+ * can be added at commit step.
+ */
+struct hist_lock {
+	/*
+	 * If the previous in held_locks can create a proper dependency
+	 * with a target crosslock, then we can skip committing this,
+	 * since "the target crosslock -> the previous lock" and
+	 * "the previous lock -> this lock" can cover the case. So we
+	 * keep the previous's gen_id to make the decision.
+	 */
+	unsigned int		prev_gen_id;
+
+	/*
+	 * Each work of workqueue might run in a different context,
+	 * thanks to concurrency support of workqueue. So we have to
+	 * distinguish each work to avoid false positive.
+	 *
+	 * TODO: We can also add dependencies between two acquisitions
+	 * of different work_id, if they don't cause a sleep that
+	 * stalls the worker.
+	 */
+	unsigned int		work_id;
+
+	/*
+	 * Separate stack_trace data. This will be used at the commit step.
+	 */
+	struct stack_trace	trace;
+	unsigned long		trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
+
+	/*
+	 * struct held_lock does not have an indicator whether in nmi.
+	 */
+	int nmi;
+
+	/*
+	 * Separate hlock instance. This will be used at the commit step.
+	 *
+	 * TODO: Use a smaller data structure containing only necessary
+	 * data. However, we should make lockdep code able to handle the
+	 * smaller one first.
+	 */
+	struct held_lock	hlock;
 };
 
 /*
+ * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
+ * be called instead of lockdep_init_map().
+ */
+struct cross_lock {
+	/*
+	 * When more than one acquisition of crosslocks are overlapped,
+	 * we do actual commit only when ref == 0.
+	 */
+	atomic_t ref;
+
+	/*
+	 * Separate hlock instance. This will be used at the commit step.
+	 *
+	 * TODO: Use a smaller data structure containing only necessary
+	 * data. However, we should make lockdep code able to handle the
+	 * smaller one first.
+	 */
+	struct held_lock	hlock;
+};
+
+struct lockdep_map_cross {
+	struct lockdep_map map;
+	struct cross_lock xlock;
+};
+#endif
+
+/*
  * Initialization, self-test and debugging-output methods:
  */
 extern void lockdep_info(void);
@@ -281,6 +372,37 @@ struct held_lock {
 extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 			     struct lock_class_key *key, int subclass);
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
+				       const char *name,
+				       struct lock_class_key *key,
+				       int subclass);
+extern void lock_commit_crosslock(struct lockdep_map *lock);
+
+/*
+ * What we essentially have to initialize is 'ref'. Other members will
+ * be initialized in add_xlock().
+ */
+#define STATIC_CROSS_LOCK_INIT() \
+	{ .ref = ATOMIC_INIT(0),}
+
+#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .map.name = (_name), .map.key = (void *)(_key), \
+	  .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
+
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .name = (_name), .key = (void *)(_key), .cross = 0, }
+
+extern void crossrelease_hardirq_start(void);
+extern void crossrelease_hardirq_end(void);
+extern void crossrelease_softirq_start(void);
+extern void crossrelease_softirq_end(void);
+extern void crossrelease_work_start(void);
+#else
 /*
  * To initialize a lockdep_map statically use this macro.
  * Note that _name must not be NULL.
@@ -288,6 +410,13 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
 	{ .name = (_name), .key = (void *)(_key), }
 
+static inline void crossrelease_hardirq_start(void) {}
+static inline void crossrelease_hardirq_end(void) {}
+static inline void crossrelease_softirq_start(void) {}
+static inline void crossrelease_softirq_end(void) {}
+static inline void crossrelease_work_start(void) {}
+#endif
+
 /*
  * Reinitialize a lock key - for cases where there is special locking or
  * special initialization of locks so that the validator gets the scope
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e9c009d..e7bcae8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1749,6 +1749,15 @@ struct task_struct {
 	struct held_lock held_locks[MAX_LOCK_DEPTH];
 	gfp_t lockdep_reclaim_gfp;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCKS_NR 64UL
+	struct hist_lock *xhlocks; /* Crossrelease history locks */
+	int xhlock_idx;
+	int xhlock_idx_soft; /* For backing up at softirq entry */
+	int xhlock_idx_hard; /* For backing up at hardirq entry */
+	int xhlock_idx_nmi; /* For backing up at nmi entry */
+	unsigned int work_id;
+#endif
 #ifdef CONFIG_UBSAN
 	unsigned int in_ubsan;
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 3076f30..1bba1ab 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -54,6 +54,7 @@
 #include <linux/writeback.h>
 #include <linux/shm.h>
 #include <linux/kcov.h>
+#include <linux/vmalloc.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -883,6 +884,14 @@ void __noreturn do_exit(long code)
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	if (tsk->xhlocks) {
+		void *tmp = tsk->xhlocks;
+		/* Disable crossrelease for current */
+		tsk->xhlocks = NULL;
+		vfree(tmp);
+	}
+#endif
 	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
diff --git a/kernel/fork.c b/kernel/fork.c
index 997ac1d..1eda5cd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -451,6 +451,13 @@ void __init fork_init(void)
 	for (i = 0; i < UCOUNT_COUNTS; i++) {
 		init_user_ns.ucount_max[i] = max_threads/2;
 	}
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * TODO: We need to make init_task also use crossrelease. Now,
+	 * just disable the feature for init_task.
+	 */
+	init_task.xhlocks = NULL;
+#endif
 }
 
 int __weak arch_dup_task_struct(struct task_struct *dst,
@@ -1611,6 +1618,14 @@ static __latent_entropy struct task_struct *copy_process(
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	p->xhlock_idx = 0;
+	p->xhlock_idx_soft = 0;
+	p->xhlock_idx_hard = 0;
+	p->xhlock_idx_nmi = 0;
+	p->xhlocks = vzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR);
+	p->work_id = 0;
+#endif
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
@@ -1856,6 +1871,14 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_perf:
 	perf_event_free_task(p);
 bad_fork_cleanup_policy:
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	if (p->xhlocks) {
+		void *tmp = p->xhlocks;
+		/* Disable crossrelease for current */
+		p->xhlocks = NULL;
+		vfree(tmp);
+	}
+#endif
 #ifdef CONFIG_NUMA
 	mpol_put(p->mempolicy);
 bad_fork_cleanup_threadgroup_lock:
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 75dc14a..0621b3e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -717,6 +717,18 @@ static int count_matching_names(struct lock_class *new_class)
 	return NULL;
 }
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static void cross_init(struct lockdep_map *lock, int cross);
+static int cross_lock(struct lockdep_map *lock);
+static int lock_acquire_crosslock(struct held_lock *hlock);
+static int lock_release_crosslock(struct lockdep_map *lock);
+#else
+static inline void cross_init(struct lockdep_map *lock, int cross) {}
+static inline int cross_lock(struct lockdep_map *lock) { return 0; }
+static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 0; }
+static inline int lock_release_crosslock(struct lockdep_map *lock) { return 0; }
+#endif
+
 /*
  * Register a lock's class in the hash-table, if the class is not present
  * yet. Otherwise we look it up. We cache the result in the lock object
@@ -1776,6 +1788,9 @@ static inline void inc_chains(void)
 		if (nest)
 			return 2;
 
+		if (cross_lock(prev->instance))
+			continue;
+
 		return print_deadlock_bug(curr, prev, next);
 	}
 	return 1;
@@ -1929,29 +1944,35 @@ static inline void inc_chains(void)
 		int distance = curr->lockdep_depth - depth + 1;
 		hlock = curr->held_locks + depth - 1;
 		/*
-		 * Only non-recursive-read entries get new dependencies
-		 * added:
+		 * Only non-crosslock entries get new dependencies added.
+		 * Crosslock entries will be added by commit later:
 		 */
-		if (hlock->read != 2 && hlock->check) {
-			if (!check_prev_add(curr, hlock, next,
-						distance, &trace, save))
-				return 0;
-
+		if (!cross_lock(hlock->instance)) {
 			/*
-			 * Stop saving stack_trace if save_trace() was
-			 * called at least once:
+			 * Only non-recursive-read entries get new dependencies
+			 * added:
 			 */
-			if (save && start_nr != nr_stack_trace_entries)
-				save = NULL;
+			if (hlock->read != 2 && hlock->check) {
+				if (!check_prev_add(curr, hlock, next,
+							distance, &trace, save))
+					return 0;
 
-			/*
-			 * Stop after the first non-trylock entry,
-			 * as non-trylock entries have added their
-			 * own direct dependencies already, so this
-			 * lock is connected to them indirectly:
-			 */
-			if (!hlock->trylock)
-				break;
+				/*
+				 * Stop saving stack_trace if save_trace() was
+				 * called at least once:
+				 */
+				if (save && start_nr != nr_stack_trace_entries)
+					save = NULL;
+
+				/*
+				 * Stop after the first non-trylock entry,
+				 * as non-trylock entries have added their
+				 * own direct dependencies already, so this
+				 * lock is connected to them indirectly:
+				 */
+				if (!hlock->trylock)
+					break;
+			}
 		}
 		depth--;
 		/*
@@ -3203,7 +3224,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
+static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
 		      struct lock_class_key *key, int subclass)
 {
 	int i;
@@ -3261,8 +3282,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
+
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	cross_init(lock, 0);
+	__lockdep_init_map(lock, name, key, subclass);
+}
 EXPORT_SYMBOL_GPL(lockdep_init_map);
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	cross_init(lock, 1);
+	__lockdep_init_map(lock, name, key, subclass);
+}
+EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
+#endif
+
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
 
@@ -3366,7 +3404,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes + 1;
 
-	if (depth) {
+	/* TODO: nest_lock is not implemented for crosslock yet. */
+	if (depth && !cross_lock(lock)) {
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (hlock->references)
@@ -3447,6 +3486,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
 		return 0;
 
+	if (lock_acquire_crosslock(hlock))
+		return 1;
+
 	curr->curr_chain_key = chain_key;
 	curr->lockdep_depth++;
 	check_chain_key(curr);
@@ -3615,6 +3657,9 @@ static int match_held_lock(struct held_lock *hlock, struct lockdep_map *lock)
 	if (unlikely(!debug_locks))
 		return 0;
 
+	if (lock_release_crosslock(lock))
+		return 1;
+
 	depth = curr->lockdep_depth;
 	/*
 	 * So we're all set to release this lock.. wait what lock? We don't
@@ -4557,3 +4602,398 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	dump_stack();
 }
 EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+
+#define idx(t)			((t)->xhlock_idx)
+#define idx_prev(i)		((i) ? (i) - 1 : MAX_XHLOCKS_NR - 1)
+#define idx_next(i)		(((i) + 1) % MAX_XHLOCKS_NR)
+
+/* For easy access to xhlock */
+#define xhlock(t, i)		((t)->xhlocks + (i))
+#define xhlock_prev(t, l)	xhlock(t, idx_prev((l) - (t)->xhlocks))
+#define xhlock_curr(t)		xhlock(t, idx(t))
+#define xhlock_incr(t)		({idx(t) = idx_next(idx(t));})
+
+/*
+ * Whenever a crosslock is held, cross_gen_id will be increased.
+ */
+static atomic_t cross_gen_id; /* Can be wrapped */
+
+void crossrelease_hardirq_start(void)
+{
+	if (current->xhlocks) {
+		if (preempt_count() & NMI_MASK)
+			current->xhlock_idx_nmi = current->xhlock_idx;
+		else
+			current->xhlock_idx_hard = current->xhlock_idx;
+	}
+}
+
+void crossrelease_hardirq_end(void)
+{
+	if (current->xhlocks) {
+		if (preempt_count() & NMI_MASK)
+			current->xhlock_idx = current->xhlock_idx_nmi;
+		else
+			current->xhlock_idx = current->xhlock_idx_hard;
+	}
+}
+
+void crossrelease_softirq_start(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx_soft = current->xhlock_idx;
+}
+
+void crossrelease_softirq_end(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx = current->xhlock_idx_soft;
+}
+
+/*
+ * Crossrelease needs to distinguish each work of workqueues.
+ * Caller is supposed to be a worker.
+ */
+void crossrelease_work_start(void)
+{
+	if (current->xhlocks)
+		current->work_id++;
+}
+
+static int cross_lock(struct lockdep_map *lock)
+{
+	return lock ? lock->cross : 0;
+}
+
+/*
+ * This is needed to decide the relationship between wrapable variables.
+ */
+static inline int before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
+
+static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
+{
+	return hlock_class(&xhlock->hlock);
+}
+
+static inline struct lock_class *xlock_class(struct cross_lock *xlock)
+{
+	return hlock_class(&xlock->hlock);
+}
+
+/*
+ * Should we check a dependency with previous one?
+ */
+static inline int depend_before(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check && !hlock->trylock;
+}
+
+/*
+ * Should we check a dependency with next one?
+ */
+static inline int depend_after(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check;
+}
+
+/*
+ * Check if the xhlock is used at least once after initialization.
+ * Remember that hist_lock is implemented as a ring buffer.
+ */
+static inline int xhlock_used(struct hist_lock *xhlock)
+{
+	/*
+	 * xhlock->hlock.instance must be !NULL if it's used.
+	 */
+	return !!xhlock->hlock.instance;
+}
+
+/*
+ * Get a hist_lock from hist_lock ring buffer.
+ *
+ * Only access local task's data, so irq disable is only required.
+ */
+static struct hist_lock *alloc_xhlock(void)
+{
+	struct task_struct *curr = current;
+	struct hist_lock *xhlock = xhlock_curr(curr);
+
+	xhlock_incr(curr);
+	return xhlock;
+}
+
+/*
+ * Only access local task's data, so irq disable is only required.
+ */
+static void add_xhlock(struct held_lock *hlock, unsigned int prev_gen_id)
+{
+	struct hist_lock *xhlock;
+
+	xhlock = alloc_xhlock();
+
+	/* Initialize hist_lock's members */
+	xhlock->hlock = *hlock;
+	xhlock->nmi = !!(preempt_count() & NMI_MASK);
+	/*
+	 * prev_gen_id is used to skip adding dependency at commit step,
+	 * when the previous lock in held_locks can do that instead.
+	 */
+	xhlock->prev_gen_id = prev_gen_id;
+	xhlock->work_id = current->work_id;
+
+	xhlock->trace.nr_entries = 0;
+	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
+	xhlock->trace.entries = xhlock->trace_entries;
+	xhlock->trace.skip = 3;
+	save_stack_trace(&xhlock->trace);
+}
+
+/*
+ * Only access local task's data, so irq disable is only required.
+ */
+static int same_context_xhlock(struct hist_lock *xhlock)
+{
+	struct task_struct *curr = current;
+
+	/* In the case of nmi context */
+	if (preempt_count() & NMI_MASK) {
+		if (xhlock->nmi)
+			return 1;
+	/* In the case of hardirq context */
+	} else if (curr->hardirq_context) {
+		if (xhlock->hlock.irq_context & 2) /* 2: bitmask for hardirq */
+			return 1;
+	/* In the case of softirq context */
+	} else if (curr->softirq_context) {
+		if (xhlock->hlock.irq_context & 1) /* 1: bitmask for softirq */
+			return 1;
+	/* In the case of process context */
+	} else {
+		if (xhlock->work_id == curr->work_id)
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * This should be lockless as far as possible because this would be
+ * called very frequently.
+ */
+static void check_add_xhlock(struct held_lock *hlock)
+{
+	struct held_lock *prev;
+	struct held_lock *start;
+	unsigned int gen_id;
+	unsigned int gen_id_invalid;
+
+	if (!current->xhlocks || !depend_before(hlock))
+		return;
+
+	gen_id = (unsigned int)atomic_read(&cross_gen_id);
+	/*
+	 * gen_id_invalid must be too old to be valid. That means
+	 * current hlock should not be skipped but should be
+	 * considered at commit step.
+	 */
+	gen_id_invalid = gen_id - (UINT_MAX / 4);
+	start = current->held_locks;
+
+	for (prev = hlock - 1; prev >= start &&
+			!depend_before(prev); prev--);
+
+	if (prev < start)
+		add_xhlock(hlock, gen_id_invalid);
+	else if (prev->gen_id != gen_id)
+		add_xhlock(hlock, prev->gen_id);
+}
+
+/*
+ * For crosslock.
+ */
+static int add_xlock(struct held_lock *hlock)
+{
+	struct cross_lock *xlock;
+	unsigned int gen_id;
+
+	if (!depend_after(hlock))
+		return 1;
+
+	if (!graph_lock())
+		return 0;
+
+	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
+
+	/*
+	 * When acquisitions for a xlock are overlapped, we use
+	 * a reference counter to handle it.
+	 */
+	if (atomic_inc_return(&xlock->ref) > 1)
+		goto unlock;
+
+	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
+	xlock->hlock = *hlock;
+	xlock->hlock.gen_id = gen_id;
+unlock:
+	graph_unlock();
+	return 1;
+}
+
+/*
+ * return 0: Need to do normal acquire operation.
+ * return 1: Done. No more acquire ops are needed.
+ */
+static int lock_acquire_crosslock(struct held_lock *hlock)
+{
+	/*
+	 *	CONTEXT 1		CONTEXT 2
+	 *	---------		---------
+	 *	lock A (cross)
+	 *	X = atomic_inc_return()
+	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ serialize
+	 *				Y = atomic_read_acquire()
+	 *				lock B
+	 *
+	 * atomic_read_acquire() is for ordering between this and all
+	 * following locks. This way, we ensure the order A -> B when
+	 * CONTEXT 2 can see that, Y is equal to or greater than X.
+	 *
+	 * Pairs with atomic_inc_return() in add_xlock().
+	 */
+	hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
+
+	if (cross_lock(hlock->instance))
+		return add_xlock(hlock);
+
+	check_add_xhlock(hlock);
+	return 0;
+}
+
+static int copy_trace(struct stack_trace *trace)
+{
+	unsigned long *buf = stack_trace + nr_stack_trace_entries;
+	unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+	unsigned int nr = min(max_nr, trace->nr_entries);
+
+	trace->nr_entries = nr;
+	memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
+	trace->entries = buf;
+	nr_stack_trace_entries += nr;
+
+	if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
+		dump_stack();
+
+		return 0;
+	}
+
+	return 1;
+}
+
+static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
+{
+	unsigned int xid, pid;
+	u64 chain_key;
+
+	xid = xlock_class(xlock) - lock_classes;
+	chain_key = iterate_chain_key((u64)0, xid);
+	pid = xhlock_class(xhlock) - lock_classes;
+	chain_key = iterate_chain_key(chain_key, pid);
+
+	if (lookup_chain_cache(chain_key))
+		return 1;
+
+	if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
+				chain_key))
+		return 0;
+
+	if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
+			    &xhlock->trace, copy_trace))
+		return 0;
+
+	return 1;
+}
+
+static int commit_xhlocks(struct cross_lock *xlock)
+{
+	struct task_struct *curr = current;
+	struct hist_lock *xhlock_c = xhlock_curr(curr);
+	struct hist_lock *xhlock = xhlock_c;
+
+	do {
+		xhlock = xhlock_prev(curr, xhlock);
+
+		if (!xhlock_used(xhlock))
+			break;
+
+		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+			break;
+
+		if (same_context_xhlock(xhlock) &&
+		    before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
+		    !commit_xhlock(xlock, xhlock))
+			return 0;
+	} while (xhlock_c != xhlock);
+
+	return 1;
+}
+
+void lock_commit_crosslock(struct lockdep_map *lock)
+{
+	struct cross_lock *xlock;
+	unsigned long flags;
+
+	if (!current->xhlocks)
+		return;
+
+	if (unlikely(current->lockdep_recursion))
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+	current->lockdep_recursion = 1;
+
+	if (unlikely(!debug_locks))
+		return;
+
+	if (!graph_lock())
+		return;
+
+	xlock = &((struct lockdep_map_cross *)lock)->xlock;
+	if (atomic_read(&xlock->ref) > 0 && !commit_xhlocks(xlock))
+		return;
+
+	graph_unlock();
+	current->lockdep_recursion = 0;
+	raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_commit_crosslock);
+
+/*
+ * return 0: Need to do normal release operation.
+ * return 1: Done. No more release ops are needed.
+ */
+static int lock_release_crosslock(struct lockdep_map *lock)
+{
+	if (cross_lock(lock)) {
+		atomic_dec(&((struct lockdep_map_cross *)lock)->xlock.ref);
+		return 1;
+	}
+	return 0;
+}
+
+static void cross_init(struct lockdep_map *lock, int cross)
+{
+	if (cross)
+		atomic_set(&((struct lockdep_map_cross *)lock)->xlock.ref, 0);
+
+	lock->cross = cross;
+}
+#endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 479d840..b4a451f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2034,6 +2034,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
 	struct lockdep_map lockdep_map;
 
 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
+	crossrelease_work_start();
 #endif
 	/* ensure we're on the correct CPU */
 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a6c8db1..7890661 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1042,6 +1042,19 @@ config DEBUG_LOCK_ALLOC
 	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
 	 held during task exit.
 
+config LOCKDEP_CROSSRELEASE
+	bool "Lock debugging: make lockdep work for crosslocks"
+	select LOCKDEP
+	select TRACE_IRQFLAGS
+	default n
+	help
+	 This makes lockdep work for crosslocks, which are locks allowed to
+	 be released in a different context from the acquisition context.
+	 Normally a lock must be released in the context that acquired it.
+	 However, relaxing this constraint allows synchronization primitives
+	 such as page locks or completions to use the lock correctness
+	 detector, lockdep.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1

* [PATCH v5 07/13] lockdep: Make print_circular_bug() aware of crossrelease
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (5 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 08/13] lockdep: Apply crossrelease to completions Byungchul Park
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

print_circular_bug(), which reports a circular dependency, assumes that
the target hlock is owned by the current task. However, with
crossrelease, the target hlock can be owned by a task other than the
current one. So the report format needs to be changed to reflect that.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 56 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 0621b3e..49b9386 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1129,22 +1129,41 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
 		printk(KERN_CONT "\n\n");
 	}
 
-	printk(" Possible unsafe locking scenario:\n\n");
-	printk("       CPU0                    CPU1\n");
-	printk("       ----                    ----\n");
-	printk("  lock(");
-	__print_lock_name(target);
-	printk(KERN_CONT ");\n");
-	printk("                               lock(");
-	__print_lock_name(parent);
-	printk(KERN_CONT ");\n");
-	printk("                               lock(");
-	__print_lock_name(target);
-	printk(KERN_CONT ");\n");
-	printk("  lock(");
-	__print_lock_name(source);
-	printk(KERN_CONT ");\n");
-	printk("\n *** DEADLOCK ***\n\n");
+	if (cross_lock(tgt->instance)) {
+		printk(" Possible unsafe locking scenario by crosslock:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(parent);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("                               unlock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
+	} else {
+		printk(" Possible unsafe locking scenario:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(parent);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
+	}
 }
 
 /*
@@ -1169,7 +1188,10 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
 	printk("%s/%d is trying to acquire lock:\n",
 		curr->comm, task_pid_nr(curr));
 	print_lock(check_src);
-	printk("\nbut task is already holding lock:\n");
+	if (cross_lock(check_tgt->instance))
+		printk("\nbut now in the release context of lock:\n");
+	else
+		printk("\nbut task is already holding lock:\n");
 	print_lock(check_tgt);
 	printk("\nwhich lock already depends on the new lock.\n\n");
 	printk("\nthe existing dependency chain (in reverse order) is:\n");
-- 
1.9.1

* [PATCH v5 08/13] lockdep: Apply crossrelease to completions
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (6 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 07/13] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 09/13] pagemap.h: Remove trailing white space Byungchul Park
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

wait_for_completion() and its family can cause deadlocks. Nevertheless,
they could not use the lock correctness validator until now, because
complete() is called in a different context from the waiting context,
which we believed violated lockdep's assumptions. That is no longer
the case.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to completion operations, so apply it.
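
For example, a scenario like example 3 in the cover letter is now
checked. A sketch, where the mutex and the completion are made up for
illustration:

	static DEFINE_MUTEX(lock);
	static DECLARE_COMPLETION(done);	/* map set up by COMPLETION_INITIALIZER */

	/* Thread 1 */
	mutex_lock(&lock);
	wait_for_completion(&done);	/* waits while holding 'lock' */
	mutex_unlock(&lock);

	/* Thread 2 */
	mutex_lock(&lock);		/* blocks behind thread 1 ... */
	complete(&done);		/* ... which waits for this: deadlock */
	mutex_unlock(&lock);

This corresponds to example 3 in the cover letter, which the original
lockdep could not detect.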

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h | 118 +++++++++++++++++++++++++++++++++++++++++----
 kernel/sched/completion.c  |  54 ++++++++++++---------
 lib/Kconfig.debug          |   8 +++
 3 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..8469476 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
 struct completion {
 	unsigned int done;
 	wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+	struct lockdep_map_cross map;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+	lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+	lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+	lock_commit_crosslock((struct lockdep_map *)&x->map);
+}
+
+#define init_completion(x)						\
+do {									\
+	static struct lock_class_key __key;				\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map,	\
+			"(complete)" #x,				\
+			&__key, 0);					\
+	__init_completion(x);						\
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+	STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
 #define COMPLETION_INITIALIZER(work) \
 	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
 	({ init_completion(&work); work; })
@@ -70,7 +113,7 @@ struct completion {
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
 	init_waitqueue_head(&x->wait);
@@ -88,18 +131,75 @@ static inline void reinit_completion(struct completion *x)
 	x->done = 0;
 }
 
-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
 						    unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
 	struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
+
+static inline void wait_for_completion(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion(x);
+	complete_release(x);
+}
+
+static inline void wait_for_completion_io(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion_io(x);
+	complete_release(x);
+}
+
+static inline int wait_for_completion_interruptible(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_interruptible(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline int wait_for_completion_killable(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_killable(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline unsigned long wait_for_completion_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_timeout(x, timeout);
+}
+
+static inline unsigned long wait_for_completion_io_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_io_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_interruptible_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_interruptible_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_killable_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_killable_timeout(x, timeout);
+}
+
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 8d0f35d..847b1d4 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -31,6 +31,10 @@ void complete(struct completion *x)
 	unsigned long flags;
 
 	spin_lock_irqsave(&x->wait.lock, flags);
+	/*
+	 * Perform commit of crossrelease here.
+	 */
+	complete_release_commit(x);
 	x->done++;
 	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
 	spin_unlock_irqrestore(&x->wait.lock, flags);
@@ -108,7 +112,7 @@ void complete_all(struct completion *x)
 }
 
 /**
- * wait_for_completion: - waits for completion of a task
+ * __wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
@@ -117,14 +121,14 @@ void complete_all(struct completion *x)
  * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
  * and interrupt capability. Also see complete().
  */
-void __sched wait_for_completion(struct completion *x)
+void __sched __wait_for_completion(struct completion *x)
 {
 	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(__wait_for_completion);
 
 /**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -136,28 +140,28 @@ void __sched wait_for_completion(struct completion *x)
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(__wait_for_completion_timeout);
 
 /**
- * wait_for_completion_io: - waits for completion of a task
+ * __wait_for_completion_io: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
  * interruptible and there is no timeout. The caller is accounted as waiting
  * for IO (which traditionally means blkio only).
  */
-void __sched wait_for_completion_io(struct completion *x)
+void __sched __wait_for_completion_io(struct completion *x)
 {
 	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(__wait_for_completion_io);
 
 /**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -170,14 +174,14 @@ void __sched wait_for_completion_io(struct completion *x)
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(__wait_for_completion_io_timeout);
 
 /**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * __wait_for_completion_interruptible: - waits for completion of a task (w/intr)
  * @x:  holds the state of this particular completion
  *
  * This waits for completion of a specific task to be signaled. It is
@@ -185,17 +189,18 @@ void __sched wait_for_completion_io(struct completion *x)
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched __wait_for_completion_interruptible(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(__wait_for_completion_interruptible);
 
 /**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * __wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -206,15 +211,15 @@ int __sched wait_for_completion_interruptible(struct completion *x)
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+__wait_for_completion_interruptible_timeout(struct completion *x,
 					  unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(__wait_for_completion_interruptible_timeout);
 
 /**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * __wait_for_completion_killable: - waits for completion of a task (killable)
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It can be
@@ -222,17 +227,18 @@ int __sched wait_for_completion_interruptible(struct completion *x)
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched __wait_for_completion_killable(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(__wait_for_completion_killable);
 
 /**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * __wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -244,12 +250,12 @@ int __sched wait_for_completion_killable(struct completion *x)
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+__wait_for_completion_killable_timeout(struct completion *x,
 				     unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_KILLABLE);
 }
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(__wait_for_completion_killable_timeout);
 
 /**
  *	try_wait_for_completion - try to decrement a completion without blocking
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7890661..ad172b5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1055,6 +1055,14 @@ config LOCKDEP_CROSSRELEASE
 	 such as page locks or completions can use the lock correctness
 	 detector, lockdep.
 
+config LOCKDEP_COMPLETE
+	bool "Lock debugging: allow completions to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 A deadlock caused by wait_for_completion() and complete() can be
+	 detected by lockdep using crossrelease feature.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 09/13] pagemap.h: Remove trailing white space
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (7 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 08/13] lockdep: Apply crossrelease to completions Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 10/13] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Trailing whitespace is not accepted in the kernel coding style. Remove
it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/pagemap.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7dbe914..a8ee59a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -504,7 +504,7 @@ static inline void wake_up_page(struct page *page, int bit)
 	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
 }
 
-/* 
+/*
  * Wait for a page to be unlocked.
  *
  * This must be called with the caller "holding" the page,
@@ -517,7 +517,7 @@ static inline void wait_on_page_locked(struct page *page)
 		wait_on_page_bit(compound_head(page), PG_locked);
 }
 
-/* 
+/*
  * Wait for a page to complete writeback
  */
 static inline void wait_on_page_writeback(struct page *page)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 10/13] lockdep: Apply crossrelease to PG_locked locks
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (8 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 09/13] pagemap.h: Remove trailing white space Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 11/13] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

lock_page() and its family can cause deadlock. Nevertheless, they could
not use the lock correctness validator until now, because unlock_page()
might be called in a different context from the acquisition context,
which, we believed, violates lockdep's assumption. But it's not true.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to page locks. Apply it.
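
For instance, an inversion between a mutex and a page lock like the one
sketched below can now be reported. This is only an illustrative sketch;
the mutex 'a', the page and the two thread functions are hypothetical:

  #include <linux/mutex.h>
  #include <linux/pagemap.h>

  static DEFINE_MUTEX(a);               /* hypothetical mutex */

  static void thread_x(struct page *page)
  {
          mutex_lock(&a);
          lock_page(page);              /* waits for thread_y() */
          unlock_page(page);
          mutex_unlock(&a);
  }

  static void thread_y(struct page *page)  /* same page as thread_x() */
  {
          lock_page(page);
          mutex_lock(&a);               /* waits for thread_x(): DEADLOCK */
          mutex_unlock(&a);
          unlock_page(page);
  }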

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h |   8 ++++
 include/linux/pagemap.h  | 100 ++++++++++++++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug        |   8 ++++
 mm/filemap.c             |   4 +-
 mm/page_alloc.c          |   3 ++
 5 files changed, 115 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8aced..06adfa2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -16,6 +16,10 @@
 #include <asm/page.h>
 #include <asm/mmu.h>
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
+
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
 #endif
@@ -221,6 +225,10 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map_cross map;
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8ee59a..a3ecec1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,9 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
 
 /*
  * Bits in mapping->flags.
@@ -432,26 +435,90 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 	return pgoff;
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p)						\
+do {									\
+	static struct lock_class_key __key;				\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map,	\
+			"(PG_locked)" #p, &__key, 0);			\
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+	page = compound_head(page);
+	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+			       try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+	page = compound_head(page);
+	/*
+	 * lock_commit_crosslock() is necessary for crosslocks.
+	 */
+	lock_commit_crosslock((struct lockdep_map *)&page->map);
+	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);
 
-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+	lock_page_release(page);
+	do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
 {
 	page = compound_head(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
+static inline int trylock_page(struct page *page)
+{
+	if (do_raw_trylock_page(page)) {
+		lock_page_acquire(page, 1);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock_page may only be called if we have the page's inode pinned.
  */
 static inline void lock_page(struct page *page)
 {
 	might_sleep();
-	if (!trylock_page(page))
+
+	if (!do_raw_trylock_page(page))
 		__lock_page(page);
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and other locks can be serialized like:
+	 *
+	 *	CONTEXT 1		CONTEXT 2
+	 *	---------		---------
+	 *	lock A (cross)
+	 *	acquire A
+	 *	  atomic_inc_return()
+	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ serialize
+	 *				acquire B
+	 *				  atomic_read_acquire()
+	 *				lock B
+	 *
+	 * so that 'A -> B' can be seen globally.
+	 */
+	lock_page_acquire(page, 0);
 }
 
 /*
@@ -461,9 +528,20 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
+	int ret;
+
 	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
+
+	if (!do_raw_trylock_page(page)) {
+		ret = __lock_page_killable(page);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and other locks can be serialized.
+	 */
+	lock_page_acquire(page, 0);
 	return 0;
 }
 
@@ -478,7 +556,17 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
+		/*
+		 * acquire() must be after actual lock operation for crosslocks.
+		 * This way a crosslock and other locks can be serialized.
+		 */
+		lock_page_acquire(page, 0);
+		return 1;
+	}
+
+	return 0;
 }
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ad172b5..69364d0 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1063,6 +1063,14 @@ config LOCKDEP_COMPLETE
 	 A deadlock caused by wait_for_completion() and complete() can be
 	 detected by lockdep using crossrelease feature.
 
+config LOCKDEP_PAGELOCK
+	bool "Lock debugging: allow PG_locked lock to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 PG_locked lock is a kind of crosslock. Using crossrelease feature,
+	 PG_locked lock can work with runtime deadlock detector, lockdep.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
diff --git a/mm/filemap.c b/mm/filemap.c
index 50b52fe..d439cc7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -858,7 +858,7 @@ void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
  * The mb is necessary to enforce ordering between the clear_bit and the read
  * of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked()).
  */
-void unlock_page(struct page *page)
+void do_raw_unlock_page(struct page *page)
 {
 	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -866,7 +866,7 @@ void unlock_page(struct page *page)
 	smp_mb__after_atomic();
 	wake_up_page(page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(do_raw_unlock_page);
 
 /**
  * end_page_writeback - end writeback against a page
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440..36d5f9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5063,6 +5063,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+		lock_page_init(pfn_to_page(pfn));
+#endif
 	}
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 11/13] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (9 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 10/13] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 12/13] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

Usually the PG_locked bit is updated by lock_page() or unlock_page().
However, it can also be updated through __SetPageLocked() or
__ClearPageLocked(). They have to be considered as well, so that
acquisitions and releases get properly paired.

Furthermore, e.g. __SetPageLocked() in add_to_page_cache_lru() is called
frequently. We might miss many chances to check for deadlocks if we
ignore it. Make __Set(__Clear)PageLocked considered as well.
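
A typical pairing looks like the sketch below: a freshly allocated page
gets locked with the non-atomic helper while it's still invisible to
others, and is unlocked later, possibly from a different context. This
is only an illustrative sketch; alloc_locked_page() and
read_one_page_done() are hypothetical:

  #include <linux/gfp.h>
  #include <linux/pagemap.h>

  static struct page *alloc_locked_page(gfp_t gfp)
  {
          struct page *page = alloc_page(gfp);

          if (page)
                  __SetPageLocked(page);    /* counted as an acquisition */
          return page;
  }

  /* later, e.g. from an I/O completion path */
  static void read_one_page_done(struct page *page)
  {
          unlock_page(page);                /* counted as the release */
  }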

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/page-flags.h | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74e4dda..9d5f79d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -252,7 +252,6 @@ static __always_inline int PageCompound(struct page *page)
 #define TESTSCFLAG_FALSE(uname)						\
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
 	TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
@@ -354,6 +353,35 @@ static __always_inline int PageCompound(struct page *page)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+	page = compound_head(page);
+	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+}
+
+static __always_inline void __ClearPageLocked(struct page *page)
+{
+	__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+	page = compound_head(page);
+	/*
+	 * lock_commit_crosslock() is necessary for crosslock
+	 * when the lock is released, before lock_release().
+	 */
+	lock_commit_crosslock((struct lockdep_map *)&page->map);
+	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 12/13] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (10 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 11/13] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-18 13:17 ` [PATCH v5 13/13] lockdep: Crossrelease feature documentation Byungchul Park
  2017-02-20  8:38 ` [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

CONFIG_LOCKDEP_PAGELOCK needs to keep a lockdep_map_cross per page to
work with lockdep. Since it's a debug feature, it's preferable to keep
it in struct page_ext rather than in struct page. Move it to struct
page_ext.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h   |  4 ---
 include/linux/page-flags.h | 19 ++++++++++--
 include/linux/page_ext.h   |  4 +++
 include/linux/pagemap.h    | 28 +++++++++++++++---
 lib/Kconfig.debug          |  1 +
 mm/filemap.c               | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c            |  3 --
 mm/page_ext.c              |  4 +++
 8 files changed, 121 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 06adfa2..a6c7133 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -225,10 +225,6 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
-
-#ifdef CONFIG_LOCKDEP_PAGELOCK
-	struct lockdep_map_cross map;
-#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9d5f79d..cca33f5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -355,28 +355,41 @@ static __always_inline int PageCompound(struct page *page)
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include <linux/lockdep.h>
+#include <linux/page_ext.h>
 
 TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
 
 static __always_inline void __SetPageLocked(struct page *page)
 {
+	struct page_ext *e;
+
 	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
 	page = compound_head(page);
-	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
+	lock_acquire_exclusive((struct lockdep_map *)&e->map, 0, 1, NULL, _RET_IP_);
 }
 
 static __always_inline void __ClearPageLocked(struct page *page)
 {
+	struct page_ext *e;
+
 	__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
 	page = compound_head(page);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
 	/*
 	 * lock_commit_crosslock() is necessary for crosslock
 	 * when the lock is released, before lock_release().
 	 */
-	lock_commit_crosslock((struct lockdep_map *)&page->map);
-	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+	lock_commit_crosslock((struct lockdep_map *)&e->map);
+	lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
 }
 #else
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 9298c39..d1c52c8c 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -44,6 +44,10 @@ enum page_ext_flags {
  */
 struct page_ext {
 	unsigned long flags;
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map_cross map;
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a3ecec1..e38a5c8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,6 +16,7 @@
 #include <linux/hugetlb_inline.h>
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include <linux/lockdep.h>
+#include <linux/page_ext.h>
 #endif
 
 /*
@@ -436,28 +437,47 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
+extern struct page_ext_operations lockdep_pagelock_ops;
+
 #define lock_page_init(p)						\
 do {									\
 	static struct lock_class_key __key;				\
-	lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map,	\
+	struct page_ext *e = lookup_page_ext(p);		\
+								\
+	if (unlikely(!e))					\
+		break;						\
+								\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(e)->map,	\
 			"(PG_locked)" #p, &__key, 0);			\
 } while (0)
 
 static inline void lock_page_acquire(struct page *page, int try)
 {
+	struct page_ext *e;
+
 	page = compound_head(page);
-	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
+	lock_acquire_exclusive((struct lockdep_map *)&e->map, 0,
 			       try, NULL, _RET_IP_);
 }
 
 static inline void lock_page_release(struct page *page)
 {
+	struct page_ext *e;
+
 	page = compound_head(page);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
 	/*
 	 * lock_commit_crosslock() is necessary for crosslocks.
 	 */
-	lock_commit_crosslock((struct lockdep_map *)&page->map);
-	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+	lock_commit_crosslock((struct lockdep_map *)&e->map);
+	lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
 }
 #else
 static inline void lock_page_init(struct page *page) {}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69364d0..0b3118b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1066,6 +1066,7 @@ config LOCKDEP_COMPLETE
 config LOCKDEP_PAGELOCK
 	bool "Lock debugging: allow PG_locked lock to use deadlock detector"
 	select LOCKDEP_CROSSRELEASE
+	select PAGE_EXTENSION
 	default n
 	help
 	 PG_locked lock is a kind of crosslock. Using crossrelease feature,
diff --git a/mm/filemap.c b/mm/filemap.c
index d439cc7..03d669f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -35,6 +35,9 @@
 #include <linux/memcontrol.h>
 #include <linux/cleancache.h>
 #include <linux/rmap.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/page_ext.h>
+#endif
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -986,6 +989,75 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	}
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+static bool need_lockdep_pagelock(void) { return true; }
+
+static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
+{
+	struct page *page;
+	struct page_ext *page_ext;
+	unsigned long pfn = zone->zone_start_pfn;
+	unsigned long end_pfn = pfn + zone->spanned_pages;
+	unsigned long count = 0;
+
+	for (; pfn < end_pfn; pfn++) {
+		if (!pfn_valid(pfn)) {
+			pfn = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES);
+			continue;
+		}
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+
+		if (page_zone(page) != zone)
+			continue;
+
+		if (PageReserved(page))
+			continue;
+
+		page_ext = lookup_page_ext(page);
+		if (unlikely(!page_ext))
+			continue;
+
+		lock_page_init(page);
+		count++;
+	}
+
+	pr_info("Node %d, zone %8s: lockdep pagelock found early allocated %lu pages\n",
+		pgdat->node_id, zone->name, count);
+}
+
+static void init_zones_in_node(pg_data_t *pgdat)
+{
+	struct zone *zone;
+	struct zone *node_zones = pgdat->node_zones;
+	unsigned long flags;
+
+	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+		if (!populated_zone(zone))
+			continue;
+
+		spin_lock_irqsave(&zone->lock, flags);
+		init_pages_in_zone(pgdat, zone);
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+}
+
+static void init_lockdep_pagelock(void)
+{
+	pg_data_t *pgdat;
+	for_each_online_pgdat(pgdat)
+		init_zones_in_node(pgdat);
+}
+
+struct page_ext_operations lockdep_pagelock_ops = {
+	.need = need_lockdep_pagelock,
+	.init = init_lockdep_pagelock,
+};
+#endif
+
 /**
  * page_cache_next_hole - find the next hole (not-present entry)
  * @mapping: mapping
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 36d5f9e..6de9440 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5063,9 +5063,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
-#ifdef CONFIG_LOCKDEP_PAGELOCK
-		lock_page_init(pfn_to_page(pfn));
-#endif
 	}
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 121dcff..023ac65 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/pagemap.h>
 
 /*
  * struct page extension
@@ -68,6 +69,9 @@
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	&lockdep_pagelock_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v5 13/13] lockdep: Crossrelease feature documentation
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (11 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 12/13] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
@ 2017-01-18 13:17 ` Byungchul Park
  2017-01-20  9:08   ` [REVISED DOCUMENT] " Byungchul Park
  2017-02-20  8:38 ` [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
  13 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-01-18 13:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

This document describes the concept of the crossrelease feature.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 Documentation/locking/crossrelease.txt | 1053 ++++++++++++++++++++++++++++++++
 1 file changed, 1053 insertions(+)
 create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..dec890c
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,1053 @@
+Crossrelease
+============
+
+Started by Byungchul Park <byungchul.park@lge.com>
+
+Contents:
+
+ (*) Background.
+
+     - What causes deadlock.
+     - What lockdep detects.
+     - How lockdep works.
+
+ (*) Limitation.
+
+     - Limit to typical locks.
+     - Pros from the limitation.
+     - Cons from the limitation.
+
+ (*) Generalization.
+
+     - Relax the limitation.
+
+ (*) Crossrelease.
+
+     - Introduce crossrelease.
+     - Pick true dependencies.
+     - Introduce commit.
+
+ (*) Implementation.
+
+     - Data structures.
+     - How crossrelease works.
+
+ (*) Optimizations.
+
+     - Avoid duplication.
+     - Lockless for hot paths.
+
+
+==========
+Background
+==========
+
+What causes deadlock
+--------------------
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another (or the same) context that can
+trigger the event is also waiting for another (or the same) event to
+happen, which is also impossible for the same reason. One or more
+contexts participate in such a deadlock.
+
+For example,
+
+   A context going to trigger event D is waiting for event A to happen.
+   A context going to trigger event A is waiting for event B to happen.
+   A context going to trigger event B is waiting for event C to happen.
+   A context going to trigger event C is waiting for event D to happen.
+
+A deadlock occurs when these four wait operations run at the same time,
+because event D cannot be triggered if event A does not happen, which in
+turn cannot be triggered if event B does not happen, which in turn
+cannot be triggered if event C does not happen, which in turn cannot be
+triggered if event D does not happen. After all, no event can be
+triggered since any of them never meets its precondition to wake up.
+
+In terms of dependency, a wait for an event creates a dependency if the
+context is going to wake up another waiter by triggering a proper event.
+In other words, a dependency exists if,
+
+   COND 1. There are two waiters waiting for each event at the same time.
+   COND 2. Only way to wake up each waiter is to trigger its events.
+   COND 3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like,
+
+   Event D depends on event A.
+   Event A depends on event B.
+   Event B depends on event C.
+   Event C depends on event D.
+
+   NOTE: Precisely speaking, a dependency is one between whether a
+   waiter for an event can be woken up and whether another waiter for
+   another event can be woken up. However from now on, we will describe
+   a dependency as if it's one between an event and another event for
+   simplicity, so e.g. 'event D depends on event A'.
+
+And they form circular dependencies like,
+
+    -> D -> A -> B -> C -
+   /                     \
+   \                     /
+    ---------------------
+
+   where A, B,..., D are different events, and '->' represents 'depends
+   on'.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its precondition to wake up if they run simultaneously, as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+What lockdep detects
+--------------------
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, i.e. acquire and release. Waiting for a lock to be
+released corresponds to waiting for an event to happen, and releasing a
+lock corresponds to triggering an event. See 'What causes deadlock'
+section.
+
+A deadlock actually occurs when all wait operations creating circular
+dependencies run at the same time. Even if they don't, a potential
+deadlock exists if the problematic dependencies exist. Thus it's
+meaningful to detect not only an actual deadlock but also its potential
+possibility. Lockdep does both.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, what order contexts are switched in is a factor. Assuming
+circular dependencies exist, a deadlock would occur when contexts are
+switched so that all wait operations creating the problematic
+dependencies run simultaneously.
+
+To detect a potential possibility, which means a deadlock that has not
+happened yet but might happen in the future, lockdep considers all
+possible combinations of dependencies so that the possibility can be
+detected in advance. To do this, lockdep tries to,
+
+1. Use a global dependency graph.
+
+   Lockdep combines all dependencies into one global graph and uses them,
+   regardless of which context generates them or what order contexts are
+   switched in. Only the aggregated dependencies are considered, so they
+   are prone to form a circle if a problem exists.
+
+2. Check dependencies between classes instead of instances.
+
+   What actually causes a deadlock are instances of a lock. However,
+   lockdep checks dependencies between classes instead of instances.
+   This way lockdep can detect a deadlock which has not happened yet but
+   might happen in the future, caused by other instances of the same
+   classes.
+
+3. Assume all acquisitions lead to waiting.
+
+   Although a lock might be acquired without waiting, and waiting is
+   what actually creates a dependency, lockdep assumes all acquisitions
+   lead to waiting and generates dependencies, since waiting might really
+   happen at one time or another. Potential possibilities can be checked
+   in this way.
+
+Lockdep detects both an actual deadlock and its possibility. But the
+latter is more valuable than the former. When a deadlock actually occurs,
+we can identify what happens in the system by some means or other even
+without lockdep. However, there's no way to detect the possibility
+without lockdep unless the whole code is parsed in our heads, which is
+impractical.
+
+CONCLUSION
+
+Lockdep detects and reports,
+
+   1. A deadlock possibility.
+   2. A deadlock which actually occurred.
+
+
+How lockdep works
+-----------------
+
+Lockdep does,
+
+   1. Detect a new dependency created.
+   2. Keep the dependency in a global data structure, graph.
+   3. Check if circular dependencies exist.
+   4. Report a deadlock or its possibility if so.
+
+A graph built by lockdep looks like, e.g.
+
+   A -> B -        -> F -> G
+           \      /
+            -> E -        -> L
+           /      \      /
+   C -> D -        -> H -
+                         \
+                          -> I -> K
+                         /
+                      J -
+
+   where A, B,..., L are different lock classes.
+
+Lockdep will add a dependency into graph when a new dependency is
+detected. For example, it will add a dependency 'K -> J' when a new
+dependency between lock K and lock J is detected. Then the graph will be,
+
+   A -> B -        -> F -> G
+           \      /
+            -> E -        -> L
+           /      \      /
+   C -> D -        -> H -
+                         \
+                          -> I -> K -
+                         /           \
+                   -> J -             \
+                  /                   /
+                  \                  /
+                   ------------------
+
+   where A, B,..., L are different lock classes.
+
+Now, circular dependencies are detected like,
+
+           -> I -> K -
+          /           \
+    -> J -             \
+   /                   /
+   \                  /
+    ------------------
+
+   where J, I and K are different lock classes.
+
+As described in 'What causes deadlock', this is the condition under which
+a deadlock might occur. Lockdep detects a deadlock or its possibility by
+checking if circular dependencies were created after adding each new
+dependency into the global graph. This is how lockdep works.
+
+CONCLUSION
+
+Lockdep detects a deadlock or its possibility by checking if circular
+dependencies were created after adding each new dependency.
+
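+For reference, the simplest code pattern that makes lockdep report such
+a circle is two locks taken in opposite orders in two contexts. A
+minimal sketch (the mutexes j and k and the two functions are
+hypothetical, used only for illustration):
+
+   #include <linux/mutex.h>
+
+   static DEFINE_MUTEX(j);
+   static DEFINE_MUTEX(k);
+
+   static void context_one(void)
+   {
+           mutex_lock(&j);
+           mutex_lock(&k);         /* lockdep adds 'J -> K' */
+           mutex_unlock(&k);
+           mutex_unlock(&j);
+   }
+
+   static void context_two(void)
+   {
+           mutex_lock(&k);
+           mutex_lock(&j);         /* lockdep adds 'K -> J': circle found */
+           mutex_unlock(&j);
+           mutex_unlock(&k);
+   }
+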
+
+==========
+Limitation
+==========
+
+Limit to typical locks
+----------------------
+
+Limiting lockdep to checking dependencies only on typical locks e.g.
+spin locks and mutexes, which should be released within the acquire
+context, the implementation of detecting and adding dependencies becomes
+simple but its capacity for detection becomes limited. Let's check what
+its pros and cons are in the next section.
+
+CONCLUSION
+
+Limiting lockdep to working on typical locks e.g. spin locks and mutexes,
+the implementation becomes simple but its capacity becomes limited.
+
+
+Pros from the limitation
+------------------------
+
+Given the limitation, when acquiring a lock, the locks in held_locks of
+the context cannot be released if the context fails to acquire it and
+has to wait for it. It also makes waiters for the locks in held_locks
+stuck. This is exactly the case that creates a dependency 'A -> B',
+where lock A is each lock in held_locks and lock B is the lock to
+acquire. See 'What causes deadlock' section.
+
+For example,
+
+   CONTEXT X
+   ---------
+   acquire A
+
+   acquire B /* Add a dependency 'A -> B' */
+
+   acquire C /* Add a dependency 'B -> C' */
+
+   release C
+
+   release B
+
+   release A
+
+   where A, B and C are different lock classes.
+
+When acquiring lock A, the held_locks of CONTEXT X is empty thus no
+dependency is added. When acquiring lock B, lockdep detects and adds
+a new dependency 'A -> B' between lock A in held_locks and lock B. When
+acquiring lock C, lockdep also adds another dependency 'B -> C' for the
+same reason. They can be simply added whenever acquiring each lock.
+
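+Conceptually (this is not the actual lockdep code, and add_dependency()
+is a made-up placeholder for its real machinery), dependency creation at
+acquire time looks like:
+
+   /* 'next' is the lock the task is about to acquire */
+   static void add_deps_on_acquire(struct task_struct *curr,
+                                   struct held_lock *next)
+   {
+           int i;
+
+           /* each held lock cannot be released until 'next' is
+            * acquired, which creates a 'held -> next' dependency */
+           for (i = 0; i < curr->lockdep_depth; i++)
+                   add_dependency(&curr->held_locks[i], next);
+   }
+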
+And most data required by lockdep exists in a local structure, i.e.
+'task_struct -> held_locks'. By forcing those data to be accessed only
+within the owning context, lockdep can avoid races without explicit
+locks while handling the local data.
+
+Lastly, lockdep only needs to keep locks currently being held, to build
+the dependency graph. However relaxing the limitation, it might need to
+keep even locks already released, because the decision of whether they
+created dependencies might be long-deferred. See 'Crossrelease' section.
+
+To sum up, we can expect several advantages from the limitation.
+
+1. Lockdep can easily identify a dependency when acquiring a lock.
+2. Requiring only local locks makes many races avoidable.
+3. Lockdep only needs to keep locks currently being held.
+
+CONCLUSION
+
+Given the limitation, the implementation becomes simple and efficient.
+
+
+Cons from the limitation
+------------------------
+
+Given the limitation, lockdep is applicable only to typical locks. For
+example, page locks for page access or completions for synchronization
+cannot play with lockdep under the limitation.
+
+Can we detect deadlocks below, under the limitation?
+
+Example 1:
+
+   CONTEXT X		   CONTEXT Y
+   ---------		   ---------
+   mutex_lock A
+			   lock_page B
+   lock_page B
+			   mutex_lock A /* DEADLOCK */
+   unlock_page B
+			   mutex_unlock A
+   mutex_unlock A
+			   unlock_page B
+
+   where A is a lock class and B is a page lock.
+
+No, we cannot.
+
+Example 2:
+
+   CONTEXT X	   CONTEXT Y	   CONTEXT Z
+   ---------	   ---------	   ----------
+		   mutex_lock A
+   lock_page B
+		   lock_page B
+				   mutex_lock A /* DEADLOCK */
+				   mutex_unlock A
+				   unlock_page B held by X
+		   unlock_page B
+		   mutex_unlock A
+
+   where A is a lock class and B is a page lock.
+
+No, we cannot.
+
+Example 3:
+
+   CONTEXT X		   CONTEXT Y
+   ---------		   ---------
+			   mutex_lock A
+   mutex_lock A
+			   wait_for_complete B /* DEADLOCK */
+   mutex_unlock A
+   complete B
+			   mutex_unlock A
+
+   where A is a lock class and B is a completion variable.
+
+No, we cannot.
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock or its
+possibility caused by page locks or completions.
+
+
+==============
+Generalization
+==============
+
+Relax the limitation
+--------------------
+
+Under the limitation, things to create dependencies are limited to
+typical locks. However, e.g. page locks and completions which are not
+typical locks also create dependencies and cause a deadlock. Therefore
+it would be better for lockdep to detect a deadlock or its possibility
+even for them.
+
+Detecting and adding dependencies into graph is very important for
+lockdep to work because adding a dependency means adding a chance to
+check if it causes a deadlock. The more lockdep adds dependencies, the
+more it thoroughly works. Therefore Lockdep has to do its best to add as
+many true dependencies as possible into the graph.
+
+Relaxing the limitation, lockdep can add more dependencies since
+additional things e.g. page locks or completions create additional
+dependencies. However even so, it needs to be noted that the relaxation
+does not affect the behavior of adding dependencies for typical locks.
+
+For example, considering only typical locks, lockdep builds a graph like,
+
+   A -> B -        -> F -> G
+           \      /
+            -> E -        -> L
+           /      \      /
+   C -> D -        -> H -
+                         \
+                          -> I -> K
+                         /
+                      J -
+
+   where A, B,..., L are different lock classes.
+
+On the other hand, under the relaxation, additional dependencies might
+be created and added. Assuming additional 'MX -> H', 'L -> NX' and
+'OX -> J' dependencies are added thanks to the relaxation, the graph
+will be, giving additional chances to check circular dependencies,
+
+   A -> B -        -> F -> G
+           \      /
+            -> E -        -> L -> NX
+           /      \      /
+   C -> D -        -> H -
+                  /      \
+              MX -        -> I -> K
+                         /
+                   -> J -
+                  /
+              OX -
+
+   where A, B,..., L, MX, NX and OX are different lock classes, and
+   a suffix 'X' is added on non-typical locks e.g. page locks and
+   completions.
+
+However, this may degrade performance, since relaxing the limitation,
+which allowed an efficient design and implementation of lockdep,
+inevitably introduces some inefficiency. Each option, strong detection
+or efficient detection, has its pros and cons, thus the choice between
+the two options should be left to users.
+
+Choosing efficient detection, lockdep only deals with locks satisfying,
+
+   A lock should be released within the context holding the lock.
+
+Choosing strong detection, lockdep deals with any locks satisfying,
+
+   A lock can be released in any context.
+
+The latter, of course, doesn't allow illegal contexts to release a lock.
+For example, acquiring a lock in irq-safe context before releasing the
+lock in irq-unsafe context is not allowed, which after all ends in
+circular dependencies, meaning a deadlock. Otherwise, any contexts are
+allowed to release it.
+
+CONCLUSION
+
+Relaxing the limitation, lockdep can add additional dependencies and
+get additional chances to check if they cause deadlocks.
+
+
+============
+Crossrelease
+============
+
+Introduce crossrelease
+----------------------
+
+To allow lockdep to handle additional dependencies by what might be
+released in any context, namely 'crosslock', a new feature 'crossrelease'
+is introduced. Thanks to the feature, now lockdep can identify such
+dependencies. Crossrelease feature has to do,
+
+   1. Identify dependencies by crosslocks.
+   2. Add the dependencies into graph.
+
+That's all. Once a meaningful dependency is added into graph, then
+lockdep would work with the graph as it did. So the most important thing
+crossrelease feature has to do is to correctly identify and add true
+dependencies into the global graph.
+
+A dependency e.g. 'A -> B' can be identified only in the A's release
+context because a decision required to identify the dependency can be
+made only in the release context. That is to decide whether A can be
+released so that a waiter for A can be woken up. It cannot be made in
+other contexts than the A's release context. See 'What causes deadlock'
+section to remind what a dependency is.
+
+That's not a problem for typical locks because each acquire context is
+the same as its release context, thus lockdep can decide whether a lock
+can be released, in the acquire context. However for crosslocks, lockdep
+cannot make the decision in the acquire context but has to wait until
+the release context is identified.
+
+Therefore lockdep has to queue all acquisitions which might create
+dependencies until the decision can be made, so that they can be used
+when it proves they are the right ones. We call the step 'commit'. See
+'Introduce commit' section.
+
+Of course, some actual deadlocks caused by crosslocks cannot be detected
+at the very moment they happen, because the deadlocks cannot be
+identified until the crosslock is actually released. However, deadlock
+possibilities can still be detected in this way, and that is worthwhile.
+See 'What lockdep detects' section.
+
+CONCLUSION
+
+With crossrelease feature, lockdep can work with what might be released
+in any context, namely crosslock.
+
+
+Pick true dependencies
+----------------------
+
+Remind what a dependency is. A dependency exists if,
+
+   COND 1. There are two waiters waiting for each event at the same time.
+   COND 2. Only way to wake up each waiter is to trigger its events.
+   COND 3. Whether one can be woken up depends on whether the other can.
+
+For example,
+
+   TASK X
+   ------
+   acquire A
+
+   acquire B /* A dependency 'A -> B' exists */
+
+   acquire C /* A dependency 'B -> C' exists */
+
+   release C
+
+   release B
+
+   release A
+
+   where A, B and C are different lock classes.
+
+A dependency 'A -> B' exists since,
+
+   1. A waiter for A and a waiter for B might exist when acquiring B.
+   2. Only way to wake up each of them is to release what it waits for.
+   3. Whether the waiter for A can be woken up depends on whether the
+      other can. IOW, TASK X cannot release A if it cannot acquire B.
+
+Other dependencies 'B -> C' and 'A -> C' also exist for the same reason.
+But the second is ignored since it's covered by 'A -> B' and 'B -> C'.
+
+For another example,
+
+   TASK X			   TASK Y
+   ------			   ------
+				   acquire AX
+   acquire D
+   /* A dependency 'AX -> D' exists */
+				   acquire B
+   release D
+				   acquire C
+				   /* A dependency 'B -> C' exists */
+   acquire E
+   /* A dependency 'AX -> E' exists */
+				   acquire D
+				   /* A dependency 'C -> D' exists */
+   release E
+				   release D
+   release AX held by Y
+				   release C
+
+				   release B
+
+   where AX, B, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Even in this case involving crosslocks, the same rules can be applied. A
+dependency 'AX -> D' exists since,
+
+   1. A waiter for AX and a waiter for D might exist when acquiring D.
+   2. Only way to wake up each of them is to release what it waits for.
+   3. Whether the waiter for AX can be woken up depends on whether the
+      other can. IOW, TASK X cannot release AX if it cannot acquire D.
+
+The same rules can be applied to other dependencies, too.
+
+Let's take a look at a more complicated example.
+
+   TASK X			   TASK Y
+   ------			   ------
+   acquire B
+
+   release B
+
+   acquire C
+
+   release C
+   (1)
+   fork Y
+				   acquire AX
+   acquire D
+   /* A dependency 'AX -> D' exists */
+				   acquire F
+   release D
+				   acquire G
+				   /* A dependency 'F -> G' exists */
+   acquire E
+   /* A dependency 'AX -> E' exists */
+				   acquire H
+				   /* A dependency 'G -> H' exists */
+   release E
+				   release H
+   release AX held by Y
+				   release G
+
+				   release F
+
+   where AX, B, C,..., H are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Does a dependency 'AX -> B' exist? Nope.
+
+Two waiters, one is for AX and the other is for B, are essential
+elements to create the dependency 'AX -> B'. However in this example,
+these two waiters cannot exist at the same time. Thus the dependency
+'AX -> B' cannot be created.
+
+In fact, AX depends on all acquisitions after (1) in TASK X, i.e. D and
+E, but not on any acquisitions before (1) in the context, i.e. B and C.
+Thus only 'AX -> D' and 'AX -> E' are true dependencies by AX.
+
+It would be ideal if the full set of true ones could be added. But that
+would require parsing the whole code, which is impossible. Relying on
+what actually happens at runtime, we can at least add only true ones,
+even though they might be a subset of the full set. This way we can
+avoid adding false ones.
+
+It's similar to how lockdep works for typical locks. Ideally there might
+be more true dependencies than the ones in the global dependency graph,
+however, lockdep has no choice but to rely on what actually happens
+since otherwise it's almost impossible.
+
+CONCLUSION
+
+Relying on what actually happens, adding false dependencies can be
+avoided.
+
+
+Introduce commit
+----------------
+
+The crossrelease feature names the step of identifying and adding
+dependencies into the graph in batches 'commit'. Lockdep already does
+what commit is supposed to do when acquiring a lock, for typical locks.
+However, that way must be changed for crosslocks so that it identifies
+a crosslock's release context first, then does commit.
+
+There are four types of dependencies.
+
+1. TT type: 'Typical lock A -> Typical lock B' dependency
+
+   Just when acquiring B, lockdep can see it's in the A's release
+   context. So the dependency between A and B can be identified
+   immediately. Commit is unnecessary.
+
+2. TC type: 'Typical lock A -> Crosslock BX' dependency
+
+   Just when acquiring BX, lockdep can see it's in the A's release
+   context. So the dependency between A and BX can be identified
+   immediately. Commit is unnecessary, too.
+
+3. CT type: 'Crosslock AX -> Typical lock B' dependency
+
+   When acquiring B, lockdep cannot identify the dependency because
+   there's no way to know whether it's in the AX's release context. It
+   has to wait until the decision can be made. Commit is necessary.
+
+4. CC type: 'Crosslock AX -> Crosslock BX' dependency
+
+   If there is a typical lock acting as a bridge so that 'AX -> a lock'
+   and 'the lock -> BX' can be added, then this dependency can be
+   detected. But direct ways are not implemented yet. It's a future work.
+
+Lockdep works even without commit for typical locks. However, the commit
+step is necessary once crosslocks are involved, until all crosslocks in
+progress are released. With commit introduced, lockdep performs three
+steps, i.e. acquire, commit and release. What lockdep does in each step
+is as follows (see also the sketch after the list):
+
+1. Acquire
+
+   1) For typical lock
+
+      Lockdep does what it originally did and queues the lock so that
+      lockdep can check CT type dependencies using it at the commit step.
+
+   2) For crosslock
+
+      The crosslock is added to a global linked list so that lockdep can
+      check CT type dependencies using it at the commit step.
+
+2. Commit
+
+   1) For typical lock
+
+      N/A.
+
+   2) For crosslock
+
+      Lockdep checks and adds CT type dependencies using the data saved
+      at the acquire step.
+
+3. Release
+
+   1) For typical lock
+
+      No change.
+
+   2) For crosslock
+
+      Lockdep just removes the crosslock from the global linked list, to
+      which it was added at the acquire step.
+
+CONCLUSION
+
+The crossrelease feature introduces the commit step to handle, in
+batches, dependencies involving crosslocks, which lockdep cannot handle
+in its original way.
+
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease feature introduces two main data structures.
+
+1. pend_lock
+
+   This is an array embedded in task_struct, for keeping locks queued so
+   that real dependencies can be added using them at the commit step.
+   Since it's local data, it can be accessed locklessly in the owner
+   context. The array is filled at the acquire step and consumed at the
+   commit step. And it's managed in a circular manner (see the sketch
+   below).
+
+2. cross_lock
+
+   This is a global linked list, for keeping all crosslocks in progress.
+   The list grows at the acquire step and shrinks at the release step.
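+
+For illustration only, the two structures could look roughly like the
+below. The field names are hypothetical, not the ones used in the
+patches.
+
+   struct pend_lock {
+      struct held_lock hlock;       /* copy of the acquired lock     */
+      unsigned int     gen_id;      /* when it was queued            */
+   };
+
+   /* ring buffer embedded in task_struct */
+   struct pend_lock pend_locks[MAX_PEND_LOCKS];
+   unsigned int     pend_idx;       /* next slot to fill             */
+
+   struct cross_lock {
+      struct list_head xlock_entry; /* node in the global list       */
+      struct held_lock hlock;       /* the crosslock itself          */
+   };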
+
+CONCLUSION
+
+Crossrelease feature introduces two main data structures.
+
+1. A pend_lock array for queueing typical locks in a circular manner.
+2. A cross_lock linked list for managing crosslocks in progress.
+
+
+How crossrelease works
+----------------------
+
+Let's take a look at how the crossrelease feature works step by step,
+starting from how lockdep works without the crossrelease feature.
+
+For example, the below is how lockdep works for typical locks.
+
+   A's RELEASE CONTEXT (= A's ACQUIRE CONTEXT)
+   -------------------------------------------
+   acquire A
+
+   acquire B /* Add 'A -> B' */
+
+   acquire C /* Add 'B -> C' */
+
+   release C
+
+   release B
+
+   release A
+
+   where A, B and C are different lock classes.
+
+After adding 'A -> B', the dependency graph will be,
+
+   A -> B
+
+   where A and B are different lock classes.
+
+And after adding 'B -> C', the graph will be,
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+What if we use the commit step to add dependencies even for typical
+locks? The commit step is not necessary for them; however, it would
+still work well, because this is a more general way.
+
+   A's RELEASE CONTEXT (= A's ACQUIRE CONTEXT)
+   -------------------------------------------
+   acquire A
+   /*
+    * 1. Mark A as started
+    * 2. Queue A
+    *
+    * In pend_lock: A
+    * In graph: Empty
+    */
+
+   acquire B
+   /*
+    * 1. Mark B as started
+    * 2. Queue B
+    *
+    * In pend_lock: A, B
+    * In graph: Empty
+    */
+
+   acquire C
+   /*
+    * 1. Mark C as started
+    * 2. Queue C
+    *
+    * In pend_lock: A, B, C
+    * In graph: Empty
+    */
+
+   release C
+   /*
+    * 1. Commit C (= Add 'C -> ?')
+    *   a. What queued since C was marked: Nothing
+    *   b. Add nothing
+    *
+    * In pend_lock: A, B, C
+    * In graph: Empty
+    */
+
+   release B
+   /*
+    * 1. Commit B (= Add 'B -> ?')
+    *   a. What queued since B was marked: C
+    *   b. Add 'B -> C'
+    *
+    * In pend_lock: A, B, C
+    * In graph: 'B -> C'
+    */
+
+   release A
+   /*
+    * 1. Commit A (= Add 'A -> ?')
+    *   a. What queued since A was marked: B, C
+    *   b. Add 'A -> B'
+    *   c. Add 'A -> C'
+    *
+    * In pend_lock: A, B, C
+    * In graph: 'B -> C', 'A -> B', 'A -> C'
+    */
+
+   where A, B and C are different lock classes.
+
+After doing commit A, B and C, the dependency graph becomes like,
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+   NOTE: A dependency 'A -> C' is optimized out.
+
+We can see the former graph built without the commit step is the same as
+the latter graph built using commit steps. Of course the former way
+finishes building the graph earlier, which means we can detect a
+deadlock or its possibility sooner. So the former way would be preferred
+if possible. But we cannot avoid using the latter way for crosslocks.
+
+Let's look at how commit works for crosslocks.
+
+   AX's RELEASE CONTEXT		   AX's ACQUIRE CONTEXT
+   --------------------		   --------------------
+				   acquire AX
+				   /*
+				    * 1. Mark AX as started
+				    *
+				    * (No queuing for crosslocks)
+				    *
+				    * In pend_lock: Empty
+				    * In graph: Empty
+				    */
+
+   (serialized by some means e.g. barrier)
+
+   acquire D
+   /*
+    * (No marking for typical locks)
+    *
+    * 1. Queue D
+    *
+    * In pend_lock: D
+    * In graph: Empty
+    */
+				   acquire B
+				   /*
+				    * (No marking for typical locks)
+				    *
+				    * 1. Queue B
+				    *
+				    * In pend_lock: B
+				    * In graph: Empty
+				    */
+   release D
+   /*
+    * (No commit for typical locks)
+    *
+    * In pend_lock: D
+    * In graph: Empty
+    */
+				   acquire C
+				   /*
+				    * (No marking for typical locks)
+				    *
+				    * 1. Add 'B -> C' of TT type
+				    * 2. Queue C
+				    *
+				    * In pend_lock: B, C
+				    * In graph: 'B -> C'
+				    */
+   acquire E
+   /*
+    * (No marking for typical locks)
+    *
+    * 1. Queue E
+    *
+    * In pend_lock: D, E
+    * In graph: 'B -> C'
+    */
+				   acquire D
+				   /*
+				    * (No marking for typical locks)
+				    *
+				    * 1. Add 'C -> D' of TT type
+				    * 2. Queue D
+				    *
+				    * In pend_lock: B, C, D
+				    * In graph: 'B -> C', 'C -> D'
+				    */
+   release E
+   /*
+    * (No commit for typical locks)
+    *
+    * In pend_lock: D, E
+    * In graph: 'B -> C', 'C -> D'
+    */
+				   release D
+				   /*
+				    * (No commit for typical locks)
+				    *
+				    * In pend_lock: B, C, D
+				    * In graph: 'B -> C', 'C -> D'
+				    */
+   release AX
+   /*
+    * 1. Commit AX (= Add 'AX -> ?')
+    *   a. What queued since AX was marked: D, E
+    *   b. Add 'AX -> D' of CT type
+    *   c. Add 'AX -> E' of CT type
+    *
+    * In pend_lock: D, E
+    * In graph: 'B -> C', 'C -> D',
+    *           'AX -> D', 'AX -> E'
+    */
+				   release C
+				   /*
+				    * (No commit for typical locks)
+				    *
+				    * In pend_lock: B, C, D
+				    * In graph: 'B -> C', 'C -> D',
+				    *           'AX -> D', 'AX -> E'
+				    */
+
+				   release B
+				   /*
+				    * (No commit for typical locks)
+				    *
+				    * In pend_lock: B, C, D
+				    * In graph: 'B -> C', 'C -> D',
+				    *           'AX -> D', 'AX -> E'
+				    */
+
+   where AX, B, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+When acquiring crosslock AX, the crossrelease feature marks AX as
+started, which means all acquisitions from then on are candidates that
+might create dependencies with AX. True dependencies will be determined
+when identifying AX's release context.
+
+When acquiring typical lock B, lockdep queues B so that it can be used
+at the commit step later, since any crosslock in progress might depend
+on B. The same thing is done for locks C, D and E. Then the two
+dependencies 'AX -> D' and 'AX -> E' are added at the commit step, when
+identifying AX's release context.
+
+The final graph is, with crossrelease feature using commit,
+
+   B -> C -
+           \
+            -> D
+           /
+       AX -
+           \
+            -> E
+
+   where AX, B, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+However, without the crossrelease feature, the final graph would be,
+
+   B -> C -> D
+
+   where B, C and D are different lock classes.
+
+The former graph has two more dependencies, 'AX -> D' and 'AX -> E',
+giving additional chances to check whether they cause deadlocks. This
+way lockdep can detect a deadlock or its possibility caused by
+crosslocks. Again, the crossrelease feature does not affect how
+dependencies are added for typical locks.
+
+CONCLUSION
+
+Crossrelease works well for crosslocks, thanks to the commit step.
+
+
+=============
+Optimizations
+=============
+
+Avoid duplication
+-----------------
+
+The crossrelease feature uses a cache like the one lockdep already uses
+for dependency chains, but this time for caching dependencies of CT
+type, which cross two different contexts. Once such a dependency is
+cached, the same dependency will never be added again. Queueing
+unnecessary locks is also prevented based on the cache.
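+
+A minimal sketch of the idea, with hypothetical helper names:
+
+   u64 key = xhash(xlock_class(xlock), hlock_class(hlock));
+
+   if (lookup_xchain_cache(key))    /* 'xlock -> hlock' seen before */
+      return;                       /* skip adding and queueing     */
+
+   add_xchain_cache(key);
+   add_dependency(xlock, hlock);    /* add the CT type dependency   */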
+
+CONCLUSION
+
+Crossrelease does not add any duplicate dependencies.
+
+
+Lockless for hot paths
+----------------------
+
+To keep all typical locks for later use, the crossrelease feature adopts
+a local array embedded in task_struct, which makes accesses to the array
+lockless by forcing them to happen only within the owner context. It's
+like how lockdep accesses held_locks. A lockless implementation is
+important since typical locks are acquired and released very frequently.
+
+CONCLUSION
+
+Crossrelease is designed to use no lock for hot paths.
+
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache()
  2017-01-18 13:17 ` [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache() Byungchul Park
@ 2017-01-19  9:16   ` Boqun Feng
  2017-01-19  9:52     ` Byungchul Park
  2017-01-26  7:53     ` Byungchul Park
  0 siblings, 2 replies; 63+ messages in thread
From: Boqun Feng @ 2017-01-19  9:16 UTC (permalink / raw)
  To: Byungchul Park
  Cc: peterz, mingo, tglx, walken, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

[-- Attachment #1: Type: text/plain, Size: 2343 bytes --]

On Wed, Jan 18, 2017 at 10:17:27PM +0900, Byungchul Park wrote:
> Currently, lookup_chain_cache() provides both 'lookup' and 'add'
> functionalities in a function. However, each is useful. So this
> patch makes lookup_chain_cache() only do 'lookup' functionality and
> makes add_chain_cahce() only do 'add' functionality. And it's more
> readable than before.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/locking/lockdep.c | 129 +++++++++++++++++++++++++++++------------------
>  1 file changed, 81 insertions(+), 48 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 4d7ffc0..f37156f 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -2109,15 +2109,9 @@ static int check_no_collision(struct task_struct *curr,
>  	return 1;
>  }
>  
> -/*
> - * Look up a dependency chain. If the key is not present yet then
> - * add it and return 1 - in this case the new dependency chain is
> - * validated. If the key is already hashed, return 0.
> - * (On return with 1 graph_lock is held.)
> - */

I think you'd better put some comments here for the behavior of
add_chain_cache(), something like:

/*
 * Add a dependency chain into chain hashtable.
 * 
 * Must be called with graph_lock held.
 * Return 0 if fail to add the chain, and graph_lock is released.
 * Return 1 with graph_lock held if succeed.
 */

Regards,
Boqun

> -static inline int lookup_chain_cache(struct task_struct *curr,
> -				     struct held_lock *hlock,
> -				     u64 chain_key)
> +static inline int add_chain_cache(struct task_struct *curr,
> +				  struct held_lock *hlock,
> +				  u64 chain_key)
>  {
>  	struct lock_class *class = hlock_class(hlock);
>  	struct hlist_head *hash_head = chainhashentry(chain_key);
> @@ -2125,49 +2119,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
>  	int i, j;
>  
>  	/*
> +	 * Allocate a new chain entry from the static array, and add
> +	 * it to the hash:
> +	 */
> +
> +	/*
>  	 * We might need to take the graph lock, ensure we've got IRQs
>  	 * disabled to make this an IRQ-safe lock.. for recursion reasons
>  	 * lockdep won't complain about its own locking errors.
>  	 */
>  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
>  		return 0;
[...]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache()
  2017-01-19  9:16   ` Boqun Feng
@ 2017-01-19  9:52     ` Byungchul Park
  2017-01-26  7:53     ` Byungchul Park
  1 sibling, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-19  9:52 UTC (permalink / raw)
  To: Boqun Feng
  Cc: peterz, mingo, tglx, walken, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Thu, Jan 19, 2017 at 05:16:27PM +0800, Boqun Feng wrote:
> On Wed, Jan 18, 2017 at 10:17:27PM +0900, Byungchul Park wrote:
> > Currently, lookup_chain_cache() provides both 'lookup' and 'add'
> > functionalities in a function. However, each is useful. So this
> > patch makes lookup_chain_cache() only do 'lookup' functionality and
> > makes add_chain_cahce() only do 'add' functionality. And it's more
> > readable than before.
> > 
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> >  kernel/locking/lockdep.c | 129 +++++++++++++++++++++++++++++------------------
> >  1 file changed, 81 insertions(+), 48 deletions(-)
> > 
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 4d7ffc0..f37156f 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -2109,15 +2109,9 @@ static int check_no_collision(struct task_struct *curr,
> >  	return 1;
> >  }
> >  
> > -/*
> > - * Look up a dependency chain. If the key is not present yet then
> > - * add it and return 1 - in this case the new dependency chain is
> > - * validated. If the key is already hashed, return 0.
> > - * (On return with 1 graph_lock is held.)
> > - */
> 
> I think you'd better put some comments here for the behavior of
> add_chain_cache(), something like:
> 
> /*
>  * Add a dependency chain into chain hashtable.
>  * 
>  * Must be called with graph_lock held.
>  * Return 0 if fail to add the chain, and graph_lock is released.
>  * Return 1 with graph_lock held if succeed.
>  */

Yes. I will apply what you recommend.

Thank you very much. :)

Thanks,
Byungchul

> 
> Regards,
> Boqun
> 
> > -static inline int lookup_chain_cache(struct task_struct *curr,
> > -				     struct held_lock *hlock,
> > -				     u64 chain_key)
> > +static inline int add_chain_cache(struct task_struct *curr,
> > +				  struct held_lock *hlock,
> > +				  u64 chain_key)
> >  {
> >  	struct lock_class *class = hlock_class(hlock);
> >  	struct hlist_head *hash_head = chainhashentry(chain_key);
> > @@ -2125,49 +2119,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
> >  	int i, j;
> >  
> >  	/*
> > +	 * Allocate a new chain entry from the static array, and add
> > +	 * it to the hash:
> > +	 */
> > +
> > +	/*
> >  	 * We might need to take the graph lock, ensure we've got IRQs
> >  	 * disabled to make this an IRQ-safe lock.. for recursion reasons
> >  	 * lockdep won't complain about its own locking errors.
> >  	 */
> >  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
> >  		return 0;
> [...]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [REVISED DOCUMENT] lockdep: Crossrelease feature documentation
  2017-01-18 13:17 ` [PATCH v5 13/13] lockdep: Crossrelease feature documentation Byungchul Park
@ 2017-01-20  9:08   ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-20  9:08 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

This document describes the concept of the crossrelease feature.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 Documentation/locking/crossrelease.txt | 874 +++++++++++++++++++++++++++++++++
 1 file changed, 874 insertions(+)
 create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..bdf1423
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,874 @@
+Crossrelease
+============
+
+Started by Byungchul Park <byungchul.park@lge.com>
+
+Contents:
+
+ (*) Background
+
+     - What causes deadlock
+     - How lockdep works
+
+ (*) Limitation
+
+     - Limit lockdep
+     - Pros from the limitation
+     - Cons from the limitation
+     - Relax the limitation
+
+ (*) Crossrelease
+
+     - Introduce crossrelease
+     - Introduce commit
+
+ (*) Implementation
+
+     - Data structures
+     - How crossrelease works
+
+ (*) Optimizations
+
+     - Avoid duplication
+     - Lockless for hot paths
+
+ (*) APPENDIX A: What lockdep does to work aggressively
+
+ (*) APPENDIX B: How to avoid adding false dependencies
+
+
+==========
+Background
+==========
+
+What causes deadlock
+--------------------
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another (or the same) context that can
+trigger the event is also waiting for another (or the same) event to
+happen, which is also impossible for the same reason.
+
+For example:
+
+   A context going to trigger event C is waiting for event A to happen.
+   A context going to trigger event A is waiting for event B to happen.
+   A context going to trigger event B is waiting for event C to happen.
+
+A deadlock occurs when these three wait operations run at the same time,
+because event C cannot be triggered if event A does not happen, which in
+turn cannot be triggered if event B does not happen, which in turn
+cannot be triggered if event C does not happen. After all, no event can
+be triggered since none of them ever meets its condition to wake up.
+
+A dependency might exist between two waiters and a deadlock might happen
+due to an incorrect relationship between dependencies. Thus, we must
+define what a dependency is first. A dependency exists between them if:
+
+   1. There are two waiters waiting for each event at a given time.
+   2. The only way to wake up each waiter is to trigger its event.
+   3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like:
+
+   Event C depends on event A.
+   Event A depends on event B.
+   Event B depends on event C.
+
+   NOTE: Precisely speaking, a dependency is one between whether a
+   waiter for an event can be woken up and whether another waiter for
+   another event can be woken up. However from now on, we will describe
+   a dependency as if it's one between an event and another event for
+   simplicity.
+
+And they form circular dependencies like:
+
+    -> C -> A -> B -
+   /                \
+   \                /
+    ----------------
+
+   where 'A -> B' means that event A depends on event B.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its condition to wake up as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+How lockdep works
+-----------------
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, acquire and release. Waiting for a lock corresponds to
+waiting for an event, and releasing a lock corresponds to triggering an
+event in the previous section.
+
+In short, lockdep does:
+
+   1. Detect a new dependency.
+   2. Add the dependency into a global graph.
+   3. Check if that makes dependencies circular.
+   4. Report a deadlock or its possibility if so.
+
+For example, consider a graph built by lockdep that looks like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+   where A, B,..., E are different lock classes.
+
+Lockdep will add a dependency into the graph on detection of a new
+dependency. For example, it will add a dependency 'E -> C' when a new
+dependency between lock E and lock C is detected. Then the graph will be:
+
+       A -> B -
+               \
+                -> E -
+               /      \
+    -> C -> D -        \
+   /                   /
+   \                  /
+    ------------------
+
+   where A, B,..., E are different lock classes.
+
+This graph contains a subgraph which demonstrates circular dependencies:
+
+                -> E -
+               /      \
+    -> C -> D -        \
+   /                   /
+   \                  /
+    ------------------
+
+   where C, D and E are different lock classes.
+
+This is the condition under which a deadlock might occur. Lockdep
+reports it on detection after adding a new dependency. This is how
+lockdep works.
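+
+The check in step 3 can be pictured with a small sketch. This is not
+lockdep's actual code; it only illustrates the idea that adding a new
+dependency 'prev -> next' is a problem if next already reaches prev in
+the graph.
+
+   /* Illustrative only, not lockdep's actual implementation. */
+   static bool reaches(struct lock_class *from, struct lock_class *to)
+   {
+      struct lock_list *entry;
+
+      if (from == to)
+         return true;
+      list_for_each_entry(entry, &from->locks_after, entry)
+         if (reaches(entry->class, to))
+            return true;
+      return false;
+   }
+
+   /* Adding 'prev -> next' closes a cycle iff reaches(next, prev). */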
+
+CONCLUSION
+
+Lockdep detects a deadlock or its possibility by checking if circular
+dependencies were created after adding each new dependency.
+
+
+==========
+Limitation
+==========
+
+Limit lockdep
+-------------
+
+By limiting lockdep to work on only typical locks, e.g. spin locks and
+mutexes, which are released within the acquire context, the
+implementation becomes simple but its detection capability becomes
+limited. Let's check the pros and cons in the next sections.
+
+
+Pros from the limitation
+------------------------
+
+Given the limitation, when acquiring a lock, the locks in held_locks
+cannot be released if the context cannot acquire the new lock and thus
+has to wait for it, which means all waiters for the locks in held_locks
+are stuck. This is exactly the situation that creates dependencies
+between each lock in held_locks and the lock to acquire.
+
+For example:
+
+   CONTEXT X
+   ---------
+   acquire A
+   acquire B /* Add a dependency 'A -> B' */
+   release B
+   release A
+
+   where A and B are different lock classes.
+
+When acquiring lock A, the held_locks of CONTEXT X is empty thus no
+dependency is added. But when acquiring lock B, lockdep detects and adds
+a new dependency 'A -> B' between lock A in the held_locks and lock B.
+They can be simply added whenever acquiring each lock.
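+
+The same thing, written as a sketch with real mutex calls (the variable
+names are made up):
+
+   static DEFINE_MUTEX(a);
+   static DEFINE_MUTEX(b);
+
+   mutex_lock(&a);    /* held_locks: a; nothing added yet             */
+   mutex_lock(&b);    /* held_locks: a, b; lockdep adds 'a -> b' here */
+   mutex_unlock(&b);
+   mutex_unlock(&a);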
+
+And the data required by lockdep exists in a local structure, held_locks
+embedded in task_struct. By forcing accesses to the data to happen only
+within the owning context, lockdep can avoid races without explicit
+locks while handling the local data.
+
+Lastly, lockdep only needs to keep locks currently being held, to build
+a dependency graph. However, relaxing the limitation, it needs to keep
+even locks already released, because a decision whether they created
+dependencies might be long-deferred.
+
+To sum up, we can expect several advantages from the limitation:
+
+   1. Lockdep can easily identify a dependency when acquiring a lock.
+   2. Races are avoidable while accessing the local locks in held_locks.
+   3. Lockdep only needs to keep locks currently being held.
+
+CONCLUSION
+
+Given the limitation, the implementation becomes simple and efficient.
+
+
+Cons from the limitation
+------------------------
+
+Given the limitation, lockdep is applicable only to typical locks. For
+example, page locks for page access or completions for synchronization
+cannot work with lockdep.
+
+Can we detect deadlocks below, under the limitation?
+
+Example 1:
+
+   CONTEXT X	   CONTEXT Y	   CONTEXT Z
+   ---------	   ---------	   ----------
+		   mutex_lock A
+   lock_page B
+		   lock_page B
+				   mutex_lock A /* DEADLOCK */
+				   unlock_page B held by X
+		   unlock_page B
+		   mutex_unlock A
+				   mutex_unlock A
+
+   where A and B are different lock classes.
+
+No, we cannot.
+
+Example 2:
+
+   CONTEXT X		   CONTEXT Y
+   ---------		   ---------
+			   mutex_lock A
+   mutex_lock A
+			   wait_for_complete B /* DEADLOCK */
+   complete B
+			   mutex_unlock A
+   mutex_unlock A
+
+   where A is a lock class and B is a completion variable.
+
+No, we cannot.
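+
+For reference, Example 2 written with the real kernel primitives would
+look like the sketch below (the variable names are made up):
+
+   static DEFINE_MUTEX(a);
+   static DECLARE_COMPLETION(b);
+
+   /* CONTEXT Y */
+   mutex_lock(&a);
+   wait_for_completion(&b); /* DEADLOCK: X must take 'a' before it can
+                               complete 'b' */
+   mutex_unlock(&a);
+
+   /* CONTEXT X */
+   mutex_lock(&a);          /* blocks while Y holds 'a' */
+   complete(&b);            /* never reached */
+   mutex_unlock(&a);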
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock or its
+possibility caused by page locks or completions.
+
+
+Relax the limitation
+--------------------
+
+Under the limitation, the things that create dependencies are limited to
+typical locks. However, synchronization primitives like page locks and
+completions, which are allowed to be released in any context, also
+create dependencies and can cause a deadlock. So lockdep should track
+these locks to do a better job. We have to relax the limitation for
+these locks to work with lockdep.
+
+Detecting dependencies is very important for lockdep to work because
+adding a dependency means adding an opportunity to check whether it
+causes a deadlock. The more dependencies lockdep adds, the more
+thoroughly it works. Thus lockdep has to do its best to detect and add
+as many true dependencies into the graph as possible.
+
+For example, considering only typical locks, lockdep builds a graph like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+   where A, B,..., E are different lock classes.
+
+On the other hand, under the relaxation, additional dependencies might
+be created and added. Assuming additional 'FX -> C' and 'E -> GX' are
+added thanks to the relaxation, the graph will be:
+
+         A -> B -
+                 \
+                  -> E -> GX
+                 /
+   FX -> C -> D -
+
+   where A, B,..., E, FX and GX are different lock classes, and a suffix
+   'X' is added on non-typical locks.
+
+The latter graph gives us more chances to check circular dependencies
+than the former. However, it might suffer performance degradation, since
+relaxing the limitation, which allows an efficient design and
+implementation of lockdep, inevitably introduces some inefficiency. So
+lockdep should provide two options: strong detection and efficient
+detection.
+
+Choosing efficient detection:
+
+   Lockdep works with only locks restricted to be released within the
+   acquire context. However, lockdep works efficiently.
+
+Choosing strong detection:
+
+   Lockdep works with all synchronization primitives. However, lockdep
+   suffers performance degradation.
+
+CONCLUSION
+
+Relaxing the limitation, lockdep can add additional dependencies giving
+additional opportunities to check circular dependencies.
+
+
+============
+Crossrelease
+============
+
+Introduce crossrelease
+----------------------
+
+In order to allow lockdep to handle additional dependencies created by
+what might be released in any context, namely a 'crosslock', we have to
+be able to identify dependencies created by crosslocks. The proposed
+'crossrelease' feature provides a way to do that.
+
+Crossrelease feature has to do:
+
+   1. Identify dependencies created by crosslocks.
+   2. Add the dependencies into a dependency graph.
+
+That's all. Once a meaningful dependency is added into the graph,
+lockdep works with the graph as it did before. The most important thing
+the crossrelease feature has to do is to correctly identify and add true
+dependencies into the global graph.
+
+A dependency, e.g. 'A -> B', can be identified only in A's release
+context, because the decision required to identify the dependency can
+be made only in the release context. That decision is whether A can be
+released so that a waiter for A can be woken up, and it cannot be made
+anywhere other than A's release context.
+
+This is not a problem for typical locks, because each acquire context
+is the same as its release context, thus lockdep can decide whether a
+lock can be released in the acquire context. However, for crosslocks,
+lockdep cannot make the decision in the acquire context but has to wait
+until the release context is identified.
+
+Therefore, deadlocks caused by crosslocks cannot be detected just when
+they happen, because they cannot be identified until the crosslocks are
+released. However, deadlock possibilities can be detected, and that is
+very worthwhile. See the 'APPENDIX A' section to check why.
+
+CONCLUSION
+
+Using crossrelease feature, lockdep can work with what might be released
+in any context, namely crosslock.
+
+
+Introduce commit
+----------------
+
+Since crossrelease defers the work of adding true dependencies of
+crosslocks until they are actually released, crossrelease has to queue
+all acquisitions which might create dependencies with the crosslocks.
+Then it identifies the dependencies using the queued data, in batches,
+at a proper time. We call this 'commit'.
+
+There are four types of dependencies:
+
+1. TT type: 'typical lock A -> typical lock B'
+
+   Just when acquiring B, lockdep can see it's in the A's release
+   context. So the dependency between A and B can be identified
+   immediately. Commit is unnecessary.
+
+2. TC type: 'typical lock A -> crosslock BX'
+
+   Just when acquiring BX, lockdep can see it's in the A's release
+   context. So the dependency between A and BX can be identified
+   immediately. Commit is unnecessary, too.
+
+3. CT type: 'crosslock AX -> typical lock B'
+
+   When acquiring B, lockdep cannot identify the dependency because
+   there's no way to know if it's in the AX's release context. It has
+   to wait until the decision can be made. Commit is necessary.
+
+4. CC type: 'crosslock AX -> crosslock BX'
+
+   When acquiring BX, lockdep cannot identify the dependency because
+   there's no way to know if it's in the AX's release context. It has
+   to wait until the decision can be made. Commit is necessary.
+   But, handling CC type is not implemented yet. It's a future work.
+
+Lockdep can work without commit for typical locks, but commit step is
+necessary once crosslocks are involved. Introducing commit, lockdep
+performs three steps. What lockdep does in each step is:
+
+1. Acquisition: For typical locks, lockdep does what it originally did
+   and queues the lock so that CT type dependencies can be checked using
+   it at the commit step. For crosslocks, it saves data which will be
+   used at the commit step and increases a reference count for it.
+
+2. Commit: No action is required for typical locks. For crosslocks,
+   lockdep adds CT type dependencies using the data saved at the
+   acquisition step.
+
+3. Release: No changes are required for typical locks. When a crosslock
+   is released, it decreases a reference count for it.
+
+CONCLUSION
+
+Crossrelease introduces commit step to handle dependencies of crosslocks
+in batches at a proper time.
+
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease introduces two main data structures.
+
+1. hist_lock
+
+   This is an array embedded in task_struct, for keeping lock history so
+   that dependencies can be added using it at the commit step. Since
+   it's local data, it can be accessed locklessly in the owner context.
+   The array is filled at the acquisition step and consumed at the
+   commit step. And it's managed in a circular manner (see the sketch
+   below).
+
+2. cross_lock
+
+   One per lockdep_map exists. This is for keeping the data of a
+   crosslock, and it is used at the commit step.
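+
+For illustration only, the two structures could look roughly like the
+below. The field names are hypothetical; see the patches for the actual
+definitions.
+
+   struct hist_lock {
+      struct held_lock hlock;        /* copy of the acquired lock   */
+      unsigned int     hist_id;      /* position in lock history    */
+   };
+
+   /* ring buffer in task_struct */
+   struct hist_lock xhlocks[MAX_XHLOCKS_NR];
+   unsigned int     xhlock_idx;      /* next slot to fill           */
+
+   struct cross_lock {
+      atomic_t         cross_ref;    /* reference count             */
+      int              cross_gen_id; /* when the crosslock started  */
+      struct held_lock hlock;        /* the crosslock itself        */
+   };
+
+   /* a crosslock's lockdep_map embeds its cross_lock */
+   struct lockdep_map_cross {
+      struct lockdep_map map;
+      struct cross_lock  xcross;
+   };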
+
+
+How crossrelease works
+----------------------
+
+The key to how crossrelease works is to defer the necessary work to an
+appropriate point in time and perform it at once at the commit step.
+Let's take a look with examples, step by step, starting from how
+lockdep works without crossrelease for typical locks.
+
+   acquire A /* Push A onto held_locks */
+   acquire B /* Push B onto held_locks and add 'A -> B' */
+   acquire C /* Push C onto held_locks and add 'B -> C' */
+   release C /* Pop C from held_locks */
+   release B /* Pop B from held_locks */
+   release A /* Pop A from held_locks */
+
+   where A, B and C are different lock classes.
+
+   NOTE: This document assumes that readers already understand how
+   lockdep works without crossrelease, and thus omits details. But
+   there's one thing to note. Lockdep pretends to pop a lock from
+   held_locks when releasing it. But it's subtly different from the
+   original pop operation, because lockdep allows entries other than
+   the top to be popped.
+
+In this case, lockdep adds a 'top of held_locks -> lock to acquire'
+dependency every time a lock is acquired.
+
+After adding 'A -> B', a dependency graph will be:
+
+   A -> B
+
+   where A and B are different lock classes.
+
+And after adding 'B -> C', the graph will be:
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+Let's perform the commit step even for typical locks to add
+dependencies. Of course, the commit step is not necessary for them;
+however, it would work well because this is a more general way.
+
+   acquire A
+   /*
+    * Queue A into hist_locks
+    *
+    * In hist_locks: A
+    * In graph: Empty
+    */
+
+   acquire B
+   /*
+    * Queue B into hist_locks
+    *
+    * In hist_locks: A, B
+    * In graph: Empty
+    */
+
+   acquire C
+   /*
+    * Queue C into hist_locks
+    *
+    * In hist_locks: A, B, C
+    * In graph: Empty
+    */
+
+   commit C
+   /*
+    * Add 'C -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire C: Nothing
+    *
+    * In hist_locks: A, B, C
+    * In graph: Empty
+    */
+
+   release C
+
+   commit B
+   /*
+    * Add 'B -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire B: C
+    *
+    * In hist_locks: A, B, C
+    * In graph: 'B -> C'
+    */
+
+   release B
+
+   commit A
+   /*
+    * Add 'A -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire A: B, C
+    *
+    * In hist_locks: A, B, C
+    * In graph: 'B -> C', 'A -> B', 'A -> C'
+    */
+
+   release A
+
+   where A, B and C are different lock classes.
+
+In this case, dependencies are added at the commit step as described.
+
+After commits for A, B and C, the graph will be:
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+   NOTE: A dependency 'A -> C' is optimized out.
+
+We can see the former graph built without the commit step is the same as
+the latter graph built using commit steps. Of course the former way
+finishes building the graph earlier, which means we can detect a
+deadlock or its possibility sooner. So the former way would be preferred
+when possible. But we cannot avoid using the latter way for crosslocks.
+
+Let's look at how commit steps work for crosslocks. In this case, the
+commit step is actually performed only on the crosslock BX. And it is
+assumed that BX's release context is different from its acquire
+context.
+
+   BX RELEASE CONTEXT		   BX ACQUIRE CONTEXT
+   ------------------		   ------------------
+				   acquire A
+				   /*
+				    * Push A onto held_locks
+				    * Queue A into hist_locks
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A
+				    * In graph: Empty
+				    */
+
+				   acquire BX
+				   /*
+				    * Add 'the top of held_locks -> BX'
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A
+				    * In graph: 'A -> BX'
+				    */
+
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   It must be guaranteed that the following operations are globally
+   seen as happening after acquiring BX. This can be done by means such
+   as barriers.
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   acquire C
+   /*
+    * Push C onto held_locks
+    * Queue C into hist_locks
+    *
+    * In held_locks: C
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+
+   release C
+   /*
+    * Pop C from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+				   acquire D
+				   /*
+				    * Push D onto held_locks
+				    * Queue D into hist_locks
+				    * Add 'the top of held_locks -> D'
+				    *
+				    * In held_locks: A, D
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D'
+				    */
+   acquire E
+   /*
+    * Push E onto held_locks
+    * Queue E into hist_locks
+    *
+    * In held_locks: E
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+
+   release E
+   /*
+    * Pop E from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+				   release D
+				   /*
+				    * Pop D from held_locks
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D'
+				    */
+   commit BX
+   /*
+    * Add 'BX -> ?'
+    * What has been queued since acquire BX: C, E
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+
+   release BX
+   /*
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+				   release A
+				   /*
+				    * Pop A from held_locks
+				    *
+				    * In held_locks: Empty
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D',
+				    *           'BX -> C', 'BX -> E'
+				    */
+
+   where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Crossrelease considers all acquisitions after acquiring BX as
+candidates which might create dependencies with BX. True dependencies
+will be determined when identifying the release context of BX. Meanwhile,
+all typical locks are queued so that they can be used at the commit step.
+And then two dependencies 'BX -> C' and 'BX -> E' are added at the
+commit step when identifying the release context.
+
+The final graph will be, with crossrelease:
+
+               -> C
+              /
+       -> BX -
+      /       \
+   A -         -> E
+      \
+       -> D
+
+   where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+However, the final graph will be, without crossrelease:
+
+   A -> D
+
+   where A and D are different lock classes.
+
+The former graph has three more dependencies, 'A -> BX', 'BX -> C' and
+'BX -> E' giving additional opportunities to check if they cause
+deadlocks. This way lockdep can detect a deadlock or its possibility
+caused by crosslocks.
+
+CONCLUSION
+
+We checked how crossrelease works with several examples.
+
+
+=============
+Optimizations
+=============
+
+Avoid duplication
+-----------------
+
+The crossrelease feature uses a cache like the one lockdep already uses
+for dependency chains, but this time for caching CT type dependencies.
+Once a dependency is cached, the same dependency will never be added
+again.
+
+
+Lockless for hot paths
+----------------------
+
+To keep all locks for later use at the commit step, crossrelease adopts
+a local array embedded in task_struct, which makes access to the data
+lockless by forcing it to happen only within the owner context. It's
+like how lockdep handles held_locks. A lockless implementation is
+important since typical locks are acquired and released very frequently.
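+
+A minimal sketch of the idea, with hypothetical names; only the owner
+context ever touches its own array, so no lock is needed:
+
+   /* called at the acquisition step, in the acquiring task's context */
+   static void queue_hist_lock(struct task_struct *curr,
+                               struct held_lock *hlock)
+   {
+      unsigned int idx = curr->xhlock_idx++ % MAX_XHLOCKS_NR;
+
+      curr->xhlocks[idx].hlock = *hlock; /* overwrite the oldest entry */
+   }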
+
+
+==================================================
+APPENDIX A: What lockdep does to work aggressively
+==================================================
+
+A deadlock actually occurs when all wait operations creating circular
+dependencies run at the same time. Even if they don't, a potential
+deadlock exists if the problematic dependencies exist. Thus it's
+meaningful to detect not only an actual deadlock but also its potential
+possibility. The latter is rather more valuable. When a deadlock
+actually occurs, we can identify what happens in the system by some
+means or other even without lockdep. However, there's no way to detect
+the possibility without lockdep unless the whole code is parsed in
+one's head, which is terrible. Lockdep does both, and crossrelease only
+focuses on the latter.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, what order contexts are switched in is a factor. Assuming
+circular dependencies exist, a deadlock would occur when contexts are
+switched so that all wait operations creating the dependencies run
+simultaneously. Thus, to detect a deadlock possibility even in the case
+that it has not occurred yet, lockdep should consider all possible
+combinations of dependencies, trying to:
+
+1. Use a global dependency graph.
+
+   Lockdep combines all dependencies into one global graph and uses
+   them, regardless of which context generates them or what order
+   contexts are switched in. Only aggregated dependencies are
+   considered, so they are prone to be circular if a problem exists.
+
+2. Check dependencies between classes instead of instances.
+
+   What actually causes a deadlock are instances of a lock. However,
+   lockdep checks dependencies between classes instead of instances.
+   This way lockdep can detect a deadlock which has not happened yet,
+   but might happen in the future with other instances of the same
+   class.
+
+3. Assume all acquisitions lead to waiting.
+
+   Although locks might be acquired without waiting, and waiting is
+   what is essential to create a dependency, lockdep assumes all
+   acquisitions lead to waiting since it might be true at one time or
+   another.
+
+CONCLUSION
+
+Lockdep detects not only an actual deadlock but also its possibility,
+and the latter is more valuable.
+
+
+==================================================
+APPENDIX B: How to avoid adding false dependencies
+==================================================
+
+Recall what a dependency is. A dependency exists if:
+
+   1. There are two waiters waiting for each event at a given time.
+   2. The only way to wake up each waiter is to trigger its event.
+   3. Whether one can be woken up depends on whether the other can.
+
+For example:
+
+   acquire A
+   acquire B /* A dependency 'A -> B' exists */
+   release B
+   release A
+
+   where A and B are different lock classes.
+
+A dependency 'A -> B' exists since:
+
+   1. A waiter for A and a waiter for B might exist when acquiring B.
+   2. The only way to wake up each waiter is to release what it waits
+      for.
+   3. Whether the waiter for A can be woken up depends on whether the
+      other can. IOW, the task cannot release A if it fails to acquire B.
+
+For another example:
+
+   TASK X			   TASK Y
+   ------			   ------
+				   acquire AX
+   acquire B /* A dependency 'AX -> B' exists */
+   release B
+   release AX held by Y
+
+   where AX and B are different lock classes, and a suffix 'X' is added
+   on crosslocks.
+
+Even in this case involving crosslocks, the same rule can be applied. A
+dependency 'AX -> B' exists since:
+
+   1. A waiter for AX and a waiter for B might exist when acquiring B.
+   2. The only way to wake up each waiter is to release what it waits
+      for.
+   3. Whether the waiter for AX can be woken up depends on whether the
+      other can. IOW, TASK X cannot release AX if it fails to acquire B.
+
+Let's take a look at a more complicated example:
+
+   TASK X			   TASK Y
+   ------			   ------
+   acquire B
+   release B
+   fork Y
+				   acquire AX
+   acquire C /* A dependency 'AX -> C' exists */
+   release C
+   release AX held by Y
+
+   where AX, B and C are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Does a dependency 'AX -> B' exist? Nope.
+
+Two waiters are essential to create a dependency. However, waiters for
+AX and B to create 'AX -> B' cannot exist at the same time in this
+example. Thus the dependency 'AX -> B' cannot be created.
+
+It would be ideal if the full set of true dependencies could be
+considered. But we can be sure of nothing but what actually happened.
+By relying on what actually happens at runtime, we can add only true
+dependencies, though they might be a subset of the full set. It's
+similar to how lockdep works for typical locks. There might be more
+true dependencies than what lockdep has detected at runtime, but
+lockdep has no choice but to rely on what actually happens.
+Crossrelease also relies on it.
+
+CONCLUSION
+
+Relying on what actually happens, lockdep can avoid adding false
+dependencies.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace
  2017-01-18 13:17 ` [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace Byungchul Park
@ 2017-01-26  7:43   ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-26  7:43 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

I fixed a hole that peterz pointed out. With that fixed, I think the
following is reasonable. Don't you think so?

----->8-----
commit ac185d1820ee7223773ec3e23f614c1fe5c079fc
Author: Byungchul Park <byungchul.park@lge.com>
Date:   Tue Jan 24 14:46:14 2017 +0900

    lockdep: Pass a callback arg to check_prev_add() to handle stack_trace
    
    Currently, a separate stack_trace instance cannot be used in
    check_prev_add(). The simplest way to achieve that would be to pass
    a stack_trace instance to check_prev_add() as an argument after
    saving it. However, unnecessary saving can happen if implemented
    that way.

    The proper solution is to additionally pass a callback function
    along with the stack_trace so that the caller can decide how to save
    it. Actually, crossrelease doesn't need to save the stack_trace of
    current, but only needs to copy stack_traces from temporary buffers
    to the global stack_trace[].

    In addition, check_prev_add() returns 2 in case the lock does not
    need to be added into the dependency graph because it is already
    there. However, that return value is not used any more. So, this
    patch changes it to mean that lockdep successfully saved the
    stack_trace and added the lock to the graph.
    
    Signed-off-by: Byungchul Park <byungchul.park@lge.com>

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7fe6af1..9562b29 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1805,20 +1805,13 @@ static inline void inc_chains(void)
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-	       struct held_lock *next, int distance, int *stack_saved)
+	       struct held_lock *next, int distance, struct stack_trace *trace,
+	       int (*save)(struct stack_trace *trace))
 {
 	struct lock_list *entry;
 	int ret;
 	struct lock_list this;
 	struct lock_list *uninitialized_var(target_entry);
-	/*
-	 * Static variable, serialized by the graph_lock().
-	 *
-	 * We use this static variable to save the stack trace in case
-	 * we call into this function multiple times due to encountering
-	 * trylocks in the held lock stack.
-	 */
-	static struct stack_trace trace;
 
 	/*
 	 * Prove that the new <prev> -> <next> dependency would not
@@ -1862,15 +1855,12 @@ static inline void inc_chains(void)
 		if (entry->class == hlock_class(next)) {
 			if (distance == 1)
 				entry->distance = 1;
-			return 2;
+			return 1;
 		}
 	}
 
-	if (!*stack_saved) {
-		if (!save_trace(&trace))
-			return 0;
-		*stack_saved = 1;
-	}
+	if (save && !save(trace))
+		return 0;
 
 	/*
 	 * Ok, all validations passed, add the new lock
@@ -1878,14 +1868,14 @@ static inline void inc_chains(void)
 	 */
 	ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
 			       &hlock_class(prev)->locks_after,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 
 	if (!ret)
 		return 0;
 
 	ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
 			       &hlock_class(next)->locks_before,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 	if (!ret)
 		return 0;
 
@@ -1893,8 +1883,6 @@ static inline void inc_chains(void)
 	 * Debugging printouts:
 	 */
 	if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
-		/* We drop graph lock, so another thread can overwrite trace. */
-		*stack_saved = 0;
 		graph_unlock();
 		printk("\n new dependency: ");
 		print_lock_name(hlock_class(prev));
@@ -1902,9 +1890,10 @@ static inline void inc_chains(void)
 		print_lock_name(hlock_class(next));
 		printk(KERN_CONT "\n");
 		dump_stack();
-		return graph_lock();
+		if (!graph_lock())
+			return 0;
 	}
-	return 1;
+	return 2;
 }
 
 /*
@@ -1917,8 +1906,9 @@ static inline void inc_chains(void)
 check_prevs_add(struct task_struct *curr, struct held_lock *next)
 {
 	int depth = curr->lockdep_depth;
-	int stack_saved = 0;
 	struct held_lock *hlock;
+	struct stack_trace trace;
+	int (*save)(struct stack_trace *trace) = save_trace;
 
 	/*
 	 * Debugging checks.
@@ -1943,9 +1933,18 @@ static inline void inc_chains(void)
 		 * added:
 		 */
 		if (hlock->read != 2 && hlock->check) {
-			if (!check_prev_add(curr, hlock, next,
-						distance, &stack_saved))
+			int ret = check_prev_add(curr, hlock, next,
+						distance, &trace, save);
+			if (!ret)
 				return 0;
+
+			/*
+			 * Stop saving stack_trace if save_trace() was
+			 * called at least once:
+			 */
+			if (save && ret == 2)
+				save = NULL;
+
 			/*
 			 * Stop after the first non-trylock entry,
 			 * as non-trylock entries have added their

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache()
  2017-01-19  9:16   ` Boqun Feng
  2017-01-19  9:52     ` Byungchul Park
@ 2017-01-26  7:53     ` Byungchul Park
  1 sibling, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-01-26  7:53 UTC (permalink / raw)
  To: Boqun Feng
  Cc: peterz, mingo, tglx, walken, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

I added the comment you recommended, after modifying it a bit, as in the
following. Please let me know if my sentence is awkward.

Thank you.

----->8-----
commit bb8ad95a4944eec6ab72e950ef063960791b0d8c
Author: Byungchul Park <byungchul.park@lge.com>
Date:   Tue Jan 24 16:44:16 2017 +0900

    lockdep: Refactor lookup_chain_cache()
    
    Currently, lookup_chain_cache() provides both 'lookup' and 'add'
    functionalities in one function. However, each is useful on its own.
    So this patch makes lookup_chain_cache() do only the 'lookup'
    functionality and makes add_chain_cache() do only the 'add'
    functionality. And it's more readable than before.
    
    Signed-off-by: Byungchul Park <byungchul.park@lge.com>

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4d7ffc0..0c6e6b7 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2110,14 +2110,15 @@ static int check_no_collision(struct task_struct *curr,
 }
 
 /*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
+ * Adds a dependency chain into the chain hashtable. Must be called with
+ * graph_lock held.
+ *
+ * Return 0 if it fails, in which case graph_lock is released.
+ * Return 1 if it succeeds, with graph_lock held.
  */
-static inline int lookup_chain_cache(struct task_struct *curr,
-				     struct held_lock *hlock,
-				     u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+				  struct held_lock *hlock,
+				  u64 chain_key)
 {
 	struct lock_class *class = hlock_class(hlock);
 	struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2125,49 +2126,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	int i, j;
 
 	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
 	 * We might need to take the graph lock, ensure we've got IRQs
 	 * disabled to make this an IRQ-safe lock.. for recursion reasons
 	 * lockdep won't complain about its own locking errors.
 	 */
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return 0;
-	/*
-	 * We can walk it lock-free, because entries only get added
-	 * to the hash:
-	 */
-	hlist_for_each_entry_rcu(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-cache_hit:
-			debug_atomic_inc(chain_lookup_hits);
-			if (!check_no_collision(curr, hlock, chain))
-				return 0;
 
-			if (very_verbose(class))
-				printk("\nhash chain already cached, key: "
-					"%016Lx tail class: [%p] %s\n",
-					(unsigned long long)chain_key,
-					class->key, class->name);
-			return 0;
-		}
-	}
-	if (very_verbose(class))
-		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
-			(unsigned long long)chain_key, class->key, class->name);
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-	if (!graph_lock())
-		return 0;
-	/*
-	 * We have to walk the chain again locked - to avoid duplicates:
-	 */
-	hlist_for_each_entry(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-			graph_unlock();
-			goto cache_hit;
-		}
-	}
 	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
 		if (!debug_locks_off_graph_unlock())
 			return 0;
@@ -2219,6 +2189,75 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	return 1;
 }
 
+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * We can walk it lock-free, because entries only get added
+	 * to the hash:
+	 */
+	hlist_for_each_entry_rcu(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+			debug_atomic_inc(chain_lookup_hits);
+			return chain;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+					 struct held_lock *hlock,
+					 u64 chain_key)
+{
+	struct lock_class *class = hlock_class(hlock);
+	struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+	if (chain) {
+cache_hit:
+		if (!check_no_collision(curr, hlock, chain))
+			return 0;
+
+		if (very_verbose(class))
+			printk("\nhash chain already cached, key: "
+					"%016Lx tail class: [%p] %s\n",
+					(unsigned long long)chain_key,
+					class->key, class->name);
+		return 0;
+	}
+
+	if (very_verbose(class))
+		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+			(unsigned long long)chain_key, class->key, class->name);
+
+	if (!graph_lock())
+		return 0;
+
+	/*
+	 * We have to walk the chain again locked - to avoid duplicates:
+	 */
+	chain = lookup_chain_cache(chain_key);
+	if (chain) {
+		graph_unlock();
+		goto cache_hit;
+	}
+
+	if (!add_chain_cache(curr, hlock, chain_key))
+		return 0;
+
+	return 1;
+}
+
 static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		struct held_lock *hlock, int chain_head, u64 chain_key)
 {
@@ -2229,11 +2268,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 	 *
 	 * We look up the chain_key and do the O(N^2) check and update of
 	 * the dependencies only if this is a new dependency chain.
-	 * (If lookup_chain_cache() returns with 1 it acquires
+	 * (If lookup_chain_cache_add() return with 1 it acquires
 	 * graph_lock for us)
 	 */
 	if (!hlock->trylock && hlock->check &&
-	    lookup_chain_cache(curr, hlock, chain_key)) {
+	    lookup_chain_cache_add(curr, hlock, chain_key)) {
 		/*
 		 * Check whether last held lock:
 		 *
@@ -2264,9 +2303,10 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		if (!chain_head && ret != 2)
 			if (!check_prevs_add(curr, hlock))
 				return 0;
+
 		graph_unlock();
 	} else
-		/* after lookup_chain_cache(): */
+		/* after lookup_chain_cache_add(): */
 		if (unlikely(!debug_locks))
 			return 0;
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 00/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
                   ` (12 preceding siblings ...)
  2017-01-18 13:17 ` [PATCH v5 13/13] lockdep: Crossrelease feature documentation Byungchul Park
@ 2017-02-20  8:38 ` Byungchul Park
  13 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-02-20  8:38 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:26PM +0900, Byungchul Park wrote:
> I checked if crossrelease feature works well on my qemu-i386 machine.
> There's no problem at all to work on mine. But I wonder if it's also
> true even on other machines. Especially, on large system. Could you
> let me know if it doesn't work on yours? Or Could you let me know if
> crossrelease feature is useful? Please let me know if you need to
> backport it to another version but it's not easy. Then I can provide
> the backported version after working it.

Hello peterz,

I don't want to rush you, but I think enough time has passed. Could you
check this? I tried to apply what you recommended at the previous spin
as much as possible.

Thanks,
Byungchul

> 
> -----8<-----
> 
> Change from v4
> 	- rebase on vanilla v4.9 tag
> 	- re-name pend_lock(plock) to hist_lock(xhlock)
> 	- allow overwriting ring buffer for hist_lock
> 	- unwind ring buffer instead of tagging id for each irq
> 	- introduce lockdep_map_cross embedding cross_lock
> 	- make each work of workqueue distinguishable
> 	- enhance comments
> 	(I will update the document at the next spin.)
> 
> Change from v3
> 	- reviced document
> 
> Change from v2
> 	- rebase on vanilla v4.7 tag
> 	- move lockdep data for page lock from struct page to page_ext
> 	- allocate plocks buffer via vmalloc instead of in struct task
> 	- enhanced comments and document
> 	- optimize performance
> 	- make reporting function crossrelease-aware
> 
> Change from v1
> 	- enhanced the document
> 	- removed save_stack_trace() optimizing patch
> 	- made this based on the seperated save_stack_trace patchset
> 	  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1182242.html
> 
> Can we detect deadlocks below with original lockdep?
> 
> Example 1)
> 
> 	PROCESS X	PROCESS Y
> 	--------------	--------------
> 	mutext_lock A
> 			lock_page B
> 	lock_page B
> 			mutext_lock A // DEADLOCK
> 	unlock_page B
> 			mutext_unlock A
> 	mutex_unlock A
> 			unlock_page B
> 
> where A and B are different lock classes.
> 
> No, we cannot.
> 
> Example 2)
> 
> 	PROCESS X	PROCESS Y	PROCESS Z
> 	--------------	--------------	--------------
> 			mutex_lock A
> 	lock_page B
> 			lock_page B
> 					mutext_lock A // DEADLOCK
> 					mutext_unlock A
> 					unlock_page B
> 					(B was held by PROCESS X)
> 			unlock_page B
> 			mutex_unlock A
> 
> where A and B are different lock classes.
> 
> No, we cannot.
> 
> Example 3)
> 
> 	PROCESS X	PROCESS Y
> 	--------------	--------------
> 			mutex_lock A
> 	mutex_lock A
> 			wait_for_complete B // DEADLOCK
> 	mutex_unlock A
> 	complete B
> 			mutex_unlock A
> 
> where A is a lock class and B is a completion variable.
> 
> No, we cannot.
> 
> Not only lock operations, but also any operations causing to wait or
> spin for something can cause deadlock unless it's eventually *released*
> by someone. The important point here is that the waiting or spinning
> must be *released* by someone.
> 
> Using crossrelease feature, we can check dependency and detect deadlock
> possibility not only for typical lock, but also for lock_page(),
> wait_for_xxx() and so on, which might be released in any context.
> 
> See the last patch including the document for more information.
> 
> Byungchul Park (13):
>   lockdep: Refactor lookup_chain_cache()
>   lockdep: Fix wrong condition to print bug msgs for
>     MAX_LOCKDEP_CHAIN_HLOCKS
>   lockdep: Add a function building a chain between two classes
>   lockdep: Refactor save_trace()
>   lockdep: Pass a callback arg to check_prev_add() to handle stack_trace
>   lockdep: Implement crossrelease feature
>   lockdep: Make print_circular_bug() aware of crossrelease
>   lockdep: Apply crossrelease to completions
>   pagemap.h: Remove trailing white space
>   lockdep: Apply crossrelease to PG_locked locks
>   lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
>   lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
>   lockdep: Crossrelease feature documentation
> 
>  Documentation/locking/crossrelease.txt | 1053 ++++++++++++++++++++++++++++++++
>  include/linux/completion.h             |  118 +++-
>  include/linux/irqflags.h               |   24 +-
>  include/linux/lockdep.h                |  129 ++++
>  include/linux/mm_types.h               |    4 +
>  include/linux/page-flags.h             |   43 +-
>  include/linux/page_ext.h               |    4 +
>  include/linux/pagemap.h                |  124 +++-
>  include/linux/sched.h                  |    9 +
>  kernel/exit.c                          |    9 +
>  kernel/fork.c                          |   23 +
>  kernel/locking/lockdep.c               |  763 ++++++++++++++++++++---
>  kernel/sched/completion.c              |   54 +-
>  kernel/workqueue.c                     |    1 +
>  lib/Kconfig.debug                      |   30 +
>  mm/filemap.c                           |   76 ++-
>  mm/page_ext.c                          |    4 +
>  17 files changed, 2324 insertions(+), 144 deletions(-)
>  create mode 100644 Documentation/locking/crossrelease.txt
> 
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-02-28 12:26   ` Peter Zijlstra
  2017-02-28 12:45   ` Peter Zijlstra
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 12:26 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> +	/*
> +	 * Each work of workqueue might run in a different context,
> +	 * thanks to concurrency support of workqueue. So we have to
> +	 * distinguish each work to avoid false positive.
> +	 *
> +	 * TODO: We can also add dependencies between two acquisitions
> +	 * of different work_id, if they don't cause a sleep so make
> +	 * the worker stalled.
> +	 */
> +	unsigned int		work_id;

I don't understand... please explain.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
  2017-02-28 12:26   ` Peter Zijlstra
@ 2017-02-28 12:45   ` Peter Zijlstra
  2017-02-28 12:49     ` Peter Zijlstra
  2017-02-28 13:05   ` Peter Zijlstra
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 12:45 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> +	/*
> +	 * struct held_lock does not have an indicator whether in nmi.
> +	 */
> +	int nmi;

Do we really need this? Lockdep doesn't really know about NMI context,
so it's weird to now partially introduce it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 12:45   ` Peter Zijlstra
@ 2017-02-28 12:49     ` Peter Zijlstra
  2017-03-01  6:20       ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 12:49 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Tue, Feb 28, 2017 at 01:45:07PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > +	/*
> > +	 * struct held_lock does not have an indicator whether in nmi.
> > +	 */
> > +	int nmi;
> 
> Do we really need this? Lockdep doesn't really know about NMI context,
> so its weird to now partially introduce it.

That is, see how nmi_enter() includes lockdep_off().
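
For reference, roughly what that looks like (v4.9-era
include/linux/hardirq.h, quoted from memory, so treat the exact
contents as approximate):

#define nmi_enter()						\
	do {							\
		printk_nmi_enter();				\
		lockdep_off();					\
		ftrace_nmi_enter();				\
		BUG_ON(in_nmi());				\
		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
		rcu_nmi_enter();				\
		trace_hardirq_enter();				\
	} while (0)

That is, lockdep is off for the whole NMI, so a per-hlock nmi flag never
gets a chance to matter.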

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
  2017-02-28 12:26   ` Peter Zijlstra
  2017-02-28 12:45   ` Peter Zijlstra
@ 2017-02-28 13:05   ` Peter Zijlstra
  2017-02-28 13:28     ` Byungchul Park
  2017-02-28 13:10   ` Peter Zijlstra
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 13:05 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> +#define MAX_XHLOCKS_NR 64UL

> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +	if (tsk->xhlocks) {
> +		void *tmp = tsk->xhlocks;
> +		/* Disable crossrelease for current */
> +		tsk->xhlocks = NULL;
> +		vfree(tmp);
> +	}
> +#endif

> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +	p->xhlock_idx = 0;
> +	p->xhlock_idx_soft = 0;
> +	p->xhlock_idx_hard = 0;
> +	p->xhlock_idx_nmi = 0;
> +	p->xhlocks = vzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR);

I don't think we need vmalloc for this now.

> +	p->work_id = 0;
> +#endif

> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +	if (p->xhlocks) {
> +		void *tmp = p->xhlocks;
> +		/* Diable crossrelease for current */
> +		p->xhlocks = NULL;
> +		vfree(tmp);
> +	}
> +#endif

Second instance of the same code, which would suggest using a function
for this. Also, with a function you can lose the #ifdeffery.
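
A minimal sketch of such a helper (the name is illustrative, not an
existing API; e.g. in <linux/lockdep.h>):

#ifdef CONFIG_LOCKDEP_CROSSRELEASE
static inline void crossrelease_hist_free(struct task_struct *tsk)
{
	void *tmp = tsk->xhlocks;

	if (!tmp)
		return;

	/* Disable crossrelease for the task before freeing the buffer */
	tsk->xhlocks = NULL;
	vfree(tmp);
}
#else
static inline void crossrelease_hist_free(struct task_struct *tsk) { }
#endif

Both call sites then become a bare crossrelease_hist_free() call with no
#ifdef around them.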

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                     ` (2 preceding siblings ...)
  2017-02-28 13:05   ` Peter Zijlstra
@ 2017-02-28 13:10   ` Peter Zijlstra
  2017-02-28 13:24     ` Byungchul Park
  2017-02-28 13:40   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 13:10 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +
> +#define idx(t)			((t)->xhlock_idx)
> +#define idx_prev(i)		((i) ? (i) - 1 : MAX_XHLOCKS_NR - 1)
> +#define idx_next(i)		(((i) + 1) % MAX_XHLOCKS_NR)

Note that:

#define idx_prev(i)		(((i) - 1) % MAX_XHLOCKS_NR)
#define idx_next(i)		(((i) + 1) % MAX_XHLOCKS_NR)

is more symmetric and easier to understand.

> +
> +/* For easy access to xhlock */
> +#define xhlock(t, i)		((t)->xhlocks + (i))
> +#define xhlock_prev(t, l)	xhlock(t, idx_prev((l) - (t)->xhlocks))
> +#define xhlock_curr(t)		xhlock(t, idx(t))

So these result in an xhlock pointer

> +#define xhlock_incr(t)		({idx(t) = idx_next(idx(t));})

This does not; which is confusing seeing how they share the same
namespace; also incr is weird.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:10   ` Peter Zijlstra
@ 2017-02-28 13:24     ` Byungchul Park
  2017-02-28 18:29       ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-02-28 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Tue, Feb 28, 2017 at 02:10:12PM +0100, Peter Zijlstra wrote:
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +
> > +#define idx(t)			((t)->xhlock_idx)
> > +#define idx_prev(i)		((i) ? (i) - 1 : MAX_XHLOCKS_NR - 1)
> > +#define idx_next(i)		(((i) + 1) % MAX_XHLOCKS_NR)
> 
> Note that:
> 
> #define idx_prev(i)		(((i) - 1) % MAX_XHLOCKS_NR)
> #define idx_next(i)		(((i) + 1) % MAX_XHLOCKS_NR)
> 
> is more symmetric and easier to understand.

OK. I will do it after forcing MAX_XHLOCKS_NR to be a power of 2. The
current value is already a power of 2, but I need to add a comment
explaining it.

> > +
> > +/* For easy access to xhlock */
> > +#define xhlock(t, i)		((t)->xhlocks + (i))
> > +#define xhlock_prev(t, l)	xhlock(t, idx_prev((l) - (t)->xhlocks))
> > +#define xhlock_curr(t)		xhlock(t, idx(t))
> 
> So these result in an xhlock pointer
> 
> > +#define xhlock_incr(t)		({idx(t) = idx_next(idx(t));})
> 
> This does not; which is confusing seeing how they share the same
> namespace; also incr is weird.

OK.. Could you suggest a better name? xhlock_adv()? advance_xhlock()?
And.. replace it with a function?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:05   ` Peter Zijlstra
@ 2017-02-28 13:28     ` Byungchul Park
  2017-02-28 13:35       ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-02-28 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 02:05:13PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > +#define MAX_XHLOCKS_NR 64UL
> 
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +	if (tsk->xhlocks) {
> > +		void *tmp = tsk->xhlocks;
> > +		/* Disable crossrelease for current */
> > +		tsk->xhlocks = NULL;
> > +		vfree(tmp);
> > +	}
> > +#endif
> 
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +	p->xhlock_idx = 0;
> > +	p->xhlock_idx_soft = 0;
> > +	p->xhlock_idx_hard = 0;
> > +	p->xhlock_idx_nmi = 0;
> > +	p->xhlocks = vzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR);
> 
> I don't think we need vmalloc for this now.

Really? When is a better time to do it?

I think task-creation time is the best time to initialize it. No?

> 
> > +	p->work_id = 0;
> > +#endif
> 
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +	if (p->xhlocks) {
> > +		void *tmp = p->xhlocks;
> > +		/* Diable crossrelease for current */
> > +		p->xhlocks = NULL;
> > +		vfree(tmp);
> > +	}
> > +#endif
> 
> Second instance of the same code, which would suggest using a function
> for this. Also, with a function you can loose the #ifdeffery.

Yes. It looks much better.

Thank you very much.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:28     ` Byungchul Park
@ 2017-02-28 13:35       ` Peter Zijlstra
  2017-02-28 14:00         ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 13:35 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 10:28:20PM +0900, Byungchul Park wrote:
> On Tue, Feb 28, 2017 at 02:05:13PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > > +#define MAX_XHLOCKS_NR 64UL
> > 
> > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > +	if (tsk->xhlocks) {
> > > +		void *tmp = tsk->xhlocks;
> > > +		/* Disable crossrelease for current */
> > > +		tsk->xhlocks = NULL;
> > > +		vfree(tmp);
> > > +	}
> > > +#endif
> > 
> > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > +	p->xhlock_idx = 0;
> > > +	p->xhlock_idx_soft = 0;
> > > +	p->xhlock_idx_hard = 0;
> > > +	p->xhlock_idx_nmi = 0;
> > > +	p->xhlocks = vzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR);
> > 
> > I don't think we need vmalloc for this now.
> 
> Really? When is a better time to do it?
> 
> I think the time creating a task is the best time to initialize it. No?

The place is fine, but I would use kmalloc() now (and subsequently kfree
on the other end) for the allocation. It's not _that_ large anymore,
right?
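
Something like this, as a sketch (assuming GFP_KERNEL is acceptable at
the fork-side call site):

	/* allocation side, in the fork path */
	p->xhlocks = kcalloc(MAX_XHLOCKS_NR, sizeof(struct hist_lock),
			     GFP_KERNEL);

	/* teardown side, replacing vfree() */
	kfree(tsk->xhlocks);
	tsk->xhlocks = NULL;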

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                     ` (3 preceding siblings ...)
  2017-02-28 13:10   ` Peter Zijlstra
@ 2017-02-28 13:40   ` Peter Zijlstra
  2017-03-01  5:43     ` Byungchul Park
  2017-02-28 15:49   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 13:40 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> +	/*
> +	 * If the previous in held_locks can create a proper dependency
> +	 * with a target crosslock, then we can skip commiting this,
> +	 * since "the target crosslock -> the previous lock" and
> +	 * "the previous lock -> this lock" can cover the case. So we
> +	 * keep the previous's gen_id to make the decision.
> +	 */
> +	unsigned int		prev_gen_id;

> +static void add_xhlock(struct held_lock *hlock, unsigned int prev_gen_id)
> +{
> +	struct hist_lock *xhlock;
> +
> +	xhlock = alloc_xhlock();
> +
> +	/* Initialize hist_lock's members */
> +	xhlock->hlock = *hlock;
> +	xhlock->nmi = !!(preempt_count() & NMI_MASK);
> +	/*
> +	 * prev_gen_id is used to skip adding dependency at commit step,
> +	 * when the previous lock in held_locks can do that instead.
> +	 */
> +	xhlock->prev_gen_id = prev_gen_id;
> +	xhlock->work_id = current->work_id;
> +
> +	xhlock->trace.nr_entries = 0;
> +	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> +	xhlock->trace.entries = xhlock->trace_entries;
> +	xhlock->trace.skip = 3;
> +	save_stack_trace(&xhlock->trace);
> +}

> +static void check_add_xhlock(struct held_lock *hlock)
> +{
> +	struct held_lock *prev;
> +	struct held_lock *start;
> +	unsigned int gen_id;
> +	unsigned int gen_id_invalid;
> +
> +	if (!current->xhlocks || !depend_before(hlock))
> +		return;
> +
> +	gen_id = (unsigned int)atomic_read(&cross_gen_id);
> +	/*
> +	 * gen_id_invalid must be too old to be valid. That means
> +	 * current hlock should not be skipped but should be
> +	 * considered at commit step.
> +	 */
> +	gen_id_invalid = gen_id - (UINT_MAX / 4);
> +	start = current->held_locks;
> +
> +	for (prev = hlock - 1; prev >= start &&
> +			!depend_before(prev); prev--);
> +
> +	if (prev < start)
> +		add_xhlock(hlock, gen_id_invalid);
> +	else if (prev->gen_id != gen_id)
> +		add_xhlock(hlock, prev->gen_id);
> +}

> +static int commit_xhlocks(struct cross_lock *xlock)
> +{
> +	struct task_struct *curr = current;
> +	struct hist_lock *xhlock_c = xhlock_curr(curr);
> +	struct hist_lock *xhlock = xhlock_c;
> +
> +	do {
> +		xhlock = xhlock_prev(curr, xhlock);
> +
> +		if (!xhlock_used(xhlock))
> +			break;
> +
> +		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> +			break;
> +
> +		if (same_context_xhlock(xhlock) &&
> +		    before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
> +		    !commit_xhlock(xlock, xhlock))
> +			return 0;
> +	} while (xhlock_c != xhlock);
> +
> +	return 1;
> +}

So I'm still struggling with prev_gen_id; is it an optimization or is it
required for correctness?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:35       ` Peter Zijlstra
@ 2017-02-28 14:00         ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-02-28 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 02:35:21PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 28, 2017 at 10:28:20PM +0900, Byungchul Park wrote:
> > On Tue, Feb 28, 2017 at 02:05:13PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > > > +#define MAX_XHLOCKS_NR 64UL
> > > 
> > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > > +	if (tsk->xhlocks) {
> > > > +		void *tmp = tsk->xhlocks;
> > > > +		/* Disable crossrelease for current */
> > > > +		tsk->xhlocks = NULL;
> > > > +		vfree(tmp);
> > > > +	}
> > > > +#endif
> > > 
> > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > > +	p->xhlock_idx = 0;
> > > > +	p->xhlock_idx_soft = 0;
> > > > +	p->xhlock_idx_hard = 0;
> > > > +	p->xhlock_idx_nmi = 0;
> > > > +	p->xhlocks = vzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR);
> > > 
> > > I don't think we need vmalloc for this now.
> > 
> > Really? When is a better time to do it?
> > 
> > I think the time creating a task is the best time to initialize it. No?
> 
> The place is fine, but I would use kmalloc() now (and subsequently kfree
> on the other end) for the allocation. Its not _that_ large anymore,
> right?

Did you mean that? OK, I will do it.

Thank you.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                     ` (4 preceding siblings ...)
  2017-02-28 13:40   ` Peter Zijlstra
@ 2017-02-28 15:49   ` Peter Zijlstra
  2017-03-01  5:17     ` Byungchul Park
  2017-02-28 18:15   ` Peter Zijlstra
  2017-03-02 13:41   ` Peter Zijlstra
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 15:49 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:

> +struct cross_lock {
> +	/*
> +	 * When more than one acquisition of crosslocks are overlapped,
> +	 * we do actual commit only when ref == 0.
> +	 */
> +	atomic_t ref;

That comment doesn't seem right, should that be: ref != 0 ?

Also; would it not be much clearer to call this: nr_blocked, or waiters
or something along those lines, because that is what it appears to be.

> +	/*
> +	 * Seperate hlock instance. This will be used at commit step.
> +	 *
> +	 * TODO: Use a smaller data structure containing only necessary
> +	 * data. However, we should make lockdep code able to handle the
> +	 * smaller one first.
> +	 */
> +	struct held_lock	hlock;
> +};

> +static int add_xlock(struct held_lock *hlock)
> +{
> +	struct cross_lock *xlock;
> +	unsigned int gen_id;
> +
> +	if (!depend_after(hlock))
> +		return 1;
> +
> +	if (!graph_lock())
> +		return 0;
> +
> +	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
> +
> +	/*
> +	 * When acquisitions for a xlock are overlapped, we use
> +	 * a reference counter to handle it.

Handle what!? That comment is near empty.

> +	 */
> +	if (atomic_inc_return(&xlock->ref) > 1)
> +		goto unlock;

So you set the xlock's generation only once, to the oldest blocking-on
relation, which makes sense; you want to be able to relate to all
historical locks since then.

> +
> +	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
> +	xlock->hlock = *hlock;
> +	xlock->hlock.gen_id = gen_id;
> +unlock:
> +	graph_unlock();
> +	return 1;
> +}

> +void lock_commit_crosslock(struct lockdep_map *lock)
> +{
> +	struct cross_lock *xlock;
> +	unsigned long flags;
> +
> +	if (!current->xhlocks)
> +		return;
> +
> +	if (unlikely(current->lockdep_recursion))
> +		return;
> +
> +	raw_local_irq_save(flags);
> +	check_flags(flags);
> +	current->lockdep_recursion = 1;
> +
> +	if (unlikely(!debug_locks))
> +		return;
> +
> +	if (!graph_lock())
> +		return;
> +
> +	xlock = &((struct lockdep_map_cross *)lock)->xlock;
> +	if (atomic_read(&xlock->ref) > 0 && !commit_xhlocks(xlock))

You terminate with graph_lock() held.

Also, I think you can do the atomic_read() outside of graph lock, to
avoid taking graph_lock when it's 0.

> +		return;
> +
> +	graph_unlock();
> +	current->lockdep_recursion = 0;
> +	raw_local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(lock_commit_crosslock);
> +
> +/*
> + * return 0: Need to do normal release operation.
> + * return 1: Done. No more release ops is needed.
> + */
> +static int lock_release_crosslock(struct lockdep_map *lock)
> +{
> +	if (cross_lock(lock)) {
> +		atomic_dec(&((struct lockdep_map_cross *)lock)->xlock.ref);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +static void cross_init(struct lockdep_map *lock, int cross)
> +{
> +	if (cross)
> +		atomic_set(&((struct lockdep_map_cross *)lock)->xlock.ref, 0);
> +
> +	lock->cross = cross;
> +}
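
For the atomic_read() point, the tail of lock_commit_crosslock() could
then look something like this (sketch only; the failure path is left as
in the original):

	xlock = &((struct lockdep_map_cross *)lock)->xlock;

	/*
	 * Cheap test first: if nobody is blocked on this crosslock there
	 * is nothing to commit, so don't take graph_lock at all.
	 */
	if (!atomic_read(&xlock->ref))
		goto out;

	if (!graph_lock())
		return;

	if (!commit_xhlocks(xlock))
		return;

	graph_unlock();
out:
	current->lockdep_recursion = 0;
	raw_local_irq_restore(flags);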

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                     ` (5 preceding siblings ...)
  2017-02-28 15:49   ` Peter Zijlstra
@ 2017-02-28 18:15   ` Peter Zijlstra
  2017-03-01  7:21     ` Byungchul Park
                       ` (2 more replies)
  2017-03-02 13:41   ` Peter Zijlstra
  7 siblings, 3 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 18:15 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> +	/*
> +	 * Each work of workqueue might run in a different context,
> +	 * thanks to concurrency support of workqueue. So we have to
> +	 * distinguish each work to avoid false positive.
> +	 *
> +	 * TODO: We can also add dependencies between two acquisitions
> +	 * of different work_id, if they don't cause a sleep so make
> +	 * the worker stalled.
> +	 */
> +	unsigned int		work_id;

> +/*
> + * Crossrelease needs to distinguish each work of workqueues.
> + * Caller is supposed to be a worker.
> + */
> +void crossrelease_work_start(void)
> +{
> +	if (current->xhlocks)
> +		current->work_id++;
> +}

So what you're trying to do with that 'work_id' thing is basically wipe
the entire history when we're at the bottom of a context.

Which is a useful operation, but should arguably also be done on the
return to userspace path. Any historical lock from before the current
syscall is irrelevant.

(And we should not be returning to userspace with locks held anyway --
lockdep already has a check for that).
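
(For reference, that check is lockdep_sys_exit(), roughly of this shape
in kernel/locking/lockdep.c -- from memory, so approximate:)

void lockdep_sys_exit(void)
{
	struct task_struct *curr = current;

	if (unlikely(curr->lockdep_depth)) {
		if (!debug_locks_off())
			return;
		/* prints a "lock held when returning to user space!" splat */
		lockdep_print_held_locks(curr);
	}
}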

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:24     ` Byungchul Park
@ 2017-02-28 18:29       ` Peter Zijlstra
  2017-03-01  4:40         ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-02-28 18:29 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Tue, Feb 28, 2017 at 10:24:44PM +0900, Byungchul Park wrote:
> On Tue, Feb 28, 2017 at 02:10:12PM +0100, Peter Zijlstra wrote:

> > > +/* For easy access to xhlock */
> > > +#define xhlock(t, i)		((t)->xhlocks + (i))
> > > +#define xhlock_prev(t, l)	xhlock(t, idx_prev((l) - (t)->xhlocks))
> > > +#define xhlock_curr(t)		xhlock(t, idx(t))
> > 
> > So these result in an xhlock pointer
> > 
> > > +#define xhlock_incr(t)		({idx(t) = idx_next(idx(t));})
> > 
> > This does not; which is confusing seeing how they share the same
> > namespace; also incr is weird.
> 
> OK.. Could you suggest a better name? xhlock_adv()? advance_xhlock()?
> And.. replace it with a function?

How about doing: xhlocks_idx++ ? That is, keep all the indexes as
regular u32 and only reduce the space when using them as an index.

Also, I would write the loop:

> +static int commit_xhlocks(struct cross_lock *xlock)
> +{
> +     struct task_struct *curr = current;
> +     struct hist_lock *xhlock_c = xhlock_curr(curr);
> +     struct hist_lock *xhlock = xhlock_c;
> +
> +     do {
> +             xhlock = xhlock_prev(curr, xhlock);
> +
> +             if (!xhlock_used(xhlock))
> +                     break;
> +
> +             if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> +                     break;
> +
> +             if (same_context_xhlock(xhlock) &&
> +                 before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
> +                 !commit_xhlock(xlock, xhlock))
> +                     return 0;
> +     } while (xhlock_c != xhlock);
> +
> +     return 1;
> +}

like:

#define xhlock(i)	current->xhlocks[i % MAX_XHLOCKS_NR]

	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
		xhlock = xhlock(curr->xhlock_idx - i);

		/* ... */
	}

That avoids that horrible xhlock_prev() thing.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 18:29       ` Peter Zijlstra
@ 2017-03-01  4:40         ` Byungchul Park
  2017-03-01 10:45           ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  4:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 07:29:02PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 28, 2017 at 10:24:44PM +0900, Byungchul Park wrote:
> > On Tue, Feb 28, 2017 at 02:10:12PM +0100, Peter Zijlstra wrote:
> 
> > > > +/* For easy access to xhlock */
> > > > +#define xhlock(t, i)		((t)->xhlocks + (i))
> > > > +#define xhlock_prev(t, l)	xhlock(t, idx_prev((l) - (t)->xhlocks))
> > > > +#define xhlock_curr(t)		xhlock(t, idx(t))
> > > 
> > > So these result in an xhlock pointer
> > > 
> > > > +#define xhlock_incr(t)		({idx(t) = idx_next(idx(t));})
> > > 
> > > This does not; which is confusing seeing how they share the same
> > > namespace; also incr is weird.
> > 
> > OK.. Could you suggest a better name? xhlock_adv()? advance_xhlock()?
> > And.. replace it with a function?
> 
> How about doing: xhlocks_idx++ ? That is, keep all the indexes as
> regular u32 and only reduce the space when using them as index.

OK.

> 
> Also, I would write the loop:
> 
> > +static int commit_xhlocks(struct cross_lock *xlock)
> > +{
> > +     struct task_struct *curr = current;
> > +     struct hist_lock *xhlock_c = xhlock_curr(curr);
> > +     struct hist_lock *xhlock = xhlock_c;
> > +
> > +     do {
> > +             xhlock = xhlock_prev(curr, xhlock);
> > +
> > +             if (!xhlock_used(xhlock))
> > +                     break;
> > +
> > +             if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> > +                     break;
> > +
> > +             if (same_context_xhlock(xhlock) &&
> > +                 before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
> > +                 !commit_xhlock(xlock, xhlock))
> > +                     return 0;
> > +     } while (xhlock_c != xhlock);
> > +
> > +     return 1;
> > +}
> 
> like:
> 
> #define xhlock(i)	current->xhlocks[i % MAX_XHLOCKS_NR]
> 
> 	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
> 		xhlock = xhlock(curr->xhlock_idx - i);
> 
> 		/* ... */
> 	}
> 
> That avoids that horrible xhlock_prev() thing.

Right. I decided to force MAX_XHLOCKS_NR to be a power of 2 and everything
became easy. Thank you very much.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 15:49   ` Peter Zijlstra
@ 2017-03-01  5:17     ` Byungchul Park
  2017-03-01  6:18       ` Byungchul Park
  2017-03-02  2:52       ` Byungchul Park
  0 siblings, 2 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  5:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 04:49:00PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> 
> > +struct cross_lock {
> > +	/*
> > +	 * When more than one acquisition of crosslocks are overlapped,
> > +	 * we do actual commit only when ref == 0.
> > +	 */
> > +	atomic_t ref;
> 
> That comment doesn't seem right, should that be: ref != 0 ?
> Also; would it not be much clearer to call this: nr_blocked, or waiters
> or something along those lines, because that is what it appears to be.
> 
> > +	/*
> > +	 * Seperate hlock instance. This will be used at commit step.
> > +	 *
> > +	 * TODO: Use a smaller data structure containing only necessary
> > +	 * data. However, we should make lockdep code able to handle the
> > +	 * smaller one first.
> > +	 */
> > +	struct held_lock	hlock;
> > +};
> 
> > +static int add_xlock(struct held_lock *hlock)
> > +{
> > +	struct cross_lock *xlock;
> > +	unsigned int gen_id;
> > +
> > +	if (!depend_after(hlock))
> > +		return 1;
> > +
> > +	if (!graph_lock())
> > +		return 0;
> > +
> > +	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
> > +
> > +	/*
> > +	 * When acquisitions for a xlock are overlapped, we use
> > +	 * a reference counter to handle it.
> 
> Handle what!? That comment is near empty.

I will add more comments so that it fully describes what is handled.

> 
> > +	 */
> > +	if (atomic_inc_return(&xlock->ref) > 1)
> > +		goto unlock;
> 
> So you set the xlock's generation only once, to the oldest blocking-on
> relation, which makes sense, you want to be able to related to all
> historical locks since.
> 
> > +
> > +	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
> > +	xlock->hlock = *hlock;
> > +	xlock->hlock.gen_id = gen_id;
> > +unlock:
> > +	graph_unlock();
> > +	return 1;
> > +}
> 
> > +void lock_commit_crosslock(struct lockdep_map *lock)
> > +{
> > +	struct cross_lock *xlock;
> > +	unsigned long flags;
> > +
> > +	if (!current->xhlocks)
> > +		return;
> > +
> > +	if (unlikely(current->lockdep_recursion))
> > +		return;
> > +
> > +	raw_local_irq_save(flags);
> > +	check_flags(flags);
> > +	current->lockdep_recursion = 1;
> > +
> > +	if (unlikely(!debug_locks))
> > +		return;
> > +
> > +	if (!graph_lock())
> > +		return;
> > +
> > +	xlock = &((struct lockdep_map_cross *)lock)->xlock;
> > +	if (atomic_read(&xlock->ref) > 0 && !commit_xhlocks(xlock))
> 
> You terminate with graph_lock() held.

Oops. What did I do? I'll fix it.

> 
> Also, I think you can do the atomic_read() outside of graph lock, to
> avoid taking graph_lock when its 0.

I'll do that if possible after thinking more.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 13:40   ` Peter Zijlstra
@ 2017-03-01  5:43     ` Byungchul Park
  2017-03-01 12:28       ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  5:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 02:40:18PM +0100, Peter Zijlstra wrote:
> > +static int commit_xhlocks(struct cross_lock *xlock)
> > +{
> > +	struct task_struct *curr = current;
> > +	struct hist_lock *xhlock_c = xhlock_curr(curr);
> > +	struct hist_lock *xhlock = xhlock_c;
> > +
> > +	do {
> > +		xhlock = xhlock_prev(curr, xhlock);
> > +
> > +		if (!xhlock_used(xhlock))
> > +			break;
> > +
> > +		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> > +			break;
> > +
> > +		if (same_context_xhlock(xhlock) &&
> > +		    before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
> > +		    !commit_xhlock(xlock, xhlock))
> > +			return 0;
> > +	} while (xhlock_c != xhlock);
> > +
> > +	return 1;
> > +}
> 
> So I'm still struggling with prev_gen_id; is it an optimization or is it
> required for correctness?

It's an optimization, but a very essential and important one.

          in hlocks[]
          ------------
          A gen_id (4) --+
                         | previous gen_id
          B gen_id (3) <-+
          C gen_id (3)
          D gen_id (2)
oldest -> E gen_id (1)

          in xhlocks[]
          ------------
       ^  A gen_id (4) prev_gen_id (3: B's gen id)
       |  B gen_id (3) prev_gen_id (3: C's gen id)
       |  C gen_id (3) prev_gen_id (2: D's gen id)
       |  D gen_id (2) prev_gen_id (1: E's gen id)
       |  E gen_id (1) prev_gen_id (NA)

Let's consider the case that the gen id of xlock to commit is 3.

In this case, it's enough to generate 'the xlock -> C'. 'the xlock -> B'
and 'the xlock -> A' are unnecessary since it's covered by 'C -> B' and
'B -> A' which are already generated by original lockdep.

I use the prev_gen_id to avoid adding this kind of redundant
dependencies. In other words, xhlock->prev_gen_id >= xlock->hlock.gen_id
means that the previous lock in hlocks[] is able to handle the
dependency on its commit stage.
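
Tying that back to the check in commit_xhlocks() quoted above, with
xlock->hlock.gen_id == 3:

	/*
	 * A: prev_gen_id 3, !before(3, 3) -> skip, covered by 'C -> B -> A'
	 * B: prev_gen_id 3, !before(3, 3) -> skip, covered by 'C -> B'
	 * C: prev_gen_id 2,  before(2, 3) -> commit 'the xlock -> C'
	 * D, E: gen_id before 3           -> the loop stops here
	 */
	if (same_context_xhlock(xhlock) &&
	    before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
	    !commit_xhlock(xlock, xhlock))
		return 0;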

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01  5:17     ` Byungchul Park
@ 2017-03-01  6:18       ` Byungchul Park
  2017-03-02  2:52       ` Byungchul Park
  1 sibling, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  6:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 02:17:07PM +0900, Byungchul Park wrote:
> On Tue, Feb 28, 2017 at 04:49:00PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > 
> > > +struct cross_lock {
> > > +	/*
> > > +	 * When more than one acquisition of crosslocks are overlapped,
> > > +	 * we do actual commit only when ref == 0.
> > > +	 */
> > > +	atomic_t ref;
> > 
> > That comment doesn't seem right, should that be: ref != 0 ?
> > Also; would it not be much clearer to call this: nr_blocked, or waiters
> > or something along those lines, because that is what it appears to be.

Honestly, I forgot why I introduced the ref. I will remove it in the next
spin and handle waiters in another way.

Thank you,
Byungchul

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 12:49     ` Peter Zijlstra
@ 2017-03-01  6:20       ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  6:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 01:49:06PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 28, 2017 at 01:45:07PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > > +	/*
> > > +	 * struct held_lock does not have an indicator whether in nmi.
> > > +	 */
> > > +	int nmi;
> > 
> > Do we really need this? Lockdep doesn't really know about NMI context,
> > so its weird to now partially introduce it.
> 
> That is, see how nmi_enter() includes lockdep_off().

Indeed. OK. I will fix it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 18:15   ` Peter Zijlstra
@ 2017-03-01  7:21     ` Byungchul Park
  2017-03-01 10:43       ` Peter Zijlstra
  2017-03-02  4:20     ` Matthew Wilcox
  2017-03-14  7:36     ` Byungchul Park
  2 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-01  7:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > +	/*
> > +	 * Each work of workqueue might run in a different context,
> > +	 * thanks to concurrency support of workqueue. So we have to
> > +	 * distinguish each work to avoid false positive.
> > +	 *
> > +	 * TODO: We can also add dependencies between two acquisitions
> > +	 * of different work_id, if they don't cause a sleep so make
> > +	 * the worker stalled.
> > +	 */
> > +	unsigned int		work_id;
> 
> > +/*
> > + * Crossrelease needs to distinguish each work of workqueues.
> > + * Caller is supposed to be a worker.
> > + */
> > +void crossrelease_work_start(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->work_id++;
> > +}
> 
> So what you're trying to do with that 'work_id' thing is basically wipe
> the entire history when we're at the bottom of a context.

Sorry, but I do not understand what you are trying to say.

What I was trying to do with the 'work_id' is to distinguish between
different works, which is then used to check whether history locks were
held in the same context as the release one.

> Which is a useful operation, but should arguably also be done on the
> return to userspace path. Any historical lock from before the current
> syscall is irrelevant.

Sorry. Could you explain it more?

> 
> (And we should not be returning to userspace with locks held anyway --
> lockdep already has a check for that).

Yes right. We should not be returning to userspace without reporting it
in that case.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01  7:21     ` Byungchul Park
@ 2017-03-01 10:43       ` Peter Zijlstra
  2017-03-01 12:27         ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-01 10:43 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 04:21:28PM +0900, Byungchul Park wrote:
> On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > > +	/*
> > > +	 * Each work of workqueue might run in a different context,
> > > +	 * thanks to concurrency support of workqueue. So we have to
> > > +	 * distinguish each work to avoid false positive.
> > > +	 *
> > > +	 * TODO: We can also add dependencies between two acquisitions
> > > +	 * of different work_id, if they don't cause a sleep so make
> > > +	 * the worker stalled.
> > > +	 */
> > > +	unsigned int		work_id;
> > 
> > > +/*
> > > + * Crossrelease needs to distinguish each work of workqueues.
> > > + * Caller is supposed to be a worker.
> > > + */
> > > +void crossrelease_work_start(void)
> > > +{
> > > +	if (current->xhlocks)
> > > +		current->work_id++;
> > > +}
> > 
> > So what you're trying to do with that 'work_id' thing is basically wipe
> > the entire history when we're at the bottom of a context.
> 
> Sorry, but I do not understand what you are trying to say.
> 
> What I was trying to do with the 'work_id' is to distinguish between
> different works, which will be used to check if history locks were held
> in the same context as a release one.

The effect of changing work_id is that history disappears, yes? That is,
by changing it, all our hist_locks don't match the context anymore and
therefore we have no history.

This is a useful operation.

You would want to do this at points where you know there will not be any
dependencies on prior action, and typically at the same points we want
to not be holding any locks.

Hence my term: 'bottom of a context', referring to an empty (held) lock
stack.

I would say this needs to be done for all 'work-queue' like things, and
there are quite a few outside of the obvious ones, smpboot threads and
many other kthreads fall into this category.

Similarly the return to userspace point that I already mentioned.

I would propose something like:

	lockdep_assert_empty();

Or something similar, which would verify the lock stack is indeed empty
and wipe our entire hist_lock buffer when cross-release is enabled.
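
As a rough sketch only (the history wipe reuses the work_id bump this
patch already does for works):

static inline void lockdep_assert_empty(void)
{
	/* No locks may be held at the bottom of a context. */
	WARN_ON_ONCE(debug_locks && current->lockdep_depth);

#ifdef CONFIG_LOCKDEP_CROSSRELEASE
	/*
	 * Invalidate the history: existing entries no longer match the
	 * context, as with crossrelease_work_start().
	 */
	if (current->xhlocks)
		current->work_id++;
#endif
}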

> > Which is a useful operation, but should arguably also be done on the
> > return to userspace path. Any historical lock from before the current
> > syscall is irrelevant.
> 
> Sorry. Could you explain it more?

Does the above make things clear?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01  4:40         ` Byungchul Park
@ 2017-03-01 10:45           ` Peter Zijlstra
  2017-03-01 12:10             ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-01 10:45 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 01:40:33PM +0900, Byungchul Park wrote:

> Right. I decided to force MAX_XHLOCKS_NR to be power of 2 and everything
> became easy. Thank you very much.

Something like:

	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));

Should enforce it, I think.
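
(The power-of-2 requirement is also what makes the symmetric macros
correct for an unsigned index:)

/*
 * With i an unsigned int and MAX_XHLOCKS_NR a power of 2 (64), i == 0
 * makes (i - 1) wrap to UINT_MAX, and UINT_MAX % 64 == 63, exactly the
 * last slot. With a non-power-of-2 size the modulo of the wrapped
 * value would land on the wrong slot.
 */
#define idx_prev(i)	(((i) - 1) % MAX_XHLOCKS_NR)
#define idx_next(i)	(((i) + 1) % MAX_XHLOCKS_NR)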

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01 10:45           ` Peter Zijlstra
@ 2017-03-01 12:10             ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-01 12:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 11:45:48AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 01:40:33PM +0900, Byungchul Park wrote:
> 
> > Right. I decided to force MAX_XHLOCKS_NR to be power of 2 and everything
> > became easy. Thank you very much.
> 
> Something like:
> 
> 	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
> 
> Should enforce I think.

Good idea! Thank you very much.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01 10:43       ` Peter Zijlstra
@ 2017-03-01 12:27         ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-01 12:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 11:43:28AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 04:21:28PM +0900, Byungchul Park wrote:
> > On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > > > +	/*
> > > > +	 * Each work of workqueue might run in a different context,
> > > > +	 * thanks to concurrency support of workqueue. So we have to
> > > > +	 * distinguish each work to avoid false positive.
> > > > +	 *
> > > > +	 * TODO: We can also add dependencies between two acquisitions
> > > > +	 * of different work_id, if they don't cause a sleep so make
> > > > +	 * the worker stalled.
> > > > +	 */
> > > > +	unsigned int		work_id;
> > > 
> > > > +/*
> > > > + * Crossrelease needs to distinguish each work of workqueues.
> > > > + * Caller is supposed to be a worker.
> > > > + */
> > > > +void crossrelease_work_start(void)
> > > > +{
> > > > +	if (current->xhlocks)
> > > > +		current->work_id++;
> > > > +}
> > > 
> > > So what you're trying to do with that 'work_id' thing is basically wipe
> > > the entire history when we're at the bottom of a context.
> > 
> > Sorry, but I do not understand what you are trying to say.
> > 
> > What I was trying to do with the 'work_id' is to distinguish between
> > different works, which will be used to check if history locks were held
> > in the same context as a release one.
> 
> The effect of changing work_id is that history disappears, yes? That is,
> by changing it, all our hist_locks don't match the context anymore and
> therefore we have no history.

Right. Now I understood your words.

> This is a useful operation.
> 
> You would want to do this at points where you know there will not be any
> dependencies on prior action, and typically at the same points we want
> to not be holding any locks.
> 
> Hence my term: 'bottom of a context', referring to an empty (held) lock
> stack.

Right.

> I would say this needs to be done for all 'work-queue' like things, and

Of course.

> there are quite a few outside of the obvious ones, smpboot threads and
> many other kthreads fall into this category.

Where can I check those?

> Similarly the return to userspace point that I already mentioned.
> 
> I would propose something like:
> 
> 	lockdep_assert_empty();
> 
> Or something similar, which would verify the lock stack is indeed empty
> and wipe our entire hist_lock buffer when cross-release is enabled.

Right. I should do that.

> > > Which is a useful operation, but should arguably also be done on the
> > > return to userspace path. Any historical lock from before the current
> > > syscall is irrelevant.

Let me think more. It doesn't look like a simple problem.

> > 
> > Sorry. Could you explain it more?
> 
> Does the above make things clear?

Perfect. Thank you very much.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01  5:43     ` Byungchul Park
@ 2017-03-01 12:28       ` Peter Zijlstra
  2017-03-02 13:40         ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-01 12:28 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 02:43:23PM +0900, Byungchul Park wrote:
> On Tue, Feb 28, 2017 at 02:40:18PM +0100, Peter Zijlstra wrote:
> > > +static int commit_xhlocks(struct cross_lock *xlock)
> > > +{
> > > +	struct task_struct *curr = current;
> > > +	struct hist_lock *xhlock_c = xhlock_curr(curr);
> > > +	struct hist_lock *xhlock = xhlock_c;
> > > +
> > > +	do {
> > > +		xhlock = xhlock_prev(curr, xhlock);
> > > +
> > > +		if (!xhlock_used(xhlock))
> > > +			break;
> > > +
> > > +		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> > > +			break;
> > > +
> > > +		if (same_context_xhlock(xhlock) &&
> > > +		    before(xhlock->prev_gen_id, xlock->hlock.gen_id) &&
> > > +		    !commit_xhlock(xlock, xhlock))
> > > +			return 0;
> > > +	} while (xhlock_c != xhlock);
> > > +
> > > +	return 1;
> > > +}
> > 
> > So I'm still struggling with prev_gen_id; is it an optimization or is it
> > required for correctness?
> 
> It's an optimization, but very essential and important optimization.
> 
>           in hlocks[]
>           ------------
>           A gen_id (4) --+
>                          | previous gen_id
>           B gen_id (3) <-+
>           C gen_id (3)
>           D gen_id (2)
> oldest -> E gen_id (1)
> 
>           in xhlocks[]
>           ------------
>        ^  A gen_id (4) prev_gen_id (3: B's gen id)
>        |  B gen_id (3) prev_gen_id (3: C's gen id)
>        |  C gen_id (3) prev_gen_id (2: D's gen id)
>        |  D gen_id (2) prev_gen_id (1: E's gen id)
>        |  E gen_id (1) prev_gen_id (NA)
> 
> Let's consider the case that the gen id of xlock to commit is 3.
> 
> In this case, it's engough to generate 'the xlock -> C'. 'the xlock -> B'
> and 'the xlock -> A' are unnecessary since it's covered by 'C -> B' and
> 'B -> A' which are already generated by original lockdep.
> 
> I use the prev_gen_id to avoid adding this kind of redundant
> dependencies. In other words, xhlock->prev_gen_id >= xlock->hlock.gen_id
> means that the previous lock in hlocks[] is able to handle the
> dependency on its commit stage.
> 

Aah, I completely missed it was against held_locks.

Hurm.. it feels like this is solving a problem we shouldn't be solving
though.

That is, ideally we'd already be able to (quickly) tell if a relation
exists or not, but given how the whole chain_hash stuff is build now, it
looks like we cannot.


Let me think about this a bit more.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01  5:17     ` Byungchul Park
  2017-03-01  6:18       ` Byungchul Park
@ 2017-03-02  2:52       ` Byungchul Park
  1 sibling, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-02  2:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Wed, Mar 01, 2017 at 02:17:07PM +0900, Byungchul Park wrote:
> > > +void lock_commit_crosslock(struct lockdep_map *lock)
> > > +{
> > > +	struct cross_lock *xlock;
> > > +	unsigned long flags;
> > > +
> > > +	if (!current->xhlocks)
> > > +		return;
> > > +
> > > +	if (unlikely(current->lockdep_recursion))
> > > +		return;
> > > +
> > > +	raw_local_irq_save(flags);
> > > +	check_flags(flags);
> > > +	current->lockdep_recursion = 1;
> > > +
> > > +	if (unlikely(!debug_locks))
> > > +		return;
> > > +
> > > +	if (!graph_lock())
> > > +		return;
> > > +
> > > +	xlock = &((struct lockdep_map_cross *)lock)->xlock;
> > > +	if (atomic_read(&xlock->ref) > 0 && !commit_xhlocks(xlock))
> > 
> > You terminate with graph_lock() held.
> 
> Oops. What did I do? I'll fix it.

I remembered it. It's not a problem, because it terminates there only
if _both_ 'xlock->ref > 0' and 'commit_xhlocks() returns 0' are true.
Otherwise, it unlocks the lock safely.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 18:15   ` Peter Zijlstra
  2017-03-01  7:21     ` Byungchul Park
@ 2017-03-02  4:20     ` Matthew Wilcox
  2017-03-02  4:45       ` byungchul.park
  2017-03-14  7:36     ` Byungchul Park
  2 siblings, 1 reply; 63+ messages in thread
From: Matthew Wilcox @ 2017-03-02  4:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Byungchul Park, mingo, tglx, walken, boqun.feng, kirill,
	linux-kernel, linux-mm, iamjoonsoo.kim, akpm, npiggin

On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> (And we should not be returning to userspace with locks held anyway --
> lockdep already has a check for that).

Don't we return to userspace with page locks held, eg during async directio?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02  4:20     ` Matthew Wilcox
@ 2017-03-02  4:45       ` byungchul.park
  2017-03-02 14:39         ` Matthew Wilcox
  0 siblings, 1 reply; 63+ messages in thread
From: byungchul.park @ 2017-03-02  4:45 UTC (permalink / raw)
  To: 'Matthew Wilcox', 'Peter Zijlstra'
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

> -----Original Message-----
> From: Matthew Wilcox [mailto:willy@infradead.org]
> Sent: Thursday, March 02, 2017 1:20 PM
> To: Peter Zijlstra
> Cc: Byungchul Park; mingo@kernel.org; tglx@linutronix.de;
> walken@google.com; boqun.feng@gmail.com; kirill@shutemov.name; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; iamjoonsoo.kim@lge.com;
> akpm@linux-foundation.org; npiggin@gmail.com
> Subject: Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
> 
> On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > (And we should not be returning to userspace with locks held anyway --
> > lockdep already has a check for that).
> 
> Don't we return to userspace with page locks held, eg during async
> directio?

Hello,

I think the check for returning to userspace with locks held should
make an exception for crosslocks. Don't you think so?

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-01 12:28       ` Peter Zijlstra
@ 2017-03-02 13:40         ` Peter Zijlstra
  2017-03-03  0:17           ` Byungchul Park
  2017-03-03  0:39           ` Byungchul Park
  0 siblings, 2 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-02 13:40 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Wed, Mar 01, 2017 at 01:28:43PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 02:43:23PM +0900, Byungchul Park wrote:

> > It's an optimization, but very essential and important optimization.

Since its not for correctness, please put it in a separate patch with a
good Changelog, the below would make a good beginning of that.

Also, I feel, the source comments can be improved.

> >           in hlocks[]
> >           ------------
> >           A gen_id (4) --+
> >                          | previous gen_id
> >           B gen_id (3) <-+
> >           C gen_id (3)
> >           D gen_id (2)
> > oldest -> E gen_id (1)
> > 
> >           in xhlocks[]
> >           ------------
> >        ^  A gen_id (4) prev_gen_id (3: B's gen id)
> >        |  B gen_id (3) prev_gen_id (3: C's gen id)
> >        |  C gen_id (3) prev_gen_id (2: D's gen id)
> >        |  D gen_id (2) prev_gen_id (1: E's gen id)
> >        |  E gen_id (1) prev_gen_id (NA)
> > 
> > Let's consider the case that the gen id of xlock to commit is 3.
> > 
> > In this case, it's engough to generate 'the xlock -> C'. 'the xlock -> B'
> > and 'the xlock -> A' are unnecessary since it's covered by 'C -> B' and
> > 'B -> A' which are already generated by original lockdep.
> > 
> > I use the prev_gen_id to avoid adding this kind of redundant
> > dependencies. In other words, xhlock->prev_gen_id >= xlock->hlock.gen_id
> > means that the previous lock in hlocks[] is able to handle the
> > dependency on its commit stage.
> > 
> 
> Aah, I completely missed it was against held_locks.
> 
> Hurm.. it feels like this is solving a problem we shouldn't be solving
> though.
> 
> That is, ideally we'd already be able to (quickly) tell if a relation
> exists or not, but given how the whole chain_hash stuff is built now, it
> looks like we cannot.
> 
> 
> Let me think about this a bit more.

OK, so neither this nor the chain-hash completely avoids redundant
dependencies. The only way to do that is to do graph-walks for every
proposed link.

Now, we already do a ton of __bfs() walks in check_prev_add(), so one
more might not hurt too much [*].

Esp. with the chain-hash avoiding all the obvious duplicate work, this
might just work.

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a95e5d1..7baea89 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1860,6 +1860,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 		}
 	}
 
+	/*
+	 * Is the <prev> -> <next> redundant?
+	 */
+	this.class = hlock_class(prev);
+	this.parent = NULL;
+	ret = check_noncircular(&this, hlock_class(next), &target_entry);
+	if (!ret) /* exists, redundant */
+		return 2;
+	if (ret < 0)
+		return print_bfs_bug(ret);
+
 	if (!*stack_saved) {
 		if (!save_trace(&trace))
 			return 0;



[*] A while ago someone, and I cannot find the email just now, asked if
we could not implement the RECLAIM_FS inversion stuff with a 'fake' lock
like we use for other things like workqueues etc. I think this should be
possible which allows reducing the 'irq' states and will reduce the
amount of __bfs() lookups we do.

Removing the 1 IRQ state would result in 4 fewer __bfs() walks if I'm
not mistaken, more than making up for the 1 we'd have to add to detect
redundant links.


 include/linux/lockdep.h         | 11 +-----
 include/linux/sched.h           |  1 -
 kernel/locking/lockdep.c        | 87 +----------------------------------------
 kernel/locking/lockdep_states.h |  1 -
 mm/internal.h                   | 40 +++++++++++++++++++
 mm/page_alloc.c                 | 13 ++++--
 mm/slab.h                       |  7 +++-
 mm/slob.c                       |  8 +++-
 mm/vmscan.c                     | 13 +++---
 9 files changed, 71 insertions(+), 110 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1e327bb..6ba1a65 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -29,7 +29,7 @@ extern int lock_stat;
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
  * the total number of states... :-(
  */
-#define XXX_LOCK_USAGE_STATES		(1+3*4)
+#define XXX_LOCK_USAGE_STATES		(1+2*4)
 
 /*
  * NR_LOCKDEP_CACHING_CLASSES ... Number of classes
@@ -361,10 +361,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
 	lock_set_class(lock, lock->name, lock->key, subclass, ip);
 }
 
-extern void lockdep_set_current_reclaim_state(gfp_t gfp_mask);
-extern void lockdep_clear_current_reclaim_state(void);
-extern void lockdep_trace_alloc(gfp_t mask);
-
 struct pin_cookie { unsigned int val; };
 
 #define NIL_COOKIE (struct pin_cookie){ .val = 0U, }
@@ -373,7 +369,7 @@ extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
 extern void lock_repin_lock(struct lockdep_map *lock, struct pin_cookie);
 extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 
-# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
+# define INIT_LOCKDEP				.lockdep_recursion = 0,
 
 #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
 
@@ -413,9 +409,6 @@ static inline void lockdep_on(void)
 # define lock_release(l, n, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_set_current_reclaim_state(g)	do { } while (0)
-# define lockdep_clear_current_reclaim_state()	do { } while (0)
-# define lockdep_trace_alloc(g)			do { } while (0)
 # define lockdep_info()				do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d67eee8..0fa8a8f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -806,7 +806,6 @@ struct task_struct {
 	int				lockdep_depth;
 	unsigned int			lockdep_recursion;
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
-	gfp_t				lockdep_reclaim_gfp;
 #endif
 
 #ifdef CONFIG_UBSAN
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a95e5d1..1051600 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -343,14 +343,12 @@ EXPORT_SYMBOL(lockdep_on);
 #if VERBOSE
 # define HARDIRQ_VERBOSE	1
 # define SOFTIRQ_VERBOSE	1
-# define RECLAIM_VERBOSE	1
 #else
 # define HARDIRQ_VERBOSE	0
 # define SOFTIRQ_VERBOSE	0
-# define RECLAIM_VERBOSE	0
 #endif
 
-#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE || RECLAIM_VERBOSE
+#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
 /*
  * Quick filtering for interesting events:
  */
@@ -2553,14 +2551,6 @@ static int SOFTIRQ_verbose(struct lock_class *class)
 	return 0;
 }
 
-static int RECLAIM_FS_verbose(struct lock_class *class)
-{
-#if RECLAIM_VERBOSE
-	return class_filter(class);
-#endif
-	return 0;
-}
-
 #define STRICT_READ_CHECKS	1
 
 static int (*state_verbose_f[])(struct lock_class *class) = {
@@ -2856,51 +2846,6 @@ void trace_softirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_softirqs_off);
 }
 
-static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
-{
-	struct task_struct *curr = current;
-
-	if (unlikely(!debug_locks))
-		return;
-
-	/* no reclaim without waiting on it */
-	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
-		return;
-
-	/* this guy won't enter reclaim */
-	if ((curr->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
-		return;
-
-	/* We're only interested __GFP_FS allocations for now */
-	if (!(gfp_mask & __GFP_FS))
-		return;
-
-	/*
-	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
-	 */
-	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
-		return;
-
-	mark_held_locks(curr, RECLAIM_FS);
-}
-
-static void check_flags(unsigned long flags);
-
-void lockdep_trace_alloc(gfp_t gfp_mask)
-{
-	unsigned long flags;
-
-	if (unlikely(current->lockdep_recursion))
-		return;
-
-	raw_local_irq_save(flags);
-	check_flags(flags);
-	current->lockdep_recursion = 1;
-	__lockdep_trace_alloc(gfp_mask, flags);
-	current->lockdep_recursion = 0;
-	raw_local_irq_restore(flags);
-}
-
 static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
 {
 	/*
@@ -2946,22 +2891,6 @@ static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
 		}
 	}
 
-	/*
-	 * We reuse the irq context infrastructure more broadly as a general
-	 * context checking code. This tests GFP_FS recursion (a lock taken
-	 * during reclaim for a GFP_FS allocation is held over a GFP_FS
-	 * allocation).
-	 */
-	if (!hlock->trylock && (curr->lockdep_reclaim_gfp & __GFP_FS)) {
-		if (hlock->read) {
-			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS_READ))
-					return 0;
-		} else {
-			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS))
-					return 0;
-		}
-	}
-
 	return 1;
 }
 
@@ -3020,10 +2949,6 @@ static inline int separate_irq_context(struct task_struct *curr,
 	return 0;
 }
 
-void lockdep_trace_alloc(gfp_t gfp_mask)
-{
-}
-
 #endif /* defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) */
 
 /*
@@ -3859,16 +3784,6 @@ void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie cookie)
 }
 EXPORT_SYMBOL_GPL(lock_unpin_lock);
 
-void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
-{
-	current->lockdep_reclaim_gfp = gfp_mask;
-}
-
-void lockdep_clear_current_reclaim_state(void)
-{
-	current->lockdep_reclaim_gfp = 0;
-}
-
 #ifdef CONFIG_LOCK_STAT
 static int
 print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
diff --git a/kernel/locking/lockdep_states.h b/kernel/locking/lockdep_states.h
index 995b0cc..35ca09f 100644
--- a/kernel/locking/lockdep_states.h
+++ b/kernel/locking/lockdep_states.h
@@ -6,4 +6,3 @@
  */
 LOCKDEP_STATE(HARDIRQ)
 LOCKDEP_STATE(SOFTIRQ)
-LOCKDEP_STATE(RECLAIM_FS)
diff --git a/mm/internal.h b/mm/internal.h
index ccfc2a2..88b9107 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -15,6 +15,8 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/tracepoint-defs.h>
+#include <linux/lockdep.h>
+#include <linux/sched/mm.h>
 
 /*
  * The set of flags that only affect watermark checking and reclaim
@@ -498,4 +500,42 @@ extern const struct trace_print_flags pageflag_names[];
 extern const struct trace_print_flags vmaflag_names[];
 extern const struct trace_print_flags gfpflag_names[];
 
+
+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __fs_reclaim_map;
+
+static inline bool __need_fs_reclaim(gfp_t gfp_mask)
+{
+	gfp_mask = memalloc_noio_flags(gfp_mask);
+
+	/* no reclaim without waiting on it */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
+		return false;
+
+	/* this guy won't enter reclaim */
+	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+		return false;
+
+	/* We're only interested __GFP_FS allocations for now */
+	if (!(gfp_mask & __GFP_FS))
+		return false;
+
+	return true;
+}
+
+static inline void fs_reclaim_acquire(gfp_t gfp_mask)
+{
+	if (__need_fs_reclaim(gfp_mask))
+		lock_map_acquire(&__fs_reclaim_map);
+}
+static inline void fs_reclaim_release(gfp_t gfp_mask)
+{
+	if (__need_fs_reclaim(gfp_mask))
+		lock_map_release(&__fs_reclaim_map);
+}
+#else
+static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
+static inline void fs_reclaim_release(gfp_t gfp_mask) { }
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eaa64d2..85ea8bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3387,6 +3387,12 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __fs_reclaim_map =
+	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
+#endif
+
 /* Perform direct synchronous page reclaim */
 static int
 __perform_reclaim(gfp_t gfp_mask, unsigned int order,
@@ -3400,7 +3406,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
 	current->flags |= PF_MEMALLOC;
-	lockdep_set_current_reclaim_state(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
@@ -3408,7 +3414,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 								ac->nodemask);
 
 	current->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
+	fs_reclaim_release(gfp_mask);
 	current->flags &= ~PF_MEMALLOC;
 
 	cond_resched();
@@ -3913,7 +3919,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 			*alloc_flags |= ALLOC_CPUSET;
 	}
 
-	lockdep_trace_alloc(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
+	fs_reclaim_release(gfp_mask);
 
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
diff --git a/mm/slab.h b/mm/slab.h
index 65e7c3f..753f552 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -44,6 +44,8 @@ struct kmem_cache {
 #include <linux/kmemleak.h>
 #include <linux/random.h>
 
+#include "internal.h"
+
 /*
  * State of the slab allocator.
  *
@@ -428,7 +430,10 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 						     gfp_t flags)
 {
 	flags &= gfp_allowed_mask;
-	lockdep_trace_alloc(flags);
+
+	fs_reclaim_acquire(flags);
+	fs_reclaim_release(flags);
+
 	might_sleep_if(gfpflags_allow_blocking(flags));
 
 	if (should_failslab(s, flags))
diff --git a/mm/slob.c b/mm/slob.c
index eac04d43..3e32280 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -73,6 +73,8 @@
 #include <linux/atomic.h>
 
 #include "slab.h"
+#include "internal.h"
+
 /*
  * slob_block has a field 'units', which indicates size of block if +ve,
  * or offset of next block if -ve (in SLOB_UNITs).
@@ -432,7 +434,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 
 	gfp &= gfp_allowed_mask;
 
-	lockdep_trace_alloc(gfp);
+	fs_reclaim_acquire(gfp);
+	fs_reclaim_release(gfp);
 
 	if (size < PAGE_SIZE - align) {
 		if (!size)
@@ -538,7 +541,8 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	flags &= gfp_allowed_mask;
 
-	lockdep_trace_alloc(flags);
+	fs_reclaim_acquire(flags);
+	fs_reclaim_release(flags);
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bc8031e..2f57e36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3418,8 +3418,6 @@ static int kswapd(void *p)
 	};
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
-	lockdep_set_current_reclaim_state(GFP_KERNEL);
-
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
 	current->reclaim_state = &reclaim_state;
@@ -3475,7 +3473,9 @@ static int kswapd(void *p)
 		 */
 		trace_mm_vmscan_kswapd_wake(pgdat->node_id, classzone_idx,
 						alloc_order);
+		fs_reclaim_acquire(GFP_KERNEL);
 		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
+		fs_reclaim_release(GFP_KERNEL);
 		if (reclaim_order < alloc_order)
 			goto kswapd_try_sleep;
 
@@ -3485,7 +3485,6 @@ static int kswapd(void *p)
 
 	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
 	current->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
 
 	return 0;
 }
@@ -3550,14 +3549,14 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 	unsigned long nr_reclaimed;
 
 	p->flags |= PF_MEMALLOC;
-	lockdep_set_current_reclaim_state(sc.gfp_mask);
+	fs_reclaim_acquire(sc.gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	p->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
+	fs_reclaim_release(sc.gfp_mask);
 	p->flags &= ~PF_MEMALLOC;
 
 	return nr_reclaimed;
@@ -3741,7 +3740,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	 * and RECLAIM_UNMAP.
 	 */
 	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
-	lockdep_set_current_reclaim_state(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
@@ -3756,8 +3755,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	}
 
 	p->reclaim_state = NULL;
+	fs_reclaim_release(gfp_mask);
 	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
-	lockdep_clear_current_reclaim_state();
 	return sc.nr_reclaimed >= nr_pages;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
                     ` (6 preceding siblings ...)
  2017-02-28 18:15   ` Peter Zijlstra
@ 2017-03-02 13:41   ` Peter Zijlstra
  2017-03-02 23:43     ` Byungchul Park
  7 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-02 13:41 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin

On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index a6c8db1..7890661 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1042,6 +1042,19 @@ config DEBUG_LOCK_ALLOC
>  	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
>  	 held during task exit.
>  
> +config LOCKDEP_CROSSRELEASE
> +	bool "Lock debugging: make lockdep work for crosslocks"
> +	select LOCKDEP
> +	select TRACE_IRQFLAGS
> +	default n
> +	help
> +	 This makes lockdep work for crosslock which is a lock allowed to
> +	 be released in a different context from the acquisition context.
> +	 Normally a lock must be released in the context acquiring the lock.
> +	 However, relaxing this constraint lets synchronization primitives
> +	 such as page locks or completions use the lock correctness
> +	 detector, lockdep.
> +
>  config PROVE_LOCKING
>  	bool "Lock debugging: prove locking correctness"
>  	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT


Does CROSSRELEASE && !PROVE_LOCKING make any sense?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02  4:45       ` byungchul.park
@ 2017-03-02 14:39         ` Matthew Wilcox
  2017-03-02 23:50           ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Matthew Wilcox @ 2017-03-02 14:39 UTC (permalink / raw)
  To: byungchul.park
  Cc: 'Peter Zijlstra',
	mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Thu, Mar 02, 2017 at 01:45:35PM +0900, byungchul.park wrote:
> From: Matthew Wilcox [mailto:willy@infradead.org]
> > On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > > (And we should not be returning to userspace with locks held anyway --
> > > lockdep already has a check for that).
> > 
> > Don't we return to userspace with page locks held, eg during async
> > directio?
> 
> Hello,
> 
> I think that the check for locks held when returning to userspace should
> make an exception for crosslocks. Don't you think so?

Oh yes.  We have to keep the pages locked during reads, and we have to
return to userspace before I/O is complete; therefore we have to return
to userspace with pages locked.  They'll be unlocked by the interrupt
handler in page_endio().

Speaking of which ... this feature is far too heavy for use in production
on pages.  You're almost trebling the size of struct page.  Can we
do something like make all struct pages share the same lockdep_map?
We'd have to not complain about holding one crossdep lock and acquiring
another one of the same type, but with millions of pages in the system,
it must surely be creating a gargantuan graph right now?
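
A rough sketch of that direction, for illustration only (editor's addition;
it borrows the STATIC_LOCKDEP_MAP_INIT pattern from the fs_reclaim map seen
elsewhere in this thread, and the map name and hook placement are
assumptions, not code from the patch set):

#include <linux/lockdep.h>

#ifdef CONFIG_LOCKDEP_CROSSRELEASE
/*
 * One lockdep_map, hence one lock class, shared by every struct page,
 * instead of per-page lockdep storage.
 */
static struct lockdep_map __page_lock_map =
	STATIC_LOCKDEP_MAP_INIT("page_lock", &__page_lock_map);
#endif

/*
 * lock_page()/unlock_page() (and page_endio() for async completion) would
 * then feed this single shared map to the crossrelease hooks instead of a
 * per-page map.  As noted above, lockdep would also have to stop
 * complaining about holding one lock of this class while acquiring
 * another, since nested page locks now look like same-class recursion.
 */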

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02 13:41   ` Peter Zijlstra
@ 2017-03-02 23:43     ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-02 23:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Thu, Mar 02, 2017 at 02:41:03PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index a6c8db1..7890661 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1042,6 +1042,19 @@ config DEBUG_LOCK_ALLOC
> >  	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
> >  	 held during task exit.
> >  
> > +config LOCKDEP_CROSSRELEASE
> > +	bool "Lock debugging: make lockdep work for crosslocks"
> > +	select LOCKDEP
> > +	select TRACE_IRQFLAGS
> > +	default n
> > +	help
> > +	 This makes lockdep work for crosslock which is a lock allowed to
> > +	 be released in a different context from the acquisition context.
> > +	 Normally a lock must be released in the context acquiring the lock.
> > +	 However, relaxing this constraint lets synchronization primitives
> > +	 such as page locks or completions use the lock correctness
> > +	 detector, lockdep.
> > +
> >  config PROVE_LOCKING
> >  	bool "Lock debugging: prove locking correctness"
> >  	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
> 
> 
> Does CROSSRELEASE && !PROVE_LOCKING make any sense?

No, it does not make sense. I will fix it. Thank you.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02 14:39         ` Matthew Wilcox
@ 2017-03-02 23:50           ` Byungchul Park
  2017-03-05  8:01             ` Byungchul Park
  0 siblings, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-02 23:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: 'Peter Zijlstra',
	mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Thu, Mar 02, 2017 at 06:39:49AM -0800, Matthew Wilcox wrote:
> On Thu, Mar 02, 2017 at 01:45:35PM +0900, byungchul.park wrote:
> > From: Matthew Wilcox [mailto:willy@infradead.org]
> > > On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > > > (And we should not be returning to userspace with locks held anyway --
> > > > lockdep already has a check for that).
> > > 
> > > Don't we return to userspace with page locks held, eg during async
> > > directio?
> > 
> > Hello,
> > 
> > I think that the check for locks held when returning to userspace should
> > make an exception for crosslocks. Don't you think so?
> 
> Oh yes.  We have to keep the pages locked during reads, and we have to
> return to userspace before I/O is complete; therefore we have to return
> to userspace with pages locked.  They'll be unlocked by the interrupt
> handler in page_endio().

Agree.

> Speaking of which ... this feature is far too heavy for use in production
> on pages.  You're almost trebling the size of struct page.  Can we
> do something like make all struct pages share the same lockdep_map?
> We'd have to not complain about holding one crossdep lock and acquiring
> another one of the same type, but with millions of pages in the system,
> it must surely be creating a gargantuan graph right now?

Um.. I will try making page locks work with one shared lockdep_map. That is
also what Peterz pointed out and what I was worried about when implementing it.

Thanks for your opinion.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02 13:40         ` Peter Zijlstra
@ 2017-03-03  0:17           ` Byungchul Park
  2017-03-03  8:14             ` Peter Zijlstra
  2017-03-03  0:39           ` Byungchul Park
  1 sibling, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-03  0:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Thu, Mar 02, 2017 at 02:40:31PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 01:28:43PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 01, 2017 at 02:43:23PM +0900, Byungchul Park wrote:
> 
> > > It's an optimization, but a very essential and important one.
> 
> Since it's not for correctness, please put it in a separate patch with a
> good Changelog; the below would make a good beginning of that.

OK. I will do it.

> Also, I feel, the source comments can be improved.
> 
> > >           in hlocks[]
> > >           ------------
> > >           A gen_id (4) --+
> > >                          | previous gen_id
> > >           B gen_id (3) <-+
> > >           C gen_id (3)
> > >           D gen_id (2)
> > > oldest -> E gen_id (1)
> > > 
> > >           in xhlocks[]
> > >           ------------
> > >        ^  A gen_id (4) prev_gen_id (3: B's gen id)
> > >        |  B gen_id (3) prev_gen_id (3: C's gen id)
> > >        |  C gen_id (3) prev_gen_id (2: D's gen id)
> > >        |  D gen_id (2) prev_gen_id (1: E's gen id)
> > >        |  E gen_id (1) prev_gen_id (NA)
> > > 
> > > Let's consider the case that the gen id of xlock to commit is 3.
> > > 
> > > In this case, it's enough to generate 'the xlock -> C'. 'the xlock -> B'
> > > and 'the xlock -> A' are unnecessary since they are covered by 'C -> B' and
> > > 'B -> A', which are already generated by original lockdep.
> > > 
> > > I use the prev_gen_id to avoid adding this kind of redundant
> > > dependency. In other words, xhlock->prev_gen_id >= xlock->hlock.gen_id
> > > means that the previous lock in hlocks[] is able to handle the
> > > dependency at its commit stage.

> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index a95e5d1..7baea89 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1860,6 +1860,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
>  		}
>  	}
>  
> +	/*
> +	 * Is the <prev> -> <next> redundant?
> +	 */
> +	this.class = hlock_class(prev);
> +	this.parent = NULL;
> +	ret = check_noncircular(&this, hlock_class(next), &target_entry);
> +	if (!ret) /* exists, redundant */
> +		return 2;
> +	if (ret < 0)
> +		return print_bfs_bug(ret);
> +
>  	if (!*stack_saved) {
>  		if (!save_trace(&trace))
>  			return 0;

It would be very nice if you allowed me to add this code. However, the prev_gen_id
thing is still useful, even though the code above can achieve the same. Agree?
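
To make the prev_gen_id rule above concrete, here is a stand-alone model of
it (editor's illustration, not kernel code; the cut-off that stops at entries
older than the crosslock is my reading of the quoted example):

#include <stdio.h>

/* One entry of the xhlocks[] table quoted above. */
struct xhlock_model {
	const char *name;
	int gen_id;		/* generation id of this acquisition */
	int prev_gen_id;	/* gen_id of the previous entry in hlocks[], -1 if none */
};

int main(void)
{
	/* newest first, matching the quoted table */
	struct xhlock_model xhlocks[] = {
		{ "A", 4,  3 },
		{ "B", 3,  3 },
		{ "C", 3,  2 },
		{ "D", 2,  1 },
		{ "E", 1, -1 },
	};
	int xlock_gen_id = 3;	/* the crosslock being committed */

	for (unsigned int i = 0; i < sizeof(xhlocks) / sizeof(xhlocks[0]); i++) {
		struct xhlock_model *h = &xhlocks[i];

		/* entries older than the crosslock are not commit candidates */
		if (h->gen_id < xlock_gen_id)
			break;

		/* the previous held lock already covers this dependency */
		if (h->prev_gen_id >= xlock_gen_id)
			printf("skip xlock -> %s (covered via the held-lock chain)\n",
			       h->name);
		else
			printf("add  xlock -> %s\n", h->name);
	}
	return 0;
}

Running it prints "skip" for A and B and "add" only for C, matching the
conclusion quoted above.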

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02 13:40         ` Peter Zijlstra
  2017-03-03  0:17           ` Byungchul Park
@ 2017-03-03  0:39           ` Byungchul Park
  1 sibling, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-03  0:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Thu, Mar 02, 2017 at 02:40:31PM +0100, Peter Zijlstra wrote:
> [*] A while ago someone, and I cannot find the email just now, asked if
> we could not implement the RECLAIM_FS inversion stuff with a 'fake' lock

It looks interesting to me.

> like we use for other things like workqueues etc. I think this should be
> possible which allows reducing the 'irq' states and will reduce the
> amount of __bfs() lookups we do.
> 
> Removing the 1 IRQ state would result in 4 fewer __bfs() walks if I'm
> not mistaken, more than making up for the 1 we'd have to add to detect
> redundant links.

OK.

Thanks,
Byungchul

> 
> 
>  include/linux/lockdep.h         | 11 +-----
>  include/linux/sched.h           |  1 -
>  kernel/locking/lockdep.c        | 87 +----------------------------------------
>  kernel/locking/lockdep_states.h |  1 -
>  mm/internal.h                   | 40 +++++++++++++++++++
>  mm/page_alloc.c                 | 13 ++++--
>  mm/slab.h                       |  7 +++-
>  mm/slob.c                       |  8 +++-
>  mm/vmscan.c                     | 13 +++---
>  9 files changed, 71 insertions(+), 110 deletions(-)
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 1e327bb..6ba1a65 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -29,7 +29,7 @@ extern int lock_stat;
>   * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
>   * the total number of states... :-(
>   */
> -#define XXX_LOCK_USAGE_STATES		(1+3*4)
> +#define XXX_LOCK_USAGE_STATES		(1+2*4)
>  
>  /*
>   * NR_LOCKDEP_CACHING_CLASSES ... Number of classes
> @@ -361,10 +361,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
>  	lock_set_class(lock, lock->name, lock->key, subclass, ip);
>  }
>  
> -extern void lockdep_set_current_reclaim_state(gfp_t gfp_mask);
> -extern void lockdep_clear_current_reclaim_state(void);
> -extern void lockdep_trace_alloc(gfp_t mask);
> -
>  struct pin_cookie { unsigned int val; };
>  
>  #define NIL_COOKIE (struct pin_cookie){ .val = 0U, }
> @@ -373,7 +369,7 @@ extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
>  extern void lock_repin_lock(struct lockdep_map *lock, struct pin_cookie);
>  extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
>  
> -# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
> +# define INIT_LOCKDEP				.lockdep_recursion = 0,
>  
>  #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
>  
> @@ -413,9 +409,6 @@ static inline void lockdep_on(void)
>  # define lock_release(l, n, i)			do { } while (0)
>  # define lock_set_class(l, n, k, s, i)		do { } while (0)
>  # define lock_set_subclass(l, s, i)		do { } while (0)
> -# define lockdep_set_current_reclaim_state(g)	do { } while (0)
> -# define lockdep_clear_current_reclaim_state()	do { } while (0)
> -# define lockdep_trace_alloc(g)			do { } while (0)
>  # define lockdep_info()				do { } while (0)
>  # define lockdep_init_map(lock, name, key, sub) \
>  		do { (void)(name); (void)(key); } while (0)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..0fa8a8f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -806,7 +806,6 @@ struct task_struct {
>  	int				lockdep_depth;
>  	unsigned int			lockdep_recursion;
>  	struct held_lock		held_locks[MAX_LOCK_DEPTH];
> -	gfp_t				lockdep_reclaim_gfp;
>  #endif
>  
>  #ifdef CONFIG_UBSAN
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index a95e5d1..1051600 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -343,14 +343,12 @@ EXPORT_SYMBOL(lockdep_on);
>  #if VERBOSE
>  # define HARDIRQ_VERBOSE	1
>  # define SOFTIRQ_VERBOSE	1
> -# define RECLAIM_VERBOSE	1
>  #else
>  # define HARDIRQ_VERBOSE	0
>  # define SOFTIRQ_VERBOSE	0
> -# define RECLAIM_VERBOSE	0
>  #endif
>  
> -#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE || RECLAIM_VERBOSE
> +#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
>  /*
>   * Quick filtering for interesting events:
>   */
> @@ -2553,14 +2551,6 @@ static int SOFTIRQ_verbose(struct lock_class *class)
>  	return 0;
>  }
>  
> -static int RECLAIM_FS_verbose(struct lock_class *class)
> -{
> -#if RECLAIM_VERBOSE
> -	return class_filter(class);
> -#endif
> -	return 0;
> -}
> -
>  #define STRICT_READ_CHECKS	1
>  
>  static int (*state_verbose_f[])(struct lock_class *class) = {
> @@ -2856,51 +2846,6 @@ void trace_softirqs_off(unsigned long ip)
>  		debug_atomic_inc(redundant_softirqs_off);
>  }
>  
> -static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
> -{
> -	struct task_struct *curr = current;
> -
> -	if (unlikely(!debug_locks))
> -		return;
> -
> -	/* no reclaim without waiting on it */
> -	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
> -		return;
> -
> -	/* this guy won't enter reclaim */
> -	if ((curr->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
> -		return;
> -
> -	/* We're only interested __GFP_FS allocations for now */
> -	if (!(gfp_mask & __GFP_FS))
> -		return;
> -
> -	/*
> -	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
> -	 */
> -	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
> -		return;
> -
> -	mark_held_locks(curr, RECLAIM_FS);
> -}
> -
> -static void check_flags(unsigned long flags);
> -
> -void lockdep_trace_alloc(gfp_t gfp_mask)
> -{
> -	unsigned long flags;
> -
> -	if (unlikely(current->lockdep_recursion))
> -		return;
> -
> -	raw_local_irq_save(flags);
> -	check_flags(flags);
> -	current->lockdep_recursion = 1;
> -	__lockdep_trace_alloc(gfp_mask, flags);
> -	current->lockdep_recursion = 0;
> -	raw_local_irq_restore(flags);
> -}
> -
>  static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
>  {
>  	/*
> @@ -2946,22 +2891,6 @@ static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
>  		}
>  	}
>  
> -	/*
> -	 * We reuse the irq context infrastructure more broadly as a general
> -	 * context checking code. This tests GFP_FS recursion (a lock taken
> -	 * during reclaim for a GFP_FS allocation is held over a GFP_FS
> -	 * allocation).
> -	 */
> -	if (!hlock->trylock && (curr->lockdep_reclaim_gfp & __GFP_FS)) {
> -		if (hlock->read) {
> -			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS_READ))
> -					return 0;
> -		} else {
> -			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS))
> -					return 0;
> -		}
> -	}
> -
>  	return 1;
>  }
>  
> @@ -3020,10 +2949,6 @@ static inline int separate_irq_context(struct task_struct *curr,
>  	return 0;
>  }
>  
> -void lockdep_trace_alloc(gfp_t gfp_mask)
> -{
> -}
> -
>  #endif /* defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) */
>  
>  /*
> @@ -3859,16 +3784,6 @@ void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie cookie)
>  }
>  EXPORT_SYMBOL_GPL(lock_unpin_lock);
>  
> -void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
> -{
> -	current->lockdep_reclaim_gfp = gfp_mask;
> -}
> -
> -void lockdep_clear_current_reclaim_state(void)
> -{
> -	current->lockdep_reclaim_gfp = 0;
> -}
> -
>  #ifdef CONFIG_LOCK_STAT
>  static int
>  print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
> diff --git a/kernel/locking/lockdep_states.h b/kernel/locking/lockdep_states.h
> index 995b0cc..35ca09f 100644
> --- a/kernel/locking/lockdep_states.h
> +++ b/kernel/locking/lockdep_states.h
> @@ -6,4 +6,3 @@
>   */
>  LOCKDEP_STATE(HARDIRQ)
>  LOCKDEP_STATE(SOFTIRQ)
> -LOCKDEP_STATE(RECLAIM_FS)
> diff --git a/mm/internal.h b/mm/internal.h
> index ccfc2a2..88b9107 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -15,6 +15,8 @@
>  #include <linux/mm.h>
>  #include <linux/pagemap.h>
>  #include <linux/tracepoint-defs.h>
> +#include <linux/lockdep.h>
> +#include <linux/sched/mm.h>
>  
>  /*
>   * The set of flags that only affect watermark checking and reclaim
> @@ -498,4 +500,42 @@ extern const struct trace_print_flags pageflag_names[];
>  extern const struct trace_print_flags vmaflag_names[];
>  extern const struct trace_print_flags gfpflag_names[];
>  
> +
> +#ifdef CONFIG_LOCKDEP
> +extern struct lockdep_map __fs_reclaim_map;
> +
> +static inline bool __need_fs_reclaim(gfp_t gfp_mask)
> +{
> +	gfp_mask = memalloc_noio_flags(gfp_mask);
> +
> +	/* no reclaim without waiting on it */
> +	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
> +		return false;
> +
> +	/* this guy won't enter reclaim */
> +	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
> +		return false;
> +
> +	/* We're only interested __GFP_FS allocations for now */
> +	if (!(gfp_mask & __GFP_FS))
> +		return false;
> +
> +	return true;
> +}
> +
> +static inline void fs_reclaim_acquire(gfp_t gfp_mask)
> +{
> +	if (__need_fs_reclaim(gfp_mask))
> +		lock_map_acquire(&__fs_reclaim_map);
> +}
> +static inline void fs_reclaim_release(gfp_t gfp_mask)
> +{
> +	if (__need_fs_reclaim(gfp_mask))
> +		lock_map_release(&__fs_reclaim_map);
> +}
> +#else
> +static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
> +static inline void fs_reclaim_release(gfp_t gfp_mask) { }
> +#endif
> +
>  #endif	/* __MM_INTERNAL_H */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index eaa64d2..85ea8bf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3387,6 +3387,12 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> +
> +#ifdef CONFIG_LOCKDEP
> +struct lockdep_map __fs_reclaim_map =
> +	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> +#endif
> +
>  /* Perform direct synchronous page reclaim */
>  static int
>  __perform_reclaim(gfp_t gfp_mask, unsigned int order,
> @@ -3400,7 +3406,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  	/* We now go into synchronous reclaim */
>  	cpuset_memory_pressure_bump();
>  	current->flags |= PF_MEMALLOC;
> -	lockdep_set_current_reclaim_state(gfp_mask);
> +	fs_reclaim_acquire(gfp_mask);
>  	reclaim_state.reclaimed_slab = 0;
>  	current->reclaim_state = &reclaim_state;
>  
> @@ -3408,7 +3414,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  								ac->nodemask);
>  
>  	current->reclaim_state = NULL;
> -	lockdep_clear_current_reclaim_state();
> +	fs_reclaim_release(gfp_mask);
>  	current->flags &= ~PF_MEMALLOC;
>  
>  	cond_resched();
> @@ -3913,7 +3919,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>  			*alloc_flags |= ALLOC_CPUSET;
>  	}
>  
> -	lockdep_trace_alloc(gfp_mask);
> +	fs_reclaim_acquire(gfp_mask);
> +	fs_reclaim_release(gfp_mask);
>  
>  	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index 65e7c3f..753f552 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -44,6 +44,8 @@ struct kmem_cache {
>  #include <linux/kmemleak.h>
>  #include <linux/random.h>
>  
> +#include "internal.h"
> +
>  /*
>   * State of the slab allocator.
>   *
> @@ -428,7 +430,10 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
>  						     gfp_t flags)
>  {
>  	flags &= gfp_allowed_mask;
> -	lockdep_trace_alloc(flags);
> +
> +	fs_reclaim_acquire(flags);
> +	fs_reclaim_release(flags);
> +
>  	might_sleep_if(gfpflags_allow_blocking(flags));
>  
>  	if (should_failslab(s, flags))
> diff --git a/mm/slob.c b/mm/slob.c
> index eac04d43..3e32280 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -73,6 +73,8 @@
>  #include <linux/atomic.h>
>  
>  #include "slab.h"
> +#include "internal.h"
> +
>  /*
>   * slob_block has a field 'units', which indicates size of block if +ve,
>   * or offset of next block if -ve (in SLOB_UNITs).
> @@ -432,7 +434,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
>  
>  	gfp &= gfp_allowed_mask;
>  
> -	lockdep_trace_alloc(gfp);
> +	fs_reclaim_acquire(gfp);
> +	fs_reclaim_release(gfp);
>  
>  	if (size < PAGE_SIZE - align) {
>  		if (!size)
> @@ -538,7 +541,8 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
>  
>  	flags &= gfp_allowed_mask;
>  
> -	lockdep_trace_alloc(flags);
> +	fs_reclaim_acquire(flags);
> +	fs_reclaim_release(flags);
>  
>  	if (c->size < PAGE_SIZE) {
>  		b = slob_alloc(c->size, flags, c->align, node);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bc8031e..2f57e36 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3418,8 +3418,6 @@ static int kswapd(void *p)
>  	};
>  	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
>  
> -	lockdep_set_current_reclaim_state(GFP_KERNEL);
> -
>  	if (!cpumask_empty(cpumask))
>  		set_cpus_allowed_ptr(tsk, cpumask);
>  	current->reclaim_state = &reclaim_state;
> @@ -3475,7 +3473,9 @@ static int kswapd(void *p)
>  		 */
>  		trace_mm_vmscan_kswapd_wake(pgdat->node_id, classzone_idx,
>  						alloc_order);
> +		fs_reclaim_acquire(GFP_KERNEL);
>  		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
> +		fs_reclaim_release(GFP_KERNEL);
>  		if (reclaim_order < alloc_order)
>  			goto kswapd_try_sleep;
>  
> @@ -3485,7 +3485,6 @@ static int kswapd(void *p)
>  
>  	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
>  	current->reclaim_state = NULL;
> -	lockdep_clear_current_reclaim_state();
>  
>  	return 0;
>  }
> @@ -3550,14 +3549,14 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  	unsigned long nr_reclaimed;
>  
>  	p->flags |= PF_MEMALLOC;
> -	lockdep_set_current_reclaim_state(sc.gfp_mask);
> +	fs_reclaim_acquire(sc.gfp_mask);
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>  
>  	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>  
>  	p->reclaim_state = NULL;
> -	lockdep_clear_current_reclaim_state();
> +	fs_reclaim_release(sc.gfp_mask);
>  	p->flags &= ~PF_MEMALLOC;
>  
>  	return nr_reclaimed;
> @@ -3741,7 +3740,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  	 * and RECLAIM_UNMAP.
>  	 */
>  	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
> -	lockdep_set_current_reclaim_state(gfp_mask);
> +	fs_reclaim_acquire(gfp_mask);
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>  
> @@ -3756,8 +3755,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  	}
>  
>  	p->reclaim_state = NULL;
> +	fs_reclaim_release(gfp_mask);
>  	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
> -	lockdep_clear_current_reclaim_state();
>  	return sc.nr_reclaimed >= nr_pages;
>  }
>  

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-03  0:17           ` Byungchul Park
@ 2017-03-03  8:14             ` Peter Zijlstra
  2017-03-03  9:13               ` Peter Zijlstra
  2017-03-05  3:08               ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
  0 siblings, 2 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-03  8:14 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Fri, Mar 03, 2017 at 09:17:37AM +0900, Byungchul Park wrote:
> On Thu, Mar 02, 2017 at 02:40:31PM +0100, Peter Zijlstra wrote:

> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index a95e5d1..7baea89 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1860,6 +1860,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
> >  		}
> >  	}
> >  
> > +	/*
> > +	 * Is the <prev> -> <next> redundant?
> > +	 */
> > +	this.class = hlock_class(prev);
> > +	this.parent = NULL;
> > +	ret = check_noncircular(&this, hlock_class(next), &target_entry);
> > +	if (!ret) /* exists, redundant */
> > +		return 2;
> > +	if (ret < 0)
> > +		return print_bfs_bug(ret);
> > +
> >  	if (!*stack_saved) {
> >  		if (!save_trace(&trace))
> >  			return 0;
> 
> It would be very nice if you allowed me to add this code. However, the prev_gen_id
> thing is still useful, even though the code above can achieve the same. Agree?

So my goal was to avoid prev_gen_id, and yes I think the above does
that.

Now the problem with the above condition is that it makes reports
harder to decipher, because by avoiding adding redundant links to our
graph we lose a possible shorter path.

So for correctness' sake it doesn't matter: it is irrelevant how long
the cycle is, after all; all that matters is that there is a cycle.
But the humans on the receiving end tend to like shorter cycles.

And I think the same is true for crossrelease: avoiding redundant links
increases cycle length.

(And remember, BFS will otherwise find the shortest cycle.)

That said, I'd be fairly interested in numbers on how many links this
avoids. I'll go make a check_redundant() version of the above and put a
proper counter in so I can see what it does for a regular boot etc.
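
For reference, a stand-alone model of what such a check_redundant() boils
down to (editor's sketch, not the kernel implementation, which does a
forward __bfs() walk over lock_list entries as in the diff further down
the thread):

#include <stdio.h>
#include <stdbool.h>

#define NCLASS 3

static bool edge[NCLASS][NCLASS];	/* edge[a][b]: dependency a -> b */

/* Forward BFS: is 'to' already reachable from 'from'? */
static bool reachable(int from, int to)
{
	bool seen[NCLASS] = { false };
	int queue[NCLASS];
	int head = 0, tail = 0;

	queue[tail++] = from;
	seen[from] = true;
	while (head < tail) {
		int v = queue[head++];

		if (v == to)
			return true;
		for (int w = 0; w < NCLASS; w++) {
			if (edge[v][w] && !seen[w]) {
				seen[w] = true;
				queue[tail++] = w;
			}
		}
	}
	return false;
}

static void add_dep(const char *pn, int prev, const char *nn, int next)
{
	if (reachable(prev, next)) {	/* exists, redundant */
		printf("skip %s -> %s (redundant)\n", pn, nn);
		return;
	}
	edge[prev][next] = true;
	printf("add  %s -> %s\n", pn, nn);
}

int main(void)
{
	add_dep("A", 0, "B", 1);
	add_dep("B", 1, "C", 2);
	add_dep("A", 0, "C", 2);	/* implied by A -> B -> C, so skipped */
	return 0;
}

The skipped A -> C link is exactly the shorter path mentioned above: if a
C -> A dependency ever shows up, the reported cycle will now have to go
through B.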

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-03  8:14             ` Peter Zijlstra
@ 2017-03-03  9:13               ` Peter Zijlstra
  2017-03-03  9:32                 ` Peter Zijlstra
                                   ` (2 more replies)
  2017-03-05  3:08               ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
  1 sibling, 3 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-03  9:13 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Fri, Mar 03, 2017 at 09:14:16AM +0100, Peter Zijlstra wrote:

> That said, I'd be fairly interested in numbers on how many links this
> avoids. I'll go make a check_redundant() version of the above and put a
> proper counter in so I can see what it does for a regular boot etc.

Two boots + a make defconfig: the first didn't have the redundant bit
in, the second did (the full diff below still includes the reclaim rework,
because that was still in that kernel and I forgot to reset the tree).


 lock-classes:                         1168       1169 [max: 8191]
 direct dependencies:                  7688       5812 [max: 32768]
 indirect dependencies:               25492      25937
 all direct dependencies:            220113     217512
 dependency chains:                    9005       9008 [max: 65536]
 dependency chain hlocks:             34450      34366 [max: 327680]
 in-hardirq chains:                      55         51
 in-softirq chains:                     371        378
 in-process chains:                    8579       8579
 stack-trace entries:                108073      88474 [max: 524288]
 combined max dependencies:       178738560  169094640

 max locking depth:                      15         15
 max bfs queue depth:                   320        329

 cyclic checks:                        9123       9190

 redundant checks:                                5046
 redundant links:                                 1828

 find-mask forwards checks:            2564       2599
 find-mask backwards checks:          39521      39789


So it saves nearly 2k links and a fair chunk of stack-trace entries but,
as expected, makes no real difference to the indirect dependencies.

At the same time, you see the max BFS depth increase, which is also
expected, although it could easily be boot variance -- these numbers are
not entirely stable between boots.

Could you run something similar? Or I'll take a look at your next spin
of the patches.



---
 include/linux/lockdep.h            |  11 +---
 include/linux/sched.h              |   1 -
 kernel/locking/lockdep.c           | 114 +++++++++----------------------------
 kernel/locking/lockdep_internals.h |   2 +
 kernel/locking/lockdep_proc.c      |   4 ++
 kernel/locking/lockdep_states.h    |   1 -
 mm/internal.h                      |  40 +++++++++++++
 mm/page_alloc.c                    |  13 ++++-
 mm/slab.h                          |   7 ++-
 mm/slob.c                          |   8 ++-
 mm/vmscan.c                        |  13 ++---
 11 files changed, 104 insertions(+), 110 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1e327bb..6ba1a65 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -29,7 +29,7 @@ extern int lock_stat;
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
  * the total number of states... :-(
  */
-#define XXX_LOCK_USAGE_STATES		(1+3*4)
+#define XXX_LOCK_USAGE_STATES		(1+2*4)
 
 /*
  * NR_LOCKDEP_CACHING_CLASSES ... Number of classes
@@ -361,10 +361,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
 	lock_set_class(lock, lock->name, lock->key, subclass, ip);
 }
 
-extern void lockdep_set_current_reclaim_state(gfp_t gfp_mask);
-extern void lockdep_clear_current_reclaim_state(void);
-extern void lockdep_trace_alloc(gfp_t mask);
-
 struct pin_cookie { unsigned int val; };
 
 #define NIL_COOKIE (struct pin_cookie){ .val = 0U, }
@@ -373,7 +369,7 @@ extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
 extern void lock_repin_lock(struct lockdep_map *lock, struct pin_cookie);
 extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 
-# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
+# define INIT_LOCKDEP				.lockdep_recursion = 0,
 
 #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
 
@@ -413,9 +409,6 @@ static inline void lockdep_on(void)
 # define lock_release(l, n, i)			do { } while (0)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
-# define lockdep_set_current_reclaim_state(g)	do { } while (0)
-# define lockdep_clear_current_reclaim_state()	do { } while (0)
-# define lockdep_trace_alloc(g)			do { } while (0)
 # define lockdep_info()				do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d67eee8..0fa8a8f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -806,7 +806,6 @@ struct task_struct {
 	int				lockdep_depth;
 	unsigned int			lockdep_recursion;
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
-	gfp_t				lockdep_reclaim_gfp;
 #endif
 
 #ifdef CONFIG_UBSAN
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a95e5d1..e3cc398 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -343,14 +343,12 @@ EXPORT_SYMBOL(lockdep_on);
 #if VERBOSE
 # define HARDIRQ_VERBOSE	1
 # define SOFTIRQ_VERBOSE	1
-# define RECLAIM_VERBOSE	1
 #else
 # define HARDIRQ_VERBOSE	0
 # define SOFTIRQ_VERBOSE	0
-# define RECLAIM_VERBOSE	0
 #endif
 
-#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE || RECLAIM_VERBOSE
+#if VERBOSE || HARDIRQ_VERBOSE || SOFTIRQ_VERBOSE
 /*
  * Quick filtering for interesting events:
  */
@@ -1295,6 +1293,19 @@ check_noncircular(struct lock_list *root, struct lock_class *target,
 	return result;
 }
 
+static noinline int
+check_redundant(struct lock_list *root, struct lock_class *target,
+		struct lock_list **target_entry)
+{
+	int result;
+
+	debug_atomic_inc(nr_redundant_checks);
+
+	result = __bfs_forwards(root, target, class_equal, target_entry);
+
+	return result;
+}
+
 #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING)
 /*
  * Forwards and backwards subgraph searching, for the purposes of
@@ -1860,6 +1871,20 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 		}
 	}
 
+	/*
+	 * Is the <prev> -> <next> link redundant?
+	 */
+	this.class = hlock_class(prev);
+	this.parent = NULL;
+	ret = check_redundant(&this, hlock_class(next), &target_entry);
+	if (!ret) {
+		debug_atomic_inc(nr_redundant);
+		return 2;
+	}
+	if (ret < 0)
+		return print_bfs_bug(ret);
+
+
 	if (!*stack_saved) {
 		if (!save_trace(&trace))
 			return 0;
@@ -2553,14 +2578,6 @@ static int SOFTIRQ_verbose(struct lock_class *class)
 	return 0;
 }
 
-static int RECLAIM_FS_verbose(struct lock_class *class)
-{
-#if RECLAIM_VERBOSE
-	return class_filter(class);
-#endif
-	return 0;
-}
-
 #define STRICT_READ_CHECKS	1
 
 static int (*state_verbose_f[])(struct lock_class *class) = {
@@ -2856,51 +2873,6 @@ void trace_softirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_softirqs_off);
 }
 
-static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
-{
-	struct task_struct *curr = current;
-
-	if (unlikely(!debug_locks))
-		return;
-
-	/* no reclaim without waiting on it */
-	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
-		return;
-
-	/* this guy won't enter reclaim */
-	if ((curr->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
-		return;
-
-	/* We're only interested __GFP_FS allocations for now */
-	if (!(gfp_mask & __GFP_FS))
-		return;
-
-	/*
-	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
-	 */
-	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
-		return;
-
-	mark_held_locks(curr, RECLAIM_FS);
-}
-
-static void check_flags(unsigned long flags);
-
-void lockdep_trace_alloc(gfp_t gfp_mask)
-{
-	unsigned long flags;
-
-	if (unlikely(current->lockdep_recursion))
-		return;
-
-	raw_local_irq_save(flags);
-	check_flags(flags);
-	current->lockdep_recursion = 1;
-	__lockdep_trace_alloc(gfp_mask, flags);
-	current->lockdep_recursion = 0;
-	raw_local_irq_restore(flags);
-}
-
 static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
 {
 	/*
@@ -2946,22 +2918,6 @@ static int mark_irqflags(struct task_struct *curr, struct held_lock *hlock)
 		}
 	}
 
-	/*
-	 * We reuse the irq context infrastructure more broadly as a general
-	 * context checking code. This tests GFP_FS recursion (a lock taken
-	 * during reclaim for a GFP_FS allocation is held over a GFP_FS
-	 * allocation).
-	 */
-	if (!hlock->trylock && (curr->lockdep_reclaim_gfp & __GFP_FS)) {
-		if (hlock->read) {
-			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS_READ))
-					return 0;
-		} else {
-			if (!mark_lock(curr, hlock, LOCK_USED_IN_RECLAIM_FS))
-					return 0;
-		}
-	}
-
 	return 1;
 }
 
@@ -3020,10 +2976,6 @@ static inline int separate_irq_context(struct task_struct *curr,
 	return 0;
 }
 
-void lockdep_trace_alloc(gfp_t gfp_mask)
-{
-}
-
 #endif /* defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) */
 
 /*
@@ -3859,16 +3811,6 @@ void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie cookie)
 }
 EXPORT_SYMBOL_GPL(lock_unpin_lock);
 
-void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
-{
-	current->lockdep_reclaim_gfp = gfp_mask;
-}
-
-void lockdep_clear_current_reclaim_state(void)
-{
-	current->lockdep_reclaim_gfp = 0;
-}
-
 #ifdef CONFIG_LOCK_STAT
 static int
 print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index c2b8849..7809269 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -143,6 +143,8 @@ struct lockdep_stats {
 	int	redundant_softirqs_on;
 	int	redundant_softirqs_off;
 	int	nr_unused_locks;
+	int	nr_redundant_checks;
+	int	nr_redundant;
 	int	nr_cyclic_checks;
 	int	nr_cyclic_check_recursions;
 	int	nr_find_usage_forwards_checks;
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 6d1fcc7..68d9e26 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -201,6 +201,10 @@ static void lockdep_stats_debug_show(struct seq_file *m)
 		debug_atomic_read(chain_lookup_hits));
 	seq_printf(m, " cyclic checks:                 %11llu\n",
 		debug_atomic_read(nr_cyclic_checks));
+	seq_printf(m, " redundant checks:              %11llu\n",
+		debug_atomic_read(nr_redundant_checks));
+	seq_printf(m, " redundant links:               %11llu\n",
+		debug_atomic_read(nr_redundant));
 	seq_printf(m, " find-mask forwards checks:     %11llu\n",
 		debug_atomic_read(nr_find_usage_forwards_checks));
 	seq_printf(m, " find-mask backwards checks:    %11llu\n",
diff --git a/kernel/locking/lockdep_states.h b/kernel/locking/lockdep_states.h
index 995b0cc..35ca09f 100644
--- a/kernel/locking/lockdep_states.h
+++ b/kernel/locking/lockdep_states.h
@@ -6,4 +6,3 @@
  */
 LOCKDEP_STATE(HARDIRQ)
 LOCKDEP_STATE(SOFTIRQ)
-LOCKDEP_STATE(RECLAIM_FS)
diff --git a/mm/internal.h b/mm/internal.h
index ccfc2a2..88b9107 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -15,6 +15,8 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/tracepoint-defs.h>
+#include <linux/lockdep.h>
+#include <linux/sched/mm.h>
 
 /*
  * The set of flags that only affect watermark checking and reclaim
@@ -498,4 +500,42 @@ extern const struct trace_print_flags pageflag_names[];
 extern const struct trace_print_flags vmaflag_names[];
 extern const struct trace_print_flags gfpflag_names[];
 
+
+#ifdef CONFIG_LOCKDEP
+extern struct lockdep_map __fs_reclaim_map;
+
+static inline bool __need_fs_reclaim(gfp_t gfp_mask)
+{
+	gfp_mask = memalloc_noio_flags(gfp_mask);
+
+	/* no reclaim without waiting on it */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
+		return false;
+
+	/* this guy won't enter reclaim */
+	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+		return false;
+
+	/* We're only interested __GFP_FS allocations for now */
+	if (!(gfp_mask & __GFP_FS))
+		return false;
+
+	return true;
+}
+
+static inline void fs_reclaim_acquire(gfp_t gfp_mask)
+{
+	if (__need_fs_reclaim(gfp_mask))
+		lock_map_acquire(&__fs_reclaim_map);
+}
+static inline void fs_reclaim_release(gfp_t gfp_mask)
+{
+	if (__need_fs_reclaim(gfp_mask))
+		lock_map_release(&__fs_reclaim_map);
+}
+#else
+static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
+static inline void fs_reclaim_release(gfp_t gfp_mask) { }
+#endif
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eaa64d2..85ea8bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3387,6 +3387,12 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
 }
 #endif /* CONFIG_COMPACTION */
 
+
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map __fs_reclaim_map =
+	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
+#endif
+
 /* Perform direct synchronous page reclaim */
 static int
 __perform_reclaim(gfp_t gfp_mask, unsigned int order,
@@ -3400,7 +3406,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
 	current->flags |= PF_MEMALLOC;
-	lockdep_set_current_reclaim_state(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
@@ -3408,7 +3414,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 								ac->nodemask);
 
 	current->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
+	fs_reclaim_release(gfp_mask);
 	current->flags &= ~PF_MEMALLOC;
 
 	cond_resched();
@@ -3913,7 +3919,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 			*alloc_flags |= ALLOC_CPUSET;
 	}
 
-	lockdep_trace_alloc(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
+	fs_reclaim_release(gfp_mask);
 
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
diff --git a/mm/slab.h b/mm/slab.h
index 65e7c3f..753f552 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -44,6 +44,8 @@ struct kmem_cache {
 #include <linux/kmemleak.h>
 #include <linux/random.h>
 
+#include "internal.h"
+
 /*
  * State of the slab allocator.
  *
@@ -428,7 +430,10 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 						     gfp_t flags)
 {
 	flags &= gfp_allowed_mask;
-	lockdep_trace_alloc(flags);
+
+	fs_reclaim_acquire(flags);
+	fs_reclaim_release(flags);
+
 	might_sleep_if(gfpflags_allow_blocking(flags));
 
 	if (should_failslab(s, flags))
diff --git a/mm/slob.c b/mm/slob.c
index eac04d43..3e32280 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -73,6 +73,8 @@
 #include <linux/atomic.h>
 
 #include "slab.h"
+#include "internal.h"
+
 /*
  * slob_block has a field 'units', which indicates size of block if +ve,
  * or offset of next block if -ve (in SLOB_UNITs).
@@ -432,7 +434,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 
 	gfp &= gfp_allowed_mask;
 
-	lockdep_trace_alloc(gfp);
+	fs_reclaim_acquire(gfp);
+	fs_reclaim_release(gfp);
 
 	if (size < PAGE_SIZE - align) {
 		if (!size)
@@ -538,7 +541,8 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	flags &= gfp_allowed_mask;
 
-	lockdep_trace_alloc(flags);
+	fs_reclaim_acquire(flags);
+	fs_reclaim_release(flags);
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bc8031e..2f57e36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3418,8 +3418,6 @@ static int kswapd(void *p)
 	};
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
-	lockdep_set_current_reclaim_state(GFP_KERNEL);
-
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
 	current->reclaim_state = &reclaim_state;
@@ -3475,7 +3473,9 @@ static int kswapd(void *p)
 		 */
 		trace_mm_vmscan_kswapd_wake(pgdat->node_id, classzone_idx,
 						alloc_order);
+		fs_reclaim_acquire(GFP_KERNEL);
 		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
+		fs_reclaim_release(GFP_KERNEL);
 		if (reclaim_order < alloc_order)
 			goto kswapd_try_sleep;
 
@@ -3485,7 +3485,6 @@ static int kswapd(void *p)
 
 	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
 	current->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
 
 	return 0;
 }
@@ -3550,14 +3549,14 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 	unsigned long nr_reclaimed;
 
 	p->flags |= PF_MEMALLOC;
-	lockdep_set_current_reclaim_state(sc.gfp_mask);
+	fs_reclaim_acquire(sc.gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
 
 	p->reclaim_state = NULL;
-	lockdep_clear_current_reclaim_state();
+	fs_reclaim_release(sc.gfp_mask);
 	p->flags &= ~PF_MEMALLOC;
 
 	return nr_reclaimed;
@@ -3741,7 +3740,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	 * and RECLAIM_UNMAP.
 	 */
 	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
-	lockdep_set_current_reclaim_state(gfp_mask);
+	fs_reclaim_acquire(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
@@ -3756,8 +3755,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	}
 
 	p->reclaim_state = NULL;
+	fs_reclaim_release(gfp_mask);
 	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
-	lockdep_clear_current_reclaim_state();
 	return sc.nr_reclaimed >= nr_pages;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread
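
For context, the kind of cycle the new "fs_reclaim" map is designed to
catch looks roughly like the sketch below. This is hypothetical code for
illustration only: fs_lock and the two functions are made up, while
DEFINE_MUTEX(), kmalloc(), GFP_KERNEL and the mutex API are ordinary
kernel interfaces. Allocating under an FS lock records the dependency
fs_lock -> fs_reclaim through the acquire/release pair added to
prepare_alloc_pages() above; direct reclaim calling back into the
filesystem then records fs_reclaim -> fs_lock, and lockdep can report
the cycle before it ever deadlocks for real.

	#include <linux/mutex.h>
	#include <linux/slab.h>

	static DEFINE_MUTEX(fs_lock);		/* hypothetical filesystem lock */

	/* Ordinary filesystem path: allocates memory while holding fs_lock. */
	static void fs_write_path(void)
	{
		void *p;

		mutex_lock(&fs_lock);
		p = kmalloc(64, GFP_KERNEL);	/* records fs_lock -> fs_reclaim */
		kfree(p);
		mutex_unlock(&fs_lock);
	}

	/* Writeback path invoked from direct reclaim, i.e. under fs_reclaim. */
	static void fs_writepage_from_reclaim(void)
	{
		mutex_lock(&fs_lock);		/* fs_reclaim -> fs_lock: cycle */
		mutex_unlock(&fs_lock);
	}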

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-03  9:13               ` Peter Zijlstra
@ 2017-03-03  9:32                 ` Peter Zijlstra
  2017-03-05  3:33                 ` Byungchul Park
  2017-08-10 12:18                 ` [tip:locking/core] locking/lockdep: Avoid creating redundant links tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-03  9:32 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Fri, Mar 03, 2017 at 10:13:38AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 03, 2017 at 09:14:16AM +0100, Peter Zijlstra wrote:
> 
> > That said; I'd be fairly interested in numbers on how many links this
> > avoids, I'll go make a check_redundant() version of the above and put a
> > proper counter in so I can see what it does for a regular boot etc..
> 
> Two boots + a make defconfig, the first didn't have the redundant bit
> in, the second did (full diff below still includes the reclaim rework,
> because that was still in that kernel and I forgot to reset the tree).
> 
> 
>  lock-classes:                         1168       1169 [max: 8191]
>  direct dependencies:                  7688       5812 [max: 32768]
>  indirect dependencies:               25492      25937
>  all direct dependencies:            220113     217512
>  dependency chains:                    9005       9008 [max: 65536]
>  dependency chain hlocks:             34450      34366 [max: 327680]
>  in-hardirq chains:                      55         51
>  in-softirq chains:                     371        378
>  in-process chains:                    8579       8579
>  stack-trace entries:                108073      88474 [max: 524288]
>  combined max dependencies:       178738560  169094640
> 
>  max locking depth:                      15         15
>  max bfs queue depth:                   320        329
> 
>  cyclic checks:                        9123       9190
> 
>  redundant checks:                                5046
>  redundant links:                                 1828
> 
>  find-mask forwards checks:            2564       2599
>  find-mask backwards checks:          39521      39789
> 

OK, last email, I promise, then I'll go bury myself in futexes.

 find-mask forwards checks:            2999
 find-mask backwards checks:          56134

This is with a clean kernel, which shows how many __bfs() calls we save
by doing away with that RECLAIM state. OTOH:

 lock-classes:                         1167 [max: 8191]
 direct dependencies:                  7254 [max: 32768]
 indirect dependencies:               23763
 all direct dependencies:            219093

Shows that the added reclaim class isn't entirely free either ;-)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-03  8:14             ` Peter Zijlstra
  2017-03-03  9:13               ` Peter Zijlstra
@ 2017-03-05  3:08               ` Byungchul Park
  2017-03-07 11:42                 ` Peter Zijlstra
  1 sibling, 1 reply; 63+ messages in thread
From: Byungchul Park @ 2017-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Fri, Mar 03, 2017 at 09:14:16AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 03, 2017 at 09:17:37AM +0900, Byungchul Park wrote:
> > On Thu, Mar 02, 2017 at 02:40:31PM +0100, Peter Zijlstra wrote:
> 
> > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > index a95e5d1..7baea89 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -1860,6 +1860,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
> > >  		}
> > >  	}
> > >  
> > > +	/*
> > > +	 * Is the <prev> -> <next> redundant?
> > > +	 */
> > > +	this.class = hlock_class(prev);
> > > +	this.parent = NULL;
> > > +	ret = check_noncircular(&this, hlock_class(next), &target_entry);
> > > +	if (!ret) /* exists, redundant */
> > > +		return 2;
> > > +	if (ret < 0)
> > > +		return print_bfs_bug(ret);
> > > +
> > >  	if (!*stack_saved) {
> > >  		if (!save_trace(&trace))
> > >  			return 0;
> > 
> > This would be very nice if you allow adding this code. However, the
> > prev_gen_id thingy is still useful, though the code above can achieve
> > the same thing. Agree?
> 
> So my goal was to avoid prev_gen_id, and yes I think the above does
> that.
> 
> Now the problem with the above condition is that it makes reports
> harder to decipher, because by avoiding adding redundant links to our
> graph we lose a possible shorter path.

Let's see the following example:

   A -> B -> C

   where A, B and C are typical lock classes.

Assume the graph above was built and operations happen in the
following order:

   CONTEXT X		CONTEXT Y
   ---------		---------
   acquire DX
			acquire A
			acquire B
			acquire C

			release and commit DX

   where A, B and C are typical lock classes and DX is a crosslock class.

The graph will grow as follows _without_ prev_gen_id.

        -> A -> B -> C
       /    /    /
   DX -----------

   where A, B and C are typical lock classes and DX is a crosslock class.

The graph will grow as follows _with_ prev_gen_id.

   DX -> A -> B -> C

   where A, B and C are typical lock classes and DX is a crosslock class.

You said the former is better because it has a smaller cost in BFS. But
it has to use _much_ more memory to keep the additional links in the
graph. Without exaggeration, on commit every crosslock would get linked
with all locks in its history, unless redundant. It might be far more
than we expect - I will check and let you know how many it is. Is it
still good?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-03  9:13               ` Peter Zijlstra
  2017-03-03  9:32                 ` Peter Zijlstra
@ 2017-03-05  3:33                 ` Byungchul Park
  2017-08-10 12:18                 ` [tip:locking/core] locking/lockdep: Avoid creating redundant links tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-05  3:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Fri, Mar 03, 2017 at 10:13:38AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 03, 2017 at 09:14:16AM +0100, Peter Zijlstra wrote:
> 
> Two boots + a make defconfig, the first didn't have the redundant bit
> in, the second did (full diff below still includes the reclaim rework,
> because that was still in that kernel and I forgot to reset the tree).
> 
> 
>  lock-classes:                         1168       1169 [max: 8191]
>  direct dependencies:                  7688       5812 [max: 32768]
>  indirect dependencies:               25492      25937
>  all direct dependencies:            220113     217512
>  dependency chains:                    9005       9008 [max: 65536]
>  dependency chain hlocks:             34450      34366 [max: 327680]
>  in-hardirq chains:                      55         51
>  in-softirq chains:                     371        378
>  in-process chains:                    8579       8579
>  stack-trace entries:                108073      88474 [max: 524288]
>  combined max dependencies:       178738560  169094640
> 
>  max locking depth:                      15         15
>  max bfs queue depth:                   320        329
> 
>  cyclic checks:                        9123       9190
> 
>  redundant checks:                                5046
>  redundant links:                                 1828
> 
>  find-mask forwards checks:            2564       2599
>  find-mask backwards checks:          39521      39789
> 
> 
> So it saves nearly 2k links and a fair chunk of stack-trace entries, but

It's as we expect.

> as expected, makes no real difference on the indirect dependencies.

It looks to me that the indirect dependencies increased. This result is
also somewhat anticipated.

> At the same time, you see the max BFS depth increase, which is also

Yes. The depth should increase.

> expected, although it could easily be boot variance -- these numbers are
> not entirely stable between boots.
> 
> Could you run something similar? Or I'll take a look on your next spin
> of the patches.

I will check the same thing you did and let you know the result at the next spin.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-02 23:50           ` Byungchul Park
@ 2017-03-05  8:01             ` Byungchul Park
  0 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-05  8:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: 'Peter Zijlstra',
	mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Fri, Mar 03, 2017 at 08:50:03AM +0900, Byungchul Park wrote:
> On Thu, Mar 02, 2017 at 06:39:49AM -0800, Matthew Wilcox wrote:
> > On Thu, Mar 02, 2017 at 01:45:35PM +0900, byungchul.park wrote:
> > > From: Matthew Wilcox [mailto:willy@infradead.org]
> > > > On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> > > > > (And we should not be returning to userspace with locks held anyway --
> > > > > lockdep already has a check for that).
> > > > 
> > > > Don't we return to userspace with page locks held, eg during async
> > > > directio?
> > > 
> > > Hello,
> > > 
> > > I think that the check when returning to userspace with crosslocks held
> > > should be an exception. Don't you think so?
> > 
> > Oh yes.  We have to keep the pages locked during reads, and we have to
> > return to userspace before I/O is complete, therefore we have to return
> > to userspace with pages locked.  They'll be unlocked by the interrupt
> > handler in page_endio().
> 
> Agree.
> 
> > Speaking of which ... this feature is far too heavy for use in production
> > on pages.  You're almost trebling the size of struct page.  Can we
> > do something like make all struct pages share the same lockdep_map?
> > We'd have to not complain about holding one crossdep lock and acquiring
> > another one of the same type, but with millions of pages in the system,
> > it must surely be creating a gargantuan graph right now?
> 
> Um.. I will try making page locks work with one lockmap. That is also
> what Peterz pointed out and what I worried about when implementing it..

I've thought about it more and it does not seem to be a good idea. We
could not use the subclass feature if we made page locks work with only
one lockmap instance. And there are several things we would have to give
up, namely everything that relies on the per-instance fields in struct
lockdep_map. So now, I'm not sure I should change the current
implementation. What do you think about it?

^ permalink raw reply	[flat|nested] 63+ messages in thread
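
For reference, the "one shared lockdep_map for all pages" idea discussed
above would look roughly like the sketch below. The helpers are
hypothetical and ignore the crossrelease machinery entirely; they only
illustrate the shared-class idea and the trade-off mentioned: with a
single static map there is no per-page state, so subclasses and the
other per-instance lockdep_map fields can no longer be used.
STATIC_LOCKDEP_MAP_INIT(), lock_map_acquire_read() and
lock_map_release() are existing lockdep interfaces.

	#include <linux/lockdep.h>

	#ifdef CONFIG_LOCKDEP
	/* One class shared by every page lock; no storage in struct page. */
	static struct lockdep_map page_lock_map =
		STATIC_LOCKDEP_MAP_INIT("page_lock", &page_lock_map);

	static inline void page_lock_acquire(void)
	{
		/*
		 * Recursive-read acquire so that holding several page locks
		 * at once does not trip the same-class check; the cost is
		 * that lockdep cannot tell individual pages apart.
		 */
		lock_map_acquire_read(&page_lock_map);
	}

	static inline void page_lock_release(void)
	{
		lock_map_release(&page_lock_map);
	}
	#else
	static inline void page_lock_acquire(void) { }
	static inline void page_lock_release(void) { }
	#endif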

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-03-05  3:08               ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-03-07 11:42                 ` Peter Zijlstra
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Zijlstra @ 2017-03-07 11:42 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team, Michal Hocko,
	Nikolay Borisov, Mel Gorman

On Sun, Mar 05, 2017 at 12:08:45PM +0900, Byungchul Park wrote:
> On Fri, Mar 03, 2017 at 09:14:16AM +0100, Peter Zijlstra wrote:

> > 
> > Now the problem with the above condition is that it makes reports
> > harder to decipher, because by avoiding adding redundant links to our
> > graph we lose a possible shorter path.
> 
> Let's see the following example:
> 
>    A -> B -> C
> 
>    where A, B and C are typical lock classes.
> 
> Assume the graph above was built and operations happen in the
> following order:
> 
>    CONTEXT X		CONTEXT Y
>    ---------		---------
>    acquire DX
> 			acquire A
> 			acquire B
> 			acquire C
> 
> 			release and commit DX
> 
>    where A, B and C are typical lock classes and DX is a crosslock class.
> 
> The graph will grow as follows _without_ prev_gen_id.
> 
>         -> A -> B -> C
>        /    /    /
>    DX -----------
> 
>    where A, B and C are typical lock classes and DX is a crosslock class.
> 
> The graph will grow as follows _with_ prev_gen_id.
> 
>    DX -> A -> B -> C
> 
>    where A, B and C are typical lock classes and DX is a crosslock class.
> 
> You said the former is better because it has a smaller cost in BFS.

No, I said the former is better because when you report a DX inversion
against C, A and B are not required and the report is easier for
_humans_ to understand.

I don't particularly care about the BFS cost itself.

> But it has to use _much_ more memory to keep the additional links in
> the graph. Without exaggeration, on commit every crosslock would get
> linked with all locks in its history, unless redundant. It might be far
> more than we expect - I will check and let you know how many it is. Is
> it still good?

Dunno, probably not.. but it would be good to have numbers.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v5 06/13] lockdep: Implement crossrelease feature
  2017-02-28 18:15   ` Peter Zijlstra
  2017-03-01  7:21     ` Byungchul Park
  2017-03-02  4:20     ` Matthew Wilcox
@ 2017-03-14  7:36     ` Byungchul Park
  2 siblings, 0 replies; 63+ messages in thread
From: Byungchul Park @ 2017-03-14  7:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	iamjoonsoo.kim, akpm, npiggin, kernel-team

On Tue, Feb 28, 2017 at 07:15:47PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 18, 2017 at 10:17:32PM +0900, Byungchul Park wrote:
> > +	/*
> > +	 * Each work of workqueue might run in a different context,
> > +	 * thanks to concurrency support of workqueue. So we have to
> > +	 * distinguish each work to avoid false positive.
> > +	 *
> > +	 * TODO: We can also add dependencies between two acquisitions
> > +	 * of different work_id, if they don't cause a sleep so make
> > +	 * the worker stalled.
> > +	 */
> > +	unsigned int		work_id;
> 
> > +/*
> > + * Crossrelease needs to distinguish each work of workqueues.
> > + * Caller is supposed to be a worker.
> > + */
> > +void crossrelease_work_start(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->work_id++;
> > +}
> 
> So what you're trying to do with that 'work_id' thing is basically wipe
> the entire history when we're at the bottom of a context.
> 
> Which is a useful operation, but should arguably also be done on the
> return to userspace path. Any historical lock from before the current
> syscall is irrelevant.

Yes. I agree that each syscall is irrelevant to the others. But should
we do that? Is it a problem if we don't distinguish between syscall
contexts in the crossrelease check? IMHO, it's ok to perform the commit
if the target crosslock can be seen when releasing it. No? (As you know,
in the case of workqueues, each work item should be distinguished. See
the comment in the code.)

If we have to do it.. do you mean modifying the architecture code for
syscall entry? Or is there architecture-independent code where we can be
aware of the entry? It would be appreciated if you could answer these
questions.

> 
> (And we should not be returning to userspace with locks held anyway --
> lockdep already has a check for that).

^ permalink raw reply	[flat|nested] 63+ messages in thread
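
To make the work_id mechanism discussed above a bit more concrete, here
is a toy sketch. The types and fields are hypothetical (the real
hist_lock/xhlock layout in the series differs); it only shows the idea:
bumping a per-task counter at the start of each work item invalidates
everything recorded before it, so locks taken by an earlier work item
cannot be chained to a crosslock committed by a later one.

	struct toy_task {
		unsigned int work_id;		/* current history scope */
	};

	struct toy_hist_entry {
		unsigned int work_id;		/* scope it was recorded in */
		/* lock identity, stack trace, ... (elided) */
	};

	/* Called at the start of each work item, as crossrelease_work_start() is. */
	static void toy_work_start(struct toy_task *t)
	{
		t->work_id++;
	}

	/* On commit, entries recorded by an earlier work item are skipped. */
	static int toy_entry_usable(const struct toy_task *t,
				    const struct toy_hist_entry *e)
	{
		return e->work_id == t->work_id;
	}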

* [tip:locking/core] locking/lockdep: Avoid creating redundant links
  2017-03-03  9:13               ` Peter Zijlstra
  2017-03-03  9:32                 ` Peter Zijlstra
  2017-03-05  3:33                 ` Byungchul Park
@ 2017-08-10 12:18                 ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 63+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-08-10 12:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mhocko, peterz, hpa, torvalds, nborisov, byungchul.park, tglx,
	mgorman, linux-kernel, mingo

Commit-ID:  ae813308f4630642d2c1c87553929ce95f29f9ef
Gitweb:     http://git.kernel.org/tip/ae813308f4630642d2c1c87553929ce95f29f9ef
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 3 Mar 2017 10:13:38 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Aug 2017 12:29:04 +0200

locking/lockdep: Avoid creating redundant links

Two boots + a make defconfig, the first didn't have the redundant bit
in, the second did:

 lock-classes:                         1168       1169 [max: 8191]
 direct dependencies:                  7688       5812 [max: 32768]
 indirect dependencies:               25492      25937
 all direct dependencies:            220113     217512
 dependency chains:                    9005       9008 [max: 65536]
 dependency chain hlocks:             34450      34366 [max: 327680]
 in-hardirq chains:                      55         51
 in-softirq chains:                     371        378
 in-process chains:                    8579       8579
 stack-trace entries:                108073      88474 [max: 524288]
 combined max dependencies:       178738560  169094640

 max locking depth:                      15         15
 max bfs queue depth:                   320        329

 cyclic checks:                        9123       9190

 redundant checks:                                5046
 redundant links:                                 1828

 find-mask forwards checks:            2564       2599
 find-mask backwards checks:          39521      39789

So it saves nearly 2k links and a fair chunk of stack-trace entries, but
as expected, makes no real difference on the indirect dependencies.

At the same time, you see the max BFS depth increase, which is also
expected, although it could easily be boot variance -- these numbers are
not entirely stable between boots.

The down side is that the cycles in the graph become larger and thus
the reports harder to read.

XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: iamjoonsoo.kim@lge.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/lockdep.c           | 27 +++++++++++++++++++++++++++
 kernel/locking/lockdep_internals.h |  2 ++
 kernel/locking/lockdep_proc.c      |  4 ++++
 3 files changed, 33 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 986f2fa7..b2dd313 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1307,6 +1307,19 @@ check_noncircular(struct lock_list *root, struct lock_class *target,
 	return result;
 }
 
+static noinline int
+check_redundant(struct lock_list *root, struct lock_class *target,
+		struct lock_list **target_entry)
+{
+	int result;
+
+	debug_atomic_inc(nr_redundant_checks);
+
+	result = __bfs_forwards(root, target, class_equal, target_entry);
+
+	return result;
+}
+
 #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING)
 /*
  * Forwards and backwards subgraph searching, for the purposes of
@@ -1872,6 +1885,20 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 		}
 	}
 
+	/*
+	 * Is the <prev> -> <next> link redundant?
+	 */
+	this.class = hlock_class(prev);
+	this.parent = NULL;
+	ret = check_redundant(&this, hlock_class(next), &target_entry);
+	if (!ret) {
+		debug_atomic_inc(nr_redundant);
+		return 2;
+	}
+	if (ret < 0)
+		return print_bfs_bug(ret);
+
+
 	if (!*stack_saved) {
 		if (!save_trace(&trace))
 			return 0;
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index c08fbd2..1da4669 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -143,6 +143,8 @@ struct lockdep_stats {
 	int	redundant_softirqs_on;
 	int	redundant_softirqs_off;
 	int	nr_unused_locks;
+	int	nr_redundant_checks;
+	int	nr_redundant;
 	int	nr_cyclic_checks;
 	int	nr_cyclic_check_recursions;
 	int	nr_find_usage_forwards_checks;
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 6d1fcc7..68d9e26 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -201,6 +201,10 @@ static void lockdep_stats_debug_show(struct seq_file *m)
 		debug_atomic_read(chain_lookup_hits));
 	seq_printf(m, " cyclic checks:                 %11llu\n",
 		debug_atomic_read(nr_cyclic_checks));
+	seq_printf(m, " redundant checks:              %11llu\n",
+		debug_atomic_read(nr_redundant_checks));
+	seq_printf(m, " redundant links:               %11llu\n",
+		debug_atomic_read(nr_redundant));
 	seq_printf(m, " find-mask forwards checks:     %11llu\n",
 		debug_atomic_read(nr_find_usage_forwards_checks));
 	seq_printf(m, " find-mask backwards checks:    %11llu\n",

^ permalink raw reply related	[flat|nested] 63+ messages in thread
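
For readers unfamiliar with the BFS helpers used above: check_redundant()
reuses the same forward search as the cycle check, only asking the
opposite question. The match callback it relies on, as found in
kernel/locking/lockdep.c of that era (quoted from memory, so treat it as
illustrative), is essentially:

	/*
	 * Match callback for __bfs_forwards(): stop when a node of the
	 * target class is reached.  A 0 return from __bfs_forwards()
	 * therefore means a path <prev> -> ... -> <next> already exists,
	 * and check_prev_add() skips adding the direct link (the
	 * "return 2" in the hunk above).
	 */
	static inline int class_equal(struct lock_list *entry, void *data)
	{
		return entry->class == data;
	}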

end of thread, other threads:[~2017-08-10 12:21 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-18 13:17 [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
2017-01-18 13:17 ` [PATCH v5 01/13] lockdep: Refactor lookup_chain_cache() Byungchul Park
2017-01-19  9:16   ` Boqun Feng
2017-01-19  9:52     ` Byungchul Park
2017-01-26  7:53     ` Byungchul Park
2017-01-18 13:17 ` [PATCH v5 02/13] lockdep: Fix wrong condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
2017-01-18 13:17 ` [PATCH v5 03/13] lockdep: Add a function building a chain between two classes Byungchul Park
2017-01-18 13:17 ` [PATCH v5 04/13] lockdep: Refactor save_trace() Byungchul Park
2017-01-18 13:17 ` [PATCH v5 05/13] lockdep: Pass a callback arg to check_prev_add() to handle stack_trace Byungchul Park
2017-01-26  7:43   ` Byungchul Park
2017-01-18 13:17 ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
2017-02-28 12:26   ` Peter Zijlstra
2017-02-28 12:45   ` Peter Zijlstra
2017-02-28 12:49     ` Peter Zijlstra
2017-03-01  6:20       ` Byungchul Park
2017-02-28 13:05   ` Peter Zijlstra
2017-02-28 13:28     ` Byungchul Park
2017-02-28 13:35       ` Peter Zijlstra
2017-02-28 14:00         ` Byungchul Park
2017-02-28 13:10   ` Peter Zijlstra
2017-02-28 13:24     ` Byungchul Park
2017-02-28 18:29       ` Peter Zijlstra
2017-03-01  4:40         ` Byungchul Park
2017-03-01 10:45           ` Peter Zijlstra
2017-03-01 12:10             ` Byungchul Park
2017-02-28 13:40   ` Peter Zijlstra
2017-03-01  5:43     ` Byungchul Park
2017-03-01 12:28       ` Peter Zijlstra
2017-03-02 13:40         ` Peter Zijlstra
2017-03-03  0:17           ` Byungchul Park
2017-03-03  8:14             ` Peter Zijlstra
2017-03-03  9:13               ` Peter Zijlstra
2017-03-03  9:32                 ` Peter Zijlstra
2017-03-05  3:33                 ` Byungchul Park
2017-08-10 12:18                 ` [tip:locking/core] locking/lockdep: Avoid creating redundant links tip-bot for Peter Zijlstra
2017-03-05  3:08               ` [PATCH v5 06/13] lockdep: Implement crossrelease feature Byungchul Park
2017-03-07 11:42                 ` Peter Zijlstra
2017-03-03  0:39           ` Byungchul Park
2017-02-28 15:49   ` Peter Zijlstra
2017-03-01  5:17     ` Byungchul Park
2017-03-01  6:18       ` Byungchul Park
2017-03-02  2:52       ` Byungchul Park
2017-02-28 18:15   ` Peter Zijlstra
2017-03-01  7:21     ` Byungchul Park
2017-03-01 10:43       ` Peter Zijlstra
2017-03-01 12:27         ` Byungchul Park
2017-03-02  4:20     ` Matthew Wilcox
2017-03-02  4:45       ` byungchul.park
2017-03-02 14:39         ` Matthew Wilcox
2017-03-02 23:50           ` Byungchul Park
2017-03-05  8:01             ` Byungchul Park
2017-03-14  7:36     ` Byungchul Park
2017-03-02 13:41   ` Peter Zijlstra
2017-03-02 23:43     ` Byungchul Park
2017-01-18 13:17 ` [PATCH v5 07/13] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
2017-01-18 13:17 ` [PATCH v5 08/13] lockdep: Apply crossrelease to completions Byungchul Park
2017-01-18 13:17 ` [PATCH v5 09/13] pagemap.h: Remove trailing white space Byungchul Park
2017-01-18 13:17 ` [PATCH v5 10/13] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
2017-01-18 13:17 ` [PATCH v5 11/13] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
2017-01-18 13:17 ` [PATCH v5 12/13] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
2017-01-18 13:17 ` [PATCH v5 13/13] lockdep: Crossrelease feature documentation Byungchul Park
2017-01-20  9:08   ` [REVISED DOCUMENT] " Byungchul Park
2017-02-20  8:38 ` [PATCH v5 00/13] lockdep: Implement crossrelease feature Byungchul Park
