linux-mm.kvack.org archive mirror
* [PATCH v7 00/16] lockdep: Implement crossrelease feature
@ 2017-05-24  8:59 Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 01/16] lockdep: Refactor lookup_chain_cache() Byungchul Park
                   ` (15 more replies)
  0 siblings, 16 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

I have checked that the crossrelease feature works well on my qemu-i386
machine, with no problems at all. But I wonder whether that also holds
on other machines, especially on large systems. Could you let me know
if it doesn't work on yours, or whether the crossrelease feature is
useful to you?

-----8<-----

Change from v6
	- unwind the ring buffer instead of tagging for 'work' context
	- introduce hist_id to distinguish every entry of ring buffer
	- change the point calling crossrelease_work_start()
	- handle cases where the ring buffer was overwritten
	- change LOCKDEP_CROSSRELEASE config in Kconfig
	  (select PROVE_LOCKING -> depends on PROVE_LOCKING)
	- rename xhlock_used() -> xhlock_valid()
	- simplify several pieces of code (e.g. traversal of the ring buffer)
	- add/enhance several comments and changelogs

Change from v5
	- force XHLOCKS_SIZE to be power of 2 and simplify code
	- remove nmi check
	- separate an optimization using prev_gen_id with a full changelog
	- separate non(multi)-acquisition handling with a full changelog
	- replace vmalloc with kmalloc(GFP_KERNEL) for xhlocks
	- select PROVE_LOCKING when choosing CROSSRELEASE
	- clean up several pieces of code (e.g. lose some ifdeffery)
	- enhance several comments and changelogs

Change from v4
	- rebase on vanilla v4.9 tag
	- re-name pend_lock(plock) to hist_lock(xhlock)
	- allow overwriting ring buffer for hist_lock
	- unwind ring buffer instead of tagging id for each irq
	- introduce lockdep_map_cross embedding cross_lock
	- make each work of workqueue distinguishable
	- enhance comments
	(I will update the document at the next spin.)

Change from v3
	- revised document

Change from v2
	- rebase on vanilla v4.7 tag
	- move lockdep data for page lock from struct page to page_ext
	- allocate plocks buffer via vmalloc instead of in struct task
	- enhanced comments and document
	- optimize performance
	- make reporting function crossrelease-aware

Change from v1
	- enhanced the document
	- removed save_stack_trace() optimizing patch
	- made this based on the separated save_stack_trace patchset
	  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1182242.html

Can we detect deadlocks below with original lockdep?

Example 1)

	PROCESS X	PROCESS Y
	--------------	--------------
	mutex_lock A
			lock_page B
	lock_page B
			mutex_lock A // DEADLOCK
	unlock_page B
			mutex_unlock A
	mutex_unlock A
			unlock_page B

where A and B are different lock classes.

No, we cannot.

Example 2)

	PROCESS X	PROCESS Y	PROCESS Z
	--------------	--------------	--------------
			mutex_lock A
	lock_page B
			lock_page B
					mutex_lock A // DEADLOCK
					mutex_unlock A
					unlock_page B
					(B was held by PROCESS X)
			unlock_page B
			mutex_unlock A

where A and B are different lock classes.

No, we cannot.

Example 3)

	PROCESS X	PROCESS Y
	--------------	--------------
			mutex_lock A
	mutex_lock A
			wait_for_completion B // DEADLOCK
	mutex_unlock A
	complete B
			mutex_unlock A

where A is a lock class and B is a completion variable.

No, we cannot.

Not only lock operations but any operation that waits or spins for
something can cause a deadlock unless the waiter is eventually
*released* by someone. The important point here is that the waiting or
spinning must be *released* by someone.

Using the crossrelease feature, we can check dependencies and detect
possible deadlocks not only for typical locks but also for lock_page(),
wait_for_xxx() and so on, which might be released in any context.
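
As a rough illustration only (this is not code from the patchset), the
check that Example 3 calls for can be modeled in userspace C as a cycle
test on a small dependency graph. The names dep, reachable(),
add_dependency(), CLASS_A and CLASS_B are hypothetical, and the sketch
assumes the two dependencies are 'A -> B' (B is waited for while A is
held) and 'B -> A' (A is acquired before B is released by complete).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8	/* hypothetical limit for this sketch */

enum { CLASS_A = 0, CLASS_B = 1 };

/* dep[a][b] is true when class b was waited for after class a. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from'? */
static bool reachable(int from, int to, bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !visited[next] &&
		    reachable(next, to, visited))
			return true;
	return false;
}

/*
 * Record the dependency prev -> next. Returns false, without recording
 * anything, if the new edge would close a cycle (a possible deadlock).
 */
static bool add_dependency(int prev, int next)
{
	bool visited[MAX_CLASSES] = { false };

	assert(prev >= 0 && prev < MAX_CLASSES);
	assert(next >= 0 && next < MAX_CLASSES);

	if (reachable(next, prev, visited))
		return false;
	dep[prev][next] = true;
	return true;
}
```

With this model, add_dependency(CLASS_A, CLASS_B) is accepted, and a
later add_dependency(CLASS_B, CLASS_A) is rejected because it would
close the cycle. The second edge is the one that can only be seen if
the waiter side of a completion or page lock is tracked.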

See the last patch including the document for more information.

Byungchul Park (16):
  lockdep: Refactor lookup_chain_cache()
  lockdep: Add a function building a chain between two classes
  lockdep: Change the meaning of check_prev_add()'s return value
  lockdep: Make check_prev_add() able to handle external stack_trace
  lockdep: Implement crossrelease feature
  lockdep: Detect and handle hist_lock ring buffer overwrite
  lockdep: Handle non(or multi)-acquisition of a crosslock
  lockdep: Avoid adding redundant direct links of crosslocks
  lockdep: Fix incorrect condition to print bug msgs for
    MAX_LOCKDEP_CHAIN_HLOCKS
  lockdep: Make print_circular_bug() aware of crossrelease
  lockdep: Apply crossrelease to completions
  pagemap.h: Remove trailing white space
  lockdep: Apply crossrelease to PG_locked locks
  lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
  lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
  lockdep: Crossrelease feature documentation

 Documentation/locking/crossrelease.txt | 874 ++++++++++++++++++++++++++++++++
 include/linux/completion.h             | 118 ++++-
 include/linux/irqflags.h               |  24 +-
 include/linux/lockdep.h                | 162 +++++-
 include/linux/mm_types.h               |   4 +
 include/linux/page-flags.h             |  43 +-
 include/linux/page_ext.h               |   4 +
 include/linux/pagemap.h                | 125 ++++-
 include/linux/sched.h                  |  12 +
 kernel/exit.c                          |   1 +
 kernel/fork.c                          |   3 +
 kernel/locking/lockdep.c               | 882 +++++++++++++++++++++++++++++----
 kernel/sched/completion.c              |  54 +-
 kernel/workqueue.c                     |   2 +
 lib/Kconfig.debug                      |  29 ++
 mm/filemap.c                           |  73 ++-
 mm/page_ext.c                          |   4 +
 17 files changed, 2262 insertions(+), 152 deletions(-)
 create mode 100644 Documentation/locking/crossrelease.txt

-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH v7 01/16] lockdep: Refactor lookup_chain_cache()
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 02/16] lockdep: Add a function building a chain between two classes Byungchul Park
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionality in a single function. However, each is useful on its
own. So this patch makes lookup_chain_cache() do only the 'lookup' and
makes add_chain_cache() do only the 'add'. The result is also more
readable than before.
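
The split can be pictured with a small userspace sketch. This is only a
model under stated assumptions, not the kernel code: struct chain,
lookup_chain(), add_chain(), lookup_chain_add(), TABLE_SIZE and
MAX_CHAINS are hypothetical names, and locking is left out apart from a
comment marking where the kernel takes graph_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16	/* hypothetical; must be a power of two */
#define MAX_CHAINS 32	/* hypothetical capacity of the static array */

struct chain {
	uint64_t key;
	struct chain *next;
};

static struct chain chains[MAX_CHAINS];
static size_t nr_chains;
static struct chain *table[TABLE_SIZE];

/* Pure lookup: returns the cached chain or NULL, never modifies the table. */
static struct chain *lookup_chain(uint64_t key)
{
	for (struct chain *c = table[key & (TABLE_SIZE - 1)]; c; c = c->next)
		if (c->key == key)
			return c;
	return NULL;
}

/* Pure add: returns 1 on success, 0 if the static array is exhausted. */
static int add_chain(uint64_t key)
{
	struct chain *c;

	/* The caller is expected to have checked for duplicates. */
	assert(!lookup_chain(key));

	if (nr_chains >= MAX_CHAINS)
		return 0;
	c = &chains[nr_chains++];
	c->key = key;
	c->next = table[key & (TABLE_SIZE - 1)];
	table[key & (TABLE_SIZE - 1)] = c;
	return 1;
}

/* Combined helper: 1 if newly added, 0 if already cached or add failed. */
static int lookup_chain_add(uint64_t key)
{
	if (lookup_chain(key))
		return 0;
	/* The kernel code takes graph_lock here and looks up once more. */
	return add_chain(key);
}
```

Keeping the lookup free of side effects is what lets other callers
reuse it without also inheriting the 'add' path.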

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 132 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 86 insertions(+), 46 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4d7ffc0..0c6e6b7 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2110,14 +2110,15 @@ static int check_no_collision(struct task_struct *curr,
 }
 
 /*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
+ * Adds a dependency chain into chain hashtable. And must be called with
+ * graph_lock held.
+ *
+ * Return 0 if fail, and graph_lock is released.
+ * Return 1 if succeed, with graph_lock held.
  */
-static inline int lookup_chain_cache(struct task_struct *curr,
-				     struct held_lock *hlock,
-				     u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+				  struct held_lock *hlock,
+				  u64 chain_key)
 {
 	struct lock_class *class = hlock_class(hlock);
 	struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2125,49 +2126,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	int i, j;
 
 	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
 	 * We might need to take the graph lock, ensure we've got IRQs
 	 * disabled to make this an IRQ-safe lock.. for recursion reasons
 	 * lockdep won't complain about its own locking errors.
 	 */
 	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
 		return 0;
-	/*
-	 * We can walk it lock-free, because entries only get added
-	 * to the hash:
-	 */
-	hlist_for_each_entry_rcu(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-cache_hit:
-			debug_atomic_inc(chain_lookup_hits);
-			if (!check_no_collision(curr, hlock, chain))
-				return 0;
 
-			if (very_verbose(class))
-				printk("\nhash chain already cached, key: "
-					"%016Lx tail class: [%p] %s\n",
-					(unsigned long long)chain_key,
-					class->key, class->name);
-			return 0;
-		}
-	}
-	if (very_verbose(class))
-		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
-			(unsigned long long)chain_key, class->key, class->name);
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-	if (!graph_lock())
-		return 0;
-	/*
-	 * We have to walk the chain again locked - to avoid duplicates:
-	 */
-	hlist_for_each_entry(chain, hash_head, entry) {
-		if (chain->chain_key == chain_key) {
-			graph_unlock();
-			goto cache_hit;
-		}
-	}
 	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
 		if (!debug_locks_off_graph_unlock())
 			return 0;
@@ -2219,6 +2189,75 @@ static inline int lookup_chain_cache(struct task_struct *curr,
 	return 1;
 }
 
+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * We can walk it lock-free, because entries only get added
+	 * to the hash:
+	 */
+	hlist_for_each_entry_rcu(chain, hash_head, entry) {
+		if (chain->chain_key == chain_key) {
+			debug_atomic_inc(chain_lookup_hits);
+			return chain;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+					 struct held_lock *hlock,
+					 u64 chain_key)
+{
+	struct lock_class *class = hlock_class(hlock);
+	struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+	if (chain) {
+cache_hit:
+		if (!check_no_collision(curr, hlock, chain))
+			return 0;
+
+		if (very_verbose(class))
+			printk("\nhash chain already cached, key: "
+					"%016Lx tail class: [%p] %s\n",
+					(unsigned long long)chain_key,
+					class->key, class->name);
+		return 0;
+	}
+
+	if (very_verbose(class))
+		printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+			(unsigned long long)chain_key, class->key, class->name);
+
+	if (!graph_lock())
+		return 0;
+
+	/*
+	 * We have to walk the chain again locked - to avoid duplicates:
+	 */
+	chain = lookup_chain_cache(chain_key);
+	if (chain) {
+		graph_unlock();
+		goto cache_hit;
+	}
+
+	if (!add_chain_cache(curr, hlock, chain_key))
+		return 0;
+
+	return 1;
+}
+
 static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		struct held_lock *hlock, int chain_head, u64 chain_key)
 {
@@ -2229,11 +2268,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 	 *
 	 * We look up the chain_key and do the O(N^2) check and update of
 	 * the dependencies only if this is a new dependency chain.
-	 * (If lookup_chain_cache() returns with 1 it acquires
+	 * (If lookup_chain_cache_add() return with 1 it acquires
 	 * graph_lock for us)
 	 */
 	if (!hlock->trylock && hlock->check &&
-	    lookup_chain_cache(curr, hlock, chain_key)) {
+	    lookup_chain_cache_add(curr, hlock, chain_key)) {
 		/*
 		 * Check whether last held lock:
 		 *
@@ -2264,9 +2303,10 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 		if (!chain_head && ret != 2)
 			if (!check_prevs_add(curr, hlock))
 				return 0;
+
 		graph_unlock();
 	} else
-		/* after lookup_chain_cache(): */
+		/* after lookup_chain_cache_add(): */
 		if (unlikely(!debug_locks))
 			return 0;
 
-- 
1.9.1


* [PATCH v7 02/16] lockdep: Add a function building a chain between two classes
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 01/16] lockdep: Refactor lookup_chain_cache() Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 03/16] lockdep: Change the meaning of check_prev_add()'s return value Byungchul Park
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Crossrelease needs to build a chain between two classes regardless of
their contexts. However, add_chain_cache() cannot be used for that
purpose since it assumes that it's called in the acquisition context
of the hlock. So this patch introduces a new function doing it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 0c6e6b7..eb39474 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2110,6 +2110,76 @@ static int check_no_collision(struct task_struct *curr,
 }
 
 /*
+ * This is for building a chain between just two different classes,
+ * instead of adding a new hlock upon current, which is done by
+ * add_chain_cache().
+ *
+ * This can be called in any context with two classes, while
+ * add_chain_cache() must be done within the lock owner's context
+ * since it uses hlock which might be racy in another context.
+ */
+static inline int add_chain_cache_classes(unsigned int prev,
+					  unsigned int next,
+					  unsigned int irq_context,
+					  u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
+	 * We might need to take the graph lock, ensure we've got IRQs
+	 * disabled to make this an IRQ-safe lock.. for recursion reasons
+	 * lockdep won't complain about its own locking errors.
+	 */
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+		dump_stack();
+		return 0;
+	}
+
+	chain = lock_chains + nr_lock_chains++;
+	chain->chain_key = chain_key;
+	chain->irq_context = irq_context;
+	chain->depth = 2;
+	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+		chain->base = nr_chain_hlocks;
+		nr_chain_hlocks += chain->depth;
+		chain_hlocks[chain->base] = prev - 1;
+		chain_hlocks[chain->base + 1] = next -1;
+	}
+#ifdef CONFIG_DEBUG_LOCKDEP
+	/*
+	 * Important for check_no_collision().
+	 */
+	else {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
+		dump_stack();
+		return 0;
+	}
+#endif
+
+	hlist_add_head_rcu(&chain->entry, hash_head);
+	debug_atomic_inc(chain_lookup_misses);
+	inc_chains();
+
+	return 1;
+}
+
+/*
  * Adds a dependency chain into chain hashtable. And must be called with
  * graph_lock held.
  *
-- 
1.9.1


* [PATCH v7 03/16] lockdep: Change the meaning of check_prev_add()'s return value
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 01/16] lockdep: Refactor lookup_chain_cache() Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 02/16] lockdep: Add a function building a chain between two classes Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 04/16] lockdep: Make check_prev_add() able to handle external stack_trace Byungchul Park
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Firstly, return 1 instead of 2 when the 'prev -> next' dependency
already exists. Since the value 2 is not referenced anywhere, just
return 1 to indicate success in this case.

Secondly, return 2 instead of 1 when a lock_list entry has been added
successfully and a stack_trace has been saved. With that, a caller can
decide whether to avoid a redundant save_trace() at the call site.
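
The new convention can be sketched in userspace C as follows. This is
only an illustration of the three return values, assuming a trivial
adjacency matrix in place of the lock_list entries. add_dep(), linked,
traces_saved and NR_CLASSES are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLASSES 4	/* hypothetical number of lock classes */

static bool linked[NR_CLASSES][NR_CLASSES];
static int traces_saved;

/*
 * Returns:
 *   0 - failure (here: an out-of-range class),
 *   1 - the dependency already existed, nothing was saved,
 *   2 - a new entry was added and a trace was saved.
 */
static int add_dep(int prev, int next)
{
	if (prev < 0 || prev >= NR_CLASSES ||
	    next < 0 || next >= NR_CLASSES)
		return 0;
	if (linked[prev][next])
		return 1;
	traces_saved++;		/* stands in for save_trace() */
	linked[prev][next] = true;
	return 2;
}
```

A caller that sees 2 knows a trace has just been saved, while 1 and 2
both still mean success.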

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index eb39474..4709110 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1854,7 +1854,7 @@ static inline void inc_chains(void)
 		if (entry->class == hlock_class(next)) {
 			if (distance == 1)
 				entry->distance = 1;
-			return 2;
+			return 1;
 		}
 	}
 
@@ -1894,9 +1894,10 @@ static inline void inc_chains(void)
 		print_lock_name(hlock_class(next));
 		printk(KERN_CONT "\n");
 		dump_stack();
-		return graph_lock();
+		if (!graph_lock())
+			return 0;
 	}
-	return 1;
+	return 2;
 }
 
 /*
-- 
1.9.1


* [PATCH v7 04/16] lockdep: Make check_prev_add() able to handle external stack_trace
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (2 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 03/16] lockdep: Change the meaning of check_prev_add()'s return value Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 05/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Currently, the space for the stack_trace is pinned inside
check_prev_add(), which prevents callers from using an external
stack_trace. The simplest way to allow that is to pass an external
stack_trace as an argument.

A more suitable solution is to also pass a callback along with the
stack_trace, so that callers can decide how to save it, or whether to
save it at all. Crossrelease actually needs to do more than just save a
stack_trace. So pass both a stack_trace and a callback that handles it
to check_prev_add().
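
A minimal userspace sketch of this calling pattern is shown here. It is
an illustration under assumptions rather than the patch itself: struct
trace, save_trace_stub(), add_one() and add_all() are hypothetical
stand-ins, and the "already exists" case that returns 1 is left out.

```c
#include <assert.h>
#include <stddef.h>

struct trace {
	int nr_entries;
};

static int save_calls;

/* Hypothetical stand-in for save_trace(): returns 1 on success. */
static int save_trace_stub(struct trace *t)
{
	t->nr_entries = 1;
	save_calls++;
	return 1;
}

/*
 * Add one dependency using the caller's trace. 'save' may be NULL,
 * meaning the caller has already filled in 'trace'.
 * Returns 0 on failure and 2 after adding an entry.
 */
static int add_one(struct trace *trace, int (*save)(struct trace *))
{
	assert(trace != NULL);

	if (save && !save(trace))
		return 0;
	/* ... link prev -> next using *trace ... */
	return 2;
}

/* Add 'n' dependencies that share one trace, saving it only once. */
static int add_all(int n)
{
	struct trace trace = { 0 };
	int (*save)(struct trace *) = save_trace_stub;

	for (int i = 0; i < n; i++) {
		int ret = add_one(&trace, save);

		if (!ret)
			return 0;
		if (save && ret == 2)
			save = NULL;	/* trace stays valid from now on */
	}
	return 1;
}
```

Because the trace lives with the caller, a different caller could pass
its own buffer and its own callback without touching add_one().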

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4709110..2847356 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1797,20 +1797,13 @@ static inline void inc_chains(void)
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-	       struct held_lock *next, int distance, int *stack_saved)
+	       struct held_lock *next, int distance, struct stack_trace *trace,
+	       int (*save)(struct stack_trace *trace))
 {
 	struct lock_list *entry;
 	int ret;
 	struct lock_list this;
 	struct lock_list *uninitialized_var(target_entry);
-	/*
-	 * Static variable, serialized by the graph_lock().
-	 *
-	 * We use this static variable to save the stack trace in case
-	 * we call into this function multiple times due to encountering
-	 * trylocks in the held lock stack.
-	 */
-	static struct stack_trace trace;
 
 	/*
 	 * Prove that the new <prev> -> <next> dependency would not
@@ -1858,11 +1851,8 @@ static inline void inc_chains(void)
 		}
 	}
 
-	if (!*stack_saved) {
-		if (!save_trace(&trace))
-			return 0;
-		*stack_saved = 1;
-	}
+	if (save && !save(trace))
+		return 0;
 
 	/*
 	 * Ok, all validations passed, add the new lock
@@ -1870,14 +1860,14 @@ static inline void inc_chains(void)
 	 */
 	ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
 			       &hlock_class(prev)->locks_after,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 
 	if (!ret)
 		return 0;
 
 	ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
 			       &hlock_class(next)->locks_before,
-			       next->acquire_ip, distance, &trace);
+			       next->acquire_ip, distance, trace);
 	if (!ret)
 		return 0;
 
@@ -1885,8 +1875,6 @@ static inline void inc_chains(void)
 	 * Debugging printouts:
 	 */
 	if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
-		/* We drop graph lock, so another thread can overwrite trace. */
-		*stack_saved = 0;
 		graph_unlock();
 		printk("\n new dependency: ");
 		print_lock_name(hlock_class(prev));
@@ -1910,8 +1898,9 @@ static inline void inc_chains(void)
 check_prevs_add(struct task_struct *curr, struct held_lock *next)
 {
 	int depth = curr->lockdep_depth;
-	int stack_saved = 0;
 	struct held_lock *hlock;
+	struct stack_trace trace;
+	int (*save)(struct stack_trace *trace) = save_trace;
 
 	/*
 	 * Debugging checks.
@@ -1936,9 +1925,18 @@ static inline void inc_chains(void)
 		 * added:
 		 */
 		if (hlock->read != 2 && hlock->check) {
-			if (!check_prev_add(curr, hlock, next,
-						distance, &stack_saved))
+			int ret = check_prev_add(curr, hlock, next,
+						distance, &trace, save);
+			if (!ret)
 				return 0;
+
+			/*
+			 * Stop saving stack_trace if save_trace() was
+			 * called at least once:
+			 */
+			if (save && ret == 2)
+				save = NULL;
+
 			/*
 			 * Stop after the first non-trylock entry,
 			 * as non-trylock entries have added their
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 05/16] lockdep: Implement crossrelease feature
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (3 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 04/16] lockdep: Make check_prev_add() able to handle external stack_trace Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-06-13  0:33   ` Byungchul Park
  2017-07-11 16:04   ` Peter Zijlstra
  2017-05-24  8:59 ` [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite Byungchul Park
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. Synchronization primitives like page locks or
completions, which are allowed to be released in any context, also
create dependencies and can cause a deadlock. So lockdep should track
these as well to do a better job. The 'crossrelease' implementation
makes these primitives tracked too.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/irqflags.h |  24 ++-
 include/linux/lockdep.h  | 111 ++++++++++-
 include/linux/sched.h    |   8 +
 kernel/exit.c            |   1 +
 kernel/fork.c            |   3 +
 kernel/locking/lockdep.c | 474 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/workqueue.c       |   2 +
 lib/Kconfig.debug        |  12 ++
 8 files changed, 601 insertions(+), 34 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..c40af8a 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,10 +23,26 @@
 # define trace_softirq_context(p)	((p)->softirq_context)
 # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
-# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
-# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter()	do { current->softirq_context++; } while (0)
-# define lockdep_softirq_exit()	do { current->softirq_context--; } while (0)
+# define trace_hardirq_enter()		\
+do {					\
+	current->hardirq_context++;	\
+	crossrelease_hardirq_start();	\
+} while (0)
+# define trace_hardirq_exit()		\
+do {					\
+	current->hardirq_context--;	\
+	crossrelease_hardirq_end();	\
+} while (0)
+# define lockdep_softirq_enter()	\
+do {					\
+	current->softirq_context++;	\
+	crossrelease_softirq_start();	\
+} while (0)
+# define lockdep_softirq_exit()		\
+do {					\
+	current->softirq_context--;	\
+	crossrelease_softirq_end();	\
+} while (0)
 # define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
 #else
 # define trace_hardirqs_on()		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c1458fe..d531097 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -155,6 +155,12 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * Whether it's a crosslock.
+	 */
+	int				cross;
+#endif
 };
 
 static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -258,7 +264,61 @@ struct held_lock {
 	unsigned int hardirqs_off:1;
 	unsigned int references:12;					/* 32 bits */
 	unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+	/*
+	 * Generation id.
+	 *
+	 * A value of cross_gen_id will be stored when holding this,
+	 * which is globally increased whenever each crosslock is held.
+	 */
+	unsigned int gen_id;
+#endif
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCK_TRACE_ENTRIES 5
+
+/*
+ * This is for keeping locks waiting for commit so that true dependencies
+ * can be added at commit step.
+ */
+struct hist_lock {
+	/*
+	 * Separate stack_trace data. This will be used at commit step.
+	 */
+	struct stack_trace	trace;
+	unsigned long		trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
+
+	/*
+	 * Separate hlock instance. This will be used at commit step.
+	 *
+	 * TODO: Use a smaller data structure containing only necessary
+	 * data. However, we should make lockdep code able to handle the
+	 * smaller one first.
+	 */
+	struct held_lock	hlock;
+};
+
+/*
+ * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
+ * be called instead of lockdep_init_map().
+ */
+struct cross_lock {
+	/*
+	 * Separate hlock instance. This will be used at commit step.
+	 *
+	 * TODO: Use a smaller data structure containing only necessary
+	 * data. However, we should make lockdep code able to handle the
+	 * smaller one first.
+	 */
+	struct held_lock	hlock;
+};
+
+struct lockdep_map_cross {
+	struct lockdep_map map;
+	struct cross_lock xlock;
 };
+#endif
 
 /*
  * Initialization, self-test and debugging-output methods:
@@ -282,13 +342,6 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 			     struct lock_class_key *key, int subclass);
 
 /*
- * To initialize a lockdep_map statically use this macro.
- * Note that _name must not be NULL.
- */
-#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
-	{ .name = (_name), .key = (void *)(_key), }
-
-/*
  * Reinitialize a lock key - for cases where there is special locking or
  * special initialization of locks so that the validator gets the scope
  * of dependencies wrong: they are either too broad (they need a class-split)
@@ -443,6 +496,50 @@ static inline void lockdep_on(void)
 
 #endif /* !LOCKDEP */
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
+				       const char *name,
+				       struct lock_class_key *key,
+				       int subclass);
+extern void lock_commit_crosslock(struct lockdep_map *lock);
+
+#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .map.name = (_name), .map.key = (void *)(_key), \
+	  .map.cross = 1, }
+
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .name = (_name), .key = (void *)(_key), .cross = 0, }
+
+extern void crossrelease_hardirq_start(void);
+extern void crossrelease_hardirq_end(void);
+extern void crossrelease_softirq_start(void);
+extern void crossrelease_softirq_end(void);
+extern void crossrelease_work_start(void);
+extern void crossrelease_work_end(void);
+extern void init_crossrelease_task(struct task_struct *task);
+extern void free_crossrelease_task(struct task_struct *task);
+#else
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .name = (_name), .key = (void *)(_key), }
+
+static inline void crossrelease_hardirq_start(void) {}
+static inline void crossrelease_hardirq_end(void) {}
+static inline void crossrelease_softirq_start(void) {}
+static inline void crossrelease_softirq_end(void) {}
+static inline void crossrelease_work_start(void) {}
+static inline void crossrelease_work_end(void) {}
+static inline void init_crossrelease_task(struct task_struct *task) {}
+static inline void free_crossrelease_task(struct task_struct *task) {}
+#endif
+
 #ifdef CONFIG_LOCK_STAT
 
 extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e9c009d..5f6d6f4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1749,6 +1749,14 @@ struct task_struct {
 	struct held_lock held_locks[MAX_LOCK_DEPTH];
 	gfp_t lockdep_reclaim_gfp;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCKS_NR 64UL
+	struct hist_lock *xhlocks; /* Crossrelease history locks */
+	unsigned int xhlock_idx;
+	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
+	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
+	unsigned int xhlock_idx_work; /* For restoring at work exit */
+#endif
 #ifdef CONFIG_UBSAN
 	unsigned int in_ubsan;
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 3076f30..cc56aad 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -883,6 +883,7 @@ void __noreturn do_exit(long code)
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
+	free_crossrelease_task(tsk);
 	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
diff --git a/kernel/fork.c b/kernel/fork.c
index 997ac1d..f9623a0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -451,6 +451,7 @@ void __init fork_init(void)
 	for (i = 0; i < UCOUNT_COUNTS; i++) {
 		init_user_ns.ucount_max[i] = max_threads/2;
 	}
+	init_crossrelease_task(&init_task);
 }
 
 int __weak arch_dup_task_struct(struct task_struct *dst,
@@ -1611,6 +1612,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
+	init_crossrelease_task(p);
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
@@ -1856,6 +1858,7 @@ static __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_perf:
 	perf_event_free_task(p);
 bad_fork_cleanup_policy:
+	free_crossrelease_task(p);
 #ifdef CONFIG_NUMA
 	mpol_put(p->mempolicy);
 bad_fork_cleanup_threadgroup_lock:
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2847356..63eb04a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -55,6 +55,10 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#include <linux/slab.h>
+#endif
+
 #ifdef CONFIG_PROVE_LOCKING
 int prove_locking = 1;
 module_param(prove_locking, int, 0644);
@@ -709,6 +713,18 @@ static int count_matching_names(struct lock_class *new_class)
 	return NULL;
 }
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static void cross_init(struct lockdep_map *lock, int cross);
+static int cross_lock(struct lockdep_map *lock);
+static int lock_acquire_crosslock(struct held_lock *hlock);
+static int lock_release_crosslock(struct lockdep_map *lock);
+#else
+static inline void cross_init(struct lockdep_map *lock, int cross) {}
+static inline int cross_lock(struct lockdep_map *lock) { return 0; }
+static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
+static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
+#endif
+
 /*
  * Register a lock's class in the hash-table, if the class is not present
  * yet. Otherwise we look it up. We cache the result in the lock object
@@ -1768,6 +1784,9 @@ static inline void inc_chains(void)
 		if (nest)
 			return 2;
 
+		if (cross_lock(prev->instance))
+			continue;
+
 		return print_deadlock_bug(curr, prev, next);
 	}
 	return 1;
@@ -1921,30 +1940,36 @@ static inline void inc_chains(void)
 		int distance = curr->lockdep_depth - depth + 1;
 		hlock = curr->held_locks + depth - 1;
 		/*
-		 * Only non-recursive-read entries get new dependencies
-		 * added:
+		 * Only non-crosslock entries get new dependencies added.
+		 * Crosslock entries will be added by commit later:
 		 */
-		if (hlock->read != 2 && hlock->check) {
-			int ret = check_prev_add(curr, hlock, next,
-						distance, &trace, save);
-			if (!ret)
-				return 0;
-
+		if (!cross_lock(hlock->instance)) {
 			/*
-			 * Stop saving stack_trace if save_trace() was
-			 * called at least once:
+			 * Only non-recursive-read entries get new dependencies
+			 * added:
 			 */
-			if (save && ret == 2)
-				save = NULL;
+			if (hlock->read != 2 && hlock->check) {
+				int ret = check_prev_add(curr, hlock, next,
+							 distance, &trace, save);
+				if (!ret)
+					return 0;
 
-			/*
-			 * Stop after the first non-trylock entry,
-			 * as non-trylock entries have added their
-			 * own direct dependencies already, so this
-			 * lock is connected to them indirectly:
-			 */
-			if (!hlock->trylock)
-				break;
+				/*
+				 * Stop saving stack_trace if save_trace() was
+				 * called at least once:
+				 */
+				if (save && ret == 2)
+					save = NULL;
+
+				/*
+				 * Stop after the first non-trylock entry,
+				 * as non-trylock entries have added their
+				 * own direct dependencies already, so this
+				 * lock is connected to them indirectly:
+				 */
+				if (!hlock->trylock)
+					break;
+			}
 		}
 		depth--;
 		/*
@@ -3203,7 +3228,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
+static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
 		      struct lock_class_key *key, int subclass)
 {
 	int i;
@@ -3261,8 +3286,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
+
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	cross_init(lock, 0);
+	__lockdep_init_map(lock, name, key, subclass);
+}
 EXPORT_SYMBOL_GPL(lockdep_init_map);
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass)
+{
+	cross_init(lock, 1);
+	__lockdep_init_map(lock, name, key, subclass);
+}
+EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
+#endif
+
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
 
@@ -3317,6 +3359,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	unsigned int depth;
 	int chain_head = 0;
 	int class_idx;
+	int ret;
 	u64 chain_key;
 
 	if (unlikely(!debug_locks))
@@ -3366,7 +3409,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes + 1;
 
-	if (depth) {
+	/* TODO: nest_lock is not implemented for crosslock yet. */
+	if (depth && !cross_lock(lock)) {
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (hlock->references)
@@ -3447,6 +3491,14 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
 		return 0;
 
+	ret = lock_acquire_crosslock(hlock);
+	/*
+	 * 2 means normal acquire operations are needed. Otherwise, it's
+	 * ok just to return with '0:fail, 1:success'.
+	 */
+	if (ret != 2)
+		return ret;
+
 	curr->curr_chain_key = chain_key;
 	curr->lockdep_depth++;
 	check_chain_key(curr);
@@ -3610,11 +3662,19 @@ static int match_held_lock(struct held_lock *hlock, struct lockdep_map *lock)
 	struct task_struct *curr = current;
 	struct held_lock *hlock, *prev_hlock;
 	unsigned int depth;
-	int i;
+	int ret, i;
 
 	if (unlikely(!debug_locks))
 		return 0;
 
+	ret = lock_release_crosslock(lock);
+	/*
+	 * 2 means normal release operations are needed. Otherwise, it's
+	 * ok just to return with '0:fail, 1:success'.
+	 */
+	if (ret != 2)
+		return ret;
+
 	depth = curr->lockdep_depth;
 	/*
 	 * So we're all set to release this lock.. wait what lock? We don't
@@ -4557,3 +4617,371 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
 	dump_stack();
 }
 EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+
+#define xhlock(i)         (current->xhlocks[(i) % MAX_XHLOCKS_NR])
+
+/*
+ * Whenever a crosslock is held, cross_gen_id will be increased.
+ */
+static atomic_t cross_gen_id; /* Can be wrapped */
+
+void crossrelease_hardirq_start(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx_hard = current->xhlock_idx;
+}
+
+void crossrelease_hardirq_end(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx = current->xhlock_idx_hard;
+}
+
+void crossrelease_softirq_start(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx_soft = current->xhlock_idx;
+}
+
+void crossrelease_softirq_end(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx = current->xhlock_idx_soft;
+}
+
+/*
+ * Each work of workqueue might run in a different context,
+ * thanks to concurrency support of workqueue. So we have to
+ * distinguish each work to avoid false positive.
+ */
+void crossrelease_work_start(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx_work = current->xhlock_idx;
+}
+
+void crossrelease_work_end(void)
+{
+	if (current->xhlocks)
+		current->xhlock_idx = current->xhlock_idx_work;
+}
+
+static int cross_lock(struct lockdep_map *lock)
+{
+	return lock ? lock->cross : 0;
+}
+
+/*
+ * This is needed to decide the relationship between wrappable variables.
+ */
+static inline int before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
+
+static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
+{
+	return hlock_class(&xhlock->hlock);
+}
+
+static inline struct lock_class *xlock_class(struct cross_lock *xlock)
+{
+	return hlock_class(&xlock->hlock);
+}
+
+/*
+ * Should we check a dependency with previous one?
+ */
+static inline int depend_before(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check && !hlock->trylock;
+}
+
+/*
+ * Should we check a dependency with next one?
+ */
+static inline int depend_after(struct held_lock *hlock)
+{
+	return hlock->read != 2 && hlock->check;
+}
+
+/*
+ * Check if the xhlock is valid, which would be false if,
+ *
+ *    1. Has not used after initializaion yet.
+ *
+ * Remind hist_lock is implemented as a ring buffer.
+ */
+static inline int xhlock_valid(struct hist_lock *xhlock)
+{
+	/*
+	 * xhlock->hlock.instance must be !NULL.
+	 */
+	return !!xhlock->hlock.instance;
+}
+
+/*
+ * Record a hist_lock entry.
+ *
+ * Irq disable is only required.
+ */
+static void add_xhlock(struct held_lock *hlock)
+{
+	unsigned int idx = ++current->xhlock_idx;
+	struct hist_lock *xhlock = &xhlock(idx);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+	/*
+	 * This can be done locklessly because they are all task-local
+	 * state, we must however ensure IRQs are disabled.
+	 */
+	WARN_ON_ONCE(!irqs_disabled());
+#endif
+
+	/* Initialize hist_lock's members */
+	xhlock->hlock = *hlock;
+
+	xhlock->trace.nr_entries = 0;
+	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
+	xhlock->trace.entries = xhlock->trace_entries;
+	xhlock->trace.skip = 3;
+	save_stack_trace(&xhlock->trace);
+}
+
+static inline int same_context_xhlock(struct hist_lock *xhlock)
+{
+	return xhlock->hlock.irq_context == task_irq_context(current);
+}
+
+/*
+ * This should be lockless as far as possible because this would be
+ * called very frequently.
+ */
+static void check_add_xhlock(struct held_lock *hlock)
+{
+	/*
+	 * Record a hist_lock, only in case that acquisitions ahead
+	 * could depend on the held_lock. For example, if the held_lock
+	 * is trylock then acquisitions ahead never depends on that.
+	 * In that case, we don't need to record it. Just return.
+	 */
+	if (!current->xhlocks || !depend_before(hlock))
+		return;
+
+	add_xhlock(hlock);
+}
+
+/*
+ * For crosslock.
+ */
+static int add_xlock(struct held_lock *hlock)
+{
+	struct cross_lock *xlock;
+	unsigned int gen_id;
+
+	if (!graph_lock())
+		return 0;
+
+	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
+
+	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
+	xlock->hlock = *hlock;
+	xlock->hlock.gen_id = gen_id;
+	graph_unlock();
+
+	return 1;
+}
+
+/*
+ * Called for both normal and crosslock acquires. Normal locks will be
+ * pushed on the hist_lock queue. Cross locks will record state and
+ * stop regular lock_acquire() to avoid being placed on the held_lock
+ * stack.
+ *
+ * Return: 0 - failure;
+ *         1 - crosslock, done;
+ *         2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_acquire_crosslock(struct held_lock *hlock)
+{
+	/*
+	 *	CONTEXT 1		CONTEXT 2
+	 *	---------		---------
+	 *	lock A (cross)
+	 *	X = atomic_inc_return(&cross_gen_id)
+	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+	 *				Y = atomic_read_acquire(&cross_gen_id)
+	 *				lock B
+	 *
+	 * atomic_read_acquire() is for ordering between A and B,
+	 * IOW, A happens before B, when CONTEXT 2 see Y >= X.
+	 *
+	 * Pairs with atomic_inc_return() in add_xlock().
+	 */
+	hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
+
+	if (cross_lock(hlock->instance))
+		return add_xlock(hlock);
+
+	check_add_xhlock(hlock);
+	return 2;
+}
+
+static int copy_trace(struct stack_trace *trace)
+{
+	unsigned long *buf = stack_trace + nr_stack_trace_entries;
+	unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+	unsigned int nr = min(max_nr, trace->nr_entries);
+
+	trace->nr_entries = nr;
+	memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
+	trace->entries = buf;
+	nr_stack_trace_entries += nr;
+
+	if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
+		dump_stack();
+
+		return 0;
+	}
+
+	return 1;
+}
+
+static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
+{
+	unsigned int xid, pid;
+	u64 chain_key;
+
+	xid = xlock_class(xlock) - lock_classes;
+	chain_key = iterate_chain_key((u64)0, xid);
+	pid = xhlock_class(xhlock) - lock_classes;
+	chain_key = iterate_chain_key(chain_key, pid);
+
+	if (lookup_chain_cache(chain_key))
+		return 1;
+
+	if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
+				chain_key))
+		return 0;
+
+	if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
+			    &xhlock->trace, copy_trace))
+		return 0;
+
+	return 1;
+}
+
+static void commit_xhlocks(struct cross_lock *xlock)
+{
+	unsigned int cur = current->xhlock_idx;
+	unsigned int i;
+
+	if (!graph_lock())
+		return;
+
+	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+		struct hist_lock *xhlock = &xhlock(cur - i);
+
+		if (!xhlock_valid(xhlock))
+			break;
+
+		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+			break;
+
+		if (!same_context_xhlock(xhlock))
+			break;
+
+		/*
+		 * commit_xhlock() returns 0 with graph_lock already
+		 * released if fail.
+		 */
+		if (!commit_xhlock(xlock, xhlock))
+			return;
+	}
+
+	graph_unlock();
+}
+
+void lock_commit_crosslock(struct lockdep_map *lock)
+{
+	struct cross_lock *xlock;
+	unsigned long flags;
+
+	if (unlikely(!debug_locks || current->lockdep_recursion))
+		return;
+
+	if (!current->xhlocks)
+		return;
+
+	/*
+	 * Do commit hist_locks with the cross_lock, only in case that
+	 * the cross_lock could depend on acquisitions after that.
+	 *
+	 * For example, if the cross_lock does not have the 'check' flag
+	 * then we don't need to check dependencies and commit for that.
+	 * Just skip it. In that case, of course, the cross_lock does
+	 * not depend on acquisitions ahead, either.
+	 *
+	 * WARNING: Don't do that in add_xlock() in advance. When an
+	 * acquisition context is different from the commit context,
+	 * invalid(skipped) cross_lock might be accessed.
+	 */
+	if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
+		return;
+
+	raw_local_irq_save(flags);
+	check_flags(flags);
+	current->lockdep_recursion = 1;
+	xlock = &((struct lockdep_map_cross *)lock)->xlock;
+	commit_xhlocks(xlock);
+	current->lockdep_recursion = 0;
+	raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_commit_crosslock);
+
+/*
+ * Return: 1 - crosslock, done;
+ *         2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_release_crosslock(struct lockdep_map *lock)
+{
+	return cross_lock(lock) ? 1 : 2;
+}
+
+static void cross_init(struct lockdep_map *lock, int cross)
+{
+	lock->cross = cross;
+
+	/*
+	 * Crossrelease assumes that the ring buffer size of xhlocks
+	 * is aligned with power of 2. So force it on build.
+	 */
+	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
+}
+
+void init_crossrelease_task(struct task_struct *task)
+{
+	task->xhlock_idx = UINT_MAX;
+	task->xhlock_idx_soft = UINT_MAX;
+	task->xhlock_idx_hard = UINT_MAX;
+	task->xhlock_idx_work = UINT_MAX;
+	task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
+				GFP_KERNEL);
+}
+
+void free_crossrelease_task(struct task_struct *task)
+{
+	if (task->xhlocks) {
+		void *tmp = task->xhlocks;
+		/* Disable crossrelease for current */
+		task->xhlocks = NULL;
+		kfree(tmp);
+	}
+}
+#endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 479d840..2f43ac1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2092,6 +2092,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
 
 	lock_map_acquire_read(&pwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
+	crossrelease_work_start();
 	trace_workqueue_execute_start(work);
 	worker->current_func(work);
 	/*
@@ -2099,6 +2100,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
 	 * point will only record its address.
 	 */
 	trace_workqueue_execute_end(work);
+	crossrelease_work_end();
 	lock_map_release(&lockdep_map);
 	lock_map_release(&pwq->wq->lockdep_map);
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a6c8db1..e584431 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1042,6 +1042,18 @@ config DEBUG_LOCK_ALLOC
 	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
 	 held during task exit.
 
+config LOCKDEP_CROSSRELEASE
+	bool "Lock debugging: make lockdep work for crosslocks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 This makes lockdep work for crosslocks, which are locks allowed to
+	 be released in a different context from the acquisition context.
+	 Normally a lock must be released in the context that acquired it.
+	 However, relaxing this constraint lets synchronization primitives
+	 such as page locks or completions use the lock correctness
+	 detector, lockdep.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1
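
The index arithmetic in the patch above relies on two properties of
unsigned wraparound: before() compares counters that may wrap, and the
xhlock(i) macro maps an ever-increasing index onto a power-of-2 ring.
The following is a minimal userspace sketch of both, not kernel code.
RING_SIZE, slot() and wraparound_selfcheck() are names made up for the
illustration; only before() mirrors a helper from the patch.

```c
#include <assert.h>
#include <limits.h>

#define RING_SIZE 64u	/* illustration only; mirrors MAX_XHLOCKS_NR, power of 2 */

/*
 * Same comparison as before() in the patch: true if a is older than b,
 * even when the counter has wrapped. The cast of an out-of-range
 * unsigned value to int is implementation-defined in older C standards;
 * this sketch assumes the usual two's complement behaviour.
 */
static int before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}

/* Hypothetical helper: ring slot for logical index i, like xhlock(i). */
static unsigned int slot(unsigned int i)
{
	return i % RING_SIZE;
}

/* Hypothetical self-check exercising the wraparound cases. */
static void wraparound_selfcheck(void)
{
	unsigned int idx = UINT_MAX;	/* initial value of xhlock_idx */

	/* The first recorded entry lands in slot 0. */
	assert(slot(++idx) == 0);
	/* Walking backwards from index 0 wraps to the last slot. */
	assert(slot(idx - 1) == RING_SIZE - 1);
	/* UINT_MAX is one step before 0 once the counter wraps. */
	assert(before(UINT_MAX, 0));
	assert(!before(0, UINT_MAX));
}
```

The backwards walk only stays contiguous across the wrap because
RING_SIZE divides the range of unsigned int, which is presumably why
the patch forces MAX_XHLOCKS_NR to be a power of 2 at build time.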

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (4 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 05/16] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-07-11 16:12   ` Peter Zijlstra
  2017-05-24  8:59 ` [PATCH v7 07/16] lockdep: Handle non(or multi)-acquisition of a crosslock Byungchul Park
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

The ring buffer can be overwritten by hardirq/softirq/work contexts.
Those cases must be considered on rollback or commit. For example,

          |<------ hist_lock ring buffer size ----->|
          ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiiiii....................

          where 'p' represents an acquisition in process context,
          'i' represents an acquisition in irq context.

On irq exit, crossrelease tries to roll back idx to its original
position, but it should not, because the entry there has already been
invalidated by being overwritten with an 'i' entry. Avoid rollback or
commit for overwritten entries.
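
The detection idea can be sketched in userspace roughly as follows.
This is a simplified model, not the code in this patch: struct entry,
struct task, record(), ctx_start() and ctx_end() are hypothetical
names, and the sketch snapshots the id stored in the slot itself, which
is a simplification chosen for the example rather than the exact
bookkeeping done in task_struct.

```c
#include <assert.h>

#define RING_SIZE 8u	/* hypothetical size; must be a power of 2 */

struct entry {
	int valid;		/* stands in for hlock.instance != NULL */
	unsigned int hist_id;	/* unique id of the acquisition stored here */
};

struct task {
	struct entry ring[RING_SIZE];
	unsigned int idx;	/* logical index of the newest entry */
	unsigned int next_id;	/* id handed to the next recorded entry */
};

struct saved {
	unsigned int idx;	/* position to roll back to */
	unsigned int hist_id;	/* id that slot held at context entry */
};

static void task_init(struct task *t)
{
	unsigned int i;

	assert((RING_SIZE & (RING_SIZE - 1)) == 0);
	for (i = 0; i < RING_SIZE; i++) {
		t->ring[i].valid = 0;
		t->ring[i].hist_id = 0;
	}
	t->idx = ~0u;		/* so that the first record uses slot 0 */
	t->next_id = 1;		/* 0 is left for never-written slots */
}

/* Record one acquisition, overwriting the oldest slot if needed. */
static void record(struct task *t)
{
	struct entry *e = &t->ring[(++t->idx) % RING_SIZE];

	e->valid = 1;
	e->hist_id = t->next_id++;
}

/* On irq/work entry: remember where we are and what is stored there. */
static struct saved ctx_start(const struct task *t)
{
	struct saved s = { t->idx, t->ring[t->idx % RING_SIZE].hist_id };

	return s;
}

/*
 * On irq/work exit: roll the index back. If the slot no longer holds
 * the id seen at entry, the ring wrapped and the slot was overwritten,
 * so invalidate it. Returns 1 if the slot survived, 0 otherwise.
 */
static int ctx_end(struct task *t, struct saved s)
{
	struct entry *e = &t->ring[s.idx % RING_SIZE];

	t->idx = s.idx;
	if (e->hist_id != s.hist_id) {
		e->valid = 0;
		return 0;
	}
	return 1;
}
```

With a ring of 8 slots, an interrupt that records 2 entries leaves the
rollback slot intact, while one that records 8 entries wraps around,
overwrites it, and makes ctx_end() invalidate it.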

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h  | 20 +++++++++++
 include/linux/sched.h    |  4 +++
 kernel/locking/lockdep.c | 92 +++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index d531097..a03f79d 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -284,6 +284,26 @@ struct held_lock {
  */
 struct hist_lock {
 	/*
+	 * Id for each entry in the ring buffer. This is used to
+	 * decide whether the ring buffer was overwritten or not.
+	 *
+	 * For example,
+	 *
+	 *           |<----------- hist_lock ring buffer size ------->|
+	 *           pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
+	 * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
+	 *
+	 *           where 'p' represents an acquisition in process
+	 *           context, 'i' represents an acquisition in irq
+	 *           context.
+	 *
+	 * In this example, the ring buffer was overwritten by
+	 * acquisitions in irq context, that should be detected on
+	 * rollback or commit.
+	 */
+	unsigned int hist_id;
+
+	/*
 	 * Seperate stack_trace data. This will be used at commit step.
 	 */
 	struct stack_trace	trace;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5f6d6f4..9e1437c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1756,6 +1756,10 @@ struct task_struct {
 	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
 	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
 	unsigned int xhlock_idx_work; /* For restoring at work exit */
+	unsigned int hist_id;
+	unsigned int hist_id_soft; /* For overwrite check at softirq exit */
+	unsigned int hist_id_hard; /* For overwrite check at hardirq exit */
+	unsigned int hist_id_work; /* For overwrite check at work exit */
 #endif
 #ifdef CONFIG_UBSAN
 	unsigned int in_ubsan;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 63eb04a..26ff205 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4627,28 +4627,65 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
  */
 static atomic_t cross_gen_id; /* Can be wrapped */
 
+/*
+ * Make an entry of the ring buffer invalid.
+ */
+static inline void invalidate_xhlock(struct hist_lock *xhlock)
+{
+	/*
+	 * Normally, xhlock->hlock.instance must be !NULL.
+	 */
+	xhlock->hlock.instance = NULL;
+}
+
 void crossrelease_hardirq_start(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx_hard = current->xhlock_idx;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		cur->xhlock_idx_hard = cur->xhlock_idx;
+		cur->hist_id_hard = cur->hist_id;
+	}
 }
 
 void crossrelease_hardirq_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_hard;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		unsigned int idx = cur->xhlock_idx_hard;
+		struct hist_lock *h = &xhlock(idx);
+
+		cur->xhlock_idx = idx;
+		/* Check if the ring was overwritten. */
+		if (h->hist_id != cur->hist_id_hard)
+			invalidate_xhlock(h);
+	}
 }
 
 void crossrelease_softirq_start(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx_soft = current->xhlock_idx;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		cur->xhlock_idx_soft = cur->xhlock_idx;
+		cur->hist_id_soft = cur->hist_id;
+	}
 }
 
 void crossrelease_softirq_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_soft;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		unsigned int idx = cur->xhlock_idx_soft;
+		struct hist_lock *h = &xhlock(idx);
+
+		cur->xhlock_idx = idx;
+		/* Check if the ring was overwritten. */
+		if (h->hist_id != cur->hist_id_soft)
+			invalidate_xhlock(h);
+	}
 }
 
 /*
@@ -4658,14 +4695,27 @@ void crossrelease_softirq_end(void)
  */
 void crossrelease_work_start(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx_work = current->xhlock_idx;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		cur->xhlock_idx_work = cur->xhlock_idx;
+		cur->hist_id_work = cur->hist_id;
+	}
 }
 
 void crossrelease_work_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_work;
+	struct task_struct *cur = current;
+
+	if (cur->xhlocks) {
+		unsigned int idx = cur->xhlock_idx_work;
+		struct hist_lock *h = &xhlock(idx);
+
+		cur->xhlock_idx = idx;
+		/* Check if the ring was overwritten. */
+		if (h->hist_id != cur->hist_id_work)
+			invalidate_xhlock(h);
+	}
 }
 
 static int cross_lock(struct lockdep_map *lock)
@@ -4711,6 +4761,7 @@ static inline int depend_after(struct held_lock *hlock)
  * Check if the xhlock is valid, which would be false if,
  *
  *    1. Has not used after initializaion yet.
+ *    2. Got invalidated.
  *
  * Remind hist_lock is implemented as a ring buffer.
  */
@@ -4742,6 +4793,7 @@ static void add_xhlock(struct held_lock *hlock)
 
 	/* Initialize hist_lock's members */
 	xhlock->hlock = *hlock;
+	xhlock->hist_id = current->hist_id++;
 
 	xhlock->trace.nr_entries = 0;
 	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
@@ -4880,6 +4932,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
 static void commit_xhlocks(struct cross_lock *xlock)
 {
 	unsigned int cur = current->xhlock_idx;
+	unsigned int prev_hist_id = xhlock(cur).hist_id;
 	unsigned int i;
 
 	if (!graph_lock())
@@ -4898,6 +4951,17 @@ static void commit_xhlocks(struct cross_lock *xlock)
 			break;
 
 		/*
+		 * Filter out the cases that the ring buffer was
+		 * overwritten and the previous entry has a bigger
+		 * hist_id than the following one, which is impossible
+		 * otherwise.
+		 */
+		if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+			break;
+
+		prev_hist_id = xhlock->hist_id;
+
+		/*
 		 * commit_xhlock() returns 0 with graph_lock already
 		 * released if fail.
 		 */
@@ -4967,6 +5031,10 @@ static void cross_init(struct lockdep_map *lock, int cross)
 
 void init_crossrelease_task(struct task_struct *task)
 {
+	task->hist_id = 0;
+	task->hist_id_soft = 0;
+	task->hist_id_hard = 0;
+	task->hist_id_work = 0;
 	task->xhlock_idx = UINT_MAX;
 	task->xhlock_idx_soft = UINT_MAX;
 	task->xhlock_idx_hard = UINT_MAX;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 07/16] lockdep: Handle non(or multi)-acquisition of a crosslock
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (5 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks Byungchul Park
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

It is possible that no acquisition is in progress when a crosslock is
committed. Completion operations with crossrelease enabled are one such
case:

   CONTEXT X                         CONTEXT Y
   ---------                         ---------
   trigger completion context
                                     complete AX
                                        commit AX
   wait_for_complete AX
      acquire AX
      wait

   where AX is a crosslock.

When no acquisition is in progress, we should not perform the commit
because the lock might no longer exist, and committing might cause an
incorrect memory access. So we have to track the number of acquisitions
of a crosslock to handle it.

Moreover, more than one acquisition of a crosslock can overlap, like:

   CONTEXT W        CONTEXT X        CONTEXT Y        CONTEXT Z
   ---------        ---------        ---------        ---------
   acquire AX (gen_id: 1)
                                     acquire A
                    acquire AX (gen_id: 10)
                                     acquire B
                                     commit AX
                                                      acquire C
                                                      commit AX

   where A, B and C are typical locks and AX is a crosslock.

The current crossrelease code performs the commits in Y and Z with
gen_id = 10. However, gen_id = 1 can be used instead, since not only
'acquire AX in X' but also 'acquire AX in W' depends on each acquisition
in Y and Z until their commits. So use gen_id = 1 instead of 10 for
those commits, which adds the additional dependency 'AX -> A' in the
example above.
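
The two rules above (skip the commit when nothing is acquired, and keep
the gen_id of the first of several overlapping acquisitions) can be
modelled in userspace roughly like this. It is only a sketch under
simplifying assumptions: struct xlock_sketch and the three functions
are hypothetical names, the global counter is a plain integer rather
than an atomic_t, graph_lock() is left out, and the depend_after()
part of the real condition in add_xlock() is dropped.

```c
#include <assert.h>

/* Global generation counter; atomic in the kernel, plain here. */
static unsigned int cross_gen_id;

/* Hypothetical stand-in for the fields of struct cross_lock used here. */
struct xlock_sketch {
	int nr_acquire;		/* number of acquisitions in progress */
	unsigned int gen_id;	/* gen_id of the first overlapping acquisition */
};

/* Only the first of several overlapping acquisitions takes a new gen_id. */
static void xlock_acquire(struct xlock_sketch *x)
{
	if (x->nr_acquire++ == 0)
		x->gen_id = ++cross_gen_id;
}

static void xlock_release(struct xlock_sketch *x)
{
	assert(x->nr_acquire > 0);
	x->nr_acquire--;
}

/*
 * Would a history entry stamped with entry_gen_id be committed against
 * this crosslock? Never when no acquisition is in progress; otherwise
 * only when the entry is not older than the crosslock's gen_id.
 */
static int would_commit(const struct xlock_sketch *x,
			unsigned int entry_gen_id)
{
	if (!x->nr_acquire)
		return 0;
	return (int)(entry_gen_id - x->gen_id) >= 0;
}
```

In the scenario above, A is acquired while cross_gen_id is still 1, so
it is committed only if AX keeps gen_id = 1; had the second acquisition
replaced it with 10, A would be treated as older than AX and skipped.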

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h  | 22 ++++++++++++-
 kernel/locking/lockdep.c | 82 +++++++++++++++++++++++++++++++++---------------
 2 files changed, 77 insertions(+), 27 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index a03f79d..f7c730a 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -325,6 +325,19 @@ struct hist_lock {
  */
 struct cross_lock {
 	/*
+	 * When more than one acquisition of crosslocks are overlapped,
+	 * we have to perform commit for them based on cross_gen_id of
+	 * the first acquisition, which allows us to add more true
+	 * dependencies.
+	 *
+	 * Moreover, when no acquisition of a crosslock is in progress,
+	 * we should not perform commit because the lock might not exist
+	 * any more, which might cause incorrect memory access. So we
+	 * have to track the number of acquisitions of a crosslock.
+	 */
+	int nr_acquire;
+
+	/*
 	 * Seperate hlock instance. This will be used at commit step.
 	 *
 	 * TODO: Use a smaller data structure containing only necessary
@@ -523,9 +536,16 @@ extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
 				       int subclass);
 extern void lock_commit_crosslock(struct lockdep_map *lock);
 
+/*
+ * What we essentially have to initialize is 'nr_acquire'. Other members
+ * will be initialized in add_xlock().
+ */
+#define STATIC_CROSS_LOCK_INIT() \
+	{ .nr_acquire = 0,}
+
 #define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
 	{ .map.name = (_name), .map.key = (void *)(_key), \
-	  .map.cross = 1, }
+	  .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
 
 /*
  * To initialize a lockdep_map statically use this macro.
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 26ff205..09f5eec 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4838,11 +4838,28 @@ static int add_xlock(struct held_lock *hlock)
 
 	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
 
+	/*
+	 * When acquisitions for a crosslock are overlapped, we use
+	 * nr_acquire to perform commit for them, based on cross_gen_id
+	 * of the first acquisition, which allows to add additional
+	 * dependencies.
+	 *
+	 * Moreover, when no acquisition of a crosslock is in progress,
+	 * we should not perform commit because the lock might not exist
+	 * any more, which might cause incorrect memory access. So we
+	 * have to track the number of acquisitions of a crosslock.
+	 *
+	 * depend_after() is necessary to initialize only the first
+	 * valid xlock so that the xlock can be used on its commit.
+	 */
+	if (xlock->nr_acquire++ && depend_after(&xlock->hlock))
+		goto unlock;
+
 	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
 	xlock->hlock = *hlock;
 	xlock->hlock.gen_id = gen_id;
+unlock:
 	graph_unlock();
-
 	return 1;
 }
 
@@ -4938,35 +4955,37 @@ static void commit_xhlocks(struct cross_lock *xlock)
 	if (!graph_lock())
 		return;
 
-	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
-		struct hist_lock *xhlock = &xhlock(cur - i);
+	if (xlock->nr_acquire) {
+		for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+			struct hist_lock *xhlock = &xhlock(cur - i);
 
-		if (!xhlock_valid(xhlock))
-			break;
+			if (!xhlock_valid(xhlock))
+				break;
 
-		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
-			break;
+			if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+				break;
 
-		if (!same_context_xhlock(xhlock))
-			break;
+			if (!same_context_xhlock(xhlock))
+				break;
 
-		/*
-		 * Filter out the cases that the ring buffer was
-		 * overwritten and the previous entry has a bigger
-		 * hist_id than the following one, which is impossible
-		 * otherwise.
-		 */
-		if (unlikely(before(xhlock->hist_id, prev_hist_id)))
-			break;
+			/*
+			 * Filter out the cases that the ring buffer was
+			 * overwritten and the previous entry has a bigger
+			 * hist_id than the following one, which is impossible
+			 * otherwise.
+			 */
+			if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+				break;
 
-		prev_hist_id = xhlock->hist_id;
+			prev_hist_id = xhlock->hist_id;
 
-		/*
-		 * commit_xhlock() returns 0 with graph_lock already
-		 * released if fail.
-		 */
-		if (!commit_xhlock(xlock, xhlock))
-			return;
+			/*
+			 * commit_xhlock() returns 0 with graph_lock already
+			 * released if fail.
+			 */
+			if (!commit_xhlock(xlock, xhlock))
+				return;
+		}
 	}
 
 	graph_unlock();
@@ -5010,16 +5029,27 @@ void lock_commit_crosslock(struct lockdep_map *lock)
 EXPORT_SYMBOL_GPL(lock_commit_crosslock);
 
 /*
- * Return: 1 - crosslock, done;
+ * Return: 0 - failure;
+ *         1 - crosslock, done;
  *         2 - normal lock, continue to held_lock[] ops.
  */
 static int lock_release_crosslock(struct lockdep_map *lock)
 {
-	return cross_lock(lock) ? 1 : 2;
+	if (cross_lock(lock)) {
+		if (!graph_lock())
+			return 0;
+		((struct lockdep_map_cross *)lock)->xlock.nr_acquire--;
+		graph_unlock();
+		return 1;
+	}
+	return 2;
 }
 
 static void cross_init(struct lockdep_map *lock, int cross)
 {
+	if (cross)
+		((struct lockdep_map_cross *)lock)->xlock.nr_acquire = 0;
+
 	lock->cross = cross;
 
 	/*
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (6 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 07/16] lockdep: Handle non(or multi)-acquisition of a crosslock Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-07-25 15:41   ` Peter Zijlstra
  2017-05-24  8:59 ` [PATCH v7 09/16] lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

We can skip adding a dependency 'AX -> B' when the dependency 'AX -> the
lock preceding B in hlocks' is guaranteed to be created, where AX is a
crosslock and B is a typical lock. Remember that two adjacent locks in
hlocks generate a dependency 'prev -> next', that is, 'the lock preceding
B in hlocks -> B' in this case.

For example:

             in hlocks[]
             ------------
          ^  A (gen_id: 4) --+
          |                  | previous gen_id
          |  B (gen_id: 3) <-+
          |  C (gen_id: 3)
          |  D (gen_id: 2)
   oldest |  E (gen_id: 1)

             in xhlocks[]
             ------------
          ^  A (gen_id: 4, prev_gen_id: 3(B's gen id))
          |  B (gen_id: 3, prev_gen_id: 3(C's gen id))
          |  C (gen_id: 3, prev_gen_id: 2(D's gen id))
          |  D (gen_id: 2, prev_gen_id: 1(E's gen id))
   oldest |  E (gen_id: 1, prev_gen_id: NA)

On commit for a crosslock AX (gen_id = 3), it's enough to add 'AX -> C'.
Adding 'AX -> B' and 'AX -> A' is unnecessary, since 'AX -> C', 'C -> B'
and 'B -> A' cover them and are guaranteed to be generated.

This patch introduces a variable, prev_gen_id, to avoid adding this kind
of redundant dependency. In other words, the previous lock in hlocks will
handle it anyway if the previous lock's gen_id >= the crosslock's gen_id.
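
As a rough userspace illustration of the rule above (not code from this
patch), the sketch below walks a history newest-first and collects the
entries that would still get a direct 'AX -> entry' link. The struct
hist_entry type and the direct_links() helper are hypothetical names made
up for this sketch; before() is modeled on the wraparound-safe comparison
the patch relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Wraparound-safe "a is older than b", modeled on the before() helper. */
static int before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}

/* Hypothetical, simplified history entry: only the ids that matter here. */
struct hist_entry {
	char name;
	unsigned int gen_id;
	unsigned int prev_gen_id;
};

/*
 * Walk the history from newest to oldest and store the names of the
 * entries that need a direct link from the crosslock into out[].
 * Returns the number of names stored.
 */
static size_t direct_links(const struct hist_entry *newest_first, size_t n,
			   unsigned int xlock_gen_id, char *out)
{
	size_t i, nr = 0;

	assert(newest_first && out);

	for (i = 0; i < n; i++) {
		const struct hist_entry *e = &newest_first[i];

		/* Acquired before the crosslock: nothing older can depend. */
		if (before(e->gen_id, xlock_gen_id))
			break;

		/*
		 * The previous held lock is not older than the crosslock,
		 * so it gets committed too and 'prev -> e' covers this one.
		 */
		if (!before(e->prev_gen_id, xlock_gen_id))
			continue;

		out[nr++] = e->name;
	}
	return nr;
}
```

Fed with the xhlocks[] table above and a crosslock gen_id of 3, this
yields only C: A and B are skipped because their prev_gen_id is not older
than 3, and the walk stops at D.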

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h  | 11 +++++++++++
 kernel/locking/lockdep.c | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index f7c730a..e5c5cc4 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -284,6 +284,17 @@ struct held_lock {
  */
 struct hist_lock {
 	/*
+	 * We can skip adding a dependency 'a target crosslock -> this
+	 * lock', in case that we ensure 'the target crosslock -> the
+	 * previous lock in held_locks' to be created. Remember that
+	 * 'the previous lock in held_locks -> this lock' is guaranteed
+	 * to be created, and 'A -> B' and 'B -> C' cover 'A -> C'.
+	 *
+	 * Keep the previous's gen_id to make the decision.
+	 */
+	unsigned int		prev_gen_id;
+
+	/*
 	 * Id for each entry in the ring buffer. This is used to
 	 * decide whether the ring buffer was overwritten or not.
 	 *
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 09f5eec..a14d2ca 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4778,7 +4778,7 @@ static inline int xhlock_valid(struct hist_lock *xhlock)
  *
  * Irq disable is only required.
  */
-static void add_xhlock(struct held_lock *hlock)
+static void add_xhlock(struct held_lock *hlock, unsigned int prev_gen_id)
 {
 	unsigned int idx = ++current->xhlock_idx;
 	struct hist_lock *xhlock = &xhlock(idx);
@@ -4793,6 +4793,11 @@ static void add_xhlock(struct held_lock *hlock)
 
 	/* Initialize hist_lock's members */
 	xhlock->hlock = *hlock;
+	/*
+	 * prev_gen_id is used to skip adding redundant dependencies,
+	 * which can be covered by the previous lock in held_locks.
+	 */
+	xhlock->prev_gen_id = prev_gen_id;
 	xhlock->hist_id = current->hist_id++;
 
 	xhlock->trace.nr_entries = 0;
@@ -4813,6 +4818,11 @@ static inline int same_context_xhlock(struct hist_lock *xhlock)
  */
 static void check_add_xhlock(struct held_lock *hlock)
 {
+	struct held_lock *prev;
+	struct held_lock *start;
+	unsigned int gen_id;
+	unsigned int gen_id_invalid;
+
 	/*
 	 * Record a hist_lock, only in case that acquisitions ahead
 	 * could depend on the held_lock. For example, if the held_lock
@@ -4822,7 +4832,22 @@ static void check_add_xhlock(struct held_lock *hlock)
 	if (!current->xhlocks || !depend_before(hlock))
 		return;
 
-	add_xhlock(hlock);
+	gen_id = (unsigned int)atomic_read(&cross_gen_id);
+	/*
+	 * gen_id_invalid should be old enough to be invalid.
+	 * Current gen_id - (UINT_MAX / 4) would be a good
+	 * value to meet it.
+	 */
+	gen_id_invalid = gen_id - (UINT_MAX / 4);
+	start = current->held_locks;
+
+	for (prev = hlock - 1; prev >= start &&
+			!depend_before(prev); prev--);
+
+	if (prev < start)
+		add_xhlock(hlock, gen_id_invalid);
+	else if (prev->gen_id != gen_id)
+		add_xhlock(hlock, prev->gen_id);
 }
 
 /*
@@ -4979,6 +5004,9 @@ static void commit_xhlocks(struct cross_lock *xlock)
 
 			prev_hist_id = xhlock->hist_id;
 
+			if (!before(xhlock->prev_gen_id, xlock->hlock.gen_id))
+				continue;
+
 			/*
 			 * commit_xhlock() returns 0 with graph_lock already
 			 * released if fail.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 09/16] lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (7 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 10/16] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

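Bug messages and stack dump for MAX_LOCKDEP_CHAIN_HLOCKS should be
printed only once. The current condition is inverted: it returns early
exactly when debug_locks_off_graph_unlock() succeeds, which is the one
call that should print.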
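
A minimal userspace model of the intended behaviour, assuming that
debug_locks_off_graph_unlock() returns nonzero only for the call that
actually turns debug_locks off. The model_debug_locks_off() and
report_chain_hlocks_overflow() helpers and the nr_reports counter are
hypothetical names used only for this sketch.

```c
#include <stdio.h>

static int debug_locks = 1;	/* lock debugging starts enabled */
static int nr_reports;		/* how many times the message was printed */

/* Nonzero only for the call that flips debug_locks from on to off. */
static int model_debug_locks_off(void)
{
	int was_on = debug_locks;

	debug_locks = 0;
	return was_on;
}

/* Returns 1 if this call printed the report, 0 if it stayed silent. */
static int report_chain_hlocks_overflow(void)
{
	/* Someone already disabled debugging: the report was printed. */
	if (!model_debug_locks_off())
		return 0;

	printf("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!\n");
	nr_reports++;
	return 1;
}
```

With the '!' in place, the first overflow prints and every later one
returns silently. Without it, the first overflow would stay silent.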

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a14d2ca..8173c81 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2267,7 +2267,7 @@ static inline int add_chain_cache(struct task_struct *curr,
 	 * Important for check_no_collision().
 	 */
 	if (unlikely(nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		if (debug_locks_off_graph_unlock())
+		if (!debug_locks_off_graph_unlock())
 			return 0;
 
 		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 10/16] lockdep: Make print_circular_bug() aware of crossrelease
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (8 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 09/16] lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 11/16] lockdep: Apply crossrelease to completions Byungchul Park
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

print_circular_bug() assumes that the target hlock is owned by the
current task. With crossrelease, however, the target hlock can be owned
by a task other than the current one, so the report format needs to
change to reflect that.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/locking/lockdep.c | 65 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8173c81..45e9019 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1125,22 +1125,41 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
 		printk(KERN_CONT "\n\n");
 	}
 
-	printk(" Possible unsafe locking scenario:\n\n");
-	printk("       CPU0                    CPU1\n");
-	printk("       ----                    ----\n");
-	printk("  lock(");
-	__print_lock_name(target);
-	printk(KERN_CONT ");\n");
-	printk("                               lock(");
-	__print_lock_name(parent);
-	printk(KERN_CONT ");\n");
-	printk("                               lock(");
-	__print_lock_name(target);
-	printk(KERN_CONT ");\n");
-	printk("  lock(");
-	__print_lock_name(source);
-	printk(KERN_CONT ");\n");
-	printk("\n *** DEADLOCK ***\n\n");
+	if (cross_lock(tgt->instance)) {
+		printk(" Possible unsafe locking scenario by crosslock:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(parent);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("                               unlock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
+	} else {
+		printk(" Possible unsafe locking scenario:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(parent);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
+	}
 }
 
 /*
@@ -1165,7 +1184,10 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
 	printk("%s/%d is trying to acquire lock:\n",
 		curr->comm, task_pid_nr(curr));
 	print_lock(check_src);
-	printk("\nbut task is already holding lock:\n");
+	if (cross_lock(check_tgt->instance))
+		printk("\nbut now in release context of a crosslock acquired at the following:\n");
+	else
+		printk("\nbut task is already holding lock:\n");
 	print_lock(check_tgt);
 	printk("\nwhich lock already depends on the new lock.\n\n");
 	printk("\nthe existing dependency chain (in reverse order) is:\n");
@@ -1183,7 +1205,8 @@ static inline int class_equal(struct lock_list *entry, void *data)
 static noinline int print_circular_bug(struct lock_list *this,
 				struct lock_list *target,
 				struct held_lock *check_src,
-				struct held_lock *check_tgt)
+				struct held_lock *check_tgt,
+				struct stack_trace *trace)
 {
 	struct task_struct *curr = current;
 	struct lock_list *parent;
@@ -1193,7 +1216,9 @@ static noinline int print_circular_bug(struct lock_list *this,
 	if (!debug_locks_off_graph_unlock() || debug_locks_silent)
 		return 0;
 
-	if (!save_trace(&this->trace))
+	if (cross_lock(check_tgt->instance))
+		this->trace = *trace;
+	else if (!save_trace(&this->trace))
 		return 0;
 
 	depth = get_lock_depth(target);
@@ -1837,7 +1862,7 @@ static inline void inc_chains(void)
 	this.parent = NULL;
 	ret = check_noncircular(&this, hlock_class(prev), &target_entry);
 	if (unlikely(!ret))
-		return print_circular_bug(&this, target_entry, next, prev);
+		return print_circular_bug(&this, target_entry, next, prev, trace);
 	else if (unlikely(ret < 0))
 		return print_bfs_bug(ret);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 11/16] lockdep: Apply crossrelease to completions
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (9 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 10/16] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 12/16] pagemap.h: Remove trailing white space Byungchul Park
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different context
from the waiting context, which violates lockdep's assumption that a lock
is released in the context that acquired it.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Apply it.
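
To illustrate the kind of deadlock this is meant to catch, here is a toy
userspace model of a dependency graph, not the lockdep implementation.
Assume a waiter holds a mutex M while calling wait_for_completion(C),
which records 'M -> C', and the completer takes M before calling
complete(C), which records 'C -> M' on commit. NR_CLASSES, dep[],
reachable() and add_dependency() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <string.h>

#define NR_CLASSES 4

/* dep[a][b] != 0 means 'a -> b': b was acquired while a was held. */
static int dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from'? */
static int reachable(int from, int to, int *seen)
{
	int i;

	if (from == to)
		return 1;

	seen[from] = 1;
	for (i = 0; i < NR_CLASSES; i++)
		if (dep[from][i] && !seen[i] && reachable(i, to, seen))
			return 1;
	return 0;
}

/*
 * Record 'a -> b'. Returns 1 on success, or 0 without recording when
 * the new edge would close a cycle, i.e. a possible deadlock.
 */
static int add_dependency(int a, int b)
{
	int seen[NR_CLASSES];

	assert(a >= 0 && a < NR_CLASSES && b >= 0 && b < NR_CLASSES);

	memset(seen, 0, sizeof(seen));
	if (reachable(b, a, seen))
		return 0;

	dep[a][b] = 1;
	return 1;
}
```

With M = 0 and C = 1, add_dependency(0, 1) succeeds for the waiter, and
add_dependency(1, 0) for the completer's commit is rejected as a cycle.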

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h | 118 +++++++++++++++++++++++++++++++++++++++++----
 kernel/sched/completion.c  |  54 ++++++++++++---------
 lib/Kconfig.debug          |   8 +++
 3 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..6b3bcfc 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
 struct completion {
 	unsigned int done;
 	wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+	struct lockdep_map_cross map;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+	lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+	lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+	lock_commit_crosslock((struct lockdep_map *)&x->map);
+}
+
+#define init_completion(x)						\
+do {									\
+	static struct lock_class_key __key;				\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map,	\
+			"(complete)" #x,				\
+			&__key, 0);					\
+	__init_completion(x);						\
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+	STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
 #define COMPLETION_INITIALIZER(work) \
 	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
 	({ init_completion(&work); work; })
@@ -70,7 +113,7 @@ struct completion {
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
 	init_waitqueue_head(&x->wait);
@@ -88,18 +131,75 @@ static inline void reinit_completion(struct completion *x)
 	x->done = 0;
 }
 
-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
 						    unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
 	struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
+
+static inline void wait_for_completion(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion(x);
+	complete_release(x);
+}
+
+static inline void wait_for_completion_io(struct completion *x)
+{
+	complete_acquire(x);
+	__wait_for_completion_io(x);
+	complete_release(x);
+}
+
+static inline int wait_for_completion_interruptible(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_interruptible(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline int wait_for_completion_killable(struct completion *x)
+{
+	int ret;
+	complete_acquire(x);
+	ret = __wait_for_completion_killable(x);
+	complete_release(x);
+	return ret;
+}
+
+static inline unsigned long wait_for_completion_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_timeout(x, timeout);
+}
+
+static inline unsigned long wait_for_completion_io_timeout(struct completion *x,
+		unsigned long timeout)
+{
+	return __wait_for_completion_io_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_interruptible_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_interruptible_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_killable_timeout(
+	struct completion *x, unsigned long timeout)
+{
+	return __wait_for_completion_killable_timeout(x, timeout);
+}
+
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 8d0f35d..847b1d4 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -31,6 +31,10 @@ void complete(struct completion *x)
 	unsigned long flags;
 
 	spin_lock_irqsave(&x->wait.lock, flags);
+	/*
+	 * Perform commit of crossrelease here.
+	 */
+	complete_release_commit(x);
 	x->done++;
 	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
 	spin_unlock_irqrestore(&x->wait.lock, flags);
@@ -108,7 +112,7 @@ void complete_all(struct completion *x)
 }
 
 /**
- * wait_for_completion: - waits for completion of a task
+ * __wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
@@ -117,14 +121,14 @@ void complete_all(struct completion *x)
  * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
  * and interrupt capability. Also see complete().
  */
-void __sched wait_for_completion(struct completion *x)
+void __sched __wait_for_completion(struct completion *x)
 {
 	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(__wait_for_completion);
 
 /**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -136,28 +140,28 @@ void __sched wait_for_completion(struct completion *x)
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(__wait_for_completion_timeout);
 
 /**
- * wait_for_completion_io: - waits for completion of a task
+ * __wait_for_completion_io: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
  * interruptible and there is no timeout. The caller is accounted as waiting
  * for IO (which traditionally means blkio only).
  */
-void __sched wait_for_completion_io(struct completion *x)
+void __sched __wait_for_completion_io(struct completion *x)
 {
 	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(__wait_for_completion_io);
 
 /**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -170,14 +174,14 @@ void __sched wait_for_completion_io(struct completion *x)
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(__wait_for_completion_io_timeout);
 
 /**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * __wait_for_completion_interruptible: - waits for completion of a task (w/intr)
  * @x:  holds the state of this particular completion
  *
  * This waits for completion of a specific task to be signaled. It is
@@ -185,17 +189,18 @@ void __sched wait_for_completion_io(struct completion *x)
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched __wait_for_completion_interruptible(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(__wait_for_completion_interruptible);
 
 /**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * __wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -206,15 +211,15 @@ int __sched wait_for_completion_interruptible(struct completion *x)
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+__wait_for_completion_interruptible_timeout(struct completion *x,
 					  unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(__wait_for_completion_interruptible_timeout);
 
 /**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * __wait_for_completion_killable: - waits for completion of a task (killable)
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It can be
@@ -222,17 +227,18 @@ int __sched wait_for_completion_interruptible(struct completion *x)
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched __wait_for_completion_killable(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
+
 	if (t == -ERESTARTSYS)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(__wait_for_completion_killable);
 
 /**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * __wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -244,12 +250,12 @@ int __sched wait_for_completion_killable(struct completion *x)
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+__wait_for_completion_killable_timeout(struct completion *x,
 				     unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_KILLABLE);
 }
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(__wait_for_completion_killable_timeout);
 
 /**
  *	try_wait_for_completion - try to decrement a completion without blocking
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e584431..88089ba 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1054,6 +1054,14 @@ config LOCKDEP_CROSSRELEASE
 	 such as page locks or completions can use the lock correctness
 	 detector, lockdep.
 
+config LOCKDEP_COMPLETE
+	bool "Lock debugging: allow completions to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 A deadlock caused by wait_for_completion() and complete() can be
+	 detected by lockdep using crossrelease feature.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 12/16] pagemap.h: Remove trailing white space
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (10 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 11/16] lockdep: Apply crossrelease to completions Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 13/16] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Trailing white space is not accepted in kernel coding style. Remove
it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/pagemap.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7dbe914..a8ee59a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -504,7 +504,7 @@ static inline void wake_up_page(struct page *page, int bit)
 	__wake_up_bit(page_waitqueue(page), &page->flags, bit);
 }
 
-/* 
+/*
  * Wait for a page to be unlocked.
  *
  * This must be called with the caller "holding" the page,
@@ -517,7 +517,7 @@ static inline void wait_on_page_locked(struct page *page)
 		wait_on_page_bit(compound_head(page), PG_locked);
 }
 
-/* 
+/*
  * Wait for a page to complete writeback
  */
 static inline void wait_on_page_writeback(struct page *page)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v7 13/16] lockdep: Apply crossrelease to PG_locked locks
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (11 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 12/16] pagemap.h: Remove trailing white space Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 14/16] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Although lock_page() and its family can cause deadlock, the lock
correctness validator could not be applied to them until now, because
things like unlock_page() might be called in a different context from
the acquisition context, which violates lockdep's assumption that a lock
is released in the context that acquired it.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to page locks. Apply it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h |   8 ++++
 include/linux/pagemap.h  | 101 ++++++++++++++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug        |   8 ++++
 mm/filemap.c             |   4 +-
 mm/page_alloc.c          |   3 ++
 5 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8aced..06adfa2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -16,6 +16,10 @@
 #include <asm/page.h>
 #include <asm/mmu.h>
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
+
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
 #endif
@@ -221,6 +225,10 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map_cross map;
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8ee59a..b72be29 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,9 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
 
 /*
  * Bits in mapping->flags.
@@ -432,26 +435,91 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 	return pgoff;
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p)						\
+do {									\
+	static struct lock_class_key __key;				\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map,	\
+			"(PG_locked)" #p, &__key, 0);			\
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+	page = compound_head(page);
+	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+			       try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+	page = compound_head(page);
+	/*
+	 * lock_commit_crosslock() is necessary for crosslocks.
+	 */
+	lock_commit_crosslock((struct lockdep_map *)&page->map);
+	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);
 
-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+	lock_page_release(page);
+	do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
 {
 	page = compound_head(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
+static inline int trylock_page(struct page *page)
+{
+	if (do_raw_trylock_page(page)) {
+		lock_page_acquire(page, 1);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock_page may only be called if we have the page's inode pinned.
  */
 static inline void lock_page(struct page *page)
 {
 	might_sleep();
-	if (!trylock_page(page))
+
+	if (!do_raw_trylock_page(page))
 		__lock_page(page);
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and current lock can be ordered like:
+	 *
+	 *	CONTEXT 1		CONTEXT 2
+	 *	---------		---------
+	 *	lock A (cross)
+	 *	acquire A
+	 *	  X = atomic_inc_return(&cross_gen_id)
+	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+	 *				acquire B
+	 *				  Y = atomic_read_acquire(&cross_gen_id)
+	 *				lock B
+	 *
+	 * so that 'lock A and then lock B' can be seen globally,
+	 * if X <= Y.
+	 */
+	lock_page_acquire(page, 0);
 }
 
 /*
@@ -461,9 +529,20 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
+	int ret;
+
 	might_sleep();
-	if (!trylock_page(page))
-		return __lock_page_killable(page);
+
+	if (!do_raw_trylock_page(page)) {
+		ret = __lock_page_killable(page);
+		if (ret)
+			return ret;
+	}
+	/*
+	 * acquire() must be after actual lock operation for crosslocks.
+	 * This way a crosslock and other locks can be ordered.
+	 */
+	lock_page_acquire(page, 0);
 	return 0;
 }
 
@@ -478,7 +557,17 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
 				     unsigned int flags)
 {
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+	if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
+		/*
+		 * acquire() must be after actual lock operation for crosslocks.
+		 * This way a crosslock and other locks can be ordered.
+		 */
+		lock_page_acquire(page, 0);
+		return 1;
+	}
+
+	return 0;
 }
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 88089ba..cdcc3df 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1062,6 +1062,14 @@ config LOCKDEP_COMPLETE
 	 A deadlock caused by wait_for_completion() and complete() can be
 	 detected by lockdep using crossrelease feature.
 
+config LOCKDEP_PAGELOCK
+	bool "Lock debugging: allow PG_locked lock to use deadlock detector"
+	select LOCKDEP_CROSSRELEASE
+	default n
+	help
+	 PG_locked lock is a kind of crosslock. Using crossrelease feature,
+	 PG_locked lock can work with runtime deadlock detector.
+
 config PROVE_LOCKING
 	bool "Lock debugging: prove locking correctness"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
diff --git a/mm/filemap.c b/mm/filemap.c
index 50b52fe..d439cc7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -858,7 +858,7 @@ void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
  * The mb is necessary to enforce ordering between the clear_bit and the read
  * of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked()).
  */
-void unlock_page(struct page *page)
+void do_raw_unlock_page(struct page *page)
 {
 	page = compound_head(page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -866,7 +866,7 @@ void unlock_page(struct page *page)
 	smp_mb__after_atomic();
 	wake_up_page(page, PG_locked);
 }
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(do_raw_unlock_page);
 
 /**
  * end_page_writeback - end writeback against a page
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440..36d5f9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5063,6 +5063,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+		lock_page_init(pfn_to_page(pfn));
+#endif
 	}
 }
 
-- 
1.9.1


* [PATCH v7 14/16] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (12 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 13/16] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 15/16] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 16/16] lockdep: Crossrelease feature documentation Byungchul Park
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

Usually the PG_locked bit is updated by lock_page() or unlock_page().
However, it can also be updated through __SetPageLocked() or
__ClearPageLocked(). They have to be considered too, so that every
acquire is paired with a release.

Furthermore, e.g. __SetPageLocked() in add_to_page_cache_lru() is called
frequently. We might miss many chances to check for deadlocks if we
ignore it. Take __Set(__Clear)PageLocked into account as well.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/page-flags.h | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74e4dda..9d5f79d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -252,7 +252,6 @@ static __always_inline int PageCompound(struct page *page)
 #define TESTSCFLAG_FALSE(uname)						\
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
 	TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
@@ -354,6 +353,35 @@ static __always_inline int PageCompound(struct page *page)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+	page = compound_head(page);
+	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+}
+
+static __always_inline void __ClearPageLocked(struct page *page)
+{
+	__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+	page = compound_head(page);
+	/*
+	 * lock_commit_crosslock() is necessary for crosslock
+	 * when the lock is released, before lock_release().
+	 */
+	lock_commit_crosslock((struct lockdep_map *)&page->map);
+	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
-- 
1.9.1


* [PATCH v7 15/16] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (13 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 14/16] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  2017-05-24  8:59 ` [PATCH v7 16/16] lockdep: Crossrelease feature documentation Byungchul Park
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

CONFIG_LOCKDEP_PAGELOCK needs to keep a lockdep_map_cross per page. Since
it's a debug feature, it's preferable to keep it in struct page_ext
rather than struct page. Move it to struct page_ext.
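
As a rough user-space illustration of the side-table pattern used here,
the sketch below keeps debug-only data outside the page structure and
skips the annotation when the lookup fails. The toy_* names and the
fixed-size table are assumptions made for the example; the real
lookup_page_ext() is more involved:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

struct toy_page { unsigned long flags; };	/* stays small */
struct toy_page_ext { int held; };		/* debug-only data */

static struct toy_page toy_mem_map[TOY_NR_PAGES];
static struct toy_page_ext toy_ext_table[TOY_NR_PAGES];
static int toy_ext_ready;	/* the side table may not be set up yet */

/* Map a page to its side-table entry, or NULL if none is available. */
static struct toy_page_ext *toy_lookup_page_ext(const struct toy_page *page)
{
	if (!toy_ext_ready)
		return NULL;
	return &toy_ext_table[page - toy_mem_map];
}

/* Returns 1 if the acquisition was recorded, 0 if it was skipped. */
static int toy_lock_page_acquire(struct toy_page *page)
{
	struct toy_page_ext *e = toy_lookup_page_ext(page);

	if (!e)
		return 0;
	e->held++;
	return 1;
}
```

The cost of this layout is that every annotation has to tolerate a
missing entry, which is why each helper in the patch checks the lookup
result before touching the map.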

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/mm_types.h   |  4 ---
 include/linux/page-flags.h | 19 +++++++++++--
 include/linux/page_ext.h   |  4 +++
 include/linux/pagemap.h    | 28 ++++++++++++++++---
 lib/Kconfig.debug          |  1 +
 mm/filemap.c               | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c            |  3 --
 mm/page_ext.c              |  4 +++
 8 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 06adfa2..a6c7133 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -225,10 +225,6 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
-
-#ifdef CONFIG_LOCKDEP_PAGELOCK
-	struct lockdep_map_cross map;
-#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9d5f79d..cca33f5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -355,28 +355,41 @@ static __always_inline int PageCompound(struct page *page)
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include <linux/lockdep.h>
+#include <linux/page_ext.h>
 
 TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
 
 static __always_inline void __SetPageLocked(struct page *page)
 {
+	struct page_ext *e;
+
 	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
 	page = compound_head(page);
-	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
+	lock_acquire_exclusive((struct lockdep_map *)&e->map, 0, 1, NULL, _RET_IP_);
 }
 
 static __always_inline void __ClearPageLocked(struct page *page)
 {
+	struct page_ext *e;
+
 	__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
 	page = compound_head(page);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
 	/*
 	 * lock_commit_crosslock() is necessary for crosslock
 	 * when the lock is released, before lock_release().
 	 */
-	lock_commit_crosslock((struct lockdep_map *)&page->map);
-	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+	lock_commit_crosslock((struct lockdep_map *)&e->map);
+	lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
 }
 #else
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 9298c39..d1c52c8c 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -44,6 +44,10 @@ enum page_ext_flags {
  */
 struct page_ext {
 	unsigned long flags;
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	struct lockdep_map_cross map;
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b72be29..1be753d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,6 +16,7 @@
 #include <linux/hugetlb_inline.h>
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include <linux/lockdep.h>
+#include <linux/page_ext.h>
 #endif
 
 /*
@@ -436,28 +437,47 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
+extern struct page_ext_operations lockdep_pagelock_ops;
+
 #define lock_page_init(p)						\
 do {									\
 	static struct lock_class_key __key;				\
-	lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map,	\
+	struct page_ext *e = lookup_page_ext(p);		\
+								\
+	if (unlikely(!e))					\
+		break;						\
+								\
+	lockdep_init_map_crosslock((struct lockdep_map *)&(e)->map,	\
 			"(PG_locked)" #p, &__key, 0);			\
 } while (0)
 
 static inline void lock_page_acquire(struct page *page, int try)
 {
+	struct page_ext *e;
+
 	page = compound_head(page);
-	lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
+	lock_acquire_exclusive((struct lockdep_map *)&e->map, 0,
 			       try, NULL, _RET_IP_);
 }
 
 static inline void lock_page_release(struct page *page)
 {
+	struct page_ext *e;
+
 	page = compound_head(page);
+	e = lookup_page_ext(page);
+	if (unlikely(!e))
+		return;
+
 	/*
 	 * lock_commit_crosslock() is necessary for crosslocks.
 	 */
-	lock_commit_crosslock((struct lockdep_map *)&page->map);
-	lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+	lock_commit_crosslock((struct lockdep_map *)&e->map);
+	lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
 }
 #else
 static inline void lock_page_init(struct page *page) {}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cdcc3df..57c0fa6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1065,6 +1065,7 @@ config LOCKDEP_COMPLETE
 config LOCKDEP_PAGELOCK
 	bool "Lock debugging: allow PG_locked lock to use deadlock detector"
 	select LOCKDEP_CROSSRELEASE
+	select PAGE_EXTENSION
 	default n
 	help
 	 PG_locked lock is a kind of crosslock. Using crossrelease feature,
diff --git a/mm/filemap.c b/mm/filemap.c
index d439cc7..afca751 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -35,6 +35,9 @@
 #include <linux/memcontrol.h>
 #include <linux/cleancache.h>
 #include <linux/rmap.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/page_ext.h>
+#endif
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -986,6 +989,72 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 	}
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+static bool need_lockdep_pagelock(void) { return true; }
+
+static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
+{
+	struct page *page;
+	struct page_ext *page_ext;
+	unsigned long pfn = zone->zone_start_pfn;
+	unsigned long end_pfn = pfn + zone->spanned_pages;
+	unsigned long count = 0;
+
+	for (; pfn < end_pfn; pfn++) {
+		if (!pfn_valid(pfn)) {
+			pfn = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES) - 1;
+			continue;
+		}
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		page = pfn_to_page(pfn);
+
+		if (page_zone(page) != zone)
+			continue;
+
+		page_ext = lookup_page_ext(page);
+		if (unlikely(!page_ext))
+			continue;
+
+		lock_page_init(page);
+		count++;
+	}
+
+	pr_info("Node %d, zone %8s: lockdep pagelock found early allocated %lu pages\n",
+		pgdat->node_id, zone->name, count);
+}
+
+static void init_zones_in_node(pg_data_t *pgdat)
+{
+	struct zone *zone;
+	struct zone *node_zones = pgdat->node_zones;
+	unsigned long flags;
+
+	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+		if (!populated_zone(zone))
+			continue;
+
+		spin_lock_irqsave(&zone->lock, flags);
+		init_pages_in_zone(pgdat, zone);
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+}
+
+static void init_lockdep_pagelock(void)
+{
+	pg_data_t *pgdat;
+	for_each_online_pgdat(pgdat)
+		init_zones_in_node(pgdat);
+}
+
+struct page_ext_operations lockdep_pagelock_ops = {
+	.need = need_lockdep_pagelock,
+	.init = init_lockdep_pagelock,
+};
+#endif
+
 /**
  * page_cache_next_hole - find the next hole (not-present entry)
  * @mapping: mapping
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 36d5f9e..6de9440 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5063,9 +5063,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		} else {
 			__init_single_pfn(pfn, zone, nid);
 		}
-#ifdef CONFIG_LOCKDEP_PAGELOCK
-		lock_page_init(pfn_to_page(pfn));
-#endif
 	}
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 121dcff..023ac65 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/pagemap.h>
 
 /*
  * struct page extension
@@ -68,6 +69,9 @@
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+	&lockdep_pagelock_ops,
+#endif
 };
 
 static unsigned long total_usage;
-- 
1.9.1


* [PATCH v7 16/16] lockdep: Crossrelease feature documentation
  2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
                   ` (14 preceding siblings ...)
  2017-05-24  8:59 ` [PATCH v7 15/16] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
@ 2017-05-24  8:59 ` Byungchul Park
  15 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-05-24  8:59 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

This document describes the concept of crossrelease feature.
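
As a rough illustration of the graph check the document builds on, the
user-space sketch below adds a dependency to a small graph and reports
whether that closes a circle. It is not lockdep's implementation; the
toy_* names, the adjacency matrix and the class limit are assumptions
made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_CLASSES 8

/* toy_dep[a][b] is true when 'a -> b' is in the graph. */
static bool toy_dep[TOY_MAX_CLASSES][TOY_MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from'? */
static bool toy_reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_MAX_CLASSES; next++)
		if (toy_dep[from][next] && !seen[next] &&
		    toy_reachable(next, to, seen))
			return true;
	return false;
}

/* Adds 'a -> b'; returns true if the graph now contains a circle. */
static bool toy_add_dependency(int a, int b)
{
	bool seen[TOY_MAX_CLASSES];

	memset(seen, 0, sizeof(seen));
	toy_dep[a][b] = true;
	return toy_reachable(b, a, seen);
}
```

With classes A to E numbered 0 to 4, adding 'A -> B', 'B -> E', 'C -> D'
and 'D -> E' reports no circle, while adding 'E -> C' afterwards does,
matching the example graph in the document.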

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 Documentation/locking/crossrelease.txt | 874 +++++++++++++++++++++++++++++++++
 1 file changed, 874 insertions(+)
 create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..bdf1423
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,874 @@
+Crossrelease
+============
+
+Started by Byungchul Park <byungchul.park@lge.com>
+
+Contents:
+
+ (*) Background
+
+     - What causes deadlock
+     - How lockdep works
+
+ (*) Limitation
+
+     - Limit lockdep
+     - Pros from the limitation
+     - Cons from the limitation
+     - Relax the limitation
+
+ (*) Crossrelease
+
+     - Introduce crossrelease
+     - Introduce commit
+
+ (*) Implementation
+
+     - Data structures
+     - How crossrelease works
+
+ (*) Optimizations
+
+     - Avoid duplication
+     - Lockless for hot paths
+
+ (*) APPENDIX A: What lockdep does to work aggressively
+
+ (*) APPENDIX B: How to avoid adding false dependencies
+
+
+==========
+Background
+==========
+
+What causes deadlock
+--------------------
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another (or the) context who can trigger the
+event is also waiting for another (or the) event to happen, which is
+also impossible due to the same reason.
+
+For example:
+
+   A context going to trigger event C is waiting for event A to happen.
+   A context going to trigger event A is waiting for event B to happen.
+   A context going to trigger event B is waiting for event C to happen.
+
+A deadlock occurs when these three wait operations run at the same time,
+because event C cannot be triggered if event A does not happen, which in
+turn cannot be triggered if event B does not happen, which in turn
+cannot be triggered if event C does not happen. In the end, no event can
+be triggered since none of the waiters ever meets its wake-up condition.
+
+A dependency might exist between two waiters and a deadlock might happen
+due to an incorrect relationship between dependencies. Thus, we must
+define what a dependency is first. A dependency exists between them if:
+
+   1. There are two waiters waiting for each event at a given time.
+   2. The only way to wake up each waiter is to trigger its event.
+   3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like:
+
+   Event C depends on event A.
+   Event A depends on event B.
+   Event B depends on event C.
+
+   NOTE: Precisely speaking, a dependency is one between whether a
+   waiter for an event can be woken up and whether another waiter for
+   another event can be woken up. However from now on, we will describe
+   a dependency as if it's one between an event and another event for
+   simplicity.
+
+And they form circular dependencies like:
+
+    -> C -> A -> B -
+   /                \
+   \                /
+    ----------------
+
+   where 'A -> B' means that event A depends on event B.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its condition to wake up as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+How lockdep works
+-----------------
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, acquire and release. Waiting for a lock corresponds to
+waiting for an event, and releasing a lock corresponds to triggering an
+event in the previous section.
+
+In short, lockdep does:
+
+   1. Detect a new dependency.
+   2. Add the dependency into a global graph.
+   3. Check if that makes dependencies circular.
+   4. Report a deadlock or its possibility if so.
+
+For example, consider a graph built by lockdep that looks like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+   where A, B,..., E are different lock classes.
+
+Lockdep will add a dependency into the graph on detection of a new
+dependency. For example, it will add a dependency 'E -> C' when a new
+dependency between lock E and lock C is detected. Then the graph will be:
+
+       A -> B -
+               \
+                -> E -
+               /      \
+    -> C -> D -        \
+   /                   /
+   \                  /
+    ------------------
+
+   where A, B,..., E are different lock classes.
+
+This graph contains a subgraph which demonstrates circular dependencies:
+
+                -> E -
+               /      \
+    -> C -> D -        \
+   /                   /
+   \                  /
+    ------------------
+
+   where C, D and E are different lock classes.
+
+This is the condition under which a deadlock might occur. Lockdep
+reports it on detection after adding a new dependency. This is how
+lockdep works.
+
+CONCLUSION
+
+Lockdep detects a deadlock or its possibility by checking if circular
+dependencies were created after adding each new dependency.
+
+
+==========
+Limitation
+==========
+
+Limit lockdep
+-------------
+
+If lockdep is limited to typical locks, e.g. spin locks and mutexes,
+which are released within the acquire context, the implementation
+becomes simple but its capacity for detection becomes limited. Let's
+check the pros and cons in the next sections.
+
+
+Pros from the limitation
+------------------------
+
+Given the limitation, while a context waits to acquire a lock, none of
+the locks in its held_locks can be released, which means all waiters
+for the locks in the held_locks are stuck. This is exactly the case
+that creates a dependency between each lock in the held_locks and the
+lock to acquire.
+
+For example:
+
+   CONTEXT X
+   ---------
+   acquire A
+   acquire B /* Add a dependency 'A -> B' */
+   release B
+   release A
+
+   where A and B are different lock classes.
+
+When acquiring lock A, the held_locks of CONTEXT X is empty thus no
+dependency is added. But when acquiring lock B, lockdep detects and adds
+a new dependency 'A -> B' between lock A in the held_locks and lock B.
+They can be simply added whenever acquiring each lock.
+
+The data required by lockdep lives in a local structure, held_locks,
+embedded in task_struct. Because that data is accessed only from within
+its own context, lockdep can avoid races without explicit locks while
+handling the local data.
+
+Lastly, lockdep only needs to keep locks currently being held, to build
+a dependency graph. However, relaxing the limitation, it needs to keep
+even locks already released, because a decision whether they created
+dependencies might be long-deferred.
+
+To sum up, we can expect several advantages from the limitation:
+
+   1. Lockdep can easily identify a dependency when acquiring a lock.
+   2. Races are avoidable while accessing local locks in a held_locks.
+   3. Lockdep only needs to keep locks currently being held.
+
+CONCLUSION
+
+Given the limitation, the implementation becomes simple and efficient.
+
+
+Cons from the limitation
+------------------------
+
+Given the limitation, lockdep is applicable only to typical locks. For
+example, page locks for page access or completions for synchronization
+cannot work with lockdep.
+
+Can we detect deadlocks below, under the limitation?
+
+Example 1:
+
+   CONTEXT X	   CONTEXT Y	   CONTEXT Z
+   ---------	   ---------	   ----------
+		   mutex_lock A
+   lock_page B
+		   lock_page B
+				   mutex_lock A /* DEADLOCK */
+				   unlock_page B held by X
+		   unlock_page B
+		   mutex_unlock A
+				   mutex_unlock A
+
+   where A and B are different lock classes.
+
+No, we cannot.
+
+Example 2:
+
+   CONTEXT X		   CONTEXT Y
+   ---------		   ---------
+			   mutex_lock A
+   mutex_lock A
+			   wait_for_completion B /* DEADLOCK */
+   complete B
+			   mutex_unlock A
+   mutex_unlock A
+
+   where A is a lock class and B is a completion variable.
+
+No, we cannot.
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock or its
+possibility caused by page locks or completions.
+
+
+Relax the limitation
+--------------------
+
+Under the limitation, things to create dependencies are limited to
+typical locks. However, synchronization primitives like page locks and
+completions, which are allowed to be released in any context, also
+create dependencies and can cause a deadlock. So lockdep should track
+these locks to do a better job. We have to relax the limitation for
+these locks to work with lockdep.
+
+Detecting dependencies is very important for lockdep to work because
+adding a dependency means adding an opportunity to check whether it
+causes a deadlock. The more dependencies lockdep adds, the more
+thoroughly it works. Thus lockdep has to do its best to detect and add
+as many true dependencies into a graph as possible.
+
+For example, considering only typical locks, lockdep builds a graph like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+   where A, B,..., E are different lock classes.
+
+On the other hand, under the relaxation, additional dependencies might
+be created and added. Assuming additional 'FX -> C' and 'E -> GX' are
+added thanks to the relaxation, the graph will be:
+
+         A -> B -
+                 \
+                  -> E -> GX
+                 /
+   FX -> C -> D -
+
+   where A, B,..., E, FX and GX are different lock classes, and a suffix
+   'X' is added on non-typical locks.
+
+The latter graph gives us more chances to check circular dependencies
+than the former. However, it might suffer performance degradation since
+relaxing the limitation, with which design and implementation of lockdep
+can be efficient, might introduce inefficiency inevitably. So lockdep
+should provide two options, strong detection and efficient detection.
+
+Choosing efficient detection:
+
+   Lockdep works with only locks restricted to be released within the
+   acquire context. However, lockdep works efficiently.
+
+Choosing strong detection:
+
+   Lockdep works with all synchronization primitives. However, lockdep
+   suffers performance degradation.
+
+CONCLUSION
+
+Relaxing the limitation, lockdep can add additional dependencies giving
+additional opportunities to check circular dependencies.
+
+
+============
+Crossrelease
+============
+
+Introduce crossrelease
+----------------------
+
+In order to allow lockdep to handle additional dependencies by what
+might be released in any context, namely 'crosslock', we have to be able
+to identify those created by crosslocks. The proposed 'crossrelease'
+feature provides a way to do that.
+
+Crossrelease feature has to do:
+
+   1. Identify dependencies created by crosslocks.
+   2. Add the dependencies into a dependency graph.
+
+That's all. Once a meaningful dependency is added into graph, then
+lockdep would work with the graph as it did. The most important thing
+crossrelease feature has to do is to correctly identify and add true
+dependencies into the global graph.
+
+A dependency e.g. 'A -> B' can be identified only in the A's release
+context because a decision required to identify the dependency can be
+made only in the release context. That is to decide whether A can be
+released so that a waiter for A can be woken up. It cannot be made in
+other than the A's release context.
+
+This does not matter for typical locks because each acquire context is
+the same as its release context, thus lockdep can decide whether a lock
+can be released in the acquire context. However for crosslocks, lockdep
+cannot make the decision in the acquire context but has to wait until
+the release context is identified.
+
+Therefore, deadlocks by crosslocks cannot be detected at the moment they
+happen, because those cannot be identified until the crosslocks are
+released. However, deadlock possibilities can be detected and that is
+very worthwhile. See 'APPENDIX A' section to check why.
+
+CONCLUSION
+
+Using crossrelease feature, lockdep can work with what might be released
+in any context, namely crosslock.
+
+
+Introduce commit
+----------------
+
+Since crossrelease defers the work adding true dependencies of
+crosslocks until they are actually released, crossrelease has to queue
+all acquisitions which might create dependencies with the crosslocks.
+Then it identifies dependencies using the queued data in batches at a
+proper time. We call it 'commit'.
+
+There are four types of dependencies:
+
+1. TT type: 'typical lock A -> typical lock B'
+
+   Just when acquiring B, lockdep can see it's in the A's release
+   context. So the dependency between A and B can be identified
+   immediately. Commit is unnecessary.
+
+2. TC type: 'typical lock A -> crosslock BX'
+
+   Just when acquiring BX, lockdep can see it's in the A's release
+   context. So the dependency between A and BX can be identified
+   immediately. Commit is unnecessary, too.
+
+3. CT type: 'crosslock AX -> typical lock B'
+
+   When acquiring B, lockdep cannot identify the dependency because
+   there's no way to know if it's in the AX's release context. It has
+   to wait until the decision can be made. Commit is necessary.
+
+4. CC type: 'crosslock AX -> crosslock BX'
+
+   When acquiring BX, lockdep cannot identify the dependency because
+   there's no way to know if it's in the AX's release context. It has
+   to wait until the decision can be made. Commit is necessary.
+   However, handling the CC type is not implemented yet. It is future work.
+
+Lockdep can work without commit for typical locks, but commit step is
+necessary once crosslocks are involved. Introducing commit, lockdep
+performs three steps. What lockdep does in each step is:
+
+1. Acquisition: For typical locks, lockdep does what it originally did
+   and queues the lock so that CT type dependencies can be checked using
+   it at the commit step. For crosslocks, it saves data which will be
+   used at the commit step and increases a reference count for it.
+
+2. Commit: No action is required for typical locks. For crosslocks,
+   lockdep adds CT type dependencies using the data saved at the
+   acquisition step.
+
+3. Release: No changes are required for typical locks. When a crosslock
+   is released, it decreases a reference count for it.
+
+CONCLUSION
+
+Crossrelease introduces commit step to handle dependencies of crosslocks
+in batches at a proper time.
+
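As a rough illustration of the four dependency types described above, the
following sketch classifies a dependency 'prev -> next' by the kinds of its
two locks and reports whether the commit step is needed. The names
lock_kind, dep_type() and needs_commit() are made up for this sketch and do
not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a lock is either a typical lock or a crosslock. */
enum lock_kind { TYPICAL, CROSS };

/* Name of the dependency type for 'prev -> next': TT, TC, CT or CC. */
static const char *dep_type(enum lock_kind prev, enum lock_kind next)
{
	assert(prev == TYPICAL || prev == CROSS);
	assert(next == TYPICAL || next == CROSS);

	if (prev == TYPICAL)
		return next == TYPICAL ? "TT" : "TC";
	return next == TYPICAL ? "CT" : "CC";
}

/*
 * Commit is needed only when 'prev' is a crosslock, because only then is
 * the release context of 'prev' unknown at the time 'next' is acquired.
 */
static bool needs_commit(enum lock_kind prev, enum lock_kind next)
{
	(void)next;	/* The kind of 'next' does not change the decision. */
	return prev == CROSS;
}
```

For example, dep_type(CROSS, TYPICAL) yields "CT" and needs_commit() is
true for it, while both TT and TC can be handled immediately at
acquisition time.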
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease introduces two main data structures.
+
+1. hist_lock
+
+   This is an array embedded in task_struct, for keeping lock history so
+   that dependencies can be added using them at the commit step. Since
+   it's local data, it can be accessed locklessly in the owner context.
+   The array is filled at the acquisition step and consumed at the
+   commit step. And it's managed in circular manner.
+
+2. cross_lock
+
+   One per lockdep_map exists. This is for keeping data of crosslocks
+   and used at the commit step.
+
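The circular management of the lock history can be pictured with a small
user-space model. This is only a sketch under assumptions: task_hist,
hist_add(), hist_count() and hist_recent() are hypothetical names, and the
size of 4 is chosen to keep the example short. The indexing by
'idx % size' is similar in spirit to the xhlock(i) macro in this patch.

```c
#include <assert.h>

#define HIST_SIZE 4U	/* Hypothetical size; must be a power of 2 */

struct hist_entry {
	char name;		/* Stand-in for the lock class */
};

/* Stand-in for the per-task lock history. */
struct task_hist {
	struct hist_entry ring[HIST_SIZE];
	unsigned int idx;	/* Total number of entries ever queued */
};

/* Queue one acquisition; the oldest entry is overwritten when full. */
static void hist_add(struct task_hist *t, char name)
{
	t->ring[t->idx % HIST_SIZE].name = name;
	t->idx++;
}

/* Number of entries that are still valid, i.e. not yet overwritten. */
static unsigned int hist_count(const struct task_hist *t)
{
	return t->idx < HIST_SIZE ? t->idx : HIST_SIZE;
}

/* The i-th most recent entry, 0 being the newest. */
static char hist_recent(const struct task_hist *t, unsigned int i)
{
	assert(i < hist_count(t));
	return t->ring[(t->idx - 1 - i) % HIST_SIZE].name;
}
```

Because only the owner task touches its own history, no locking is needed
in this model, which matches the lockless access described above.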
+
+How crossrelease works
+----------------------
+
+The key to how crossrelease works is to defer the necessary work to an
+appropriate point in time and perform it all at once at the commit step.
+Let's walk through examples step by step, starting from how lockdep
+works without crossrelease for typical locks.
+
+   acquire A /* Push A onto held_locks */
+   acquire B /* Push B onto held_locks and add 'A -> B' */
+   acquire C /* Push C onto held_locks and add 'B -> C' */
+   release C /* Pop C from held_locks */
+   release B /* Pop B from held_locks */
+   release A /* Pop A from held_locks */
+
+   where A, B and C are different lock classes.
+
+   NOTE: This document assumes that readers already understand how
+   lockdep works without crossrelease and thus omits details. But there's
+   one thing to note. Lockdep pretends to pop a lock from held_locks
+   when releasing it. But it's subtly different from a true stack pop
+   because lockdep allows entries other than the top to be popped.
+
+In this case, lockdep adds 'the top of held_locks -> the lock to acquire'
+dependency every time a lock is acquired.
+
+After adding 'A -> B', a dependency graph will be:
+
+   A -> B
+
+   where A and B are different lock classes.
+
+And after adding 'B -> C', the graph will be:
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+Let's perform the commit step even for typical locks to add dependencies.
+Of course, the commit step is not necessary for them. However, it works
+just as well because it is the more general way.
+
+   acquire A
+   /*
+    * Queue A into hist_locks
+    *
+    * In hist_locks: A
+    * In graph: Empty
+    */
+
+   acquire B
+   /*
+    * Queue B into hist_locks
+    *
+    * In hist_locks: A, B
+    * In graph: Empty
+    */
+
+   acquire C
+   /*
+    * Queue C into hist_locks
+    *
+    * In hist_locks: A, B, C
+    * In graph: Empty
+    */
+
+   commit C
+   /*
+    * Add 'C -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire C: Nothing
+    *
+    * In hist_locks: A, B, C
+    * In graph: Empty
+    */
+
+   release C
+
+   commit B
+   /*
+    * Add 'B -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire B: C
+    *
+    * In hist_locks: A, B, C
+    * In graph: 'B -> C'
+    */
+
+   release B
+
+   commit A
+   /*
+    * Add 'A -> ?'
+    * Answer the following to decide '?'
+    * What has been queued since acquire A: B, C
+    *
+    * In hist_locks: A, B, C
+    * In graph: 'B -> C', 'A -> B', 'A -> C'
+    */
+
+   release A
+
+   where A, B and C are different lock classes.
+
+In this case, dependencies are added at the commit step as described.
+
+After commits for A, B and C, the graph will be:
+
+   A -> B -> C
+
+   where A, B and C are different lock classes.
+
+   NOTE: A dependency 'A -> C' is optimized out.
+
+We can see that the former graph, built without the commit step, is the
+same as the latter graph, built using commit steps. Of course, the former
+way finishes building the graph earlier, which means we can detect a
+deadlock or its possibility sooner. So the former way is preferred
+when possible. But we cannot avoid using the latter way for crosslocks.
+
+Let's look at how commit steps work for crosslocks. In this case, the
+commit step is actually performed only on crosslock BX. The example
+assumes that the BX release context is different from the BX acquire
+context.
+
+   BX RELEASE CONTEXT		   BX ACQUIRE CONTEXT
+   ------------------		   ------------------
+				   acquire A
+				   /*
+				    * Push A onto held_locks
+				    * Queue A into hist_locks
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A
+				    * In graph: Empty
+				    */
+
+				   acquire BX
+				   /*
+				    * Add 'the top of held_locks -> BX'
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A
+				    * In graph: 'A -> BX'
+				    */
+
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   It must be guaranteed that the following operations are seen after
+   acquiring BX globally. This can be done by things like barriers.
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   acquire C
+   /*
+    * Push C onto held_locks
+    * Queue C into hist_locks
+    *
+    * In held_locks: C
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+
+   release C
+   /*
+    * Pop C from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+				   acquire D
+				   /*
+				    * Push D onto held_locks
+				    * Queue D into hist_locks
+				    * Add 'the top of held_locks -> D'
+				    *
+				    * In held_locks: A, D
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D'
+				    */
+   acquire E
+   /*
+    * Push E onto held_locks
+    * Queue E into hist_locks
+    *
+    * In held_locks: E
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+
+   release E
+   /*
+    * Pop E from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+				   release D
+				   /*
+				    * Pop D from held_locks
+				    *
+				    * In held_locks: A
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D'
+				    */
+   commit BX
+   /*
+    * Add 'BX -> ?'
+    * What has been queued since acquire BX: C, E
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+
+   release BX
+   /*
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+				   release A
+				   /*
+				    * Pop A from held_locks
+				    *
+				    * In held_locks: Empty
+				    * In hist_locks: A, D
+				    * In graph: 'A -> BX', 'A -> D',
+				    *           'BX -> C', 'BX -> E'
+				    */
+
+   where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Crossrelease considers all acquisitions after acquiring BX to be
+candidates which might create dependencies with BX. True dependencies
+will be determined when identifying the release context of BX. Meanwhile,
+all typical locks are queued so that they can be used at the commit step.
+And then two dependencies 'BX -> C' and 'BX -> E' are added at the
+commit step when identifying the release context.
+
+The final graph will be, with crossrelease:
+
+               -> C
+              /
+       -> BX -
+      /       \
+   A -         -> E
+      \
+       -> D
+
+   where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+However, the final graph will be, without crossrelease:
+
+   A -> D
+
+   where A and D are different lock classes.
+
+The former graph has three more dependencies, 'A -> BX', 'BX -> C' and
+'BX -> E' giving additional opportunities to check if they cause
+deadlocks. This way lockdep can detect a deadlock or its possibility
+caused by crosslocks.
+
+CONCLUSION
+
+We checked how crossrelease works with several examples.
+
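The BX example can be replayed with a small user-space model. This is a
sketch under stated assumptions, not the code of this patch: every
acquisition is stamped with the value of a global generation counter that
is bumped whenever a crosslock is acquired, and commit adds 'crosslock ->
entry' for every entry of the releasing context's history whose stamp is
not older than the crosslock's. All names here (acquire_cross(),
acquire_typical(), commit_cross(), graph) are hypothetical, and counter
wrap-around is ignored for brevity.

```c
#include <assert.h>

#define MAX_HIST 16	/* Hypothetical capacities for this sketch */
#define MAX_DEPS 16

struct hist_entry {
	char name;		/* Lock class of the queued acquisition */
	unsigned int gen_id;	/* Value of the global counter when queued */
};

struct dep {
	char from, to;
};

static unsigned int cross_gen;	/* Bumped on every crosslock acquisition */
static struct dep graph[MAX_DEPS];
static int nr_deps;

/* Acquire a crosslock: returns the generation that identifies it. */
static unsigned int acquire_cross(void)
{
	return ++cross_gen;
}

/* Acquire a typical lock: queue it into the given context's history. */
static void acquire_typical(struct hist_entry *hist, int *nr, char name)
{
	assert(*nr < MAX_HIST);
	hist[*nr].name = name;
	hist[*nr].gen_id = cross_gen;
	(*nr)++;
}

/*
 * Commit: add 'xname -> entry' for every entry in the releasing context's
 * history that was queued after the crosslock was acquired.
 */
static void commit_cross(char xname, unsigned int xgen,
			 const struct hist_entry *hist, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (hist[i].gen_id >= xgen && nr_deps < MAX_DEPS) {
			graph[nr_deps].from = xname;
			graph[nr_deps].to = hist[i].name;
			nr_deps++;
		}
	}
}
```

Replaying the example, with C and E queued in the release context after BX
was acquired, commit_cross() records exactly 'BX -> C' and 'BX -> E'. D is
queued in the acquire context's own history and therefore never reaches
the commit of BX.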
+
+=============
+Optimizations
+=============
+
+Avoid duplication
+-----------------
+
+Crossrelease feature uses a cache like what lockdep already uses for
+dependency chains, but this time it's for caching CT type dependencies.
+Once a dependency is cached, the same dependency is never added again.
+
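The idea of skipping already-seen CT dependencies can be sketched with a
tiny open-addressing table. This is an illustrative stand-in, not the cache
used by this patch: dep_key, dep_cache_add() and the table size are
assumptions, and class index 0 is reserved here to mark an empty slot.

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_SIZE 64U	/* Hypothetical size; power of 2 */

struct dep_key {
	unsigned int from, to;	/* Lock class indices; 0 means "empty" */
};

static struct dep_key cache[CACHE_SIZE];

/*
 * Returns true if 'from -> to' is seen for the first time (and records
 * it), false if it is already cached and the graph work can be skipped.
 */
static bool dep_cache_add(unsigned int from, unsigned int to)
{
	unsigned int h = (from * 31U + to) % CACHE_SIZE;

	assert(from != 0 && to != 0);

	for (unsigned int probe = 0; probe < CACHE_SIZE; probe++) {
		struct dep_key *k = &cache[(h + probe) % CACHE_SIZE];

		if (k->from == from && k->to == to)
			return false;	/* Duplicate */
		if (k->from == 0) {	/* Empty slot: first sighting */
			k->from = from;
			k->to = to;
			return true;
		}
	}
	/* Table full: report as new so the dependency is still processed. */
	return true;
}
```

A caller would add the dependency to the graph only when dep_cache_add()
returns true, so repeated commits of the same 'AX -> B' pair cost one
lookup instead of a full graph check.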
+
+Lockless for hot paths
+----------------------
+
+To keep all locks for later use at the commit step, crossrelease adopts
+a local array embedded in task_struct, which makes access to the data
+lockless by forcing it to happen only within the owner context. It's
+like how lockdep handles held_locks. A lockless implementation is
+important since typical locks are very frequently acquired and released.
+
+
+==================================================
+APPENDIX A: What lockdep does to work aggressively
+==================================================
+
+A deadlock actually occurs when all wait operations creating circular
+dependencies run at the same time. Even if they don't, a potential
+deadlock exists as long as the problematic dependencies exist. Thus it's
+meaningful to detect not only an actual deadlock but also its
+possibility. The latter is the more valuable. When a deadlock actually
+occurs, we can identify what happened in the system by some means or
+other even without lockdep. However, there's no way to detect the
+possibility without lockdep, short of reasoning through the whole code
+by hand, which is impractical. Lockdep does both, and crossrelease
+focuses only on the latter.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, what order contexts are switched in is a factor. Assuming
+circular dependencies exist, a deadlock would occur when contexts are
+switched so that all wait operations creating the dependencies run
+simultaneously. Thus to detect a deadlock possibility even in the case
+that it has not occurred yet, lockdep should consider all possible
+combinations of dependencies, trying to:
+
+1. Use a global dependency graph.
+
+   Lockdep combines all dependencies into one global graph and uses them,
+   regardless of which context generates them or what order contexts are
+   switched in. Only the aggregated dependencies are considered, so they
+   are prone to be circular if a problem exists.
+
+2. Check dependencies between classes instead of instances.
+
+   What actually causes a deadlock are instances of locks. However,
+   lockdep checks dependencies between classes instead of instances.
+   This way lockdep can detect a deadlock which has not happened yet but
+   might happen in the future with other instances of the same class.
+
+3. Assume all acquisitions lead to waiting.
+
+   Although a lock might be acquired without waiting, and waiting is
+   essential to create a dependency, lockdep assumes all acquisitions
+   lead to waiting since that might be true at some time or another.
+
+CONCLUSION
+
+Lockdep detects not only an actual deadlock but also its possibility,
+and the latter is more valuable.
+
+
+==================================================
+APPENDIX B: How to avoid adding false dependencies
+==================================================
+
+Recall what a dependency is. A dependency exists if:
+
+   1. There are two waiters waiting for each event at a given time.
+   2. The only way to wake up each waiter is to trigger its event.
+   3. Whether one can be woken up depends on whether the other can.
+
+For example:
+
+   acquire A
+   acquire B /* A dependency 'A -> B' exists */
+   release B
+   release A
+
+   where A and B are different lock classes.
+
+A dependency 'A -> B' exists since:
+
+   1. A waiter for A and a waiter for B might exist when acquiring B.
+   2. Only way to wake up each is to release what it waits for.
+   3. Whether the waiter for A can be woken up depends on whether the
+      other can. IOW, TASK X cannot release A if it fails to acquire B.
+
+For another example:
+
+   TASK X			   TASK Y
+   ------			   ------
+				   acquire AX
+   acquire B /* A dependency 'AX -> B' exists */
+   release B
+   release AX held by Y
+
+   where AX and B are different lock classes, and a suffix 'X' is added
+   on crosslocks.
+
+Even in this case involving crosslocks, the same rule can be applied. A
+dependency 'AX -> B' exists since:
+
+   1. A waiter for AX and a waiter for B might exist when acquiring B.
+   2. Only way to wake up each is to release what it waits for.
+   3. Whether the waiter for AX can be woken up depends on whether the
+      other can. IOW, TASK X cannot release AX if it fails to acquire B.
+
+Let's take a look at a more complicated example:
+
+   TASK X			   TASK Y
+   ------			   ------
+   acquire B
+   release B
+   fork Y
+				   acquire AX
+   acquire C /* A dependency 'AX -> C' exists */
+   release C
+   release AX held by Y
+
+   where AX, B and C are different lock classes, and a suffix 'X' is
+   added on crosslocks.
+
+Does a dependency 'AX -> B' exist? Nope.
+
+Two waiters are essential to create a dependency. However, waiters for
+AX and B to create 'AX -> B' cannot exist at the same time in this
+example. Thus the dependency 'AX -> B' cannot be created.
+
+It would be ideal if the full set of true dependencies could be
+considered. But we can be sure of nothing but what actually happened.
+Relying on what actually happens at runtime, we add only true
+dependencies, though they might be a subset of all true ones. It's
+similar to how lockdep works for typical locks. There might be more true
+dependencies than what lockdep has detected at runtime. Lockdep has no
+choice but to rely on what actually happens. Crossrelease also relies on
+it.
+
+CONCLUSION
+
+Relying on what actually happens, lockdep can avoid adding false
+dependencies.
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 05/16] lockdep: Implement crossrelease feature
  2017-05-24  8:59 ` [PATCH v7 05/16] lockdep: Implement crossrelease feature Byungchul Park
@ 2017-06-13  0:33   ` Byungchul Park
  2017-06-22 23:27     ` Byungchul Park
  2017-07-11 16:04   ` Peter Zijlstra
  1 sibling, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-06-13  0:33 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

On Wed, May 24, 2017 at 05:59:38PM +0900, Byungchul Park wrote:
> Lockdep is a runtime locking correctness validator that detects and
> reports a deadlock or its possibility by checking dependencies between
> locks. It's useful since it does not report just an actual deadlock but
> also the possibility of a deadlock that has not actually happened yet.
> That enables problems to be fixed before they affect real systems.
> 
> However, this facility is only applicable to typical locks, such as
> spinlocks and mutexes, which are normally released within the context in
> which they were acquired. However, synchronization primitives like page
> locks or completions, which are allowed to be released in any context,
> also create dependencies and can cause a deadlock. So lockdep should
> track these locks to do a better job. The 'crossrelease' implementation
> makes these primitives also be tracked.

Hello,

I think you need a lot of time to review the patches, but 3 weeks
have passed since I submitted them. It would be appreciated if you could
spend more time on them.

Thank you,
Byungchul

> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  include/linux/irqflags.h |  24 ++-
>  include/linux/lockdep.h  | 111 ++++++++++-
>  include/linux/sched.h    |   8 +
>  kernel/exit.c            |   1 +
>  kernel/fork.c            |   3 +
>  kernel/locking/lockdep.c | 474 ++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/workqueue.c       |   2 +
>  lib/Kconfig.debug        |  12 ++
>  8 files changed, 601 insertions(+), 34 deletions(-)
> 
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 5dd1272..c40af8a 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -23,10 +23,26 @@
>  # define trace_softirq_context(p)	((p)->softirq_context)
>  # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
>  # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
> -# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
> -# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
> -# define lockdep_softirq_enter()	do { current->softirq_context++; } while (0)
> -# define lockdep_softirq_exit()	do { current->softirq_context--; } while (0)
> +# define trace_hardirq_enter()		\
> +do {					\
> +	current->hardirq_context++;	\
> +	crossrelease_hardirq_start();	\
> +} while (0)
> +# define trace_hardirq_exit()		\
> +do {					\
> +	current->hardirq_context--;	\
> +	crossrelease_hardirq_end();	\
> +} while (0)
> +# define lockdep_softirq_enter()	\
> +do {					\
> +	current->softirq_context++;	\
> +	crossrelease_softirq_start();	\
> +} while (0)
> +# define lockdep_softirq_exit()		\
> +do {					\
> +	current->softirq_context--;	\
> +	crossrelease_softirq_end();	\
> +} while (0)
>  # define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index c1458fe..d531097 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -155,6 +155,12 @@ struct lockdep_map {
>  	int				cpu;
>  	unsigned long			ip;
>  #endif
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +	/*
> +	 * Whether it's a crosslock.
> +	 */
> +	int				cross;
> +#endif
>  };
>  
>  static inline void lockdep_copy_map(struct lockdep_map *to,
> @@ -258,7 +264,61 @@ struct held_lock {
>  	unsigned int hardirqs_off:1;
>  	unsigned int references:12;					/* 32 bits */
>  	unsigned int pin_count;
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +	/*
> +	 * Generation id.
> +	 *
> +	 * A value of cross_gen_id will be stored when holding this,
> +	 * which is globally increased whenever each crosslock is held.
> +	 */
> +	unsigned int gen_id;
> +#endif
> +};
> +
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +#define MAX_XHLOCK_TRACE_ENTRIES 5
> +
> +/*
> + * This is for keeping locks waiting for commit so that true dependencies
> + * can be added at commit step.
> + */
> +struct hist_lock {
> +	/*
> +	 * Separate stack_trace data. This will be used at commit step.
> +	 */
> +	struct stack_trace	trace;
> +	unsigned long		trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
> +
> +	/*
> +	 * Separate hlock instance. This will be used at commit step.
> +	 *
> +	 * TODO: Use a smaller data structure containing only necessary
> +	 * data. However, we should make lockdep code able to handle the
> +	 * smaller one first.
> +	 */
> +	struct held_lock	hlock;
> +};
> +
> +/*
> + * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
> + * be called instead of lockdep_init_map().
> + */
> +struct cross_lock {
> +	/*
> +	 * Separate hlock instance. This will be used at commit step.
> +	 *
> +	 * TODO: Use a smaller data structure containing only necessary
> +	 * data. However, we should make lockdep code able to handle the
> +	 * smaller one first.
> +	 */
> +	struct held_lock	hlock;
> +};
> +
> +struct lockdep_map_cross {
> +	struct lockdep_map map;
> +	struct cross_lock xlock;
>  };
> +#endif
>  
>  /*
>   * Initialization, self-test and debugging-output methods:
> @@ -282,13 +342,6 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
>  			     struct lock_class_key *key, int subclass);
>  
>  /*
> - * To initialize a lockdep_map statically use this macro.
> - * Note that _name must not be NULL.
> - */
> -#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> -	{ .name = (_name), .key = (void *)(_key), }
> -
> -/*
>   * Reinitialize a lock key - for cases where there is special locking or
>   * special initialization of locks so that the validator gets the scope
>   * of dependencies wrong: they are either too broad (they need a class-split)
> @@ -443,6 +496,50 @@ static inline void lockdep_on(void)
>  
>  #endif /* !LOCKDEP */
>  
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
> +				       const char *name,
> +				       struct lock_class_key *key,
> +				       int subclass);
> +extern void lock_commit_crosslock(struct lockdep_map *lock);
> +
> +#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
> +	{ .map.name = (_name), .map.key = (void *)(_key), \
> +	  .map.cross = 1, }
> +
> +/*
> + * To initialize a lockdep_map statically use this macro.
> + * Note that _name must not be NULL.
> + */
> +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> +	{ .name = (_name), .key = (void *)(_key), .cross = 0, }
> +
> +extern void crossrelease_hardirq_start(void);
> +extern void crossrelease_hardirq_end(void);
> +extern void crossrelease_softirq_start(void);
> +extern void crossrelease_softirq_end(void);
> +extern void crossrelease_work_start(void);
> +extern void crossrelease_work_end(void);
> +extern void init_crossrelease_task(struct task_struct *task);
> +extern void free_crossrelease_task(struct task_struct *task);
> +#else
> +/*
> + * To initialize a lockdep_map statically use this macro.
> + * Note that _name must not be NULL.
> + */
> +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> +	{ .name = (_name), .key = (void *)(_key), }
> +
> +static inline void crossrelease_hardirq_start(void) {}
> +static inline void crossrelease_hardirq_end(void) {}
> +static inline void crossrelease_softirq_start(void) {}
> +static inline void crossrelease_softirq_end(void) {}
> +static inline void crossrelease_work_start(void) {}
> +static inline void crossrelease_work_end(void) {}
> +static inline void init_crossrelease_task(struct task_struct *task) {}
> +static inline void free_crossrelease_task(struct task_struct *task) {}
> +#endif
> +
>  #ifdef CONFIG_LOCK_STAT
>  
>  extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e9c009d..5f6d6f4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1749,6 +1749,14 @@ struct task_struct {
>  	struct held_lock held_locks[MAX_LOCK_DEPTH];
>  	gfp_t lockdep_reclaim_gfp;
>  #endif
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +#define MAX_XHLOCKS_NR 64UL
> +	struct hist_lock *xhlocks; /* Crossrelease history locks */
> +	unsigned int xhlock_idx;
> +	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
> +	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
> +	unsigned int xhlock_idx_work; /* For restoring at work exit */
> +#endif
>  #ifdef CONFIG_UBSAN
>  	unsigned int in_ubsan;
>  #endif
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 3076f30..cc56aad 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -883,6 +883,7 @@ void __noreturn do_exit(long code)
>  	exit_rcu();
>  	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
>  
> +	free_crossrelease_task(tsk);
>  	do_task_dead();
>  }
>  EXPORT_SYMBOL_GPL(do_exit);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 997ac1d..f9623a0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -451,6 +451,7 @@ void __init fork_init(void)
>  	for (i = 0; i < UCOUNT_COUNTS; i++) {
>  		init_user_ns.ucount_max[i] = max_threads/2;
>  	}
> +	init_crossrelease_task(&init_task);
>  }
>  
>  int __weak arch_dup_task_struct(struct task_struct *dst,
> @@ -1611,6 +1612,7 @@ static __latent_entropy struct task_struct *copy_process(
>  	p->lockdep_depth = 0; /* no locks held yet */
>  	p->curr_chain_key = 0;
>  	p->lockdep_recursion = 0;
> +	init_crossrelease_task(p);
>  #endif
>  
>  #ifdef CONFIG_DEBUG_MUTEXES
> @@ -1856,6 +1858,7 @@ static __latent_entropy struct task_struct *copy_process(
>  bad_fork_cleanup_perf:
>  	perf_event_free_task(p);
>  bad_fork_cleanup_policy:
> +	free_crossrelease_task(p);
>  #ifdef CONFIG_NUMA
>  	mpol_put(p->mempolicy);
>  bad_fork_cleanup_threadgroup_lock:
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 2847356..63eb04a 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -55,6 +55,10 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/lock.h>
>  
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +#include <linux/slab.h>
> +#endif
> +
>  #ifdef CONFIG_PROVE_LOCKING
>  int prove_locking = 1;
>  module_param(prove_locking, int, 0644);
> @@ -709,6 +713,18 @@ static int count_matching_names(struct lock_class *new_class)
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +static void cross_init(struct lockdep_map *lock, int cross);
> +static int cross_lock(struct lockdep_map *lock);
> +static int lock_acquire_crosslock(struct held_lock *hlock);
> +static int lock_release_crosslock(struct lockdep_map *lock);
> +#else
> +static inline void cross_init(struct lockdep_map *lock, int cross) {}
> +static inline int cross_lock(struct lockdep_map *lock) { return 0; }
> +static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
> +static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
> +#endif
> +
>  /*
>   * Register a lock's class in the hash-table, if the class is not present
>   * yet. Otherwise we look it up. We cache the result in the lock object
> @@ -1768,6 +1784,9 @@ static inline void inc_chains(void)
>  		if (nest)
>  			return 2;
>  
> +		if (cross_lock(prev->instance))
> +			continue;
> +
>  		return print_deadlock_bug(curr, prev, next);
>  	}
>  	return 1;
> @@ -1921,30 +1940,36 @@ static inline void inc_chains(void)
>  		int distance = curr->lockdep_depth - depth + 1;
>  		hlock = curr->held_locks + depth - 1;
>  		/*
> -		 * Only non-recursive-read entries get new dependencies
> -		 * added:
> +		 * Only non-crosslock entries get new dependencies added.
> +		 * Crosslock entries will be added by commit later:
>  		 */
> -		if (hlock->read != 2 && hlock->check) {
> -			int ret = check_prev_add(curr, hlock, next,
> -						distance, &trace, save);
> -			if (!ret)
> -				return 0;
> -
> +		if (!cross_lock(hlock->instance)) {
>  			/*
> -			 * Stop saving stack_trace if save_trace() was
> -			 * called at least once:
> +			 * Only non-recursive-read entries get new dependencies
> +			 * added:
>  			 */
> -			if (save && ret == 2)
> -				save = NULL;
> +			if (hlock->read != 2 && hlock->check) {
> +				int ret = check_prev_add(curr, hlock, next,
> +							 distance, &trace, save);
> +				if (!ret)
> +					return 0;
>  
> -			/*
> -			 * Stop after the first non-trylock entry,
> -			 * as non-trylock entries have added their
> -			 * own direct dependencies already, so this
> -			 * lock is connected to them indirectly:
> -			 */
> -			if (!hlock->trylock)
> -				break;
> +				/*
> +				 * Stop saving stack_trace if save_trace() was
> +				 * called at least once:
> +				 */
> +				if (save && ret == 2)
> +					save = NULL;
> +
> +				/*
> +				 * Stop after the first non-trylock entry,
> +				 * as non-trylock entries have added their
> +				 * own direct dependencies already, so this
> +				 * lock is connected to them indirectly:
> +				 */
> +				if (!hlock->trylock)
> +					break;
> +			}
>  		}
>  		depth--;
>  		/*
> @@ -3203,7 +3228,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
>  /*
>   * Initialize a lock instance's lock-class mapping info:
>   */
> -void lockdep_init_map(struct lockdep_map *lock, const char *name,
> +static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
>  		      struct lock_class_key *key, int subclass)
>  {
>  	int i;
> @@ -3261,8 +3286,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
>  		raw_local_irq_restore(flags);
>  	}
>  }
> +
> +void lockdep_init_map(struct lockdep_map *lock, const char *name,
> +		      struct lock_class_key *key, int subclass)
> +{
> +	cross_init(lock, 0);
> +	__lockdep_init_map(lock, name, key, subclass);
> +}
>  EXPORT_SYMBOL_GPL(lockdep_init_map);
>  
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
> +		      struct lock_class_key *key, int subclass)
> +{
> +	cross_init(lock, 1);
> +	__lockdep_init_map(lock, name, key, subclass);
> +}
> +EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
> +#endif
> +
>  struct lock_class_key __lockdep_no_validate__;
>  EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
>  
> @@ -3317,6 +3359,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  	unsigned int depth;
>  	int chain_head = 0;
>  	int class_idx;
> +	int ret;
>  	u64 chain_key;
>  
>  	if (unlikely(!debug_locks))
> @@ -3366,7 +3409,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  
>  	class_idx = class - lock_classes + 1;
>  
> -	if (depth) {
> +	/* TODO: nest_lock is not implemented for crosslock yet. */
> +	if (depth && !cross_lock(lock)) {
>  		hlock = curr->held_locks + depth - 1;
>  		if (hlock->class_idx == class_idx && nest_lock) {
>  			if (hlock->references)
> @@ -3447,6 +3491,14 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
>  		return 0;
>  
> +	ret = lock_acquire_crosslock(hlock);
> +	/*
> +	 * 2 means normal acquire operations are needed. Otherwise, it's
> +	 * ok just to return with '0:fail, 1:success'.
> +	 */
> +	if (ret != 2)
> +		return ret;
> +
>  	curr->curr_chain_key = chain_key;
>  	curr->lockdep_depth++;
>  	check_chain_key(curr);
> @@ -3610,11 +3662,19 @@ static int match_held_lock(struct held_lock *hlock, struct lockdep_map *lock)
>  	struct task_struct *curr = current;
>  	struct held_lock *hlock, *prev_hlock;
>  	unsigned int depth;
> -	int i;
> +	int ret, i;
>  
>  	if (unlikely(!debug_locks))
>  		return 0;
>  
> +	ret = lock_release_crosslock(lock);
> +	/*
> +	 * 2 means normal release operations are needed. Otherwise, it's
> +	 * ok just to return with '0:fail, 1:success'.
> +	 */
> +	if (ret != 2)
> +		return ret;
> +
>  	depth = curr->lockdep_depth;
>  	/*
>  	 * So we're all set to release this lock.. wait what lock? We don't
> @@ -4557,3 +4617,371 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
>  	dump_stack();
>  }
>  EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
> +
> +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> +
> +#define xhlock(i)         (current->xhlocks[(i) % MAX_XHLOCKS_NR])
> +
> +/*
> + * Whenever a crosslock is held, cross_gen_id will be increased.
> + */
> +static atomic_t cross_gen_id; /* Can be wrapped */
> +
> +void crossrelease_hardirq_start(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx_hard = current->xhlock_idx;
> +}
> +
> +void crossrelease_hardirq_end(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx = current->xhlock_idx_hard;
> +}
> +
> +void crossrelease_softirq_start(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx_soft = current->xhlock_idx;
> +}
> +
> +void crossrelease_softirq_end(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx = current->xhlock_idx_soft;
> +}
> +
> +/*
> + * Each work of workqueue might run in a different context,
> + * thanks to concurrency support of workqueue. So we have to
> + * distinguish each work to avoid false positives.
> + */
> +void crossrelease_work_start(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx_work = current->xhlock_idx;
> +}
> +
> +void crossrelease_work_end(void)
> +{
> +	if (current->xhlocks)
> +		current->xhlock_idx = current->xhlock_idx_work;
> +}
> +
> +static int cross_lock(struct lockdep_map *lock)
> +{
> +	return lock ? lock->cross : 0;
> +}
> +
> +/*
> + * This is needed to decide the relationship between wrappable variables.
> + */
> +static inline int before(unsigned int a, unsigned int b)
> +{
> +	return (int)(a - b) < 0;
> +}
> +
> +static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
> +{
> +	return hlock_class(&xhlock->hlock);
> +}
> +
> +static inline struct lock_class *xlock_class(struct cross_lock *xlock)
> +{
> +	return hlock_class(&xlock->hlock);
> +}
> +
> +/*
> + * Should we check a dependency with previous one?
> + */
> +static inline int depend_before(struct held_lock *hlock)
> +{
> +	return hlock->read != 2 && hlock->check && !hlock->trylock;
> +}
> +
> +/*
> + * Should we check a dependency with next one?
> + */
> +static inline int depend_after(struct held_lock *hlock)
> +{
> +	return hlock->read != 2 && hlock->check;
> +}
> +
> +/*
> + * Check if the xhlock is valid, which would be false if:
> + *
> + *    1. It has not been used since initialization.
> + *
> + * Remember that hist_lock is implemented as a ring buffer.
> + */
> +static inline int xhlock_valid(struct hist_lock *xhlock)
> +{
> +	/*
> +	 * xhlock->hlock.instance must be !NULL.
> +	 */
> +	return !!xhlock->hlock.instance;
> +}
> +
> +/*
> + * Record a hist_lock entry.
> + *
> + * Only disabling IRQs is required.
> + */
> +static void add_xhlock(struct held_lock *hlock)
> +{
> +	unsigned int idx = ++current->xhlock_idx;
> +	struct hist_lock *xhlock = &xhlock(idx);
> +
> +#ifdef CONFIG_DEBUG_LOCKDEP
> +	/*
> +	 * This can be done locklessly because they are all task-local
> +	 * state, we must however ensure IRQs are disabled.
> +	 */
> +	WARN_ON_ONCE(!irqs_disabled());
> +#endif
> +
> +	/* Initialize hist_lock's members */
> +	xhlock->hlock = *hlock;
> +
> +	xhlock->trace.nr_entries = 0;
> +	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> +	xhlock->trace.entries = xhlock->trace_entries;
> +	xhlock->trace.skip = 3;
> +	save_stack_trace(&xhlock->trace);
> +}
> +
> +static inline int same_context_xhlock(struct hist_lock *xhlock)
> +{
> +	return xhlock->hlock.irq_context == task_irq_context(current);
> +}
> +
> +/*
> + * This should be lockless as far as possible because this would be
> + * called very frequently.
> + */
> +static void check_add_xhlock(struct held_lock *hlock)
> +{
> +	/*
> +	 * Record a hist_lock, only in case that acquisitions ahead
> +	 * could depend on the held_lock. For example, if the held_lock
> +	 * is a trylock then acquisitions ahead never depend on that.
> +	 * In that case, we don't need to record it. Just return.
> +	 */
> +	if (!current->xhlocks || !depend_before(hlock))
> +		return;
> +
> +	add_xhlock(hlock);
> +}
> +
> +/*
> + * For crosslock.
> + */
> +static int add_xlock(struct held_lock *hlock)
> +{
> +	struct cross_lock *xlock;
> +	unsigned int gen_id;
> +
> +	if (!graph_lock())
> +		return 0;
> +
> +	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
> +
> +	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
> +	xlock->hlock = *hlock;
> +	xlock->hlock.gen_id = gen_id;
> +	graph_unlock();
> +
> +	return 1;
> +}
> +
> +/*
> + * Called for both normal and crosslock acquires. Normal locks will be
> + * pushed on the hist_lock queue. Cross locks will record state and
> + * stop regular lock_acquire() to avoid being placed on the held_lock
> + * stack.
> + *
> + * Return: 0 - failure;
> + *         1 - crosslock, done;
> + *         2 - normal lock, continue to held_lock[] ops.
> + */
> +static int lock_acquire_crosslock(struct held_lock *hlock)
> +{
> +	/*
> +	 *	CONTEXT 1		CONTEXT 2
> +	 *	---------		---------
> +	 *	lock A (cross)
> +	 *	X = atomic_inc_return(&cross_gen_id)
> +	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +	 *				Y = atomic_read_acquire(&cross_gen_id)
> +	 *				lock B
> +	 *
> +	 * atomic_read_acquire() is for ordering between A and B,
> +	 * IOW, A happens before B, when CONTEXT 2 sees Y >= X.
> +	 *
> +	 * Pairs with atomic_inc_return() in add_xlock().
> +	 */
> +	hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
> +
> +	if (cross_lock(hlock->instance))
> +		return add_xlock(hlock);
> +
> +	check_add_xhlock(hlock);
> +	return 2;
> +}
> +
> +static int copy_trace(struct stack_trace *trace)
> +{
> +	unsigned long *buf = stack_trace + nr_stack_trace_entries;
> +	unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
> +	unsigned int nr = min(max_nr, trace->nr_entries);
> +
> +	trace->nr_entries = nr;
> +	memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
> +	trace->entries = buf;
> +	nr_stack_trace_entries += nr;
> +
> +	if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
> +		if (!debug_locks_off_graph_unlock())
> +			return 0;
> +
> +		print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> +		dump_stack();
> +
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
> +static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> +{
> +	unsigned int xid, pid;
> +	u64 chain_key;
> +
> +	xid = xlock_class(xlock) - lock_classes;
> +	chain_key = iterate_chain_key((u64)0, xid);
> +	pid = xhlock_class(xhlock) - lock_classes;
> +	chain_key = iterate_chain_key(chain_key, pid);
> +
> +	if (lookup_chain_cache(chain_key))
> +		return 1;
> +
> +	if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
> +				chain_key))
> +		return 0;
> +
> +	if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
> +			    &xhlock->trace, copy_trace))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +static void commit_xhlocks(struct cross_lock *xlock)
> +{
> +	unsigned int cur = current->xhlock_idx;
> +	unsigned int i;
> +
> +	if (!graph_lock())
> +		return;
> +
> +	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
> +		struct hist_lock *xhlock = &xhlock(cur - i);
> +
> +		if (!xhlock_valid(xhlock))
> +			break;
> +
> +		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> +			break;
> +
> +		if (!same_context_xhlock(xhlock))
> +			break;
> +
> +		/*
> +		 * commit_xhlock() returns 0 with graph_lock already
> +		 * released on failure.
> +		 */
> +		if (!commit_xhlock(xlock, xhlock))
> +			return;
> +	}
> +
> +	graph_unlock();
> +}
> +
> +void lock_commit_crosslock(struct lockdep_map *lock)
> +{
> +	struct cross_lock *xlock;
> +	unsigned long flags;
> +
> +	if (unlikely(!debug_locks || current->lockdep_recursion))
> +		return;
> +
> +	if (!current->xhlocks)
> +		return;
> +
> +	/*
> +	 * Do commit hist_locks with the cross_lock, only in case that
> +	 * the cross_lock could depend on acquisitions after that.
> +	 *
> +	 * For example, if the cross_lock does not have the 'check' flag
> +	 * then we don't need to check dependencies and commit for that.
> +	 * Just skip it. In that case, of course, the cross_lock does
> +	 * not depend on acquisitions ahead, either.
> +	 *
> +	 * WARNING: Don't do that in add_xlock() in advance. When an
> +	 * acquisition context is different from the commit context,
> +	 * invalid(skipped) cross_lock might be accessed.
> +	 */
> +	if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
> +		return;
> +
> +	raw_local_irq_save(flags);
> +	check_flags(flags);
> +	current->lockdep_recursion = 1;
> +	xlock = &((struct lockdep_map_cross *)lock)->xlock;
> +	commit_xhlocks(xlock);
> +	current->lockdep_recursion = 0;
> +	raw_local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(lock_commit_crosslock);
> +
> +/*
> + * Return: 1 - crosslock, done;
> + *         2 - normal lock, continue to held_lock[] ops.
> + */
> +static int lock_release_crosslock(struct lockdep_map *lock)
> +{
> +	return cross_lock(lock) ? 1 : 2;
> +}
> +
> +static void cross_init(struct lockdep_map *lock, int cross)
> +{
> +	lock->cross = cross;
> +
> +	/*
> +	 * Crossrelease assumes that the ring buffer size of xhlocks
> +	 * is a power of 2. So enforce it at build time.
> +	 */
> +	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
> +}
> +
> +void init_crossrelease_task(struct task_struct *task)
> +{
> +	task->xhlock_idx = UINT_MAX;
> +	task->xhlock_idx_soft = UINT_MAX;
> +	task->xhlock_idx_hard = UINT_MAX;
> +	task->xhlock_idx_work = UINT_MAX;
> +	task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
> +				GFP_KERNEL);
> +}
> +
> +void free_crossrelease_task(struct task_struct *task)
> +{
> +	if (task->xhlocks) {
> +		void *tmp = task->xhlocks;
> +		/* Disable crossrelease for current */
> +		task->xhlocks = NULL;
> +		kfree(tmp);
> +	}
> +}
> +#endif
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 479d840..2f43ac1 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2092,6 +2092,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
>  
>  	lock_map_acquire_read(&pwq->wq->lockdep_map);
>  	lock_map_acquire(&lockdep_map);
> +	crossrelease_work_start();
>  	trace_workqueue_execute_start(work);
>  	worker->current_func(work);
>  	/*
> @@ -2099,6 +2100,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
>  	 * point will only record its address.
>  	 */
>  	trace_workqueue_execute_end(work);
> +	crossrelease_work_end();
>  	lock_map_release(&lockdep_map);
>  	lock_map_release(&pwq->wq->lockdep_map);
>  
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index a6c8db1..e584431 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1042,6 +1042,18 @@ config DEBUG_LOCK_ALLOC
>  	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
>  	 held during task exit.
>  
> +config LOCKDEP_CROSSRELEASE
> +	bool "Lock debugging: make lockdep work for crosslocks"
> +	depends on PROVE_LOCKING
> +	default n
> +	help
> +	 This makes lockdep work for crosslocks, which are locks allowed to
> +	 be released in a different context from the acquisition context.
> +	 Normally a lock must be released in the context that acquired it.
> +	 However, relaxing this constraint lets synchronization primitives
> +	 such as page locks or completions use the lock correctness
> +	 detector, lockdep.
> +
>  config PROVE_LOCKING
>  	bool "Lock debugging: prove locking correctness"
>  	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
> -- 
> 1.9.1
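
A note on the index arithmetic in the patch above: both cross_gen_id and
xhlock_idx are free-running unsigned counters that are allowed to wrap.
The sketch below is a standalone userspace illustration, not kernel code.
RING_SIZE, slot() and check_wraparound() are hypothetical names chosen for
this example; before() copies the shape of the helper in the patch. It
tries to show why the signed-difference comparison and the power-of-2 ring
size keep working across the UINT_MAX -> 0 wrap.

```c
#include <assert.h>
#include <limits.h>

#define RING_SIZE 64u	/* hypothetical stand-in for MAX_XHLOCKS_NR */

/* Build-time check, in the spirit of the BUILD_BUG_ON() in cross_init(). */
_Static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0,
	       "RING_SIZE must be a power of two");

/*
 * Wraparound-safe "a was generated before b", same shape as before() in
 * the patch. Converting the unsigned difference to int relies on
 * two's-complement conversion, as gcc and clang implement it.
 */
static int before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}

/* Map a free-running index to a ring slot, like the xhlock(i) macro. */
static unsigned int slot(unsigned int idx)
{
	return idx % RING_SIZE;
}

/* Exercise both helpers around the UINT_MAX -> 0 wrap. */
static void check_wraparound(void)
{
	unsigned int idx = UINT_MAX;	/* initial xhlock_idx in the patch */

	assert(before(UINT_MAX, 0u));	/* UINT_MAX is one step before 0 */
	assert(!before(0u, UINT_MAX));
	assert(!before(7u, 7u));	/* equal ids are not "before" */

	assert(slot(++idx) == 0u);	/* first recorded entry lands in slot 0 */
	assert(slot(idx - 1u) == RING_SIZE - 1u); /* walking back crosses the wrap */
}
```

Calling check_wraparound() from a small main() should run without tripping
an assertion. Since the counter range is a multiple of RING_SIZE, the slot
sequence stays contiguous when the counter wraps, which is presumably why
the patch forces MAX_XHLOCKS_NR to be a power of 2.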

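The crossrelease_{hardirq,softirq,work}_{start,end}() pairs in the patch
above unwind the history ring by saving xhlock_idx on entry and restoring
it on exit. The following is a minimal userspace sketch of that idea only.
struct history, HIST_SIZE and the hist_*/ctx_* helpers are hypothetical
names for this example, and the payload is a plain integer instead of a
struct hist_lock.

```c
#include <assert.h>

#define HIST_SIZE 8u	/* power of two; stands in for MAX_XHLOCKS_NR */

struct history {
	unsigned int idx;	/* index of the most recent entry */
	unsigned int saved_idx;	/* snapshot taken when a nested context starts */
	int entries[HIST_SIZE];	/* hypothetical payload: just a lock id */
};

static void hist_init(struct history *h)
{
	unsigned int i;

	h->idx = (unsigned int)-1;	/* so that the first ++idx yields slot 0 */
	h->saved_idx = (unsigned int)-1;
	for (i = 0; i < HIST_SIZE; i++)
		h->entries[i] = 0;
}

/* Record an acquisition, overwriting the oldest slot if the ring is full. */
static void hist_record(struct history *h, int lock_id)
{
	h->entries[++h->idx % HIST_SIZE] = lock_id;
}

/* Entering a nested context (irq, work item): remember where we were. */
static void ctx_start(struct history *h)
{
	h->saved_idx = h->idx;
}

/* Leaving it: roll the index back so its entries are no longer visible. */
static void ctx_end(struct history *h)
{
	h->idx = h->saved_idx;
}

static int hist_latest(const struct history *h)
{
	return h->entries[h->idx % HIST_SIZE];
}

static void check_unwind(void)
{
	struct history h;

	hist_init(&h);
	hist_record(&h, 1);
	hist_record(&h, 2);

	ctx_start(&h);
	hist_record(&h, 100);		/* acquired inside the nested context */
	assert(hist_latest(&h) == 100);
	ctx_end(&h);

	assert(hist_latest(&h) == 2);	/* nested entry is hidden again */
	hist_record(&h, 3);		/* reuses the slot that held 100 */
	assert(h.entries[2] == 3);
}
```

The sketch keeps a single saved index, whereas the patch keeps one per
context type (xhlock_idx_hard, xhlock_idx_soft, xhlock_idx_work). It also
ignores the case where the nested context records enough entries to
overwrite older, still-visible slots.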

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 05/16] lockdep: Implement crossrelease feature
  2017-06-13  0:33   ` Byungchul Park
@ 2017-06-22 23:27     ` Byungchul Park
  0 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-06-22 23:27 UTC (permalink / raw)
  To: peterz, mingo
  Cc: tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm, akpm,
	willy, npiggin, kernel-team

On Tue, Jun 13, 2017 at 09:33:36AM +0900, Byungchul Park wrote:
> On Wed, May 24, 2017 at 05:59:38PM +0900, Byungchul Park wrote:
> > Lockdep is a runtime locking correctness validator that detects and
> > reports a deadlock or its possibility by checking dependencies between
> > locks. It's useful since it does not report just an actual deadlock but
> > also the possibility of a deadlock that has not actually happened yet.
> > That enables problems to be fixed before they affect real systems.
> > 
> > However, this facility is only applicable to typical locks, such as
> > spinlocks and mutexes, which are normally released within the context in
> > which they were acquired. However, synchronization primitives like page
> > locks or completions, which are allowed to be released in any context,
> > also create dependencies and can cause a deadlock. So lockdep should
> > track these locks to do a better job. The 'crossrelease' implementation
> > makes these primitives also be tracked.
> 
> Hello,
> 
> I think you need to spend much time reviewing the patches, but 3 weeks
> have passed since I submitted them. It would be appreciated if you could
> spend more time on it.

Hello, Peter

I meant you might need much time to review it.

But more than one month has passed. It would be appreciated if you could check it.

> 
> Thank you,
> Byungchul
> 
> > 
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> >  include/linux/irqflags.h |  24 ++-
> >  include/linux/lockdep.h  | 111 ++++++++++-
> >  include/linux/sched.h    |   8 +
> >  kernel/exit.c            |   1 +
> >  kernel/fork.c            |   3 +
> >  kernel/locking/lockdep.c | 474 ++++++++++++++++++++++++++++++++++++++++++++---
> >  kernel/workqueue.c       |   2 +
> >  lib/Kconfig.debug        |  12 ++
> >  8 files changed, 601 insertions(+), 34 deletions(-)
> > 
> > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> > index 5dd1272..c40af8a 100644
> > --- a/include/linux/irqflags.h
> > +++ b/include/linux/irqflags.h
> > @@ -23,10 +23,26 @@
> >  # define trace_softirq_context(p)	((p)->softirq_context)
> >  # define trace_hardirqs_enabled(p)	((p)->hardirqs_enabled)
> >  # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
> > -# define trace_hardirq_enter()	do { current->hardirq_context++; } while (0)
> > -# define trace_hardirq_exit()	do { current->hardirq_context--; } while (0)
> > -# define lockdep_softirq_enter()	do { current->softirq_context++; } while (0)
> > -# define lockdep_softirq_exit()	do { current->softirq_context--; } while (0)
> > +# define trace_hardirq_enter()		\
> > +do {					\
> > +	current->hardirq_context++;	\
> > +	crossrelease_hardirq_start();	\
> > +} while (0)
> > +# define trace_hardirq_exit()		\
> > +do {					\
> > +	current->hardirq_context--;	\
> > +	crossrelease_hardirq_end();	\
> > +} while (0)
> > +# define lockdep_softirq_enter()	\
> > +do {					\
> > +	current->softirq_context++;	\
> > +	crossrelease_softirq_start();	\
> > +} while (0)
> > +# define lockdep_softirq_exit()		\
> > +do {					\
> > +	current->softirq_context--;	\
> > +	crossrelease_softirq_end();	\
> > +} while (0)
> >  # define INIT_TRACE_IRQFLAGS	.softirqs_enabled = 1,
> >  #else
> >  # define trace_hardirqs_on()		do { } while (0)
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index c1458fe..d531097 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -155,6 +155,12 @@ struct lockdep_map {
> >  	int				cpu;
> >  	unsigned long			ip;
> >  #endif
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +	/*
> > +	 * Whether it's a crosslock.
> > +	 */
> > +	int				cross;
> > +#endif
> >  };
> >  
> >  static inline void lockdep_copy_map(struct lockdep_map *to,
> > @@ -258,7 +264,61 @@ struct held_lock {
> >  	unsigned int hardirqs_off:1;
> >  	unsigned int references:12;					/* 32 bits */
> >  	unsigned int pin_count;
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +	/*
> > +	 * Generation id.
> > +	 *
> > +	 * A value of cross_gen_id will be stored when holding this,
> > +	 * which is globally increased whenever each crosslock is held.
> > +	 */
> > +	unsigned int gen_id;
> > +#endif
> > +};
> > +
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +#define MAX_XHLOCK_TRACE_ENTRIES 5
> > +
> > +/*
> > + * This is for keeping locks waiting for commit so that true dependencies
> > + * can be added at commit step.
> > + */
> > +struct hist_lock {
> > +	/*
> > +	 * Separate stack_trace data. This will be used at commit step.
> > +	 */
> > +	struct stack_trace	trace;
> > +	unsigned long		trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
> > +
> > +	/*
> > +	 * Separate hlock instance. This will be used at commit step.
> > +	 *
> > +	 * TODO: Use a smaller data structure containing only necessary
> > +	 * data. However, we should make lockdep code able to handle the
> > +	 * smaller one first.
> > +	 */
> > +	struct held_lock	hlock;
> > +};
> > +
> > +/*
> > + * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
> > + * be called instead of lockdep_init_map().
> > + */
> > +struct cross_lock {
> > +	/*
> > +	 * Separate hlock instance. This will be used at commit step.
> > +	 *
> > +	 * TODO: Use a smaller data structure containing only necessary
> > +	 * data. However, we should make lockdep code able to handle the
> > +	 * smaller one first.
> > +	 */
> > +	struct held_lock	hlock;
> > +};
> > +
> > +struct lockdep_map_cross {
> > +	struct lockdep_map map;
> > +	struct cross_lock xlock;
> >  };
> > +#endif
> >  
> >  /*
> >   * Initialization, self-test and debugging-output methods:
> > @@ -282,13 +342,6 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
> >  			     struct lock_class_key *key, int subclass);
> >  
> >  /*
> > - * To initialize a lockdep_map statically use this macro.
> > - * Note that _name must not be NULL.
> > - */
> > -#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> > -	{ .name = (_name), .key = (void *)(_key), }
> > -
> > -/*
> >   * Reinitialize a lock key - for cases where there is special locking or
> >   * special initialization of locks so that the validator gets the scope
> >   * of dependencies wrong: they are either too broad (they need a class-split)
> > @@ -443,6 +496,50 @@ static inline void lockdep_on(void)
> >  
> >  #endif /* !LOCKDEP */
> >  
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
> > +				       const char *name,
> > +				       struct lock_class_key *key,
> > +				       int subclass);
> > +extern void lock_commit_crosslock(struct lockdep_map *lock);
> > +
> > +#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
> > +	{ .map.name = (_name), .map.key = (void *)(_key), \
> > +	  .map.cross = 1, }
> > +
> > +/*
> > + * To initialize a lockdep_map statically use this macro.
> > + * Note that _name must not be NULL.
> > + */
> > +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> > +	{ .name = (_name), .key = (void *)(_key), .cross = 0, }
> > +
> > +extern void crossrelease_hardirq_start(void);
> > +extern void crossrelease_hardirq_end(void);
> > +extern void crossrelease_softirq_start(void);
> > +extern void crossrelease_softirq_end(void);
> > +extern void crossrelease_work_start(void);
> > +extern void crossrelease_work_end(void);
> > +extern void init_crossrelease_task(struct task_struct *task);
> > +extern void free_crossrelease_task(struct task_struct *task);
> > +#else
> > +/*
> > + * To initialize a lockdep_map statically use this macro.
> > + * Note that _name must not be NULL.
> > + */
> > +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> > +	{ .name = (_name), .key = (void *)(_key), }
> > +
> > +static inline void crossrelease_hardirq_start(void) {}
> > +static inline void crossrelease_hardirq_end(void) {}
> > +static inline void crossrelease_softirq_start(void) {}
> > +static inline void crossrelease_softirq_end(void) {}
> > +static inline void crossrelease_work_start(void) {}
> > +static inline void crossrelease_work_end(void) {}
> > +static inline void init_crossrelease_task(struct task_struct *task) {}
> > +static inline void free_crossrelease_task(struct task_struct *task) {}
> > +#endif
> > +
> >  #ifdef CONFIG_LOCK_STAT
> >  
> >  extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index e9c009d..5f6d6f4 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1749,6 +1749,14 @@ struct task_struct {
> >  	struct held_lock held_locks[MAX_LOCK_DEPTH];
> >  	gfp_t lockdep_reclaim_gfp;
> >  #endif
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +#define MAX_XHLOCKS_NR 64UL
> > +	struct hist_lock *xhlocks; /* Crossrelease history locks */
> > +	unsigned int xhlock_idx;
> > +	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
> > +	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
> > +	unsigned int xhlock_idx_work; /* For restoring at work exit */
> > +#endif
> >  #ifdef CONFIG_UBSAN
> >  	unsigned int in_ubsan;
> >  #endif
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 3076f30..cc56aad 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -883,6 +883,7 @@ void __noreturn do_exit(long code)
> >  	exit_rcu();
> >  	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
> >  
> > +	free_crossrelease_task(tsk);
> >  	do_task_dead();
> >  }
> >  EXPORT_SYMBOL_GPL(do_exit);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 997ac1d..f9623a0 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -451,6 +451,7 @@ void __init fork_init(void)
> >  	for (i = 0; i < UCOUNT_COUNTS; i++) {
> >  		init_user_ns.ucount_max[i] = max_threads/2;
> >  	}
> > +	init_crossrelease_task(&init_task);
> >  }
> >  
> >  int __weak arch_dup_task_struct(struct task_struct *dst,
> > @@ -1611,6 +1612,7 @@ static __latent_entropy struct task_struct *copy_process(
> >  	p->lockdep_depth = 0; /* no locks held yet */
> >  	p->curr_chain_key = 0;
> >  	p->lockdep_recursion = 0;
> > +	init_crossrelease_task(p);
> >  #endif
> >  
> >  #ifdef CONFIG_DEBUG_MUTEXES
> > @@ -1856,6 +1858,7 @@ static __latent_entropy struct task_struct *copy_process(
> >  bad_fork_cleanup_perf:
> >  	perf_event_free_task(p);
> >  bad_fork_cleanup_policy:
> > +	free_crossrelease_task(p);
> >  #ifdef CONFIG_NUMA
> >  	mpol_put(p->mempolicy);
> >  bad_fork_cleanup_threadgroup_lock:
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 2847356..63eb04a 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -55,6 +55,10 @@
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/lock.h>
> >  
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +#include <linux/slab.h>
> > +#endif
> > +
> >  #ifdef CONFIG_PROVE_LOCKING
> >  int prove_locking = 1;
> >  module_param(prove_locking, int, 0644);
> > @@ -709,6 +713,18 @@ static int count_matching_names(struct lock_class *new_class)
> >  	return NULL;
> >  }
> >  
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +static void cross_init(struct lockdep_map *lock, int cross);
> > +static int cross_lock(struct lockdep_map *lock);
> > +static int lock_acquire_crosslock(struct held_lock *hlock);
> > +static int lock_release_crosslock(struct lockdep_map *lock);
> > +#else
> > +static inline void cross_init(struct lockdep_map *lock, int cross) {}
> > +static inline int cross_lock(struct lockdep_map *lock) { return 0; }
> > +static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
> > +static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
> > +#endif
> > +
> >  /*
> >   * Register a lock's class in the hash-table, if the class is not present
> >   * yet. Otherwise we look it up. We cache the result in the lock object
> > @@ -1768,6 +1784,9 @@ static inline void inc_chains(void)
> >  		if (nest)
> >  			return 2;
> >  
> > +		if (cross_lock(prev->instance))
> > +			continue;
> > +
> >  		return print_deadlock_bug(curr, prev, next);
> >  	}
> >  	return 1;
> > @@ -1921,30 +1940,36 @@ static inline void inc_chains(void)
> >  		int distance = curr->lockdep_depth - depth + 1;
> >  		hlock = curr->held_locks + depth - 1;
> >  		/*
> > -		 * Only non-recursive-read entries get new dependencies
> > -		 * added:
> > +		 * Only non-crosslock entries get new dependencies added.
> > +		 * Crosslock entries will be added by commit later:
> >  		 */
> > -		if (hlock->read != 2 && hlock->check) {
> > -			int ret = check_prev_add(curr, hlock, next,
> > -						distance, &trace, save);
> > -			if (!ret)
> > -				return 0;
> > -
> > +		if (!cross_lock(hlock->instance)) {
> >  			/*
> > -			 * Stop saving stack_trace if save_trace() was
> > -			 * called at least once:
> > +			 * Only non-recursive-read entries get new dependencies
> > +			 * added:
> >  			 */
> > -			if (save && ret == 2)
> > -				save = NULL;
> > +			if (hlock->read != 2 && hlock->check) {
> > +				int ret = check_prev_add(curr, hlock, next,
> > +							 distance, &trace, save);
> > +				if (!ret)
> > +					return 0;
> >  
> > -			/*
> > -			 * Stop after the first non-trylock entry,
> > -			 * as non-trylock entries have added their
> > -			 * own direct dependencies already, so this
> > -			 * lock is connected to them indirectly:
> > -			 */
> > -			if (!hlock->trylock)
> > -				break;
> > +				/*
> > +				 * Stop saving stack_trace if save_trace() was
> > +				 * called at least once:
> > +				 */
> > +				if (save && ret == 2)
> > +					save = NULL;
> > +
> > +				/*
> > +				 * Stop after the first non-trylock entry,
> > +				 * as non-trylock entries have added their
> > +				 * own direct dependencies already, so this
> > +				 * lock is connected to them indirectly:
> > +				 */
> > +				if (!hlock->trylock)
> > +					break;
> > +			}
> >  		}
> >  		depth--;
> >  		/*
> > @@ -3203,7 +3228,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
> >  /*
> >   * Initialize a lock instance's lock-class mapping info:
> >   */
> > -void lockdep_init_map(struct lockdep_map *lock, const char *name,
> > +static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
> >  		      struct lock_class_key *key, int subclass)
> >  {
> >  	int i;
> > @@ -3261,8 +3286,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
> >  		raw_local_irq_restore(flags);
> >  	}
> >  }
> > +
> > +void lockdep_init_map(struct lockdep_map *lock, const char *name,
> > +		      struct lock_class_key *key, int subclass)
> > +{
> > +	cross_init(lock, 0);
> > +	__lockdep_init_map(lock, name, key, subclass);
> > +}
> >  EXPORT_SYMBOL_GPL(lockdep_init_map);
> >  
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
> > +		      struct lock_class_key *key, int subclass)
> > +{
> > +	cross_init(lock, 1);
> > +	__lockdep_init_map(lock, name, key, subclass);
> > +}
> > +EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
> > +#endif
> > +
> >  struct lock_class_key __lockdep_no_validate__;
> >  EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
> >  
> > @@ -3317,6 +3359,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >  	unsigned int depth;
> >  	int chain_head = 0;
> >  	int class_idx;
> > +	int ret;
> >  	u64 chain_key;
> >  
> >  	if (unlikely(!debug_locks))
> > @@ -3366,7 +3409,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >  
> >  	class_idx = class - lock_classes + 1;
> >  
> > -	if (depth) {
> > +	/* TODO: nest_lock is not implemented for crosslock yet. */
> > +	if (depth && !cross_lock(lock)) {
> >  		hlock = curr->held_locks + depth - 1;
> >  		if (hlock->class_idx == class_idx && nest_lock) {
> >  			if (hlock->references)
> > @@ -3447,6 +3491,14 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >  	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
> >  		return 0;
> >  
> > +	ret = lock_acquire_crosslock(hlock);
> > +	/*
> > +	 * 2 means normal acquire operations are needed. Otherwise, it's
> > +	 * ok just to return with '0:fail, 1:success'.
> > +	 */
> > +	if (ret != 2)
> > +		return ret;
> > +
> >  	curr->curr_chain_key = chain_key;
> >  	curr->lockdep_depth++;
> >  	check_chain_key(curr);
> > @@ -3610,11 +3662,19 @@ static int match_held_lock(struct held_lock *hlock, struct lockdep_map *lock)
> >  	struct task_struct *curr = current;
> >  	struct held_lock *hlock, *prev_hlock;
> >  	unsigned int depth;
> > -	int i;
> > +	int ret, i;
> >  
> >  	if (unlikely(!debug_locks))
> >  		return 0;
> >  
> > +	ret = lock_release_crosslock(lock);
> > +	/*
> > +	 * 2 means normal release operations are needed. Otherwise, it's
> > +	 * ok just to return with '0:fail, 1:success'.
> > +	 */
> > +	if (ret != 2)
> > +		return ret;
> > +
> >  	depth = curr->lockdep_depth;
> >  	/*
> >  	 * So we're all set to release this lock.. wait what lock? We don't
> > @@ -4557,3 +4617,371 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> >  	dump_stack();
> >  }
> >  EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
> > +
> > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > +
> > +#define xhlock(i)         (current->xhlocks[(i) % MAX_XHLOCKS_NR])
> > +
> > +/*
> > + * Whenever a crosslock is held, cross_gen_id will be increased.
> > + */
> > +static atomic_t cross_gen_id; /* Can be wrapped */
> > +
> > +void crossrelease_hardirq_start(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx_hard = current->xhlock_idx;
> > +}
> > +
> > +void crossrelease_hardirq_end(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx = current->xhlock_idx_hard;
> > +}
> > +
> > +void crossrelease_softirq_start(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx_soft = current->xhlock_idx;
> > +}
> > +
> > +void crossrelease_softirq_end(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx = current->xhlock_idx_soft;
> > +}
> > +
> > +/*
> > + * Each work of workqueue might run in a different context,
> > + * thanks to concurrency support of workqueue. So we have to
> > + * distinguish each work to avoid false positives.
> > + */
> > +void crossrelease_work_start(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx_work = current->xhlock_idx;
> > +}
> > +
> > +void crossrelease_work_end(void)
> > +{
> > +	if (current->xhlocks)
> > +		current->xhlock_idx = current->xhlock_idx_work;
> > +}
> > +
> > +static int cross_lock(struct lockdep_map *lock)
> > +{
> > +	return lock ? lock->cross : 0;
> > +}
> > +
> > +/*
> > + * This is needed to decide the relationship between wrappable variables.
> > + */
> > +static inline int before(unsigned int a, unsigned int b)
> > +{
> > +	return (int)(a - b) < 0;
> > +}
> > +
> > +static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
> > +{
> > +	return hlock_class(&xhlock->hlock);
> > +}
> > +
> > +static inline struct lock_class *xlock_class(struct cross_lock *xlock)
> > +{
> > +	return hlock_class(&xlock->hlock);
> > +}
> > +
> > +/*
> > + * Should we check a dependency with previous one?
> > + */
> > +static inline int depend_before(struct held_lock *hlock)
> > +{
> > +	return hlock->read != 2 && hlock->check && !hlock->trylock;
> > +}
> > +
> > +/*
> > + * Should we check a dependency with next one?
> > + */
> > +static inline int depend_after(struct held_lock *hlock)
> > +{
> > +	return hlock->read != 2 && hlock->check;
> > +}
> > +
> > +/*
> > + * Check if the xhlock is valid, which would be false if,
> > + *
> > + *    1. It has not been used since initialization.
> > + *
> > + * Remember that hist_lock is implemented as a ring buffer.
> > + */
> > +static inline int xhlock_valid(struct hist_lock *xhlock)
> > +{
> > +	/*
> > +	 * xhlock->hlock.instance must be !NULL.
> > +	 */
> > +	return !!xhlock->hlock.instance;
> > +}
> > +
> > +/*
> > + * Record a hist_lock entry.
> > + *
> > + * Only requires IRQs to be disabled.
> > + */
> > +static void add_xhlock(struct held_lock *hlock)
> > +{
> > +	unsigned int idx = ++current->xhlock_idx;
> > +	struct hist_lock *xhlock = &xhlock(idx);
> > +
> > +#ifdef CONFIG_DEBUG_LOCKDEP
> > +	/*
> > +	 * This can be done locklessly because they are all task-local
> > +	 * state, we must however ensure IRQs are disabled.
> > +	 */
> > +	WARN_ON_ONCE(!irqs_disabled());
> > +#endif
> > +
> > +	/* Initialize hist_lock's members */
> > +	xhlock->hlock = *hlock;
> > +
> > +	xhlock->trace.nr_entries = 0;
> > +	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> > +	xhlock->trace.entries = xhlock->trace_entries;
> > +	xhlock->trace.skip = 3;
> > +	save_stack_trace(&xhlock->trace);
> > +}
> > +
> > +static inline int same_context_xhlock(struct hist_lock *xhlock)
> > +{
> > +	return xhlock->hlock.irq_context == task_irq_context(current);
> > +}
> > +
> > +/*
> > + * This should be lockless as far as possible because this would be
> > + * called very frequently.
> > + */
> > +static void check_add_xhlock(struct held_lock *hlock)
> > +{
> > +	/*
> > +	 * Record a hist_lock, only in case that acquisitions ahead
> > +	 * could depend on the held_lock. For example, if the held_lock
> > +	 * is a trylock then acquisitions ahead never depend on it.
> > +	 * In that case, we don't need to record it. Just return.
> > +	 */
> > +	if (!current->xhlocks || !depend_before(hlock))
> > +		return;
> > +
> > +	add_xhlock(hlock);
> > +}
> > +
> > +/*
> > + * For crosslock.
> > + */
> > +static int add_xlock(struct held_lock *hlock)
> > +{
> > +	struct cross_lock *xlock;
> > +	unsigned int gen_id;
> > +
> > +	if (!graph_lock())
> > +		return 0;
> > +
> > +	xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
> > +
> > +	gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
> > +	xlock->hlock = *hlock;
> > +	xlock->hlock.gen_id = gen_id;
> > +	graph_unlock();
> > +
> > +	return 1;
> > +}
> > +
> > +/*
> > + * Called for both normal and crosslock acquires. Normal locks will be
> > + * pushed on the hist_lock queue. Cross locks will record state and
> > + * stop regular lock_acquire() to avoid being placed on the held_lock
> > + * stack.
> > + *
> > + * Return: 0 - failure;
> > + *         1 - crosslock, done;
> > + *         2 - normal lock, continue to held_lock[] ops.
> > + */
> > +static int lock_acquire_crosslock(struct held_lock *hlock)
> > +{
> > +	/*
> > +	 *	CONTEXT 1		CONTEXT 2
> > +	 *	---------		---------
> > +	 *	lock A (cross)
> > +	 *	X = atomic_inc_return(&cross_gen_id)
> > +	 *	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +	 *				Y = atomic_read_acquire(&cross_gen_id)
> > +	 *				lock B
> > +	 *
> > +	 * atomic_read_acquire() is for ordering between A and B,
> > +	 * IOW, A happens before B, when CONTEXT 2 see Y >= X.
> > +	 *
> > +	 * Pairs with atomic_inc_return() in add_xlock().
> > +	 */
> > +	hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
> > +
> > +	if (cross_lock(hlock->instance))
> > +		return add_xlock(hlock);
> > +
> > +	check_add_xhlock(hlock);
> > +	return 2;
> > +}
> > +
> > +static int copy_trace(struct stack_trace *trace)
> > +{
> > +	unsigned long *buf = stack_trace + nr_stack_trace_entries;
> > +	unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
> > +	unsigned int nr = min(max_nr, trace->nr_entries);
> > +
> > +	trace->nr_entries = nr;
> > +	memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
> > +	trace->entries = buf;
> > +	nr_stack_trace_entries += nr;
> > +
> > +	if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
> > +		if (!debug_locks_off_graph_unlock())
> > +			return 0;
> > +
> > +		print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> > +		dump_stack();
> > +
> > +		return 0;
> > +	}
> > +
> > +	return 1;
> > +}
> > +
> > +static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> > +{
> > +	unsigned int xid, pid;
> > +	u64 chain_key;
> > +
> > +	xid = xlock_class(xlock) - lock_classes;
> > +	chain_key = iterate_chain_key((u64)0, xid);
> > +	pid = xhlock_class(xhlock) - lock_classes;
> > +	chain_key = iterate_chain_key(chain_key, pid);
> > +
> > +	if (lookup_chain_cache(chain_key))
> > +		return 1;
> > +
> > +	if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
> > +				chain_key))
> > +		return 0;
> > +
> > +	if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
> > +			    &xhlock->trace, copy_trace))
> > +		return 0;
> > +
> > +	return 1;
> > +}
> > +
> > +static void commit_xhlocks(struct cross_lock *xlock)
> > +{
> > +	unsigned int cur = current->xhlock_idx;
> > +	unsigned int i;
> > +
> > +	if (!graph_lock())
> > +		return;
> > +
> > +	for (i = 0; i < MAX_XHLOCKS_NR; i++) {
> > +		struct hist_lock *xhlock = &xhlock(cur - i);
> > +
> > +		if (!xhlock_valid(xhlock))
> > +			break;
> > +
> > +		if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
> > +			break;
> > +
> > +		if (!same_context_xhlock(xhlock))
> > +			break;
> > +
> > +		/*
> > +		 * On failure, commit_xhlock() returns 0 with
> > +		 * graph_lock already released.
> > +		 */
> > +		if (!commit_xhlock(xlock, xhlock))
> > +			return;
> > +	}
> > +
> > +	graph_unlock();
> > +}
> > +
> > +void lock_commit_crosslock(struct lockdep_map *lock)
> > +{
> > +	struct cross_lock *xlock;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(!debug_locks || current->lockdep_recursion))
> > +		return;
> > +
> > +	if (!current->xhlocks)
> > +		return;
> > +
> > +	/*
> > +	 * Do commit hist_locks with the cross_lock, only in case that
> > +	 * the cross_lock could depend on acquisitions after that.
> > +	 *
> > +	 * For example, if the cross_lock does not have the 'check' flag
> > +	 * then we don't need to check dependencies and commit for that.
> > +	 * Just skip it. In that case, of course, the cross_lock does
> > +	 * not depend on acquisitions ahead, either.
> > +	 *
> > +	 * WARNING: Don't do that in add_xlock() in advance. When an
> > +	 * acquisition context is different from the commit context,
> > +	 * invalid(skipped) cross_lock might be accessed.
> > +	 */
> > +	if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
> > +		return;
> > +
> > +	raw_local_irq_save(flags);
> > +	check_flags(flags);
> > +	current->lockdep_recursion = 1;
> > +	xlock = &((struct lockdep_map_cross *)lock)->xlock;
> > +	commit_xhlocks(xlock);
> > +	current->lockdep_recursion = 0;
> > +	raw_local_irq_restore(flags);
> > +}
> > +EXPORT_SYMBOL_GPL(lock_commit_crosslock);
> > +
> > +/*
> > + * Return: 1 - crosslock, done;
> > + *         2 - normal lock, continue to held_lock[] ops.
> > + */
> > +static int lock_release_crosslock(struct lockdep_map *lock)
> > +{
> > +	return cross_lock(lock) ? 1 : 2;
> > +}
> > +
> > +static void cross_init(struct lockdep_map *lock, int cross)
> > +{
> > +	lock->cross = cross;
> > +
> > +	/*
> > +	 * Crossrelease assumes that the ring buffer size of xhlocks
> > +	 * is a power of 2. So enforce it at build time.
> > +	 */
> > +	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
> > +}
> > +
> > +void init_crossrelease_task(struct task_struct *task)
> > +{
> > +	task->xhlock_idx = UINT_MAX;
> > +	task->xhlock_idx_soft = UINT_MAX;
> > +	task->xhlock_idx_hard = UINT_MAX;
> > +	task->xhlock_idx_work = UINT_MAX;
> > +	task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
> > +				GFP_KERNEL);
> > +}
> > +
> > +void free_crossrelease_task(struct task_struct *task)
> > +{
> > +	if (task->xhlocks) {
> > +		void *tmp = task->xhlocks;
> > +		/* Disable crossrelease for current */
> > +		task->xhlocks = NULL;
> > +		kfree(tmp);
> > +	}
> > +}
> > +#endif
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 479d840..2f43ac1 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2092,6 +2092,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
> >  
> >  	lock_map_acquire_read(&pwq->wq->lockdep_map);
> >  	lock_map_acquire(&lockdep_map);
> > +	crossrelease_work_start();
> >  	trace_workqueue_execute_start(work);
> >  	worker->current_func(work);
> >  	/*
> > @@ -2099,6 +2100,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
> >  	 * point will only record its address.
> >  	 */
> >  	trace_workqueue_execute_end(work);
> > +	crossrelease_work_end();
> >  	lock_map_release(&lockdep_map);
> >  	lock_map_release(&pwq->wq->lockdep_map);
> >  
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index a6c8db1..e584431 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1042,6 +1042,18 @@ config DEBUG_LOCK_ALLOC
> >  	 spin_lock_init()/mutex_init()/etc., or whether there is any lock
> >  	 held during task exit.
> >  
> > +config LOCKDEP_CROSSRELEASE
> > +	bool "Lock debugging: make lockdep work for crosslocks"
> > +	depends on PROVE_LOCKING
> > +	default n
> > +	help
> > +	 This makes lockdep work for crosslocks, which are locks allowed to
> > +	 be released in a different context from the acquisition context.
> > +	 Normally a lock must be released in the context that acquired it.
> > +	 However, relaxing this constraint lets synchronization primitives
> > +	 such as page locks or completions use the lock correctness
> > +	 detector, lockdep.
> > +
> >  config PROVE_LOCKING
> >  	bool "Lock debugging: prove locking correctness"
> >  	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
> > -- 
> > 1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH v7 05/16] lockdep: Implement crossrelease feature
  2017-05-24  8:59 ` [PATCH v7 05/16] lockdep: Implement crossrelease feature Byungchul Park
  2017-06-13  0:33   ` Byungchul Park
@ 2017-07-11 16:04   ` Peter Zijlstra
  2017-07-12  2:24     ` Byungchul Park
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-11 16:04 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team


Sorry for the much delayed response; aside from the usual backlog I got
unusually held up by family responsibilities.

My comments in the form of a patch..


--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -542,10 +542,10 @@ extern void crossrelease_hardirq_start(v
 extern void crossrelease_hardirq_end(void);
 extern void crossrelease_softirq_start(void);
 extern void crossrelease_softirq_end(void);
-extern void crossrelease_work_start(void);
-extern void crossrelease_work_end(void);
-extern void init_crossrelease_task(struct task_struct *task);
-extern void free_crossrelease_task(struct task_struct *task);
+extern void crossrelease_hist_start(void);
+extern void crossrelease_hist_end(void);
+extern void lockdep_init_task(struct task_struct *task);
+extern void lockdep_free_task(struct task_struct *task);
 #else
 /*
  * To initialize a lockdep_map statically use this macro.
@@ -558,10 +558,10 @@ static inline void crossrelease_hardirq_
 static inline void crossrelease_hardirq_end(void) {}
 static inline void crossrelease_softirq_start(void) {}
 static inline void crossrelease_softirq_end(void) {}
-static inline void crossrelease_work_start(void) {}
-static inline void crossrelease_work_end(void) {}
-static inline void init_crossrelease_task(struct task_struct *task) {}
-static inline void free_crossrelease_task(struct task_struct *task) {}
+static inline void crossrelease_hist_start(void) {}
+static inline void crossrelease_hist_end(void) {}
+static inline void lockdep_init_task(struct task_struct *task) {}
+static inline void lockdep_free_task(struct task_struct *task) {}
 #endif
 
 #ifdef CONFIG_LOCK_STAT
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -821,7 +821,7 @@ struct task_struct {
 	unsigned int xhlock_idx;
 	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
 	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
-	unsigned int xhlock_idx_work; /* For restoring at work exit */
+	unsigned int xhlock_idx_hist; /* For restoring at history boundaries */
 #endif
 #ifdef CONFIG_UBSAN
 	unsigned int			in_ubsan;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -933,7 +933,7 @@ void __noreturn do_exit(long code)
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
 
-	free_crossrelease_task(tsk);
+	lockdep_free_task(tsk);
 	do_task_dead();
 }
 EXPORT_SYMBOL_GPL(do_exit);
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -485,7 +485,7 @@ void __init fork_init(void)
 	for (i = 0; i < UCOUNT_COUNTS; i++) {
 		init_user_ns.ucount_max[i] = max_threads/2;
 	}
-	init_crossrelease_task(&init_task);
+	lockdep_init_task(&init_task);
 
 #ifdef CONFIG_VMAP_STACK
 	cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache",
@@ -1694,7 +1694,7 @@ static __latent_entropy struct task_stru
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
-	init_crossrelease_task(p);
+	lockdep_init_task(p);
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
@@ -1953,7 +1953,7 @@ static __latent_entropy struct task_stru
 bad_fork_cleanup_perf:
 	perf_event_free_task(p);
 bad_fork_cleanup_policy:
-	free_crossrelease_task(p);
+	lockdep_free_task(p);
 #ifdef CONFIG_NUMA
 	mpol_put(p->mempolicy);
 bad_fork_cleanup_threadgroup_lock:
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3381,8 +3381,8 @@ static int __lock_acquire(struct lockdep
 	unsigned int depth;
 	int chain_head = 0;
 	int class_idx;
-	int ret;
 	u64 chain_key;
+	int ret;
 
 	if (unlikely(!debug_locks))
 		return 0;
@@ -4653,6 +4653,13 @@ asmlinkage __visible void lockdep_sys_ex
 				curr->comm, curr->pid);
 		lockdep_print_held_locks(curr);
 	}
+
+	/*
+	 * The lock history for each syscall should be independent. So wipe the
+	 * slate clean on return to userspace.
+	 */
+	crossrelease_hist_end();
+	crossrelease_hist_start();
 }
 
 void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
@@ -4708,6 +4715,29 @@ EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious
 
 #ifdef CONFIG_LOCKDEP_CROSSRELEASE
 
+/*
+ * Crossrelease works by recording a lock history for each thread and
+ * connecting those historic locks that were taken after the
+ * wait_for_completion() in the complete() context.
+ *
+ * Task-A				Task-B
+ *
+ *					mutex_lock(&A);
+ *					mutex_unlock(&A);
+ *
+ * wait_for_completion(&C);
+ *   lock_acquire_crosslock();
+ *     atomic_inc_return(&cross_gen_id);
+ *                                |
+ *				  |	mutex_lock(&B);
+ *				  |	mutex_unlock(&B);
+ *                                |
+ *				  |	complete(&C);
+ *				  `--	  lock_commit_crosslock();
+ *
+ * Which will then add a dependency between B and C.
+ */
+
 #define xhlock(i)         (current->xhlocks[(i) % MAX_XHLOCKS_NR])
 
 /*
@@ -4715,6 +4745,25 @@ EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious
  */
 static atomic_t cross_gen_id; /* Can be wrapped */
 
+/*
+ * Lock history stacks; we have 3 nested lock history stacks:
+ *
+ *   Hard IRQ
+ *   Soft IRQ
+ *   History / Task
+ *
+ * The thing is that once we complete a (Hard/Soft) IRQ the future task locks
+ * should not depend on any of the locks observed while running the IRQ.
+ *
+ * So what we do is rewind the history buffer and erase all our knowledge of
+ * that temporal event.
+ *
+ * If the rewind wraps the history ring buffer ... XXX explain how we'll
+ * discard stuff. I cannot readily find how a rewind of exactly MAX_XHLOCKS_NR
+ * is not a NOP... should we make xhlock_valid() trigger when the rewind >=
+ * MAX_XHLOCKS_NR ? Possibly re-introduce hist_gen_id ?
+ */
+
 void crossrelease_hardirq_start(void)
 {
 	if (current->xhlocks)
@@ -4740,20 +4789,31 @@ void crossrelease_softirq_end(void)
 }
 
 /*
- * Each work of workqueue might run in a different context,
- * thanks to concurrency support of workqueue. So we have to
- * distinguish each work to avoid false positive.
+ * We need this to annotate lock history boundaries. Take for instance
+ * workqueues; each work is independent of the last. The completion of a future
+ * work does not depend on the completion of a past work (in general).
+ * Therefore we must not carry that (lock) dependency across works.
+ *
+ * This is true for many things; pretty much all kthreads fall into this
+ * pattern, where they have an 'idle' state and future completions do not
+ * depend on past completions. It's just that since they all have the 'same'
+ * form -- the kthread does the same over and over -- it doesn't typically
+ * matter.
+ *
+ * The same is true for system-calls, once a system call is completed (we've
+ * returned to userspace) the next system call does not depend on the lock
+ * history of the previous system call.
  */
-void crossrelease_work_start(void)
+void crossrelease_hist_start(void)
 {
 	if (current->xhlocks)
-		current->xhlock_idx_work = current->xhlock_idx;
+		current->xhlock_idx_hist = current->xhlock_idx;
 }
 
-void crossrelease_work_end(void)
+void crossrelease_hist_end(void)
 {
 	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_work;
+		current->xhlock_idx = current->xhlock_idx_hist;
 }
 
 static int cross_lock(struct lockdep_map *lock)
@@ -5053,17 +5113,17 @@ static void cross_init(struct lockdep_ma
 	BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
 }
 
-void init_crossrelease_task(struct task_struct *task)
+void lockdep_init_task(struct task_struct *task)
 {
 	task->xhlock_idx = UINT_MAX;
 	task->xhlock_idx_soft = UINT_MAX;
 	task->xhlock_idx_hard = UINT_MAX;
-	task->xhlock_idx_work = UINT_MAX;
+	task->xhlock_idx_hist = UINT_MAX;
 	task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
 				GFP_KERNEL);
 }
 
-void free_crossrelease_task(struct task_struct *task)
+void lockdep_free_task(struct task_struct *task)
 {
 	if (task->xhlocks) {
 		void *tmp = task->xhlocks;
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2093,7 +2093,7 @@ __acquires(&pool->lock)
 
 	lock_map_acquire_read(&pwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
-	crossrelease_work_start();
+	crossrelease_hist_start();
 	trace_workqueue_execute_start(work);
 	worker->current_func(work);
 	/*
@@ -2101,7 +2101,7 @@ __acquires(&pool->lock)
 	 * point will only record its address.
 	 */
 	trace_workqueue_execute_end(work);
-	crossrelease_work_end();
+	crossrelease_hist_end();
 	lock_map_release(&lockdep_map);
 	lock_map_release(&pwq->wq->lockdep_map);
 


* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-05-24  8:59 ` [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite Byungchul Park
@ 2017-07-11 16:12   ` Peter Zijlstra
  2017-07-12  2:00     ` Byungchul Park
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-11 16:12 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team


ARGH!!! please, if there are known holes in patches, put a comment in.

I now had to independently discover this problem during review of the
last patch.

On Wed, May 24, 2017 at 05:59:39PM +0900, Byungchul Park wrote:
> The ring buffer can be overwritten by hardirq/softirq/work contexts.
> Those cases must be considered on rollback or commit. For example,
> 
>           |<------ hist_lock ring buffer size ----->|
>           ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
> 
>           where 'p' represents an acquisition in process context,
>           'i' represents an acquisition in irq context.
> 
> On irq exit, crossrelease tries to roll back idx to its original position,
> but it should not, because that entry has already been invalidated by
> the overwriting 'i' entries. Avoid rollback or commit for overwritten entries.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  include/linux/lockdep.h  | 20 +++++++++++
>  include/linux/sched.h    |  4 +++
>  kernel/locking/lockdep.c | 92 +++++++++++++++++++++++++++++++++++++++++-------
>  3 files changed, 104 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index d531097..a03f79d 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -284,6 +284,26 @@ struct held_lock {
>   */
>  struct hist_lock {
>  	/*
> +	 * Id for each entry in the ring buffer. This is used to
> +	 * decide whether the ring buffer was overwritten or not.
> +	 *
> +	 * For example,
> +	 *
> +	 *           |<----------- hist_lock ring buffer size ------->|
> +	 *           pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> +	 * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> +	 *
> +	 *           where 'p' represents an acquisition in process
> +	 *           context, 'i' represents an acquisition in irq
> +	 *           context.
> +	 *
> +	 * In this example, the ring buffer was overwritten by
> +	 * acquisitions in irq context, that should be detected on
> +	 * rollback or commit.
> +	 */
> +	unsigned int hist_id;
> +
> +	/*
>  	 * Seperate stack_trace data. This will be used at commit step.
>  	 */
>  	struct stack_trace	trace;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5f6d6f4..9e1437c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1756,6 +1756,10 @@ struct task_struct {
>  	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
>  	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
>  	unsigned int xhlock_idx_work; /* For restoring at work exit */
> +	unsigned int hist_id;
> +	unsigned int hist_id_soft; /* For overwrite check at softirq exit */
> +	unsigned int hist_id_hard; /* For overwrite check at hardirq exit */
> +	unsigned int hist_id_work; /* For overwrite check at work exit */
>  #endif
>  #ifdef CONFIG_UBSAN
>  	unsigned int in_ubsan;


Right, like I wrote in the comment; I don't think you need quite this
much.

The problem only happens if you rewind more than MAX_XHLOCKS_NR;
although I realize it can be an accumulative rewind, which makes it
slightly more tricky.

We can either make the rewind more expensive and make xhlock_valid()
false for each rewound entry; or we can keep the max_idx and account
from there. If we rewind >= MAX_XHLOCKS_NR from the max_idx we need to
invalidate the entire state, which we can do by invalidating
xhlock_valid() or by re-introduction of the hist_gen_id. When we
invalidate the entire state, we can also clear the max_idx.

Given that rewinding _that_ far should be fairly rare (do we have
numbers?) simply iterating the entire thing and setting
xhlock->hlock.instance = NULL, should work I think.
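
A minimal sketch of the max_idx accounting described above, not taken from
the series. `struct hist`, `hist_add()`, `hist_rewind()` and the tiny
`MAX_XHLOCKS_NR` value are assumptions made for the illustration; only the
idea of clearing `instance` so that xhlock_valid() turns false comes from
this thread. It does not try to deal with stale entries left ahead of the
index after a short rewind.

```c
#include <stddef.h>

#define MAX_XHLOCKS_NR 4U	/* tiny power of 2, for illustration only */

struct hist_entry {
	void *instance;		/* NULL means "not valid" */
};

struct hist {
	struct hist_entry ring[MAX_XHLOCKS_NR];
	unsigned int idx;	/* index of the most recent entry */
	unsigned int max_idx;	/* furthest index ever written */
};

static void hist_init(struct hist *h)
{
	unsigned int i;

	for (i = 0; i < MAX_XHLOCKS_NR; i++)
		h->ring[i].instance = NULL;
	h->idx = (unsigned int)-1;	/* first add lands on slot 0 */
	h->max_idx = (unsigned int)-1;
}

static void hist_add(struct hist *h, void *instance)
{
	h->idx++;
	h->ring[h->idx % MAX_XHLOCKS_NR].instance = instance;
	/* Wrap-safe "idx is ahead of max_idx" test. */
	if ((int)(h->idx - h->max_idx) > 0)
		h->max_idx = h->idx;
}

/* Rewind to a saved index; wipe everything if the saved slot was reused. */
static void hist_rewind(struct hist *h, unsigned int saved_idx)
{
	unsigned int i;

	h->idx = saved_idx;
	if (h->max_idx - saved_idx < MAX_XHLOCKS_NR)
		return;		/* saved entry is still intact */

	for (i = 0; i < MAX_XHLOCKS_NR; i++)
		h->ring[i].instance = NULL;
	h->max_idx = saved_idx;	/* reset the high-water mark as well */
}

static int hist_valid(const struct hist *h, unsigned int idx)
{
	return h->ring[idx % MAX_XHLOCKS_NR].instance != NULL;
}
```

With a ring of 4, rewinding over 2 entries leaves the saved entry usable,
while rewinding over 4 wipes the whole ring, since the saved slot has been
reused by then.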


* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-11 16:12   ` Peter Zijlstra
@ 2017-07-12  2:00     ` Byungchul Park
  2017-07-12  7:56       ` Peter Zijlstra
  0 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-12  2:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Tue, Jul 11, 2017 at 06:12:32PM +0200, Peter Zijlstra wrote:
> 
> ARGH!!! please, if there are known holes in patches, put a comment in.

The fourth item of the latest changelog ('handle cases the ring buffer was
overwritten') was meant as that comment, but it was not enough. I will add
more comments in such cases.

> I now had to independently discover this problem during review of the
> last patch.
> 

...

> 
> Right, like I wrote in the comment; I don't think you need quite this
> much.
> 
> The problem only happens if you rewind more than MAX_XHLOCKS_NR;
> although I realize it can be an accumulative rewind, which makes it
> slightly more tricky.
> 
> We can either make the rewind more expensive and make xhlock_valid()
> false for each rewound entry; or we can keep the max_idx and account

Does max_idx mean the 'original position - 1'?

> from there. If we rewind >= MAX_XHLOCKS_NR from the max_idx we need to
> invalidate the entire state, which we can do by invalidating

Could you explain what the entire state is?

> xhlock_valid() or by re-introduction of the hist_gen_id. When we

What does the re-introduction of the hist_gen_id mean?

> invalidate the entire state, we can also clear the max_idx.


* Re: [PATCH v7 05/16] lockdep: Implement crossrelease feature
  2017-07-11 16:04   ` Peter Zijlstra
@ 2017-07-12  2:24     ` Byungchul Park
  0 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-12  2:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Tue, Jul 11, 2017 at 06:04:54PM +0200, Peter Zijlstra wrote:
> 
> Sorry for the much delayed response; aside from the usual backlog I got
> unusually held up by family responsibilities.
> 
> My comments in the form of a patch..

Thank you.

I will apply it at the next spin.


* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-12  2:00     ` Byungchul Park
@ 2017-07-12  7:56       ` Peter Zijlstra
  2017-07-13  2:07         ` Byungchul Park
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-12  7:56 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Wed, Jul 12, 2017 at 11:00:53AM +0900, Byungchul Park wrote:
> On Tue, Jul 11, 2017 at 06:12:32PM +0200, Peter Zijlstra wrote:

> > Right, like I wrote in the comment; I don't think you need quite this
> > much.
> > 
> > The problem only happens if you rewind more than MAX_XHLOCKS_NR;
> > although I realize it can be an accumulative rewind, which makes it
> > slightly more tricky.
> > 
> > We can either make the rewind more expensive and make xhlock_valid()
> > false for each rewound entry; or we can keep the max_idx and account
> 
> Does max_idx mean the 'original position - 1'?

	orig_idx = current->hist_idx;
	current->hist_idx++;
	if ((int)(current->hist_idx - orig_idx) > 0)
	  current->hist_idx_max = current->hist_idx;


I've forgotten if the idx points to the most recent entry or beyond it.

Given the circular nature, and tail being one ahead of head, the max
effectively tracks the tail (I suppose we can also do an explicit tail
tracking, but that might end up more difficult).

This allows rewinds of less than array_size() while still maintaining a
correct tail.

Only once we (cumulatively or not) rewind past the tail -- iow, lose the
_entire_ history, do we need to do something drastic.
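
The signed-difference test in the snippet above is the same wraparound trick
as before() in the patch under review. A small sketch of it follows;
`idx_after()` and `idx_advance()` are hypothetical names, and comparing
against the stored maximum (rather than the previous index) is an assumed
reading of the intent. The cast relies on the usual two's complement
conversion, as the kernel does.

```c
#include <limits.h>

/*
 * Nonzero if index a is "ahead of" index b on a wrapping unsigned counter.
 * Only meaningful while the two are less than 2^31 apart.
 */
static inline int idx_after(unsigned int a, unsigned int b)
{
	return (int)(a - b) > 0;
}

/* Advance *idx by one and raise *max only when idx moved past it. */
static inline void idx_advance(unsigned int *idx, unsigned int *max)
{
	(*idx)++;
	if (idx_after(*idx, *max))
		*max = *idx;
}
```

Starting just below UINT_MAX and advancing twice leaves both values at 0, so
the maximum follows the index across the wrap; advancing again after a rewind
leaves the maximum alone until the index passes it.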

> > from there. If we rewind >= MAX_XHLOCKS_NR from the max_idx we need to
> > invalidate the entire state, which we can do by invalidating
> 
> Could you explain what the entire state is?

All hist_lock[]. Did the above help?

> > xhlock_valid() or by re-introduction of the hist_gen_id. When we
> 
> What does the re-introduction of the hist_gen_id mean?

What you used to call work_id or something like that. A generation count
for the hist_lock[].


* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-12  7:56       ` Peter Zijlstra
@ 2017-07-13  2:07         ` Byungchul Park
  2017-07-13  8:14           ` Peter Zijlstra
  0 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-13  2:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Wed, Jul 12, 2017 at 09:56:17AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 11:00:53AM +0900, Byungchul Park wrote:
> > On Tue, Jul 11, 2017 at 06:12:32PM +0200, Peter Zijlstra wrote:
> 
> > > Right, like I wrote in the comment; I don't think you need quite this
> > > much.
> > > 
> > > The problem only happens if you rewind more than MAX_XHLOCKS_NR;
> > > although I realize it can be an accumulative rewind, which makes it
> > > slightly more tricky.
> > > 
> > > We can either make the rewind more expensive and make xhlock_valid()
> > > false for each rewound entry; or we can keep the max_idx and account
> > 
> > Does max_idx mean the 'original position - 1'?
> 
> 	orig_idx = current->hist_idx;
> 	current->hist_idx++;
> 	if ((int)(current->hist_idx - orig_idx) > 0)
> 	  current->hist_idx_max = current->hist_idx;
> 
> 
> I've forgotten if the idx points to the most recent entry or beyond it.
> 
> Given the circular nature, and tail being one ahead of head, the max
> effectively tracks the tail (I suppose we can also do an explicit tail
> tracking, but that might end up more difficult).
> 
> This allows rewinds of less than array_size() while still maintaining a
> correct tail.
> 
> Only once we (cumulatively or not) rewind past the tail -- iow, lose the
> _entire_ history, do we need to do something drastic.

I am sorry, but I don't understand why we have to do the drastic work.

Does my approach, rewinding to the 'original idx' on exit and then deciding
whether it was overwritten, have a problem? I think that way there is no
need for the drastic work. Or does mine add more overhead in the usual case?

> 
> > > from there. If we rewind >= MAX_XHLOCKS_NR from the max_idx we need to
> > > invalidate the entire state, which we can do by invalidating
> > 
> > Could you explain what the entire state is?
> 
> All hist_lock[]. Did the above help?
> 
> > > xhlock_valid() or by re-introduction of the hist_gen_id. When we
> > 
> > What does the re-introduction of the hist_gen_id mean?
> 
> What you used to call work_id or something like that. A generation count
> for the hist_lock[].


* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13  2:07         ` Byungchul Park
@ 2017-07-13  8:14           ` Peter Zijlstra
  2017-07-13  8:57             ` Byungchul Park
  2017-07-18  1:25             ` Byungchul Park
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-13  8:14 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 11:07:45AM +0900, Byungchul Park wrote:
> Does my approach have problems, rewinding to the 'original idx' on exit
> and deciding whether it was overwritten or not? I think that, this way,
> there is no need to do the drastic work. Or does mine add more overhead
> in the usual case?

So I think that invalidating just the one entry doesn't work; the moment
you fill that up the iteration in commit_xhlocks() will again use the
next one etc.. even though you wanted it not to.

So we need to wipe the _entire_ history.

So I _think_ the below should work, but it's not been near a compiler.


--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -822,6 +822,7 @@ struct task_struct {
 	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
 	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
 	unsigned int xhlock_idx_hist; /* For restoring at history boundaries */
+	unsigned int xhlock_idx_max;
 #endif
 #ifdef CONFIG_UBSAN
 	unsigned int			in_ubsan;
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4746,6 +4746,14 @@ EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious
 static atomic_t cross_gen_id; /* Can be wrapped */
 
 /*
+ * make xhlock_valid() false.
+ */
+static inline void invalidate_xhlock(struct hist_lock *xhlock)
+{
+	xhlock->hlock.instance = NULL;
+}
+
+/*
  * Lock history stacks; we have 3 nested lock history stacks:
  *
  *   Hard IRQ
@@ -4764,28 +4772,58 @@ static atomic_t cross_gen_id; /* Can be
  * MAX_XHLOCKS_NR ? Possibly re-instroduce hist_gen_id ?
  */
 
-void crossrelease_hardirq_start(void)
+static inline void __crossrelease_start(unsigned int *stamp)
 {
 	if (current->xhlocks)
-		current->xhlock_idx_hard = current->xhlock_idx;
+		*stamp = current->xhlock_idx;
+}
+
+static void __crossrelease_end(unsigned int *stamp)
+{
+	int i;
+
+	if (!current->xhlocks)
+		return;
+
+	current->xhlock_idx = *stamp;
+
+	/*
+	 * If we rewind past the tail; all of history is lost.
+	 */
+	if ((current->xhlock_idx_max - *stamp) < MAX_XHLOCKS_NR)
+		return;
+
+	/*
+	 * Invalidate the entire history..
+	 */
+	for (i = 0; i < MAX_XHLOCKS_NR; i++)
+		invalidate_xhlock(&xhlock(i));
+
+	current->xhlock_idx = 0;
+	current->xhlock_idx_hard = 0;
+	current->xhlock_idx_soft = 0;
+	current->xhlock_idx_hist = 0;
+	current->xhlock_idx_max = 0;
+}
+
+void crossrelease_hardirq_start(void)
+{
+	__crossrelease_start(&current->xhlock_idx_hard);
 }
 
 void crossrelease_hardirq_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_hard;
+	__crossrelease_end(&current->xhlock_idx_hard);
 }
 
 void crossrelease_softirq_start(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx_soft = current->xhlock_idx;
+	__crossrelease_start(&current->xhlock_idx_soft);
 }
 
 void crossrelease_softirq_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_soft;
+	__crossrelease_end(&current->xhlock_idx_soft);
 }
 
 /*
@@ -4806,14 +4844,12 @@ void crossrelease_softirq_end(void)
  */
 void crossrelease_hist_start(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx_hist = current->xhlock_idx;
+	__crossrelease_start(&current->xhlock_idx_hist);
 }
 
 void crossrelease_hist_end(void)
 {
-	if (current->xhlocks)
-		current->xhlock_idx = current->xhlock_idx_hist;
+	__crossrelease_end(&current->xhlock_idx_hist);
 }
 
 static int cross_lock(struct lockdep_map *lock)
@@ -4880,6 +4916,9 @@ static void add_xhlock(struct held_lock
 	unsigned int idx = ++current->xhlock_idx;
 	struct hist_lock *xhlock = &xhlock(idx);
 
+	if ((int)(current->xhlock_idx_max - idx) < 0)
+		current->xhlock_idx_max = idx;
+
 #ifdef CONFIG_DEBUG_LOCKDEP
 	/*
 	 * This can be done locklessly because they are all task-local
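
A userspace model of the rewind-and-wipe logic in __crossrelease_end() above may make the threshold easier to see. This is only a sketch: struct task_hist, hist_add() and hist_end() are hypothetical names, NR_ENTRIES stands in for MAX_XHLOCKS_NR, and the nested hard/soft/hist stamps are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 64u	/* stand-in for MAX_XHLOCKS_NR, assumed power of 2 */

struct entry {
	const void *instance;	/* NULL means the entry is invalid */
};

struct task_hist {
	struct entry ring[NR_ENTRIES];
	unsigned int idx;	/* free-running index of the newest entry */
	unsigned int idx_max;	/* furthest point idx has ever reached */
};

/* Record one acquisition and keep idx_max at the high-water mark. */
static void hist_add(struct task_hist *t, const void *lock)
{
	unsigned int idx = ++t->idx;

	t->ring[idx % NR_ENTRIES].instance = lock;
	if ((int)(t->idx_max - idx) < 0)
		t->idx_max = idx;
}

/*
 * Rewind to stamp. If the ring was filled a full lap past the stamp,
 * the entry at the stamp is gone, so drop everything. Returns true
 * when the whole history had to be wiped.
 */
static bool hist_end(struct task_hist *t, unsigned int stamp)
{
	t->idx = stamp;

	if ((t->idx_max - stamp) < NR_ENTRIES)
		return false;

	for (unsigned int i = 0; i < NR_ENTRIES; i++)
		t->ring[i].instance = NULL;

	t->idx = 0;
	t->idx_max = 0;
	return true;
}
```

With a stamp taken at index 1, ten further entries followed by hist_end() leave the history alone; 64 further entries overwrite slot 1 itself and trigger the wipe.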


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13  8:14           ` Peter Zijlstra
@ 2017-07-13  8:57             ` Byungchul Park
  2017-07-13  9:50               ` Peter Zijlstra
  2017-07-18  1:25             ` Byungchul Park
  1 sibling, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-13  8:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 10:14:42AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 11:07:45AM +0900, Byungchul Park wrote:
> > Does my approach have problems, rewinding to the 'original idx' on exit
> > and deciding whether it was overwritten or not? I think that, this way,
> > there is no need to do the drastic work. Or does mine add more overhead
> > in the usual case?
> 
> So I think that invalidating just the one entry doesn't work; the moment

I think invalidating just the one is enough. After rewinding, that entry
is invalidated and the ring buffer is filled forward from that point
with valid entries. On commit, the walk proceeds backward over valid
entries until it meets the invalidated entry, and stops there.

IOW, in case of (overwritten)

         rewind to here
         |
ppppppppppiiiiiiiiiiiiiiii
iiiiiiiiiiiiiii

         invalidate it on exit_irq
         and start to fill from here again
         |
pppppppppxiiiiiiiiiiiiiiii
iiiiiiiiiiiiiii

                    when commit occurs here
                    |
pppppppppxpppppppppppiiiii

         do commit within this range
         |<---------|
pppppppppxpppppppppppiiiii

So I think this works and is much simpler. Did I miss anything?
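
The backward walk in the diagrams above can be sketched in userspace roughly as follows. It reuses the diagram's own letters ('p', 'i', and 'x' for the invalidated entry); RING_LEN and commit_span() are hypothetical names, and the real ring has MAX_XHLOCKS_NR entries rather than the diagram's width.

```c
#include <stddef.h>

#define RING_LEN 26u	/* width of the diagram, not the real ring size */

/*
 * Walk backward from position `from`, counting entries until an
 * invalidated one ('x') is met or the whole ring has been visited.
 */
static size_t commit_span(const char ring[RING_LEN], size_t from)
{
	size_t n = 0;
	size_t pos = from;

	while (n < RING_LEN && ring[pos] != 'x') {
		n++;
		pos = (pos + RING_LEN - 1) % RING_LEN;
	}
	return n;
}
```

For the last diagram, a commit at column 20 with the 'x' at column 9 covers the 11 entries in columns 10..20. Without the 'x', the walk would go all the way around the ring.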

> you fill that up the iteration in commit_xhlocks() will again use the
> next one etc.. even though you wanted it not to.
> 
> So we need to wipe the _entire_ history.
> 
> So I _think_ the below should work, but it's not been near a compiler.
> 
> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -822,6 +822,7 @@ struct task_struct {
>  	unsigned int xhlock_idx_soft; /* For restoring at softirq exit */
>  	unsigned int xhlock_idx_hard; /* For restoring at hardirq exit */
>  	unsigned int xhlock_idx_hist; /* For restoring at history boundaries */
> +	unsigned int xhlock_idx_max;
>  #endif
>  #ifdef CONFIG_UBSAN
>  	unsigned int			in_ubsan;
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4746,6 +4746,14 @@ EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious
>  static atomic_t cross_gen_id; /* Can be wrapped */
>  
>  /*
> + * make xhlock_valid() false.
> + */
> +static inline void invalidate_xhlock(struct hist_lock *xhlock)
> +{
> +	xhlock->hlock.instance = NULL;
> +}
> +
> +/*
>   * Lock history stacks; we have 3 nested lock history stacks:
>   *
>   *   Hard IRQ
> @@ -4764,28 +4772,58 @@ static atomic_t cross_gen_id; /* Can be
>   * MAX_XHLOCKS_NR ? Possibly re-instroduce hist_gen_id ?
>   */
>  
> -void crossrelease_hardirq_start(void)
> +static inline void __crossrelease_start(unsigned int *stamp)
>  {
>  	if (current->xhlocks)
> -		current->xhlock_idx_hard = current->xhlock_idx;
> +		*stamp = current->xhlock_idx;
> +}
> +
> +static void __crossrelease_end(unsigned int *stamp)
> +{
> +	int i;
> +
> +	if (!current->xhlocks)
> +		return;
> +
> +	current->xhlock_idx = *stamp;
> +
> +	/*
> +	 * If we rewind past the tail; all of history is lost.
> +	 */
> +	if ((current->xhlock_idx_max - *stamp) < MAX_XHLOCKS_NR)
> +		return;
> +
> +	/*
> +	 * Invalidate the entire history..
> +	 */
> +	for (i = 0; i < MAX_XHLOCKS_NR; i++)
> +		invalidate_xhlock(&xhlock(i));
> +
> +	current->xhlock_idx = 0;
> +	current->xhlock_idx_hard = 0;
> +	current->xhlock_idx_soft = 0;
> +	current->xhlock_idx_hist = 0;
> +	current->xhlock_idx_max = 0;
> +}
> +
> +void crossrelease_hardirq_start(void)
> +{
> +	__crossrelease_start(&current->xhlock_idx_hard);
>  }
>  
>  void crossrelease_hardirq_end(void)
>  {
> -	if (current->xhlocks)
> -		current->xhlock_idx = current->xhlock_idx_hard;
> +	__crossrelease_end(&current->xhlock_idx_hard);
>  }
>  
>  void crossrelease_softirq_start(void)
>  {
> -	if (current->xhlocks)
> -		current->xhlock_idx_soft = current->xhlock_idx;
> +	__crossrelease_start(&current->xhlock_idx_soft);
>  }
>  
>  void crossrelease_softirq_end(void)
>  {
> -	if (current->xhlocks)
> -		current->xhlock_idx = current->xhlock_idx_soft;
> +	__crossrelease_end(&current->xhlock_idx_soft);
>  }
>  
>  /*
> @@ -4806,14 +4844,12 @@ void crossrelease_softirq_end(void)
>   */
>  void crossrelease_hist_start(void)
>  {
> -	if (current->xhlocks)
> -		current->xhlock_idx_hist = current->xhlock_idx;
> +	__crossrelease_start(&current->xhlock_idx_hist);
>  }
>  
>  void crossrelease_hist_end(void)
>  {
> -	if (current->xhlocks)
> -		current->xhlock_idx = current->xhlock_idx_hist;
> +	__crossrelease_end(&current->xhlock_idx_hist);
>  }
>  
>  static int cross_lock(struct lockdep_map *lock)
> @@ -4880,6 +4916,9 @@ static void add_xhlock(struct held_lock
>  	unsigned int idx = ++current->xhlock_idx;
>  	struct hist_lock *xhlock = &xhlock(idx);
>  
> +	if ((int)(current->xhlock_idx_max - idx) < 0)
> +		current->xhlock_idx_max = idx;
> +
>  #ifdef CONFIG_DEBUG_LOCKDEP
>  	/*
>  	 * This can be done locklessly because they are all task-local


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13  8:57             ` Byungchul Park
@ 2017-07-13  9:50               ` Peter Zijlstra
  2017-07-13 10:09                 ` Byungchul Park
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-13  9:50 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 05:57:46PM +0900, Byungchul Park wrote:
> On Thu, Jul 13, 2017 at 10:14:42AM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2017 at 11:07:45AM +0900, Byungchul Park wrote:
> > > > Does my approach have problems, rewinding to the 'original idx' on exit
> > > > and deciding whether it was overwritten or not? I think that, this way,
> > > > there is no need to do the drastic work. Or does mine add more overhead
> > > > in the usual case?
> > 
> > So I think that invalidating just the one entry doesn't work; the moment
> 
> I think invalidating just the one is enough. After rewinding, that entry
> is invalidated and the ring buffer is filled forward from that point
> with valid entries. On commit, the walk proceeds backward over valid
> entries until it meets the invalidated entry, and stops there.
> 
> IOW, in case of (overwritten)
> 
>          rewind to here
>          |
> ppppppppppiiiiiiiiiiiiiiii
> iiiiiiiiiiiiiii
> 
>          invalidate it on exit_irq
>          and start to fill from here again
>          |
> pppppppppxiiiiiiiiiiiiiiii
> iiiiiiiiiiiiiii
> 
>                     when commit occurs here
>                     |
> pppppppppxpppppppppppiiiii
> 
>          do commit within this range
>          |<---------|
> pppppppppxpppppppppppiiiii
> 
> So I think this works and is much simpler. Did I miss anything?


	wait_for_completion(&C);
	  atomic_inc_return();

					mutex_lock(A1);
					mutex_unlock(A1);


					<IRQ>
					  spin_lock(B1);
					  spin_unlock(B1);

					  ...

					  spin_lock(B64);
					  spin_unlock(B64);
					</IRQ>


					mutex_lock(A2);
					mutex_unlock(A2);

					complete(&C);


That gives:

	xhist[ 0] = A1
	xhist[ 1] = B1
	...
	xhist[63] = B63

then we wrap and have:

	xhist[0] = B64

then we rewind to 1 and invalidate to arrive at:

	xhist[ 0] = B64
	xhist[ 1] = NULL   <-- idx
	xhist[ 2] = B2
	...
	xhist[63] = B63


Then we do A2 and get

	xhist[ 0] = B64
	xhist[ 1] = A2   <-- idx
	xhist[ 2] = B2
	...
	xhist[63] = B63

and the commit_xhlocks() will happily create links between C and A2,
B2..B64.

The C<->A2 link is desired, the C<->B* are not.
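
The scenario above can be replayed in a small userspace model. This is a sketch under stated assumptions: struct xh and stale_irq_links() are hypothetical, the ring has 64 slots as in the listing, and neither gen_id nor context checks are modeled. The invalidated slot is a parameter, because the outcome depends entirely on which slot the rewind lands on.

```c
#include <stdbool.h>

#define NR_XHIST 64

struct xh {
	bool valid;
	char kind;	/* 'A' = process context, 'B' = irq context */
	int n;		/* the number in A1, B7, ... */
};

/*
 * Replay A1, B1..B64, invalidate one slot on irq exit, record A2 in
 * slot 1, then walk backward from slot 1 as a commit would. Returns
 * how many B entries the walk visits before it stops.
 */
static int stale_irq_links(int invalidated_slot)
{
	struct xh xhist[NR_XHIST];
	int i, pos, stale = 0;

	xhist[0] = (struct xh){ true, 'A', 1 };		/* A1 */
	for (i = 1; i <= 64; i++)			/* B64 wraps onto slot 0 */
		xhist[i % NR_XHIST] = (struct xh){ true, 'B', i };

	xhist[invalidated_slot].valid = false;		/* irq exit */
	xhist[1] = (struct xh){ true, 'A', 2 };		/* A2 */

	pos = 1;
	for (i = 0; i < NR_XHIST; i++) {
		if (!xhist[pos].valid)
			break;
		if (xhist[pos].kind == 'B')
			stale++;
		pos = (pos + NR_XHIST - 1) % NR_XHIST;
	}
	return stale;
}
```

With slot 1 invalidated, as in the listing above, A2 overwrites the marker and the walk visits B64, B63, ..., B2 (63 entries). With slot 0 invalidated instead, the walk stops right after A2 and visits none.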


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13  9:50               ` Peter Zijlstra
@ 2017-07-13 10:09                 ` Byungchul Park
  2017-07-13 10:29                   ` Peter Zijlstra
  0 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-13 10:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> 	wait_for_completion(&C);
> 	  atomic_inc_return();
> 
> 					mutex_lock(A1);
> 					mutex_unlock(A1);
> 
> 
> 					<IRQ>
> 					  spin_lock(B1);
> 					  spin_unlock(B1);
> 
> 					  ...
> 
> 					  spin_lock(B64);
> 					  spin_unlock(B64);
> 					</IRQ>
> 
> 
> 					mutex_lock(A2);
> 					mutex_unlock(A2);
> 
> 					complete(&C);
> 
> 
> That gives:
> 
> 	xhist[ 0] = A1

We have to roll back to here later, on irq_exit.

The following are the entries for irq context.

> 	xhist[ 1] = B1
> 	...
> 	xhist[63] = B63
> 
> then we wrap and have:
> 
> 	xhist[0] = B64
> 
> then we rewind to 1 and invalidate to arrive at:
> 

Now, whether xhist[0] has been overwritten or not is important. If so,
xhist[0] should be NULL, _not_ xhist[1], which is an entry for irq
context and so of no interest at all.

> 	xhist[ 0] = B64
> 	xhist[ 1] = NULL   <-- idx

Therefore, it should be,

 	xhist[ 0] = NULL <- invalidate, cannot use it any more
	--- <- on returning back from irq context, start from here
 	xhist[ 1] = B1 <-- obsolete history of irq

> 	xhist[ 2] = B2
> 	...
> 	xhist[63] = B63
> 
> 
> Then we do A2 and get
> 
> 	xhist[ 0] = B64
> 	xhist[ 1] = A2   <-- idx
> 	xhist[ 2] = B2
> 	...
> 	xhist[63] = B63

So invalidating only one is enough.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 10:09                 ` Byungchul Park
@ 2017-07-13 10:29                   ` Peter Zijlstra
  2017-07-13 11:12                     ` Peter Zijlstra
  2017-07-13 11:19                     ` Byungchul Park
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-13 10:29 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> > 	wait_for_completion(&C);
> > 	  atomic_inc_return();
> > 
> > 					mutex_lock(A1);
> > 					mutex_unlock(A1);
> > 
> > 
> > 					<IRQ>
> > 					  spin_lock(B1);
> > 					  spin_unlock(B1);
> > 
> > 					  ...
> > 
> > 					  spin_lock(B64);
> > 					  spin_unlock(B64);
> > 					</IRQ>
> > 
> > 
> > 					mutex_lock(A2);
> > 					mutex_unlock(A2);
> > 
> > 					complete(&C);
> > 
> > 
> > That gives:
> > 
> > 	xhist[ 0] = A1
> 
> We have to roll back to here later, on irq_exit.
> 
> The following are the entries for irq context.
> 
> > 	xhist[ 1] = B1
> > 	...
> > 	xhist[63] = B63
> > 
> > then we wrap and have:
> > 
> > 	xhist[0] = B64
> > 
> > then we rewind to 1 and invalidate to arrive at:
> > 
> 
> Now, whether xhist[0] has been overwritten or not is important. If so,
> xhist[0] should be NULL, _not_ xhist[1], which is an entry for irq
> context and so of no interest at all.
> 
> > 	xhist[ 0] = B64
> > 	xhist[ 1] = NULL   <-- idx
> 
> Therefore, it should be,
> 
>  	xhist[ 0] = NULL <- invalidate, cannot use it any more
> 	--- <- on returning back from irq context, start from here
>  	xhist[ 1] = B1 <-- obsolete history of irq

Ah, so you rely on the same_context_xhlock() ? That doesn't work for
hist (formerly work).


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 10:29                   ` Peter Zijlstra
@ 2017-07-13 11:12                     ` Peter Zijlstra
  2017-07-13 11:23                       ` Byungchul Park
  2017-07-13 11:19                     ` Byungchul Park
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-13 11:12 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> > > 	wait_for_completion(&C);
> > > 	  atomic_inc_return();
> > > 
> > > 					mutex_lock(A1);
> > > 					mutex_unlock(A1);
> > > 
> > > 
> > > 					<IRQ>
> > > 					  spin_lock(B1);
> > > 					  spin_unlock(B1);
> > > 
> > > 					  ...
> > > 
> > > 					  spin_lock(B64);
> > > 					  spin_unlock(B64);
> > > 					</IRQ>
> > > 
> > > 

Also consider the alternative:

					<IRQ>
					  spin_lock(D);
					  spin_unlock(D);

					  complete(&C);
					</IRQ>

in which case the context test will also not work.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 10:29                   ` Peter Zijlstra
  2017-07-13 11:12                     ` Peter Zijlstra
@ 2017-07-13 11:19                     ` Byungchul Park
  1 sibling, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-13 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Byungchul Park, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Thu, Jul 13, 2017 at 7:29 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
>> On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
>> >     wait_for_completion(&C);
>> >       atomic_inc_return();
>> >
>> >                                     mutex_lock(A1);
>> >                                     mutex_unlock(A1);
>> >
>> >
>> >                                     <IRQ>
>> >                                       spin_lock(B1);
>> >                                       spin_unlock(B1);
>> >
>> >                                       ...
>> >
>> >                                       spin_lock(B64);
>> >                                       spin_unlock(B64);
>> >                                     </IRQ>
>> >
>> >
>> >                                     mutex_lock(A2);
>> >                                     mutex_unlock(A2);
>> >
>> >                                     complete(&C);
>> >
>> >
>> > That gives:
>> >
>> >     xhist[ 0] = A1
>>
>> We have to roll back to here later, on irq_exit.
>>
>> The following are the entries for irq context.
>>
>> >     xhist[ 1] = B1
>> >     ...
>> >     xhist[63] = B63
>> >
>> > then we wrap and have:
>> >
>> >     xhist[0] = B64
>> >
>> > then we rewind to 1 and invalidate to arrive at:
>> >
>>
>> Now, whether xhist[0] has been overwritten or not is important. If so,
>> xhist[0] should be NULL, _not_ xhist[1], which is an entry for irq
>> context and so of no interest at all.
>>
>> >     xhist[ 0] = B64
>> >     xhist[ 1] = NULL   <-- idx
>>
>> Therefore, it should be,
>>
>>       xhist[ 0] = NULL <- invalidate, cannot use it any more
>>       --- <- on returning back from irq context, start from here
>>       xhist[ 1] = B1 <-- obsolete history of irq
>
> Ah, so you rely on the same_context_xhlock() ? That doesn't work for
> hist (formerly work).

As I mentioned in the cover letter, I applied the rollback mechanism to
work (of workqueue) as well. So it works even for hist.

-- 
Thanks,
Byungchul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 11:12                     ` Peter Zijlstra
@ 2017-07-13 11:23                       ` Byungchul Park
  2017-07-14  1:41                         ` Byungchul Park
  2017-07-14  6:42                         ` Byungchul Park
  0 siblings, 2 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-13 11:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Byungchul Park, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Thu, Jul 13, 2017 at 8:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
>> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
>> > >   wait_for_completion(&C);
>> > >     atomic_inc_return();
>> > >
>> > >                                   mutex_lock(A1);
>> > >                                   mutex_unlock(A1);
>> > >
>> > >
>> > >                                   <IRQ>
>> > >                                     spin_lock(B1);
>> > >                                     spin_unlock(B1);
>> > >
>> > >                                     ...
>> > >
>> > >                                     spin_lock(B64);
>> > >                                     spin_unlock(B64);
>> > >                                   </IRQ>
>> > >
>> > >
>
> Also consider the alternative:
>
>                                         <IRQ>
>                                           spin_lock(D);
>                                           spin_unlock(D);
>
>                                           complete(&C);
>                                         </IRQ>
>
> in which case the context test will also not work.

Context tests are done on the xhlock against the release context, _not_
the acquisition context. For example, spin_lock(D) and complete(&C) are
in the same context, so the test would pass in this example.

So it works.


-- 
Thanks,
Byungchul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 11:23                       ` Byungchul Park
@ 2017-07-14  1:41                         ` Byungchul Park
  2017-07-14  6:42                         ` Byungchul Park
  1 sibling, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-14  1:41 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Peter Zijlstra, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Thu, Jul 13, 2017 at 08:23:33PM +0900, Byungchul Park wrote:
> On Thu, Jul 13, 2017 at 8:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
> >> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> >> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> >> > >   wait_for_completion(&C);
> >> > >     atomic_inc_return();
> >> > >
> >> > >                                   mutex_lock(A1);
> >> > >                                   mutex_unlock(A1);
> >> > >
> >> > >
> >> > >                                   <IRQ>
> >> > >                                     spin_lock(B1);
> >> > >                                     spin_unlock(B1);
> >> > >
> >> > >                                     ...
> >> > >
> >> > >                                     spin_lock(B64);
> >> > >                                     spin_unlock(B64);
> >> > >                                   </IRQ>
> >> > >
> >> > >
> >
> > Also consider the alternative:
> >
> >                                         <IRQ>
> >                                           spin_lock(D);
> >                                           spin_unlock(D);
> >
> >                                           complete(&C);
> >                                         </IRQ>
> >
> > in which case the context test will also not work.
> 
> Context tests are done on the xhlock against the release context, _not_
> the acquisition context. For example, spin_lock(D) and complete(&C) are
> in the same context, so the test would pass in this example.

Something wrong?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13 11:23                       ` Byungchul Park
  2017-07-14  1:41                         ` Byungchul Park
@ 2017-07-14  6:42                         ` Byungchul Park
  2017-07-21 13:54                           ` Peter Zijlstra
  1 sibling, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-14  6:42 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Peter Zijlstra, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Thu, Jul 13, 2017 at 08:23:33PM +0900, Byungchul Park wrote:
> On Thu, Jul 13, 2017 at 8:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
> >> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> >> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> >> > >   wait_for_completion(&C);
> >> > >     atomic_inc_return();
> >> > >
> >> > >                                   mutex_lock(A1);
> >> > >                                   mutex_unlock(A1);
> >> > >
> >> > >
> >> > >                                   <IRQ>
> >> > >                                     spin_lock(B1);
> >> > >                                     spin_unlock(B1);
> >> > >
> >> > >                                     ...
> >> > >
> >> > >                                     spin_lock(B64);
> >> > >                                     spin_unlock(B64);
> >> > >                                   </IRQ>
> >> > >
> >> > >
> >
> > Also consider the alternative:
> >
> >                                         <IRQ>
> >                                           spin_lock(D);
> >                                           spin_unlock(D);
> >
> >                                           complete(&C);
> >                                         </IRQ>
> >
> > in which case the context test will also not work.
> 
> Context tests are done on the xhlock against the release context, _not_
> the acquisition context. For example, spin_lock(D) and complete(&C) are
> in the same context, so the test would pass in this example.

I think you got confused. Or am I the one who is confused?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-13  8:14           ` Peter Zijlstra
  2017-07-13  8:57             ` Byungchul Park
@ 2017-07-18  1:25             ` Byungchul Park
  1 sibling, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-18  1:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Thu, Jul 13, 2017 at 10:14:42AM +0200, Peter Zijlstra wrote:
> +static void __crossrelease_end(unsigned int *stamp)
> +{

[snip]

> +
> +	/*
> +	 * If we rewind past the tail; all of history is lost.
> +	 */
> +	if ((current->xhlock_idx_max - *stamp) < MAX_XHLOCKS_NR)
> +		return;
> +
> +	/*
> +	 * Invalidate the entire history..
> +	 */
> +	for (i = 0; i < MAX_XHLOCKS_NR; i++)
> +		invalidate_xhlock(&xhlock(i));
> +
> +	current->xhlock_idx = 0;
> +	current->xhlock_idx_hard = 0;
> +	current->xhlock_idx_soft = 0;
> +	current->xhlock_idx_hist = 0;
> +	current->xhlock_idx_max = 0;

I don't yet understand why you introduced this code. Do we need it?

The rest of your suggestion looks very good, though.

> +}


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-14  6:42                         ` Byungchul Park
@ 2017-07-21 13:54                           ` Peter Zijlstra
  2017-07-25  6:29                             ` Byungchul Park
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-21 13:54 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Byungchul Park, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Fri, Jul 14, 2017 at 03:42:10PM +0900, Byungchul Park wrote:
> On Thu, Jul 13, 2017 at 08:23:33PM +0900, Byungchul Park wrote:
> > On Thu, Jul 13, 2017 at 8:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
> > >> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> > >> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> > >> > >   wait_for_completion(&C);
> > >> > >     atomic_inc_return();
> > >> > >
> > >> > >                                   mutex_lock(A1);
> > >> > >                                   mutex_unlock(A1);
> > >> > >
> > >> > >
> > >> > >                                   <IRQ>
> > >> > >                                     spin_lock(B1);
> > >> > >                                     spin_unlock(B1);
> > >> > >
> > >> > >                                     ...
> > >> > >
> > >> > >                                     spin_lock(B64);
> > >> > >                                     spin_unlock(B64);
> > >> > >                                   </IRQ>
> > >> > >
> > >> > >
> > >
> > > Also consider the alternative:
> > >
> > >                                         <IRQ>
> > >                                           spin_lock(D);
> > >                                           spin_unlock(D);
> > >
> > >                                           complete(&C);
> > >                                         </IRQ>
> > >
> > > in which case the context test will also not work.
> > 
> > Context tests are done on xhlock with the release context, _not_
> > acquisition context. For example, spin_lock(D) and complete(&C) are
> > in the same context, so the test would pass in this example.

The point was, this example will also link C to B*.

(/me copy paste from older email)

That gives:

        xhist[ 0] = A1
        xhist[ 1] = B1
        ...
        xhist[63] = B63

then we wrap and have:

        xhist[0] = B64

then we rewind to 1 and invalidate to arrive at:

        xhist[ 0] = B64
        xhist[ 1] = NULL   <-- idx
        xhist[ 2] = B2
        ...
        xhist[63] = B63


Then we do D and get

        xhist[ 0] = B64
        xhist[ 1] = D   <-- idx
        xhist[ 2] = B2
        ...
        xhist[63] = B63


And now there is nothing that will invalidate B*, after all, the
gen_id's are all after C's stamp, and the same_context_xhlock() test
will also pass because they're all from IRQ context (albeit not the
same, but it cannot tell).


Does this explain? Or am I still missing something?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-21 13:54                           ` Peter Zijlstra
@ 2017-07-25  6:29                             ` Byungchul Park
  2017-07-25  8:45                               ` Peter Zijlstra
  0 siblings, 1 reply; 41+ messages in thread
From: Byungchul Park @ 2017-07-25  6:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Byungchul Park, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Fri, Jul 21, 2017 at 03:54:20PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 03:42:10PM +0900, Byungchul Park wrote:
> > On Thu, Jul 13, 2017 at 08:23:33PM +0900, Byungchul Park wrote:
> > > On Thu, Jul 13, 2017 at 8:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Thu, Jul 13, 2017 at 12:29:05PM +0200, Peter Zijlstra wrote:
> > > >> On Thu, Jul 13, 2017 at 07:09:53PM +0900, Byungchul Park wrote:
> > > >> > On Thu, Jul 13, 2017 at 11:50:52AM +0200, Peter Zijlstra wrote:
> > > >> > >   wait_for_completion(&C);
> > > >> > >     atomic_inc_return();
> > > >> > >
> > > >> > >                                   mutex_lock(A1);
> > > >> > >                                   mutex_unlock(A1);
> > > >> > >
> > > >> > >
> > > >> > >                                   <IRQ>
> > > >> > >                                     spin_lock(B1);
> > > >> > >                                     spin_unlock(B1);
> > > >> > >
> > > >> > >                                     ...
> > > >> > >
> > > >> > >                                     spin_lock(B64);
> > > >> > >                                     spin_unlock(B64);
> > > >> > >                                   </IRQ>
> > > >> > >
> > > >> > >
> > > >
> > > > Also consider the alternative:
> > > >
> > > >                                         <IRQ>
> > > >                                           spin_lock(D);
> > > >                                           spin_unlock(D);
> > > >
> > > >                                           complete(&C);
> > > >                                         </IRQ>
> > > >
> > > > in which case the context test will also not work.
> > > 
> > > Context tests are done on xhlock with the release context, _not_
> > > acquisition context. For example, spin_lock(D) and complete(&C) are
> > > in the same context, so the test would pass in this example.
> 
> The point was, this example will also link C to B*.

_No_, as I already said.

> (/me copy paste from older email)
> 
> That gives:
> 
>         xhist[ 0] = A1
>         xhist[ 1] = B1
>         ...
>         xhist[63] = B63
> 
> then we wrap and have:
> 
>         xhist[0] = B64
> 
> then we rewind to 1 and invalidate to arrive at:

We invalidate xhist[_0_], as I already said.

>         xhist[ 0] = B64
>         xhist[ 1] = NULL   <-- idx
>         xhist[ 2] = B2
>         ...
>         xhist[63] = B63
> 
> 
> Then we do D and get
> 
>         xhist[ 0] = B64
>         xhist[ 1] = D   <-- idx
>         xhist[ 2] = B2
>         ...
>         xhist[63] = B63

We should get

         xhist[ 0] = NULL
         xhist[ 1] = D   <-- idx
         xhist[ 2] = B2
         ...
         xhist[63] = B63

By the way, did you not get my reply? I gave exactly the same answer.
Perhaps you have not received or read my replies.

> And now there is nothing that will invalidate B*, after all, the
> gen_id's are all after C's stamp, and the same_context_xhlock() test
> will also pass because they're all from IRQ context (albeit not the
> same, but it cannot tell).

It will stop at xhist[0] because it has been invalidated.

> Does this explain? Or am I still missing something?

Could you read the following reply? Is it not enough?

https://lkml.org/lkml/2017/7/13/214

I am sorry if my English makes this hard to understand. But I have
already answered everything you asked.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite
  2017-07-25  6:29                             ` Byungchul Park
@ 2017-07-25  8:45                               ` Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-25  8:45 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Byungchul Park, Ingo Molnar, tglx, Michel Lespinasse, boqun.feng,
	kirill, linux-kernel, linux-mm, akpm, willy, npiggin,
	kernel-team

On Tue, Jul 25, 2017 at 03:29:45PM +0900, Byungchul Park wrote:

> _No_, as I already said.
> 
> > (/me copy paste from older email)
> > 
> > That gives:
> > 
> >         xhist[ 0] = A1
> >         xhist[ 1] = B1
> >         ...
> >         xhist[63] = B63
> > 
> > then we wrap and have:
> > 
> >         xhist[0] = B64
> > 
> > then we rewind to 1 and invalidate to arrive at:
> 
> We invalidate xhist[_0_], as I already said.
> 
> >         xhist[ 0] = B64
> >         xhist[ 1] = NULL   <-- idx
> >         xhist[ 2] = B2
> >         ...
> >         xhist[63] = B63
> > 
> > 
> > Then we do D and get
> > 
> >         xhist[ 0] = B64
> >         xhist[ 1] = D   <-- idx
> >         xhist[ 2] = B2
> >         ...
> >         xhist[63] = B63
> 
> We should get
> 
>          xhist[ 0] = NULL
>          xhist[ 1] = D   <-- idx
>          xhist[ 2] = B2
>          ...
>          xhist[63] = B63
> 
> By the way, did you not get my reply? I gave exactly the same answer.
> Perhaps you have not received or read my replies.
> 
> > And now there is nothing that will invalidate B*, after all, the
> > gen_id's are all after C's stamp, and the same_context_xhlock() test
> > will also pass because they're all from IRQ context (albeit not the
> > same, but it cannot tell).
> 
> It will stop at xhist[0] because it has been invalidated.
> 
> > Does this explain? Or am I still missing something?
> 
> Could you read the following reply? Is it not enough?
> 
> https://lkml.org/lkml/2017/7/13/214
> 
> I am sorry if my English makes this hard to understand. But I have
> already answered everything you asked.

Ah, I think I see. It works because you commit backwards and terminate
on the invalidate.
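
A minimal user-space sketch of that reasoning, assuming a simplified
model rather than the code in the patch: every name here (struct ring,
record(), ctx_enter(), ctx_exit(), commit(), run_scenario()) is
hypothetical, and the gen_id and same_context_xhlock() checks are left
out on purpose, since the example above is one where they would not
stop the walk. Only the hist_id comparison on context exit and the
backward walk that terminates on an invalidated slot are modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 64u                 /* power of two, as in the example above */

struct entry {
    const char *name;                 /* lock label; NULL means invalidated */
    unsigned int hist_id;             /* unique stamp for every recorded entry */
};

struct ring {
    struct entry e[RING_SIZE];
    unsigned int idx;                 /* position of the most recent entry */
    unsigned int next_hist_id;
};

struct saved {
    unsigned int idx;
    unsigned int hist_id;
};

static struct entry *slot(struct ring *r, unsigned int i)
{
    return &r->e[i % RING_SIZE];
}

static void record(struct ring *r, const char *name)
{
    struct entry *s = slot(r, ++r->idx);

    s->name = name;
    s->hist_id = r->next_hist_id++;
}

/* On context entry, remember where we were and which entry lived there. */
static struct saved ctx_enter(struct ring *r)
{
    struct saved s = { r->idx, slot(r, r->idx)->hist_id };

    return s;
}

/* On context exit, rewind; if the saved slot was overwritten, invalidate it. */
static void ctx_exit(struct ring *r, struct saved s, int check_overwrite)
{
    r->idx = s.idx;
    if (check_overwrite && slot(r, s.idx)->hist_id != s.hist_id)
        slot(r, s.idx)->name = NULL;
}

/* Walk backwards from idx and stop at the first invalidated slot. */
static unsigned int commit(struct ring *r, const char **out, unsigned int max)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < RING_SIZE && n < max; i++) {
        struct entry *s = slot(r, r->idx - i);

        if (!s->name)
            break;
        out[n++] = s->name;
    }
    return n;
}

/* Replays A1, then B1..B64 in one IRQ, then D and the commit in another. */
static unsigned int run_scenario(const char **out, unsigned int max,
                                 int check_overwrite)
{
    static char names[RING_SIZE][8];
    struct ring r;
    struct saved s;
    unsigned int n;

    memset(&r, 0, sizeof(r));
    r.idx = RING_SIZE - 1;            /* so the first record lands in slot 0 */
    r.next_hist_id = 1;

    record(&r, "A1");                 /* xhist[0] = A1 */

    s = ctx_enter(&r);                /* first <IRQ> */
    for (unsigned int i = 1; i <= RING_SIZE; i++) {
        snprintf(names[i - 1], sizeof(names[0]), "B%u", i);
        record(&r, names[i - 1]);     /* B64 wraps onto xhist[0] */
    }
    ctx_exit(&r, s, check_overwrite); /* </IRQ>: rewind, maybe invalidate */

    s = ctx_enter(&r);                /* second <IRQ> */
    record(&r, "D");                  /* xhist[1] = D */
    n = commit(&r, out, max);         /* stands in for complete(&C) */
    ctx_exit(&r, s, check_overwrite);
    return n;
}
```

With check_overwrite set, the walk visits D and stops at the
invalidated xhist[0], so the stale B2..B63 entries are never reached.
With it cleared, the same walk runs through D, B64, B63, ..., B2, which
is the behaviour the question above was worried about.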

Yes, I had seen your emails, but the penny hadn't dropped, the light bulb
hadn't switched on, etc. Sometimes I'm a little dense and need a little
more help.

Thanks, I'll go look at your latest posting now.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks
  2017-05-24  8:59 ` [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks Byungchul Park
@ 2017-07-25 15:41   ` Peter Zijlstra
  2017-07-26  7:16     ` Byungchul Park
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2017-07-25 15:41 UTC (permalink / raw)
  To: Byungchul Park
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Wed, May 24, 2017 at 05:59:41PM +0900, Byungchul Park wrote:
> We can skip adding a dependency 'AX -> B' when 'AX -> the previous of B
> in hlocks' is guaranteed to be created, where AX is a crosslock and B is
> a typical lock. Remember that two adjacent locks in hlocks generate a
> dependency like 'prev -> next', that is, 'the previous of B in hlocks
> -> B' in this case.
> 
> For example:
> 
>              in hlocks[]
>              ------------
>           ^  A (gen_id: 4) --+
>           |                  | previous gen_id
>           |  B (gen_id: 3) <-+
>           |  C (gen_id: 3)
>           |  D (gen_id: 2)
>    oldest |  E (gen_id: 1)
> 
>              in xhlocks[]
>              ------------
>           ^  A (gen_id: 4, prev_gen_id: 3(B's gen id))
>           |  B (gen_id: 3, prev_gen_id: 3(C's gen id))
>           |  C (gen_id: 3, prev_gen_id: 2(D's gen id))
>           |  D (gen_id: 2, prev_gen_id: 1(E's gen id))
>    oldest |  E (gen_id: 1, prev_gen_id: NA)
> 
> On commit for a crosslock AX (gen_id = 3), it's enough to add 'AX -> C';
> adding 'AX -> B' and 'AX -> A' is unnecessary since 'AX -> C', 'C -> B'
> and 'B -> A', which are guaranteed to be generated, already cover them.
> 
> This patch introduces a variable, prev_gen_id, to avoid adding this kind
> of redundant dependency. In other words, the previous lock in hlocks will
> handle it anyway if the previous lock's gen_id >= the crosslock's gen_id.
> 

Didn't we talk about an alternative to this?

/me goes dig

 https://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net

There and replies.

So how much does this save vs avoiding redundant links?
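
For reference, the prev_gen_id rule from the quoted changelog can be
sketched as below. This is only an illustration under stated
assumptions, not the patch's code: struct xh, direct_links() and
changelog_example() are hypothetical names, prev_gen_id 0 stands in for
"NA", and plain integer comparisons are used where real code would need
a wraparound-safe comparison.

```c
#include <assert.h>
#include <stddef.h>

struct xh {
    char name;
    unsigned int gen_id;
    unsigned int prev_gen_id;   /* 0 stands for "NA": no previous held lock */
};

/*
 * Collect the names of the entries that need a direct link from a
 * crosslock with the given gen_id. 'out' must hold n + 1 chars.
 */
static size_t direct_links(const struct xh *h, size_t n,
                           unsigned int cross_gen_id, char *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (h[i].gen_id < cross_gen_id)
            continue;   /* acquired before the crosslock: no dependency */
        if (h[i].prev_gen_id >= cross_gen_id)
            continue;   /* 'AX -> prev' and 'prev -> this' already cover it */
        out[k++] = h[i].name;
    }
    out[k] = '\0';
    return k;
}

/* The xhlocks[] table from the changelog, newest entry first. */
static size_t changelog_example(unsigned int cross_gen_id, char *out)
{
    static const struct xh xhlocks[] = {
        { 'A', 4, 3 },
        { 'B', 3, 3 },
        { 'C', 3, 2 },
        { 'D', 2, 1 },
        { 'E', 1, 0 },
    };

    return direct_links(xhlocks, sizeof(xhlocks) / sizeof(xhlocks[0]),
                        cross_gen_id, out);
}
```

For a crosslock with gen_id 3 this leaves only C, matching the
changelog: A and B are skipped because their predecessors in hlocks
already have gen_id >= 3.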


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks
  2017-07-25 15:41   ` Peter Zijlstra
@ 2017-07-26  7:16     ` Byungchul Park
  0 siblings, 0 replies; 41+ messages in thread
From: Byungchul Park @ 2017-07-26  7:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, walken, boqun.feng, kirill, linux-kernel, linux-mm,
	akpm, willy, npiggin, kernel-team

On Tue, Jul 25, 2017 at 05:41:36PM +0200, Peter Zijlstra wrote:
> On Wed, May 24, 2017 at 05:59:41PM +0900, Byungchul Park wrote:
> > We can skip adding a dependency 'AX -> B' when 'AX -> the previous of B
> > in hlocks' is guaranteed to be created, where AX is a crosslock and B is
> > a typical lock. Remember that two adjacent locks in hlocks generate a
> > dependency like 'prev -> next', that is, 'the previous of B in hlocks
> > -> B' in this case.
> > 
> > For example:
> > 
> >              in hlocks[]
> >              ------------
> >           ^  A (gen_id: 4) --+
> >           |                  | previous gen_id
> >           |  B (gen_id: 3) <-+
> >           |  C (gen_id: 3)
> >           |  D (gen_id: 2)
> >    oldest |  E (gen_id: 1)
> > 
> >              in xhlocks[]
> >              ------------
> >           ^  A (gen_id: 4, prev_gen_id: 3(B's gen id))
> >           |  B (gen_id: 3, prev_gen_id: 3(C's gen id))
> >           |  C (gen_id: 3, prev_gen_id: 2(D's gen id))
> >           |  D (gen_id: 2, prev_gen_id: 1(E's gen id))
> >    oldest |  E (gen_id: 1, prev_gen_id: NA)
> > 
> > On commit for a crosslock AX (gen_id = 3), it's enough to add 'AX -> C';
> > adding 'AX -> B' and 'AX -> A' is unnecessary since 'AX -> C', 'C -> B'
> > and 'B -> A', which are guaranteed to be generated, already cover them.
> > 
> > This patch introduces a variable, prev_gen_id, to avoid adding this kind
> > of redundant dependency. In other words, the previous lock in hlocks will
> > handle it anyway if the previous lock's gen_id >= the crosslock's gen_id.
> > 
> 
> Didn't we talk about an alternative to this?

Yes, we did. You said the optimization was unnecessary, but I was not
sure that was true, so I included it this time.

But *I will exclude this from the next spin*.

> 
> /me goes dig
> 
>  https://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
> 
> There and replies.
> 
> So how much does this save vs avoiding redundant links?

No difference on my qemu machine. The answer was:

https://lkml.org/lkml/2017/3/14/103


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2017-07-26  7:17 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-24  8:59 [PATCH v7 00/16] lockdep: Implement crossrelease feature Byungchul Park
2017-05-24  8:59 ` [PATCH v7 01/16] lockdep: Refactor lookup_chain_cache() Byungchul Park
2017-05-24  8:59 ` [PATCH v7 02/16] lockdep: Add a function building a chain between two classes Byungchul Park
2017-05-24  8:59 ` [PATCH v7 03/16] lockdep: Change the meaning of check_prev_add()'s return value Byungchul Park
2017-05-24  8:59 ` [PATCH v7 04/16] lockdep: Make check_prev_add() able to handle external stack_trace Byungchul Park
2017-05-24  8:59 ` [PATCH v7 05/16] lockdep: Implement crossrelease feature Byungchul Park
2017-06-13  0:33   ` Byungchul Park
2017-06-22 23:27     ` Byungchul Park
2017-07-11 16:04   ` Peter Zijlstra
2017-07-12  2:24     ` Byungchul Park
2017-05-24  8:59 ` [PATCH v7 06/16] lockdep: Detect and handle hist_lock ring buffer overwrite Byungchul Park
2017-07-11 16:12   ` Peter Zijlstra
2017-07-12  2:00     ` Byungchul Park
2017-07-12  7:56       ` Peter Zijlstra
2017-07-13  2:07         ` Byungchul Park
2017-07-13  8:14           ` Peter Zijlstra
2017-07-13  8:57             ` Byungchul Park
2017-07-13  9:50               ` Peter Zijlstra
2017-07-13 10:09                 ` Byungchul Park
2017-07-13 10:29                   ` Peter Zijlstra
2017-07-13 11:12                     ` Peter Zijlstra
2017-07-13 11:23                       ` Byungchul Park
2017-07-14  1:41                         ` Byungchul Park
2017-07-14  6:42                         ` Byungchul Park
2017-07-21 13:54                           ` Peter Zijlstra
2017-07-25  6:29                             ` Byungchul Park
2017-07-25  8:45                               ` Peter Zijlstra
2017-07-13 11:19                     ` Byungchul Park
2017-07-18  1:25             ` Byungchul Park
2017-05-24  8:59 ` [PATCH v7 07/16] lockdep: Handle non(or multi)-acquisition of a crosslock Byungchul Park
2017-05-24  8:59 ` [PATCH v7 08/16] lockdep: Avoid adding redundant direct links of crosslocks Byungchul Park
2017-07-25 15:41   ` Peter Zijlstra
2017-07-26  7:16     ` Byungchul Park
2017-05-24  8:59 ` [PATCH v7 09/16] lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS Byungchul Park
2017-05-24  8:59 ` [PATCH v7 10/16] lockdep: Make print_circular_bug() aware of crossrelease Byungchul Park
2017-05-24  8:59 ` [PATCH v7 11/16] lockdep: Apply crossrelease to completions Byungchul Park
2017-05-24  8:59 ` [PATCH v7 12/16] pagemap.h: Remove trailing white space Byungchul Park
2017-05-24  8:59 ` [PATCH v7 13/16] lockdep: Apply crossrelease to PG_locked locks Byungchul Park
2017-05-24  8:59 ` [PATCH v7 14/16] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked Byungchul Park
2017-05-24  8:59 ` [PATCH v7 15/16] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext Byungchul Park
2017-05-24  8:59 ` [PATCH v7 16/16] lockdep: Crossrelease feature documentation Byungchul Park
