* [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
@ 2022-05-04  8:17 ` Byungchul Park
  0 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Hi Linus and folks,

I've been developing a tool that detects deadlock possibilities by
tracking wait/event pairs rather than lock acquisition order, so as to
cover all synchronization mechanisms. It's based on the v5.18-rc3 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.20_on_v5.18-rc3

Benefit:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstands false positives thanks to 6.
	9. Easy to tag any wait/event (see the sketch below).
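
For instance, tagging a custom wait/event with the SDT APIs introduced
later in this series might look like this (a minimal sketch; waiter(),
waker(), go_to_sleep() and do_something() are hypothetical):

	static DEFINE_DEPT_SDT(my_event_map);

	void waiter(void)
	{
		sdt_wait(&my_event_map);	/* tag: about to wait for the event */
		go_to_sleep();			/* hypothetical sleep primitive */
	}

	void waker(void)
	{
		sdt_ecxt_enter(&my_event_map);	/* tag: event context begins */
		do_something();
		sdt_event(&my_event_map);	/* tag: the event is triggered */
		sdt_ecxt_exit(&my_event_map);	/* tag: event context ends */
	}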

Future work:

	0. To make it more stable.
	1. To separate Dept from Lockdep.
	2. To improve performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack trace.

How to interpret reports:

	1. E(event) in each context cannot be triggered because of the
	   W(wait) that cannot be woken.
	2. The stack trace helping find the problematic code is located
	   in each context's detail.
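
For example, with a mutex m and a completion c, the following is an
illustrative deadlock (a sketch, not actual report output):

	context A			context B
	---------			---------
	mutex_lock(&m);
					mutex_lock(&m);		/* W */
	wait_for_completion(&c);	/* W */
					complete(&c);		/* E */
	mutex_unlock(&m);		/* E */

A's E (mutex_unlock()) can never be triggered because of A's W
(wait_for_completion()), and B's E (complete()) can never be triggered
because of B's W (mutex_lock()): neither W can be woken.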

Thanks,
Byungchul

---

Changes from v5:

	1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
	   of internal resources, because WARN_*() printing a stacktrace
	   is too much just to report the lack. (feedback from Ted,
	   Hyeonggon)
	2. Fix trivial bugs like missing initializing a struct before
	   using it.
	3. Assign a different class per task when handling onstack
	   variables for waitqueue or the like, which makes Dept
	   distinguish between onstack variables of different tasks so
	   as to prevent false positives. (reported by Hyeonggon)
	4. Make Dept aware of even raw_local_irq_*() to prevent false
	   positives. (reported by Hyeonggon)
	5. Don't consider dependencies between the events that might be
	   triggered within __schedule() and the waits that require
	   __schedule(), as real ones. (reported by Hyeonggon)
	6. Unstage the staged wait that has done prepare_to_wait_event()
	   *and* has yet to get to __schedule(), if we encounter
	   __schedule() in-between for another sleep, which is possible
	   if e.g. a mutex_lock() exists in the 'condition' of
	   ___wait_event(). (See the sketch after this list.)
	7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
	   on the hardirq and softirq entrance tracing to make Dept more
	   portable for now.
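
The pattern in item 6 looks like this (an illustrative sketch; wq, m
and check_cond() are hypothetical):

	/*
	 * ___wait_event() stages a wait on wq via
	 * prepare_to_wait_event(), but evaluating 'condition' can hit
	 * another sleep through mutex_lock(), reaching __schedule()
	 * before the staged wait on wq does.
	 */
	wait_event(wq, ({
		bool done;

		mutex_lock(&m);		/* may sleep -> __schedule() */
		done = check_cond();
		mutex_unlock(&m);
		done;
	}));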

Changes from v4:

	1. Fix some bugs that produce false alarms.
	2. Distinguish each syscall context from another *for arm64*.
	3. Just print a message rather than warn in case the Dept ring
	   buffer gets exhausted. (feedback from Hyeonggon)
	4. Explicitly describe "EXPERIMENTAL" and "Dept might produce
	   false positive reports" in Kconfig. (feedback from Ted)

Changes from v3:

	1. Dept shouldn't create dependencies between different depths
	   of a class that were indicated by *_lock_nested(). Dept
	   normally doesn't, but it did once another lock class came
	   in. Fixed it. (feedback from Hyeonggon)
	2. Dept considered a wait a real wait once it got to
	   __schedule(), even if the task had been set to TASK_RUNNING
	   by a wake-up source in advance. Fixed it so that Dept doesn't
	   consider that case a real wait. (feedback from Jan Kara)
	3. Stop tracking dependencies with a map once the event
	   associated with the map has been handled. Dept will start to
	   work with the map again, on the next sleep.

Changes from v2:

	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which
	   reported a lot of false positives; my fault. Wait/event for
	   bit_wait_table[] should've been tagged in a higher layer to
	   work better, which is future work.
	   (feedback from Jan Kara)
	2. Disable Dept on crypto_larval's completion to prevent a false
	   positive.

Changes from v1:

	1. Fix coding style and typo. (feedback from Steven)
	2. Distinguish each work context from another in workqueue.
	3. Skip checking lock acquisition with nest_lock, which is about
	   correct lock usage that should be checked by Lockdep.

Changes from RFC(v0):

	1. Add the wait tag at __schedule() rather than at prepare_to_wait().
	   (feedback from Linus and Matthew)
	2. Use try version at lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Byungchul Park (21):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Apply Dept to spinlock
  dept: Apply Dept to mutex families
  dept: Apply Dept to rwlock
  dept: Apply Dept to wait_for_completion()/complete()
  dept: Apply Dept to seqlock
  dept: Apply Dept to rwsem
  dept: Add proc knobs to show stats and dependency graph
  dept: Introduce split map concept and new APIs for them
  dept: Apply Dept to wait/event of PG_{locked,writeback}
  dept: Apply SDT to swait
  dept: Apply SDT to wait(waitqueue)
  locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
  dept: Distinguish each syscall context from another
  dept: Distinguish each work from another
  dept: Disable Dept within the wait_bit layer by default
  dept: Disable Dept on struct crypto_larval's completion for now
  dept: Differentiate onstack maps from others of different tasks in
    class
  dept: Do not add dependencies between events within scheduler and
    sleeps
  dept: Unstage wait when tagging a normal sleep wait

 arch/arm64/kernel/syscall.c        |    2 +
 arch/x86/entry/common.c            |    4 +
 crypto/api.c                       |    7 +-
 include/linux/completion.h         |   44 +-
 include/linux/dept.h               |  596 ++++++++
 include/linux/dept_page.h          |   78 +
 include/linux/dept_sdt.h           |   67 +
 include/linux/hardirq.h            |    3 +
 include/linux/irqflags.h           |   71 +-
 include/linux/llist.h              |    8 -
 include/linux/lockdep.h            |  186 ++-
 include/linux/lockdep_types.h      |    3 +
 include/linux/mutex.h              |   22 +
 include/linux/page-flags.h         |   45 +-
 include/linux/pagemap.h            |    7 +-
 include/linux/percpu-rwsem.h       |    4 +-
 include/linux/rtmutex.h            |    1 +
 include/linux/rwlock.h             |   42 +
 include/linux/rwlock_api_smp.h     |    8 +-
 include/linux/rwlock_types.h       |    1 +
 include/linux/rwsem.h              |   22 +
 include/linux/sched.h              |    7 +
 include/linux/seqlock.h            |   60 +-
 include/linux/spinlock.h           |   21 +
 include/linux/spinlock_types_raw.h |    3 +
 include/linux/swait.h              |    4 +
 include/linux/types.h              |    8 +
 include/linux/wait.h               |    6 +-
 init/init_task.c                   |    2 +
 init/main.c                        |    4 +
 kernel/Makefile                    |    1 +
 kernel/cpu.c                       |    2 +-
 kernel/dependency/Makefile         |    4 +
 kernel/dependency/dept.c           | 2938 ++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h      |   10 +
 kernel/dependency/dept_internal.h  |   26 +
 kernel/dependency/dept_object.h    |   13 +
 kernel/dependency/dept_proc.c      |   92 ++
 kernel/exit.c                      |    7 +
 kernel/fork.c                      |    2 +
 kernel/locking/lockdep.c           |   28 +-
 kernel/locking/spinlock_rt.c       |   24 +-
 kernel/module.c                    |    2 +
 kernel/sched/completion.c          |   12 +-
 kernel/sched/core.c                |   10 +
 kernel/sched/swait.c               |   10 +
 kernel/sched/wait.c                |   16 +
 kernel/sched/wait_bit.c            |    5 +-
 kernel/workqueue.c                 |    3 +
 lib/Kconfig.debug                  |   28 +
 mm/filemap.c                       |   68 +
 mm/page_ext.c                      |    5 +
 52 files changed, 4558 insertions(+), 84 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_page.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

-- 
1.9.1



* [PATCH RFC v6 01/21] llist: Move llist_{head,node} definition to types.h
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

llist_head and llist_node can be used by very low-level primitives. For
example, Dept, which tracks dependencies, uses llist in its header. To
avoid a header dependency, move those definitions to types.h.
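
With the move, a header that wants to embed these types only needs
types.h (a minimal sketch; struct dept_foo is hypothetical):

	#include <linux/types.h>	/* llist.h no longer required */

	struct dept_foo {
		struct llist_node	pool_node;	/* works with types.h only */
		struct llist_head	free_head;
	};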

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/llist.h | 8 --------
 include/linux/types.h | 8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index 85bda2d..99cc3c3 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -53,14 +53,6 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 
-struct llist_head {
-	struct llist_node *first;
-};
-
-struct llist_node {
-	struct llist_node *next;
-};
-
 #define LLIST_HEAD_INIT(name)	{ NULL }
 #define LLIST_HEAD(name)	struct llist_head name = LLIST_HEAD_INIT(name)
 
diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a..b12a444 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -187,6 +187,14 @@ struct hlist_node {
 	struct hlist_node *next, **pprev;
 };
 
+struct llist_head {
+	struct llist_node *first;
+};
+
+struct llist_node {
+	struct llist_node *next;
+};
+
 struct ustat {
 	__kernel_daddr_t	f_tfree;
 #ifdef CONFIG_ARCH_32BIT_USTAT_F_TINODE
-- 
1.9.1



* [PATCH RFC v6 02/21] dept: Implement Dept(Dependency Tracker)
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

CURRENT STATUS
--------------
Lockdep tracks acquisition order of locks in order to detect deadlock,
and IRQ and IRQ enable/disable state as well to take accidental
acquisitions into account.

Lockdep should be turned off once it detects and reports a deadlock
since the data structure and algorithm are not reusable after detection
because of the complex design.

PROBLEM
-------
*Waits* and their *events* that are never reached eventually cause
deadlock. However, Lockdep is only interested in lock acquisition
order, forcing us to emulate lock acquisition even for mere waits and
events that have nothing to do with real locks (see the sketch below).

Even worse, no one likes Lockdep's false positive detection, because it
prevents further reports that might be more valuable. That's why kernel
developers are sensitive to Lockdep's false positives.

Besides that, by tracking acquisition order, it cannot correctly deal
with read locks and cross-events, e.g. wait_for_completion()/complete(),
for deadlock detection. Lockdep is no longer a good tool for that
purpose.
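
For example, today the way to tell Lockdep about a wait is to emulate
an acquisition on a pseudo lock map, as workqueue does for flush_work()
(a rough sketch of the pattern; fake_map and really_wait() are
hypothetical):

	static struct lockdep_map fake_map =
		STATIC_LOCKDEP_MAP_INIT("fake_map", &fake_map);

	void wait_side(void)
	{
		/*
		 * Pretend to acquire and release the pseudo lock so
		 * that Lockdep records an acquisition order for what
		 * is really just a wait.
		 */
		lock_map_acquire(&fake_map);
		lock_map_release(&fake_map);
		really_wait();
	}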

SOLUTION
--------
Again, *waits* and their *events* that are never reached eventually
cause deadlock. The new solution, Dept (DEPendency Tracker), focuses
on waits and events themselves. Dept tracks waits and events and
reports if any event can never be reached.

Dept does:
   . Works with read locks in the right way.
   . Works with any wait and event, i.e. cross-events.
   . Continues to work even after reporting multiple times.
   . Provides simple and intuitive APIs.
   . Does exactly what a dependency checker should do.

Q & A
-----
Q. Is this the first try ever to address the problem?
A. No. The cross-release feature (b09be676e0ff2 locking/lockdep:
   Implement the 'crossrelease' feature), a Lockdep extension,
   addressed it years ago. It was merged but reverted shortly after
   because:

   Cross-release started to report valuable hidden problems, but also
   started to give false positive reports. For sure, no one likes
   Lockdep's false positive reports, since they make Lockdep stop,
   preventing further real problems from being reported.

Q. Why wasn't Dept developed as an extension of Lockdep?
A. Lockdep definitely embodies all the efforts great developers have
   made over a long time, so it is quite stable. But I had to design
   and implement Dept anew because of the following:

   1) Lockdep was designed to track lock acquisition order. The APIs
      and implementation do not fit the wait-event model.
   2) Lockdep is turned off on detection, including false positives,
      which is terrible and prevents developing any extension for
      stronger detection.

Q. Do you intend to totally replace Lockdep?
A. No. Lockdep also checks whether lock usage is correct. Of course,
   the dependency check routine should be replaced, but the other
   functions should still be there.

Q. Do you mean the dependency check routine should be replaced right
   away?
A. No. I admit Lockdep is stable enough thanks to the great efforts
   kernel developers have made. Both Lockdep and Dept should be in the
   kernel until Dept is considered stable.

Q. Stronger detection capability would give more false positive
   reports, which was a big problem when cross-release was introduced.
   Is it ok with Dept?
A. It's ok. Dept allows multiple reports thanks to its simple and quite
   generalized design. Of course, false positive reports should be
   fixed anyway, but they are no longer as critical a problem as they
   were.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h            |  528 ++++++++
 include/linux/dept_sdt.h        |   60 +
 include/linux/hardirq.h         |    3 +
 include/linux/irqflags.h        |   71 +-
 include/linux/lockdep.h         |   61 +-
 include/linux/lockdep_types.h   |    3 +
 include/linux/sched.h           |    7 +
 init/init_task.c                |    2 +
 init/main.c                     |    2 +
 kernel/Makefile                 |    1 +
 kernel/dependency/Makefile      |    3 +
 kernel/dependency/dept.c        | 2633 +++++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h   |   10 +
 kernel/dependency/dept_object.h |   13 +
 kernel/exit.c                   |    1 +
 kernel/fork.c                   |    2 +
 kernel/locking/lockdep.c        |   28 +-
 kernel/module.c                 |    2 +
 kernel/sched/core.c             |    8 +
 lib/Kconfig.debug               |   27 +
 20 files changed, 3433 insertions(+), 32 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_object.h

diff --git a/include/linux/dept.h b/include/linux/dept.h
new file mode 100644
index 00000000..c498060
--- /dev/null
+++ b/include/linux/dept.h
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DEPT(DEPendency Tracker) - runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_H
+#define __LINUX_DEPT_H
+
+#ifdef CONFIG_DEPT
+
+#include <linux/types.h>
+
+struct task_struct;
+
+#define DEPT_MAX_STACK_ENTRY		16
+#define DEPT_MAX_WAIT_HIST		64
+#define DEPT_MAX_ECXT_HELD		48
+
+#define DEPT_MAX_SUBCLASSES		16
+#define DEPT_MAX_SUBCLASSES_EVT		2
+#define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
+#define DEPT_MAX_SUBCLASSES_CACHE	2
+
+#define DEPT_SIRQ			0
+#define DEPT_HIRQ			1
+#define DEPT_IRQS_NR			2
+#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
+#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
+
+struct dept_ecxt;
+struct dept_iecxt {
+	struct dept_ecxt		*ecxt;
+	int				enirq;
+	/*
+	 * for preventing adding a new ecxt
+	 */
+	bool				staled;
+};
+
+struct dept_wait;
+struct dept_iwait {
+	struct dept_wait		*wait;
+	int				irq;
+	/*
+	 * for preventing adding a new wait
+	 */
+	bool				staled;
+	bool				touched;
+};
+
+struct dept_class {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * unique information about the class
+	 */
+	const char			*name;
+	unsigned long			key;
+	int				sub;
+
+	/*
+	 * for BFS
+	 */
+	unsigned int			bfs_gen;
+	int				bfs_dist;
+	struct dept_class		*bfs_parent;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node		hash_node;
+
+	/*
+	 * for linking all classes
+	 */
+	struct list_head		all_node;
+
+	/*
+	 * for associating its dependencies
+	 */
+	struct list_head		dep_head;
+	struct list_head		dep_rev_head;
+
+	/*
+	 * for tracking IRQ dependencies
+	 */
+	struct dept_iecxt		iecxt[DEPT_IRQS_NR];
+	struct dept_iwait		iwait[DEPT_IRQS_NR];
+};
+
+struct dept_stack {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * backtrace entries
+	 */
+	unsigned long			raw[DEPT_MAX_STACK_ENTRY];
+	int nr;
+};
+
+struct dept_ecxt {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * function that entered this ecxt
+	 */
+	const char			*ecxt_fn;
+
+	/*
+	 * event function
+	 */
+	const char			*event_fn;
+
+	/*
+	 * associated class
+	 */
+	struct dept_class		*class;
+
+	/*
+	 * flag indicating which IRQ has been
+	 * enabled within the event context
+	 */
+	unsigned long			enirqf;
+
+	/*
+	 * where the IRQ-enabled happened
+	 */
+	unsigned long			enirq_ip[DEPT_IRQS_NR];
+	struct dept_stack		*enirq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the event context started
+	 */
+	unsigned long			ecxt_ip;
+	struct dept_stack		*ecxt_stack;
+
+	/*
+	 * where the event triggered
+	 */
+	unsigned long			event_ip;
+	struct dept_stack		*event_stack;
+};
+
+struct dept_wait {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * function causing this wait
+	 */
+	const char			*wait_fn;
+
+	/*
+	 * the associated class
+	 */
+	struct dept_class		*class;
+
+	/*
+	 * which IRQ the wait was placed in
+	 */
+	unsigned long			irqf;
+
+	/*
+	 * where the IRQ wait happened
+	 */
+	unsigned long			irq_ip[DEPT_IRQS_NR];
+	struct dept_stack		*irq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the wait happened
+	 */
+	unsigned long			wait_ip;
+	struct dept_stack		*wait_stack;
+};
+
+struct dept_dep {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * key data of dependency
+	 */
+	struct dept_ecxt		*ecxt;
+	struct dept_wait		*wait;
+
+	/*
+	 * This object can be referred to without dept_lock
+	 * held but with IRQ disabled, e.g. for hash
+	 * lookup. So deferred deletion is needed.
+	 */
+	struct rcu_head			rh;
+
+	/*
+	 * for BFS
+	 */
+	struct list_head		bfs_node;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node		hash_node;
+
+	/*
+	 * for linking to a class object
+	 */
+	struct list_head		dep_node;
+	struct list_head		dep_rev_node;
+};
+
+struct dept_hash {
+	/*
+	 * hash table
+	 */
+	struct hlist_head		*table;
+
+	/*
+	 * size of the table, i.e. 2^bits
+	 */
+	int				bits;
+};
+
+struct dept_pool {
+	const char			*name;
+
+	/*
+	 * object size
+	 */
+	size_t				obj_sz;
+
+	/*
+	 * the number of objects in the static array
+	 */
+	atomic_t			obj_nr;
+
+	/*
+	 * offset of ->pool_node
+	 */
+	size_t				node_off;
+
+	/*
+	 * pointer to the pool
+	 */
+	void				*spool;
+	struct llist_head		boot_pool;
+	struct llist_head __percpu	*lpool;
+};
+
+struct dept_ecxt_held {
+	/*
+	 * associated event context
+	 */
+	struct dept_ecxt		*ecxt;
+
+	/*
+	 * unique key for this dept_ecxt_held
+	 */
+	unsigned long			key;
+
+	/*
+	 * the wgen when the event context started
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * for allowing user aware nesting
+	 */
+	int				nest;
+};
+
+struct dept_wait_hist {
+	/*
+	 * associated wait
+	 */
+	struct dept_wait		*wait;
+
+	/*
+	 * unique id among all waits system-wide until wrapped
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * local context id to identify IRQ context
+	 */
+	unsigned int			ctxt_id;
+};
+
+struct dept_key {
+	union {
+		/*
+		 * Each byte-wise address will be used as its key.
+		 */
+		char			subkeys[DEPT_MAX_SUBCLASSES];
+
+		/*
+		 * for caching the main class pointer
+		 */
+		struct dept_class	*classes[DEPT_MAX_SUBCLASSES_CACHE];
+	};
+};
+
+struct dept_map {
+	const char			*name;
+	struct dept_key			*keys;
+	int				sub_usr;
+
+	/*
+	 * It's a local copy for fast access to the associated classes.
+	 * Also used as the dept_key instance for a statically defined map.
+	 */
+	struct dept_key			keys_local;
+
+	/*
+	 * wait timestamp associated to this map
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * whether this map should be checked or not
+	 */
+	bool				nocheck;
+};
+
+#define DEPT_MAP_INITIALIZER(n)						\
+{									\
+	.name = #n,							\
+	.keys = NULL,							\
+	.sub_usr = 0,							\
+	.keys_local = { .classes = { 0 } },				\
+	.wgen = 0U,							\
+	.nocheck = false,						\
+}
+
+struct dept_task {
+	/*
+	 * all event contexts that have entered and before exiting
+	 */
+	struct dept_ecxt_held		ecxt_held[DEPT_MAX_ECXT_HELD];
+	int				ecxt_held_pos;
+
+	/*
+	 * ring buffer holding all waits that have happened
+	 */
+	struct dept_wait_hist		wait_hist[DEPT_MAX_WAIT_HIST];
+	int				wait_hist_pos;
+
+	/*
+	 * sequential id to identify each IRQ context
+	 */
+	unsigned int			irq_id[DEPT_IRQS_NR];
+
+	/*
+	 * for tracking IRQ-enabled points with cross-event
+	 */
+	unsigned int			wgen_enirq[DEPT_IRQS_NR];
+
+	/*
+	 * for keeping up-to-date IRQ-enabled points
+	 */
+	unsigned long			enirq_ip[DEPT_IRQS_NR];
+
+	/*
+	 * current effective IRQ-enabled flag
+	 */
+	unsigned long			eff_enirqf;
+
+	/*
+	 * for reserving a current stack instance at each operation
+	 */
+	struct dept_stack		*stack;
+
+	/*
+	 * for preventing recursive call into DEPT engine
+	 */
+	int				recursive;
+
+	/*
+	 * for staging data to commit a wait
+	 */
+	struct dept_map			*stage_m;
+	unsigned long			stage_w_f;
+	const char			*stage_w_fn;
+	int				stage_ne;
+
+	/*
+	 * the number of missing ecxts
+	 */
+	int				missing_ecxt;
+
+	/*
+	 * for tracking IRQ-enable state
+	 */
+	bool				hardirqs_enabled;
+	bool				softirqs_enabled;
+};
+
+#define DEPT_TASK_INITIALIZER(t)				\
+{								\
+	.wait_hist = { { .wait = NULL, } },			\
+	.ecxt_held_pos = 0,					\
+	.wait_hist_pos = 0,					\
+	.irq_id = { 0U },					\
+	.wgen_enirq = { 0U },					\
+	.enirq_ip = { 0UL },					\
+	.eff_enirqf = 0UL,					\
+	.stack = NULL,						\
+	.recursive = 0,						\
+	.stage_m = NULL,					\
+	.stage_w_f = 0UL,					\
+	.stage_w_fn = NULL,					\
+	.stage_ne = 0,						\
+	.missing_ecxt = 0,					\
+	.hardirqs_enabled = false,				\
+	.softirqs_enabled = false,				\
+}
+
+extern void dept_on(void);
+extern void dept_off(void);
+extern void dept_init(void);
+extern void dept_task_init(struct task_struct *t);
+extern void dept_task_exit(struct task_struct *t);
+extern void dept_free_range(void *start, unsigned int sz);
+extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub, const char *n);
+extern void dept_map_reinit(struct dept_map *m);
+extern void dept_map_nocheck(struct dept_map *m);
+
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_stage_wait(struct dept_map *m, unsigned long w_f, const char *w_fn, int ne);
+extern void dept_ask_event_wait_commit(unsigned long ip);
+extern void dept_clean_stage(void);
+extern void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *c_fn, const char *e_fn, int ne);
+extern void dept_ask_event(struct dept_map *m);
+extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
+extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
+
+static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
+{
+	dept_ecxt_enter(m, 0UL, 0UL, NULL, NULL, 0);
+}
+
+/*
+ * for users who want to manage external keys
+ */
+extern void dept_key_init(struct dept_key *k);
+extern void dept_key_destroy(struct dept_key *k);
+
+extern void dept_softirq_enter(void);
+extern void dept_hardirq_enter(void);
+extern void dept_aware_softirqs_enable(void);
+extern void dept_aware_hardirqs_enable(void);
+extern void dept_aware_softirqs_disable(void);
+extern void dept_aware_hardirqs_disable(void);
+extern void dept_enirq_transition(unsigned long ip);
+#else /* !CONFIG_DEPT */
+struct dept_key  { };
+struct dept_map  { };
+struct dept_task { };
+
+#define DEPT_MAP_INITIALIZER(n) { }
+#define DEPT_TASK_INITIALIZER(t) { }
+
+#define dept_on()				do { } while (0)
+#define dept_off()				do { } while (0)
+#define dept_init()				do { } while (0)
+#define dept_task_init(t)			do { } while (0)
+#define dept_task_exit(t)			do { } while (0)
+#define dept_free_range(s, sz)			do { } while (0)
+#define dept_map_init(m, k, s, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_map_reinit(m)			do { } while (0)
+#define dept_map_nocheck(m)			do { } while (0)
+
+#define dept_wait(m, w_f, ip, w_fn, ne)		do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, w_f, w_fn, ne)	do { (void)(w_fn); } while (0)
+#define dept_ask_event_wait_commit(ip)		do { } while (0)
+#define dept_clean_stage()			do { } while (0)
+#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, ne) do { (void)(c_fn); (void)(e_fn); } while (0)
+#define dept_ask_event(m)			do { } while (0)
+#define dept_event(m, e_f, ip, e_fn)		do { (void)(e_fn); } while (0)
+#define dept_ecxt_exit(m, e_f, ip)		do { } while (0)
+#define dept_ecxt_enter_nokeep(m)		do { } while (0)
+#define dept_key_init(k)			do { (void)(k); } while (0)
+#define dept_key_destroy(k)			do { (void)(k); } while (0)
+
+#define dept_softirq_enter()				do { } while (0)
+#define dept_hardirq_enter()				do { } while (0)
+#define dept_aware_softirqs_enable()			do { } while (0)
+#define dept_aware_hardirqs_enable()			do { } while (0)
+#define dept_aware_softirqs_disable()			do { } while (0)
+#define dept_aware_hardirqs_disable()			do { } while (0)
+#define dept_enirq_transition(ip)			do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_H */
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
new file mode 100644
index 00000000..49763cd
--- /dev/null
+++ b/include/linux/dept_sdt.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept Single-event Dependency Tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_SDT_H
+#define __LINUX_DEPT_SDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+/*
+ * SDT(Single-event Dependency Tracker) APIs
+ *
+ * In case one dept_map instance maps to a single event, the SDT APIs
+ * can be used.
+ */
+#define sdt_map_init(m)							\
+	do {								\
+		static struct dept_key __key;				\
+		dept_map_init(m, &__key, 0, #m);			\
+	} while (0)
+#define sdt_map_init_key(m, k)		dept_map_init(m, k, 0, #m)
+
+#define sdt_wait(m)							\
+	do {								\
+		dept_ask_event(m);					\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0);		\
+	} while (0)
+/*
+ * This will be committed when the task actually gets to __schedule().
+ * Both dept_ask_event() and dept_wait() will be performed on the
+ * commit in __schedule().
+ */
+#define sdt_wait_prepare(m)		dept_stage_wait(m, 1UL, "wait", 0)
+#define sdt_wait_finish()		dept_clean_stage()
+#define sdt_ecxt_enter(m)		dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
+#define sdt_event(m)			dept_event(m, 1UL, _THIS_IP_, "event")
+#define sdt_ecxt_exit(m)		dept_ecxt_exit(m, 1UL, _THIS_IP_)
+#else /* !CONFIG_DEPT */
+#define DEPT_SDT_MAP_INIT(dname)	{ }
+
+#define sdt_map_init(m)			do { } while (0)
+#define sdt_map_init_key(m, k)		do { (void)(k); } while (0)
+#define sdt_wait(m)			do { } while (0)
+#define sdt_wait_prepare(m)		do { } while (0)
+#define sdt_wait_finish()		do { } while (0)
+#define sdt_ecxt_enter(m)		do { } while (0)
+#define sdt_event(m)			do { } while (0)
+#define sdt_ecxt_exit(m)		do { } while (0)
+#endif
+
+#define DEFINE_DEPT_SDT(x)		\
+	struct dept_map x = DEPT_MAP_INITIALIZER(x)
+
+#endif /* __LINUX_DEPT_SDT_H */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 76878b3..07005f2 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -5,6 +5,7 @@
 #include <linux/context_tracking_state.h>
 #include <linux/preempt.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/ftrace_irq.h>
 #include <linux/sched.h>
 #include <linux/vtime.h>
@@ -114,6 +115,7 @@ static inline void rcu_nmi_exit(void) { }
  */
 #define __nmi_enter()						\
 	do {							\
+		dept_off();					\
 		lockdep_off();					\
 		arch_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
@@ -136,6 +138,7 @@ static inline void rcu_nmi_exit(void) { }
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		arch_nmi_exit();				\
 		lockdep_on();					\
+		dept_on();					\
 	} while (0)
 
 #define nmi_exit()						\
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 4b14093..d168fa3 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -13,23 +13,52 @@
 #define _LINUX_TRACE_IRQFLAGS_H
 
 #include <linux/typecheck.h>
+#include <linux/dept.h>
 #include <asm/irqflags.h>
 #include <asm/percpu.h>
 
 /* Currently lockdep_softirqs_on/off is used only by lockdep */
 #ifdef CONFIG_PROVE_LOCKING
-  extern void lockdep_softirqs_on(unsigned long ip);
-  extern void lockdep_softirqs_off(unsigned long ip);
-  extern void lockdep_hardirqs_on_prepare(unsigned long ip);
-  extern void lockdep_hardirqs_on(unsigned long ip);
-  extern void lockdep_hardirqs_off(unsigned long ip);
+  extern void __lockdep_softirqs_on(unsigned long ip);
+  extern void __lockdep_softirqs_off(unsigned long ip);
+  extern void __lockdep_hardirqs_on_prepare(unsigned long ip);
+  extern void __lockdep_hardirqs_on(unsigned long ip);
+  extern void __lockdep_hardirqs_off(unsigned long ip);
 #else
-  static inline void lockdep_softirqs_on(unsigned long ip) { }
-  static inline void lockdep_softirqs_off(unsigned long ip) { }
-  static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { }
-  static inline void lockdep_hardirqs_on(unsigned long ip) { }
-  static inline void lockdep_hardirqs_off(unsigned long ip) { }
+  static inline void __lockdep_softirqs_on(unsigned long ip) { }
+  static inline void __lockdep_softirqs_off(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_on_prepare(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_on(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_off(unsigned long ip) { }
 #endif
+static inline void lockdep_softirqs_on(unsigned long ip)
+{
+	__lockdep_softirqs_on(ip);
+	dept_aware_softirqs_enable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_softirqs_off(unsigned long ip)
+{
+	__lockdep_softirqs_off(ip);
+	dept_aware_softirqs_disable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_hardirqs_on_prepare(unsigned long ip)
+{
+	__lockdep_hardirqs_on_prepare(ip);
+	dept_aware_hardirqs_enable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_hardirqs_on(unsigned long ip)
+{
+	__lockdep_hardirqs_on(ip);
+}
+static inline void lockdep_hardirqs_off(unsigned long ip)
+{
+	__lockdep_hardirqs_off(ip);
+	dept_aware_hardirqs_disable();
+	dept_enirq_transition(ip);
+}
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 
@@ -60,8 +89,10 @@ struct irqtrace_events {
 # define lockdep_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define lockdep_hardirq_enter()			\
 do {							\
-	if (__this_cpu_inc_return(hardirq_context) == 1)\
+	if (__this_cpu_inc_return(hardirq_context) == 1) {\
 		current->hardirq_threaded = 0;		\
+		dept_hardirq_enter();			\
+	}						\
 } while (0)
 # define lockdep_hardirq_threaded()		\
 do {						\
@@ -135,7 +166,8 @@ struct irqtrace_events {
 #if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT)
 # define lockdep_softirq_enter()		\
 do {						\
-	current->softirq_context++;		\
+	if (!current->softirq_context++)	\
+		dept_softirq_enter();		\
 } while (0)
 # define lockdep_softirq_exit()			\
 do {						\
@@ -170,17 +202,28 @@ struct irqtrace_events {
 /*
  * Wrap the arch provided IRQ routines to provide appropriate checks.
  */
-#define raw_local_irq_disable()		arch_local_irq_disable()
-#define raw_local_irq_enable()		arch_local_irq_enable()
+#define raw_local_irq_disable()				\
+	do {						\
+		arch_local_irq_disable();		\
+		dept_aware_hardirqs_disable();		\
+	} while (0)
+#define raw_local_irq_enable()				\
+	do {						\
+		dept_aware_hardirqs_enable();		\
+		arch_local_irq_enable();		\
+	} while (0)
 #define raw_local_irq_save(flags)			\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		flags = arch_local_irq_save();		\
+		dept_aware_hardirqs_disable();		\
 	} while (0)
 #define raw_local_irq_restore(flags)			\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		raw_check_bogus_irq_restore();		\
+		if (!arch_irqs_disabled_flags(flags))	\
+			dept_aware_hardirqs_enable();	\
 		arch_local_irq_restore(flags);		\
 	} while (0)
 #define raw_local_save_flags(flags)			\
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b942..aee4660 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -20,6 +20,33 @@
 extern int prove_locking;
 extern int lock_stat;
 
+#ifdef CONFIG_DEPT
+static inline void dept_after_copy_map(struct dept_map *to,
+				       struct dept_map *from)
+{
+	int i;
+
+	if (from->keys == &from->keys_local)
+		to->keys = &to->keys_local;
+
+	if (!to->keys)
+		return;
+
+	/*
+	 * Since the class cache can be modified concurrently we could observe
+	 * half pointers (64bit arch using 32bit copy insns). Therefore clear
+	 * the caches and take the performance hit.
+	 *
+	 * XXX it doesn't work well with lockdep_set_class_and_subclass(), since
+	 *     that relies on cache abuse.
+	 */
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		to->keys->classes[i] = NULL;
+}
+#else
+#define dept_after_copy_map(t, f)	do { } while (0)
+#endif
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -43,6 +70,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
 	 */
 	for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
 		to->class_cache[i] = NULL;
+
+	dept_after_copy_map(&to->dmap, &from->dmap);
 }
 
 /*
@@ -176,8 +205,19 @@ struct held_lock {
 	current->lockdep_recursion -= LOCKDEP_OFF;	\
 } while (0)
 
-extern void lockdep_register_key(struct lock_class_key *key);
-extern void lockdep_unregister_key(struct lock_class_key *key);
+extern void __lockdep_register_key(struct lock_class_key *key);
+extern void __lockdep_unregister_key(struct lock_class_key *key);
+
+#define lockdep_register_key(k)				\
+do {							\
+	__lockdep_register_key(k);			\
+	dept_key_init(&(k)->dkey);			\
+} while (0)
+#define lockdep_unregister_key(k)			\
+do {							\
+	__lockdep_unregister_key(k);			\
+	dept_key_destroy(&(k)->dkey);			\
+} while (0)
 
 /*
  * These methods are used by specific locking variants (spinlocks,
@@ -185,9 +225,18 @@ struct held_lock {
  * to lockdep:
  */
 
-extern void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+extern void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 	struct lock_class_key *key, int subclass, u8 inner, u8 outer, u8 lock_type);
 
+#define lockdep_init_map_type(l, n, k, s, i, o, t)		\
+do {								\
+	__lockdep_init_map_type(l, n, k, s, i, o, t);		\
+	if ((k) == &__lockdep_no_validate__)			\
+		dept_map_nocheck(&(l)->dmap);			\
+	else							\
+		dept_map_init(&(l)->dmap, &(k)->dkey, s, n);	\
+} while (0)
+
 static inline void
 lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
 		       struct lock_class_key *key, int subclass, u8 inner, u8 outer)
@@ -435,9 +484,13 @@ enum xhlock_context_t {
 /*
  * To initialize a lockdep_map statically use this macro.
  * Note that _name must not be NULL.
+ *
+ * TODO: I found a case that uses an address other than a real key as
+ * _key, for instance, in workqueue. We cannot use it as a key in Dept.
  */
 #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
-	{ .name = (_name), .key = (void *)(_key), }
+	{ .name = (_name), .key = (void *)(_key), \
+	  .dmap = DEPT_MAP_INITIALIZER(_name) }
 
 static inline void lockdep_invariant_state(bool force) {}
 static inline void lockdep_free_task(struct task_struct *task) {}
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d224308..50c8879 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
 #define __LINUX_LOCKDEP_TYPES_H
 
 #include <linux/types.h>
+#include <linux/dept.h>
 
 #define MAX_LOCKDEP_SUBCLASSES		8UL
 
@@ -76,6 +77,7 @@ struct lock_class_key {
 		struct hlist_node		hash_entry;
 		struct lockdep_subclass_key	subkeys[MAX_LOCKDEP_SUBCLASSES];
 	};
+	struct dept_key				dkey;
 };
 
 extern struct lock_class_key __lockdep_no_validate__;
@@ -185,6 +187,7 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+	struct dept_map			dmap;
 };
 
 struct pin_cookie { unsigned int val; };
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00..3716e41 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -35,6 +35,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <asm/kmap_size.h>
+#include <linux/dept.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -201,12 +202,16 @@
  */
 #define __set_current_state(state_value)				\
 	do {								\
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		WRITE_ONCE(current->__state, (state_value));		\
 	} while (0)
 
 #define set_current_state(state_value)					\
 	do {								\
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		smp_store_mb(current->__state, (state_value));		\
 	} while (0)
@@ -1156,6 +1161,8 @@ struct task_struct {
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
 #endif
 
+	struct dept_task		dept_task;
+
 #if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP)
 	unsigned int			in_ubsan;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f0..ceea035 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -12,6 +12,7 @@
 #include <linux/audit.h>
 #include <linux/numa.h>
 #include <linux/scs.h>
+#include <linux/dept.h>
 
 #include <linux/uaccess.h>
 
@@ -193,6 +194,7 @@ struct task_struct init_task
 	.curr_chain_key = INITIAL_CHAIN_KEY,
 	.lockdep_recursion = 0,
 #endif
+	.dept_task = DEPT_TASK_INITIALIZER(init_task),
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	.ret_stack		= NULL,
 	.tracing_graph_pause	= ATOMIC_INIT(0),
diff --git a/init/main.c b/init/main.c
index 98182c3..deabdd5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -65,6 +65,7 @@
 #include <linux/debug_locks.h>
 #include <linux/debugobjects.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/kmemleak.h>
 #include <linux/padata.h>
 #include <linux/pid_namespace.h>
@@ -1071,6 +1072,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 		      panic_param);
 
 	lockdep_init();
+	dept_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/Makefile b/kernel/Makefile
index 847a82b..5de01e2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -53,6 +53,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += dependency/
 
 obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
new file mode 100644
index 00000000..b5cfb8a
--- /dev/null
+++ b/kernel/dependency/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DEPT) += dept.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
new file mode 100644
index 00000000..1e90284
--- /dev/null
+++ b/kernel/dependency/dept.c
@@ -0,0 +1,2633 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DEPT(DEPendency Tracker) - Runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ *
+ * DEPT provides a general way to detect deadlock possibilities at
+ * runtime, and its interest is not limited to typical locks but
+ * covers every synchronization primitive.
+ *
+ * The following ideas were borrowed from LOCKDEP:
+ *
+ *    1) Use a graph to track relationship between classes.
+ *    2) Prevent performance regression using hash.
+ *
+ * The following items were enhanced from LOCKDEP:
+ *
+ *    1) Cover more deadlock cases.
+ *    2) Allow multiple reports.
+ *
+ * TODO: Both LOCKDEP and DEPT should co-exist until DEPT is considered
+ * stable. Then the dependency check routine should be replaced
+ * with DEPT's. It should finally look like:
+ *
+ *
+ *
+ * As is:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    | +-------------------------------------+ |
+ *    | | Dependency check                    | |
+ *    | | (by tracking lock acquisition order)| |
+ *    | +-------------------------------------+ |
+ *    |                                         |
+ *    +-----------------------------------------+
+ *
+ *    DEPT
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ *
+ *
+ *
+ * To be:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    |       (Request dependency check)        |
+ *    |                    T                    |
+ *    +--------------------|--------------------+
+ *                         |
+ *    DEPT                 V
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ */
+
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/hash.h>
+#include <linux/dept.h>
+#include <linux/utsname.h>
+
+static int dept_stop;
+static int dept_per_cpu_ready;
+
+#define DEPT_READY_WARN (!oops_in_progress)
+
+/*
+ * Make all operations using DEPT_WARN_ON() fail on oops_in_progress and
+ * suppress the warning message.
+ */
+#define DEPT_WARN_ON_ONCE(c)						\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN_ONCE(c, "DEPT_WARN_ON_ONCE: " #c);	\
+		__ret;							\
+	})
+
+#define DEPT_WARN_ONCE(s...)						\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN_ONCE(1, "DEPT_WARN_ONCE: " s);		\
+	})
+
+#define DEPT_WARN_ON(c)							\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN(c, "DEPT_WARN_ON: " #c);		\
+		__ret;							\
+	})
+
+#define DEPT_WARN(s...)							\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_WARN: " s);			\
+	})
+
+#define DEPT_STOP(s...)							\
+	({								\
+		WRITE_ONCE(dept_stop, 1);				\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_STOP: " s);			\
+	})
+
+#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+
+static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+/*
+ * The DEPT internal engine should be careful when using outside
+ * functions e.g. printk for reporting, since that kind of usage might
+ * cause an untrackable deadlock.
+ */
+static atomic_t dept_outworld = ATOMIC_INIT(0);
+
+static inline void dept_outworld_enter(void)
+{
+	atomic_inc(&dept_outworld);
+}
+
+static inline void dept_outworld_exit(void)
+{
+	atomic_dec(&dept_outworld);
+}
+
+static inline bool dept_outworld_entered(void)
+{
+	return atomic_read(&dept_outworld);
+}
+
+static inline bool dept_lock(void)
+{
+	while (!arch_spin_trylock(&dept_spin))
+		if (unlikely(dept_outworld_entered()))
+			return false;
+	return true;
+}
+
+static inline void dept_unlock(void)
+{
+	arch_spin_unlock(&dept_spin);
+}
+
+/*
+ * Whether to save a stack trace on every wait and every ecxt.
+ */
+static bool rich_stack = true;
+
+enum bfs_ret {
+	BFS_CONTINUE,
+	BFS_CONTINUE_REV,
+	BFS_DONE,
+	BFS_SKIP,
+};
+
+static inline bool before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
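+
+/*
+ * e.g. with 32-bit wait generation counters, before(0xfffffffe, 0x1)
+ * is true because (int)(0xfffffffe - 0x1) is negative, so the
+ * ordering stays correct even after the counter wraps around.
+ */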
+
+static inline bool valid_stack(struct dept_stack *s)
+{
+	return s && s->nr > 0;
+}
+
+static inline bool valid_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+static inline void inval_class(struct dept_class *c)
+{
+	c->key = 0UL;
+}
+
+static inline struct dept_ecxt *dep_e(struct dept_dep *d)
+{
+	return d->ecxt;
+}
+
+static inline struct dept_wait *dep_w(struct dept_dep *d)
+{
+	return d->wait;
+}
+
+static inline struct dept_class *dep_fc(struct dept_dep *d)
+{
+	return dep_e(d)->class;
+}
+
+static inline struct dept_class *dep_tc(struct dept_dep *d)
+{
+	return dep_w(d)->class;
+}
+
+static inline const char *irq_str(int irq)
+{
+	if (irq == DEPT_SIRQ)
+		return "softirq";
+	if (irq == DEPT_HIRQ)
+		return "hardirq";
+	return "(unknown)";
+}
+
+static inline struct dept_task *dept_task(void)
+{
+	return &current->dept_task;
+}
+
+/*
+ * Pool
+ * =====================================================================
+ * DEPT maintains pools to provide objects in a safe way.
+ *
+ *    1) The static pool is used at the beginning of boot time.
+ *    2) The local pool is tried first, before the static pool. Objects
+ *       that have been freed are placed back into it.
+ */
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+	#include "dept_object.h"
+#undef  OBJECT
+	OBJECT_NR,
+};
+
+#define OBJECT(id, nr)							\
+static struct dept_##id spool_##id[nr];					\
+static DEFINE_PER_CPU(struct llist_head, lpool_##id);
+	#include "dept_object.h"
+#undef  OBJECT
+
+static struct dept_pool pool[OBJECT_NR] = {
+#define OBJECT(id, nr) {						\
+	.name = #id,							\
+	.obj_sz = sizeof(struct dept_##id),				\
+	.obj_nr = ATOMIC_INIT(nr),					\
+	.node_off = offsetof(struct dept_##id, pool_node),		\
+	.spool = spool_##id,						\
+	.lpool = &lpool_##id, },
+	#include "dept_object.h"
+#undef  OBJECT
+};
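+
+/*
+ * For example, OBJECT(wait, 1024 * 32) in dept_object.h expands to a
+ * static array spool_wait[1024 * 32], a per-cpu llist lpool_wait and
+ * the pool descriptor above with obj_sz == sizeof(struct dept_wait),
+ * so objects can be served without involving the memory allocator.
+ */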
+
+/*
+ * llist can be used no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ * is enabled or not, because reentrance is prevented so that NMI and
+ * other contexts on the same CPU never run inside DEPT concurrently.
+ */
+static void *from_pool(enum object_t t)
+{
+	struct dept_pool *p;
+	struct llist_head *h;
+	struct llist_node *n;
+
+	/*
+	 * llist_del_first() doesn't allow concurrent access e.g.
+	 * between process and IRQ context.
+	 */
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		return NULL;
+
+	p = &pool[t];
+
+	/*
+	 * Try local pool first.
+	 */
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	n = llist_del_first(h);
+	if (n)
+		return (void *)n - p->node_off;
+
+	/*
+	 * Try static pool.
+	 */
+	if (atomic_read(&p->obj_nr) > 0) {
+		int idx = atomic_dec_return(&p->obj_nr);
+
+		if (idx >= 0)
+			return p->spool + (idx * p->obj_sz);
+	}
+
+	DEPT_INFO_ONCE("Pool(%s) is empty.\n", p->name);
+	return NULL;
+}
+
+static void to_pool(void *o, enum object_t t)
+{
+	struct dept_pool *p = &pool[t];
+	struct llist_head *h;
+
+	preempt_disable();
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	llist_add(o + p->node_off, h);
+	preempt_enable();
+}
+
+#define OBJECT(id, nr)							\
+static void (*ctor_##id)(struct dept_##id *a);				\
+static void (*dtor_##id)(struct dept_##id *a);				\
+static inline struct dept_##id *new_##id(void)				\
+{									\
+	struct dept_##id *a;						\
+									\
+	a = (struct dept_##id *)from_pool(OBJECT_##id);			\
+	if (unlikely(!a))						\
+		return NULL;						\
+									\
+	atomic_set(&a->ref, 1);						\
+									\
+	if (ctor_##id)							\
+		ctor_##id(a);						\
+									\
+	return a;							\
+}									\
+									\
+static inline struct dept_##id *get_##id(struct dept_##id *a)		\
+{									\
+	atomic_inc(&a->ref);						\
+	return a;							\
+}									\
+									\
+static inline void put_##id(struct dept_##id *a)			\
+{									\
+	if (!atomic_dec_return(&a->ref)) {				\
+		if (dtor_##id)						\
+			dtor_##id(a);					\
+		to_pool(a, OBJECT_##id);				\
+	}								\
+}									\
+									\
+static inline void del_##id(struct dept_##id *a)			\
+{									\
+	put_##id(a);							\
+}									\
+									\
+static inline bool id##_consumed(struct dept_##id *a)			\
+{									\
+	return a && atomic_read(&a->ref) > 1;				\
+}
+#include "dept_object.h"
+#undef  OBJECT
+
+#define SET_CONSTRUCTOR(id, f) \
+static void (*ctor_##id)(struct dept_##id *a) = f
+
+static void initialize_dep(struct dept_dep *d)
+{
+	INIT_LIST_HEAD(&d->bfs_node);
+	INIT_LIST_HEAD(&d->dep_node);
+	INIT_LIST_HEAD(&d->dep_rev_node);
+}
+SET_CONSTRUCTOR(dep, initialize_dep);
+
+static void initialize_class(struct dept_class *c)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_iecxt *ie = &c->iecxt[i];
+		struct dept_iwait *iw = &c->iwait[i];
+
+		ie->ecxt = NULL;
+		ie->enirq = i;
+		ie->staled = false;
+
+		iw->wait = NULL;
+		iw->irq = i;
+		iw->staled = false;
+		iw->touched = false;
+	}
+	c->bfs_gen = 0U;
+
+	INIT_LIST_HEAD(&c->all_node);
+	INIT_LIST_HEAD(&c->dep_head);
+	INIT_LIST_HEAD(&c->dep_rev_head);
+}
+SET_CONSTRUCTOR(class, initialize_class);
+
+static void initialize_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		e->enirq_stack[i] = NULL;
+		e->enirq_ip[i] = 0UL;
+	}
+	e->ecxt_ip = 0UL;
+	e->ecxt_stack = NULL;
+	e->enirqf = 0UL;
+	e->event_ip = 0UL;
+	e->event_stack = NULL;
+}
+SET_CONSTRUCTOR(ecxt, initialize_ecxt);
+
+static void initialize_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		w->irq_stack[i] = NULL;
+		w->irq_ip[i] = 0UL;
+	}
+	w->wait_ip = 0UL;
+	w->wait_stack = NULL;
+	w->irqf = 0UL;
+}
+SET_CONSTRUCTOR(wait, initialize_wait);
+
+static void initialize_stack(struct dept_stack *s)
+{
+	s->nr = 0;
+}
+SET_CONSTRUCTOR(stack, initialize_stack);
+
+#define OBJECT(id, nr) \
+static void (*ctor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_CONSTRUCTOR
+
+#define SET_DESTRUCTOR(id, f) \
+static void (*dtor_##id)(struct dept_##id *a) = f
+
+static void destroy_dep(struct dept_dep *d)
+{
+	if (dep_e(d))
+		put_ecxt(dep_e(d));
+	if (dep_w(d))
+		put_wait(dep_w(d));
+}
+SET_DESTRUCTOR(dep, destroy_dep);
+
+static void destroy_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (e->enirq_stack[i])
+			put_stack(e->enirq_stack[i]);
+	if (e->class)
+		put_class(e->class);
+	if (e->ecxt_stack)
+		put_stack(e->ecxt_stack);
+	if (e->event_stack)
+		put_stack(e->event_stack);
+}
+SET_DESTRUCTOR(ecxt, destroy_ecxt);
+
+static void destroy_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (w->irq_stack[i])
+			put_stack(w->irq_stack[i]);
+	if (w->class)
+		put_class(w->class);
+	if (w->wait_stack)
+		put_stack(w->wait_stack);
+}
+SET_DESTRUCTOR(wait, destroy_wait);
+
+#define OBJECT(id, nr) \
+static void (*dtor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_DESTRUCTOR
+
+/*
+ * Caching and hashing
+ * =====================================================================
+ * DEPT makes use of caching and hashing to improve performance. Each
+ * object can be obtained in O(1) with its key.
+ *
+ * NOTE: Currently we assume all the objects in the hashes will never
+ * be removed. Implement it when needed.
+ */
+
+/*
+ * Some information might be lost, but it's only for the hashing key.
+ */
+static inline unsigned long mix(unsigned long a, unsigned long b)
+{
+	int halfbits = sizeof(unsigned long) * 8 / 2;
+	unsigned long halfmask = (1UL << halfbits) - 1UL;
+
+	return (a << halfbits) | (b & halfmask);
+}
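+
+/*
+ * e.g. on 64-bit, mix(a, b) == (a << 32) | (b & 0xffffffff): the low
+ * half of 'a' ends up in the upper half of the key and the low half
+ * of 'b' in the lower half, which is good enough for a hash key.
+ */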
+
+static bool cmp_dep(struct dept_dep *d1, struct dept_dep *d2)
+{
+	return dep_fc(d1)->key == dep_fc(d2)->key &&
+	       dep_tc(d1)->key == dep_tc(d2)->key;
+}
+
+static unsigned long key_dep(struct dept_dep *d)
+{
+	return mix(dep_fc(d)->key, dep_tc(d)->key);
+}
+
+static bool cmp_class(struct dept_class *c1, struct dept_class *c2)
+{
+	return c1->key == c2->key;
+}
+
+static unsigned long key_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+#define HASH(id, bits)							\
+static struct hlist_head table_##id[1UL << bits];			\
+									\
+static inline struct hlist_head *head_##id(struct dept_##id *a)		\
+{									\
+	return table_##id + hash_long(key_##id(a), bits);		\
+}									\
+									\
+static inline struct dept_##id *hash_lookup_##id(struct dept_##id *a)	\
+{									\
+	struct dept_##id *b;						\
+									\
+	hlist_for_each_entry_rcu(b, head_##id(a), hash_node)		\
+		if (cmp_##id(a, b))					\
+			return b;					\
+	return NULL;							\
+}									\
+									\
+static inline void hash_add_##id(struct dept_##id *a)			\
+{									\
+	hlist_add_head_rcu(&a->hash_node, head_##id(a));		\
+}									\
+									\
+static inline void hash_del_##id(struct dept_##id *a)			\
+{									\
+	hlist_del_rcu(&a->hash_node);					\
+}
+#include "dept_hash.h"
+#undef  HASH
+
+static inline struct dept_dep *lookup_dep(struct dept_class *fc,
+					  struct dept_class *tc)
+{
+	struct dept_ecxt onetime_e = { .class = fc };
+	struct dept_wait onetime_w = { .class = tc };
+	struct dept_dep  onetime_d = { .ecxt = &onetime_e,
+				       .wait = &onetime_w };
+	return hash_lookup_dep(&onetime_d);
+}
+
+static inline struct dept_class *lookup_class(unsigned long key)
+{
+	struct dept_class onetime_c = { .key = key };
+
+	return hash_lookup_class(&onetime_c);
+}
+
+/*
+ * Report
+ * =====================================================================
+ * DEPT prints useful information to help debugging when a problematic
+ * dependency is detected.
+ */
+
+static inline void print_ip_stack(unsigned long ip, struct dept_stack *s)
+{
+	if (ip)
+		print_ip_sym(KERN_WARNING, ip);
+
+	if (valid_stack(s)) {
+		pr_warn("stacktrace:\n");
+		stack_trace_print(s->raw, s->nr, 5);
+	}
+
+	if (!ip && !valid_stack(s))
+		pr_warn("(N/A)\n");
+}
+
+#define print_spc(spc, fmt, ...)					\
+	pr_warn("%*c" fmt, (spc) * 4, ' ', ##__VA_ARGS__)
+
+static void print_diagram(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	bool firstline = true;
+	int spc = 1;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (!firstline)
+			pr_warn("\nor\n\n");
+		firstline = false;
+
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "    <%s interrupt>\n", irq_str(irq));
+		print_spc(spc + 1, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+
+	if (!irqf) {
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+}
+
+static void print_dep(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		pr_warn("%s has been enabled:\n", irq_str(irq));
+		print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d) in %s context:\n",
+		       w_fn, tc->name, tc->sub, irq_str(irq));
+		print_ip_stack(w->irq_ip[irq], w->irq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+
+	if (!irqf) {
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d):\n", w_fn, tc->name, tc->sub);
+		print_ip_stack(w->wait_ip, w->wait_stack);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+}
+
+static void save_current_stack(int skip);
+
+/*
+ * Print all classes in a circle.
+ */
+static void print_circle(struct dept_class *c)
+{
+	struct dept_class *fc = c->bfs_parent;
+	struct dept_class *tc = c;
+	int i;
+
+	dept_outworld_enter();
+	save_current_stack(6);
+
+	pr_warn("===================================================\n");
+	pr_warn("DEPT: Circular dependency has been detected.\n");
+	pr_warn("%s %.*s %s\n", init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version,
+		print_tainted());
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("summary\n");
+	pr_warn("---------------------------------------------------\n");
+
+	if (fc == tc)
+		pr_warn("*** AA DEADLOCK ***\n\n");
+	else
+		pr_warn("*** DEADLOCK ***\n\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		if (fc != c)
+			pr_warn("\n");
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("\n");
+	pr_warn("[S]: start of the event context\n");
+	pr_warn("[W]: the wait blocked\n");
+	pr_warn("[E]: the event not reachable\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c's detail\n", 'A' + i);
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		pr_warn("\n");
+		print_dep(d);
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("information that might be helpful\n");
+	pr_warn("---------------------------------------------------\n");
+	dump_stack();
+
+	dept_outworld_exit();
+}
+
+/*
+ * BFS(Breadth First Search)
+ * =====================================================================
+ * Whenever a new dependency is added into the graph, search the graph
+ * for a new circular dependency.
+ */
+
+static inline void enqueue(struct list_head *h, struct dept_dep *d)
+{
+	list_add_tail(&d->bfs_node, h);
+}
+
+static inline struct dept_dep *dequeue(struct list_head *h)
+{
+	struct dept_dep *d;
+
+	d = list_first_entry(h, struct dept_dep, bfs_node);
+	list_del(&d->bfs_node);
+	return d;
+}
+
+static inline bool empty(struct list_head *h)
+{
+	return list_empty(h);
+}
+
+static void extend_queue(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_head, dep_node) {
+		struct dept_class *next = dep_tc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+static void extend_queue_rev(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_rev_head, dep_rev_node) {
+		struct dept_class *next = dep_fc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+typedef enum bfs_ret bfs_f(struct dept_dep *d, void *in, void **out);
+static unsigned int bfs_gen;
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
+{
+	LIST_HEAD(q);
+	enum bfs_ret ret;
+
+	if (DEPT_WARN_ON(!cb))
+		return;
+
+	/*
+	 * Avoid zero bfs_gen.
+	 */
+	bfs_gen = bfs_gen + 1 ?: 1;
+
+	c->bfs_gen = bfs_gen;
+	c->bfs_dist = 0;
+	c->bfs_parent = c;
+
+	ret = cb(NULL, in, out);
+	if (ret == BFS_DONE)
+		return;
+	if (ret == BFS_SKIP)
+		return;
+	if (ret == BFS_CONTINUE)
+		extend_queue(&q, c);
+	if (ret == BFS_CONTINUE_REV)
+		extend_queue_rev(&q, c);
+
+	while (!empty(&q)) {
+		struct dept_dep *d = dequeue(&q);
+
+		ret = cb(d, in, out);
+		if (ret == BFS_DONE)
+			break;
+		if (ret == BFS_SKIP)
+			continue;
+		if (ret == BFS_CONTINUE)
+			extend_queue(&q, dep_tc(d));
+		if (ret == BFS_CONTINUE_REV)
+			extend_queue_rev(&q, dep_fc(d));
+	}
+
+	while (!empty(&q))
+		dequeue(&q);
+}
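+
+/*
+ * A callback is invoked once with d == NULL for the initial condition
+ * and then once per dependency popped off the queue, steering the
+ * search with its return value. A rough sketch (cb_example() and
+ * found() are illustrative names only):
+ *
+ *	static enum bfs_ret cb_example(struct dept_dep *d, void *in,
+ *				       void **out)
+ *	{
+ *		if (!d)
+ *			return BFS_CONTINUE;	(search forward from c)
+ *		if (found(d, in, out))
+ *			return BFS_DONE;	(stop the whole search)
+ *		return BFS_CONTINUE;		(keep expanding)
+ *	}
+ */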
+
+/*
+ * Main operations
+ * =====================================================================
+ * Add dependencies - Each new dependency is added into the graph and
+ * checked if it forms a circular dependency.
+ *
+ * Track waits - Waits are queued into the ring buffer for later use in
+ * generating appropriate dependencies with cross-context events.
+ *
+ * Track event contexts(ecxt) - Event contexts are pushed onto a local
+ * stack for later use in generating appropriate dependencies with waits.
+ */
+
+static inline unsigned long cur_enirqf(void);
+static inline int cur_irq(void);
+static inline unsigned int cur_ctxt_id(void);
+
+static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
+{
+	return &c->iecxt[irq];
+}
+
+static inline struct dept_iwait *iwait(struct dept_class *c, int irq)
+{
+	return &c->iwait[irq];
+}
+
+static inline void stale_iecxt(struct dept_iecxt *ie)
+{
+	if (ie->ecxt)
+		put_ecxt(ie->ecxt);
+
+	WRITE_ONCE(ie->ecxt, NULL);
+	WRITE_ONCE(ie->staled, true);
+}
+
+static inline void set_iecxt(struct dept_iecxt *ie, struct dept_ecxt *e)
+{
+	/*
+	 * ->ecxt will never be updated once it is set, until the class
+	 * gets removed.
+	 */
+	if (ie->ecxt)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(ie->ecxt, get_ecxt(e));
+}
+
+static inline void stale_iwait(struct dept_iwait *iw)
+{
+	if (iw->wait)
+		put_wait(iw->wait);
+
+	WRITE_ONCE(iw->wait, NULL);
+	WRITE_ONCE(iw->staled, true);
+}
+
+static inline void set_iwait(struct dept_iwait *iw, struct dept_wait *w)
+{
+	/*
+	 * ->wait will never be updated once it is set, until the class
+	 * gets removed.
+	 */
+	if (iw->wait)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(iw->wait, get_wait(w));
+
+	iw->touched = true;
+}
+
+static inline void touch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = true;
+}
+
+static inline void untouch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = false;
+}
+
+static inline struct dept_stack *get_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	return s ? get_stack(s) : NULL;
+}
+
+static inline void prepare_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	/*
+	 * The dept_stack is ready for reuse.
+	 */
+	if (s && !stack_consumed(s)) {
+		s->nr = 0;
+		return;
+	}
+
+	if (s)
+		put_stack(s);
+
+	s = dept_task()->stack = new_stack();
+	if (!s)
+		return;
+
+	get_stack(s);
+	del_stack(s);
+}
+
+static void save_current_stack(int skip)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (!s)
+		return;
+	if (valid_stack(s))
+		return;
+
+	s->nr = stack_trace_save(s->raw, DEPT_MAX_STACK_ENTRY, skip);
+}
+
+static void finish_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (stack_consumed(s))
+		save_current_stack(2);
+}
+
+/*
+ * FIXME: For now, disable LOCKDEP while DEPT is working.
+ *
+ * Both LOCKDEP and DEPT report deadlock detections using printk,
+ * taking the risk of another deadlock that might be caused by the
+ * internal locks of the console or printk.
+ *
+ * For DEPT, that's no problem since multiple reports are allowed. But
+ * it would be a bad idea for LOCKDEP since it will stop even on a
+ * single report. So we need to prevent LOCKDEP from reporting the risk
+ * DEPT would take when reporting something.
+ */
+#include <linux/lockdep.h>
+
+void dept_off(void)
+{
+	dept_task()->recursive++;
+	lockdep_off();
+}
+
+void dept_on(void)
+{
+	dept_task()->recursive--;
+	lockdep_on();
+}
+
+static inline unsigned long dept_enter(void)
+{
+	unsigned long flags;
+
+	flags = arch_local_irq_save();
+	dept_off();
+	prepare_current_stack();
+	return flags;
+}
+
+static inline void dept_exit(unsigned long flags)
+{
+	finish_current_stack();
+	dept_on();
+	arch_local_irq_restore(flags);
+}
+
+static inline unsigned long dept_enter_recursive(void)
+{
+	unsigned long flags;
+
+	flags = arch_local_irq_save();
+	return flags;
+}
+
+static inline void dept_exit_recursive(unsigned long flags)
+{
+	arch_local_irq_restore(flags);
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static struct dept_dep *__add_dep(struct dept_ecxt *e,
+				  struct dept_wait *w)
+{
+	struct dept_dep *d;
+
+	if (!valid_class(e->class) || !valid_class(w->class))
+		return NULL;
+
+	if (lookup_dep(e->class, w->class))
+		return NULL;
+
+	d = new_dep();
+	if (unlikely(!d))
+		return NULL;
+
+	d->ecxt = get_ecxt(e);
+	d->wait = get_wait(w);
+
+	/*
+	 * Add the dependency into hash and graph.
+	 */
+	hash_add_dep(d);
+	list_add(&d->dep_node, &dep_fc(d)->dep_head);
+	list_add(&d->dep_rev_node, &dep_tc(d)->dep_rev_head);
+	return d;
+}
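+
+/*
+ * e.g. if an event context of class A performs a wait of class B, a
+ * dependency A -> B is added. Once a B -> ... -> A chain closes the
+ * circle, check_dl_bfs() below detects and reports it.
+ */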
+
+static enum bfs_ret cb_check_dl(struct dept_dep *d,
+				void *in, void **out)
+{
+	struct dept_dep *new = (struct dept_dep *)in;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d) {
+		dep_tc(new)->bfs_parent = dep_fc(new);
+
+		if (dep_tc(new) != dep_fc(new))
+			return BFS_CONTINUE;
+
+		/*
+		 * An AA circle does not create an additional deadlock.
+		 * We don't have to continue this BFS search.
+		 */
+		print_circle(dep_tc(new));
+		return BFS_DONE;
+	}
+
+	/*
+	 * Allow multiple reports.
+	 */
+	if (dep_tc(d) == dep_fc(new))
+		print_circle(dep_tc(new));
+
+	return BFS_CONTINUE;
+}
+
+/*
+ * This function is actually in charge of reporting.
+ */
+static inline void check_dl_bfs(struct dept_dep *d)
+{
+	bfs(dep_tc(d), cb_check_dl, (void *)d, NULL);
+}
+
+static enum bfs_ret cb_find_iw(struct dept_dep *d, void *in, void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *fc;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE_REV;
+
+	fc = dep_fc(d);
+	iw = iwait(fc, irq);
+
+	/*
+	 * If any parent's ->wait was set, then the children would've
+	 * been touched.
+	 */
+	if (!iw->touched)
+		return BFS_SKIP;
+
+	if (!iw->wait)
+		return BFS_CONTINUE_REV;
+
+	*out = iw;
+	return BFS_DONE;
+}
+
+static struct dept_iwait *find_iw_bfs(struct dept_class *c, int irq)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iwait *found = NULL;
+
+	if (iw->wait)
+		return iw;
+
+	/*
+	 * '->touched == false' guarantees there's no parent that has had
+	 * its ->wait set.
+	 */
+	if (!iw->touched)
+		return NULL;
+
+	bfs(c, cb_find_iw, (void *)&irq, (void **)&found);
+
+	if (found)
+		return found;
+
+	untouch_iwait(iw);
+	return NULL;
+}
+
+static enum bfs_ret cb_touch_iw_find_ie(struct dept_dep *d, void *in,
+					void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *tc;
+	struct dept_iecxt *ie;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE;
+
+	tc = dep_tc(d);
+	ie = iecxt(tc, irq);
+	iw = iwait(tc, irq);
+
+	touch_iwait(iw);
+
+	if (!ie->ecxt)
+		return BFS_CONTINUE;
+
+	if (!*out)
+		*out = ie;
+
+	return BFS_CONTINUE;
+}
+
+static struct dept_iecxt *touch_iw_find_ie_bfs(struct dept_class *c,
+					       int irq)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iecxt *found = ie->ecxt ? ie : NULL;
+
+	touch_iwait(iw);
+	bfs(c, cb_touch_iw_find_ie, (void *)&irq, (void **)&found);
+	return found;
+}
+
+/*
+ * Should be called with dept_lock held.
+ */
+static void __add_idep(struct dept_iecxt *ie, struct dept_iwait *iw)
+{
+	struct dept_dep *new;
+
+	/*
+	 * There's nothing to do.
+	 */
+	if (!ie || !iw || !ie->ecxt || !iw->wait)
+		return;
+
+	new = __add_dep(ie->ecxt, iw->wait);
+
+	/*
+	 * Deadlock detected. Let check_dl_bfs() report it.
+	 */
+	if (new) {
+		check_dl_bfs(new);
+		stale_iecxt(ie);
+		stale_iwait(iw);
+	}
+
+	/*
+	 * If !new, it would be due to a lack of object resources. Just
+	 * let it go and let it get checked on other occasions. Retrying
+	 * is meaningless in that case.
+	 */
+}
+
+static void set_check_iecxt(struct dept_class *c, int irq,
+			    struct dept_ecxt *e)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	set_iecxt(ie, e);
+	__add_idep(ie, find_iw_bfs(c, irq));
+}
+
+static void set_check_iwait(struct dept_class *c, int irq,
+			    struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	set_iwait(iw, w);
+	__add_idep(touch_iw_find_ie_bfs(c, irq), iw);
+}
+
+static void add_iecxt(struct dept_class *c, int irq, struct dept_ecxt *e,
+		      bool stack)
+{
+	/*
+	 * This access is safe since we ensure e->class has been set locally.
+	 */
+	struct dept_task *dt = dept_task();
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	if (unlikely(READ_ONCE(ie->staled)))
+		return;
+
+	/*
+	 * Skip add_iecxt() if ie->ecxt has ever been set at least once,
+	 * which means it either has a valid ->ecxt or has been staled.
+	 */
+	if (READ_ONCE(ie->ecxt))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(ie->staled))
+		goto unlock;
+	if (ie->ecxt)
+		goto unlock;
+
+	e->enirqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time that these
+	 * enirq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(e->enirq_ip[irq]);
+	DEPT_WARN_ON(e->enirq_stack[irq]);
+
+	e->enirq_ip[irq] = dt->enirq_ip[irq];
+	e->enirq_stack[irq] = stack ? get_current_stack() : NULL;
+
+	set_check_iecxt(c, irq, e);
+unlock:
+	dept_unlock();
+}
+
+static void add_iwait(struct dept_class *c, int irq, struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	if (unlikely(READ_ONCE(iw->staled)))
+		return;
+
+	/*
+	 * Skip add_iwait() if iw->wait has ever been set at least once,
+	 * which means it either has a valid ->wait or has been staled.
+	 */
+	if (READ_ONCE(iw->wait))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(iw->staled))
+		goto unlock;
+	if (iw->wait)
+		goto unlock;
+
+	w->irqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time that these
+	 * irq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(w->irq_ip[irq]);
+	DEPT_WARN_ON(w->irq_stack[irq]);
+
+	w->irq_ip[irq] = w->wait_ip;
+	w->irq_stack[irq] = get_current_stack();
+
+	set_check_iwait(c, irq, w);
+unlock:
+	dept_unlock();
+}
+
+static inline struct dept_wait_hist *hist(int pos)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist + (pos % DEPT_MAX_WAIT_HIST);
+}
+
+static inline int hist_pos_next(void)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist_pos % DEPT_MAX_WAIT_HIST;
+}
+
+static inline void hist_advance(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->wait_hist_pos++;
+	dt->wait_hist_pos %= DEPT_MAX_WAIT_HIST;
+}
+
+static inline struct dept_wait_hist *new_hist(void)
+{
+	struct dept_wait_hist *wh = hist(hist_pos_next());
+
+	hist_advance();
+	return wh;
+}
+
+static void add_hist(struct dept_wait *w, unsigned int wg, unsigned int ctxt_id)
+{
+	struct dept_wait_hist *wh = new_hist();
+
+	if (likely(wh->wait))
+		put_wait(wh->wait);
+
+	wh->wait = get_wait(w);
+	wh->wgen = wg;
+	wh->ctxt_id = ctxt_id;
+}
+
+/*
+ * Should be called after setting up e's iecxt and w's iwait.
+ */
+static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
+{
+	struct dept_class *fc = e->class;
+	struct dept_class *tc = w->class;
+	struct dept_dep *d;
+	int i;
+
+	if (lookup_dep(fc, tc))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	/*
+	 * __add_dep() will lookup_dep() again with lock held.
+	 */
+	d = __add_dep(e, w);
+	if (d) {
+		check_dl_bfs(d);
+
+		for (i = 0; i < DEPT_IRQS_NR; i++) {
+			struct dept_iwait *fiw = iwait(fc, i);
+			struct dept_iecxt *found_ie;
+			struct dept_iwait *found_iw;
+
+			/*
+			 * '->touched == false' guarantees there's no
+			 * parent that has had its ->wait set.
+			 */
+			if (!fiw->touched)
+				continue;
+
+			/*
+			 * find_iw_bfs() will untouch the iwait if
+			 * not found.
+			 */
+			found_iw = find_iw_bfs(fc, i);
+
+			if (!found_iw)
+				continue;
+
+			found_ie = touch_iw_find_ie_bfs(tc, i);
+			__add_idep(found_ie, found_iw);
+		}
+	}
+	dept_unlock();
+}
+
+static atomic_t wgen = ATOMIC_INIT(1);
+
+static void add_wait(struct dept_class *c, unsigned long ip,
+		     const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait *w;
+	unsigned int wg = 0U;
+	int irq;
+	int i;
+
+	w = new_wait();
+	if (unlikely(!w))
+		return;
+
+	WRITE_ONCE(w->class, get_class(c));
+	w->wait_ip = ip;
+	w->wait_fn = w_fn;
+	w->wait_stack = get_current_stack();
+
+	irq = cur_irq();
+	if (irq < DEPT_IRQS_NR)
+		add_iwait(c, irq, w);
+
+	/*
+	 * Avoid adding a dependency between a user-aware nested ecxt
+	 * and the wait.
+	 */
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+
+		eh = dt->ecxt_held + i;
+		if (eh->ecxt->class != c || eh->nest == ne)
+			add_dep(eh->ecxt, w);
+	}
+
+	if (!wait_consumed(w) && !rich_stack) {
+		if (w->wait_stack)
+			put_stack(w->wait_stack);
+		w->wait_stack = NULL;
+	}
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	add_hist(w, wg, cur_ctxt_id());
+
+	del_wait(w);
+}
+
+static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_ecxt_held *eh;
+	struct dept_ecxt *e;
+	unsigned long irqf;
+	int irq;
+
+	if (DEPT_WARN_ON(dt->ecxt_held_pos == DEPT_MAX_ECXT_HELD))
+		return false;
+
+	e = new_ecxt();
+	if (unlikely(!e))
+		return false;
+
+	e->class = get_class(c);
+	e->ecxt_ip = ip;
+	e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
+	e->event_fn = e_fn;
+	e->ecxt_fn = c_fn;
+
+	eh = dt->ecxt_held + (dt->ecxt_held_pos++);
+	eh->ecxt = get_ecxt(e);
+	eh->key = (unsigned long)obj;
+	eh->wgen = atomic_read(&wgen);
+	eh->nest = ne;
+
+	irqf = cur_enirqf();
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+		add_iecxt(c, irq, e, false);
+
+	del_ecxt(e);
+	return true;
+}
+
+static int find_ecxt_pos(unsigned long key, struct dept_class *c,
+			 bool newfirst)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	if (newfirst) {
+		for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+			struct dept_ecxt_held *eh;
+
+			eh = dt->ecxt_held + i;
+			if (eh->key == key && eh->ecxt->class == c)
+				return i;
+		}
+	} else {
+		for (i = 0; i < dt->ecxt_held_pos; i++) {
+			struct dept_ecxt_held *eh;
+
+			eh = dt->ecxt_held + i;
+			if (eh->key == key && eh->ecxt->class == c)
+				return i;
+		}
+	}
+	return -1;
+}
+
+static bool pop_ecxt(void *obj, struct dept_class *c)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long key = (unsigned long)obj;
+	int pos;
+	int i;
+
+	pos = find_ecxt_pos(key, c, true);
+	if (pos == -1)
+		return false;
+
+	put_ecxt(dt->ecxt_held[pos].ecxt);
+	dt->ecxt_held_pos--;
+
+	for (i = pos; i < dt->ecxt_held_pos; i++)
+		dt->ecxt_held[i] = dt->ecxt_held[i + 1];
+	return true;
+}
+
+static inline bool good_hist(struct dept_wait_hist *wh, unsigned int wg)
+{
+	return wh->wait != NULL && before(wg, wh->wgen);
+}
+
+/*
+ * Binary-search the ring buffer for the earliest valid wait.
+ */
+static int find_hist_pos(unsigned int wg)
+{
+	int oldest;
+	int l;
+	int r;
+	int pos;
+
+	oldest = hist_pos_next();
+	if (unlikely(good_hist(hist(oldest), wg))) {
+		DEPT_INFO_ONCE("Need to expand the ring buffer.\n");
+		return oldest;
+	}
+
+	l = oldest + 1;
+	r = oldest + DEPT_MAX_WAIT_HIST - 1;
+	for (pos = (l + r) / 2; l <= r; pos = (l + r) / 2) {
+		struct dept_wait_hist *p = hist(pos - 1);
+		struct dept_wait_hist *wh = hist(pos);
+
+		if (!good_hist(p, wg) && good_hist(wh, wg))
+			return pos % DEPT_MAX_WAIT_HIST;
+		if (good_hist(wh, wg))
+			r = pos - 1;
+		else
+			l = pos + 1;
+	}
+	return -1;
+}
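+
+/*
+ * e.g. with DEPT_MAX_WAIT_HIST == 8 and oldest == 3, the search runs
+ * over the logical positions 4..10 (mod 8). If good_hist() first
+ * becomes true at slot 6, that slot is returned as the earliest wait
+ * to be considered for the event.
+ */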
+
+static void do_event(void *obj, struct dept_class *c, unsigned int wg,
+		     unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait_hist *wh;
+	struct dept_ecxt_held *eh;
+	unsigned long key = (unsigned long)obj;
+	unsigned int ctxt_id;
+	int end;
+	int pos;
+	int i;
+
+	/*
+	 * The event was triggered before the wait.
+	 */
+	if (!wg)
+		return;
+
+	pos = find_ecxt_pos(key, c, false);
+	if (pos == -1)
+		return;
+
+	eh = dt->ecxt_held + pos;
+	eh->ecxt->event_ip = ip;
+	eh->ecxt->event_stack = get_current_stack();
+
+	/*
+	 * The ecxt has already done what it needs.
+	 */
+	if (!before(wg, eh->wgen))
+		return;
+
+	pos = find_hist_pos(wg);
+	if (pos == -1)
+		return;
+
+	ctxt_id = cur_ctxt_id();
+	end = hist_pos_next();
+	end = end > pos ? end : end + DEPT_MAX_WAIT_HIST;
+	for (wh = hist(pos); pos < end; wh = hist(++pos)) {
+		if (wh->ctxt_id == ctxt_id)
+			add_dep(eh->ecxt, wh->wait);
+		if (!before(wh->wgen, eh->wgen))
+			break;
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_ecxt *e;
+
+		if (before(dt->wgen_enirq[i], wg))
+			continue;
+
+		e = eh->ecxt;
+		add_iecxt(e->class, i, e, false);
+	}
+}
+
+static void del_dep_rcu(struct rcu_head *rh)
+{
+	struct dept_dep *d = container_of(rh, struct dept_dep, rh);
+
+	preempt_disable();
+	del_dep(d);
+	preempt_enable();
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_class(struct dept_class *c)
+{
+	struct dept_dep *d, *n;
+	int i;
+
+	list_for_each_entry_safe(d, n, &c->dep_head, dep_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	list_for_each_entry_safe(d, n, &c->dep_rev_head, dep_rev_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		stale_iecxt(iecxt(c, i));
+		stale_iwait(iwait(c, i));
+	}
+}
+
+/*
+ * IRQ context control
+ * =====================================================================
+ * Whether a wait is in {hard,soft}-IRQ context or whether
+ * {hard,soft}-IRQ has been enabled on the way to an event is very
+ * important for checking dependencies. All those things should be tracked.
+ */
+
+static inline unsigned long cur_enirqf(void)
+{
+	struct dept_task *dt = dept_task();
+	int he = dt->hardirqs_enabled;
+	int se = dt->softirqs_enabled;
+
+	if (he)
+		return DEPT_HIRQF | (se ? DEPT_SIRQF : 0UL);
+	return 0UL;
+}
+
+static inline int cur_irq(void)
+{
+	if (lockdep_softirq_context(current))
+		return DEPT_SIRQ;
+	if (lockdep_hardirq_context())
+		return DEPT_HIRQ;
+	return DEPT_IRQS_NR;
+}
+
+static inline unsigned int cur_ctxt_id(void)
+{
+	struct dept_task *dt = dept_task();
+	int irq = cur_irq();
+
+	/*
+	 * Normal process context
+	 */
+	if (irq == DEPT_IRQS_NR)
+		return 0U;
+
+	return dt->irq_id[irq] | (1UL << irq);
+}
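+
+/*
+ * e.g. in the 3rd softirq context of a task, irq_id[DEPT_SIRQ] holds
+ * 3UL << DEPT_IRQS_NR, so the resulting id encodes which IRQ type we
+ * are in (low bits) and how many times it has been entered (upper
+ * bits).
+ */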
+
+static void enirq_transition(int irq)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	/*
+	 * If a READ wgen >= the wgen of an event has been observed with
+	 * the IRQ enabled on the way to the event, the IRQ can cut in
+	 * within the ecxt. Used for cross-event detection.
+	 *
+	 *    wait context	event context(ecxt)
+	 *    ------------	-------------------
+	 *    wait event
+	 *       WRITE wgen
+	 *			observe IRQ enabled
+	 *			   READ wgen
+	 *			   keep the wgen locally
+	 *
+	 *			on the event
+	 *			   check the local wgen
+	 */
+	dt->wgen_enirq[irq] = atomic_read(&wgen);
+
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+		struct dept_ecxt *e;
+
+		eh = dt->ecxt_held + i;
+		e = eh->ecxt;
+		add_iecxt(e->class, irq, e, true);
+	}
+}
+
+static void enirq_update(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long irqf;
+	unsigned long prev;
+	int irq;
+
+	prev = dt->eff_enirqf;
+	irqf = cur_enirqf();
+	dt->eff_enirqf = irqf;
+
+	/*
+	 * Do enirq_transition() only on an OFF -> ON transition.
+	 */
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (prev & (1UL << irq))
+			continue;
+
+		dt->enirq_ip[irq] = ip;
+		enirq_transition(irq);
+	}
+}
+
+void dept_aware_softirqs_enable(void)
+{
+	dept_task()->softirqs_enabled = true;
+}
+
+void dept_aware_softirqs_disable(void)
+{
+	dept_task()->softirqs_enabled = false;
+}
+
+void dept_aware_hardirqs_enable(void)
+{
+	dept_task()->hardirqs_enabled = true;
+}
+EXPORT_SYMBOL_GPL(dept_aware_hardirqs_enable);
+
+void dept_aware_hardirqs_disable(void)
+{
+	dept_task()->hardirqs_enabled = false;
+}
+EXPORT_SYMBOL_GPL(dept_aware_hardirqs_disable);
+
+/*
+ * Ensure this is called on every IRQ ON/OFF transition.
+ */
+void dept_enirq_transition(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * IRQ ON/OFF transition might happen while Dept is working.
+	 * We cannot handle recursive entrance. Just ignore it.
+	 * Only transitions outside of Dept will be considered.
+	 */
+	if (dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	enirq_update(ip);
+
+	dept_exit(flags);
+}
+
+/*
+ * Ensure it's the outermost softirq context.
+ */
+void dept_softirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+}
+
+/*
+ * Ensure it's the outermost hardirq context.
+ */
+void dept_hardirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+}
+
+/*
+ * DEPT API
+ * =====================================================================
+ * Main DEPT APIs.
+ */
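+
+/*
+ * A rough usage sketch for tagging a custom wait/event pair, roughly
+ * following how the APIs below are meant to be combined (my_map,
+ * my_key and the string names are illustrative only):
+ *
+ *	static struct dept_key my_key;
+ *	static struct dept_map my_map;
+ *
+ *	dept_map_init(&my_map, &my_key, 0, "my_wait_event");
+ *
+ *	waiter side:
+ *		dept_ask_event(&my_map);
+ *		dept_wait(&my_map, 1UL, _THIS_IP_, "my_wait_fn", 0);
+ *		(go to sleep until the event)
+ *
+ *	waker side:
+ *		dept_event(&my_map, 1UL, _THIS_IP_, "my_event_fn");
+ *		(wake the waiter up)
+ */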
+
+static inline void clean_classes_cache(struct dept_key *k)
+{
+	int i;
+
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		k->classes[i] = NULL;
+}
+
+void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
+		   const char *n)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (sub < 0 || sub >= DEPT_MAX_SUBCLASSES_USR) {
+		m->nocheck = true;
+		return;
+	}
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	clean_classes_cache(&m->keys_local);
+
+	m->sub_usr = sub;
+	m->keys = k;
+	m->name = n;
+	m->wgen = 0U;
+	m->nocheck = false;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_init);
+
+void dept_map_reinit(struct dept_map *m)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	clean_classes_cache(&m->keys_local);
+	m->wgen = 0U;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_reinit);
+
+void dept_map_nocheck(struct dept_map *m)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	m->nocheck = true;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_nocheck);
+
+static LIST_HEAD(classes);
+
+static inline bool within(const void *addr, void *start, unsigned long size)
+{
+	return addr >= start && addr < start + size;
+}
+
+void dept_free_range(void *start, unsigned int sz)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c, *n;
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Should successfully free the objects.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_free_range() should not fail.
+	 *
+	 * FIXME: This must be fixed if dept_free_range() turns out to
+	 * deadlock against dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	list_for_each_entry_safe(c, n, &classes, all_node) {
+		if (!within((void *)c->key, start, sz) &&
+		    !within(c->name, start, sz))
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+
+static inline int map_sub(struct dept_map *m, int e)
+{
+	return m->sub_usr + e * DEPT_MAX_SUBCLASSES_USR;
+}
+
+static struct dept_class *check_new_class(struct dept_key *local,
+					  struct dept_key *k, int sub,
+					  const char *n)
+{
+	struct dept_class *c = NULL;
+
+	if (DEPT_WARN_ON(sub >= DEPT_MAX_SUBCLASSES))
+		return NULL;
+
+	if (DEPT_WARN_ON(!k))
+		return NULL;
+
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE)
+		c = READ_ONCE(local->classes[sub]);
+
+	if (c)
+		return c;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (c)
+		goto caching;
+
+	if (unlikely(!dept_lock()))
+		return NULL;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (unlikely(c))
+		goto unlock;
+
+	c = new_class();
+	if (unlikely(!c))
+		goto unlock;
+
+	c->name = n;
+	c->sub = sub;
+	c->key = (unsigned long)(k->subkeys + sub);
+	hash_add_class(c);
+	list_add(&c->all_node, &classes);
+unlock:
+	dept_unlock();
+caching:
+	/*
+	 * Should be cached even if c == NULL, to reflect the case that
+	 * the class has been deleted.
+	 */
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE)
+		WRITE_ONCE(local->classes[sub], c);
+
+	return c;
+}
+
+static void __dept_wait(struct dept_map *m, unsigned long w_f,
+			unsigned long ip, const char *w_fn, int ne)
+{
+	int e;
+
+	/*
+	 * Be as conservative as possible. In case of multiple waits for
+	 * a single dept_map, we are going to keep only the last wait's
+	 * wgen for simplicity - keeping all wgens seems overengineering.
+	 *
+	 * Of course, it might cause some dependencies that would rarely,
+	 * probably never, happen to be missed, but it helps avoid false
+	 * positive reports.
+	 */
+	for_each_set_bit(e, &w_f, DEPT_MAX_SUBCLASSES_EVT) {
+		struct dept_class *c;
+		struct dept_key *k;
+
+		k = m->keys ?: &m->keys_local;
+		c = check_new_class(&m->keys_local, k,
+				    map_sub(m, e), m->name);
+		if (!c)
+			continue;
+
+		add_wait(c, ip, w_fn, ne);
+	}
+}
+
+void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
+	       const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait);
+
+static inline void stage_map(struct dept_task *dt, struct dept_map *m)
+{
+	dt->stage_m = m;
+}
+
+static inline void unstage_map(struct dept_task *dt)
+{
+	dt->stage_m = NULL;
+}
+
+static inline struct dept_map *staged_map(struct dept_task *dt)
+{
+	return dt->stage_m;
+}
+
+void dept_stage_wait(struct dept_map *m, unsigned long w_f,
+		     const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	stage_map(dt, m);
+
+	dt->stage_w_f = w_f;
+	dt->stage_w_fn = w_fn;
+	dt->stage_ne = ne;
+
+	/*
+	 * Disable the map just in case the real sleep doesn't happen.
+	 * The map will be enabled again at dept_ask_event_wait_commit().
+	 */
+	WRITE_ONCE(m->wgen, 0U);
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_stage_wait);
+
+void dept_clean_stage(void)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	unstage_map(dt);
+
+	dt->stage_w_f = 0UL;
+	dt->stage_w_fn = NULL;
+	dt->stage_ne = 0;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_clean_stage);
+
+/*
+ * Always called from __schedule().
+ */
+void dept_ask_event_wait_commit(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	unsigned int wg;
+	struct dept_map *m;
+	unsigned long w_f;
+	const char *w_fn;
+	int ne;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		flags = dept_enter_recursive();
+
+		/*
+		 * Dept won't work with this map even though an event
+		 * context has just been asked for. Don't let it get
+		 * confused when handling the event later. Disable the
+		 * map until the next real case.
+		 */
+		m = staged_map(dt);
+		if (m)
+			WRITE_ONCE(m->wgen, 0U);
+
+		dept_exit_recursive(flags);
+		return;
+	}
+
+	flags = dept_enter();
+
+	m = staged_map(dt);
+
+	/*
+	 * Check if current has staged a wait before __schedule().
+	 */
+	if (!m)
+		goto exit;
+
+	if (m->nocheck)
+		goto exit;
+
+	w_f = dt->stage_w_f;
+	w_fn = dt->stage_w_fn;
+	ne = dt->stage_ne;
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+exit:
+	dept_exit(flags);
+}
+
+void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		dt->missing_ecxt++;
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (e >= DEPT_MAX_SUBCLASSES_EVT)
+		goto missing_ecxt;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller code can be fixed, and handle the event corresponding
+	 * to the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && add_ecxt((void *)m, c, ip, c_fn, e_fn, ne))
+		goto exit;
+missing_ecxt:
+	dt->missing_ecxt++;
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_enter);
+
+void dept_ask_event(struct dept_map *m)
+{
+	unsigned long flags;
+	unsigned int wg;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ask_event);
+
+void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		const char *e_fn)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		/*
+		 * Dept won't work with this map even though an event
+		 * has just been triggered. Don't let it get confused
+		 * when handling the next event. Disable the map until
+		 * the next real case.
+		 */
+		WRITE_ONCE(m->wgen, 0U);
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (DEPT_WARN_ON(e >= DEPT_MAX_SUBCLASSES_EVT))
+		goto exit;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller can be fixed, and handle the event corresponding to
+	 * the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && add_ecxt((void *)m, c, 0UL, NULL, e_fn, 0)) {
+		do_event((void *)m, c, READ_ONCE(m->wgen), ip);
+		pop_ecxt((void *)m, c);
+	}
+exit:
+	/*
+	 * Keep the map disabled until the next sleep.
+	 */
+	WRITE_ONCE(m->wgen, 0U);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event);
+
+void dept_ecxt_exit(struct dept_map *m, unsigned long e_f,
+		    unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		dt->missing_ecxt--;
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (e >= DEPT_MAX_SUBCLASSES_EVT)
+		goto missing_ecxt;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller can be fixed, and handle the event corresponding to
+	 * the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && pop_ecxt((void *)m, c))
+		goto exit;
+missing_ecxt:
+	dt->missing_ecxt--;
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_exit);
+
+void dept_task_exit(struct task_struct *t)
+{
+	struct dept_task *dt = &t->dept_task;
+	int i;
+
+	raw_local_irq_disable();
+
+	if (dt->stack)
+		put_stack(dt->stack);
+
+	for (i = 0; i < dt->ecxt_held_pos; i++)
+		put_ecxt(dt->ecxt_held[i].ecxt);
+
+	for (i = 0; i < DEPT_MAX_WAIT_HIST; i++)
+		if (dt->wait_hist[i].wait)
+			put_wait(dt->wait_hist[i].wait);
+
+	dept_off();
+
+	raw_local_irq_enable();
+}
+
+void dept_task_init(struct task_struct *t)
+{
+	memset(&t->dept_task, 0x0, sizeof(struct dept_task));
+}
+
+void dept_key_init(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Key initialization fails.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_init() should not fail.
+	 *
+	 * FIXME: This must be fixed if dept_key_init() turns out to
+	 * deadlock against dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		DEPT_STOP("The class(%s/%d) has not been removed.\n",
+			  c->name, sub);
+		break;
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_key_init);
+
+void dept_key_destroy(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Key destroying fails.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_destroy() should not fail.
+	 *
+	 * FIXME: This must be fixed if dept_key_destroy() turns out to
+	 * deadlock against dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(dept_key_destroy);
+
+static void move_llist(struct llist_head *to, struct llist_head *from)
+{
+	struct llist_node *first = llist_del_all(from);
+	struct llist_node *last;
+
+	if (!first)
+		return;
+
+	for (last = first; last->next; last = last->next);
+	llist_add_batch(first, last, to);
+}
+
+static void migrate_per_cpu_pool(void)
+{
+	const int boot_cpu = 0;
+	int i;
+
+	/*
+	 * The boot CPU has been using the temporary local pool so far.
+	 * Now that the per_cpu areas are ready, use the per_cpu local
+	 * pool instead.
+	 */
+	DEPT_WARN_ON(smp_processor_id() != boot_cpu);
+	for (i = 0; i < OBJECT_NR; i++) {
+		struct llist_head *from;
+		struct llist_head *to;
+
+		from = &pool[i].boot_pool;
+		to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+		move_llist(to, from);
+	}
+}
+
+#define B2KB(B) ((B) / 1024)
+
+/*
+ * Should be called after setup_per_cpu_areas() and before any non-boot
+ * CPU comes online.
+ */
+void __init dept_init(void)
+{
+	size_t mem_total = 0;
+
+	local_irq_disable();
+	dept_per_cpu_ready = 1;
+	migrate_per_cpu_pool();
+	local_irq_enable();
+
+#define OBJECT(id, nr) mem_total += sizeof(struct dept_##id) * nr;
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits) mem_total += sizeof(struct hlist_head) * (1UL << bits);
+	#include "dept_hash.h"
+#undef  HASH
+
+	pr_info("DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park\n");
+	pr_info("... DEPT_MAX_STACK_ENTRY: %d\n", DEPT_MAX_STACK_ENTRY);
+	pr_info("... DEPT_MAX_WAIT_HIST  : %d\n", DEPT_MAX_WAIT_HIST);
+	pr_info("... DEPT_MAX_ECXT_HELD  : %d\n", DEPT_MAX_ECXT_HELD);
+	pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
+#define OBJECT(id, nr)							\
+	pr_info("... memory used by %s: %zu KB\n",			\
+	       #id, B2KB(sizeof(struct dept_##id) * nr));
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits)							\
+	pr_info("... hash list head used by %s: %zu KB\n",		\
+	       #id, B2KB(sizeof(struct hlist_head) * (1UL << bits)));
+	#include "dept_hash.h"
+#undef  HASH
+	pr_info("... total memory used by objects and hashs: %zu KB\n", B2KB(mem_total));
+	pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
+}
diff --git a/kernel/dependency/dept_hash.h b/kernel/dependency/dept_hash.h
new file mode 100644
index 00000000..fd85aab
--- /dev/null
+++ b/kernel/dependency/dept_hash.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * HASH(id, bits)
+ *
+ * id  : Id for the object of struct dept_##id.
+ * bits: 1UL << bits is the hash table size.
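+ *
+ * e.g. HASH(dep, 12) below declares a table of 1UL << 12 == 4096
+ * hlist heads for struct dept_dep.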
+ */
+
+HASH(dep, 12)
+HASH(class, 12)
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
new file mode 100644
index 00000000..0b7eb16
--- /dev/null
+++ b/kernel/dependency/dept_object.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * OBJECT(id, nr)
+ *
+ * id: Id for the object of struct dept_##id.
+ * nr: # of objects that should be kept in the pool.
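+ *
+ * e.g. OBJECT(wait, 1024 * 32) below keeps 32768 struct dept_wait
+ * objects in the static pool.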
+ */
+
+OBJECT(dep, 1024 * 8)
+OBJECT(class, 1024 * 8)
+OBJECT(stack, 1024 * 32)
+OBJECT(ecxt, 1024 * 16)
+OBJECT(wait, 1024 * 32)
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959..bac41ee 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -844,6 +844,7 @@ void __noreturn do_exit(long code)
 	exit_tasks_rcu_finish();
 
 	lockdep_free_task(tsk);
+	dept_task_exit(tsk);
 	do_task_dead();
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897..68f7154 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -98,6 +98,7 @@
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
 #include <linux/sched/mm.h>
+#include <linux/dept.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -2187,6 +2188,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_LOCKDEP
 	lockdep_init_task(p);
 #endif
+	dept_task_init(p);
 
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c06cab6..2175f9c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1182,7 +1182,7 @@ static inline struct hlist_head *keyhashentry(const struct lock_class_key *key)
 }
 
 /* Register a dynamically allocated key. */
-void lockdep_register_key(struct lock_class_key *key)
+void __lockdep_register_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head;
 	struct lock_class_key *k;
@@ -1205,7 +1205,7 @@ void lockdep_register_key(struct lock_class_key *key)
 restore_irqs:
 	raw_local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(lockdep_register_key);
+EXPORT_SYMBOL_GPL(__lockdep_register_key);
 
 /* Check whether a key has been registered as a dynamic key. */
 static bool is_dynamic_key(const struct lock_class_key *key)
@@ -4243,7 +4243,7 @@ static void __trace_hardirqs_on_caller(void)
  * stops watching. After the RCU transition lockdep_hardirqs_on() has to be
  * invoked to set the final state.
  */
-void lockdep_hardirqs_on_prepare(unsigned long ip)
+void __lockdep_hardirqs_on_prepare(unsigned long ip)
 {
 	if (unlikely(!debug_locks))
 		return;
@@ -4294,9 +4294,9 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
 	__trace_hardirqs_on_caller();
 	lockdep_recursion_finish();
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_on_prepare);
 
-void noinstr lockdep_hardirqs_on(unsigned long ip)
+void noinstr __lockdep_hardirqs_on(unsigned long ip)
 {
 	struct irqtrace_events *trace = &current->irqtrace;
 
@@ -4358,12 +4358,12 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 	trace->hardirq_enable_event = ++trace->irq_events;
 	debug_atomic_inc(hardirqs_on_events);
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-void noinstr lockdep_hardirqs_off(unsigned long ip)
+void noinstr __lockdep_hardirqs_off(unsigned long ip)
 {
 	if (unlikely(!debug_locks))
 		return;
@@ -4400,12 +4400,12 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_hardirqs_off);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
  */
-void lockdep_softirqs_on(unsigned long ip)
+void __lockdep_softirqs_on(unsigned long ip)
 {
 	struct irqtrace_events *trace = &current->irqtrace;
 
@@ -4445,7 +4445,7 @@ void lockdep_softirqs_on(unsigned long ip)
 /*
  * Softirqs were disabled:
  */
-void lockdep_softirqs_off(unsigned long ip)
+void __lockdep_softirqs_off(unsigned long ip)
 {
 	if (unlikely(!lockdep_enabled()))
 		return;
@@ -4773,7 +4773,7 @@ static inline int check_wait_context(struct task_struct *curr,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 			    struct lock_class_key *key, int subclass,
 			    u8 inner, u8 outer, u8 lock_type)
 {
@@ -4833,7 +4833,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map_type);
+EXPORT_SYMBOL_GPL(__lockdep_init_map_type);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -6298,7 +6298,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
  * key irrespective of debug_locks to avoid potential invalid access to freed
  * memory in lock_class entry.
  */
-void lockdep_unregister_key(struct lock_class_key *key)
+void __lockdep_unregister_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head = keyhashentry(key);
 	struct lock_class_key *k;
@@ -6333,7 +6333,7 @@ void lockdep_unregister_key(struct lock_class_key *key)
 	/* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
 	synchronize_rcu();
 }
-EXPORT_SYMBOL_GPL(lockdep_unregister_key);
+EXPORT_SYMBOL_GPL(__lockdep_unregister_key);
 
 void __init lockdep_init(void)
 {
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788..0953f99 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2205,6 +2205,7 @@ static void free_module(struct module *mod)
 
 	/* Free lock-classes; relies on the preceding sync_rcu(). */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
 	module_memfree(mod->core_layout.base);
@@ -4159,6 +4160,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
  free_module:
 	/* Free lock-classes; relies on the preceding sync_rcu() */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	module_deallocate(mod, info);
  free_copy:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaab..5784b07 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6285,6 +6285,14 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	rcu_note_context_switch(!!sched_mode);
 
 	/*
+	 * Skip the commit if the current task does not actually go to
+	 * sleep.
+	 */
+	if (READ_ONCE(prev->__state) & TASK_NORMAL &&
+	    sched_mode == SM_NONE)
+		dept_ask_event_wait_commit(_RET_IP_);
+
+	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up():
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 075cd25..3c17507 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1261,6 +1261,33 @@ config DEBUG_PREEMPT
 
 menu "Lock Debugging (spinlocks, mutexes, etc...)"
 
+config DEPT
+	bool "Dependency tracking"
+	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
+	select DEBUG_SPINLOCK
+	select DEBUG_MUTEXES
+	select DEBUG_RT_MUTEXES if RT_MUTEXES
+	select DEBUG_RWSEMS
+	select DEBUG_WW_MUTEX_SLOWPATH
+	select DEBUG_LOCK_ALLOC
+	select TRACE_IRQFLAGS
+	select STACKTRACE
+	select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
+	select KALLSYMS
+	select KALLSYMS_ALL
+	select PROVE_LOCKING
+	default n
+	help
+	  Track dependencies between waits and events and report a
+	  deadlock possibility when one is detected. Multiple reports
+	  are allowed if there is more than a single problem.
+
+	  This feature is EXPERIMENTAL and might produce false positive
+	  reports, because dependencies that have never been tracked
+	  before start to be tracked. It's worth noting that, to
+	  mitigate the impact of false positives, multiple reporting is
+	  supported.
+
 config LOCK_DEBUGGING_SUPPORT
 	bool
 	depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1



* [PATCH RFC v6 02/21] dept: Implement Dept(Dependency Tracker)
@ 2022-05-04  8:17   ` Byungchul Park
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: hamohammed.sa, jack, peterz, daniel.vetter, amir73il, david,
	dri-devel, chris, bfields, linux-ide, adilger.kernel, joel,
	42.hyeyoo, cl, will, duyuyang, sashal, paolo.valente,
	damien.lemoal, willy, hch, airlied, mingo, djwong, vdavydov.dev,
	rientjes, dennis, linux-ext4, linux-mm, ngupta, johannes.berg,
	jack, dan.j.williams, josef, rostedt, linux-block, linux-fsdevel,
	jglisse, viro, tglx, mhocko, vbabka, melissa.srw, sj, tytso,
	rodrigosiqueiramelo, kernel-team, gregkh, jlayton, linux-kernel,
	penberg, minchan, hannes, tj, akpm

CURRENT STATUS
--------------
Lockdep tracks the acquisition order of locks in order to detect
deadlock, and tracks IRQ context and IRQ enable/disable state as well,
to take accidental acquisitions in IRQ context into account.

Lockdep should be turned off once it detects and reports a deadlock,
since its data structures and algorithm are not reusable after
detection because of the complex design.

PROBLEM
-------
*Waits* and their *events* that are never reached eventually cause
deadlock. However, Lockdep is only interested in lock acquisition
order, forcing us to emulate lock acquisition even for waits and
events that have nothing to do with a real lock.

Even worse, nobody likes Lockdep's false positive reports, because
they prevent further reports that might be more valuable. That's why
kernel developers are so sensitive to Lockdep's false positives.

Besides that, by tracking acquisition order, it cannot correctly deal
with read locks and cross-events, e.g. wait_for_completion() vs
complete(), for deadlock detection. Lockdep is no longer a good tool
for that purpose.
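
For example, consider the following minimal sketch (hypothetical code,
for illustration only), where no lock ordering exists at all:

	context X			context Y

	mutex_lock(&m);
					mutex_lock(&m);	/* blocked by X */
	wait_for_completion(&c);	/* never woken up */
					complete(&c);	/* never reached */
	mutex_unlock(&m);
					mutex_unlock(&m);

X sleeps in wait_for_completion() while holding m, and Y, the only one
that would call complete(), is blocked on m forever. No acquisition
order is violated, so tracking lock order alone cannot see it.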

SOLUTION
--------
Again, *waits* and their *events* that are never reached eventually
cause deadlock. The new solution, Dept(DEPendency Tracker), focuses on
waits and events themselves. Dept tracks waits and events and reports
a problem if it finds an event that can never be reached.

Dept does:
   . Work with read locks in the right way.
   . Work with any wait and event, i.e. cross-events.
   . Continue to work even after reporting multiple times.
   . Provide simple and intuitive APIs.
   . Do exactly what a dependency checker should do.
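
As an example of the APIs, tagging a custom wait/event pair with the
SDT(Single-event Dependency Tracker) API added by this patch might
look like the following sketch (hypothetical usage; the map name is
made up for illustration):

	static DEFINE_DEPT_SDT(my_sdt);

	/* waiter side */
	sdt_wait(&my_sdt);		/* ask the event and wait */

	/* waker side */
	sdt_ecxt_enter(&my_sdt);	/* start of the event context */
	...
	sdt_event(&my_sdt);		/* the event the waiter waits on */
	sdt_ecxt_exit(&my_sdt);		/* end of the event context */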

Q & A
-----
Q. Is this the first try ever to address the problem?
A. No. The cross-release feature (b09be676e0ff2 locking/lockdep:
   Implement the 'crossrelease' feature) addressed it years ago. It
   was a Lockdep extension that was merged but reverted shortly after,
   because:

   Cross-release started to report valuable hidden problems, but
   started to give false positive reports as well. For sure, no one
   likes Lockdep's false positive reports, since they make Lockdep
   stop, preventing further real problems from being reported.

Q. Why was Dept not developed as an extension of Lockdep?
A. Lockdep definitely embodies all the efforts great developers have
   made for a long time, so it is quite stable. But I had to design
   and implement Dept anew because of the following:

   1) Lockdep was designed to track lock acquisition order. Its APIs
      and implementation do not fit the wait-event model.
   2) Lockdep is turned off on detection, including on a false
      positive, which is terrible and prevents developing any
      extension for stronger detection.

Q. Do you intend to totally replace Lockdep?
A. No. Lockdep also checks whether lock usage is correct. Of course,
   the dependency check routine should eventually be replaced, but the
   other functionality should remain.

Q. Do you mean the dependency check routine should be replaced right
   away?
A. No. I admit Lockdep is stable enough thanks to the great efforts
   kernel developers have made. Both Lockdep and Dept should stay in
   the kernel until Dept is considered stable.

Q. Stronger detection capability would give more false positive
   reports, which was a big problem when cross-release was introduced.
   Is that OK with Dept?
A. It's OK. Dept allows multiple reporting thanks to its simple and
   quite generalized design. Of course, false positive reports should
   still be fixed, but they are no longer as critical a problem as
   they used to be.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h            |  528 ++++++++
 include/linux/dept_sdt.h        |   60 +
 include/linux/hardirq.h         |    3 +
 include/linux/irqflags.h        |   71 +-
 include/linux/lockdep.h         |   61 +-
 include/linux/lockdep_types.h   |    3 +
 include/linux/sched.h           |    7 +
 init/init_task.c                |    2 +
 init/main.c                     |    2 +
 kernel/Makefile                 |    1 +
 kernel/dependency/Makefile      |    3 +
 kernel/dependency/dept.c        | 2633 +++++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h   |   10 +
 kernel/dependency/dept_object.h |   13 +
 kernel/exit.c                   |    1 +
 kernel/fork.c                   |    2 +
 kernel/locking/lockdep.c        |   28 +-
 kernel/module.c                 |    2 +
 kernel/sched/core.c             |    8 +
 lib/Kconfig.debug               |   27 +
 20 files changed, 3433 insertions(+), 32 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_object.h

diff --git a/include/linux/dept.h b/include/linux/dept.h
new file mode 100644
index 00000000..c498060
--- /dev/null
+++ b/include/linux/dept.h
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DEPT(DEPendency Tracker) - runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_H
+#define __LINUX_DEPT_H
+
+#ifdef CONFIG_DEPT
+
+#include <linux/types.h>
+
+struct task_struct;
+
+#define DEPT_MAX_STACK_ENTRY		16
+#define DEPT_MAX_WAIT_HIST		64
+#define DEPT_MAX_ECXT_HELD		48
+
+#define DEPT_MAX_SUBCLASSES		16
+#define DEPT_MAX_SUBCLASSES_EVT		2
+#define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
+#define DEPT_MAX_SUBCLASSES_CACHE	2
+
+#define DEPT_SIRQ			0
+#define DEPT_HIRQ			1
+#define DEPT_IRQS_NR			2
+#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
+#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
+
+struct dept_ecxt;
+struct dept_iecxt {
+	struct dept_ecxt		*ecxt;
+	int				enirq;
+	/*
+	 * for preventing a new ecxt from being added
+	 */
+	bool				staled;
+};
+
+struct dept_wait;
+struct dept_iwait {
+	struct dept_wait		*wait;
+	int				irq;
+	/*
+	 * for preventing a new wait from being added
+	 */
+	bool				staled;
+	bool				touched;
+};
+
+struct dept_class {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * unique information about the class
+	 */
+	const char			*name;
+	unsigned long			key;
+	int				sub;
+
+	/*
+	 * for BFS
+	 */
+	unsigned int			bfs_gen;
+	int				bfs_dist;
+	struct dept_class		*bfs_parent;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node		hash_node;
+
+	/*
+	 * for linking all classes
+	 */
+	struct list_head		all_node;
+
+	/*
+	 * for associating its dependencies
+	 */
+	struct list_head		dep_head;
+	struct list_head		dep_rev_head;
+
+	/*
+	 * for tracking IRQ dependencies
+	 */
+	struct dept_iecxt		iecxt[DEPT_IRQS_NR];
+	struct dept_iwait		iwait[DEPT_IRQS_NR];
+};
+
+struct dept_stack {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * backtrace entries
+	 */
+	unsigned long			raw[DEPT_MAX_STACK_ENTRY];
+	int				nr;
+};
+
+struct dept_ecxt {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * function that entered this ecxt
+	 */
+	const char			*ecxt_fn;
+
+	/*
+	 * event function
+	 */
+	const char			*event_fn;
+
+	/*
+	 * associated class
+	 */
+	struct dept_class		*class;
+
+	/*
+	 * flag indicating which IRQ has been
+	 * enabled within the event context
+	 */
+	unsigned long			enirqf;
+
+	/*
+	 * where the IRQ enable happened
+	 */
+	unsigned long			enirq_ip[DEPT_IRQS_NR];
+	struct dept_stack		*enirq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the event context started
+	 */
+	unsigned long			ecxt_ip;
+	struct dept_stack		*ecxt_stack;
+
+	/*
+	 * where the event triggered
+	 */
+	unsigned long			event_ip;
+	struct dept_stack		*event_stack;
+};
+
+struct dept_wait {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * function causing this wait
+	 */
+	const char			*wait_fn;
+
+	/*
+	 * the associated class
+	 */
+	struct dept_class		*class;
+
+	/*
+	 * which IRQ the wait was placed in
+	 */
+	unsigned long			irqf;
+
+	/*
+	 * where the IRQ wait happened
+	 */
+	unsigned long			irq_ip[DEPT_IRQS_NR];
+	struct dept_stack		*irq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the wait happened
+	 */
+	unsigned long			wait_ip;
+	struct dept_stack		*wait_stack;
+};
+
+struct dept_dep {
+	union {
+		struct llist_node	pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t		ref;
+	};
+
+	/*
+	 * key data of dependency
+	 */
+	struct dept_ecxt		*ecxt;
+	struct dept_wait		*wait;
+
+	/*
+	 * This object can be referenced without dept_lock
+	 * held but with IRQs disabled, e.g. for hash
+	 * lookup. So deferred deletion is needed.
+	 */
+	struct rcu_head			rh;
+
+	/*
+	 * for BFS
+	 */
+	struct list_head		bfs_node;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node		hash_node;
+
+	/*
+	 * for linking to a class object
+	 */
+	struct list_head		dep_node;
+	struct list_head		dep_rev_node;
+};
+
+struct dept_hash {
+	/*
+	 * hash table
+	 */
+	struct hlist_head		*table;
+
+	/*
+	 * size of the table, i.e. 2^bits
+	 */
+	int				bits;
+};
+
+struct dept_pool {
+	const char			*name;
+
+	/*
+	 * object size
+	 */
+	size_t				obj_sz;
+
+	/*
+	 * the number of the static array
+	 */
+	atomic_t			obj_nr;
+
+	/*
+	 * offset of ->pool_node
+	 */
+	size_t				node_off;
+
+	/*
+	 * pointer to the pool
+	 */
+	void				*spool;
+	struct llist_head		boot_pool;
+	struct llist_head __percpu	*lpool;
+};
+
+struct dept_ecxt_held {
+	/*
+	 * associated event context
+	 */
+	struct dept_ecxt		*ecxt;
+
+	/*
+	 * unique key for this dept_ecxt_held
+	 */
+	unsigned long			key;
+
+	/*
+	 * the wgen when the event context started
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * for allowing user aware nesting
+	 */
+	int				nest;
+};
+
+struct dept_wait_hist {
+	/*
+	 * associated wait
+	 */
+	struct dept_wait		*wait;
+
+	/*
+	 * unique id of all waits system-wide until wrapped
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * local context id to identify IRQ context
+	 */
+	unsigned int			ctxt_id;
+};
+
+struct dept_key {
+	union {
+		/*
+		 * Each byte-wise address will be used as its key.
+		 */
+		char			subkeys[DEPT_MAX_SUBCLASSES];
+
+		/*
+		 * for caching the main class pointer
+		 */
+		struct dept_class	*classes[DEPT_MAX_SUBCLASSES_CACHE];
+	};
+};
+
+struct dept_map {
+	const char			*name;
+	struct dept_key			*keys;
+	int				sub_usr;
+
+	/*
+	 * A local copy for fast access to the associated classes, also
+	 * used as the dept_key instance for a statically defined map.
+	 */
+	struct dept_key			keys_local;
+
+	/*
+	 * wait timestamp associated to this map
+	 */
+	unsigned int			wgen;
+
+	/*
+	 * whether this map should be checked or not
+	 */
+	bool				nocheck;
+};
+
+#define DEPT_MAP_INITIALIZER(n)						\
+{									\
+	.name = #n,							\
+	.keys = NULL,							\
+	.sub_usr = 0,							\
+	.keys_local = { .classes = { 0 } },				\
+	.wgen = 0U,							\
+	.nocheck = false,						\
+}
+
+struct dept_task {
+	/*
+	 * all event contexts that have been entered but not yet exited
+	 */
+	struct dept_ecxt_held		ecxt_held[DEPT_MAX_ECXT_HELD];
+	int				ecxt_held_pos;
+
+	/*
+	 * ring buffer holding all waits that have happened
+	 */
+	struct dept_wait_hist		wait_hist[DEPT_MAX_WAIT_HIST];
+	int				wait_hist_pos;
+
+	/*
+	 * sequential id to identify each IRQ context
+	 */
+	unsigned int			irq_id[DEPT_IRQS_NR];
+
+	/*
+	 * for tracking IRQ-enabled points with cross-event
+	 */
+	unsigned int			wgen_enirq[DEPT_IRQS_NR];
+
+	/*
+	 * for keeping up-to-date IRQ-enabled points
+	 */
+	unsigned long			enirq_ip[DEPT_IRQS_NR];
+
+	/*
+	 * current effective IRQ-enabled flag
+	 */
+	unsigned long			eff_enirqf;
+
+	/*
+	 * for reserving a current stack instance at each operation
+	 */
+	struct dept_stack		*stack;
+
+	/*
+	 * for preventing recursive call into DEPT engine
+	 */
+	int				recursive;
+
+	/*
+	 * for staging data to commit a wait
+	 */
+	struct dept_map			*stage_m;
+	unsigned long			stage_w_f;
+	const char			*stage_w_fn;
+	int				stage_ne;
+
+	/*
+	 * the number of missing ecxts
+	 */
+	int				missing_ecxt;
+
+	/*
+	 * for tracking IRQ-enable state
+	 */
+	bool				hardirqs_enabled;
+	bool				softirqs_enabled;
+};
+
+#define DEPT_TASK_INITIALIZER(t)				\
+{								\
+	.wait_hist = { { .wait = NULL, } },			\
+	.ecxt_held_pos = 0,					\
+	.wait_hist_pos = 0,					\
+	.irq_id = { 0U },					\
+	.wgen_enirq = { 0U },					\
+	.enirq_ip = { 0UL },					\
+	.eff_enirqf = 0UL,					\
+	.stack = NULL,						\
+	.recursive = 0,						\
+	.stage_m = NULL,					\
+	.stage_w_f = 0UL,					\
+	.stage_w_fn = NULL,					\
+	.stage_ne = 0,						\
+	.missing_ecxt = 0,					\
+	.hardirqs_enabled = false,				\
+	.softirqs_enabled = false,				\
+}
+
+extern void dept_on(void);
+extern void dept_off(void);
+extern void dept_init(void);
+extern void dept_task_init(struct task_struct *t);
+extern void dept_task_exit(struct task_struct *t);
+extern void dept_free_range(void *start, unsigned int sz);
+extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub, const char *n);
+extern void dept_map_reinit(struct dept_map *m);
+extern void dept_map_nocheck(struct dept_map *m);
+
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_stage_wait(struct dept_map *m, unsigned long w_f, const char *w_fn, int ne);
+extern void dept_ask_event_wait_commit(unsigned long ip);
+extern void dept_clean_stage(void);
+extern void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *c_fn, const char *e_fn, int ne);
+extern void dept_ask_event(struct dept_map *m);
+extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
+extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
+
+static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
+{
+	dept_ecxt_enter(m, 0UL, 0UL, NULL, NULL, 0);
+}
+
+/*
+ * for users who want to manage external keys
+ */
+extern void dept_key_init(struct dept_key *k);
+extern void dept_key_destroy(struct dept_key *k);
+
+extern void dept_softirq_enter(void);
+extern void dept_hardirq_enter(void);
+extern void dept_aware_softirqs_enable(void);
+extern void dept_aware_hardirqs_enable(void);
+extern void dept_aware_softirqs_disable(void);
+extern void dept_aware_hardirqs_disable(void);
+extern void dept_enirq_transition(unsigned long ip);
+#else /* !CONFIG_DEPT */
+struct dept_key  { };
+struct dept_map  { };
+struct dept_task { };
+
+#define DEPT_MAP_INITIALIZER(n) { }
+#define DEPT_TASK_INITIALIZER(t) { }
+
+#define dept_on()				do { } while (0)
+#define dept_off()				do { } while (0)
+#define dept_init()				do { } while (0)
+#define dept_task_init(t)			do { } while (0)
+#define dept_task_exit(t)			do { } while (0)
+#define dept_free_range(s, sz)			do { } while (0)
+#define dept_map_init(m, k, s, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_map_reinit(m)			do { } while (0)
+#define dept_map_nocheck(m)			do { } while (0)
+
+#define dept_wait(m, w_f, ip, w_fn, ne)		do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, w_f, w_fn, ne)	do { (void)(w_fn); } while (0)
+#define dept_ask_event_wait_commit(ip)		do { } while (0)
+#define dept_clean_stage()			do { } while (0)
+#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, ne) do { (void)(c_fn); (void)(e_fn); } while (0)
+#define dept_ask_event(m)			do { } while (0)
+#define dept_event(m, e_f, ip, e_fn)		do { (void)(e_fn); } while (0)
+#define dept_ecxt_exit(m, e_f, ip)		do { } while (0)
+#define dept_ecxt_enter_nokeep(m)		do { } while (0)
+#define dept_key_init(k)			do { (void)(k); } while (0)
+#define dept_key_destroy(k)			do { (void)(k); } while (0)
+
+#define dept_softirq_enter()				do { } while (0)
+#define dept_hardirq_enter()				do { } while (0)
+#define dept_aware_softirqs_enable()			do { } while (0)
+#define dept_aware_hardirqs_enable()			do { } while (0)
+#define dept_aware_softirqs_disable()			do { } while (0)
+#define dept_aware_hardirqs_disable()			do { } while (0)
+#define dept_enirq_transition(ip)			do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_H */
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
new file mode 100644
index 00000000..49763cd
--- /dev/null
+++ b/include/linux/dept_sdt.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept Single-event Dependency Tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_SDT_H
+#define __LINUX_DEPT_SDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+/*
+ * SDT(Single-event Dependency Tracker) APIs
+ *
+ * In case one dept_map instance maps to a single event, the SDT APIs
+ * can be used.
+ */
+#define sdt_map_init(m)							\
+	do {								\
+		static struct dept_key __key;				\
+		dept_map_init(m, &__key, 0, #m);			\
+	} while (0)
+#define sdt_map_init_key(m, k)		dept_map_init(m, k, 0, #m)
+
+#define sdt_wait(m)							\
+	do {								\
+		dept_ask_event(m);					\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0);		\
+	} while (0)
+/*
+ * This will be committed when the task actually gets to __schedule().
+ * Both dept_ask_event() and dept_wait() will be performed at that
+ * commit point in __schedule().
+ */
+#define sdt_wait_prepare(m)		dept_stage_wait(m, 1UL, "wait", 0)
+#define sdt_wait_finish()		dept_clean_stage()
+#define sdt_ecxt_enter(m)		dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
+#define sdt_event(m)			dept_event(m, 1UL, _THIS_IP_, "event")
+#define sdt_ecxt_exit(m)		dept_ecxt_exit(m, 1UL, _THIS_IP_)
+#else /* !CONFIG_DEPT */
+#define DEPT_SDT_MAP_INIT(dname)	{ }
+
+#define sdt_map_init(m)			do { } while (0)
+#define sdt_map_init_key(m, k)		do { (void)(k); } while (0)
+#define sdt_wait(m)			do { } while (0)
+#define sdt_wait_prepare(m)		do { } while (0)
+#define sdt_wait_finish()		do { } while (0)
+#define sdt_ecxt_enter(m)		do { } while (0)
+#define sdt_event(m)			do { } while (0)
+#define sdt_ecxt_exit(m)		do { } while (0)
+#endif
+
+#define DEFINE_DEPT_SDT(x)		\
+	struct dept_map x = DEPT_MAP_INITIALIZER(x)
+
+#endif /* __LINUX_DEPT_SDT_H */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 76878b3..07005f2 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -5,6 +5,7 @@
 #include <linux/context_tracking_state.h>
 #include <linux/preempt.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/ftrace_irq.h>
 #include <linux/sched.h>
 #include <linux/vtime.h>
@@ -114,6 +115,7 @@ static inline void rcu_nmi_exit(void) { }
  */
 #define __nmi_enter()						\
 	do {							\
+		dept_off();					\
 		lockdep_off();					\
 		arch_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
@@ -136,6 +138,7 @@ static inline void rcu_nmi_exit(void) { }
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		arch_nmi_exit();				\
 		lockdep_on();					\
+		dept_on();					\
 	} while (0)
 
 #define nmi_exit()						\
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 4b14093..d168fa3 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -13,23 +13,52 @@
 #define _LINUX_TRACE_IRQFLAGS_H
 
 #include <linux/typecheck.h>
+#include <linux/dept.h>
 #include <asm/irqflags.h>
 #include <asm/percpu.h>
 
 /* Currently lockdep_softirqs_on/off is used only by lockdep */
 #ifdef CONFIG_PROVE_LOCKING
-  extern void lockdep_softirqs_on(unsigned long ip);
-  extern void lockdep_softirqs_off(unsigned long ip);
-  extern void lockdep_hardirqs_on_prepare(unsigned long ip);
-  extern void lockdep_hardirqs_on(unsigned long ip);
-  extern void lockdep_hardirqs_off(unsigned long ip);
+  extern void __lockdep_softirqs_on(unsigned long ip);
+  extern void __lockdep_softirqs_off(unsigned long ip);
+  extern void __lockdep_hardirqs_on_prepare(unsigned long ip);
+  extern void __lockdep_hardirqs_on(unsigned long ip);
+  extern void __lockdep_hardirqs_off(unsigned long ip);
 #else
-  static inline void lockdep_softirqs_on(unsigned long ip) { }
-  static inline void lockdep_softirqs_off(unsigned long ip) { }
-  static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { }
-  static inline void lockdep_hardirqs_on(unsigned long ip) { }
-  static inline void lockdep_hardirqs_off(unsigned long ip) { }
+  static inline void __lockdep_softirqs_on(unsigned long ip) { }
+  static inline void __lockdep_softirqs_off(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_on_prepare(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_on(unsigned long ip) { }
+  static inline void __lockdep_hardirqs_off(unsigned long ip) { }
 #endif
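+/*
+ * Wrappers that also inform Dept of softirq/hardirq enable/disable
+ * transitions, on top of the renamed lockdep hooks above.
+ */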
+static inline void lockdep_softirqs_on(unsigned long ip)
+{
+	__lockdep_softirqs_on(ip);
+	dept_aware_softirqs_enable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_softirqs_off(unsigned long ip)
+{
+	__lockdep_softirqs_off(ip);
+	dept_aware_softirqs_disable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_hardirqs_on_prepare(unsigned long ip)
+{
+	__lockdep_hardirqs_on_prepare(ip);
+	dept_aware_hardirqs_enable();
+	dept_enirq_transition(ip);
+}
+static inline void lockdep_hardirqs_on(unsigned long ip)
+{
+	__lockdep_hardirqs_on(ip);
+}
+static inline void lockdep_hardirqs_off(unsigned long ip)
+{
+	__lockdep_hardirqs_off(ip);
+	dept_aware_hardirqs_disable();
+	dept_enirq_transition(ip);
+}
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 
@@ -60,8 +89,10 @@ struct irqtrace_events {
 # define lockdep_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define lockdep_hardirq_enter()			\
 do {							\
-	if (__this_cpu_inc_return(hardirq_context) == 1)\
+	if (__this_cpu_inc_return(hardirq_context) == 1) {\
 		current->hardirq_threaded = 0;		\
+		dept_hardirq_enter();			\
+	}						\
 } while (0)
 # define lockdep_hardirq_threaded()		\
 do {						\
@@ -135,7 +166,8 @@ struct irqtrace_events {
 #if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT)
 # define lockdep_softirq_enter()		\
 do {						\
-	current->softirq_context++;		\
+	if (!current->softirq_context++)	\
+		dept_softirq_enter();		\
 } while (0)
 # define lockdep_softirq_exit()			\
 do {						\
@@ -170,17 +202,28 @@ struct irqtrace_events {
 /*
  * Wrap the arch provided IRQ routines to provide appropriate checks.
  */
-#define raw_local_irq_disable()		arch_local_irq_disable()
-#define raw_local_irq_enable()		arch_local_irq_enable()
+#define raw_local_irq_disable()				\
+	do {						\
+		arch_local_irq_disable();		\
+		dept_aware_hardirqs_disable();		\
+	} while (0)
+#define raw_local_irq_enable()				\
+	do {						\
+		dept_aware_hardirqs_enable();		\
+		arch_local_irq_enable();		\
+	} while (0)
 #define raw_local_irq_save(flags)			\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		flags = arch_local_irq_save();		\
+		dept_aware_hardirqs_disable();		\
 	} while (0)
 #define raw_local_irq_restore(flags)			\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		raw_check_bogus_irq_restore();		\
+		if (!arch_irqs_disabled_flags(flags))	\
+			dept_aware_hardirqs_enable();	\
 		arch_local_irq_restore(flags);		\
 	} while (0)
 #define raw_local_save_flags(flags)			\
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b942..aee4660 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -20,6 +20,33 @@
 extern int prove_locking;
 extern int lock_stat;
 
+#ifdef CONFIG_DEPT
+static inline void dept_after_copy_map(struct dept_map *to,
+				       struct dept_map *from)
+{
+	int i;
+
+	if (from->keys == &from->keys_local)
+		to->keys = &to->keys_local;
+
+	if (!to->keys)
+		return;
+
+	/*
+	 * Since the class cache can be modified concurrently we could observe
+	 * half pointers (64bit arch using 32bit copy insns). Therefore clear
+	 * the caches and take the performance hit.
+	 *
+	 * XXX it doesn't work well with lockdep_set_class_and_subclass(), since
+	 *     that relies on cache abuse.
+	 */
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		to->keys->classes[i] = NULL;
+}
+#else
+#define dept_after_copy_map(t, f)	do { } while (0)
+#endif
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -43,6 +70,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
 	 */
 	for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
 		to->class_cache[i] = NULL;
+
+	dept_after_copy_map(&to->dmap, &from->dmap);
 }
 
 /*
@@ -176,8 +205,19 @@ struct held_lock {
 	current->lockdep_recursion -= LOCKDEP_OFF;	\
 } while (0)
 
-extern void lockdep_register_key(struct lock_class_key *key);
-extern void lockdep_unregister_key(struct lock_class_key *key);
+extern void __lockdep_register_key(struct lock_class_key *key);
+extern void __lockdep_unregister_key(struct lock_class_key *key);
+
+#define lockdep_register_key(k)				\
+do {							\
+	__lockdep_register_key(k);			\
+	dept_key_init(&(k)->dkey);			\
+} while (0)
+#define lockdep_unregister_key(k)			\
+do {							\
+	__lockdep_unregister_key(k);			\
+	dept_key_destroy(&(k)->dkey);			\
+} while (0)
 
 /*
  * These methods are used by specific locking variants (spinlocks,
@@ -185,9 +225,18 @@ struct held_lock {
  * to lockdep:
  */
 
-extern void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+extern void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 	struct lock_class_key *key, int subclass, u8 inner, u8 outer, u8 lock_type);
 
+#define lockdep_init_map_type(l, n, k, s, i, o, t)		\
+do {								\
+	__lockdep_init_map_type(l, n, k, s, i, o, t);		\
+	if ((k) == &__lockdep_no_validate__)			\
+		dept_map_nocheck(&(l)->dmap);			\
+	else							\
+		dept_map_init(&(l)->dmap, &(k)->dkey, s, n);	\
+} while (0)
+
 static inline void
 lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
 		       struct lock_class_key *key, int subclass, u8 inner, u8 outer)
@@ -435,9 +484,13 @@ enum xhlock_context_t {
 /*
  * To initialize a lockdep_map statically use this macro.
  * Note that _name must not be NULL.
+ *
+ * TODO: I found cases using an address other than a real key's as
+ * _key, e.g. in workqueue. We cannot use such an address as a key in Dept.
  */
 #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
-	{ .name = (_name), .key = (void *)(_key), }
+	{ .name = (_name), .key = (void *)(_key), \
+	  .dmap = DEPT_MAP_INITIALIZER(_name) }
 
 static inline void lockdep_invariant_state(bool force) {}
 static inline void lockdep_free_task(struct task_struct *task) {}
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d224308..50c8879 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
 #define __LINUX_LOCKDEP_TYPES_H
 
 #include <linux/types.h>
+#include <linux/dept.h>
 
 #define MAX_LOCKDEP_SUBCLASSES		8UL
 
@@ -76,6 +77,7 @@ struct lock_class_key {
 		struct hlist_node		hash_entry;
 		struct lockdep_subclass_key	subkeys[MAX_LOCKDEP_SUBCLASSES];
 	};
+	struct dept_key				dkey;
 };
 
 extern struct lock_class_key __lockdep_no_validate__;
@@ -185,6 +187,7 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+	struct dept_map			dmap;
 };
 
 struct pin_cookie { unsigned int val; };
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00..3716e41 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -35,6 +35,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <asm/kmap_size.h>
+#include <linux/dept.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -201,12 +202,16 @@
  */
 #define __set_current_state(state_value)				\
 	do {								\
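+		/* a TASK_RUNNING transition cancels any staged wait */	\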
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		WRITE_ONCE(current->__state, (state_value));		\
 	} while (0)
 
 #define set_current_state(state_value)					\
 	do {								\
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		smp_store_mb(current->__state, (state_value));		\
 	} while (0)
@@ -1156,6 +1161,8 @@ struct task_struct {
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
 #endif
 
+	struct dept_task		dept_task;
+
 #if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP)
 	unsigned int			in_ubsan;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f0..ceea035 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -12,6 +12,7 @@
 #include <linux/audit.h>
 #include <linux/numa.h>
 #include <linux/scs.h>
+#include <linux/dept.h>
 
 #include <linux/uaccess.h>
 
@@ -193,6 +194,7 @@ struct task_struct init_task
 	.curr_chain_key = INITIAL_CHAIN_KEY,
 	.lockdep_recursion = 0,
 #endif
+	.dept_task = DEPT_TASK_INITIALIZER(init_task),
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	.ret_stack		= NULL,
 	.tracing_graph_pause	= ATOMIC_INIT(0),
diff --git a/init/main.c b/init/main.c
index 98182c3..deabdd5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -65,6 +65,7 @@
 #include <linux/debug_locks.h>
 #include <linux/debugobjects.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/kmemleak.h>
 #include <linux/padata.h>
 #include <linux/pid_namespace.h>
@@ -1071,6 +1072,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 		      panic_param);
 
 	lockdep_init();
+	dept_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/Makefile b/kernel/Makefile
index 847a82b..5de01e2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -53,6 +53,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += dependency/
 
 obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
new file mode 100644
index 00000000..b5cfb8a
--- /dev/null
+++ b/kernel/dependency/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DEPT) += dept.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
new file mode 100644
index 00000000..1e90284
--- /dev/null
+++ b/kernel/dependency/dept.c
@@ -0,0 +1,2633 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DEPT(DEPendency Tracker) - Runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ *
+ * DEPT provides a general way to detect deadlock possibilities at
+ * runtime, and its interest is not limited to typical locks but
+ * covers every synchronization primitive.
+ *
+ * The following ideas were borrowed from LOCKDEP:
+ *
+ *    1) Use a graph to track relationship between classes.
+ *    2) Prevent performance regression using hash.
+ *
+ * The following items were enhanced from LOCKDEP:
+ *
+ *    1) Cover more deadlock cases.
+ *    2) Allow multiple reports.
+ *
+ * TODO: Both LOCKDEP and DEPT should co-exist until DEPT is considered
+ * stable. Then the dependency check routine should be replaced with
+ * DEPT's. It should finally look like:
+ *
+ *
+ *
+ * As is:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    | +-------------------------------------+ |
+ *    | | Dependency check                    | |
+ *    | | (by tracking lock acquisition order)| |
+ *    | +-------------------------------------+ |
+ *    |                                         |
+ *    +-----------------------------------------+
+ *
+ *    DEPT
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ *
+ *
+ *
+ * To be:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    |       (Request dependency check)        |
+ *    |                    T                    |
+ *    +--------------------|--------------------+
+ *                         |
+ *    DEPT                 V
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ */
+
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/hash.h>
+#include <linux/dept.h>
+#include <linux/utsname.h>
+
+static int dept_stop;
+static int dept_per_cpu_ready;
+
+#define DEPT_READY_WARN (!oops_in_progress)
+
+/*
+ * Make all operations using DEPT_WARN_ON() fail once oops_in_progress
+ * is set, suppressing the warning message.
+ */
+#define DEPT_WARN_ON_ONCE(c)						\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN_ONCE(c, "DEPT_WARN_ON_ONCE: " #c);	\
+		__ret;							\
+	})
+
+#define DEPT_WARN_ONCE(s...)						\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN_ONCE(1, "DEPT_WARN_ONCE: " s);		\
+	})
+
+#define DEPT_WARN_ON(c)							\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN(c, "DEPT_WARN_ON: " #c);		\
+		__ret;							\
+	})
+
+#define DEPT_WARN(s...)							\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_WARN: " s);			\
+	})
+
+#define DEPT_STOP(s...)							\
+	({								\
+		WRITE_ONCE(dept_stop, 1);				\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_STOP: " s);			\
+	})
+
+#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+
+static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+/*
+ * DEPT internal engine should be careful in using outside functions
+ * e.g. printk at reporting since that kind of usage might cause
+ * untrackable deadlock.
+ */
+static atomic_t dept_outworld = ATOMIC_INIT(0);
+
+static inline void dept_outworld_enter(void)
+{
+	atomic_inc(&dept_outworld);
+}
+
+static inline void dept_outworld_exit(void)
+{
+	atomic_dec(&dept_outworld);
+}
+
+static inline bool dept_outworld_entered(void)
+{
+	return atomic_read(&dept_outworld);
+}
+
+static inline bool dept_lock(void)
+{
+	while (!arch_spin_trylock(&dept_spin))
+		if (unlikely(dept_outworld_entered()))
+			return false;
+	return true;
+}
+
+static inline void dept_unlock(void)
+{
+	arch_spin_unlock(&dept_spin);
+}
+
+/*
+ * whether to stack-trace on every wait or every ecxt
+ */
+static bool rich_stack = true;
+
+enum bfs_ret {
+	BFS_CONTINUE,
+	BFS_CONTINUE_REV,
+	BFS_DONE,
+	BFS_SKIP,
+};
+
+static inline bool before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
+
+static inline bool valid_stack(struct dept_stack *s)
+{
+	return s && s->nr > 0;
+}
+
+static inline bool valid_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+static inline void inval_class(struct dept_class *c)
+{
+	c->key = 0UL;
+}
+
+static inline struct dept_ecxt *dep_e(struct dept_dep *d)
+{
+	return d->ecxt;
+}
+
+static inline struct dept_wait *dep_w(struct dept_dep *d)
+{
+	return d->wait;
+}
+
+static inline struct dept_class *dep_fc(struct dept_dep *d)
+{
+	return dep_e(d)->class;
+}
+
+static inline struct dept_class *dep_tc(struct dept_dep *d)
+{
+	return dep_w(d)->class;
+}
+
+static inline const char *irq_str(int irq)
+{
+	if (irq == DEPT_SIRQ)
+		return "softirq";
+	if (irq == DEPT_HIRQ)
+		return "hardirq";
+	return "(unknown)";
+}
+
+static inline struct dept_task *dept_task(void)
+{
+	return &current->dept_task;
+}
+
+/*
+ * Pool
+ * =====================================================================
+ * DEPT maintains pools to provide objects in a safe way.
+ *
+ *    1) Static pool is used at the beginning of booting time.
+ *    2) The local pool is tried first, before the static pool. Objects
+ *       that have been freed are placed back into the local pool.
+ */
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+	#include "dept_object.h"
+#undef  OBJECT
+	OBJECT_NR,
+};
+
+#define OBJECT(id, nr)							\
+static struct dept_##id spool_##id[nr];					\
+static DEFINE_PER_CPU(struct llist_head, lpool_##id);
+	#include "dept_object.h"
+#undef  OBJECT
+
+static struct dept_pool pool[OBJECT_NR] = {
+#define OBJECT(id, nr) {						\
+	.name = #id,							\
+	.obj_sz = sizeof(struct dept_##id),				\
+	.obj_nr = ATOMIC_INIT(nr),					\
+	.node_off = offsetof(struct dept_##id, pool_node),		\
+	.spool = spool_##id,						\
+	.lpool = &lpool_##id, },
+	#include "dept_object.h"
+#undef  OBJECT
+};
+
+/*
+ * The llist can be used whether or not CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ * is enabled, because NMI and other contexts on the same CPU never run
+ * inside DEPT concurrently; reentrance is prevented.
+ */
+static void *from_pool(enum object_t t)
+{
+	struct dept_pool *p;
+	struct llist_head *h;
+	struct llist_node *n;
+
+	/*
+	 * llist_del_first() doesn't allow concurrent access e.g.
+	 * between process and IRQ context.
+	 */
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		return NULL;
+
+	p = &pool[t];
+
+	/*
+	 * Try local pool first.
+	 */
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	n = llist_del_first(h);
+	if (n)
+		return (void *)n - p->node_off;
+
+	/*
+	 * Try static pool.
+	 */
+	if (atomic_read(&p->obj_nr) > 0) {
+		int idx = atomic_dec_return(&p->obj_nr);
+
+		if (idx >= 0)
+			return p->spool + (idx * p->obj_sz);
+	}
+
+	DEPT_INFO_ONCE("Pool(%s) is empty.\n", p->name);
+	return NULL;
+}
+
+static void to_pool(void *o, enum object_t t)
+{
+	struct dept_pool *p = &pool[t];
+	struct llist_head *h;
+
+	preempt_disable();
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	llist_add(o + p->node_off, h);
+	preempt_enable();
+}
+
+#define OBJECT(id, nr)							\
+static void (*ctor_##id)(struct dept_##id *a);				\
+static void (*dtor_##id)(struct dept_##id *a);				\
+static inline struct dept_##id *new_##id(void)				\
+{									\
+	struct dept_##id *a;						\
+									\
+	a = (struct dept_##id *)from_pool(OBJECT_##id);			\
+	if (unlikely(!a))						\
+		return NULL;						\
+									\
+	atomic_set(&a->ref, 1);						\
+									\
+	if (ctor_##id)							\
+		ctor_##id(a);						\
+									\
+	return a;							\
+}									\
+									\
+static inline struct dept_##id *get_##id(struct dept_##id *a)		\
+{									\
+	atomic_inc(&a->ref);						\
+	return a;							\
+}									\
+									\
+static inline void put_##id(struct dept_##id *a)			\
+{									\
+	if (!atomic_dec_return(&a->ref)) {				\
+		if (dtor_##id)						\
+			dtor_##id(a);					\
+		to_pool(a, OBJECT_##id);				\
+	}								\
+}									\
+									\
+static inline void del_##id(struct dept_##id *a)			\
+{									\
+	put_##id(a);							\
+}									\
+									\
+static inline bool id##_consumed(struct dept_##id *a)			\
+{									\
+	return a && atomic_read(&a->ref) > 1;				\
+}
+#include "dept_object.h"
+#undef  OBJECT
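+
+/*
+ * The template above instantiates, for each entry in dept_object.h,
+ * e.g. new_class(), get_class(), put_class(), del_class() and
+ * class_consumed() for OBJECT(class, ...).
+ */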
+
+#define SET_CONSTRUCTOR(id, f) \
+static void (*ctor_##id)(struct dept_##id *a) = f
+
+static void initialize_dep(struct dept_dep *d)
+{
+	INIT_LIST_HEAD(&d->bfs_node);
+	INIT_LIST_HEAD(&d->dep_node);
+	INIT_LIST_HEAD(&d->dep_rev_node);
+}
+SET_CONSTRUCTOR(dep, initialize_dep);
+
+static void initialize_class(struct dept_class *c)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_iecxt *ie = &c->iecxt[i];
+		struct dept_iwait *iw = &c->iwait[i];
+
+		ie->ecxt = NULL;
+		ie->enirq = i;
+		ie->staled = false;
+
+		iw->wait = NULL;
+		iw->irq = i;
+		iw->staled = false;
+		iw->touched = false;
+	}
+	c->bfs_gen = 0U;
+
+	INIT_LIST_HEAD(&c->all_node);
+	INIT_LIST_HEAD(&c->dep_head);
+	INIT_LIST_HEAD(&c->dep_rev_head);
+}
+SET_CONSTRUCTOR(class, initialize_class);
+
+static void initialize_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		e->enirq_stack[i] = NULL;
+		e->enirq_ip[i] = 0UL;
+	}
+	e->ecxt_ip = 0UL;
+	e->ecxt_stack = NULL;
+	e->enirqf = 0UL;
+	e->event_ip = 0UL;
+	e->event_stack = NULL;
+}
+SET_CONSTRUCTOR(ecxt, initialize_ecxt);
+
+static void initialize_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		w->irq_stack[i] = NULL;
+		w->irq_ip[i] = 0UL;
+	}
+	w->wait_ip = 0UL;
+	w->wait_stack = NULL;
+	w->irqf = 0UL;
+}
+SET_CONSTRUCTOR(wait, initialize_wait);
+
+static void initialize_stack(struct dept_stack *s)
+{
+	s->nr = 0;
+}
+SET_CONSTRUCTOR(stack, initialize_stack);
+
+#define OBJECT(id, nr) \
+static void (*ctor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_CONSTRUCTOR
+
+#define SET_DESTRUCTOR(id, f) \
+static void (*dtor_##id)(struct dept_##id *a) = f
+
+static void destroy_dep(struct dept_dep *d)
+{
+	if (dep_e(d))
+		put_ecxt(dep_e(d));
+	if (dep_w(d))
+		put_wait(dep_w(d));
+}
+SET_DESTRUCTOR(dep, destroy_dep);
+
+static void destroy_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (e->enirq_stack[i])
+			put_stack(e->enirq_stack[i]);
+	if (e->class)
+		put_class(e->class);
+	if (e->ecxt_stack)
+		put_stack(e->ecxt_stack);
+	if (e->event_stack)
+		put_stack(e->event_stack);
+}
+SET_DESTRUCTOR(ecxt, destroy_ecxt);
+
+static void destroy_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (w->irq_stack[i])
+			put_stack(w->irq_stack[i]);
+	if (w->class)
+		put_class(w->class);
+	if (w->wait_stack)
+		put_stack(w->wait_stack);
+}
+SET_DESTRUCTOR(wait, destroy_wait);
+
+#define OBJECT(id, nr) \
+static void (*dtor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_DESTRUCTOR
+
+/*
+ * Caching and hashing
+ * =====================================================================
+ * DEPT makes use of caching and hashing to improve performance. Each
+ * object can be obtained in O(1) with its key.
+ *
+ * NOTE: Currently we assume all the objects in the hash tables will
+ * never be removed. Implement it when needed.
+ */
+
+/*
+ * Some information might be lost but it's only for hashing key.
+ */
+static inline unsigned long mix(unsigned long a, unsigned long b)
+{
+	int halfbits = sizeof(unsigned long) * 8 / 2;
+	unsigned long halfmask = (1UL << halfbits) - 1UL;
+
+	return (a << halfbits) | (b & halfmask);
+}
+
+static bool cmp_dep(struct dept_dep *d1, struct dept_dep *d2)
+{
+	return dep_fc(d1)->key == dep_fc(d2)->key &&
+	       dep_tc(d1)->key == dep_tc(d2)->key;
+}
+
+static unsigned long key_dep(struct dept_dep *d)
+{
+	return mix(dep_fc(d)->key, dep_tc(d)->key);
+}
+
+static bool cmp_class(struct dept_class *c1, struct dept_class *c2)
+{
+	return c1->key == c2->key;
+}
+
+static unsigned long key_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+#define HASH(id, bits)							\
+static struct hlist_head table_##id[1UL << bits];			\
+									\
+static inline struct hlist_head *head_##id(struct dept_##id *a)		\
+{									\
+	return table_##id + hash_long(key_##id(a), bits);		\
+}									\
+									\
+static inline struct dept_##id *hash_lookup_##id(struct dept_##id *a)	\
+{									\
+	struct dept_##id *b;						\
+									\
+	hlist_for_each_entry_rcu(b, head_##id(a), hash_node)		\
+		if (cmp_##id(a, b))					\
+			return b;					\
+	return NULL;							\
+}									\
+									\
+static inline void hash_add_##id(struct dept_##id *a)			\
+{									\
+	hlist_add_head_rcu(&a->hash_node, head_##id(a));		\
+}									\
+									\
+static inline void hash_del_##id(struct dept_##id *a)			\
+{									\
+	hlist_del_rcu(&a->hash_node);					\
+}
+#include "dept_hash.h"
+#undef  HASH
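+
+/*
+ * For example, HASH(class, 12) in dept_hash.h instantiates the template
+ * above as table_class[1UL << 12] along with head_class(),
+ * hash_lookup_class(), hash_add_class() and hash_del_class().
+ */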
+
+static inline struct dept_dep *lookup_dep(struct dept_class *fc,
+					  struct dept_class *tc)
+{
+	struct dept_ecxt onetime_e = { .class = fc };
+	struct dept_wait onetime_w = { .class = tc };
+	struct dept_dep  onetime_d = { .ecxt = &onetime_e,
+				       .wait = &onetime_w };
+	return hash_lookup_dep(&onetime_d);
+}
+
+static inline struct dept_class *lookup_class(unsigned long key)
+{
+	struct dept_class onetime_c = { .key = key };
+
+	return hash_lookup_class(&onetime_c);
+}
+
+/*
+ * Report
+ * =====================================================================
+ * DEPT prints useful information to help debugging on detection of a
+ * problematic dependency.
+ */
+
+static inline void print_ip_stack(unsigned long ip, struct dept_stack *s)
+{
+	if (ip)
+		print_ip_sym(KERN_WARNING, ip);
+
+	if (valid_stack(s)) {
+		pr_warn("stacktrace:\n");
+		stack_trace_print(s->raw, s->nr, 5);
+	}
+
+	if (!ip && !valid_stack(s))
+		pr_warn("(N/A)\n");
+}
+
+#define print_spc(spc, fmt, ...)					\
+	pr_warn("%*c" fmt, (spc) * 4, ' ', ##__VA_ARGS__)
+
+static void print_diagram(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	bool firstline = true;
+	int spc = 1;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (!firstline)
+			pr_warn("\nor\n\n");
+		firstline = false;
+
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "    <%s interrupt>\n", irq_str(irq));
+		print_spc(spc + 1, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+
+	if (!irqf) {
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+}
+
+static void print_dep(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		pr_warn("%s has been enabled:\n", irq_str(irq));
+		print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d) in %s context:\n",
+		       w_fn, tc->name, tc->sub, irq_str(irq));
+		print_ip_stack(w->irq_ip[irq], w->irq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+
+	if (!irqf) {
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d):\n", w_fn, tc->name, tc->sub);
+		print_ip_stack(w->wait_ip, w->wait_stack);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+}
+
+static void save_current_stack(int skip);
+
+/*
+ * Print all classes in the detected circle.
+ */
+static void print_circle(struct dept_class *c)
+{
+	struct dept_class *fc = c->bfs_parent;
+	struct dept_class *tc = c;
+	int i;
+
+	dept_outworld_enter();
+	save_current_stack(6);
+
+	pr_warn("===================================================\n");
+	pr_warn("DEPT: Circular dependency has been detected.\n");
+	pr_warn("%s %.*s %s\n", init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version,
+		print_tainted());
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("summary\n");
+	pr_warn("---------------------------------------------------\n");
+
+	if (fc == tc)
+		pr_warn("*** AA DEADLOCK ***\n\n");
+	else
+		pr_warn("*** DEADLOCK ***\n\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		if (fc != c)
+			pr_warn("\n");
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("\n");
+	pr_warn("[S]: start of the event context\n");
+	pr_warn("[W]: the wait blocked\n");
+	pr_warn("[E]: the event not reachable\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c's detail\n", 'A' + i);
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		pr_warn("\n");
+		print_dep(d);
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("information that might be helpful\n");
+	pr_warn("---------------------------------------------------\n");
+	dump_stack();
+
+	dept_outworld_exit();
+}
+
+/*
+ * BFS(Breadth First Search)
+ * =====================================================================
+ * Whenever a new dependency is added into the graph, search the graph
+ * for a new circular dependency.
+ */
+
+static inline void enqueue(struct list_head *h, struct dept_dep *d)
+{
+	list_add_tail(&d->bfs_node, h);
+}
+
+static inline struct dept_dep *dequeue(struct list_head *h)
+{
+	struct dept_dep *d;
+
+	d = list_first_entry(h, struct dept_dep, bfs_node);
+	list_del(&d->bfs_node);
+	return d;
+}
+
+static inline bool empty(struct list_head *h)
+{
+	return list_empty(h);
+}
+
+static void extend_queue(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_head, dep_node) {
+		struct dept_class *next = dep_tc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+static void extend_queue_rev(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_rev_head, dep_rev_node) {
+		struct dept_class *next = dep_fc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+typedef enum bfs_ret bfs_f(struct dept_dep *d, void *in, void **out);
+static unsigned int bfs_gen;
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
+{
+	LIST_HEAD(q);
+	enum bfs_ret ret;
+
+	if (DEPT_WARN_ON(!cb))
+		return;
+
+	/*
+	 * Avoid zero bfs_gen.
+	 */
+	bfs_gen = bfs_gen + 1 ?: 1;
+
+	c->bfs_gen = bfs_gen;
+	c->bfs_dist = 0;
+	c->bfs_parent = c;
+
+	ret = cb(NULL, in, out);
+	if (ret == BFS_DONE)
+		return;
+	if (ret == BFS_SKIP)
+		return;
+	if (ret == BFS_CONTINUE)
+		extend_queue(&q, c);
+	if (ret == BFS_CONTINUE_REV)
+		extend_queue_rev(&q, c);
+
+	while (!empty(&q)) {
+		struct dept_dep *d = dequeue(&q);
+
+		ret = cb(d, in, out);
+		if (ret == BFS_DONE)
+			break;
+		if (ret == BFS_SKIP)
+			continue;
+		if (ret == BFS_CONTINUE)
+			extend_queue(&q, dep_tc(d));
+		if (ret == BFS_CONTINUE_REV)
+			extend_queue_rev(&q, dep_fc(d));
+	}
+
+	while (!empty(&q))
+		dequeue(&q);
+}
+
+/*
+ * Main operations
+ * =====================================================================
+ * Add dependencies - Each new dependency is added into the graph and
+ * checked to see if it forms a circular dependency.
+ *
+ * Track waits - Waits are queued into the ring buffer for later use in
+ * generating appropriate dependencies on cross-event detection.
+ *
+ * Track event contexts (ecxt) - Event contexts are pushed into the
+ * local stack for later use in generating appropriate dependencies
+ * with waits.
+ */
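+/*
+ * See add_dep(), add_wait()/add_hist() and add_ecxt()/pop_ecxt()
+ * below for each of the operations above.
+ */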
+
+static inline unsigned long cur_enirqf(void);
+static inline int cur_irq(void);
+static inline unsigned int cur_ctxt_id(void);
+
+static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
+{
+	return &c->iecxt[irq];
+}
+
+static inline struct dept_iwait *iwait(struct dept_class *c, int irq)
+{
+	return &c->iwait[irq];
+}
+
+static inline void stale_iecxt(struct dept_iecxt *ie)
+{
+	if (ie->ecxt)
+		put_ecxt(ie->ecxt);
+
+	WRITE_ONCE(ie->ecxt, NULL);
+	WRITE_ONCE(ie->staled, true);
+}
+
+static inline void set_iecxt(struct dept_iecxt *ie, struct dept_ecxt *e)
+{
+	/*
+	 * ->ecxt will never be updated once set, until the class gets
+	 * removed.
+	 */
+	if (ie->ecxt)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(ie->ecxt, get_ecxt(e));
+}
+
+static inline void stale_iwait(struct dept_iwait *iw)
+{
+	if (iw->wait)
+		put_wait(iw->wait);
+
+	WRITE_ONCE(iw->wait, NULL);
+	WRITE_ONCE(iw->staled, true);
+}
+
+static inline void set_iwait(struct dept_iwait *iw, struct dept_wait *w)
+{
+	/*
+	 * ->wait will never be updated once set, until the class gets
+	 * removed.
+	 */
+	if (iw->wait)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(iw->wait, get_wait(w));
+
+	iw->touched = true;
+}
+
+static inline void touch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = true;
+}
+
+static inline void untouch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = false;
+}
+
+static inline struct dept_stack *get_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	return s ? get_stack(s) : NULL;
+}
+
+static inline void prepare_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	/*
+	 * The existing dept_stack hasn't been consumed yet; reuse it.
+	 */
+	if (s && !stack_consumed(s)) {
+		s->nr = 0;
+		return;
+	}
+
+	if (s)
+		put_stack(s);
+
+	s = dept_task()->stack = new_stack();
+	if (!s)
+		return;
+
+	get_stack(s);
+	del_stack(s);
+}
+
+static void save_current_stack(int skip)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (!s)
+		return;
+	if (valid_stack(s))
+		return;
+
+	s->nr = stack_trace_save(s->raw, DEPT_MAX_STACK_ENTRY, skip);
+}
+
+static void finish_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (stack_consumed(s))
+		save_current_stack(2);
+}
+
+/*
+ * FIXME: For now, disable LOCKDEP while DEPT is working.
+ *
+ * Both LOCKDEP and DEPT report a deadlock detection using printk,
+ * taking the risk of another deadlock that might be caused by the
+ * console or printk locks, between the inside and outside of them.
+ *
+ * For DEPT, that's no problem since multiple reports are allowed. But
+ * it would be a bad idea for LOCKDEP since it stops even on a single
+ * report. So we need to prevent LOCKDEP from reporting the risk DEPT
+ * takes when reporting something.
+ */
+#include <linux/lockdep.h>
+
+void dept_off(void)
+{
+	dept_task()->recursive++;
+	lockdep_off();
+}
+
+void dept_on(void)
+{
+	dept_task()->recursive--;
+	lockdep_on();
+}
+
+static inline unsigned long dept_enter(void)
+{
+	unsigned long flags;
+
+	flags = arch_local_irq_save();
+	dept_off();
+	prepare_current_stack();
+	return flags;
+}
+
+static inline void dept_exit(unsigned long flags)
+{
+	finish_current_stack();
+	dept_on();
+	arch_local_irq_restore(flags);
+}
+
+static inline unsigned long dept_enter_recursive(void)
+{
+	unsigned long flags;
+
+	flags = arch_local_irq_save();
+	return flags;
+}
+
+static inline void dept_exit_recursive(unsigned long flags)
+{
+	arch_local_irq_restore(flags);
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static struct dept_dep *__add_dep(struct dept_ecxt *e,
+				  struct dept_wait *w)
+{
+	struct dept_dep *d;
+
+	if (!valid_class(e->class) || !valid_class(w->class))
+		return NULL;
+
+	if (lookup_dep(e->class, w->class))
+		return NULL;
+
+	d = new_dep();
+	if (unlikely(!d))
+		return NULL;
+
+	d->ecxt = get_ecxt(e);
+	d->wait = get_wait(w);
+
+	/*
+	 * Add the dependency into hash and graph.
+	 */
+	hash_add_dep(d);
+	list_add(&d->dep_node, &dep_fc(d)->dep_head);
+	list_add(&d->dep_rev_node, &dep_tc(d)->dep_rev_head);
+	return d;
+}
+
+static enum bfs_ret cb_check_dl(struct dept_dep *d,
+				void *in, void **out)
+{
+	struct dept_dep *new = (struct dept_dep *)in;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d) {
+		dep_tc(new)->bfs_parent = dep_fc(new);
+
+		if (dep_tc(new) != dep_fc(new))
+			return BFS_CONTINUE;
+
+		/*
+		 * An AA circle does not add any further deadlock. We
+		 * don't have to continue this BFS search.
+		 */
+		print_circle(dep_tc(new));
+		return BFS_DONE;
+	}
+
+	/*
+	 * Allow multiple reports.
+	 */
+	if (dep_tc(d) == dep_fc(new))
+		print_circle(dep_tc(new));
+
+	return BFS_CONTINUE;
+}
+
+/*
+ * This function is actually in charge of reporting.
+ */
+static inline void check_dl_bfs(struct dept_dep *d)
+{
+	bfs(dep_tc(d), cb_check_dl, (void *)d, NULL);
+}
+
+static enum bfs_ret cb_find_iw(struct dept_dep *d, void *in, void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *fc;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE_REV;
+
+	fc = dep_fc(d);
+	iw = iwait(fc, irq);
+
+	/*
+	 * If any parent's ->wait was set, then the children would've
+	 * been touched.
+	 */
+	if (!iw->touched)
+		return BFS_SKIP;
+
+	if (!iw->wait)
+		return BFS_CONTINUE_REV;
+
+	*out = iw;
+	return BFS_DONE;
+}
+
+static struct dept_iwait *find_iw_bfs(struct dept_class *c, int irq)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iwait *found = NULL;
+
+	if (iw->wait)
+		return iw;
+
+	/*
+	 * '->touched == false' guarantees there's no parent whose
+	 * ->wait has been set.
+	 */
+	if (!iw->touched)
+		return NULL;
+
+	bfs(c, cb_find_iw, (void *)&irq, (void **)&found);
+
+	if (found)
+		return found;
+
+	untouch_iwait(iw);
+	return NULL;
+}
+
+static enum bfs_ret cb_touch_iw_find_ie(struct dept_dep *d, void *in,
+					void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *tc;
+	struct dept_iecxt *ie;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE;
+
+	tc = dep_tc(d);
+	ie = iecxt(tc, irq);
+	iw = iwait(tc, irq);
+
+	touch_iwait(iw);
+
+	if (!ie->ecxt)
+		return BFS_CONTINUE;
+
+	if (!*out)
+		*out = ie;
+
+	return BFS_CONTINUE;
+}
+
+static struct dept_iecxt *touch_iw_find_ie_bfs(struct dept_class *c,
+					       int irq)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iecxt *found = ie->ecxt ? ie : NULL;
+
+	touch_iwait(iw);
+	bfs(c, cb_touch_iw_find_ie, (void *)&irq, (void **)&found);
+	return found;
+}
+
+/*
+ * Should be called with dept_lock held.
+ */
+static void __add_idep(struct dept_iecxt *ie, struct dept_iwait *iw)
+{
+	struct dept_dep *new;
+
+	/*
+	 * There's nothing to do.
+	 */
+	if (!ie || !iw || !ie->ecxt || !iw->wait)
+		return;
+
+	new = __add_dep(ie->ecxt, iw->wait);
+
+	/*
+	 * Let check_dl_bfs() detect and report any deadlock.
+	 */
+	if (new) {
+		check_dl_bfs(new);
+		stale_iecxt(ie);
+		stale_iwait(iw);
+	}
+
+	/*
+	 * If !new, we ran short of object resources. Just let it go
+	 * and get it checked on other occasions. Retrying is
+	 * meaningless in that case.
+	 */
+}
+
+static void set_check_iecxt(struct dept_class *c, int irq,
+			    struct dept_ecxt *e)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	set_iecxt(ie, e);
+	__add_idep(ie, find_iw_bfs(c, irq));
+}
+
+static void set_check_iwait(struct dept_class *c, int irq,
+			    struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	set_iwait(iw, w);
+	__add_idep(touch_iw_find_ie_bfs(c, irq), iw);
+}
+
+static void add_iecxt(struct dept_class *c, int irq, struct dept_ecxt *e,
+		      bool stack)
+{
+	/*
+	 * This access is safe since we ensure e->class has been set
+	 * locally.
+	 */
+	struct dept_task *dt = dept_task();
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	if (unlikely(READ_ONCE(ie->staled)))
+		return;
+
+	/*
+	 * Skip add_iecxt() if ie->ecxt has ever been set at least once,
+	 * which means it either has a valid ->ecxt or has been staled.
+	 */
+	if (READ_ONCE(ie->ecxt))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(ie->staled))
+		goto unlock;
+	if (ie->ecxt)
+		goto unlock;
+
+	e->enirqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time that these
+	 * enirq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(e->enirq_ip[irq]);
+	DEPT_WARN_ON(e->enirq_stack[irq]);
+
+	e->enirq_ip[irq] = dt->enirq_ip[irq];
+	e->enirq_stack[irq] = stack ? get_current_stack() : NULL;
+
+	set_check_iecxt(c, irq, e);
+unlock:
+	dept_unlock();
+}
+
+static void add_iwait(struct dept_class *c, int irq, struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	if (unlikely(READ_ONCE(iw->staled)))
+		return;
+
+	/*
+	 * Skip add_iwait() if iw->wait has ever been set at least once,
+	 * which means it either has a valid ->wait or has been staled.
+	 */
+	if (READ_ONCE(iw->wait))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(iw->staled))
+		goto unlock;
+	if (iw->wait)
+		goto unlock;
+
+	w->irqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time that these
+	 * irq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(w->irq_ip[irq]);
+	DEPT_WARN_ON(w->irq_stack[irq]);
+
+	w->irq_ip[irq] = w->wait_ip;
+	w->irq_stack[irq] = get_current_stack();
+
+	set_check_iwait(c, irq, w);
+unlock:
+	dept_unlock();
+}
+
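+/*
+ * Per-task ring buffer of recent waits. Each entry pairs a wait with
+ * the wgen and the context id at the time it happened, so that a
+ * later event can look back and generate dependencies on the waits
+ * that preceded it (see do_event()).
+ */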
+static inline struct dept_wait_hist *hist(int pos)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist + (pos % DEPT_MAX_WAIT_HIST);
+}
+
+static inline int hist_pos_next(void)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist_pos % DEPT_MAX_WAIT_HIST;
+}
+
+static inline void hist_advance(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->wait_hist_pos++;
+	dt->wait_hist_pos %= DEPT_MAX_WAIT_HIST;
+}
+
+static inline struct dept_wait_hist *new_hist(void)
+{
+	struct dept_wait_hist *wh = hist(hist_pos_next());
+
+	hist_advance();
+	return wh;
+}
+
+static void add_hist(struct dept_wait *w, unsigned int wg, unsigned int ctxt_id)
+{
+	struct dept_wait_hist *wh = new_hist();
+
+	if (likely(wh->wait))
+		put_wait(wh->wait);
+
+	wh->wait = get_wait(w);
+	wh->wgen = wg;
+	wh->ctxt_id = ctxt_id;
+}
+
+/*
+ * Should be called after setting up e's iecxt and w's iwait.
+ */
+static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
+{
+	struct dept_class *fc = e->class;
+	struct dept_class *tc = w->class;
+	struct dept_dep *d;
+	int i;
+
+	if (lookup_dep(fc, tc))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	/*
+	 * __add_dep() will call lookup_dep() again with the lock held.
+	 */
+	d = __add_dep(e, w);
+	if (d) {
+		check_dl_bfs(d);
+
+		for (i = 0; i < DEPT_IRQS_NR; i++) {
+			struct dept_iwait *fiw = iwait(fc, i);
+			struct dept_iecxt *found_ie;
+			struct dept_iwait *found_iw;
+
+			/*
+			 * '->touched == false' guarantees there's no
+			 * parent whose ->wait has been set.
+			 */
+			if (!fiw->touched)
+				continue;
+
+			/*
+			 * find_iw_bfs() will untouch the iwait if
+			 * not found.
+			 */
+			found_iw = find_iw_bfs(fc, i);
+
+			if (!found_iw)
+				continue;
+
+			found_ie = touch_iw_find_ie_bfs(tc, i);
+			__add_idep(found_ie, found_iw);
+		}
+	}
+	dept_unlock();
+}
+
+static atomic_t wgen = ATOMIC_INIT(1);
+
+static void add_wait(struct dept_class *c, unsigned long ip,
+		     const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait *w;
+	unsigned int wg = 0U;
+	int irq;
+	int i;
+
+	w = new_wait();
+	if (unlikely(!w))
+		return;
+
+	WRITE_ONCE(w->class, get_class(c));
+	w->wait_ip = ip;
+	w->wait_fn = w_fn;
+	w->wait_stack = get_current_stack();
+
+	irq = cur_irq();
+	if (irq < DEPT_IRQS_NR)
+		add_iwait(c, irq, w);
+
+	/*
+	 * Avoid adding a dependency between a user-aware nested ecxt
+	 * and the wait.
+	 */
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+
+		eh = dt->ecxt_held + i;
+		if (eh->ecxt->class != c || eh->nest == ne)
+			add_dep(eh->ecxt, w);
+	}
+
+	if (!wait_consumed(w) && !rich_stack) {
+		if (w->wait_stack)
+			put_stack(w->wait_stack);
+		w->wait_stack = NULL;
+	}
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	add_hist(w, wg, cur_ctxt_id());
+
+	del_wait(w);
+}
+
+static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_ecxt_held *eh;
+	struct dept_ecxt *e;
+	unsigned long irqf;
+	int irq;
+
+	if (DEPT_WARN_ON(dt->ecxt_held_pos == DEPT_MAX_ECXT_HELD))
+		return false;
+
+	e = new_ecxt();
+	if (unlikely(!e))
+		return false;
+
+	e->class = get_class(c);
+	e->ecxt_ip = ip;
+	e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
+	e->event_fn = e_fn;
+	e->ecxt_fn = c_fn;
+
+	eh = dt->ecxt_held + (dt->ecxt_held_pos++);
+	eh->ecxt = get_ecxt(e);
+	eh->key = (unsigned long)obj;
+	eh->wgen = atomic_read(&wgen);
+	eh->nest = ne;
+
+	irqf = cur_enirqf();
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+		add_iecxt(c, irq, e, false);
+
+	del_ecxt(e);
+	return true;
+}
+
+static int find_ecxt_pos(unsigned long key, struct dept_class *c,
+			 bool newfirst)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	if (newfirst) {
+		for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+			struct dept_ecxt_held *eh;
+
+			eh = dt->ecxt_held + i;
+			if (eh->key == key && eh->ecxt->class == c)
+				return i;
+		}
+	} else {
+		for (i = 0; i < dt->ecxt_held_pos; i++) {
+			struct dept_ecxt_held *eh;
+
+			eh = dt->ecxt_held + i;
+			if (eh->key == key && eh->ecxt->class == c)
+				return i;
+		}
+	}
+	return -1;
+}
+
+static bool pop_ecxt(void *obj, struct dept_class *c)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long key = (unsigned long)obj;
+	int pos;
+	int i;
+
+	pos = find_ecxt_pos(key, c, true);
+	if (pos == -1)
+		return false;
+
+	put_ecxt(dt->ecxt_held[pos].ecxt);
+	dt->ecxt_held_pos--;
+
+	for (i = pos; i < dt->ecxt_held_pos; i++)
+		dt->ecxt_held[i] = dt->ecxt_held[i + 1];
+	return true;
+}
+
+static inline bool good_hist(struct dept_wait_hist *wh, unsigned int wg)
+{
+	return wh->wait != NULL && before(wg, wh->wgen);
+}
+
+/*
+ * Binary-search the ring buffer for the earliest valid wait.
+ */
+static int find_hist_pos(unsigned int wg)
+{
+	int oldest;
+	int l;
+	int r;
+	int pos;
+
+	oldest = hist_pos_next();
+	if (unlikely(good_hist(hist(oldest), wg))) {
+		DEPT_INFO_ONCE("Need to expand the ring buffer.\n");
+		return oldest;
+	}
+
+	l = oldest + 1;
+	r = oldest + DEPT_MAX_WAIT_HIST - 1;
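+	/*
+	 * Entries in the window are ordered by wgen, so good_hist() is
+	 * monotonic over [l, r]: binary-search the first position for
+	 * which it holds, i.e. the earliest wait younger than wg.
+	 */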
+	for (pos = (l + r) / 2; l <= r; pos = (l + r) / 2) {
+		struct dept_wait_hist *p = hist(pos - 1);
+		struct dept_wait_hist *wh = hist(pos);
+
+		if (!good_hist(p, wg) && good_hist(wh, wg))
+			return pos % DEPT_MAX_WAIT_HIST;
+		if (good_hist(wh, wg))
+			r = pos - 1;
+		else
+			l = pos + 1;
+	}
+	return -1;
+}
+
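+/*
+ * On an event, connect the ecxt to every wait in the history that
+ * happened in the same context, between the wait that asked for the
+ * event (wg) and the start of the ecxt (eh->wgen), and handle IRQs
+ * that have been enabled within the window.
+ */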
+static void do_event(void *obj, struct dept_class *c, unsigned int wg,
+		     unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait_hist *wh;
+	struct dept_ecxt_held *eh;
+	unsigned long key = (unsigned long)obj;
+	unsigned int ctxt_id;
+	int end;
+	int pos;
+	int i;
+
+	/*
+	 * The event was triggered before the wait.
+	 */
+	if (!wg)
+		return;
+
+	pos = find_ecxt_pos(key, c, false);
+	if (pos == -1)
+		return;
+
+	eh = dt->ecxt_held + pos;
+	eh->ecxt->event_ip = ip;
+	eh->ecxt->event_stack = get_current_stack();
+
+	/*
+	 * The ecxt has already done what it needs.
+	 */
+	if (!before(wg, eh->wgen))
+		return;
+
+	pos = find_hist_pos(wg);
+	if (pos == -1)
+		return;
+
+	ctxt_id = cur_ctxt_id();
+	end = hist_pos_next();
+	end = end > pos ? end : end + DEPT_MAX_WAIT_HIST;
+	for (wh = hist(pos); pos < end; wh = hist(++pos)) {
+		if (wh->ctxt_id == ctxt_id)
+			add_dep(eh->ecxt, wh->wait);
+		if (!before(wh->wgen, eh->wgen))
+			break;
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_ecxt *e;
+
+		if (before(dt->wgen_enirq[i], wg))
+			continue;
+
+		e = eh->ecxt;
+		add_iecxt(e->class, i, e, false);
+	}
+}
+
+static void del_dep_rcu(struct rcu_head *rh)
+{
+	struct dept_dep *d = container_of(rh, struct dept_dep, rh);
+
+	preempt_disable();
+	del_dep(d);
+	preempt_enable();
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_class(struct dept_class *c)
+{
+	struct dept_dep *d, *n;
+	int i;
+
+	list_for_each_entry_safe(d, n, &c->dep_head, dep_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	list_for_each_entry_safe(d, n, &c->dep_rev_head, dep_rev_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		stale_iecxt(iecxt(c, i));
+		stale_iwait(iwait(c, i));
+	}
+}
+
+/*
+ * IRQ context control
+ * =====================================================================
+ * Whether a wait is in {hard,soft}-IRQ context or whether
+ * {hard,soft}-IRQ has been enabled on the way to an event is very
+ * important to check dependency. All those things should be tracked.
+ */
+
+static inline unsigned long cur_enirqf(void)
+{
+	struct dept_task *dt = dept_task();
+	int he = dt->hardirqs_enabled;
+	int se = dt->softirqs_enabled;
+
+	if (he)
+		return DEPT_HIRQF | (se ? DEPT_SIRQF : 0UL);
+	return 0UL;
+}
+
+static inline int cur_irq(void)
+{
+	if (lockdep_softirq_context(current))
+		return DEPT_SIRQ;
+	if (lockdep_hardirq_context())
+		return DEPT_HIRQ;
+	return DEPT_IRQS_NR;
+}
+
+static inline unsigned int cur_ctxt_id(void)
+{
+	struct dept_task *dt = dept_task();
+	int irq = cur_irq();
+
+	/*
+	 * Normal process context
+	 */
+	if (irq == DEPT_IRQS_NR)
+		return 0U;
+
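+	/*
+	 * The low DEPT_IRQS_NR bits encode the irq type and the upper
+	 * bits count the irq entrances, bumped by (1UL << DEPT_IRQS_NR)
+	 * in dept_{soft,hard}irq_enter(), so that each entrance gets a
+	 * distinct context id.
+	 */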
+	return dt->irq_id[irq] | (1UL << irq);
+}
+
+static void enirq_transition(int irq)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	/*
+	 * If a wgen read with the IRQ enabled on the way to an event
+	 * is >= a wait's wgen, the IRQ can cut in within the ecxt.
+	 * Used for cross-event detection.
+	 *
+	 *    wait context	event context(ecxt)
+	 *    ------------	-------------------
+	 *    wait event
+	 *       WRITE wgen
+	 *			observe IRQ enabled
+	 *			   READ wgen
+	 *			   keep the wgen locally
+	 *
+	 *			on the event
+	 *			   check the local wgen
+	 */
+	dt->wgen_enirq[irq] = atomic_read(&wgen);
+
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+		struct dept_ecxt *e;
+
+		eh = dt->ecxt_held + i;
+		e = eh->ecxt;
+		add_iecxt(e->class, irq, e, true);
+	}
+}
+
+static void enirq_update(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long irqf;
+	unsigned long prev;
+	int irq;
+
+	prev = dt->eff_enirqf;
+	irqf = cur_enirqf();
+	dt->eff_enirqf = irqf;
+
+	/*
+	 * Do enirq_transition() only on an OFF -> ON transition.
+	 */
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (prev & (1UL << irq))
+			continue;
+
+		dt->enirq_ip[irq] = ip;
+		enirq_transition(irq);
+	}
+}
+
+void dept_aware_softirqs_enable(void)
+{
+	dept_task()->softirqs_enabled = true;
+}
+
+void dept_aware_softirqs_disable(void)
+{
+	dept_task()->softirqs_enabled = false;
+}
+
+void dept_aware_hardirqs_enable(void)
+{
+	dept_task()->hardirqs_enabled = true;
+}
+EXPORT_SYMBOL_GPL(dept_aware_hardirqs_enable);
+
+void dept_aware_hardirqs_disable(void)
+{
+	dept_task()->hardirqs_enabled = false;
+}
+EXPORT_SYMBOL_GPL(dept_aware_hardirqs_disable);
+
+/*
+ * Ensure it gets called on every IRQ ON/OFF transition.
+ */
+void dept_enirq_transition(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * IRQ ON/OFF transition might happen while Dept is working.
+	 * We cannot handle recursive entrance. Just ignore it.
+	 * Only transitions outside of Dept will be considered.
+	 */
+	if (dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	enirq_update(ip);
+
+	dept_exit(flags);
+}
+
+/*
+ * Ensure it's the outermost softirq context.
+ */
+void dept_softirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+}
+
+/*
+ * Ensure it's the outermost hardirq context.
+ */
+void dept_hardirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+}
+
+/*
+ * DEPT API
+ * =====================================================================
+ * Main DEPT APIs.
+ */
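+/*
+ * A rough usage sketch of the APIs below. The pairing is purely
+ * illustrative - names, flags and call sites are hypothetical; see
+ * the wrappers introduced by the following patches for real usage:
+ *
+ *	static struct dept_map m;
+ *	static struct dept_key k;
+ *
+ *	dept_map_init(&m, &k, 0, "my_wait_event");
+ *
+ *	// waiter side
+ *	dept_ask_event(&m);
+ *	dept_wait(&m, 1UL, _RET_IP_, __func__, 0);
+ *	// ... actually wait for the event ...
+ *
+ *	// event side
+ *	dept_event(&m, 1UL, _RET_IP_, __func__);
+ */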
+
+static inline void clean_classes_cache(struct dept_key *k)
+{
+	int i;
+
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		k->classes[i] = NULL;
+}
+
+void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
+		   const char *n)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (sub < 0 || sub >= DEPT_MAX_SUBCLASSES_USR) {
+		m->nocheck = true;
+		return;
+	}
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	clean_classes_cache(&m->keys_local);
+
+	m->sub_usr = sub;
+	m->keys = k;
+	m->name = n;
+	m->wgen = 0U;
+	m->nocheck = false;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_init);
+
+void dept_map_reinit(struct dept_map *m)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	clean_classes_cache(&m->keys_local);
+	m->wgen = 0U;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_reinit);
+
+void dept_map_nocheck(struct dept_map *m)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	m->nocheck = true;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_nocheck);
+
+static LIST_HEAD(classes);
+
+static inline bool within(const void *addr, void *start, unsigned long size)
+{
+	return addr >= start && addr < start + size;
+}
+
+void dept_free_range(void *start, unsigned int sz)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c, *n;
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Should successfully free the objects.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_free_range() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_free_range() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	list_for_each_entry_safe(c, n, &classes, all_node) {
+		if (!within((void *)c->key, start, sz) &&
+		    !within(c->name, start, sz))
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+
+static inline int map_sub(struct dept_map *m, int e)
+{
+	return m->sub_usr + e * DEPT_MAX_SUBCLASSES_USR;
+}
+
+static struct dept_class *check_new_class(struct dept_key *local,
+					  struct dept_key *k, int sub,
+					  const char *n)
+{
+	struct dept_class *c = NULL;
+
+	if (DEPT_WARN_ON(sub >= DEPT_MAX_SUBCLASSES))
+		return NULL;
+
+	if (DEPT_WARN_ON(!k))
+		return NULL;
+
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE)
+		c = READ_ONCE(local->classes[sub]);
+
+	if (c)
+		return c;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (c)
+		goto caching;
+
+	if (unlikely(!dept_lock()))
+		return NULL;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (unlikely(c))
+		goto unlock;
+
+	c = new_class();
+	if (unlikely(!c))
+		goto unlock;
+
+	c->name = n;
+	c->sub = sub;
+	c->key = (unsigned long)(k->subkeys + sub);
+	hash_add_class(c);
+	list_add(&c->all_node, &classes);
+unlock:
+	dept_unlock();
+caching:
+	/*
+	 * Should be cached even if c == NULL, to reflect the case that
+	 * the class has been deleted.
+	 */
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE)
+		WRITE_ONCE(local->classes[sub], c);
+
+	return c;
+}
+
+static void __dept_wait(struct dept_map *m, unsigned long w_f,
+			unsigned long ip, const char *w_fn, int ne)
+{
+	int e;
+
+	/*
+	 * Be as conservative as possible. In case of multiple waits for
+	 * a single dept_map, we are going to keep only the last wait's
+	 * wgen for simplicity - keeping all wgens seems overengineering.
+	 *
+	 * Of course, it might cause missing some dependencies that
+	 * would rarely, probably never, happen but it helps avoid
+	 * false positive reports.
+	 */
+	for_each_set_bit(e, &w_f, DEPT_MAX_SUBCLASSES_EVT) {
+		struct dept_class *c;
+		struct dept_key *k;
+
+		k = m->keys ?: &m->keys_local;
+		c = check_new_class(&m->keys_local, k,
+				    map_sub(m, e), m->name);
+		if (!c)
+			continue;
+
+		add_wait(c, ip, w_fn, ne);
+	}
+}
+
+void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
+	       const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait);
+
+static inline void stage_map(struct dept_task *dt, struct dept_map *m)
+{
+	dt->stage_m = m;
+}
+
+static inline void unstage_map(struct dept_task *dt)
+{
+	dt->stage_m = NULL;
+}
+
+static inline struct dept_map *staged_map(struct dept_task *dt)
+{
+	return dt->stage_m;
+}
+
+void dept_stage_wait(struct dept_map *m, unsigned long w_f,
+		     const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	stage_map(dt, m);
+
+	dt->stage_w_f = w_f;
+	dt->stage_w_fn = w_fn;
+	dt->stage_ne = ne;
+
+	/*
+	 * Disable the map in case a real sleep doesn't happen. It
+	 * will be enabled again at dept_ask_event_wait_commit().
+	 */
+	WRITE_ONCE(m->wgen, 0U);
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_stage_wait);
+
+void dept_clean_stage(void)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	unstage_map(dt);
+
+	dt->stage_w_f = 0UL;
+	dt->stage_w_fn = NULL;
+	dt->stage_ne = 0;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_clean_stage);
+
+/*
+ * Always called from __schedule().
+ */
+void dept_ask_event_wait_commit(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	unsigned int wg;
+	struct dept_map *m;
+	unsigned long w_f;
+	const char *w_fn;
+	int ne;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		flags = dept_enter_recursive();
+
+		/*
+		 * Dept cannot work with this map here even though an
+		 * event context has just been asked for. Don't let it
+		 * confuse Dept when handling the event later. Disable
+		 * the map until the next real case.
+		 */
+		m = staged_map(dt);
+		if (m)
+			WRITE_ONCE(m->wgen, 0U);
+
+		dept_exit_recursive(flags);
+		return;
+	}
+
+	flags = dept_enter();
+
+	m = staged_map(dt);
+
+	/*
+	 * Check whether current has staged a wait before __schedule().
+	 */
+	if (!m)
+		goto exit;
+
+	if (m->nocheck)
+		goto exit;
+
+	w_f = dt->stage_w_f;
+	w_fn = dt->stage_w_fn;
+	ne = dt->stage_ne;
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+exit:
+	dept_exit(flags);
+}
+
+void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		dt->missing_ecxt++;
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (e >= DEPT_MAX_SUBCLASSES_EVT)
+		goto missing_ecxt;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller code can be fixed, and handle the event corresponding
+	 * to the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && add_ecxt((void *)m, c, ip, c_fn, e_fn, ne))
+		goto exit;
+missing_ecxt:
+	dt->missing_ecxt++;
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_enter);
+
+void dept_ask_event(struct dept_map *m)
+{
+	unsigned long flags;
+	unsigned int wg;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (m->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ask_event);
+
+void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		const char *e_fn)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		/*
+		 * Dept cannot work with this map here even though an
+		 * event has just been triggered. Don't let it confuse
+		 * Dept when handling the next event. Disable the map
+		 * until the next real case.
+		 */
+		WRITE_ONCE(m->wgen, 0U);
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (DEPT_WARN_ON(e >= DEPT_MAX_SUBCLASSES_EVT))
+		goto exit;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller can be fixed, and handle the event corresponding to
+	 * the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && add_ecxt((void *)m, c, 0UL, NULL, e_fn, 0)) {
+		do_event((void *)m, c, READ_ONCE(m->wgen), ip);
+		pop_ecxt((void *)m, c);
+	}
+exit:
+	/*
+	 * Keep the map disabled until the next sleep.
+	 */
+	WRITE_ONCE(m->wgen, 0U);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event);
+
+void dept_ecxt_exit(struct dept_map *m, unsigned long e_f,
+		    unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	struct dept_class *c;
+	struct dept_key *k;
+	int e;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		dt->missing_ecxt--;
+		return;
+	}
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+	if (e >= DEPT_MAX_SUBCLASSES_EVT)
+		goto missing_ecxt;
+
+	/*
+	 * The caller passed more than a single event? Warn so that the
+	 * caller can be fixed, and handle the event corresponding to
+	 * the first bit anyway.
+	 */
+	DEPT_WARN_ON(1UL << e != e_f);
+
+	k = m->keys ?: &m->keys_local;
+	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+
+	if (c && pop_ecxt((void *)m, c))
+		goto exit;
+missing_ecxt:
+	dt->missing_ecxt--;
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_exit);
+
+void dept_task_exit(struct task_struct *t)
+{
+	struct dept_task *dt = &t->dept_task;
+	int i;
+
+	raw_local_irq_disable();
+
+	if (dt->stack)
+		put_stack(dt->stack);
+
+	for (i = 0; i < dt->ecxt_held_pos; i++)
+		put_ecxt(dt->ecxt_held[i].ecxt);
+
+	for (i = 0; i < DEPT_MAX_WAIT_HIST; i++)
+		if (dt->wait_hist[i].wait)
+			put_wait(dt->wait_hist[i].wait);
+
+	dept_off();
+
+	raw_local_irq_enable();
+}
+
+void dept_task_init(struct task_struct *t)
+{
+	memset(&t->dept_task, 0x0, sizeof(struct dept_task));
+}
+
+void dept_key_init(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Key initialization fails.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_init() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_key_init() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		DEPT_STOP("The class(%s/%d) has not been removed.\n",
+			  c->name, sub);
+		break;
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_key_init);
+
+void dept_key_destroy(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Key destroying fails.\n");
+		return;
+	}
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_destroy() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_key_destroy() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(dept_key_destroy);
+
+static void move_llist(struct llist_head *to, struct llist_head *from)
+{
+	struct llist_node *first = llist_del_all(from);
+	struct llist_node *last;
+
+	if (!first)
+		return;
+
+	for (last = first; last->next; last = last->next);
+	llist_add_batch(first, last, to);
+}
+
+static void migrate_per_cpu_pool(void)
+{
+	const int boot_cpu = 0;
+	int i;
+
+	/*
+	 * The boot CPU has been using the temporary local pool so far.
+	 * Now that the per_cpu areas are ready, use the per_cpu local
+	 * pools instead.
+	 */
+	DEPT_WARN_ON(smp_processor_id() != boot_cpu);
+	for (i = 0; i < OBJECT_NR; i++) {
+		struct llist_head *from;
+		struct llist_head *to;
+
+		from = &pool[i].boot_pool;
+		to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+		move_llist(to, from);
+	}
+}
+
+#define B2KB(B) ((B) / 1024)
+
+/*
+ * Should be called after setup_per_cpu_areas() and before any
+ * non-boot CPU comes up.
+ */
+void __init dept_init(void)
+{
+	size_t mem_total = 0;
+
+	local_irq_disable();
+	dept_per_cpu_ready = 1;
+	migrate_per_cpu_pool();
+	local_irq_enable();
+
+#define OBJECT(id, nr) mem_total += sizeof(struct dept_##id) * nr;
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits) mem_total += sizeof(struct hlist_head) * (1UL << bits);
+	#include "dept_hash.h"
+#undef  HASH
+
+	pr_info("DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park\n");
+	pr_info("... DEPT_MAX_STACK_ENTRY: %d\n", DEPT_MAX_STACK_ENTRY);
+	pr_info("... DEPT_MAX_WAIT_HIST  : %d\n", DEPT_MAX_WAIT_HIST);
+	pr_info("... DEPT_MAX_ECXT_HELD  : %d\n", DEPT_MAX_ECXT_HELD);
+	pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
+#define OBJECT(id, nr)							\
+	pr_info("... memory used by %s: %zu KB\n",			\
+	       #id, B2KB(sizeof(struct dept_##id) * nr));
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits)							\
+	pr_info("... hash list head used by %s: %zu KB\n",		\
+	       #id, B2KB(sizeof(struct hlist_head) * (1UL << bits)));
+	#include "dept_hash.h"
+#undef  HASH
+	pr_info("... total memory used by objects and hashs: %zu KB\n", B2KB(mem_total));
+	pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
+}
diff --git a/kernel/dependency/dept_hash.h b/kernel/dependency/dept_hash.h
new file mode 100644
index 00000000..fd85aab
--- /dev/null
+++ b/kernel/dependency/dept_hash.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * HASH(id, bits)
+ *
+ * id  : Id for the object of struct dept_##id.
+ * bits: 1UL << bits is the hash table size.
+ */
+
+HASH(dep, 12)
+HASH(class, 12)
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
new file mode 100644
index 00000000..0b7eb16
--- /dev/null
+++ b/kernel/dependency/dept_object.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * OBJECT(id, nr)
+ *
+ * id: Id for the object of struct dept_##id.
+ * nr: # of objects that should be kept in the pool.
+ */
+
+OBJECT(dep, 1024 * 8)
+OBJECT(class, 1024 * 8)
+OBJECT(stack, 1024 * 32)
+OBJECT(ecxt, 1024 * 16)
+OBJECT(wait, 1024 * 32)
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959..bac41ee 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -844,6 +844,7 @@ void __noreturn do_exit(long code)
 	exit_tasks_rcu_finish();
 
 	lockdep_free_task(tsk);
+	dept_task_exit(tsk);
 	do_task_dead();
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897..68f7154 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -98,6 +98,7 @@
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
 #include <linux/sched/mm.h>
+#include <linux/dept.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -2187,6 +2188,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_LOCKDEP
 	lockdep_init_task(p);
 #endif
+	dept_task_init(p);
 
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c06cab6..2175f9c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1182,7 +1182,7 @@ static inline struct hlist_head *keyhashentry(const struct lock_class_key *key)
 }
 
 /* Register a dynamically allocated key. */
-void lockdep_register_key(struct lock_class_key *key)
+void __lockdep_register_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head;
 	struct lock_class_key *k;
@@ -1205,7 +1205,7 @@ void lockdep_register_key(struct lock_class_key *key)
 restore_irqs:
 	raw_local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(lockdep_register_key);
+EXPORT_SYMBOL_GPL(__lockdep_register_key);
 
 /* Check whether a key has been registered as a dynamic key. */
 static bool is_dynamic_key(const struct lock_class_key *key)
@@ -4243,7 +4243,7 @@ static void __trace_hardirqs_on_caller(void)
  * stops watching. After the RCU transition lockdep_hardirqs_on() has to be
  * invoked to set the final state.
  */
-void lockdep_hardirqs_on_prepare(unsigned long ip)
+void __lockdep_hardirqs_on_prepare(unsigned long ip)
 {
 	if (unlikely(!debug_locks))
 		return;
@@ -4294,9 +4294,9 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
 	__trace_hardirqs_on_caller();
 	lockdep_recursion_finish();
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_on_prepare);
 
-void noinstr lockdep_hardirqs_on(unsigned long ip)
+void noinstr __lockdep_hardirqs_on(unsigned long ip)
 {
 	struct irqtrace_events *trace = &current->irqtrace;
 
@@ -4358,12 +4358,12 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 	trace->hardirq_enable_event = ++trace->irq_events;
 	debug_atomic_inc(hardirqs_on_events);
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_on);
 
 /*
  * Hardirqs were disabled:
  */
-void noinstr lockdep_hardirqs_off(unsigned long ip)
+void noinstr __lockdep_hardirqs_off(unsigned long ip)
 {
 	if (unlikely(!debug_locks))
 		return;
@@ -4400,12 +4400,12 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_hardirqs_off);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
+EXPORT_SYMBOL_GPL(__lockdep_hardirqs_off);
 
 /*
  * Softirqs will be enabled:
  */
-void lockdep_softirqs_on(unsigned long ip)
+void __lockdep_softirqs_on(unsigned long ip)
 {
 	struct irqtrace_events *trace = &current->irqtrace;
 
@@ -4445,7 +4445,7 @@ void lockdep_softirqs_on(unsigned long ip)
 /*
  * Softirqs were disabled:
  */
-void lockdep_softirqs_off(unsigned long ip)
+void __lockdep_softirqs_off(unsigned long ip)
 {
 	if (unlikely(!lockdep_enabled()))
 		return;
@@ -4773,7 +4773,7 @@ static inline int check_wait_context(struct task_struct *curr,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 			    struct lock_class_key *key, int subclass,
 			    u8 inner, u8 outer, u8 lock_type)
 {
@@ -4833,7 +4833,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map_type);
+EXPORT_SYMBOL_GPL(__lockdep_init_map_type);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -6298,7 +6298,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
  * key irrespective of debug_locks to avoid potential invalid access to freed
  * memory in lock_class entry.
  */
-void lockdep_unregister_key(struct lock_class_key *key)
+void __lockdep_unregister_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head = keyhashentry(key);
 	struct lock_class_key *k;
@@ -6333,7 +6333,7 @@ void lockdep_unregister_key(struct lock_class_key *key)
 	/* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
 	synchronize_rcu();
 }
-EXPORT_SYMBOL_GPL(lockdep_unregister_key);
+EXPORT_SYMBOL_GPL(__lockdep_unregister_key);
 
 void __init lockdep_init(void)
 {
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788..0953f99 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2205,6 +2205,7 @@ static void free_module(struct module *mod)
 
 	/* Free lock-classes; relies on the preceding sync_rcu(). */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
 	module_memfree(mod->core_layout.base);
@@ -4159,6 +4160,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
  free_module:
 	/* Free lock-classes; relies on the preceding sync_rcu() */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	module_deallocate(mod, info);
  free_copy:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaab..5784b07 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6285,6 +6285,14 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	rcu_note_context_switch(!!sched_mode);
 
 	/*
+	 * Skip the commit if the current task does not actually go to
+	 * sleep.
+	 */
+	if (READ_ONCE(prev->__state) & TASK_NORMAL &&
+	    sched_mode == SM_NONE)
+		dept_ask_event_wait_commit(_RET_IP_);
+
+	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up():
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 075cd25..3c17507 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1261,6 +1261,33 @@ config DEBUG_PREEMPT
 
 menu "Lock Debugging (spinlocks, mutexes, etc...)"
 
+config DEPT
+	bool "Dependency tracking"
+	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
+	select DEBUG_SPINLOCK
+	select DEBUG_MUTEXES
+	select DEBUG_RT_MUTEXES if RT_MUTEXES
+	select DEBUG_RWSEMS
+	select DEBUG_WW_MUTEX_SLOWPATH
+	select DEBUG_LOCK_ALLOC
+	select TRACE_IRQFLAGS
+	select STACKTRACE
+	select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
+	select KALLSYMS
+	select KALLSYMS_ALL
+	select PROVE_LOCKING
+	default n
+	help
+	  Check dependencies between waits and events, and report any
+	  deadlock possibility that has been detected. Multiple reports
+	  are allowed if there is more than a single problem.
+
+	  This feature is considered EXPERIMENTAL and might produce
+	  false positive reports, because dependencies that have never
+	  been tracked before start to be tracked. It's worth noting
+	  that, to mitigate the impact of the false positives, multiple
+	  reporting is supported.
+
 config LOCK_DEBUGGING_SUPPORT
 	bool
 	depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 03/21] dept: Apply Dept to spinlock
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies on spinlocks.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h            | 18 +++++++++++++++---
 include/linux/spinlock.h           | 21 +++++++++++++++++++++
 include/linux/spinlock_types_raw.h |  3 +++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index aee4660..4fa91d5 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -572,9 +572,21 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define lock_acquire_shared(l, s, t, n, i)		lock_acquire(l, s, t, 1, 1, n, i)
 #define lock_acquire_shared_recursive(l, s, t, n, i)	lock_acquire(l, s, t, 2, 1, n, i)
 
-#define spin_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define spin_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define spin_release(l, i)			lock_release(l, i)
+#define spin_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i);	\
+} while (0)
+#define spin_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i); \
+} while (0)
+#define spin_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_spin_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwlock_acquire_read(l, s, t, i)					\
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 5c0c517..191fb99 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -95,6 +95,27 @@
 # include <linux/spinlock_up.h>
 #endif
 
+#ifdef CONFIG_DEPT
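+/*
+ * m: dept_map of the lock, ne: lockdep subclass passed to Dept as the
+ * nest level, t: trylock, n: dept_map of the nest lock if any, e_fn:
+ * name of the event function, ip: caller ip.
+ *
+ * A trylock never blocks, so only the ecxt is entered; a wait is
+ * added only for a plain lock acquisition.
+ */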
+#define dept_spin_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	} else if (n) {							\
+		dept_ecxt_enter_nokeep(m);				\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	}								\
+} while (0)
+#define dept_spin_unlock(m, ip)						\
+do {									\
+	dept_ecxt_exit(m, 1UL, ip);					\
+} while (0)
+#else
+#define dept_spin_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_spin_unlock(m, ip)			do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 				   struct lock_class_key *key, short inner);
diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
index 91cb36b..1a1da54 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -31,11 +31,13 @@
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_SPIN,	\
+		.dmap = DEPT_MAP_INITIALIZER(lockname)	\
 	}
 # define SPIN_DEP_MAP_INIT(lockname)			\
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_CONFIG,	\
+		.dmap = DEPT_MAP_INITIALIZER(lockname)	\
 	}
 
 # define LOCAL_SPIN_DEP_MAP_INIT(lockname)		\
@@ -43,6 +45,7 @@
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_CONFIG,	\
 		.lock_type = LD_LOCK_PERCPU,		\
+		.dmap = DEPT_MAP_INITIALIZER(lockname)	\
 	}
 #else
 # define RAW_SPIN_DEP_MAP_INIT(lockname)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 04/21] dept: Apply Dept to mutex families
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies on the mutex family of locks (mutex and rt_mutex).
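
As an illustration only (not part of the patch), a plain mutex_lock()
with subclass 0, no trylock and no nest lock roughly expands to the
sequence below; 'lock' and 'ip' (the caller's return address) are
placeholders:

    /* mutex_acquire(&lock->dep_map, 0, 0, ip) becomes: */
    lock_acquire_exclusive(&lock->dep_map, 0, 0, NULL, ip);

    /* t == 0 and n == NULL, so dept_mutex_lock() first records a
     * wait on the "mutex_unlock" event of this map ... */
    dept_wait(&lock->dep_map.dmap, 1UL, ip, __func__, 0);

    /* ... then opens the event context that mutex_release() closes */
    dept_ecxt_enter(&lock->dep_map.dmap, 1UL, ip, __func__,
                    "mutex_unlock", 0);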

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h | 18 +++++++++++++++---
 include/linux/mutex.h   | 22 ++++++++++++++++++++++
 include/linux/rtmutex.h |  1 +
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4fa91d5..99569acb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -603,9 +603,21 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define seqcount_release(l, i)			lock_release(l, i)
 
-#define mutex_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define mutex_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define mutex_release(l, i)			lock_release(l, i)
+#define mutex_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_mutex_lock(&(l)->dmap, s, t, NULL, "mutex_unlock", i);	\
+} while (0)
+#define mutex_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_mutex_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "mutex_unlock", i);\
+} while (0)
+#define mutex_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_mutex_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 8f226d4..b699cf41 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -25,6 +25,7 @@
 		, .dep_map = {					\
 			.name = #lockname,			\
 			.wait_type_inner = LD_WAIT_SLEEP,	\
+			.dmap = DEPT_MAP_INITIALIZER(lockname)	\
 		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
@@ -75,6 +76,27 @@ struct mutex {
 #endif
 };
 
+#ifdef CONFIG_DEPT
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	} else if (n) {							\
+		dept_ecxt_enter_nokeep(m);				\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	}								\
+} while (0)
+#define dept_mutex_unlock(m, ip)					\
+do {									\
+	dept_ecxt_exit(m, 1UL, ip);					\
+} while (0)
+#else
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_mutex_unlock(m, ip)		do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_MUTEXES
 
 #define __DEBUG_MUTEX_INITIALIZER(lockname)				\
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 7d04988..416064d 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -81,6 +81,7 @@ static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { }
 	.dep_map = {					\
 		.name = #mutexname,			\
 		.wait_type_inner = LD_WAIT_SLEEP,	\
+		.dmap = DEPT_MAP_INITIALIZER(mutexname)	\
 	}
 #else
 #define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 05/21] dept: Apply Dept to rwlock
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies on rwlock.
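
For illustration (not part of the patch), the wait/event pairs below
follow from the dept_rwlock_*() macros introduced here, where 'm'
stands for the rwlock's dept map:

    /* write_lock(): can be blocked by both readers and writers, so
     * it waits on R|W, then owns the W event that "write_unlock"
     * triggers. */
    dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, 0);
    dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__,
                    "write_unlock", 0);

    /* recursive read_lock() (q == 0): only a writer can block it,
     * so it waits on W alone, then owns the R event. */
    dept_wait(m, DEPT_EVT_RWLOCK_W, ip, __func__, 0);
    dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__,
                    "read_unlock", 0);

A non-recursive reader (q == 1) can also queue behind a waiting
writer, hence it waits on R|W just like a writer does.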

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h        | 25 ++++++++++++++++++++-----
 include/linux/rwlock.h         | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/rwlock_api_smp.h |  8 ++++----
 include/linux/rwlock_types.h   |  1 +
 4 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 99569acb..b59d8f3 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -588,16 +588,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_spin_unlock(&(l)->dmap, i);				\
 } while (0)
 
-#define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
+#define rwlock_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i);	\
+} while (0)
 #define rwlock_acquire_read(l, s, t, i)					\
 do {									\
-	if (read_lock_is_recursive())					\
+	if (read_lock_is_recursive()) {				\
 		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
-	else								\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0);\
+	} else {							\
 		lock_acquire_shared(l, s, t, NULL, i);			\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1);\
+	}								\
+} while (0)
+#define rwlock_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_wunlock(&(l)->dmap, i);				\
+} while (0)
+#define rwlock_release_read(l, i)					\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_runlock(&(l)->dmap, i);				\
 } while (0)
-
-#define rwlock_release(l, i)			lock_release(l, i)
 
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 8f416c5..bbab144 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -28,6 +28,48 @@
 	do { *(lock) = __RW_LOCK_UNLOCKED(lock); } while (0)
 #endif
 
+#ifdef CONFIG_DEPT
+#define DEPT_EVT_RWLOCK_R		1UL
+#define DEPT_EVT_RWLOCK_W		(1UL << 1)
+#define DEPT_EVT_RWLOCK_RW		(DEPT_EVT_RWLOCK_R | DEPT_EVT_RWLOCK_W)
+
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)			\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
+	} else if (n) {							\
+		dept_ecxt_enter_nokeep(m);				\
+	} else {							\
+		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne);	\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
+	}								\
+} while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)			\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
+	} else if (n) {							\
+		dept_ecxt_enter_nokeep(m);				\
+	} else {							\
+		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne);\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
+	}								\
+} while (0)
+#define dept_rwlock_wunlock(m, ip)					\
+do {									\
+	dept_ecxt_exit(m, DEPT_EVT_RWLOCK_W, ip);			\
+} while (0)
+#define dept_rwlock_runlock(m, ip)					\
+do {									\
+	dept_ecxt_exit(m, DEPT_EVT_RWLOCK_R, ip);			\
+} while (0)
+#else
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)	do { } while (0)
+#define dept_rwlock_wunlock(m, ip)			do { } while (0)
+#define dept_rwlock_runlock(m, ip)			do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void do_raw_read_lock(rwlock_t *lock) __acquires(lock);
  extern int do_raw_read_trylock(rwlock_t *lock);
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index dceb0a5..a222cf1 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -228,7 +228,7 @@ static inline void __raw_write_unlock(rwlock_t *lock)
 
 static inline void __raw_read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	preempt_enable();
 }
@@ -236,7 +236,7 @@ static inline void __raw_read_unlock(rwlock_t *lock)
 static inline void
 __raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	local_irq_restore(flags);
 	preempt_enable();
@@ -244,7 +244,7 @@ static inline void __raw_read_unlock(rwlock_t *lock)
 
 static inline void __raw_read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	local_irq_enable();
 	preempt_enable();
@@ -252,7 +252,7 @@ static inline void __raw_read_unlock_irq(rwlock_t *lock)
 
 static inline void __raw_read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 1948442..27b4b78 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -10,6 +10,7 @@
 	.dep_map = {							\
 		.name = #lockname,					\
 		.wait_type_inner = LD_WAIT_CONFIG,			\
+		.dmap = DEPT_MAP_INITIALIZER(lockname)			\
 	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 06/21] dept: Apply Dept to wait_for_completion()/complete()
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies through
wait_for_completion()/complete().
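
To illustrate (not part of the patch), the annotations fire at the
following points in a typical, hypothetical use:

    struct completion done;

    init_completion(&done);    /* dept_wfc_init() with a static key
                                  per call site */

    /* waiter: dept_wfc_wait() asks for the event and records the
     * wait before actually sleeping */
    wait_for_completion(&done);

    /* another context: dept_wfc_complete() triggers the event */
    complete(&done);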

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h | 44 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/completion.c  | 12 ++++++++++--
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 51d9ab0..358c656 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -26,14 +26,45 @@
 struct completion {
 	unsigned int done;
 	struct swait_queue_head wait;
+	struct dept_map dmap;
 };
 
+#ifdef CONFIG_DEPT
+#define dept_wfc_nocheck(m)			dept_map_nocheck(m)
+#define dept_wfc_init(m, k, s, n)		dept_map_init(m, k, s, n)
+#define dept_wfc_reinit(m)			dept_map_reinit(m)
+#define dept_wfc_wait(m, ip)						\
+do {									\
+	dept_ask_event(m);						\
+	dept_wait(m, 1UL, ip, __func__, 0);				\
+} while (0)
+#define dept_wfc_complete(m, ip)		dept_event(m, 1UL, ip, __func__)
+#define dept_wfc_enter(m, ip)			dept_ecxt_enter(m, 1UL, ip, "completion_context_enter", "complete", 0)
+#define dept_wfc_exit(m, ip)			dept_ecxt_exit(m, 1UL, ip)
+#else
+#define dept_wfc_nocheck(m)			do { } while (0)
+#define dept_wfc_init(m, k, s, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_wfc_reinit(m)			do { } while (0)
+#define dept_wfc_wait(m, ip)			do { } while (0)
+#define dept_wfc_complete(m, ip)		do { } while (0)
+#define dept_wfc_enter(m, ip)			do { } while (0)
+#define dept_wfc_exit(m, ip)			do { } while (0)
+#endif
+
+#define init_completion_nocheck(x) __init_completion(x, NULL, #x, false)
+#define init_completion(x)					\
+	do {							\
+		static struct dept_key __dkey;			\
+		__init_completion(x, &__dkey, #x, true);	\
+	} while (0)
+
 #define init_completion_map(x, m) init_completion(x)
 static inline void complete_acquire(struct completion *x) {}
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+	  .dmap = DEPT_MAP_INITIALIZER(work) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -81,9 +112,17 @@ static inline void complete_release(struct completion *x) {}
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x,
+				     struct dept_key *dkey,
+				     const char *name, bool check)
 {
 	x->done = 0;
+
+	if (check)
+		dept_wfc_init(&x->dmap, dkey, 0, name);
+	else
+		dept_wfc_nocheck(&x->dmap);
+
 	init_swait_queue_head(&x->wait);
 }
 
@@ -97,6 +136,7 @@ static inline void init_completion(struct completion *x)
 static inline void reinit_completion(struct completion *x)
 {
 	x->done = 0;
+	dept_wfc_reinit(&x->dmap);
 }
 
 extern void wait_for_completion(struct completion *);
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 35f15c2..4cf0cfe 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,6 +29,7 @@ void complete(struct completion *x)
 {
 	unsigned long flags;
 
+	dept_wfc_complete(&x->dmap, _RET_IP_);
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
@@ -58,6 +59,7 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
+	dept_wfc_complete(&x->dmap, _RET_IP_);
 	lockdep_assert_RT_in_threaded_ctx();
 
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
@@ -112,17 +114,23 @@ void complete_all(struct completion *x)
 }
 
 static long __sched
-wait_for_common(struct completion *x, long timeout, int state)
+_wait_for_common(struct completion *x, long timeout, int state)
 {
 	return __wait_for_common(x, schedule_timeout, timeout, state);
 }
 
 static long __sched
-wait_for_common_io(struct completion *x, long timeout, int state)
+_wait_for_common_io(struct completion *x, long timeout, int state)
 {
 	return __wait_for_common(x, io_schedule_timeout, timeout, state);
 }
 
+#define wait_for_common(x, t, s)					\
+({ dept_wfc_wait(&(x)->dmap, _RET_IP_); _wait_for_common(x, t, s); })
+
+#define wait_for_common_io(x, t, s)					\
+({ dept_wfc_wait(&(x)->dmap, _RET_IP_); _wait_for_common_io(x, t, s); })
+
 /**
  * wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 07/21] dept: Apply Dept to seqlock
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies on seqlock by adding a wait
annotation on the read side of seqlock.
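
Just to illustrate (not part of the patch), with a plain seqcount_t
the hooks land as follows in a typical reader/writer pair:

    /* writer: opens an event context that write_seqcount_end()
     * closes */
    write_seqcount_begin(&seq);             /* dept_seq_writebegin() */
    /* update the protected data */
    write_seqcount_end(&seq);               /* dept_seq_writeend() */

    /* reader: each pass of the retry loop waits on every event
     * class of the map (DEPT_EVT_ALL) */
    do {
        s = read_seqcount_begin(&seq);      /* dept_seq_wait() */
        /* read the protected data */
    } while (read_seqcount_retry(&seq, s));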

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/seqlock.h | 60 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 37ded6b..47c3379 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -23,6 +23,23 @@
 
 #include <asm/processor.h>
 
+#ifdef CONFIG_DEPT
+#define DEPT_EVT_ALL		((1UL << DEPT_MAX_SUBCLASSES_EVT) - 1)
+#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0)
+#define dept_seq_writebegin(m, ip)				\
+do {								\
+	dept_ecxt_enter(m, 1UL, ip, __func__, "write_seqcount_end", 0);\
+} while (0)
+#define dept_seq_writeend(m, ip)				\
+do {								\
+	dept_ecxt_exit(m, 1UL, ip);				\
+} while (0)
+#else
+#define dept_seq_wait(m, ip)		do { } while (0)
+#define dept_seq_writebegin(m, ip)	do { } while (0)
+#define dept_seq_writeend(m, ip)	do { } while (0)
+#endif
+
 /*
  * The seqlock seqcount_t interface does not prescribe a precise sequence of
  * read begin/retry/end. For readers, typically there is a call to
@@ -82,7 +99,8 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 # define SEQCOUNT_DEP_MAP_INIT(lockname)				\
-		.dep_map = { .name = #lockname }
+		.dep_map = { .name = #lockname, \
+			     .dmap = DEPT_MAP_INITIALIZER(lockname) }
 
 /**
  * seqcount_init() - runtime initializer for seqcount_t
@@ -148,7 +166,7 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * This lock-unlock technique must be implemented for all of PREEMPT_RT
  * sleeping locks.  See Documentation/locking/locktypes.rst
  */
-#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEPT) || defined(CONFIG_PREEMPT_RT)
 #define __SEQ_LOCK(expr)	expr
 #else
 #define __SEQ_LOCK(expr)
@@ -203,6 +221,22 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 	__SEQ_LOCK(locktype	*lock);					\
 } seqcount_##lockname##_t;						\
 									\
+static __always_inline void						\
+__seqprop_##lockname##_wait(const seqcount_##lockname##_t *s)		\
+{									\
+	__SEQ_LOCK(dept_seq_wait(&(lockmember)->dep_map.dmap, _RET_IP_));\
+}									\
+									\
+static __always_inline void						\
+__seqprop_##lockname##_writebegin(const seqcount_##lockname##_t *s)	\
+{									\
+}									\
+									\
+static __always_inline void						\
+__seqprop_##lockname##_writeend(const seqcount_##lockname##_t *s)	\
+{									\
+}									\
+									\
 static __always_inline seqcount_t *					\
 __seqprop_##lockname##_ptr(seqcount_##lockname##_t *s)			\
 {									\
@@ -271,6 +305,21 @@ static inline void __seqprop_assert(const seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
+static inline void __seqprop_wait(seqcount_t *s)
+{
+	dept_seq_wait(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writebegin(seqcount_t *s)
+{
+	dept_seq_writebegin(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writeend(seqcount_t *s)
+{
+	dept_seq_writeend(&s->dep_map.dmap, _RET_IP_);
+}
+
 #define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
 
 SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
@@ -311,6 +360,9 @@ static inline void __seqprop_assert(const seqcount_t *s)
 #define seqprop_sequence(s)		__seqprop(s, sequence)
 #define seqprop_preemptible(s)		__seqprop(s, preemptible)
 #define seqprop_assert(s)		__seqprop(s, assert)
+#define seqprop_dept_wait(s)		__seqprop(s, wait)
+#define seqprop_dept_writebegin(s)	__seqprop(s, writebegin)
+#define seqprop_dept_writeend(s)	__seqprop(s, writeend)
 
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -360,6 +412,7 @@ static inline void __seqprop_assert(const seqcount_t *s)
 #define read_seqcount_begin(s)						\
 ({									\
 	seqcount_lockdep_reader_access(seqprop_ptr(s));			\
+	seqprop_dept_wait(s);						\
 	raw_read_seqcount_begin(s);					\
 })
 
@@ -512,6 +565,7 @@ static inline void do_raw_write_seqcount_end(seqcount_t *s)
 		preempt_disable();					\
 									\
 	do_write_seqcount_begin_nested(seqprop_ptr(s), subclass);	\
+	seqprop_dept_writebegin(s);					\
 } while (0)
 
 static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
@@ -538,6 +592,7 @@ static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 		preempt_disable();					\
 									\
 	do_write_seqcount_begin(seqprop_ptr(s));			\
+	seqprop_dept_writebegin(s);					\
 } while (0)
 
 static inline void do_write_seqcount_begin(seqcount_t *s)
@@ -554,6 +609,7 @@ static inline void do_write_seqcount_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
+	seqprop_dept_writeend(s);					\
 	do_write_seqcount_end(seqprop_ptr(s));				\
 									\
 	if (seqprop_preemptible(s))					\
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 08/21] dept: Apply Dept to rwsem
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make Dept able to track dependencies on rwsem.
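
For illustration only (not part of the patch), down_write() and
down_read() roughly map to the sequences below. Note that, unlike
the rwlock patch, readers and writers here share the single event
class 1UL and differ only in the event function name:

    /* down_write(&sem): */
    dept_wait(&sem.dep_map.dmap, 1UL, ip, __func__, 0);
    dept_ecxt_enter(&sem.dep_map.dmap, 1UL, ip, __func__,
                    "up_write", 0);

    /* down_read(&sem): */
    dept_wait(&sem.dep_map.dmap, 1UL, ip, __func__, 0);
    dept_ecxt_enter(&sem.dep_map.dmap, 1UL, ip, __func__,
                    "up_read", 0);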

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h      | 24 ++++++++++++++++++++----
 include/linux/percpu-rwsem.h |  4 +++-
 include/linux/rwsem.h        | 22 ++++++++++++++++++++++
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b59d8f3..b0e097f 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -634,10 +634,26 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_mutex_unlock(&(l)->dmap, i);				\
 } while (0)
 
-#define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i)		lock_acquire_shared(l, s, t, NULL, i)
-#define rwsem_release(l, i)			lock_release(l, i)
+#define rwsem_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_write", i);		\
+} while (0)
+#define rwsem_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_rwsem_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "up_write", i);\
+} while (0)
+#define rwsem_acquire_read(l, s, t, i)					\
+do {									\
+	lock_acquire_shared(l, s, t, NULL, i);				\
+	dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_read", i);		\
+} while (0)
+#define rwsem_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwsem_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define lock_map_acquire(l)			lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_read(l)		lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5fda40f..9a0603d 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -21,7 +21,9 @@ struct percpu_rw_semaphore {
 };
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname },
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)	.dep_map = {	\
+	.name = #lockname,					\
+	.dmap = DEPT_MAP_INITIALIZER(lockname) },
 #else
 #define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
 #endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index efa5c32..ed4c34e 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -21,6 +21,7 @@
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_SLEEP,	\
+		.dmap = DEPT_MAP_INITIALIZER(lockname)	\
 	},
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
@@ -32,6 +33,27 @@
 #include <linux/osq_lock.h>
 #endif
 
+#ifdef CONFIG_DEPT
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	} else if (n) {							\
+		dept_ecxt_enter_nokeep(m);				\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+	}								\
+} while (0)
+#define dept_rwsem_unlock(m, ip)					\
+do {									\
+	dept_ecxt_exit(m, 1UL, ip);					\
+} while (0)
+#else
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_rwsem_unlock(m, ip)		do { } while (0)
+#endif
+
 /*
  * For an uncontended rwsem, count and owner are the only fields a task
  * needs to touch when acquiring the rwsem. So they are put next to each
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 09/21] dept: Add proc knobs to show stats and dependency graph
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

It'd be useful to show Dept's internal stats and dependency graph at
runtime via procfs for better observability. Introduce the knobs.
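
For illustration only, output shaped by the seq_printf() formats
below might look like this; the class names and the pool name are
hypothetical:

    $ cat /proc/dept_deps
    All classes:

    [00000000a1b2c3d4] some_lock
     -> [00000000e5f6a7b8] other_lock

    $ cat /proc/dept_stats
    Availability in the static pools:

    dep	1000/1024(97%)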

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/dependency/Makefile        |  1 +
 kernel/dependency/dept.c          | 24 ++++------
 kernel/dependency/dept_internal.h | 26 +++++++++++
 kernel/dependency/dept_proc.c     | 92 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 15 deletions(-)
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_proc.c

diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index b5cfb8a..92f1654 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_DEPT) += dept.o
+obj-$(CONFIG_DEPT) += dept_proc.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 1e90284..4670eec 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -73,6 +73,7 @@
 #include <linux/hash.h>
 #include <linux/dept.h>
 #include <linux/utsname.h>
+#include "dept_internal.h"
 
 static int dept_stop;
 static int dept_per_cpu_ready;
@@ -235,20 +236,13 @@ static inline struct dept_task *dept_task(void)
  *       have been freed will be placed.
  */
 
-enum object_t {
-#define OBJECT(id, nr) OBJECT_##id,
-	#include "dept_object.h"
-#undef  OBJECT
-	OBJECT_NR,
-};
-
 #define OBJECT(id, nr)							\
 static struct dept_##id spool_##id[nr];					\
 static DEFINE_PER_CPU(struct llist_head, lpool_##id);
 	#include "dept_object.h"
 #undef  OBJECT
 
-static struct dept_pool pool[OBJECT_NR] = {
+struct dept_pool dept_pool[OBJECT_NR] = {
 #define OBJECT(id, nr) {						\
 	.name = #id,							\
 	.obj_sz = sizeof(struct dept_##id),				\
@@ -278,7 +272,7 @@ static void *from_pool(enum object_t t)
 	if (DEPT_WARN_ON(!irqs_disabled()))
 		return NULL;
 
-	p = &pool[t];
+	p = &dept_pool[t];
 
 	/*
 	 * Try local pool first.
@@ -308,7 +302,7 @@ static void *from_pool(enum object_t t)
 
 static void to_pool(void *o, enum object_t t)
 {
-	struct dept_pool *p = &pool[t];
+	struct dept_pool *p = &dept_pool[t];
 	struct llist_head *h;
 
 	preempt_disable();
@@ -1960,7 +1954,7 @@ void dept_map_nocheck(struct dept_map *m)
 }
 EXPORT_SYMBOL_GPL(dept_map_nocheck);
 
-static LIST_HEAD(classes);
+LIST_HEAD(dept_classes);
 
 static inline bool within(const void *addr, void *start, unsigned long size)
 {
@@ -1992,7 +1986,7 @@ void dept_free_range(void *start, unsigned int sz)
 	while (unlikely(!dept_lock()))
 		cpu_relax();
 
-	list_for_each_entry_safe(c, n, &classes, all_node) {
+	list_for_each_entry_safe(c, n, &dept_classes, all_node) {
 		if (!within((void *)c->key, start, sz) &&
 		    !within(c->name, start, sz))
 			continue;
@@ -2061,7 +2055,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
 	c->sub = sub;
 	c->key = (unsigned long)(k->subkeys + sub);
 	hash_add_class(c);
-	list_add(&c->all_node, &classes);
+	list_add(&c->all_node, &dept_classes);
 unlock:
 	dept_unlock();
 caching:
@@ -2585,8 +2579,8 @@ static void migrate_per_cpu_pool(void)
 		struct llist_head *from;
 		struct llist_head *to;
 
-		from = &pool[i].boot_pool;
-		to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+		from = &dept_pool[i].boot_pool;
+		to = per_cpu_ptr(dept_pool[i].lpool, boot_cpu);
 		move_llist(to, from);
 	}
 }
diff --git a/kernel/dependency/dept_internal.h b/kernel/dependency/dept_internal.h
new file mode 100644
index 00000000..007c1ee
--- /dev/null
+++ b/kernel/dependency/dept_internal.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept(DEPendency Tracker) - runtime dependency tracker internal header
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __DEPT_INTERNAL_H
+#define __DEPT_INTERNAL_H
+
+#ifdef CONFIG_DEPT
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+	#include "dept_object.h"
+#undef  OBJECT
+	OBJECT_NR,
+};
+
+extern struct list_head dept_classes;
+extern struct dept_pool dept_pool[];
+
+#endif
+#endif /* __DEPT_INTERNAL_H */
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
new file mode 100644
index 00000000..c069354
--- /dev/null
+++ b/kernel/dependency/dept_proc.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Procfs knobs for Dept(DEPendency Tracker)
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (C) 2021 LG Electronics, Inc., Byungchul Park
+ */
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/dept.h>
+#include "dept_internal.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	return seq_list_next(v, &dept_classes, pos);
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	return seq_list_start_head(&dept_classes, *pos);
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+	struct dept_class *fc = list_entry(v, struct dept_class, all_node);
+	struct dept_dep *d;
+
+	if (v == &dept_classes) {
+		seq_puts(m, "All classes:\n\n");
+		return 0;
+	}
+
+	seq_printf(m, "[%p] %s\n", (void *)fc->key, fc->name);
+
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	list_for_each_entry(d, &fc->dep_head, dep_node) {
+		struct dept_class *tc = d->wait->class;
+
+		seq_printf(m, " -> [%p] %s\n", (void *)tc->key, tc->name);
+	}
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
+static const struct seq_operations dept_deps_ops = {
+	.start	= l_start,
+	.next	= l_next,
+	.stop	= l_stop,
+	.show	= l_show,
+};
+
+static int dept_stats_show(struct seq_file *m, void *v)
+{
+	int r;
+
+	seq_puts(m, "Availability in the static pools:\n\n");
+#define OBJECT(id, nr)							\
+	r = atomic_read(&dept_pool[OBJECT_##id].obj_nr);		\
+	if (r < 0)							\
+		r = 0;							\
+	seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+	#include "dept_object.h"
+#undef  OBJECT
+
+	return 0;
+}
+
+static int __init dept_proc_init(void)
+{
+	proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
+	proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
+	return 0;
+}
+
+__initcall(dept_proc_init);
-- 
1.9.1
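
With dept_proc_init() registered as an initcall above, both knobs read
like any other procfs file; each pool line from dept_stats_show()
follows the "%s\t%d/%d(%d%%)" format, e.g. "class	8192/8192(100%)"
given OBJECT(class, 1024 * 8). A minimal userspace reader as a sketch
(a standalone tool, not part of the patch):

	#include <stdio.h>

	int main(void)
	{
		char buf[256];
		FILE *f = fopen("/proc/dept_stats", "r");

		if (!f)
			return 1;
		while (fgets(buf, sizeof(buf), f))
			fputs(buf, stdout);
		fclose(f);
		return 0;
	}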



* [PATCH RFC v6 10/21] dept: Introduce split map concept and new APIs for them
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

There are cases where the total number of maps for a wait/event type
gets very large. For instance, struct page with PG_locked and
PG_writeback is such a case. The additional memory would amount to
'the # of pages * sizeof(struct dept_map)' if each struct page kept
its own map all the way, which might be too big to accept on some
systems.

It'd be better to have a split map: one part per instance and the
other for what is commonly shared across the instances. So split the
map and added new APIs for it, used as sketched below.
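
A minimal usage sketch: a caller embeds a small dept_map_each in every
instance and shares a single dept_map_common. Everything here except
the dept_*_split_map() APIs declared below (struct my_obj and the
my_obj_*() helpers) is hypothetical:

	#include <linux/dept.h>

	struct my_obj {
		struct dept_map_each dmap;	/* tiny, one per instance */
		/* ... */
	};

	static struct dept_map_common my_obj_mc;	/* one shared copy */

	void my_obj_subsys_init(void)
	{
		dept_split_map_common_init(&my_obj_mc, NULL, "my_obj");
	}

	void my_obj_init(struct my_obj *o)
	{
		dept_split_map_each_init(&o->dmap);
	}

	void my_obj_wait(struct my_obj *o)
	{
		dept_wait_split_map(&o->dmap, &my_obj_mc, _RET_IP_,
				    __func__, 0);
		/* ... actually wait for the event ... */
	}

	void my_obj_acquire(struct my_obj *o)
	{
		/* from here on, a later event is anticipated */
		dept_ask_event_split_map(&o->dmap, &my_obj_mc);
	}

	void my_obj_wake(struct my_obj *o)
	{
		dept_event_split_map(&o->dmap, &my_obj_mc, _RET_IP_,
				     __func__);
		/* ... actually wake the waiter ... */
	}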

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h     |  78 ++++++++++++++++++-------
 kernel/dependency/dept.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 203 insertions(+), 21 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index c498060..9698134 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -367,6 +367,30 @@ struct dept_map {
 	.nocheck = false,						\
 }
 
+struct dept_map_each {
+	/*
+	 * wait timestamp associated with this map
+	 */
+	unsigned int wgen;
+};
+
+struct dept_map_common {
+	const char *name;
+	struct dept_key *keys;
+	int sub_usr;
+
+	/*
+	 * It's a local copy for fast access to the associated classes.
+	 * Also used as the dept_key instance for a statically defined map.
+	 */
+	struct dept_key keys_local;
+
+	/*
+	 * whether this map should be checked or not
+	 */
+	bool nocheck;
+};
+
 struct dept_task {
 	/*
 	 * all event contexts that have entered and before exiting
@@ -468,6 +492,11 @@ struct dept_task {
 extern void dept_ask_event(struct dept_map *m);
 extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
 extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
+extern void dept_split_map_each_init(struct dept_map_each *me);
+extern void dept_split_map_common_init(struct dept_map_common *mc, struct dept_key *k, const char *n);
+extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
+extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
+extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
 
 static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 {
@@ -490,32 +519,39 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #else /* !CONFIG_DEPT */
 struct dept_key  { };
 struct dept_map  { };
+struct dept_map_each    { };
+struct dept_map_common  { };
 struct dept_task { };
 
 #define DEPT_MAP_INITIALIZER(n) { }
 #define DEPT_TASK_INITIALIZER(t) { }
 
-#define dept_on()				do { } while (0)
-#define dept_off()				do { } while (0)
-#define dept_init()				do { } while (0)
-#define dept_task_init(t)			do { } while (0)
-#define dept_task_exit(t)			do { } while (0)
-#define dept_free_range(s, sz)			do { } while (0)
-#define dept_map_init(m, k, s, n)		do { (void)(n); (void)(k); } while (0)
-#define dept_map_reinit(m)			do { } while (0)
-#define dept_map_nocheck(m)			do { } while (0)
-
-#define dept_wait(m, w_f, ip, w_fn, ne)		do { (void)(w_fn); } while (0)
-#define dept_stage_wait(m, w_f, w_fn, ne)	do { (void)(w_fn); } while (0)
-#define dept_ask_event_wait_commit(ip)		do { } while (0)
-#define dept_clean_stage()			do { } while (0)
-#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, ne) do { (void)(c_fn); (void)(e_fn); } while (0)
-#define dept_ask_event(m)			do { } while (0)
-#define dept_event(m, e_f, ip, e_fn)		do { (void)(e_fn); } while (0)
-#define dept_ecxt_exit(m, e_f, ip)		do { } while (0)
-#define dept_ecxt_enter_nokeep(m)		do { } while (0)
-#define dept_key_init(k)			do { (void)(k); } while (0)
-#define dept_key_destroy(k)			do { (void)(k); } while (0)
+#define dept_on()					do { } while (0)
+#define dept_off()					do { } while (0)
+#define dept_init()					do { } while (0)
+#define dept_task_init(t)				do { } while (0)
+#define dept_task_exit(t)				do { } while (0)
+#define dept_free_range(s, sz)				do { } while (0)
+#define dept_map_init(m, k, s, n)			do { (void)(n); (void)(k); } while (0)
+#define dept_map_reinit(m)				do { } while (0)
+#define dept_map_nocheck(m)				do { } while (0)
+
+#define dept_wait(m, w_f, ip, w_fn, ne)			do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, w_f, w_fn, ne)		do { (void)(w_fn); } while (0)
+#define dept_ask_event_wait_commit(ip)			do { } while (0)
+#define dept_clean_stage()				do { } while (0)
+#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, ne)	do { (void)(c_fn); (void)(e_fn); } while (0)
+#define dept_ask_event(m)				do { } while (0)
+#define dept_event(m, e_f, ip, e_fn)			do { (void)(e_fn); } while (0)
+#define dept_ecxt_exit(m, e_f, ip)			do { } while (0)
+#define dept_split_map_each_init(me)			do { } while (0)
+#define dept_split_map_common_init(mc, k, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
+#define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
+#define dept_ask_event_split_map(me, mc)		do { } while (0)
+#define dept_ecxt_enter_nokeep(m)			do { } while (0)
+#define dept_key_init(k)				do { (void)(k); } while (0)
+#define dept_key_destroy(k)				do { (void)(k); } while (0)
 
 #define dept_softirq_enter()				do { } while (0)
 #define dept_hardirq_enter()				do { } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 4670eec..a0413f1 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2427,6 +2427,152 @@ void dept_ecxt_exit(struct dept_map *m, unsigned long e_f,
 }
 EXPORT_SYMBOL_GPL(dept_ecxt_exit);
 
+void dept_split_map_each_init(struct dept_map_each *me)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	me->wgen = 0U;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_split_map_each_init);
+
+void dept_split_map_common_init(struct dept_map_common *mc,
+				struct dept_key *k, const char *n)
+{
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	clean_classes_cache(&mc->keys_local);
+
+	/*
+	 * sub_usr is not used with split map.
+	 */
+	mc->sub_usr = 0;
+	mc->keys = k;
+	mc->name = n;
+	mc->nocheck = false;
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_split_map_common_init);
+
+void dept_wait_split_map(struct dept_map_each *me,
+			 struct dept_map_common *mc,
+			 unsigned long ip, const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c;
+	struct dept_key *k;
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive)
+		return;
+
+	if (mc->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	k = mc->keys ?: &mc->keys_local;
+	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+	if (c)
+		add_wait(c, ip, w_fn, ne);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait_split_map);
+
+void dept_ask_event_split_map(struct dept_map_each *me,
+			      struct dept_map_common *mc)
+{
+	unsigned int wg;
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (mc->nocheck)
+		return;
+
+	/*
+	 * Allow recursive entrance.
+	 */
+	flags = dept_enter_recursive();
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(me->wgen, wg);
+
+	dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ask_event_split_map);
+
+void dept_event_split_map(struct dept_map_each *me,
+			  struct dept_map_common *mc,
+			  unsigned long ip, const char *e_fn)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c;
+	struct dept_key *k;
+	unsigned long flags;
+
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		/*
+		 * Dept cannot work with this map here even though an
+		 * event has just been triggered. Don't let it confuse
+		 * the handling of the next event. Disable the map
+		 * until the next real case.
+		 */
+		WRITE_ONCE(me->wgen, 0U);
+		return;
+	}
+
+	if (mc->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	k = mc->keys ?: &mc->keys_local;
+	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+
+	if (c && add_ecxt((void *)me, c, 0UL, NULL, e_fn, 0)) {
+		do_event((void *)me, c, READ_ONCE(me->wgen), ip);
+		pop_ecxt((void *)me, c);
+	}
+
+	/*
+	 * Keep the map disabled until the next sleep.
+	 */
+	WRITE_ONCE(me->wgen, 0U);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event_split_map);
+
 void dept_task_exit(struct task_struct *t)
 {
 	struct dept_task *dt = &t->dept_task;
-- 
1.9.1



* [PATCH RFC v6 11/21] dept: Apply Dept to wait/event of PG_{locked,writeback}
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Makes Dept able to track dependencies by PG_{locked,writeback}. For
instance, (un)lock_page() generates that type of dependency.
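
To make the wiring concrete, the hooks fire on the usual page lock flow
roughly as below (a sketch; lock_page()/unlock_page() are the existing
kernel APIs, example() is hypothetical, and the comments name the hooks
added by the diff that follows):

	#include <linux/pagemap.h>

	static void example(struct page *page)
	{
		/*
		 * contended: folio_wait_bit_common() -> dept_pglocked_wait()
		 * acquired:  dept_pglocked_set_bit()
		 */
		lock_page(page);
		/* ... work under PG_locked ... */
		unlock_page(page);	/* folio_wake_bit() -> dept_pglocked_event() */
	}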

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept_page.h       | 78 +++++++++++++++++++++++++++++++++++++++++
 include/linux/page-flags.h      | 45 ++++++++++++++++++++++--
 include/linux/pagemap.h         |  7 +++-
 init/main.c                     |  2 ++
 kernel/dependency/dept_object.h |  2 +-
 lib/Kconfig.debug               |  1 +
 mm/filemap.c                    | 68 +++++++++++++++++++++++++++++++++++
 mm/page_ext.c                   |  5 +++
 8 files changed, 204 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/dept_page.h

diff --git a/include/linux/dept_page.h b/include/linux/dept_page.h
new file mode 100644
index 00000000..d2d093d
--- /dev/null
+++ b/include/linux/dept_page.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_DEPT_PAGE_H
+#define __LINUX_DEPT_PAGE_H
+
+#ifdef CONFIG_DEPT
+#include <linux/dept.h>
+
+extern struct page_ext_operations dept_pglocked_ops;
+extern struct page_ext_operations dept_pgwriteback_ops;
+extern struct dept_map_common pglocked_mc;
+extern struct dept_map_common pgwriteback_mc;
+
+extern void dept_page_init(void);
+extern struct dept_map_each *get_pglocked_me(struct page *page);
+extern struct dept_map_each *get_pgwriteback_me(struct page *page);
+
+#define dept_pglocked_wait(f)					\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_wait_split_map(me, &pglocked_mc, _RET_IP_, \
+				    __func__, 0);		\
+} while (0)
+
+#define dept_pglocked_set_bit(f)				\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_ask_event_split_map(me, &pglocked_mc);	\
+} while (0)
+
+#define dept_pglocked_event(f)					\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_event_split_map(me, &pglocked_mc, _RET_IP_,\
+				     __func__);			\
+} while (0)
+
+#define dept_pgwriteback_wait(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_wait_split_map(me, &pgwriteback_mc, _RET_IP_,\
+				    __func__, 0);		\
+} while (0)
+
+#define dept_pgwriteback_set_bit(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_ask_event_split_map(me, &pgwriteback_mc);\
+} while (0)
+
+#define dept_pgwriteback_event(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_event_split_map(me, &pgwriteback_mc, _RET_IP_,\
+				     __func__);			\
+} while (0)
+#else
+#define dept_page_init()		do { } while (0)
+#define dept_pglocked_wait(f)		do { } while (0)
+#define dept_pglocked_set_bit(f)	do { } while (0)
+#define dept_pglocked_event(f)		do { } while (0)
+#define dept_pgwriteback_wait(f)	do { } while (0)
+#define dept_pgwriteback_set_bit(f)	do { } while (0)
+#define dept_pgwriteback_event(f)	do { } while (0)
+#endif
+
+#endif /* __LINUX_DEPT_PAGE_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9d8eeaa..9fd9e39 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -480,7 +480,6 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
 #define TESTSCFLAG_FALSE(uname, lname)					\
 	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
@@ -528,7 +527,6 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
  * risky: they bypass page accounting.
  */
 TESTPAGEFLAG(Writeback, writeback, PF_NO_TAIL)
-	TESTSCFLAG(Writeback, writeback, PF_NO_TAIL)
 PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
@@ -611,6 +609,49 @@ static __always_inline bool PageSwapCache(struct page *page)
 PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
 #endif
 
+#ifdef CONFIG_DEPT
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+__CLEARPAGEFLAG(Locked, locked, PF_NO_TAIL)
+TESTCLEARFLAG(Writeback, writeback, PF_NO_TAIL)
+
+#include <linux/dept_page.h>
+
+static __always_inline
+void __folio_set_locked(struct folio *folio)
+{
+	dept_pglocked_set_bit(folio);
+	__set_bit(PG_locked, folio_flags(folio, FOLIO_PF_NO_TAIL));
+}
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+	dept_pglocked_set_bit(page_folio(page));
+	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+}
+
+static __always_inline
+bool folio_test_set_writeback(struct folio *folio)
+{
+	bool ret = test_and_set_bit(PG_writeback, folio_flags(folio, FOLIO_PF_NO_TAIL));
+
+	if (!ret)
+		dept_pgwriteback_set_bit(folio);
+	return ret;
+}
+
+static __always_inline int TestSetPageWriteback(struct page *page)
+{
+	int ret = test_and_set_bit(PG_writeback, &PF_NO_TAIL(page, 1)->flags);
+
+	if (!ret)
+		dept_pgwriteback_set_bit(page_folio(page));
+	return ret;
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+TESTSCFLAG(Writeback, writeback, PF_NO_TAIL)
+#endif
+
 /*
  * PageReported() is used to track reported free pages within the Buddy
  * allocator. We can use the non-atomic version of the test and set
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 993994c..49b211c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -15,6 +15,7 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#include <linux/dept_page.h>
 
 struct folio_batch;
 
@@ -890,7 +891,11 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 
 static inline bool folio_trylock(struct folio *folio)
 {
-	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
+	int ret = test_and_set_bit_lock(PG_locked, folio_flags(folio, 0));
+
+	if (likely(!ret))
+		dept_pglocked_set_bit(folio);
+	return likely(!ret);
 }
 
 /*
diff --git a/init/main.c b/init/main.c
index deabdd5..7d3b905 100644
--- a/init/main.c
+++ b/init/main.c
@@ -101,6 +101,7 @@
 #include <linux/init_syscalls.h>
 #include <linux/stackdepot.h>
 #include <net/net_namespace.h>
+#include <linux/pagemap.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -1073,6 +1074,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 
 	lockdep_init();
 	dept_init();
+	dept_page_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
index 0b7eb16..75b4212 100644
--- a/kernel/dependency/dept_object.h
+++ b/kernel/dependency/dept_object.h
@@ -6,7 +6,7 @@
  * nr: # of the object that should be kept in the pool.
  */
 
-OBJECT(dep, 1024 * 8)
+OBJECT(dep, 1024 * 16)
 OBJECT(class, 1024 * 8)
 OBJECT(stack, 1024 * 32)
 OBJECT(ecxt, 1024 * 16)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3c17507..6bb1bf2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1270,6 +1270,7 @@ config DEPT
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PAGE_EXTENSION
 	select TRACE_IRQFLAGS
 	select STACKTRACE
 	select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
diff --git a/mm/filemap.c b/mm/filemap.c
index 9a1eef6..eb20de95 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1157,6 +1157,11 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
+	if (bit_nr == PG_locked)
+		dept_pglocked_event(folio);
+	else if (bit_nr == PG_writeback)
+		dept_pgwriteback_event(folio);
+
 	key.folio = folio;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
@@ -1229,6 +1234,10 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
 	if (wait->flags & WQ_FLAG_EXCLUSIVE) {
 		if (test_and_set_bit(bit_nr, &folio->flags))
 			return false;
+		else if (bit_nr == PG_locked)
+			dept_pglocked_set_bit(folio);
+		else if (bit_nr == PG_writeback)
+			dept_pgwriteback_set_bit(folio);
 	} else if (test_bit(bit_nr, &folio->flags))
 		return false;
 
@@ -1250,6 +1259,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	bool delayacct = false;
 	unsigned long pflags;
 
+	if (bit_nr == PG_locked)
+		dept_pglocked_wait(folio);
+	else if (bit_nr == PG_writeback)
+		dept_pgwriteback_wait(folio);
+
 	if (bit_nr == PG_locked &&
 	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
 		if (!folio_test_swapbacked(folio)) {
@@ -1342,6 +1356,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0))))
 			goto repeat;
 
+		if (bit_nr == PG_locked)
+			dept_pglocked_set_bit(folio);
+		else if (bit_nr == PG_writeback)
+			dept_pgwriteback_set_bit(folio);
+
 		wait->flags |= WQ_FLAG_DONE;
 		break;
 	}
@@ -3983,3 +4002,52 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 	return try_to_free_buffers(&folio->page);
 }
 EXPORT_SYMBOL(filemap_release_folio);
+
+#ifdef CONFIG_DEPT
+static bool need_dept_pglocked(void)
+{
+	return true;
+}
+
+struct page_ext_operations dept_pglocked_ops = {
+	.size = sizeof(struct dept_map_each),
+	.need = need_dept_pglocked,
+};
+
+struct dept_map_each *get_pglocked_me(struct page *p)
+{
+	struct page_ext *e = lookup_page_ext(p);
+
+	return e ? (void *)e + dept_pglocked_ops.offset : NULL;
+}
+EXPORT_SYMBOL(get_pglocked_me);
+
+static bool need_dept_pgwriteback(void)
+{
+	return true;
+}
+
+struct page_ext_operations dept_pgwriteback_ops = {
+	.size = sizeof(struct dept_map_each),
+	.need = need_dept_pgwriteback,
+};
+
+struct dept_map_each *get_pgwriteback_me(struct page *p)
+{
+	struct page_ext *e = lookup_page_ext(p);
+
+	return e ? (void *)e + dept_pgwriteback_ops.offset : NULL;
+}
+EXPORT_SYMBOL(get_pgwriteback_me);
+
+struct dept_map_common pglocked_mc;
+EXPORT_SYMBOL(pglocked_mc);
+struct dept_map_common pgwriteback_mc;
+EXPORT_SYMBOL(pgwriteback_mc);
+
+void dept_page_init(void)
+{
+	dept_split_map_common_init(&pglocked_mc, NULL, "pglocked");
+	dept_split_map_common_init(&pgwriteback_mc, NULL, "pgwriteback");
+}
+#endif
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 2e66d93..b7f5b0d 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -9,6 +9,7 @@
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
 #include <linux/page_table_check.h>
+#include <linux/dept_page.h>
 
 /*
  * struct page extension
@@ -79,6 +80,10 @@ static bool need_page_idle(void)
 #ifdef CONFIG_PAGE_TABLE_CHECK
 	&page_table_check_ops,
 #endif
+#ifdef CONFIG_DEPT
+	&dept_pglocked_ops,
+	&dept_pgwriteback_ops,
+#endif
 };
 
 unsigned long page_ext_size = sizeof(struct page_ext);
-- 
1.9.1



* [PATCH RFC v6 12/21] dept: Apply SDT to swait
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Makes SDT able to track dependencies by swait.
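
For reference, below is a classic swait pattern that becomes visible to
Dept with this change. All APIs are the existing swait ones; the
comments name the SDT hooks added by this patch. Note the wait-side
hook only fires for sleepable states, i.e. when the state intersects
TASK_NORMAL:

	#include <linux/swait.h>

	static DECLARE_SWAIT_QUEUE_HEAD(my_wq);	/* .dmap set by the initializer */

	static void waiter(void)
	{
		DECLARE_SWAITQUEUE(wait);

		/* records the wait: sdt_wait_prepare(&my_wq.dmap) */
		prepare_to_swait_exclusive(&my_wq, &wait, TASK_INTERRUPTIBLE);
		schedule();
		finish_swait(&my_wq, &wait);	/* sdt_wait_finish() */
	}

	static void waker(void)
	{
		swake_up_one(&my_wq);	/* swake_up_locked() -> sdt_event(&my_wq.dmap) */
	}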

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/swait.h |  4 ++++
 kernel/sched/swait.c  | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b..8080b29 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,7 @@
 #include <linux/stddef.h>
 #include <linux/spinlock.h>
 #include <linux/wait.h>
+#include <linux/dept_sdt.h>
 #include <asm/current.h>
 
 /*
@@ -43,6 +44,7 @@
 struct swait_queue_head {
 	raw_spinlock_t		lock;
 	struct list_head	task_list;
+	struct dept_map		dmap;
 };
 
 struct swait_queue {
@@ -61,6 +63,7 @@ struct swait_queue {
 #define __SWAIT_QUEUE_HEAD_INITIALIZER(name) {				\
 	.lock		= __RAW_SPIN_LOCK_UNLOCKED(name.lock),		\
 	.task_list	= LIST_HEAD_INIT((name).task_list),		\
+	.dmap		= DEPT_MAP_INITIALIZER(name),			\
 }
 
 #define DECLARE_SWAIT_QUEUE_HEAD(name)					\
@@ -72,6 +75,7 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
 #define init_swait_queue_head(q)				\
 	do {							\
 		static struct lock_class_key __key;		\
+		sdt_map_init(&(q)->dmap);			\
 		__init_swait_queue_head((q), #q, &__key);	\
 	} while (0)
 
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index 76b9b79..4f713c0 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -26,6 +26,7 @@ void swake_up_locked(struct swait_queue_head *q)
 		return;
 
 	curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
+	sdt_event(&q->dmap);
 	wake_up_process(curr->task);
 	list_del_init(&curr->task_list);
 }
@@ -68,6 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
 	while (!list_empty(&tmp)) {
 		curr = list_first_entry(&tmp, typeof(*curr), task_list);
 
+		sdt_event(&q->dmap);
 		wake_up_state(curr->task, TASK_NORMAL);
 		list_del_init(&curr->task_list);
 
@@ -96,6 +98,9 @@ void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *
 	__prepare_to_swait(q, wait);
 	set_current_state(state);
 	raw_spin_unlock_irqrestore(&q->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&q->dmap);
 }
 EXPORT_SYMBOL(prepare_to_swait_exclusive);
 
@@ -118,12 +123,16 @@ long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait
 	}
 	raw_spin_unlock_irqrestore(&q->lock, flags);
 
+	if (!ret && state & TASK_NORMAL)
+		sdt_wait_prepare(&q->dmap);
+
 	return ret;
 }
 EXPORT_SYMBOL(prepare_to_swait_event);
 
 void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 	if (!list_empty(&wait->task_list))
 		list_del_init(&wait->task_list);
@@ -133,6 +142,7 @@ void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
 	unsigned long flags;
 
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 
 	if (!list_empty_careful(&wait->task_list)) {
-- 
1.9.1



* [PATCH RFC v6 13/21] dept: Apply SDT to wait(waitqueue)
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Make SDT able to track dependencies incurred by waits on a waitqueue.
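
For illustration, the wait/wake pairing that Dept now observes through
the head's dmap looks roughly like this (a minimal sketch using the
standard waitqueue API; 'condition' is a placeholder):

	/* waiter: sdt_wait_prepare(&wq_head->dmap) fires in prepare_to_wait() */
	DEFINE_WAIT(wait);

	prepare_to_wait(&wq_head, &wait, TASK_UNINTERRUPTIBLE);
	if (!condition)
		schedule();
	finish_wait(&wq_head, &wait);	/* sdt_wait_finish() fires here */

	/* waker: sdt_event(&wq_head->dmap) fires in __wake_up_common() */
	condition = true;
	wake_up(&wq_head);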

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/wait.h |  6 +++++-
 kernel/sched/wait.c  | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07d..e637585 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -7,6 +7,7 @@
 #include <linux/list.h>
 #include <linux/stddef.h>
 #include <linux/spinlock.h>
+#include <linux/dept_sdt.h>
 
 #include <asm/current.h>
 #include <uapi/linux/wait.h>
@@ -37,6 +38,7 @@ struct wait_queue_entry {
 struct wait_queue_head {
 	spinlock_t		lock;
 	struct list_head	head;
+	struct dept_map		dmap;
 };
 typedef struct wait_queue_head wait_queue_head_t;
 
@@ -56,7 +58,8 @@ struct wait_queue_head {
 
 #define __WAIT_QUEUE_HEAD_INITIALIZER(name) {					\
 	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),			\
-	.head		= LIST_HEAD_INIT(name.head) }
+	.head		= LIST_HEAD_INIT(name.head),				\
+	.dmap		= DEPT_MAP_INITIALIZER(name) }
 
 #define DECLARE_WAIT_QUEUE_HEAD(name) \
 	struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
@@ -67,6 +70,7 @@ struct wait_queue_head {
 	do {									\
 		static struct lock_class_key __key;				\
 										\
+		sdt_map_init(&(wq_head)->dmap);					\
 		__init_waitqueue_head((wq_head), #wq_head, &__key);		\
 	} while (0)
 
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 9860bb9..d67d0dc4 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -104,6 +104,7 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 		if (flags & WQ_FLAG_BOOKMARK)
 			continue;
 
+		sdt_event(&wq_head->dmap);
 		ret = curr->func(curr, mode, wake_flags, key);
 		if (ret < 0)
 			break;
@@ -267,6 +268,9 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
 		__add_wait_queue(wq_head, wq_entry);
 	set_current_state(state);
 	spin_unlock_irqrestore(&wq_head->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
 }
 EXPORT_SYMBOL(prepare_to_wait);
 
@@ -285,6 +289,10 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
 	}
 	set_current_state(state);
 	spin_unlock_irqrestore(&wq_head->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
+
 	return was_empty;
 }
 EXPORT_SYMBOL(prepare_to_wait_exclusive);
@@ -330,6 +338,9 @@ long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_en
 	}
 	spin_unlock_irqrestore(&wq_head->lock, flags);
 
+	if (!ret && state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
+
 	return ret;
 }
 EXPORT_SYMBOL(prepare_to_wait_event);
@@ -351,7 +362,9 @@ int do_wait_intr(wait_queue_head_t *wq, wait_queue_entry_t *wait)
 		return -ERESTARTSYS;
 
 	spin_unlock(&wq->lock);
+	sdt_wait_prepare(&wq->dmap);
 	schedule();
+	sdt_wait_finish();
 	spin_lock(&wq->lock);
 
 	return 0;
@@ -368,7 +381,9 @@ int do_wait_intr_irq(wait_queue_head_t *wq, wait_queue_entry_t *wait)
 		return -ERESTARTSYS;
 
 	spin_unlock_irq(&wq->lock);
+	sdt_wait_prepare(&wq->dmap);
 	schedule();
+	sdt_wait_finish();
 	spin_lock_irq(&wq->lock);
 
 	return 0;
@@ -388,6 +403,7 @@ void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_en
 {
 	unsigned long flags;
 
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 	/*
 	 * We can check for list emptiness outside the lock
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 14/21] locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

cb92173d1f0 ("locking/lockdep, cpu/hotplug: Annotate AP thread") was
introduced to make lockdep_assert_cpus_held() work in AP thread.

However, the annotation is too strong for that purpose. We don't have to
use more than try lock annotation for that.

Furthermore, now that Dept was introduced, false positive alarms was
reported by that. Replaced it with try lock annotation.
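
For reference, the third argument of rwsem_acquire() is the trylock
flag, so the change below should keep lockdep_assert_cpus_held()
satisfied while marking the acquisition as one that cannot block (a
sketch of the annotation semantics as I understand them):

	/* rwsem_acquire(map, subclass, trylock, ip) */
	rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 1, _THIS_IP_);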

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0..cb6e66c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -355,7 +355,7 @@ int lockdep_is_cpus_held(void)
 
 static void lockdep_acquire_cpus_lock(void)
 {
-	rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
+	rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 1, _THIS_IP_);
 }
 
 static void lockdep_release_cpus_lock(void)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 15/21] dept: Distinguish each syscall context from another
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

The kernel is entered on each syscall, and each syscall handling should
be considered independent from Dept's point of view. Otherwise, Dept may
wrongly track dependencies across different syscalls.

Such a cross-syscall dependency might be a real one imposed from user
mode. However, since Dept has only just started to work, conservatively
keep Dept from tracking dependencies across different syscalls.
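
The effect, roughly, is that the process-context id consumed by
cur_ctxt_id() changes on every syscall entry, so waits and events from
different syscalls can no longer be paired up (a simplified sketch of
what this patch implements):

	/* on each syscall entry */
	dept_kernel_enter();	/* cxt_id[DEPT_CXT_PROCESS] += 1UL << DEPT_CXTS_NR */

	id_a = cur_ctxt_id();	/* during syscall A */
	/* ... next syscall ... */
	id_b = cur_ctxt_id();	/* during syscall B: id_b != id_a */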

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 arch/arm64/kernel/syscall.c |  2 ++
 arch/x86/entry/common.c     |  4 +++
 include/linux/dept.h        | 39 +++++++++++++++-----------
 kernel/dependency/dept.c    | 67 +++++++++++++++++++++++----------------------
 4 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index c938603..1f219f0 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -7,6 +7,7 @@
 #include <linux/ptrace.h>
 #include <linux/randomize_kstack.h>
 #include <linux/syscalls.h>
+#include <linux/dept.h>
 
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
@@ -105,6 +106,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	 */
 
 	local_daif_restore(DAIF_PROCCTX);
+	dept_kernel_enter();
 
 	if (flags & _TIF_MTE_ASYNC_FAULT) {
 		/*
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c28264..7cdd27a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
 #include <linux/nospec.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
+#include <linux/dept.h>
 
 #ifdef CONFIG_XEN_PV
 #include <xen/xen-ops.h>
@@ -72,6 +73,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 
 __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
 {
+	dept_kernel_enter();
 	add_random_kstack_offset();
 	nr = syscall_enter_from_user_mode(regs, nr);
 
@@ -120,6 +122,7 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
 	int nr = syscall_32_enter(regs);
 
+	dept_kernel_enter();
 	add_random_kstack_offset();
 	/*
 	 * Subtlety here: if ptrace pokes something larger than 2^31-1 into
@@ -140,6 +143,7 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
 	int nr = syscall_32_enter(regs);
 	int res;
 
+	dept_kernel_enter();
 	add_random_kstack_offset();
 	/*
 	 * This cannot use syscall_enter_from_user_mode() as it has to
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 9698134..c020b17 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -25,11 +25,16 @@
 #define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
 #define DEPT_MAX_SUBCLASSES_CACHE	2
 
-#define DEPT_SIRQ			0
-#define DEPT_HIRQ			1
-#define DEPT_IRQS_NR			2
-#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
-#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
+enum {
+	DEPT_CXT_SIRQ = 0,
+	DEPT_CXT_HIRQ,
+	DEPT_CXT_IRQS_NR,
+	DEPT_CXT_PROCESS = DEPT_CXT_IRQS_NR,
+	DEPT_CXTS_NR
+};
+
+#define DEPT_SIRQF			(1UL << DEPT_CXT_SIRQ)
+#define DEPT_HIRQF			(1UL << DEPT_CXT_HIRQ)
 
 struct dept_ecxt;
 struct dept_iecxt {
@@ -95,8 +100,8 @@ struct dept_class {
 	/*
 	 * for tracking IRQ dependencies
 	 */
-	struct dept_iecxt		iecxt[DEPT_IRQS_NR];
-	struct dept_iwait		iwait[DEPT_IRQS_NR];
+	struct dept_iecxt		iecxt[DEPT_CXT_IRQS_NR];
+	struct dept_iwait		iwait[DEPT_CXT_IRQS_NR];
 };
 
 struct dept_stack {
@@ -150,8 +155,8 @@ struct dept_ecxt {
 	/*
 	 * where the IRQ-enabled happened
 	 */
-	unsigned long			enirq_ip[DEPT_IRQS_NR];
-	struct dept_stack		*enirq_stack[DEPT_IRQS_NR];
+	unsigned long			enirq_ip[DEPT_CXT_IRQS_NR];
+	struct dept_stack		*enirq_stack[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * where the event context started
@@ -194,8 +199,8 @@ struct dept_wait {
 	/*
 	 * where the IRQ wait happened
 	 */
-	unsigned long			irq_ip[DEPT_IRQS_NR];
-	struct dept_stack		*irq_stack[DEPT_IRQS_NR];
+	unsigned long			irq_ip[DEPT_CXT_IRQS_NR];
+	struct dept_stack		*irq_stack[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * where the wait happened
@@ -405,19 +410,19 @@ struct dept_task {
 	int				wait_hist_pos;
 
 	/*
-	 * sequential id to identify each IRQ context
+	 * sequential id to identify each context
 	 */
-	unsigned int			irq_id[DEPT_IRQS_NR];
+	unsigned int			cxt_id[DEPT_CXTS_NR];
 
 	/*
 	 * for tracking IRQ-enabled points with cross-event
 	 */
-	unsigned int			wgen_enirq[DEPT_IRQS_NR];
+	unsigned int			wgen_enirq[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * for keeping up-to-date IRQ-enabled points
 	 */
-	unsigned long			enirq_ip[DEPT_IRQS_NR];
+	unsigned long			enirq_ip[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * current effective IRQ-enabled flag
@@ -459,7 +464,7 @@ struct dept_task {
 	.wait_hist = { { .wait = NULL, } },			\
 	.ecxt_held_pos = 0,					\
 	.wait_hist_pos = 0,					\
-	.irq_id = { 0U },					\
+	.cxt_id = { 0U },					\
 	.wgen_enirq = { 0U },					\
 	.enirq_ip = { 0UL },					\
 	.eff_enirqf = 0UL,					\
@@ -497,6 +502,7 @@ struct dept_task {
 extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
 extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
 extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
+extern void dept_kernel_enter(void);
 
 static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 {
@@ -549,6 +555,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
 #define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
 #define dept_ask_event_split_map(me, mc)		do { } while (0)
+#define dept_kernel_enter()				do { } while (0)
 #define dept_ecxt_enter_nokeep(m)			do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index a0413f1..18e5951 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -214,9 +214,9 @@ static inline struct dept_class *dep_tc(struct dept_dep *d)
 
 static inline const char *irq_str(int irq)
 {
-	if (irq == DEPT_SIRQ)
+	if (irq == DEPT_CXT_SIRQ)
 		return "softirq";
-	if (irq == DEPT_HIRQ)
+	if (irq == DEPT_CXT_HIRQ)
 		return "hardirq";
 	return "(unknown)";
 }
@@ -376,7 +376,7 @@ static void initialize_class(struct dept_class *c)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		struct dept_iecxt *ie = &c->iecxt[i];
 		struct dept_iwait *iw = &c->iwait[i];
 
@@ -401,7 +401,7 @@ static void initialize_ecxt(struct dept_ecxt *e)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		e->enirq_stack[i] = NULL;
 		e->enirq_ip[i] = 0UL;
 	}
@@ -417,7 +417,7 @@ static void initialize_wait(struct dept_wait *w)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		w->irq_stack[i] = NULL;
 		w->irq_ip[i] = 0UL;
 	}
@@ -456,7 +456,7 @@ static void destroy_ecxt(struct dept_ecxt *e)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++)
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
 		if (e->enirq_stack[i])
 			put_stack(e->enirq_stack[i]);
 	if (e->class)
@@ -472,7 +472,7 @@ static void destroy_wait(struct dept_wait *w)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++)
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
 		if (w->irq_stack[i])
 			put_stack(w->irq_stack[i]);
 	if (w->class)
@@ -617,7 +617,7 @@ static void print_diagram(struct dept_dep *d)
 	const char *c_fn = e->ecxt_fn ?: "(unknown)";
 
 	irqf = e->enirqf & w->irqf;
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		if (!firstline)
 			pr_warn("\nor\n\n");
 		firstline = false;
@@ -648,7 +648,7 @@ static void print_dep(struct dept_dep *d)
 	const char *c_fn = e->ecxt_fn ?: "(unknown)";
 
 	irqf = e->enirqf & w->irqf;
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		pr_warn("%s has been enabled:\n", irq_str(irq));
 		print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
 		pr_warn("\n");
@@ -874,7 +874,7 @@ static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
  */
 
 static inline unsigned long cur_enirqf(void);
-static inline int cur_irq(void);
+static inline int cur_cxt(void);
 static inline unsigned int cur_ctxt_id(void);
 
 static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
@@ -1413,7 +1413,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 	if (d) {
 		check_dl_bfs(d);
 
-		for (i = 0; i < DEPT_IRQS_NR; i++) {
+		for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 			struct dept_iwait *fiw = iwait(fc, i);
 			struct dept_iecxt *found_ie;
 			struct dept_iwait *found_iw;
@@ -1449,7 +1449,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	struct dept_task *dt = dept_task();
 	struct dept_wait *w;
 	unsigned int wg = 0U;
-	int irq;
+	int cxt;
 	int i;
 
 	w = new_wait();
@@ -1461,9 +1461,9 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	w->wait_fn = w_fn;
 	w->wait_stack = get_current_stack();
 
-	irq = cur_irq();
-	if (irq < DEPT_IRQS_NR)
-		add_iwait(c, irq, w);
+	cxt = cur_cxt();
+	if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
+		add_iwait(c, cxt, w);
 
 	/*
 	 * Avoid adding dependency between user aware nested ecxt and
@@ -1521,7 +1521,7 @@ static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
 	eh->nest = ne;
 
 	irqf = cur_enirqf();
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR)
 		add_iecxt(c, irq, e, false);
 
 	del_ecxt(e);
@@ -1656,7 +1656,7 @@ static void do_event(void *obj, struct dept_class *c, unsigned int wg,
 			break;
 	}
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		struct dept_ecxt *e;
 
 		if (before(dt->wgen_enirq[i], wg))
@@ -1698,7 +1698,7 @@ static void disconnect_class(struct dept_class *c)
 		call_rcu(&d->rh, del_dep_rcu);
 	}
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		stale_iecxt(iecxt(c, i));
 		stale_iwait(iwait(c, i));
 	}
@@ -1723,27 +1723,21 @@ static inline unsigned long cur_enirqf(void)
 	return 0UL;
 }
 
-static inline int cur_irq(void)
+static inline int cur_cxt(void)
 {
 	if (lockdep_softirq_context(current))
-		return DEPT_SIRQ;
+		return DEPT_CXT_SIRQ;
 	if (lockdep_hardirq_context())
-		return DEPT_HIRQ;
-	return DEPT_IRQS_NR;
+		return DEPT_CXT_HIRQ;
+	return DEPT_CXT_PROCESS;
 }
 
 static inline unsigned int cur_ctxt_id(void)
 {
 	struct dept_task *dt = dept_task();
-	int irq = cur_irq();
+	int cxt = cur_cxt();
 
-	/*
-	 * Normal process context
-	 */
-	if (irq == DEPT_IRQS_NR)
-		return 0U;
-
-	return dt->irq_id[irq] | (1UL << irq);
+	return dt->cxt_id[cxt] | (1UL << cxt);
 }
 
 static void enirq_transition(int irq)
@@ -1793,7 +1787,7 @@ static void enirq_update(unsigned long ip)
 	/*
 	 * Do enirq_transition() only on an OFF -> ON transition.
 	 */
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		if (prev & (1UL << irq))
 			continue;
 
@@ -1850,6 +1844,13 @@ void dept_enirq_transition(unsigned long ip)
 	dept_exit(flags);
 }
 
+void dept_kernel_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);
+}
+
 /*
  * Ensure it's the outmost softirq context.
  */
@@ -1857,7 +1858,7 @@ void dept_softirq_enter(void)
 {
 	struct dept_task *dt = dept_task();
 
-	dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+	dt->cxt_id[DEPT_CXT_SIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
 /*
@@ -1867,7 +1868,7 @@ void dept_hardirq_enter(void)
 {
 	struct dept_task *dt = dept_task();
 
-	dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+	dt->cxt_id[DEPT_CXT_HIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
 /*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 16/21] dept: Distinguish each work from another
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

The workqueue already provides concurrency control. Thanks to it, a wait
in one work does not prevent events in other works while that control is
enabled. Thus, each work is better considered a separate context.

So let Dept assign a different context id to each work.
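
For example, with the concurrency control active, something like the
following must not be flagged as a deadlock (a hypothetical scenario
for illustration; c is a shared struct completion):

	/* work A, queued on wq */
	wait_for_completion(&c);

	/* work B, queued on wq, runs concurrently in another worker */
	complete(&c);

Giving each work its own context id keeps Dept from connecting the wait
in work A with events in work B as if they ran in a single context.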

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h     |  2 ++
 kernel/dependency/dept.c | 10 ++++++++++
 kernel/workqueue.c       |  3 +++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index c020b17..1a3858c 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -503,6 +503,7 @@ struct dept_task {
 extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
 extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
 extern void dept_kernel_enter(void);
+extern void dept_work_enter(void);
 
 static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 {
@@ -556,6 +557,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
 #define dept_ask_event_split_map(me, mc)		do { } while (0)
 #define dept_kernel_enter()				do { } while (0)
+#define dept_work_enter()				do { } while (0)
 #define dept_ecxt_enter_nokeep(m)			do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 18e5951..6707313 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1844,6 +1844,16 @@ void dept_enirq_transition(unsigned long ip)
 	dept_exit(flags);
 }
 
+/*
+ * Assign a different context id to each work.
+ */
+void dept_work_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);
+}
+
 void dept_kernel_enter(void)
 {
 	struct dept_task *dt = dept_task();
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0d2514b..334654c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -51,6 +51,7 @@
 #include <linux/sched/isolation.h>
 #include <linux/nmi.h>
 #include <linux/kvm_para.h>
+#include <linux/dept.h>
 
 #include "workqueue_internal.h"
 
@@ -2199,6 +2200,8 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
 
 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
 #endif
+	dept_work_enter();
+
 	/* ensure we're on the correct CPU */
 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
 		     raw_smp_processor_id() != pool->cpu);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 17/21] dept: Disable Dept within the wait_bit layer by default
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

The entries of the struct wait_queue_head array bit_wait_table[] in
kernel/sched/wait_bit.c are shared by all their users, which
unfortunately vary in terms of class. Each user should have been
assigned its own class to avoid false positives.

It is better to let Dept work at a higher layer than wait_bit, so
disable Dept within the wait_bit layer by default.

It's worth noting that Dept still works with all the other struct
wait_queue_head instances, which are mostly well-classified.
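
The sharing comes from how waiters are hashed into the table; roughly,
based on the existing wait_bit code:

	wait_queue_head_t *bit_waitqueue(void *word, int bit)
	{
		const int shift = BITS_PER_LONG == 32 ? 5 : 6;
		unsigned long val = (unsigned long)word << shift | bit;

		return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
	}

so waits on entirely unrelated (word, bit) pairs can land on the same
wait_queue_head, hence on the same dmap, which is why checking at this
layer would produce false positives.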

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/wait_bit.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index d4788f8..df93e33 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -3,6 +3,7 @@
 /*
  * The implementation of the wait_bit*() and related waiting APIs:
  */
+#include <linux/dept.h>
 
 #define WAIT_TABLE_BITS 8
 #define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
@@ -246,6 +247,8 @@ void __init wait_bit_init(void)
 {
 	int i;
 
-	for (i = 0; i < WAIT_TABLE_SIZE; i++)
+	for (i = 0; i < WAIT_TABLE_SIZE; i++) {
 		init_waitqueue_head(bit_wait_table + i);
+		dept_map_nocheck(&(bit_wait_table + i)->dmap);
+	}
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 18/21] dept: Disable Dept on struct crypto_larval's completion for now
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

struct crypto_larval's completion is used for multiple purposes, e.g.
waiting for a test to complete or waiting for a probe to complete.

The completion variable needs to be split according to what it's used
for; otherwise, Dept cannot distinguish one use from another and doesn't
work properly. Until that split is done, disable Dept on it.
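
The TODO'd split might look something like this (purely illustrative;
the field names are hypothetical):

	struct crypto_larval {
		...
		struct completion test_completion;	/* waited on by tests */
		struct completion probe_completion;	/* waited on by probing */
	};

with each waiter using the completion that matches its purpose, so that
Dept could classify the two kinds of waits separately.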

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 crypto/api.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/api.c b/crypto/api.c
index 69508ae..305d24c 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -115,7 +115,12 @@ struct crypto_larval *crypto_larval_alloc(const char *name, u32 type, u32 mask)
 	larval->alg.cra_destroy = crypto_larval_destroy;
 
 	strlcpy(larval->alg.cra_name, name, CRYPTO_MAX_ALG_NAME);
-	init_completion(&larval->completion);
+	/*
+	 * TODO: Split ->completion according to what it's used for e.g.
+	 * ->test_completion, ->probe_completion and the like, so that
+	 *  Dept can track its dependency properly.
+	 */
+	init_completion_nocheck(&larval->completion);
 
 	return larval;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 19/21] dept: Differentiate onstack maps from others of different tasks in class
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

For possibility detection, Dept assumes that maps might belong to the
same class if the running code is the same. However, on-stack maps never
belong to a common class across different tasks because each task has
its own instance on its stack.

So differentiate on-stack maps from the others when classifying, to
avoid false positive alarms.
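
Concretely, as the diff below does, an on-stack map folds the owning
task into the class identity (a simplified sketch of the change):

	/* at map init: tie an on-stack map to its task */
	m->key2 = object_is_on_stack(m) ? (unsigned long)current : 0UL;

	/* class comparison now considers both keys */
	return c1->key == c2->key && c1->key2 == c2->key2;

so two tasks running the same code with their own on-stack instances
end up in distinct classes.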

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h     |   3 +
 kernel/dependency/dept.c | 166 ++++++++++++++++++++++++++++++++++++++---------
 kernel/exit.c            |   8 ++-
 3 files changed, 147 insertions(+), 30 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 1a3858c..3027121 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -72,6 +72,7 @@ struct dept_class {
 	 */
 	const char			*name;
 	unsigned long			key;
+	unsigned long			key2;
 	int				sub;
 
 	/*
@@ -343,6 +344,7 @@ struct dept_key {
 struct dept_map {
 	const char			*name;
 	struct dept_key			*keys;
+	unsigned long			key2;
 	int				sub_usr;
 
 	/*
@@ -366,6 +368,7 @@ struct dept_map {
 {									\
 	.name = #n,							\
 	.keys = NULL,							\
+	.key2 = 0UL,							\
 	.sub_usr = 0,							\
 	.keys_local = { .classes = { 0 } },				\
 	.wgen = 0U,							\
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 6707313..2bc6259 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -73,6 +73,7 @@
 #include <linux/hash.h>
 #include <linux/dept.h>
 #include <linux/utsname.h>
+#include <linux/sched/task_stack.h>
 #include "dept_internal.h"
 
 static int dept_stop;
@@ -523,12 +524,12 @@ static unsigned long key_dep(struct dept_dep *d)
 
 static bool cmp_class(struct dept_class *c1, struct dept_class *c2)
 {
-	return c1->key == c2->key;
+	return c1->key == c2->key && c1->key2 == c2->key2;
 }
 
 static unsigned long key_class(struct dept_class *c)
 {
-	return c->key;
+	return c->key2 ? mix(c->key, c->key2) : c->key;
 }
 
 #define HASH(id, bits)							\
@@ -571,14 +572,38 @@ static inline struct dept_dep *lookup_dep(struct dept_class *fc,
 	return hash_lookup_dep(&onetime_d);
 }
 
-static inline struct dept_class *lookup_class(unsigned long key)
+static inline struct dept_class *lookup_class(unsigned long key,
+					      unsigned long key2)
 {
-	struct dept_class onetime_c = { .key = key };
+	struct dept_class onetime_c = { .key = key, .key2 = key2 };
 
 	return hash_lookup_class(&onetime_c);
 }
 
 /*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void obtain_classes_from_hlist(struct hlist_head *to,
+			bool (*cmp)(struct dept_class *c, void *data),
+			void *data)
+{
+	struct dept_class *c;
+	struct hlist_node *n;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(table_class); i++) {
+		struct hlist_head *h = table_class + i;
+
+		hlist_for_each_entry_safe(c, n, h, hash_node) {
+			if (cmp(c, data)) {
+				hlist_del_rcu(&c->hash_node);
+				hlist_add_head_rcu(&c->hash_node, to);
+			}
+		}
+	}
+}
+
+/*
  * Report
  * =====================================================================
  * DEPT prints useful information to help debugging on detection of
@@ -1899,6 +1924,7 @@ void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
 		   const char *n)
 {
 	unsigned long flags;
+	bool onstack;
 
 	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
 		return;
@@ -1908,6 +1934,16 @@ void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
 		return;
 	}
 
+	onstack = object_is_on_stack(m);
+
+	/*
+	 * Require an explicit key for onstack maps.
+	 */
+	if (onstack && !k) {
+		m->nocheck = true;
+		return;
+	}
+
 	/*
 	 * Allow recursive entrance.
 	 */
@@ -1917,6 +1953,7 @@ void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
 
 	m->sub_usr = sub;
 	m->keys = k;
+	m->key2 = onstack ? (unsigned long)current : 0UL;
 	m->name = n;
 	m->wgen = 0U;
 	m->nocheck = false;
@@ -2031,7 +2068,7 @@ static inline int map_sub(struct dept_map *m, int e)
 
 static struct dept_class *check_new_class(struct dept_key *local,
 					  struct dept_key *k, int sub,
-					  const char *n)
+					  unsigned long k2, const char *n)
 {
 	struct dept_class *c = NULL;
 
@@ -2047,14 +2084,14 @@ static struct dept_class *check_new_class(struct dept_key *local,
 	if (c)
 		return c;
 
-	c = lookup_class((unsigned long)k->subkeys + sub);
+	c = lookup_class((unsigned long)k->subkeys + sub, k2);
 	if (c)
 		goto caching;
 
 	if (unlikely(!dept_lock()))
 		return NULL;
 
-	c = lookup_class((unsigned long)k->subkeys + sub);
+	c = lookup_class((unsigned long)k->subkeys + sub, k2);
 	if (unlikely(c))
 		goto unlock;
 
@@ -2065,6 +2102,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
 	c->name = n;
 	c->sub = sub;
 	c->key = (unsigned long)(k->subkeys + sub);
+	c->key2 = k2;
 	hash_add_class(c);
 	list_add(&c->all_node, &dept_classes);
 unlock:
@@ -2099,8 +2137,8 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
 		struct dept_key *k;
 
 		k = m->keys ?: &m->keys_local;
-		c = check_new_class(&m->keys_local, k,
-				    map_sub(m, e), m->name);
+		c = check_new_class(&m->keys_local, k, map_sub(m, e),
+				    m->key2, m->name);
 		if (!c)
 			continue;
 
@@ -2298,7 +2336,8 @@ void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
 	DEPT_WARN_ON(1UL << e != e_f);
 
 	k = m->keys ?: &m->keys_local;
-	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+	c = check_new_class(&m->keys_local, k, map_sub(m, e),
+			    m->key2, m->name);
 
 	if (c && add_ecxt((void *)m, c, ip, c_fn, e_fn, ne))
 		goto exit;
@@ -2376,7 +2415,8 @@ void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip,
 	DEPT_WARN_ON(1UL << e != e_f);
 
 	k = m->keys ?: &m->keys_local;
-	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+	c = check_new_class(&m->keys_local, k, map_sub(m, e),
+			    m->key2, m->name);
 
 	if (c && add_ecxt((void *)m, c, 0UL, NULL, e_fn, 0)) {
 		do_event((void *)m, c, READ_ONCE(m->wgen), ip);
@@ -2427,7 +2467,8 @@ void dept_ecxt_exit(struct dept_map *m, unsigned long e_f,
 	DEPT_WARN_ON(1UL << e != e_f);
 
 	k = m->keys ?: &m->keys_local;
-	c = check_new_class(&m->keys_local, k, map_sub(m, e), m->name);
+	c = check_new_class(&m->keys_local, k, map_sub(m, e),
+			    m->key2, m->name);
 
 	if (c && pop_ecxt((void *)m, c))
 		goto exit;
@@ -2504,7 +2545,7 @@ void dept_wait_split_map(struct dept_map_each *me,
 	flags = dept_enter();
 
 	k = mc->keys ?: &mc->keys_local;
-	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+	c = check_new_class(&mc->keys_local, k, 0, 0UL, mc->name);
 	if (c)
 		add_wait(c, ip, w_fn, ne);
 
@@ -2568,7 +2609,7 @@ void dept_event_split_map(struct dept_map_each *me,
 	flags = dept_enter();
 
 	k = mc->keys ?: &mc->keys_local;
-	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+	c = check_new_class(&mc->keys_local, k, 0, 0UL, mc->name);
 
 	if (c && add_ecxt((void *)me, c, 0UL, NULL, e_fn, 0)) {
 		do_event((void *)me, c, READ_ONCE(me->wgen), ip);
@@ -2584,12 +2625,64 @@ void dept_event_split_map(struct dept_map_each *me,
 }
 EXPORT_SYMBOL_GPL(dept_event_split_map);
 
+static bool cmp_class_key2(struct dept_class *c, void *k2)
+{
+	return c->key2 == (unsigned long)k2;
+}
+
+static void per_task_key_destroy(void)
+{
+	struct dept_class *c;
+	struct hlist_node *n;
+	HLIST_HEAD(h);
+
+	/*
+	 * per_task_key_destroy() should not fail.
+	 *
+	 * FIXME: Should be fixed if per_task_key_destroy() causes
+	 * deadlock with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	obtain_classes_from_hlist(&h, cmp_class_key2, current);
+
+	hlist_for_each_entry_safe(c, n, &h, hash_node) {
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+
+	dept_unlock();
+}
+
 void dept_task_exit(struct task_struct *t)
 {
-	struct dept_task *dt = &t->dept_task;
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
 	int i;
 
-	raw_local_irq_disable();
+	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
+		return;
+
+	if (dt->recursive) {
+		DEPT_STOP("Entered task_exit() while Dept is working.\n");
+		return;
+	}
+
+	if (t != current) {
+		DEPT_STOP("Never expect task_exit() done by others.\n");
+		return;
+	}
+
+	flags = dept_enter();
 
 	if (dt->stack)
 		put_stack(dt->stack);
@@ -2601,9 +2694,17 @@ void dept_task_exit(struct task_struct *t)
 		if (dt->wait_hist[i].wait)
 			put_wait(dt->wait_hist[i].wait);
 
+	per_task_key_destroy();
+
 	dept_off();
+	dept_exit(flags);
 
-	raw_local_irq_enable();
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
 }
 
 void dept_task_init(struct task_struct *t)
@@ -2611,10 +2712,18 @@ void dept_task_init(struct task_struct *t)
 	memset(&t->dept_task, 0x0, sizeof(struct dept_task));
 }
 
+static bool cmp_class_key1(struct dept_class *c, void *k1)
+{
+	return c->key == (unsigned long)k1;
+}
+
 void dept_key_init(struct dept_key *k)
 {
 	struct dept_task *dt = dept_task();
 	unsigned long flags;
+	struct dept_class *c;
+	struct hlist_node *n;
+	HLIST_HEAD(h);
 	int sub;
 
 	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
@@ -2636,13 +2745,11 @@ void dept_key_init(struct dept_key *k)
 	while (unlikely(!dept_lock()))
 		cpu_relax();
 
-	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
-		struct dept_class *c;
-
-		c = lookup_class((unsigned long)k->subkeys + sub);
-		if (!c)
-			continue;
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++)
+		obtain_classes_from_hlist(&h, cmp_class_key1,
+					  k->subkeys + sub);
 
+	hlist_for_each_entry_safe(c, n, &h, hash_node) {
 		DEPT_STOP("The class(%s/%d) has not been removed.\n",
 			  c->name, sub);
 		break;
@@ -2657,6 +2764,9 @@ void dept_key_destroy(struct dept_key *k)
 {
 	struct dept_task *dt = dept_task();
 	unsigned long flags;
+	struct dept_class *c;
+	struct hlist_node *n;
+	HLIST_HEAD(h);
 	int sub;
 
 	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
@@ -2678,13 +2788,11 @@ void dept_key_destroy(struct dept_key *k)
 	while (unlikely(!dept_lock()))
 		cpu_relax();
 
-	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
-		struct dept_class *c;
-
-		c = lookup_class((unsigned long)k->subkeys + sub);
-		if (!c)
-			continue;
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++)
+		obtain_classes_from_hlist(&h, cmp_class_key1,
+					  k->subkeys + sub);
 
+	hlist_for_each_entry_safe(c, n, &h, hash_node) {
 		hash_del_class(c);
 		disconnect_class(c);
 		list_del(&c->all_node);
diff --git a/kernel/exit.c b/kernel/exit.c
index bac41ee..d381fd4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -738,6 +738,13 @@ void __noreturn do_exit(long code)
 	struct task_struct *tsk = current;
 	int group_dead;
 
+	/*
+	 * dept_task_exit() requires might_sleep() because it needs to
+	 * wait on the grace period after cleaning the objects that have
+	 * been coupled with the current task_struct.
+	 */
+	dept_task_exit(tsk);
+
 	WARN_ON(tsk->plug);
 
 	kcov_task_exit(tsk);
@@ -844,7 +851,6 @@ void __noreturn do_exit(long code)
 	exit_tasks_rcu_finish();
 
 	lockdep_free_task(tsk);
-	dept_task_exit(tsk);
 	do_task_dead();
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 20/21] dept: Do not add dependencies between events within scheduler and sleeps
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

A sleep is not a wait that prevents the events within __schedule();
rather, it goes through __schedule(), so all such events can still be
triggered while sleeping. Hence there are no dependencies between a
sleep and those events.

So distinguish the sleep type of wait from the other type, i.e.
spinning, and skip building dependencies between sleep-type waits and
the events triggered within __schedule().
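
As a minimal sketch of the resulting rule, simplified from the check
this patch adds to add_dep() in the diff below; the toy types are
illustrative only:

	#include <linux/types.h>

	struct toy_wait {
		bool	sleep;		/* true: waiter sleeps via __schedule() */
	};

	struct toy_event {
		bool	in_sched;	/* true: fired within __schedule() */
	};

	/*
	 * A sleeping waiter passes through __schedule(), so events
	 * triggered there cannot be blocked by it: don't record such
	 * a dependency.
	 */
	static bool toy_should_track_dep(const struct toy_event *e,
					 const struct toy_wait *w)
	{
		return !(e->in_sched && w->sleep);
	}

This is why the diff passes false (spin) for spinlock_t and rwlock_t
acquisitions but true (sleep) for mutexes, rwsems, completions and the
RT-variant locks, which sleep instead of spinning.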

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h   |  2 +-
 include/linux/dept.h         | 28 ++++++++++++++++++++----
 include/linux/dept_page.h    |  4 ++--
 include/linux/dept_sdt.h     |  9 +++++++-
 include/linux/lockdep.h      | 52 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/mutex.h        |  2 +-
 include/linux/rwlock.h       | 12 +++++-----
 include/linux/rwsem.h        |  2 +-
 include/linux/seqlock.h      |  2 +-
 include/linux/spinlock.h     |  8 +++----
 kernel/dependency/dept.c     | 37 ++++++++++++++++++++++++-------
 kernel/locking/spinlock_rt.c | 24 ++++++++++----------
 kernel/sched/core.c          |  2 ++
 13 files changed, 138 insertions(+), 46 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 358c656..2dade27 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -36,7 +36,7 @@ struct completion {
 #define dept_wfc_wait(m, ip)						\
 do {									\
 	dept_ask_event(m);						\
-	dept_wait(m, 1UL, ip, __func__, 0);				\
+	dept_wait(m, 1UL, ip, __func__, 0, true);			\
 } while (0)
 #define dept_wfc_complete(m, ip)		dept_event(m, 1UL, ip, __func__)
 #define dept_wfc_enter(m, ip)			dept_ecxt_enter(m, 1UL, ip, "completion_context_enter", "complete", 0)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 3027121..28db897 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -170,6 +170,11 @@ struct dept_ecxt {
 	 */
 	unsigned long			event_ip;
 	struct dept_stack		*event_stack;
+
+	/*
+	 * whether the event is triggered within __schedule()
+	 */
+	bool				in_sched;
 };
 
 struct dept_wait {
@@ -208,6 +213,11 @@ struct dept_wait {
 	 */
 	unsigned long			wait_ip;
 	struct dept_stack		*wait_stack;
+
+	/*
+	 * spin or sleep
+	 */
+	bool				sleep;
 };
 
 struct dept_dep {
@@ -460,6 +470,11 @@ struct dept_task {
 	 */
 	bool				hardirqs_enabled;
 	bool				softirqs_enabled;
+
+	/*
+	 * whether the current task is in __schedule()
+	 */
+	bool				in_sched;
 };
 
 #define DEPT_TASK_INITIALIZER(t)				\
@@ -480,6 +495,7 @@ struct dept_task {
 	.missing_ecxt = 0,					\
 	.hardirqs_enabled = false,				\
 	.softirqs_enabled = false,				\
+	.in_sched = false,					\
 }
 
 extern void dept_on(void);
@@ -492,7 +508,7 @@ struct dept_task {
 extern void dept_map_reinit(struct dept_map *m);
 extern void dept_map_nocheck(struct dept_map *m);
 
-extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne, bool sleep);
 extern void dept_stage_wait(struct dept_map *m, unsigned long w_f, const char *w_fn, int ne);
 extern void dept_ask_event_wait_commit(unsigned long ip);
 extern void dept_clean_stage(void);
@@ -502,11 +518,13 @@ struct dept_task {
 extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
 extern void dept_split_map_each_init(struct dept_map_each *me);
 extern void dept_split_map_common_init(struct dept_map_common *mc, struct dept_key *k, const char *n);
-extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
+extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne, bool sleep);
 extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
 extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
 extern void dept_kernel_enter(void);
 extern void dept_work_enter(void);
+extern void dept_sched_enter(void);
+extern void dept_sched_exit(void);
 
 static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 {
@@ -546,7 +564,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_map_reinit(m)				do { } while (0)
 #define dept_map_nocheck(m)				do { } while (0)
 
-#define dept_wait(m, w_f, ip, w_fn, ne)			do { (void)(w_fn); } while (0)
+#define dept_wait(m, w_f, ip, w_fn, ne, s)		do { (void)(w_fn); } while (0)
 #define dept_stage_wait(m, w_f, w_fn, ne)		do { (void)(w_fn); } while (0)
 #define dept_ask_event_wait_commit(ip)			do { } while (0)
 #define dept_clean_stage()				do { } while (0)
@@ -556,11 +574,13 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_ecxt_exit(m, e_f, ip)			do { } while (0)
 #define dept_split_map_each_init(me)			do { } while (0)
 #define dept_split_map_common_init(mc, k, n)		do { (void)(n); (void)(k); } while (0)
-#define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
+#define dept_wait_split_map(me, mc, ip, w_fn, ne, s)	do { } while (0)
 #define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
 #define dept_ask_event_split_map(me, mc)		do { } while (0)
 #define dept_kernel_enter()				do { } while (0)
 #define dept_work_enter()				do { } while (0)
+#define dept_sched_enter()				do { } while (0)
+#define dept_sched_exit()				do { } while (0)
 #define dept_ecxt_enter_nokeep(m)			do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
diff --git a/include/linux/dept_page.h b/include/linux/dept_page.h
index d2d093d..4af3b2d 100644
--- a/include/linux/dept_page.h
+++ b/include/linux/dept_page.h
@@ -20,7 +20,7 @@
 								\
 	if (likely(me))						\
 		dept_wait_split_map(me, &pglocked_mc, _RET_IP_, \
-				    __func__, 0);		\
+				    __func__, 0, true);		\
 } while (0)
 
 #define dept_pglocked_set_bit(f)				\
@@ -46,7 +46,7 @@
 								\
 	if (likely(me))						\
 		dept_wait_split_map(me, &pgwriteback_mc, _RET_IP_,\
-				    __func__, 0);		\
+				    __func__, 0, true);		\
 } while (0)
 
 #define dept_pgwriteback_set_bit(f)				\
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
index 49763cd..14a1720 100644
--- a/include/linux/dept_sdt.h
+++ b/include/linux/dept_sdt.h
@@ -29,7 +29,13 @@
 #define sdt_wait(m)							\
 	do {								\
 		dept_ask_event(m);					\
-		dept_wait(m, 1UL, _THIS_IP_, "wait", 0);		\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0, true);		\
+	} while (0)
+
+#define sdt_wait_spin(m)						\
+	do {								\
+		dept_ask_event(m);					\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0, false);		\
 	} while (0)
 /*
  * This will be committed in __schedule() when it actually gets to
@@ -47,6 +53,7 @@
 #define sdt_map_init(m)			do { } while (0)
 #define sdt_map_init_key(m, k)		do { (void)(k); } while (0)
 #define sdt_wait(m)			do { } while (0)
+#define sdt_wait_spin(m)		do { } while (0)
 #define sdt_wait_prepare(m)		do { } while (0)
 #define sdt_wait_finish()		do { } while (0)
 #define sdt_ecxt_enter(m)		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0e097f..b2119f4 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -575,12 +575,12 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_acquire(l, s, t, i)					\
 do {									\
 	lock_acquire_exclusive(l, s, t, NULL, i);			\
-	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i);	\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i, false);\
 } while (0)
 #define spin_acquire_nest(l, s, t, n, i)				\
 do {									\
 	lock_acquire_exclusive(l, s, t, n, i);				\
-	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i); \
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i, false); \
 } while (0)
 #define spin_release(l, i)						\
 do {									\
@@ -591,16 +591,16 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define rwlock_acquire(l, s, t, i)					\
 do {									\
 	lock_acquire_exclusive(l, s, t, NULL, i);			\
-	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i);	\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i, false);\
 } while (0)
 #define rwlock_acquire_read(l, s, t, i)					\
 do {									\
 	if (read_lock_is_recursive()) {				\
 		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
-		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0);\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0, false);\
 	} else {							\
 		lock_acquire_shared(l, s, t, NULL, i);			\
-		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1);\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1, false);\
 	}								\
 } while (0)
 #define rwlock_release(l, i)						\
@@ -614,6 +614,48 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_rwlock_runlock(&(l)->dmap, i);				\
 } while (0)
 
+#define rt_spin_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i, true);	\
+} while (0)
+#define rt_spin_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i, true);\
+} while (0)
+#define rt_spin_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_spin_unlock(&(l)->dmap, i);				\
+} while (0)
+
+#define rt_rwlock_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i, true);\
+} while (0)
+#define rt_rwlock_acquire_read(l, s, t, i)					\
+do {									\
+	if (read_lock_is_recursive()) {				\
+		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0, true);\
+	} else {							\
+		lock_acquire_shared(l, s, t, NULL, i);			\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1, true);\
+	}								\
+} while (0)
+#define rt_rwlock_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_wunlock(&(l)->dmap, i);				\
+} while (0)
+#define rt_rwlock_release_read(l, i)					\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_runlock(&(l)->dmap, i);				\
+} while (0)
+
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define seqcount_release(l, i)			lock_release(l, i)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b699cf41..e98a912 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -84,7 +84,7 @@ struct mutex {
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, true);		\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index bbab144..68a083d 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -33,25 +33,25 @@
 #define DEPT_EVT_RWLOCK_W		(1UL << 1)
 #define DEPT_EVT_RWLOCK_RW		(DEPT_EVT_RWLOCK_R | DEPT_EVT_RWLOCK_W)
 
-#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)			\
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne);	\
+		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne, s);	\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
 	}								\
 } while (0)
-#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)			\
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne);\
+		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne, s);\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
 	}								\
 } while (0)
@@ -64,8 +64,8 @@
 	dept_ecxt_exit(m, DEPT_EVT_RWLOCK_R, ip);			\
 } while (0)
 #else
-#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)	do { } while (0)
-#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)	do { } while (0)
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip, s)	do { } while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q, s)	do { } while (0)
 #define dept_rwlock_wunlock(m, ip)			do { } while (0)
 #define dept_rwlock_runlock(m, ip)			do { } while (0)
 #endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ed4c34e..fd86dfd5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -41,7 +41,7 @@
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, true);		\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 47c3379..ac2ac40 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -25,7 +25,7 @@
 
 #ifdef CONFIG_DEPT
 #define DEPT_EVT_ALL		((1UL << DEPT_MAX_SUBCLASSES_EVT) - 1)
-#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0)
+#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0, false)
 #define dept_seq_writebegin(m, ip)				\
 do {								\
 	dept_ecxt_enter(m, 1UL, ip, __func__, "write_seqcount_end", 0);\
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 191fb99..a78aaa3 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -96,14 +96,14 @@
 #endif
 
 #ifdef CONFIG_DEPT
-#define dept_spin_lock(m, ne, t, n, e_fn, ip)				\
+#define dept_spin_lock(m, ne, t, n, e_fn, ip, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, s);			\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
@@ -112,8 +112,8 @@
 	dept_ecxt_exit(m, 1UL, ip);					\
 } while (0)
 #else
-#define dept_spin_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
-#define dept_spin_unlock(m, ip)			do { } while (0)
+#define dept_spin_lock(m, ne, t, n, e_fn, ip, s)	do { } while (0)
+#define dept_spin_unlock(m, ip)				do { } while (0)
 #endif
 
 #ifdef CONFIG_DEBUG_SPINLOCK
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 2bc6259..14dc33b 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1425,6 +1425,13 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 	struct dept_dep *d;
 	int i;
 
+	/*
+	 * It's meaningless to track dependencies between sleeps and
+	 * events triggered within __schedule().
+	 */
+	if (e->in_sched && w->sleep)
+		return;
+
 	if (lookup_dep(fc, tc))
 		return;
 
@@ -1469,7 +1476,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 static atomic_t wgen = ATOMIC_INIT(1);
 
 static void add_wait(struct dept_class *c, unsigned long ip,
-		     const char *w_fn, int ne)
+		     const char *w_fn, int ne, bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	struct dept_wait *w;
@@ -1485,6 +1492,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	w->wait_ip = ip;
 	w->wait_fn = w_fn;
 	w->wait_stack = get_current_stack();
+	w->sleep = sleep;
 
 	cxt = cur_cxt();
 	if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
@@ -1538,6 +1546,7 @@ static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
 	e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
 	e->event_fn = e_fn;
 	e->ecxt_fn = c_fn;
+	e->in_sched = dt->in_sched;
 
 	eh = dt->ecxt_held + (dt->ecxt_held_pos++);
 	eh->ecxt = get_ecxt(e);
@@ -1906,6 +1915,16 @@ void dept_hardirq_enter(void)
 	dt->cxt_id[DEPT_CXT_HIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
+void dept_sched_enter(void)
+{
+	dept_task()->in_sched = true;
+}
+
+void dept_sched_exit(void)
+{
+	dept_task()->in_sched = false;
+}
+
 /*
  * DEPT API
  * =====================================================================
@@ -2119,7 +2138,8 @@ static struct dept_class *check_new_class(struct dept_key *local,
 }
 
 static void __dept_wait(struct dept_map *m, unsigned long w_f,
-			unsigned long ip, const char *w_fn, int ne)
+			unsigned long ip, const char *w_fn, int ne,
+			bool sleep)
 {
 	int e;
 
@@ -2142,12 +2162,12 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
 		if (!c)
 			continue;
 
-		add_wait(c, ip, w_fn, ne);
+		add_wait(c, ip, w_fn, ne, sleep);
 	}
 }
 
 void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
-	       const char *w_fn, int ne)
+	       const char *w_fn, int ne, bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	unsigned long flags;
@@ -2163,7 +2183,7 @@ void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
 
 	flags = dept_enter();
 
-	__dept_wait(m, w_f, ip, w_fn, ne);
+	__dept_wait(m, w_f, ip, w_fn, ne, sleep);
 
 	dept_exit(flags);
 }
@@ -2296,7 +2316,7 @@ void dept_ask_event_wait_commit(unsigned long ip)
 	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
 	WRITE_ONCE(m->wgen, wg);
 
-	__dept_wait(m, w_f, ip, w_fn, ne);
+	__dept_wait(m, w_f, ip, w_fn, ne, true);
 exit:
 	dept_exit(flags);
 }
@@ -2526,7 +2546,8 @@ void dept_split_map_common_init(struct dept_map_common *mc,
 
 void dept_wait_split_map(struct dept_map_each *me,
 			 struct dept_map_common *mc,
-			 unsigned long ip, const char *w_fn, int ne)
+			 unsigned long ip, const char *w_fn, int ne,
+			 bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	struct dept_class *c;
@@ -2547,7 +2568,7 @@ void dept_wait_split_map(struct dept_map_each *me,
 	k = mc->keys ?: &mc->keys_local;
 	c = check_new_class(&mc->keys_local, k, 0, 0UL, mc->name);
 	if (c)
-		add_wait(c, ip, w_fn, ne);
+		add_wait(c, ip, w_fn, ne, sleep);
 
 	dept_exit(flags);
 }
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 48a19ed..2e1d0e5 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -51,7 +51,7 @@ static __always_inline void __rt_spin_lock(spinlock_t *lock)
 
 void __sched rt_spin_lock(spinlock_t *lock)
 {
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	rt_spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock);
@@ -59,7 +59,7 @@ void __sched rt_spin_lock(spinlock_t *lock)
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __sched rt_spin_lock_nested(spinlock_t *lock, int subclass)
 {
-	spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+	rt_spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock_nested);
@@ -67,7 +67,7 @@ void __sched rt_spin_lock_nested(spinlock_t *lock, int subclass)
 void __sched rt_spin_lock_nest_lock(spinlock_t *lock,
 				    struct lockdep_map *nest_lock)
 {
-	spin_acquire_nest(&lock->dep_map, 0, 0, nest_lock, _RET_IP_);
+	rt_spin_acquire_nest(&lock->dep_map, 0, 0, nest_lock, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock_nest_lock);
@@ -75,7 +75,7 @@ void __sched rt_spin_lock_nest_lock(spinlock_t *lock,
 
 void __sched rt_spin_unlock(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, _RET_IP_);
+	rt_spin_release(&lock->dep_map, _RET_IP_);
 	migrate_enable();
 	rcu_read_unlock();
 
@@ -104,7 +104,7 @@ static __always_inline int __rt_spin_trylock(spinlock_t *lock)
 		ret = rt_mutex_slowtrylock(&lock->lock);
 
 	if (ret) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		rt_spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -197,7 +197,7 @@ int __sched rt_read_trylock(rwlock_t *rwlock)
 
 	ret = rwbase_read_trylock(&rwlock->rwbase);
 	if (ret) {
-		rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_);
+		rt_rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -211,7 +211,7 @@ int __sched rt_write_trylock(rwlock_t *rwlock)
 
 	ret = rwbase_write_trylock(&rwlock->rwbase);
 	if (ret) {
-		rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
+		rt_rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -222,7 +222,7 @@ int __sched rt_write_trylock(rwlock_t *rwlock)
 void __sched rt_read_lock(rwlock_t *rwlock)
 {
 	rtlock_might_resched();
-	rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_);
+	rt_rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_);
 	rwbase_read_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -232,7 +232,7 @@ void __sched rt_read_lock(rwlock_t *rwlock)
 void __sched rt_write_lock(rwlock_t *rwlock)
 {
 	rtlock_might_resched();
-	rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
+	rt_rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
 	rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -243,7 +243,7 @@ void __sched rt_write_lock(rwlock_t *rwlock)
 void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass)
 {
 	rtlock_might_resched();
-	rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_);
+	rt_rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_);
 	rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -253,7 +253,7 @@ void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass)
 
 void __sched rt_read_unlock(rwlock_t *rwlock)
 {
-	rwlock_release(&rwlock->dep_map, _RET_IP_);
+	rt_rwlock_release(&rwlock->dep_map, _RET_IP_);
 	migrate_enable();
 	rcu_read_unlock();
 	rwbase_read_unlock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
@@ -262,7 +262,7 @@ void __sched rt_read_unlock(rwlock_t *rwlock)
 
 void __sched rt_write_unlock(rwlock_t *rwlock)
 {
-	rwlock_release(&rwlock->dep_map, _RET_IP_);
+	rt_rwlock_release(&rwlock->dep_map, _RET_IP_);
 	rcu_read_unlock();
 	migrate_enable();
 	rwbase_write_unlock(&rwlock->rwbase);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5784b07..cb42f52 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6272,6 +6272,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	struct rq *rq;
 	int cpu;
 
+	dept_sched_enter();
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
@@ -6401,6 +6402,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
 	}
+	dept_sched_exit();
 }
 
 void __noreturn do_task_dead(void)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 20/21] dept: Do not add dependencies between events within scheduler and sleeps
@ 2022-05-04  8:17   ` Byungchul Park
  0 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: hamohammed.sa, jack, peterz, daniel.vetter, amir73il, david,
	dri-devel, chris, bfields, linux-ide, adilger.kernel, joel,
	42.hyeyoo, cl, will, duyuyang, sashal, paolo.valente,
	damien.lemoal, willy, hch, airlied, mingo, djwong, vdavydov.dev,
	rientjes, dennis, linux-ext4, linux-mm, ngupta, johannes.berg,
	jack, dan.j.williams, josef, rostedt, linux-block, linux-fsdevel,
	jglisse, viro, tglx, mhocko, vbabka, melissa.srw, sj, tytso,
	rodrigosiqueiramelo, kernel-team, gregkh, jlayton, linux-kernel,
	penberg, minchan, hannes, tj, akpm

A sleep is not a wait that prevents the events within __schedule();
rather, it goes through __schedule(), so all such events can still be
triggered while sleeping. Hence there are no dependencies between a
sleep and those events.

So distinguish the sleep type of wait from the other type, i.e.
spinning, and skip building dependencies between sleep-type waits and
the events triggered within __schedule().

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h   |  2 +-
 include/linux/dept.h         | 28 ++++++++++++++++++++----
 include/linux/dept_page.h    |  4 ++--
 include/linux/dept_sdt.h     |  9 +++++++-
 include/linux/lockdep.h      | 52 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/mutex.h        |  2 +-
 include/linux/rwlock.h       | 12 +++++-----
 include/linux/rwsem.h        |  2 +-
 include/linux/seqlock.h      |  2 +-
 include/linux/spinlock.h     |  8 +++----
 kernel/dependency/dept.c     | 37 ++++++++++++++++++++++++-------
 kernel/locking/spinlock_rt.c | 24 ++++++++++----------
 kernel/sched/core.c          |  2 ++
 13 files changed, 138 insertions(+), 46 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 358c656..2dade27 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -36,7 +36,7 @@ struct completion {
 #define dept_wfc_wait(m, ip)						\
 do {									\
 	dept_ask_event(m);						\
-	dept_wait(m, 1UL, ip, __func__, 0);				\
+	dept_wait(m, 1UL, ip, __func__, 0, true);			\
 } while (0)
 #define dept_wfc_complete(m, ip)		dept_event(m, 1UL, ip, __func__)
 #define dept_wfc_enter(m, ip)			dept_ecxt_enter(m, 1UL, ip, "completion_context_enter", "complete", 0)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 3027121..28db897 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -170,6 +170,11 @@ struct dept_ecxt {
 	 */
 	unsigned long			event_ip;
 	struct dept_stack		*event_stack;
+
+	/*
+	 * whether the event is triggered within __schedule()
+	 */
+	bool				in_sched;
 };
 
 struct dept_wait {
@@ -208,6 +213,11 @@ struct dept_wait {
 	 */
 	unsigned long			wait_ip;
 	struct dept_stack		*wait_stack;
+
+	/*
+	 * spin or sleep
+	 */
+	bool				sleep;
 };
 
 struct dept_dep {
@@ -460,6 +470,11 @@ struct dept_task {
 	 */
 	bool				hardirqs_enabled;
 	bool				softirqs_enabled;
+
+	/*
+	 * whether the current task is in __schedule()
+	 */
+	bool				in_sched;
 };
 
 #define DEPT_TASK_INITIALIZER(t)				\
@@ -480,6 +495,7 @@ struct dept_task {
 	.missing_ecxt = 0,					\
 	.hardirqs_enabled = false,				\
 	.softirqs_enabled = false,				\
+	.in_sched = false,					\
 }
 
 extern void dept_on(void);
@@ -492,7 +508,7 @@ struct dept_task {
 extern void dept_map_reinit(struct dept_map *m);
 extern void dept_map_nocheck(struct dept_map *m);
 
-extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne, bool sleep);
 extern void dept_stage_wait(struct dept_map *m, unsigned long w_f, const char *w_fn, int ne);
 extern void dept_ask_event_wait_commit(unsigned long ip);
 extern void dept_clean_stage(void);
@@ -502,11 +518,13 @@ struct dept_task {
 extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
 extern void dept_split_map_each_init(struct dept_map_each *me);
 extern void dept_split_map_common_init(struct dept_map_common *mc, struct dept_key *k, const char *n);
-extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
+extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne, bool sleep);
 extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
 extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
 extern void dept_kernel_enter(void);
 extern void dept_work_enter(void);
+extern void dept_sched_enter(void);
+extern void dept_sched_exit(void);
 
 static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 {
@@ -546,7 +564,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_map_reinit(m)				do { } while (0)
 #define dept_map_nocheck(m)				do { } while (0)
 
-#define dept_wait(m, w_f, ip, w_fn, ne)			do { (void)(w_fn); } while (0)
+#define dept_wait(m, w_f, ip, w_fn, ne, s)		do { (void)(w_fn); } while (0)
 #define dept_stage_wait(m, w_f, w_fn, ne)		do { (void)(w_fn); } while (0)
 #define dept_ask_event_wait_commit(ip)			do { } while (0)
 #define dept_clean_stage()				do { } while (0)
@@ -556,11 +574,13 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 #define dept_ecxt_exit(m, e_f, ip)			do { } while (0)
 #define dept_split_map_each_init(me)			do { } while (0)
 #define dept_split_map_common_init(mc, k, n)		do { (void)(n); (void)(k); } while (0)
-#define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
+#define dept_wait_split_map(me, mc, ip, w_fn, ne, s)	do { } while (0)
 #define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
 #define dept_ask_event_split_map(me, mc)		do { } while (0)
 #define dept_kernel_enter()				do { } while (0)
 #define dept_work_enter()				do { } while (0)
+#define dept_sched_enter()				do { } while (0)
+#define dept_sched_exit()				do { } while (0)
 #define dept_ecxt_enter_nokeep(m)			do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
diff --git a/include/linux/dept_page.h b/include/linux/dept_page.h
index d2d093d..4af3b2d 100644
--- a/include/linux/dept_page.h
+++ b/include/linux/dept_page.h
@@ -20,7 +20,7 @@
 								\
 	if (likely(me))						\
 		dept_wait_split_map(me, &pglocked_mc, _RET_IP_, \
-				    __func__, 0);		\
+				    __func__, 0, true);		\
 } while (0)
 
 #define dept_pglocked_set_bit(f)				\
@@ -46,7 +46,7 @@
 								\
 	if (likely(me))						\
 		dept_wait_split_map(me, &pgwriteback_mc, _RET_IP_,\
-				    __func__, 0);		\
+				    __func__, 0, true);		\
 } while (0)
 
 #define dept_pgwriteback_set_bit(f)				\
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
index 49763cd..14a1720 100644
--- a/include/linux/dept_sdt.h
+++ b/include/linux/dept_sdt.h
@@ -29,7 +29,13 @@
 #define sdt_wait(m)							\
 	do {								\
 		dept_ask_event(m);					\
-		dept_wait(m, 1UL, _THIS_IP_, "wait", 0);		\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0, true);		\
+	} while (0)
+
+#define sdt_wait_spin(m)						\
+	do {								\
+		dept_ask_event(m);					\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0, false);		\
 	} while (0)
 /*
  * This will be committed in __schedule() when it actually gets to
@@ -47,6 +53,7 @@
 #define sdt_map_init(m)			do { } while (0)
 #define sdt_map_init_key(m, k)		do { (void)(k); } while (0)
 #define sdt_wait(m)			do { } while (0)
+#define sdt_wait_spin(m)		do { } while (0)
 #define sdt_wait_prepare(m)		do { } while (0)
 #define sdt_wait_finish()		do { } while (0)
 #define sdt_ecxt_enter(m)		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0e097f..b2119f4 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -575,12 +575,12 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_acquire(l, s, t, i)					\
 do {									\
 	lock_acquire_exclusive(l, s, t, NULL, i);			\
-	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i);	\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i, false);\
 } while (0)
 #define spin_acquire_nest(l, s, t, n, i)				\
 do {									\
 	lock_acquire_exclusive(l, s, t, n, i);				\
-	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i); \
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i, false); \
 } while (0)
 #define spin_release(l, i)						\
 do {									\
@@ -591,16 +591,16 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define rwlock_acquire(l, s, t, i)					\
 do {									\
 	lock_acquire_exclusive(l, s, t, NULL, i);			\
-	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i);	\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i, false);\
 } while (0)
 #define rwlock_acquire_read(l, s, t, i)					\
 do {									\
 	if (read_lock_is_recursive()) {				\
 		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
-		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0);\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0, false);\
 	} else {							\
 		lock_acquire_shared(l, s, t, NULL, i);			\
-		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1);\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1, false);\
 	}								\
 } while (0)
 #define rwlock_release(l, i)						\
@@ -614,6 +614,48 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_rwlock_runlock(&(l)->dmap, i);				\
 } while (0)
 
+#define rt_spin_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i, true);	\
+} while (0)
+#define rt_spin_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i, true);\
+} while (0)
+#define rt_spin_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_spin_unlock(&(l)->dmap, i);				\
+} while (0)
+
+#define rt_rwlock_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i, true);\
+} while (0)
+#define rt_rwlock_acquire_read(l, s, t, i)					\
+do {									\
+	if (read_lock_is_recursive()) {				\
+		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0, true);\
+	} else {							\
+		lock_acquire_shared(l, s, t, NULL, i);			\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1, true);\
+	}								\
+} while (0)
+#define rt_rwlock_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_wunlock(&(l)->dmap, i);				\
+} while (0)
+#define rt_rwlock_release_read(l, i)					\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_runlock(&(l)->dmap, i);				\
+} while (0)
+
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define seqcount_release(l, i)			lock_release(l, i)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b699cf41..e98a912 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -84,7 +84,7 @@ struct mutex {
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, true);		\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index bbab144..68a083d 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -33,25 +33,25 @@
 #define DEPT_EVT_RWLOCK_W		(1UL << 1)
 #define DEPT_EVT_RWLOCK_RW		(DEPT_EVT_RWLOCK_R | DEPT_EVT_RWLOCK_W)
 
-#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)			\
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne);	\
+		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne, s);	\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
 	}								\
 } while (0)
-#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)			\
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne);\
+		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne, s);\
 		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
 	}								\
 } while (0)
@@ -64,8 +64,8 @@
 	dept_ecxt_exit(m, DEPT_EVT_RWLOCK_R, ip);			\
 } while (0)
 #else
-#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)	do { } while (0)
-#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)	do { } while (0)
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip, s)	do { } while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q, s)	do { } while (0)
 #define dept_rwlock_wunlock(m, ip)			do { } while (0)
 #define dept_rwlock_runlock(m, ip)			do { } while (0)
 #endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ed4c34e..fd86dfd5 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -41,7 +41,7 @@
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, true);		\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 47c3379..ac2ac40 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -25,7 +25,7 @@
 
 #ifdef CONFIG_DEPT
 #define DEPT_EVT_ALL		((1UL << DEPT_MAX_SUBCLASSES_EVT) - 1)
-#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0)
+#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0, false)
 #define dept_seq_writebegin(m, ip)				\
 do {								\
 	dept_ecxt_enter(m, 1UL, ip, __func__, "write_seqcount_end", 0);\
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 191fb99..a78aaa3 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -96,14 +96,14 @@
 #endif
 
 #ifdef CONFIG_DEPT
-#define dept_spin_lock(m, ne, t, n, e_fn, ip)				\
+#define dept_spin_lock(m, ne, t, n, e_fn, ip, s)			\
 do {									\
 	if (t) {							\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	} else if (n) {							\
 		dept_ecxt_enter_nokeep(m);				\
 	} else {							\
-		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_wait(m, 1UL, ip, __func__, ne, s);			\
 		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
 	}								\
 } while (0)
@@ -112,8 +112,8 @@
 	dept_ecxt_exit(m, 1UL, ip);					\
 } while (0)
 #else
-#define dept_spin_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
-#define dept_spin_unlock(m, ip)			do { } while (0)
+#define dept_spin_lock(m, ne, t, n, e_fn, ip, s)	do { } while (0)
+#define dept_spin_unlock(m, ip)				do { } while (0)
 #endif
 
 #ifdef CONFIG_DEBUG_SPINLOCK
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 2bc6259..14dc33b 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1425,6 +1425,13 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 	struct dept_dep *d;
 	int i;
 
+	/*
+	 * It's meaningless to track dependencies between sleeps and
+	 * events triggered within __schedule().
+	 */
+	if (e->in_sched && w->sleep)
+		return;
+
 	if (lookup_dep(fc, tc))
 		return;
 
@@ -1469,7 +1476,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 static atomic_t wgen = ATOMIC_INIT(1);
 
 static void add_wait(struct dept_class *c, unsigned long ip,
-		     const char *w_fn, int ne)
+		     const char *w_fn, int ne, bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	struct dept_wait *w;
@@ -1485,6 +1492,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	w->wait_ip = ip;
 	w->wait_fn = w_fn;
 	w->wait_stack = get_current_stack();
+	w->sleep = sleep;
 
 	cxt = cur_cxt();
 	if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
@@ -1538,6 +1546,7 @@ static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
 	e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
 	e->event_fn = e_fn;
 	e->ecxt_fn = c_fn;
+	e->in_sched = dt->in_sched;
 
 	eh = dt->ecxt_held + (dt->ecxt_held_pos++);
 	eh->ecxt = get_ecxt(e);
@@ -1906,6 +1915,16 @@ void dept_hardirq_enter(void)
 	dt->cxt_id[DEPT_CXT_HIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
+void dept_sched_enter(void)
+{
+	dept_task()->in_sched = true;
+}
+
+void dept_sched_exit(void)
+{
+	dept_task()->in_sched = false;
+}
+
 /*
  * DEPT API
  * =====================================================================
@@ -2119,7 +2138,8 @@ static struct dept_class *check_new_class(struct dept_key *local,
 }
 
 static void __dept_wait(struct dept_map *m, unsigned long w_f,
-			unsigned long ip, const char *w_fn, int ne)
+			unsigned long ip, const char *w_fn, int ne,
+			bool sleep)
 {
 	int e;
 
@@ -2142,12 +2162,12 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
 		if (!c)
 			continue;
 
-		add_wait(c, ip, w_fn, ne);
+		add_wait(c, ip, w_fn, ne, sleep);
 	}
 }
 
 void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
-	       const char *w_fn, int ne)
+	       const char *w_fn, int ne, bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	unsigned long flags;
@@ -2163,7 +2183,7 @@ void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
 
 	flags = dept_enter();
 
-	__dept_wait(m, w_f, ip, w_fn, ne);
+	__dept_wait(m, w_f, ip, w_fn, ne, sleep);
 
 	dept_exit(flags);
 }
@@ -2296,7 +2316,7 @@ void dept_ask_event_wait_commit(unsigned long ip)
 	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
 	WRITE_ONCE(m->wgen, wg);
 
-	__dept_wait(m, w_f, ip, w_fn, ne);
+	__dept_wait(m, w_f, ip, w_fn, ne, true);
 exit:
 	dept_exit(flags);
 }
@@ -2526,7 +2546,8 @@ void dept_split_map_common_init(struct dept_map_common *mc,
 
 void dept_wait_split_map(struct dept_map_each *me,
 			 struct dept_map_common *mc,
-			 unsigned long ip, const char *w_fn, int ne)
+			 unsigned long ip, const char *w_fn, int ne,
+			 bool sleep)
 {
 	struct dept_task *dt = dept_task();
 	struct dept_class *c;
@@ -2547,7 +2568,7 @@ void dept_wait_split_map(struct dept_map_each *me,
 	k = mc->keys ?: &mc->keys_local;
 	c = check_new_class(&mc->keys_local, k, 0, 0UL, mc->name);
 	if (c)
-		add_wait(c, ip, w_fn, ne);
+		add_wait(c, ip, w_fn, ne, sleep);
 
 	dept_exit(flags);
 }
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 48a19ed..2e1d0e5 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -51,7 +51,7 @@ static __always_inline void __rt_spin_lock(spinlock_t *lock)
 
 void __sched rt_spin_lock(spinlock_t *lock)
 {
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	rt_spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock);
@@ -59,7 +59,7 @@ void __sched rt_spin_lock(spinlock_t *lock)
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __sched rt_spin_lock_nested(spinlock_t *lock, int subclass)
 {
-	spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+	rt_spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock_nested);
@@ -67,7 +67,7 @@ void __sched rt_spin_lock_nested(spinlock_t *lock, int subclass)
 void __sched rt_spin_lock_nest_lock(spinlock_t *lock,
 				    struct lockdep_map *nest_lock)
 {
-	spin_acquire_nest(&lock->dep_map, 0, 0, nest_lock, _RET_IP_);
+	rt_spin_acquire_nest(&lock->dep_map, 0, 0, nest_lock, _RET_IP_);
 	__rt_spin_lock(lock);
 }
 EXPORT_SYMBOL(rt_spin_lock_nest_lock);
@@ -75,7 +75,7 @@ void __sched rt_spin_lock_nest_lock(spinlock_t *lock,
 
 void __sched rt_spin_unlock(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, _RET_IP_);
+	rt_spin_release(&lock->dep_map, _RET_IP_);
 	migrate_enable();
 	rcu_read_unlock();
 
@@ -104,7 +104,7 @@ static __always_inline int __rt_spin_trylock(spinlock_t *lock)
 		ret = rt_mutex_slowtrylock(&lock->lock);
 
 	if (ret) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		rt_spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -197,7 +197,7 @@ int __sched rt_read_trylock(rwlock_t *rwlock)
 
 	ret = rwbase_read_trylock(&rwlock->rwbase);
 	if (ret) {
-		rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_);
+		rt_rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -211,7 +211,7 @@ int __sched rt_write_trylock(rwlock_t *rwlock)
 
 	ret = rwbase_write_trylock(&rwlock->rwbase);
 	if (ret) {
-		rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
+		rt_rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
 		rcu_read_lock();
 		migrate_disable();
 	}
@@ -222,7 +222,7 @@ int __sched rt_write_trylock(rwlock_t *rwlock)
 void __sched rt_read_lock(rwlock_t *rwlock)
 {
 	rtlock_might_resched();
-	rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_);
+	rt_rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_);
 	rwbase_read_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -232,7 +232,7 @@ void __sched rt_read_lock(rwlock_t *rwlock)
 void __sched rt_write_lock(rwlock_t *rwlock)
 {
 	rtlock_might_resched();
-	rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
+	rt_rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
 	rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -243,7 +243,7 @@ void __sched rt_write_lock(rwlock_t *rwlock)
 void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass)
 {
 	rtlock_might_resched();
-	rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_);
+	rt_rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_);
 	rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
 	rcu_read_lock();
 	migrate_disable();
@@ -253,7 +253,7 @@ void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass)
 
 void __sched rt_read_unlock(rwlock_t *rwlock)
 {
-	rwlock_release(&rwlock->dep_map, _RET_IP_);
+	rt_rwlock_release(&rwlock->dep_map, _RET_IP_);
 	migrate_enable();
 	rcu_read_unlock();
 	rwbase_read_unlock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
@@ -262,7 +262,7 @@ void __sched rt_read_unlock(rwlock_t *rwlock)
 
 void __sched rt_write_unlock(rwlock_t *rwlock)
 {
-	rwlock_release(&rwlock->dep_map, _RET_IP_);
+	rt_rwlock_release(&rwlock->dep_map, _RET_IP_);
 	rcu_read_unlock();
 	migrate_enable();
 	rwbase_write_unlock(&rwlock->rwbase);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5784b07..cb42f52 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6272,6 +6272,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	struct rq *rq;
 	int cpu;
 
+	dept_sched_enter();
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
@@ -6401,6 +6402,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
 	}
+	dept_sched_exit();
 }
 
 void __noreturn do_task_dead(void)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH RFC v6 21/21] dept: Unstage wait when tagging a normal sleep wait
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04  8:17   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-04  8:17 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Staging a wait and committing it have been introduced to handle
conditional sleeps, where whether the task actually goes to sleep or
not can only be determined in __schedule(). With this feature, actual
wait tagging is delayed until __schedule().

Unfortunately, an ambiguity arises when a normal sleep wait that
doesn't require staging and commit gets involved in the middle of
handling a conditional sleep, e.g. between prepare_to_wait_*() and
__schedule(), which is a very rare case though.

So let Dept give up handling the conditional sleep by unconditionally
unstaging it when a normal sleep wait gets involved, to avoid the
ambiguity.
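
To illustrate, here is a rough sketch of the interleaving in question;
the wait and condition code below is hypothetical, not taken from the
tree:

   prepare_to_wait_event(&wq, &w, TASK_UNINTERRUPTIBLE);
                               /* the wait gets staged here */
   if (!done) {
           mutex_lock(&m);     /* a normal sleep wait inside the
                                * 'condition' may hit __schedule()
                                * before the staged wait commits */
           done = my_check();  /* hypothetical condition check */
           mutex_unlock(&m);
           schedule();         /* would commit the staged wait */
   }
   finish_wait(&wq, &w);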

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/dependency/dept.c | 55 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 14dc33b..ce6d5b3 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2166,6 +2166,21 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
 	}
 }
 
+static inline void stage_map(struct dept_task *dt, struct dept_map *m)
+{
+	dt->stage_m = m;
+}
+
+static inline void unstage_map(struct dept_task *dt)
+{
+	dt->stage_m = NULL;
+}
+
+static inline struct dept_map *staged_map(struct dept_task *dt)
+{
+	return dt->stage_m;
+}
+
 void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
 	       const char *w_fn, int ne, bool sleep)
 {
@@ -2183,27 +2198,24 @@ void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
 
 	flags = dept_enter();
 
+	/*
+	 * There's no way to distinguish between a staged wait and this
+	 * one, in the middle of handling a wait that requires staging
+	 * and commit in __schedule().
+	 *
+	 * The wait that has been tagged by dept_wait() with sleep == true
+	 * should ignore the staged wait in __schedule() if it exists,
+	 * to avoid the ambiguity. It can be done by unstaging it.
+	 */
+	if (sleep)
+		unstage_map(dt);
+
 	__dept_wait(m, w_f, ip, w_fn, ne, sleep);
 
 	dept_exit(flags);
 }
 EXPORT_SYMBOL_GPL(dept_wait);
 
-static inline void stage_map(struct dept_task *dt, struct dept_map *m)
-{
-	dt->stage_m = m;
-}
-
-static inline void unstage_map(struct dept_task *dt)
-{
-	dt->stage_m = NULL;
-}
-
-static inline struct dept_map *staged_map(struct dept_task *dt)
-{
-	return dt->stage_m;
-}
-
 void dept_stage_wait(struct dept_map *m, unsigned long w_f,
 		     const char *w_fn, int ne)
 {
@@ -2565,6 +2577,19 @@ void dept_wait_split_map(struct dept_map_each *me,
 
 	flags = dept_enter();
 
+	/*
+	 * There's no way to distinguish between a staged wait and this
+	 * one, in the middle of handling a wait that requires staging
+	 * and commit in __schedule().
+	 *
+	 * The wait that has been tagged by dept_wait_split_map() with
+	 * sleep == true should ignore the staged wait in __schedule()
+	 * if it exists, to avoid the ambiguity. It can be done by
+	 * unstaging it.
+	 */
+	if (sleep)
+		unstage_map(dt);
+
 	k = mc->keys ?: &mc->keys_local;
 	c = check_new_class(&mc->keys_local, k, 0, 0UL, mc->name);
 	if (c)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 16/21] dept: Distinguish each work from another
  2022-05-04  8:17   ` Byungchul Park
@ 2022-05-04 11:23     ` Sergey Shtylyov
  -1 siblings, 0 replies; 105+ messages in thread
From: Sergey Shtylyov @ 2022-05-04 11:23 UTC (permalink / raw)
  To: Byungchul Park, torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

Hello!

On 5/4/22 11:17 AM, Byungchul Park wrote:

> Workqueue already provides concurrency control. By that, any wait in a
> work doesn't prevent events in other works with the control enabled.
> Thus, each work would better be considered a different context.
> 
> So let Dept assign a different context id to each work.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[...]
> diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
> index 18e5951..6707313 100644
> --- a/kernel/dependency/dept.c
> +++ b/kernel/dependency/dept.c
> @@ -1844,6 +1844,16 @@ void dept_enirq_transition(unsigned long ip)
>  	dept_exit(flags);
>  }
>  
> +/*
> + * Assign a different context id to each work.
> + */
> +void dept_work_enter(void)
> +{
> +	struct dept_task *dt = dept_task();
> +
> +	dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);

   Parens around << unnecessary...

[...]

MBR, Sergey

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 02/21] dept: Implement Dept(Dependency Tracker)
  2022-05-04  8:17   ` Byungchul Park
  (?)
@ 2022-05-04 13:29   ` kernel test robot
  -1 siblings, 0 replies; 105+ messages in thread
From: kernel test robot @ 2022-05-04 13:29 UTC (permalink / raw)
  To: Byungchul Park; +Cc: llvm, kbuild-all

Hi Byungchul,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on tip/sched/core]
[also build test WARNING on linux/master linus/master v5.18-rc5]
[cannot apply to tip/locking/core hnaz-mm/master next-20220504]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Byungchul-Park/DEPT-Dependency-Tracker/20220504-165133
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 1a90bfd220201fbe050dfc15deaac20ca5f15638
config: x86_64-randconfig-a011-20220502 (https://download.01.org/0day-ci/archive/20220504/202205042158.8iGx4fJj-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 363b3a645a1e30011cc8da624f13dac5fd915628)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/8ab7541af4844f7183b42fa0bdab884e87c2b1b8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Byungchul-Park/DEPT-Dependency-Tracker/20220504-165133
        git checkout 8ab7541af4844f7183b42fa0bdab884e87c2b1b8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/locking/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/locking/lockdep.c:4247: warning: expecting prototype for lockdep_hardirqs_on_prepare(). Prototype was for __lockdep_hardirqs_on_prepare() instead
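
The warning is about the kernel-doc comment above the function still
carrying the old name after the rename; a likely minimal fix (a sketch,
not something suggested by the robot) would be updating the kernel-doc
header to match:

 /**
- * lockdep_hardirqs_on_prepare - Prepare for enabling interrupts
+ * __lockdep_hardirqs_on_prepare - Prepare for enabling interrupts
  * @ip:		Caller address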


vim +4247 kernel/locking/lockdep.c

dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4236  
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4237  /**
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4238   * lockdep_hardirqs_on_prepare - Prepare for enabling interrupts
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4239   * @ip:		Caller address
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4240   *
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4241   * Invoked before a possible transition to RCU idle from exit to user or
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4242   * guest mode. This ensures that all RCU operations are done before RCU
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4243   * stops watching. After the RCU transition lockdep_hardirqs_on() has to be
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4244   * invoked to set the final state.
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4245   */
8ab7541af4844f kernel/locking/lockdep.c Byungchul Park 2022-05-04  4246  void __lockdep_hardirqs_on_prepare(unsigned long ip)
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21 @4247  {
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4248  	if (unlikely(!debug_locks))
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4249  		return;
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4250  
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4251  	/*
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4252  	 * NMIs do not (and cannot) track lock dependencies, nothing to do.
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4253  	 */
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4254  	if (unlikely(in_nmi()))
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4255  		return;
859d069ee1ddd8 kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4256  
f8e48a3dca060e kernel/locking/lockdep.c Peter Zijlstra 2020-10-22  4257  	if (unlikely(this_cpu_read(lockdep_recursion)))
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4258  		return;
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4259  
f9ad4a5f3f20be kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4260  	if (unlikely(lockdep_hardirqs_enabled())) {
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4261  		/*
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4262  		 * Neither irq nor preemption are disabled here
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4263  		 * so this is racy by nature but losing one hit
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4264  		 * in a stat is not a big deal.
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4265  		 */
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4266  		__debug_atomic_inc(redundant_hardirqs_on);
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4267  		return;
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4268  	}
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4269  
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4270  	/*
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4271  	 * We're enabling irqs and according to our state above irqs weren't
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4272  	 * already enabled, yet we find the hardware thinks they are in fact
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4273  	 * enabled.. someone messed up their IRQ state tracing.
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4274  	 */
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4275  	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4276  		return;
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4277  
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4278  	/*
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4279  	 * See the fine text that goes along with this variable definition.
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4280  	 */
d671002be6bdd7 kernel/locking/lockdep.c zhengbin       2019-04-29  4281  	if (DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled))
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4282  		return;
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4283  
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4284  	/*
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4285  	 * Can't allow enabling interrupts while in an interrupt handler,
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4286  	 * that's general bad form and such. Recursion, limited stack etc..
0119fee449f501 kernel/lockdep.c         Peter Zijlstra 2011-09-02  4287  	 */
f9ad4a5f3f20be kernel/locking/lockdep.c Peter Zijlstra 2020-05-27  4288  	if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()))
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4289  		return;
7d36b26be0f3c6 kernel/lockdep.c         Peter Zijlstra 2011-07-26  4290  
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4291  	current->hardirq_chain_key = current->curr_chain_key;
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4292  
4d004099a668c4 kernel/locking/lockdep.c Peter Zijlstra 2020-10-02  4293  	lockdep_recursion_inc();
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4294  	__trace_hardirqs_on_caller();
10476e6304222c kernel/locking/lockdep.c Peter Zijlstra 2020-03-13  4295  	lockdep_recursion_finish();
dd4e5d3ac4a76b kernel/lockdep.c         Peter Zijlstra 2011-06-21  4296  }
8ab7541af4844f kernel/locking/lockdep.c Byungchul Park 2022-05-04  4297  EXPORT_SYMBOL_GPL(__lockdep_hardirqs_on_prepare);
c86e9b987cea3d kernel/locking/lockdep.c Peter Zijlstra 2020-03-18  4298  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-04 18:17   ` Linus Torvalds
  -1 siblings, 0 replies; 105+ messages in thread
From: Linus Torvalds @ 2022-05-04 18:17 UTC (permalink / raw)
  To: Byungchul Park
  Cc: hamohammed.sa, Jan Kara, Peter Zijlstra, Daniel Vetter,
	Amir Goldstein, Dave Chinner, dri-devel, Chris Wilson,
	J. Bruce Fields, linux-ide, Andreas Dilger, Joel Fernandes,
	42.hyeyoo, Christoph Lameter, Will Deacon, duyuyang, Sasha Levin,
	paolo.valente, Damien Le Moal, Matthew Wilcox, Christoph Hellwig,
	Dave Airlie, Ingo Molnar, Darrick J. Wong, Vladimir Davydov,
	David Rientjes, Dennis Zhou, Ext4 Developers List, Linux-MM,
	ngupta, johannes.berg, jack, Dan Williams, Josef Bacik,
	Steven Rostedt, linux-block, linux-fsdevel, Jerome Glisse,
	Al Viro, Thomas Gleixner, Michal Hocko, Vlastimil Babka,
	melissa.srw, sj, Theodore Ts'o, rodrigosiqueiramelo,
	kernel-team, Greg Kroah-Hartman, Jeff Layton,
	Linux Kernel Mailing List, Pekka Enberg, Minchan Kim,
	Johannes Weiner, Tejun Heo, Andrew Morton

On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
>
> Hi Linus and folks,
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchonization machanisms.

So what is the actual status of reports these days?

Last time I looked at some reports, it gave a lot of false positives
due to mis-understanding prepare_to_sleep().

For this all to make sense, it would need to not have false positives
(or at least a very small number of them together with a way to sanely
get rid of them), and also have a track record of finding things that
lockdep doesn't.

Maybe such reports have been sent out with the current situation, and
I haven't seen them.

                 Linus

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-04 18:17   ` Linus Torvalds
@ 2022-05-06  0:11     ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-06  0:11 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo

Linus wrote:
>
> On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> >
> > Hi Linus and folks,
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms.
> 
> So what is the actual status of reports these days?
> 
> Last time I looked at some reports, it gave a lot of false positives
> due to mis-understanding prepare_to_sleep().

Yes, it was. I handled the case in the following way:

1. Stage the wait at prepare_to_sleep(); the staged wait might be
   used at commit. It has yet to become an actual wait that Dept
   considers.
2. If the condition for sleep is true, the wait will be committed at
   __schedule(). The wait becomes an actual one that Dept considers.
3. If the condition is false and the task gets back to TASK_RUNNING,
   clean (i.e. reset) the staged wait.

That way, Dept only considers the waits through sleep that actually
hit __schedule().
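
A minimal sketch of how those three steps map onto a plain wait loop
(dept_stage_wait() and dept_ask_event_wait_commit() are hooks from the
series; the step numbers in the comments are my annotation):

   prepare_to_wait(&wq, &w, TASK_UNINTERRUPTIBLE);
                           /* 1. staged via dept_stage_wait() */
   if (!condition)
           schedule();     /* 2. committed in __schedule() via
                            *    dept_ask_event_wait_commit() */
   finish_wait(&wq, &w);   /* 3. if we never slept, the staged
                            *    wait is simply reset */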

> For this all to make sense, it would need to not have false positives
> (or at least a very small number of them together with a way to sanely

Yes. I agree with you. I got rid of them the way I described above.

> get rid of them), and also have a track record of finding things that
> lockdep doesn't.

I have some reports where wait_for_completion() or a waitqueue is
involved. It's worth noting those are not tracked by Lockdep. I'm
checking whether those are true positives or not. I will share those
reports once I'm more convinced about them.

> Maybe such reports have been sent out with the current situation, and
> I haven't seen them.

Dept reports usually have been sent to me privately, not on LKML. As I
told you, I'm planning to share them.

	Byungchul

> 
>                  Linus
> 

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-06  0:11     ` Byungchul Park
@ 2022-05-07  7:20       ` Hyeonggon Yoo
  -1 siblings, 0 replies; 105+ messages in thread
From: Hyeonggon Yoo @ 2022-05-07  7:20 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> Linus wrote:
> >
> > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> > >
> > > Hi Linus and folks,
> > >
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms.
> > 
> > So what is the actual status of reports these days?
> > 
> > Last time I looked at some reports, it gave a lot of false positives
> > due to mis-understanding prepare_to_sleep().
> 
> Yes, it was. I handled the case in the following way:
> 
> 1. Stage the wait at prepare_to_sleep(); the staged wait might be
>    used at commit. It has yet to become an actual wait that Dept
>    considers.
> 2. If the condition for sleep is true, the wait will be committed at
>    __schedule(). The wait becomes an actual one that Dept considers.
> 3. If the condition is false and the task gets back to TASK_RUNNING,
>    clean (i.e. reset) the staged wait.
> 
> That way, Dept only considers the waits through sleep that actually
> hit __schedule().
> 
> > For this all to make sense, it would need to not have false positives
> > (or at least a very small number of them together with a way to sanely
> 
> Yes. I agree with you. I got rid of them the way I described above.
>

IMHO DEPT should not report what lockdep allows (not talking about
wait events). I mean, lockdep allows some kinds of nested locks but
DEPT reports them.

When I was collecting reports from DEPT on various configurations,
most of them were reports of down_write_nested(), which is allowed by
lockdep.

DEPT should at least not report what we know is not a real deadlock.
Otherwise there will be reports that are never fixed, which is quite
unpleasant, and reporters cannot examine all of them to tell whether
each is a real deadlock or not.
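
For example, a pattern like the following (parent/child rwsems of the
same lock class; the naming is illustrative, not from a real report) is
explicitly permitted by lockdep via the subclass annotation, yet it was
what most of those reports flagged:

   down_write(&parent->rwsem);
   down_write_nested(&child->rwsem, SINGLE_DEPTH_NESTING);
   ...
   up_write(&child->rwsem);
   up_write(&parent->rwsem);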

> > get rid of them), and also have a track record of finding things that
> > lockdep doesn't.
> 
> I have some reports where wait_for_completion() or a waitqueue is
> involved. It's worth noting those are not tracked by Lockdep. I'm
> checking whether those are true positives or not. I will share those
> reports once I'm more convinced about them.
> 
> > Maybe such reports have been sent out with the current situation, and
> > I haven't seen them.
> 
> Dept reports usually have been sent to me privately, not on LKML. As I
> told you, I'm planning to share them.
> 
> 	Byungchul
> 
> > 
> >                  Linus
> > 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-07  7:20       ` Hyeonggon Yoo
@ 2022-05-09  0:16         ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-09  0:16 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Sat, May 07, 2022 at 04:20:50PM +0900, Hyeonggon Yoo wrote:
> On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> > Linus wrote:
> > >
> > > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> > > >
> > > > Hi Linus and folks,
> > > >
> > > > I've been developing a tool for detecting deadlock possibilities by
> > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > cover all synchonization machanisms.
> > > 
> > > So what is the actual status of reports these days?
> > > 
> > > Last time I looked at some reports, it gave a lot of false positives
> > > due to mis-understanding prepare_to_sleep().
> > 
> > Yes, it was. I handled the case in the following way:
> > 
> > 1. Stage the wait at prepare_to_sleep(); the staged wait might be
> >    used at commit. It has yet to become an actual wait that Dept
> >    considers.
> > 2. If the condition for sleep is true, the wait will be committed at
> >    __schedule(). The wait becomes an actual one that Dept considers.
> > 3. If the condition is false and the task gets back to TASK_RUNNING,
> >    clean (i.e. reset) the staged wait.
> > 
> > That way, Dept only considers the waits through sleep that actually
> > hit __schedule().
> > 
> > > For this all to make sense, it would need to not have false positives
> > > (or at least a very small number of them together with a way to sanely
> > 
> > Yes. I agree with you. I got rid of them the way I described above.
> >
> 
> IMHO DEPT should not report what lockdep allows (not talking about

No.

> wait events). I mean, lockdep allows some kinds of nested locks but
> DEPT reports them.

You have already asked exactly the same question in another LKML
thread. I answered it then, but let me explain it again.

---

CASE 1.

   lock L with depth n
   lock_nested L' with depth n + 1
   ...
   unlock L'
   unlock L

This case is allowed by Lockdep.
This case is allowed by DEPT because it's not a deadlock.

CASE 2.

   lock L with depth n
   lock A
   lock_nested L' with depth n + 1
   ...
   unlock L'
   unlock A
   unlock L

This case is allowed by Lockdep.
This case is *NOT* allowed by DEPT because it's a *DEADLOCK*.

---

The following scenario explains why CASE 2 is problematic.

   THREAD X			THREAD Y

   lock L with depth n
				lock L' with depth n
   lock A
				lock A
   lock_nested L' with depth n + 1
				lock_nested L'' with depth n + 1
   ...				...
   unlock L'			unlock L''
   unlock A			unlock A
   unlock L			unlock L'
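
Written with real locking APIs, THREAD X's side of CASE 2 would look
something like this (L1 and L2 are rwsems of the same lock class, A is
an unrelated mutex; all the names are made up for illustration):

   down_write(&L1);                 /* lock L with depth n */
   mutex_lock(&A);                  /* lock A */
   down_write_nested(&L2, 1);       /* lock_nested L' with depth n + 1:
                                     * lockdep skips the same-class
                                     * check here, but waiting on class
                                     * L while holding A is still a
                                     * real dependency */
   ...
   up_write(&L2);
   mutex_unlock(&A);
   up_write(&L1);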

Yes, I need to check whether the report you shared with me is a true
positive, but either way it's not because DEPT doesn't work with the
*_nested() APIs.

	Byungchul

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-04 18:17   ` Linus Torvalds
@ 2022-05-09  1:22     ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-09  1:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Damien Le Moal, linux-ide, Andreas Dilger, Ext4 Developers List,
	Ingo Molnar, Linux Kernel Mailing List, Peter Zijlstra,
	Will Deacon, Thomas Gleixner, Steven Rostedt, Joel Fernandes,
	Sasha Levin, Daniel Vetter, Chris Wilson, duyuyang,
	johannes.berg, Tejun Heo, Theodore Ts'o, Matthew Wilcox,
	Dave Chinner, Amir Goldstein, J. Bruce Fields,
	Greg Kroah-Hartman, kernel-team, Linux-MM, Andrew Morton,
	Michal Hocko, Minchan Kim, Johannes Weiner, Vladimir Davydov, sj,
	Jerome Glisse, Dennis Zhou, Christoph Lameter, Pekka Enberg,
	David Rientjes, Vlastimil Babka, ngupta, linux-block,
	paolo.valente, Josef Bacik, linux-fsdevel, Al Viro, Jan Kara,
	jack, Jeff Layton, Dan Williams, Christoph Hellwig,
	Darrick J. Wong, dri-devel, Dave Airlie, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

On Wed, May 04, 2022 at 11:17:02AM -0700, Linus Torvalds wrote:
> On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> >
> > Hi Linus and folks,
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchronization mechanisms.
> 
> So what is the actual status of reports these days?

I'd like to mention one important thing here: the reports would get
stronger as more wait-event pairs get tagged, everywhere DEPT can
work.

Anything, e.g. a HW-SW interface or any retry logic, can be a
wait-event pair as long as one side waits and the other side triggers
the event. For example, polling on an IO-mapped read register and
initiating the HW to produce the event can also form a pair. Those
definitely make DEPT more useful.

---

The way to use the APIs:

1. Define SDT(Simple Dependency Tracker)

   DEFINE_DEPT_SDT(my_hw_event); <- add this

2. Tag on the waits

   sdt_wait(&my_hw_event); <- add this
   ... retry logic until my hw work done ... <- the original code

3. Tag on the events

   sdt_event(&my_hw_event); <- add this
   run_my_hw(); <- the original code

---

That is all we need to do. I believe DEPT would be a very useful tool
once all wait-event pairs get tagged by developers across all subsystems
and device drivers.
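
For instance, putting the three steps above together for the polling
case (a minimal sketch: only DEFINE_DEPT_SDT(), sdt_wait() and
sdt_event() come from this series; the register names, bits and
helpers are hypothetical driver code):

   #include <linux/io.h>
   #include <linux/delay.h>
   #include <linux/errno.h>

   #define MY_HW_DONE_BIT  0x1
   #define MY_HW_START_BIT 0x1

   DEFINE_DEPT_SDT(my_hw_event);           /* 1. define the SDT */

   /* Wait side: poll an IO-mapped status register until the HW is done. */
   static int my_hw_wait_done(void __iomem *status_reg)
   {
           int retry;

           sdt_wait(&my_hw_event);         /* 2. tag the wait */

           for (retry = 0; retry < 1000; retry++) {
                   /* the original retry logic */
                   if (readl(status_reg) & MY_HW_DONE_BIT)
                           return 0;
                   udelay(10);
           }
           return -ETIMEDOUT;
   }

   /* Event side: kick the HW; its completion is the event. */
   static void my_hw_kick(void __iomem *ctrl_reg)
   {
           sdt_event(&my_hw_event);        /* 3. tag the event */
           writel(MY_HW_START_BIT, ctrl_reg);      /* run_my_hw() */
   }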

	Byungchul

> Last time I looked at some reports, it gave a lot of false positives
> due to mis-understanding prepare_to_sleep().
> 
> For this all to make sense, it would need to not have false positives
> (or at least a very small number of them together with a way to sanely
> get rid of them), and also have a track record of finding things that
> lockdep doesn't.
> 
> Maybe such reports have been sent out with the current situation, and
> I haven't seen them.
> 
>                  Linus

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09  0:16         ` Byungchul Park
@ 2022-05-09 20:47           ` Steven Rostedt
  -1 siblings, 0 replies; 105+ messages in thread
From: Steven Rostedt @ 2022-05-09 20:47 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Hyeonggon Yoo, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, 9 May 2022 09:16:37 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> CASE 2.
> 
>    lock L with depth n
>    lock A
>    lock_nested L' with depth n + 1
>    ...
>    unlock L'
>    unlock A
>    unlock L
> 
> This case is allowed by Lockdep.
> This case is *NOT* allowed by DEPT because it's a *DEADLOCK*.
> 
> ---
> 
> The following scenario would explain why CASE 2 is problematic.
> 
>    THREAD X			THREAD Y
> 
>    lock L with depth n
> 				lock L' with depth n
>    lock A
> 				lock A
>    lock_nested L' with depth n + 1

I'm confused by what exactly you are saying is a deadlock above.

Are you saying that lock A and L' are inverted? If so, lockdep had better
detect that regardless of L. A nested lock associates the nesting with
the same type of lock. That is, in lockdep, "nested" tells lockdep not to
trigger on L and L', but it will not ignore that A was taken.

-- Steve



> 				lock_nested L'' with depth n + 1
>    ...				...
>    unlock L'			unlock L''
>    unlock A			unlock A
>    unlock L			unlock L'


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-09 21:05   ` Theodore Ts'o
  -1 siblings, 0 replies; 105+ messages in thread
From: Theodore Ts'o @ 2022-05-09 21:05 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo

I tried DEPT-v6 applied against 5.18-rc5, and it reported the
following false positive.

The reason why it's nonsense is that in context A's [W] wait:

[ 1538.545054] [W] folio_wait_bit_common(pglocked:0):
[ 1538.545370] [<ffffffff81259944>] __filemap_get_folio+0x3e4/0x420
[ 1538.545763] stacktrace:
[ 1538.545928]       folio_wait_bit_common+0x2fa/0x460
[ 1538.546248]       __filemap_get_folio+0x3e4/0x420
[ 1538.546558]       pagecache_get_page+0x11/0x40
[ 1538.546852]       ext4_mb_init_group+0x80/0x2e0
[ 1538.547152]       ext4_mb_good_group_nolock+0x2a3/0x2d0

... we're reading the block allocation bitmap into the page cache.
This does not correspond to a real inode, and so we don't actually
take ei->i_data_sem on the pseudo-inode used.

In contrast, in context B's [W] and [E] stack traces, the
folio_wait_bit is clearly associated with a page which is mapped to a
real inode:

[ 1538.553656] [W] down_write(&ei->i_data_sem:0):
[ 1538.553948] [<ffffffff8141c01b>] ext4_map_blocks+0x17b/0x680
[ 1538.554320] stacktrace:
[ 1538.554485]       ext4_map_blocks+0x17b/0x680
[ 1538.554772]       mpage_map_and_submit_extent+0xef/0x530
[ 1538.555122]       ext4_writepages+0x798/0x990
[ 1538.555409]       do_writepages+0xcf/0x1c0
[ 1538.555682]       __writeback_single_inode+0x58/0x3f0
[ 1538.556014]       writeback_sb_inodes+0x210/0x540
  		     ...

[ 1538.558621] [E] folio_wake_bit(pglocked:0):
[ 1538.558896] [<ffffffff814418c0>] ext4_bio_write_page+0x400/0x560
[ 1538.559290] stacktrace:
[ 1538.559455]       ext4_bio_write_page+0x400/0x560
[ 1538.559765]       mpage_submit_page+0x5c/0x80
[ 1538.560051]       mpage_map_and_submit_buffers+0x15a/0x250
[ 1538.560409]       mpage_map_and_submit_extent+0x134/0x530
[ 1538.560764]       ext4_writepages+0x798/0x990
[ 1538.561057]       do_writepages+0xcf/0x1c0
[ 1538.561329]       __writeback_single_inode+0x58/0x3f0
		...


In any case, this will ***never*** deadlock, and it's due to DEPT
fundamentally not understanding that waits on different pages may
belong to completely different inodes, so there is zero chance this
would ever deadlock.

I suspect there will be similar false positives for tests (or
userspace) that use copy_file_range(2) or sendfile(2) system calls.

I've included the full DEPT log report below.

						- Ted

generic/011		[20:11:16][ 1533.411773] run fstests generic/011 at 2022-05-07 20:11:16
[ 1533.509603] DEPT_INFO_ONCE: Need to expand the ring buffer.
[ 1536.910044] DEPT_INFO_ONCE: Pool(wait) is empty.
[ 1538.533315] ===================================================
[ 1538.533793] DEPT: Circular dependency has been detected.
[ 1538.534199] 5.18.0-rc5-xfstests-dept-00021-g8d3d751c9964 #571 Not tainted
[ 1538.534645] ---------------------------------------------------
[ 1538.535035] summary
[ 1538.535177] ---------------------------------------------------
[ 1538.535567] *** DEADLOCK ***
[ 1538.535567] 
[ 1538.535854] context A
[ 1538.536008]     [S] down_write(&ei->i_data_sem:0)
[ 1538.536323]     [W] folio_wait_bit_common(pglocked:0)
[ 1538.536655]     [E] up_write(&ei->i_data_sem:0)
[ 1538.536958] 
[ 1538.537063] context B
[ 1538.537216]     [S] (unknown)(pglocked:0)
[ 1538.537480]     [W] down_write(&ei->i_data_sem:0)
[ 1538.537789]     [E] folio_wake_bit(pglocked:0)
[ 1538.538082] 
[ 1538.538184] [S]: start of the event context
[ 1538.538460] [W]: the wait blocked
[ 1538.538680] [E]: the event not reachable
[ 1538.538939] ---------------------------------------------------
[ 1538.539327] context A's detail
[ 1538.539530] ---------------------------------------------------
[ 1538.539918] context A
[ 1538.540072]     [S] down_write(&ei->i_data_sem:0)
[ 1538.540382]     [W] folio_wait_bit_common(pglocked:0)
[ 1538.540712]     [E] up_write(&ei->i_data_sem:0)
[ 1538.541015] 
[ 1538.541119] [S] down_write(&ei->i_data_sem:0):
[ 1538.541410] [<ffffffff8141c01b>] ext4_map_blocks+0x17b/0x680
[ 1538.541782] stacktrace:
[ 1538.541946]       ext4_map_blocks+0x17b/0x680
[ 1538.542234]       ext4_getblk+0x5f/0x1f0
[ 1538.542493]       ext4_bread+0xc/0x70
[ 1538.542736]       ext4_append+0x48/0xf0
[ 1538.542991]       ext4_init_new_dir+0xc8/0x160
[ 1538.543284]       ext4_mkdir+0x19a/0x320
[ 1538.543542]       vfs_mkdir+0x83/0xe0
[ 1538.543788]       do_mkdirat+0x8c/0x130
[ 1538.544042]       __x64_sys_mkdir+0x29/0x30
[ 1538.544319]       do_syscall_64+0x40/0x90
[ 1538.544584]       entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1538.544949] 
[ 1538.545054] [W] folio_wait_bit_common(pglocked:0):
[ 1538.545370] [<ffffffff81259944>] __filemap_get_folio+0x3e4/0x420
[ 1538.545763] stacktrace:
[ 1538.545928]       folio_wait_bit_common+0x2fa/0x460
[ 1538.546248]       __filemap_get_folio+0x3e4/0x420
[ 1538.546558]       pagecache_get_page+0x11/0x40
[ 1538.546852]       ext4_mb_init_group+0x80/0x2e0
[ 1538.547152]       ext4_mb_good_group_nolock+0x2a3/0x2d0
[ 1538.547496]       ext4_mb_regular_allocator+0x391/0x780
[ 1538.547840]       ext4_mb_new_blocks+0x44e/0x720
[ 1538.548145]       ext4_ext_map_blocks+0x7f1/0xd00
[ 1538.548455]       ext4_map_blocks+0x19e/0x680
[ 1538.548743]       ext4_getblk+0x5f/0x1f0
[ 1538.549006]       ext4_bread+0xc/0x70
[ 1538.549250]       ext4_append+0x48/0xf0
[ 1538.549505]       ext4_init_new_dir+0xc8/0x160
[ 1538.549798]       ext4_mkdir+0x19a/0x320
[ 1538.550058]       vfs_mkdir+0x83/0xe0
[ 1538.550302]       do_mkdirat+0x8c/0x130
[ 1538.550557] 
[ 1538.550660] [E] up_write(&ei->i_data_sem:0):
[ 1538.550940] (N/A)
[ 1538.551071] ---------------------------------------------------
[ 1538.551459] context B's detail
[ 1538.551662] ---------------------------------------------------
[ 1538.552047] context B
[ 1538.552202]     [S] (unknown)(pglocked:0)
[ 1538.552466]     [W] down_write(&ei->i_data_sem:0)
[ 1538.552775]     [E] folio_wake_bit(pglocked:0)
[ 1538.553071] 
[ 1538.553174] [S] (unknown)(pglocked:0):
[ 1538.553422] (N/A)
[ 1538.553553] 
[ 1538.553656] [W] down_write(&ei->i_data_sem:0):
[ 1538.553948] [<ffffffff8141c01b>] ext4_map_blocks+0x17b/0x680
[ 1538.554320] stacktrace:
[ 1538.554485]       ext4_map_blocks+0x17b/0x680
[ 1538.554772]       mpage_map_and_submit_extent+0xef/0x530
[ 1538.555122]       ext4_writepages+0x798/0x990
[ 1538.555409]       do_writepages+0xcf/0x1c0
[ 1538.555682]       __writeback_single_inode+0x58/0x3f0
[ 1538.556014]       writeback_sb_inodes+0x210/0x540
[ 1538.556324]       __writeback_inodes_wb+0x4c/0xe0
[ 1538.556635]       wb_writeback+0x298/0x450
[ 1538.556911]       wb_do_writeback+0x29e/0x320
[ 1538.557199]       wb_workfn+0x6a/0x2c0
[ 1538.557447]       process_one_work+0x302/0x650
[ 1538.557743]       worker_thread+0x55/0x400
[ 1538.558013]       kthread+0xf0/0x120
[ 1538.558251]       ret_from_fork+0x1f/0x30
[ 1538.558518] 
[ 1538.558621] [E] folio_wake_bit(pglocked:0):
[ 1538.558896] [<ffffffff814418c0>] ext4_bio_write_page+0x400/0x560
[ 1538.559290] stacktrace:
[ 1538.559455]       ext4_bio_write_page+0x400/0x560
[ 1538.559765]       mpage_submit_page+0x5c/0x80
[ 1538.560051]       mpage_map_and_submit_buffers+0x15a/0x250
[ 1538.560409]       mpage_map_and_submit_extent+0x134/0x530
[ 1538.560764]       ext4_writepages+0x798/0x990
[ 1538.561057]       do_writepages+0xcf/0x1c0
[ 1538.561329]       __writeback_single_inode+0x58/0x3f0
[ 1538.561662]       writeback_sb_inodes+0x210/0x540
[ 1538.561973]       __writeback_inodes_wb+0x4c/0xe0
[ 1538.562283]       wb_writeback+0x298/0x450
[ 1538.562555]       wb_do_writeback+0x29e/0x320
[ 1538.562842]       wb_workfn+0x6a/0x2c0
[ 1538.563095]       process_one_work+0x302/0x650
[ 1538.563387]       worker_thread+0x55/0x400
[ 1538.563658]       kthread+0xf0/0x120
[ 1538.563895]       ret_from_fork+0x1f/0x30
[ 1538.564161] ---------------------------------------------------
[ 1538.564548] information that might be helpful
[ 1538.564832] ---------------------------------------------------
[ 1538.565223] CPU: 1 PID: 46539 Comm: dirstress Not tainted 5.18.0-rc5-xfstests-dept-00021-g8d3d751c9964 #571
[ 1538.565854] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1538.566394] Call Trace:
[ 1538.566559]  <TASK>
[ 1538.566701]  dump_stack_lvl+0x4f/0x68
[ 1538.566945]  print_circle.cold+0x15b/0x169
[ 1538.567218]  ? print_circle+0xe0/0xe0
[ 1538.567461]  cb_check_dl+0x55/0x60
[ 1538.567687]  bfs+0xd5/0x1b0
[ 1538.567874]  add_dep+0xd3/0x1a0
[ 1538.568083]  ? __filemap_get_folio+0x3e4/0x420
[ 1538.568374]  add_wait+0xe3/0x250
[ 1538.568590]  ? __filemap_get_folio+0x3e4/0x420
[ 1538.568886]  dept_wait_split_map+0xb1/0x130
[ 1538.569163]  folio_wait_bit_common+0x2fa/0x460
[ 1538.569456]  ? lock_is_held_type+0xfc/0x130
[ 1538.569733]  __filemap_get_folio+0x3e4/0x420
[ 1538.570013]  ? __lock_release+0x1b2/0x2c0
[ 1538.570278]  pagecache_get_page+0x11/0x40
[ 1538.570543]  ext4_mb_init_group+0x80/0x2e0
[ 1538.570813]  ? ext4_get_group_desc+0xb2/0x200
[ 1538.571102]  ext4_mb_good_group_nolock+0x2a3/0x2d0
[ 1538.571418]  ext4_mb_regular_allocator+0x391/0x780
[ 1538.571733]  ? rcu_read_lock_sched_held+0x3f/0x70
[ 1538.572044]  ? trace_kmem_cache_alloc+0x2c/0xd0
[ 1538.572343]  ? kmem_cache_alloc+0x1f7/0x3f0
[ 1538.572618]  ext4_mb_new_blocks+0x44e/0x720
[ 1538.572896]  ext4_ext_map_blocks+0x7f1/0xd00
[ 1538.573179]  ? find_held_lock+0x2b/0x80
[ 1538.573434]  ext4_map_blocks+0x19e/0x680
[ 1538.573693]  ext4_getblk+0x5f/0x1f0
[ 1538.573927]  ext4_bread+0xc/0x70
[ 1538.574141]  ext4_append+0x48/0xf0
[ 1538.574369]  ext4_init_new_dir+0xc8/0x160
[ 1538.574634]  ext4_mkdir+0x19a/0x320
[ 1538.574866]  vfs_mkdir+0x83/0xe0
[ 1538.575082]  do_mkdirat+0x8c/0x130
[ 1538.575308]  __x64_sys_mkdir+0x29/0x30
[ 1538.575557]  do_syscall_64+0x40/0x90
[ 1538.575795]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1538.576128] RIP: 0033:0x7f0960466b07
[ 1538.576367] Code: 1f 40 00 48 8b 05 89 f3 0c 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 f3 0c 00 f7 d8 64 89 01 48
[ 1538.577576] RSP: 002b:00007ffd0fa955a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[ 1538.578069] RAX: ffffffffffffffda RBX: 0000000000000239 RCX: 00007f0960466b07
[ 1538.578533] RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffd0fa955d0
[ 1538.578995] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000010
[ 1538.579458] R10: 00007ffd0fa95345 R11: 0000000000000246 R12: 00000000000003e8
[ 1538.579923] R13: 0000000000000000 R14: 00007ffd0fa955d0 R15: 00007ffd0fa95dd0
[ 1538.580389]  </TASK>
[ 1540.581382] EXT4-fs (vdb): mounted filesystem with ordered data mode. Quota mode: none.
 [20:11:24] 8s


P.S.  Later on the console, the test ground to a halt because DEPT
started WARNING over and over and over again....

[ 3129.686102] DEPT_WARN_ON: dt->ecxt_held_pos == DEPT_MAX_ECXT_HELD
[ 3129.686396]  ? __might_fault+0x32/0x80
[ 3129.686660] WARNING: CPU: 1 PID: 107320 at kernel/dependency/dept.c:1537 add_ecxt+0x1c0/0x1d0
[ 3129.687040]  ? __might_fault+0x32/0x80
[ 3129.687282] CPU: 1 PID: 107320 Comm: aio-stress Tainted: G        W         5.18.0-rc5-xfstests-dept-00021-g8d3d751c9964 #571

with multiple CPUs completely spamming the serial console.  This
should probably be a WARN_ON_ONCE, or something that disables DEPT
entirely, since apparently no useful DEPT reports (or any useful
kernel work, for that matter) are going to happen after this.
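
As a sketch of the first option, assuming DEPT_WARN_ON() is a WARN()
wrapper in kernel/dependency/dept.c as the splat suggests (the _ONCE
variant below is hypothetical):

   #define DEPT_WARN_ON_ONCE(c)                                    \
           ({                                                      \
                   static bool __warned;                           \
                   int __ret_warn = !!(c);                         \
                                                                   \
                   if (unlikely(__ret_warn && !__warned)) {        \
                           __warned = true;                        \
                           WARN(1, "DEPT_WARN_ON_ONCE: %s", #c);   \
                   }                                               \
                   unlikely(__ret_warn);                           \
           })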


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09 21:05   ` Theodore Ts'o
@ 2022-05-09 22:28     ` Theodore Ts'o
  -1 siblings, 0 replies; 105+ messages in thread
From: Theodore Ts'o @ 2022-05-09 22:28 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo

Oh, one other problem with DEPT --- it's SLOW --- the overhead is
enormous.  Using kvm-xfstests[1] running "kvm-xfstests smoke", here
are some sample times:

			LOCKDEP		DEPT
Time to first test	49 seconds	602 seconds
ext4/001      		2 s		22 s
ext4/003		2 s		8 s
ext4/005		0 s		7 s
ext4/020		1 s		8 s
ext4/021		11 s		17 s
ext4/023		0 s		83 s
generic/001		4 s		76 s
generic/002		0 s		11 s
generic/003		10 s		19 s

There are some large variations; in some cases, some xfstests take 10x
as much time or more to run.  In fact, when I first started the
kvm-xfstests run with DEPT, I thought something had hung and that
tests would never start.  (In fact, with gce-xfstests the default
watchdog "something has gone terribly wrong with the kexec" had fired,
and I didn't get any test results using gce-xfstests at all.  If DEPT
goes in without any optimizations, I'm going to have to adjust the
watchdog timers for gce-xfstests.)

The bottom line is that at the moment, between the false positives,
and the significant overhead imposed by DEPT, I would suggest that if
DEPT ever does go in, it should be possible to disable DEPT and
only use the existing CONFIG_PROVE_LOCKING version of LOCKDEP, just
because DEPT is S - L - O - W.

[1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

						- Ted

P.S.  Darrick and I both have disabled using LOCKDEP by default
because it slows down ext4 -g auto testing by a factor of 2, and xfs -g
auto testing by a factor of 3.  So the fact that DEPT is a factor of
2x to 10x or more slower than LOCKDEP when running various xfstests
tests should be a real concern.


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09 20:47           ` Steven Rostedt
@ 2022-05-09 23:38             ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-09 23:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Hyeonggon Yoo, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, May 09, 2022 at 04:47:12PM -0400, Steven Rostedt wrote:
> On Mon, 9 May 2022 09:16:37 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > CASE 2.
> > 
> >    lock L with depth n
> >    lock A
> >    lock_nested L' with depth n + 1
> >    ...
> >    unlock L'
> >    unlock A
> >    unlock L
> > 
> > This case is allowed by Lockdep.
> > This case is *NOT* allowed by DEPT because it's a *DEADLOCK*.
> > 
> > ---
> > 
> > The following scenario would explain why CASE 2 is problematic.
> > 
> >    THREAD X			THREAD Y
> > 
> >    lock L with depth n
> > 				lock L' with depth n
> >    lock A
> > 				lock A
> >    lock_nested L' with depth n + 1
> 
> I'm confused by what exactly you are saying is a deadlock above.
> 
> Are you saying that lock A and L' are inverted? If so, lockdep had better

Hi Steven,

Yes, I was talking about A and L'.

> detect that regardless of L. A nested lock associates the nesting with

When I checked the Lockdep code, L' with depth n + 1 and L' with depth
n have different classes in Lockdep.

That's why I said Lockdep cannot detect it. By any chance, has it
changed so as to consider this case? Or am I missing something?

> the same type of lock. That is, in lockdep, "nested" tells lockdep not to
> trigger on L and L', but it will not ignore that A was taken.

It will not ignore A, but it would work like this:

   THREAD X			THREAD Y

   lock Ln
				lock Ln
   lock A
				lock A
   lock_nested Lm
				lock_nested Lm

So, Lockdep considers this case safe, though it actually is not.

	Byungchul

> 
> -- Steve
> 
> 
> 
> > 				lock_nested L'' with depth n + 1
> >    ...				...
> >    unlock L'			unlock L''
> >    unlock A			unlock A
> >    unlock L			unlock L'

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09 22:28     ` Theodore Ts'o
@ 2022-05-10  0:32       ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-10  0:32 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo

On Mon, May 09, 2022 at 06:28:17PM -0400, Theodore Ts'o wrote:
> Oh, one other problem with DEPT --- it's SLOW --- the overhead is
> enormous.  Using kvm-xfstests[1] running "kvm-xfstests smoke", here
> are some sample times:

Yes, right. DEPT has never been optimized. It rather turns on
CONFIG_LOCKDEP and even CONFIG_PROVE_LOCKING when CONFIG_DEPT is
enabled, because of a porting issue. I have no choice but to rely on
those to develop DEPT out of tree. Of course, that's what I don't like.

Plus, for now, I'm focusing on removing false positives. Once that's
considered settled down, I will work on performance optimization. But
DEPT should still keep relying on the Lockdep CONFIGs, and adding
overhead on top of them, until it can be developed in the tree.

> 			LOCKDEP		DEPT
> Time to first test	49 seconds	602 seconds
> ext4/001      		2 s		22 s
> ext4/003		2 s		8 s
> ext4/005		0 s		7 s
> ext4/020		1 s		8 s
> ext4/021		11 s		17 s
> ext4/023		0 s		83 s
> generic/001		4 s		76 s
> generic/002		0 s		11 s
> generic/003		10 s		19 s
> 
> There are some large variations; in some cases, some xfstests take 10x
> as much time or more to run.  In fact, when I first started the
> kvm-xfstests run with DEPT, I thought something had hung and that
> tests would never start.  (In fact, with gce-xfstests the default
> watchdog "something has gone terribly wrong with the kexec" had fired,
> and I didn't get any test results using gce-xfstests at all.  If DEPT
> goes in without any optimizations, I'm going to have to adjust the
> watchdog timers for gce-xfstests.)

Thank you for letting me know. I will work on the optimization as well.

> The bottom line is that at the moment, between the false positives,
> and the significant overhead imposed by DEPT, I would suggest that if
> DEPT ever does go in, it should be possible to disable DEPT and
> only use the existing CONFIG_PROVE_LOCKING version of LOCKDEP, just
> because DEPT is S - L - O - W.
> 
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md
> 
> 						- Ted
> 
> P.S.  Darrick and I both have disabled using LOCKDEP by default
> because it slows down ext4 -g auto testing by a factor of 2, and xfs -g
> auto testing by a factor of 3.  So the fact that DEPT is a factor of
> 2x to 10x or more slower than LOCKDEP when running various xfstests
> tests should be a real concern.

DEPT tracks way more objects than Lockdep, so it's inevitably slower,
but let me try to bring its performance close to Lockdep's.

	Byungchul

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10  0:32       ` Byungchul Park
@ 2022-05-10  1:32         ` Theodore Ts'o
  -1 siblings, 0 replies; 105+ messages in thread
From: Theodore Ts'o @ 2022-05-10  1:32 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo

On Tue, May 10, 2022 at 09:32:13AM +0900, Byungchul Park wrote:
> Yes, right. DEPT has never been optimized. It rather turns on
> CONFIG_LOCKDEP and even CONFIG_PROVE_LOCKING when CONFIG_DEPT is
> enabled, because of a porting issue. I have no choice but to rely on
> those to develop DEPT out of tree. Of course, that's what I don't like.

Sure, but blaming the overhead on unnecessary CONFIG_PROVE_LOCKING
overhead can explain only a tiny fraction of the slowdown.  Consider:
if time to first test (time to boot the kernel, setup the test
environment, figure out which tests to run, etc.) is 12 seconds w/o
LOCKDEP, 49 seconds with LOCKDEP/PROVE_LOCKING and 602 seconds with
DEPT, you can really only blame 37 seconds out of the 602 seconds of
DEPT on unnecessary PROVE_LOCKING overhead.

So let's assume we can get rid of all of the PROVE_LOCKING overhead.
We're still talking about 12 seconds for time-to-first test without
any lock debugging, versus ** 565 ** seconds for time-to-first test
with DEPT.  That's a factor of 47x for DEPT sans LOCKDEP overhead,
compared to a 4x overhead for PROVE_LOCKING.

> Plus, for now, I'm focusing on removing false positives. Once that's
> considered settled down, I will work on performance optimization. But
> DEPT should still keep relying on the Lockdep CONFIGs, and adding
> overhead on top of them, until it can be developed in the tree.

Well, please take a look at the false positive which I reported.  I
suspect that in order to fix that particular false positive, we'll
either need to have a way to disable DEPT on waiting on all page/folio
dirty bits, or it will need to treat pages from different inodes
and/or address spaces as being entirely separate classes, instead of
collapsing all inode dirty bits, and all of the various inodes' mutexes
(such as ext4's i_data_sem) as being part of a single object class.

> DEPT tracks way more objects than Lockdep, so it's inevitably slower,
> but let me try to bring its performance close to Lockdep's.

In order to eliminate some of these false positives, I suspect it's
going to increase the number of object classes that DEPT will need to
track even *more*.  At which point, the cost/benefit of DEPT may get
called into question, especially if all of the false positives can't
be suppressed.

					- Ted

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10  1:32         ` Theodore Ts'o
@ 2022-05-10  5:37           ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-10  5:37 UTC (permalink / raw)
  To: tytso
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	42.hyeyoo

Ted wrote:
> On Tue, May 10, 2022 at 09:32:13AM +0900, Byungchul Park wrote:
> > Yes, right. DEPT has never been optimized. It rather turns on
> > CONFIG_LOCKDEP and even CONFIG_PROVE_LOCKING when CONFIG_DEPT gets on
> > because of porting issue. I have no choice but to rely on those to
> > develop DEPT out of tree. Of course, that's what I don't like.
> 
> Sure, but blaming the slowdown on unnecessary CONFIG_PROVE_LOCKING
> overhead can explain only a tiny fraction of it.  Consider:
> if time to first test (time to boot the kernel, setup the test
> environment, figure out which tests to run, etc.) is 12 seconds w/o
> LOCKDEP, 49 seconds with LOCKDEP/PROVE_LOCKING and 602 seconds with
> DEPT, you can really only blame 37 seconds out of the 602 seconds of
> DEPT on unnecessary PROVE_LOCKING overhead.
> 
> So let's assume we can get rid of all of the PROVE_LOCKING overhead.
> We're still talking about 12 seconds for time-to-first test without
> any lock debugging, versus ** 565 ** seconds for time-to-first test
> with DEPT.  That's a factor of 47x for DEPT sans LOCKDEP overhead,
> compared to a 4x overhead for PROVE_LOCKING.

Okay. I will work on it.

> > Plus, for now, I'm focusing on removing false positives. Once it's
> > considered settled, I will work on performance optimization. But
> > it should still keep relying on Lockdep CONFIGs and adding additional
> > overhead on it until DEPT can be developed in the tree.
> 
> Well, please take a look at the false positive which I reported.  I
> suspect that in order to fix that particular false positive, we'll
> either need to have a way to disable DEPT on waiting on all page/folio
> dirty bits, or it will need to treat pages from different inodes
> and/or address spaces as being entirely separate classes, instead of
> collapsing all inode dirty bits, and all of the various inodes' mutexes
> (such as ext4's i_data_sem) as being part of a single object class.

I'd rather solve it by assigning different classes to different types of
inode. This is the right way.
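
For what it's worth, Lockdep already does something along these lines
for inode locks; a minimal sketch of that existing pattern (this is
Lockdep-side code - carrying the same idea over to DEPT's wait classes
is the assumption here):

	#include <linux/fs.h>
	#include <linux/lockdep.h>

	/*
	 * Scope a per-inode lock to the filesystem type, the way the
	 * VFS keys i_rwsem off sb->s_type->i_mutex_key, instead of
	 * letting every inode in the system share one class. With a
	 * per-type key, ext4 inode waits and, say, xfs inode waits are
	 * tracked as separate classes, which cuts down on
	 * cross-filesystem false positives.
	 */
	static void set_inode_lock_class(struct inode *inode)
	{
		struct file_system_type *type = inode->i_sb->s_type;

		lockdep_set_class(&inode->i_rwsem, &type->i_mutex_key);
	}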

> > DEPT is tracking way more objects than Lockdep, so it's inevitably
> > slower, but let me try to make it have similar performance to
> > Lockdep.
> 
> In order to eliminate some of these false positives, I suspect it's
> going to increase the number of object classes that DEPT will need to
> track even *more*.  At which point, the cost/benefit of DEPT may get
> called into question, especially if all of the false positives can't
> be suppressed.

Look. Let's talk in general terms. There's no way to get rid of the
false positives entirely. It's a matter of *balancing* between
considering potential cases and considering only real ones. A potential
deadlock is, by definition, not a real one yet; the more potential cases
we consider, the higher the chance that false positives appear.

But yes. The advantage we'd take by detecting potential ones should be
higher than the risk of being bothered by false ones. Do you think a
tool is useless if it produces a few false positives? Of course, it'd
be a problem if there were too many, but otherwise I think it'd be a
great tool as long as the advantage > the risk.

Don't get me wrong here. I don't mean DEPT is perfect as it is. The
performance should be improved and the false alarms that appear should
be removed, of course. I'm talking about the direction.

For now, there's no tool that tracks waits/events themselves in the
Linux kernel - though a subset of the functionality exists. DEPT is the
first attempt at that purpose and can become a useful tool given the
right direction.

I know what concerns you. I bet it's the false positives that are going
to bother you once it's merged. I'll insist that DEPT shouldn't be used
as a mandatory testing tool until it's considered stable enough. But
what about the ones who would take the advantage of using DEPT? Why not
think of the folks who will take the advantage from the hints about
synchronization dependencies, esp. when their subsystem requires very
complicated synchronization? Should a tool be useful only in a final
testing stage? What about its usefulness during the development stage?

It's worth noting that DEPT works with any wait/event, so any lockup -
e.g. even one caused by a HW-SW interface, retry logic or the like - can
be detected by DEPT once all waits and events are tagged properly. I
believe the advantage of that is much higher than the downside of facing
false alarms. It's just my opinion. I'm going to respect the majority
opinion.

	Byungchul
> 
> 					- Ted
> 

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09  0:16         ` Byungchul Park
@ 2022-05-10 11:18           ` Hyeonggon Yoo
  -1 siblings, 0 replies; 105+ messages in thread
From: Hyeonggon Yoo @ 2022-05-10 11:18 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> On Sat, May 07, 2022 at 04:20:50PM +0900, Hyeonggon Yoo wrote:
> > On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> > > Linus wrote:
> > > >
> > > > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> > > > >
> > > > > Hi Linus and folks,
> > > > >
> > > > > I've been developing a tool for detecting deadlock possibilities by
> > > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > > cover all synchronization mechanisms.
> > > > 
> > > > So what is the actual status of reports these days?
> > > > 
> > > > Last time I looked at some reports, it gave a lot of false positives
> > > > due to mis-understanding prepare_to_sleep().
> > > 
> > > Yes, it was. I handled the case in the following way:
> > > 
> > > 1. Stage the wait at prepare_to_sleep(), which might be used at commit.
> > >    It has yet to become an actual wait that Dept considers.
> > > 2. If the condition for sleep is true, the wait will be committed at
> > >    __schedule(). The wait becomes an actual one that Dept considers.
> > > 3. If the condition is false and the task gets back to TASK_RUNNING,
> > >    clean(=reset) the staged wait.
> > > 
> > > That way, Dept only works with the waits through sleep that
> > > actually hit __schedule().
> > > 
> > > > For this all to make sense, it would need to not have false positives
> > > > (or at least a very small number of them together with a way to sanely
> > > 
> > > Yes. I agree with you. I got rid of them the way I described above.
> > >
> > 
> > IMHO DEPT should not report what lockdep allows (Not talking about
> 
> No.
> 
> > wait events). I mean lockdep allows some kind of nested locks but
> > DEPT reports them.
> 
> You already asked exactly the same question in another thread on
> LKML. I answered it that time, but let me explain it again.
> 
> ---
> 
> CASE 1.
> 
>    lock L with depth n
>    lock_nested L' with depth n + 1
>    ...
>    unlock L'
>    unlock L
> 
> This case is allowed by Lockdep.
> This case is allowed by DEPT cuz it's not a deadlock.
> 
> CASE 2.
> 
>    lock L with depth n
>    lock A
>    lock_nested L' with depth n + 1
>    ...
>    unlock L'
>    unlock A
>    unlock L
> 
> This case is allowed by Lockdep.
> This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
>

Yeah, in previous threads we discussed this [1]

And the case was:
	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
And dept reported:
	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
	deadlock.

But IIUC - what DEPT reported happens only under scan_mutex, and it is
not simple to just not take the locks, because the object can be removed
from the list and freed while scanning via kmemleak_free() without
kmemleak_lock and object_lock held.

I'm just still not sure that someone will fix the warning in the future -
even if the locking rule is not good - if it will not cause a real
deadlock.
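
For readers without mm/kmemleak.c open, the pattern in question looks
roughly like this (a heavily simplified sketch, not the exact code):

	mutex_lock(&scan_mutex);
	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		raw_spin_lock_irq(&object->lock);	/* object_lock */
		/* scan_block(), for each pointer found in the object: */
		raw_spin_lock(&kmemleak_lock);
		raw_spin_lock_nested(&pointed->lock,	/* object_lock */
				     SINGLE_DEPTH_NESTING); /* (nested) */
		raw_spin_unlock(&pointed->lock);
		raw_spin_unlock(&kmemleak_lock);
		raw_spin_unlock_irq(&object->lock);
	}
	rcu_read_unlock();
	mutex_unlock(&scan_mutex);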

> ---
> 
> The following scenario would explain why CASE 2 is problematic.
> 
>    THREAD X			THREAD Y
> 
>    lock L with depth n
> 				lock L' with depth n
>    lock A
> 				lock A
>    lock_nested L' with depth n + 1
> 				lock_nested L'' with depth n + 1
>    ...				...
>    unlock L'			unlock L''
>    unlock A			unlock A
>    unlock L			unlock L'
> 
> Yes. I need to check if the report you shared with me is a true one, but
> it's not because DEPT doesn't work with *_nested() APIs.
>

Sorry, it was not right of me to just say that DEPT doesn't work with the _nested() APIs.

> 	Byungchul

[1] https://lore.kernel.org/lkml/20220304002809.GA6112@X58A-UD3R/
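
To make CASE 2 above concrete, here is a minimal userspace rendering of
the THREAD X / THREAD Y interleaving (plain pthreads standing in for the
kernel mutexes, sleeps forcing the interleaving; a sketch with made-up
names, not kernel code) that deadlocks exactly as described:

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	/* L and L2 play the two same-class locks (L and L'); A is the
	 * unrelated lock taken in between. */
	static pthread_mutex_t L  = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t A  = PTHREAD_MUTEX_INITIALIZER;

	static void *thread_x(void *arg)
	{
		pthread_mutex_lock(&L);		/* lock L with depth n */
		pthread_mutex_lock(&A);		/* lock A */
		sleep(1);			/* let Y take L2 first */
		pthread_mutex_lock(&L2);	/* "lock_nested L'": Y holds it */
		pthread_mutex_unlock(&L2);
		pthread_mutex_unlock(&A);
		pthread_mutex_unlock(&L);
		return NULL;
	}

	static void *thread_y(void *arg)
	{
		pthread_mutex_lock(&L2);	/* lock L' with depth n */
		sleep(1);
		pthread_mutex_lock(&A);		/* lock A: X holds it -> deadlock */
		pthread_mutex_unlock(&A);
		pthread_mutex_unlock(&L2);
		return NULL;
	}

	int main(void)
	{
		pthread_t x, y;
		pthread_create(&x, NULL, thread_x, NULL);
		pthread_create(&y, NULL, thread_y, NULL);
		pthread_join(x, NULL);		/* never returns: X and Y are stuck */
		pthread_join(y, NULL);
		puts("no deadlock");
		return 0;
	}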

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-09 23:38             ` Byungchul Park
@ 2022-05-10 14:12               ` Steven Rostedt
  -1 siblings, 0 replies; 105+ messages in thread
From: Steven Rostedt @ 2022-05-10 14:12 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Hyeonggon Yoo, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Tue, 10 May 2022 08:38:38 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> Yes, I was talking about A and L'.
> 
> > detect that regardless of L. A nested lock associates the nesting with
> 
> When I checked Lockdep code, L' with depth n + 1 and L' with depth n
> have different classes in Lockdep.

If that's the case, then that's a bug in lockdep.

> 
> That's why I said Lockdep cannot detect it. By any chance, has it
> changed so as to consider this case? Or am I missing something?

No, it's not that lockdep cannot detect it, it should detect it. If it is
not detecting it, then we need to fix that.
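
For reference, the annotation under discussion works like this: the
subclass passed to mutex_lock_nested() tells Lockdep that the
acquisition is a distinct subclass of the same class, so a fixed
outer->inner order within one class isn't flagged as recursion (an
illustrative sketch with a made-up struct):

	#include <linux/mutex.h>

	struct node {
		struct mutex lock;
		struct node *child;
	};

	/* Two locks of the same class, taken in a fixed parent->child
	 * order; the subclass annotation keeps Lockdep from treating
	 * the second acquisition as a self-deadlock. */
	static void lock_parent_and_child(struct node *parent)
	{
		mutex_lock(&parent->lock);
		mutex_lock_nested(&parent->child->lock, SINGLE_DEPTH_NESTING);
		/* ... */
		mutex_unlock(&parent->child->lock);
		mutex_unlock(&parent->lock);
	}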

-- Steve

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10 14:12               ` Steven Rostedt
@ 2022-05-10 23:26                 ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-10 23:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Hyeonggon Yoo, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Tue, May 10, 2022 at 10:12:54AM -0400, Steven Rostedt wrote:
> On Tue, 10 May 2022 08:38:38 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > Yes, I was talking about A and L'.
> > 
> > > detect that regardless of L. A nested lock associates the nesting with
> > 
> > When I checked Lockdep code, L' with depth n + 1 and L' with depth n
> > have different classes in Lockdep.
> 
> If that's the case, then that's a bug in lockdep.

Yes, agreed. I should've said 'Lockdep doesn't detect it currently'
rather than 'Lockdep can't detect it'.

I also think we can make it work for this case by fixing the bug in
Lockdep.

> > 
> > That's why I said Lockdep cannot detect it. By any chance, has it
> > changed so as to consider this case? Or am I missing something?
> 
> No, it's not that lockdep cannot detect it, it should detect it. If it is
> not detecting it, then we need to fix that.

Yes.

	Byungchul
> 
> -- Steve

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10 11:18           ` Hyeonggon Yoo
@ 2022-05-10 23:39             ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-10 23:39 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > On Sat, May 07, 2022 at 04:20:50PM +0900, Hyeonggon Yoo wrote:
> > > On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> > > > Linus wrote:
> > > > >
> > > > > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> > > > > >
> > > > > > Hi Linus and folks,
> > > > > >
> > > > > > I've been developing a tool for detecting deadlock possibilities by
> > > > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > > > cover all synchronization mechanisms.
> > > > > 
> > > > > So what is the actual status of reports these days?
> > > > > 
> > > > > Last time I looked at some reports, it gave a lot of false positives
> > > > > due to mis-understanding prepare_to_sleep().
> > > > 
> > > > Yes, it was. I handled the case in the following way:
> > > > 
> > > > 1. Stage the wait at prepare_to_sleep(), which might be used at commit.
> > > >    It has yet to become an actual wait that Dept considers.
> > > > 2. If the condition for sleep is true, the wait will be committed at
> > > >    __schedule(). The wait becomes an actual one that Dept considers.
> > > > 3. If the condition is false and the task gets back to TASK_RUNNING,
> > > >    clean(=reset) the staged wait.
> > > > 
> > > > That way, Dept only works with the waits through sleep that
> > > > actually hit __schedule().
> > > > 
> > > > > For this all to make sense, it would need to not have false positives
> > > > > (or at least a very small number of them together with a way to sanely
> > > > 
> > > > Yes. I agree with you. I got rid of them the way I described above.
> > > >
> > > 
> > > IMHO DEPT should not report what lockdep allows (Not talking about
> > 
> > No.
> > 
> > > wait events). I mean lockdep allows some kind of nested locks but
> > > DEPT reports them.
> > 
> > You already asked exactly the same question in another thread on
> > LKML. I answered it that time, but let me explain it again.
> > 
> > ---
> > 
> > CASE 1.
> > 
> >    lock L with depth n
> >    lock_nested L' with depth n + 1
> >    ...
> >    unlock L'
> >    unlock L
> > 
> > This case is allowed by Lockdep.
> > This case is allowed by DEPT cuz it's not a deadlock.
> > 
> > CASE 2.
> > 
> >    lock L with depth n
> >    lock A
> >    lock_nested L' with depth n + 1
> >    ...
> >    unlock L'
> >    unlock A
> >    unlock L
> > 
> > This case is allowed by Lockdep.
> > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> >
> 
> Yeah, in previous threads we discussed this [1]
> 
> And the case was:
> 	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> And dept reported:
> 	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> 	deadlock.
> 
> But IIUC - what DEPT reported happens only under scan_mutex, and it is
> not simple to just not take the locks, because the object can be removed
> from the list and freed while scanning via kmemleak_free() without
> kmemleak_lock and object_lock held.

That should be in one of the following orders:

1. kmemleak_lock -> object_lock -> object_lock(nested)
2. object_lock -> object_lock(nested) -> kmemleak_lock

> I'm just still not sure that someone will fix the warning in the future -
> even if the locking rule is not good - if it will not cause a real
> deadlock.

There are more important things than making the code just work for now -
for example, maintenance, and communication via the code between the
current developers and potential newcomers in the future.

At least, a comment describing why the wrong order in the code is safe
should be added. I wouldn't allow the current order in the code if I
were the maintainer.
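
Something along these lines, say (the wording here is invented, purely
as an example of the kind of comment meant):

	/*
	 * Lock order here is object->lock -> kmemleak_lock, the reverse
	 * of the order used in scan_block(). This cannot deadlock in
	 * practice because both paths only run under scan_mutex, which
	 * serializes them; the inversion is intentional, not an oversight.
	 */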

	Byungchul

> > ---
> > 
> > The following scenario would explain why CASE 2 is problematic.
> > 
> >    THREAD X			THREAD Y
> > 
> >    lock L with depth n
> > 				lock L' with depth n
> >    lock A
> > 				lock A
> >    lock_nested L' with depth n + 1
> > 				lock_nested L'' with depth n + 1
> >    ...				...
> >    unlock L'			unlock L''
> >    unlock A			unlock A
> >    unlock L			unlock L'
> > 
> > Yes. I need to check if the report you shared with me is a true one, but
> > it's not because DEPT doesn't work with *_nested() APIs.
> >
> 
> Sorry, It was not right just to say DEPT doesn't work with _nested() APIs.
> 
> > 	Byungchul
> 
> [1] https://lore.kernel.org/lkml/20220304002809.GA6112@X58A-UD3R/
> 
> -- 
> Thanks,
> Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10  5:37           ` Byungchul Park
@ 2022-05-11  1:16             ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-11  1:16 UTC (permalink / raw)
  To: tytso
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	42.hyeyoo

On Tue, May 10, 2022 at 02:37:40PM +0900, Byungchul Park wrote:
> Ted wrote:
> > On Tue, May 10, 2022 at 09:32:13AM +0900, Byungchul Park wrote:
> > > DEPT is tracking way more objects than Lockdep, so it's inevitably
> > > slower, but let me try to make it have similar performance to
> > > Lockdep.
> > 
> > In order to eliminate some of these false positives, I suspect it's
> > going to increase the number of object classes that DEPT will need to
> > track even *more*.  At which point, the cost/benefit of DEPT may get
> > called into question, especially if all of the false positives can't
> > be suppressed.
> 
> Look. Let's talk in general terms. There's no way to get rid of the
> false positives entirely. It's a matter of *balancing* between
> considering potential cases and considering only real ones. A potential
> deadlock is, by definition, not a real one yet; the more potential cases
> we consider, the higher the chance that false positives appear.
> 
> But yes. The advantage we'd take by detecting potential ones should be
> higher than the risk of being bothered by false ones. Do you think a
> tool is useless if it produces a few false positives? Of course, it'd
> be a problem if there were too many, but otherwise I think it'd be a
> great tool as long as the advantage > the risk.
> 
> Don't get me wrong here. I don't mean DEPT is perfect as it is. The
> performance should be improved and the false alarms that appear should
> be removed, of course. I'm talking about the direction.
> 
> For now, there's no tool that tracks waits/events themselves in the
> Linux kernel - though a subset of the functionality exists. DEPT is the
> first attempt at that purpose and can become a useful tool given the
> right direction.
> 
> I know what concerns you. I bet it's the false positives that are going
> to bother you once it's merged. I'll insist that DEPT shouldn't be used
> as a mandatory testing tool until it's considered stable enough. But
> what about the ones who would take the advantage of using DEPT? Why not
> think of the folks who will take the advantage from the hints about
> synchronization dependencies, esp. when their subsystem requires very
> complicated synchronization? Should a tool be useful only in a final
> testing stage? What about its usefulness during the development stage?
> 
> It's worth noting that DEPT works with any wait/event, so any lockup -
> e.g. even one caused by a HW-SW interface, retry logic or the like - can
> be detected by DEPT once all waits and events are tagged properly. I
> believe the advantage of that is much higher than the downside of facing
> false alarms. It's just my opinion. I'm going to respect the majority
> opinion.

s/take advantage/have the benefit/g

	Byungchul

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-10 23:39             ` Byungchul Park
@ 2022-05-11 10:04               ` Hyeonggon Yoo
  -1 siblings, 0 replies; 105+ messages in thread
From: Hyeonggon Yoo @ 2022-05-11 10:04 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, catalin.marinas

On Wed, May 11, 2022 at 08:39:29AM +0900, Byungchul Park wrote:
> On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> > On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > > On Sat, May 07, 2022 at 04:20:50PM +0900, Hyeonggon Yoo wrote:
> > > > On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> > > > > Linus wrote:
> > > > > >
> > > > > > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@lge.com> wrote:
> > > > > > >
> > > > > > > Hi Linus and folks,
> > > > > > >
> > > > > > > I've been developing a tool for detecting deadlock possibilities by
> > > > > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > > > > cover all synchronization mechanisms.
> > > > > > 
> > > > > > So what is the actual status of reports these days?
> > > > > > 
> > > > > > Last time I looked at some reports, it gave a lot of false positives
> > > > > > due to mis-understanding prepare_to_sleep().
> > > > > 
> > > > > Yes, it was. I handled the case in the following way:
> > > > > 
> > > > > 1. Stage the wait at prepare_to_sleep(), which might be used at commit.
> > > > >    It has yet to become an actual wait that Dept considers.
> > > > > 2. If the condition for sleep is true, the wait will be committed at
> > > > >    __schedule(). The wait becomes an actual one that Dept considers.
> > > > > 3. If the condition is false and the task gets back to TASK_RUNNING,
> > > > >    clean(=reset) the staged wait.
> > > > > 
> > > > > That way, Dept only works with the waits through sleep that
> > > > > actually hit __schedule().
> > > > > 
> > > > > > For this all to make sense, it would need to not have false positives
> > > > > > (or at least a very small number of them together with a way to sanely
> > > > > 
> > > > > Yes. I agree with you. I got rid of them the way I described above.
> > > > >
> > > > 
> > > > IMHO DEPT should not report what lockdep allows (Not talking about
> > > 
> > > No.
> > > 
> > > > wait events). I mean lockdep allows some kind of nested locks but
> > > > DEPT reports them.
> > > 
> > > You already asked exactly the same question in another thread on
> > > LKML. I answered it that time, but let me explain it again.
> > > 
> > > ---
> > > 
> > > CASE 1.
> > > 
> > >    lock L with depth n
> > >    lock_nested L' with depth n + 1
> > >    ...
> > >    unlock L'
> > >    unlock L
> > > 
> > > This case is allowed by Lockdep.
> > > This case is allowed by DEPT cuz it's not a deadlock.
> > > 
> > > CASE 2.
> > > 
> > >    lock L with depth n
> > >    lock A
> > >    lock_nested L' with depth n + 1
> > >    ...
> > >    unlock L'
> > >    unlock A
> > >    unlock L
> > > 
> > > This case is allowed by Lockdep.
> > > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> > >
> > 
> > Yeah, in previous threads we discussed this [1]
> > 
> > And the case was:
> > 	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> > And dept reported:
> > 	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> > 	deadlock.
> > 
> > But IIUC - what DEPT reported happens only under scan_mutex, and it is
> > not simple to just not take the locks, because the object can be removed
> > from the list and freed while scanning via kmemleak_free() without
> > kmemleak_lock and object_lock held.
>
>
> That should be in one of the following orders:
> 
> 1. kmemleak_lock -> object_lock -> object_lock(nested)
> 2. object_lock -> object_lock(nested) -> kmemleak_lock
> 
> > I'm just still not sure that someone will fix the warning in the future -
> > even if the locking rule is not good - if it will not cause a real
> > deadlock.
> 
> There are more important things than making the code just work for now -
> for example, maintenance, and communication via the code between the
> current developers and potential newcomers in the future.

Then we will get the same reports from DEPT until the already existing bad
code (even if it does not cause a deadlock) is reworked. If you think that
is the right thing to do, okay.

> At least, a comment describing why the wrong order in the code is safe
> should be added.

AFAIK the comment is already there in mm/kmemleak.c.

> I wouldn't allow the current order in the code if I
> were the maintainer.

[+Cc Catalin]
He may have an opinion.

Thanks,
Hyeonggon

> 	Byungchul
> 
> > > ---
> > > 
> > > The following scenario would explain why CASE 2 is problematic.
> > > 
> > >    THREAD X			THREAD Y
> > > 
> > >    lock L with depth n
> > > 				lock L' with depth n
> > >    lock A
> > > 				lock A
> > >    lock_nested L' with depth n + 1
> > > 				lock_nested L'' with depth n + 1
> > >    ...				...
> > >    unlock L'			unlock L''
> > >    unlock A			unlock A
> > >    unlock L			unlock L'
> > > 
> > > Yes. I need to check if the report you shared with me is a true one, but
> > > it's not because DEPT doesn't work with *_nested() APIs.
> > >
> > 
> > Sorry, It was not right just to say DEPT doesn't work with _nested() APIs.
> > 
> > > 	Byungchul
> > 
> > [1] https://lore.kernel.org/lkml/20220304002809.GA6112@X58A-UD3R/
> > 
> > -- 
> > Thanks,
> > Hyeonggon

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* [REPORT] syscall reboot + umh + firmware fallback
  2022-05-04  8:17 ` Byungchul Park
@ 2022-05-12  5:25   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-12  5:25 UTC (permalink / raw)
  To: torvalds, holt, mcgrof
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, 42.hyeyoo

+cc mcgrof@kernel.org (firmware)
+cc holt@sgi.com (syscall reboot)

Hi Luis, Robin and folks,

I'm developing a tool for deadlock detection, DEPT (Dependency Tracker).
I got a DEPT report from Hyeonggon - Thanks, Hyeonggon!

It doesn't mean the code *definitely* has a deadlock. However, it looks
problematic to me. So I'd like to ask a few things to see if it actually is.

Because Hyeonggon didn't run decode_stacktrace.sh before sending it to
me, I don't have a report with better debugging information. But I can
explain it in this mail. The problematic scenario looks like:


PROCESS A	PROCESS B	WORKER C

__do_sys_reboot()
		__do_sys_reboot()
 mutex_lock(&system_transition_mutex)
 ...		 mutex_lock(&system_transition_mutex) <- stuck
		 ...
				request_firmware_work_func()
				 _request_firmware()
				  firmware_fallback_sysfs()
				   usermodehelper_read_lock_wait()
				    down_read(&umhelper_sem)
				   ...
				   fw_load_sysfs_fallback()
				    fw_sysfs_wait_timeout()
				     wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
 kernel_halt()
  __usermodehelper_disable()
   down_write(&umhelper_sem) <- stuck

--------------------------------------------------------
All the 3 contexts are stuck at this point.
--------------------------------------------------------

PROCESS A	PROCESS B	WORKER C

   ...
   up_write(&umhelper_sem)
 ...
 mutex_unlock(&system_transition_mutex) <- cannot wake up B

		 ...
		 kernel_halt()
		  notifier_call_chain()
		   fw_shutdown_notify()
		    kill_pending_fw_fallback_reqs()
		     __fw_load_abort()
		      complete_all(&fw_st->completion) <- cannot wake up C

				   ...
				   usermodehelper_read_unlock()
				    up_read(&umhelper_sem) <- cannot wake up A


So I think this scenario is problematic. Or am I missing something here?
Or do you think it's okay because wait_for_completion_*() has a
timeout? AFAIK, timeouts are not supposed to fire in normal cases.

It'd be appreciated if you could share your opinion on the report.

	Byungchul

---

[   18.136012][    T1] ===================================================
[   18.136419][    T1] DEPT: Circular dependency has been detected.
[   18.136782][    T1] 5.18.0-rc3-57979-gc2b89afca919 #2374 Tainted: G    B            
[   18.137249][    T1] ---------------------------------------------------
[   18.137649][    T1] summary
[   18.137823][    T1] ---------------------------------------------------
[   18.138222][    T1] *** DEADLOCK ***
[   18.138222][    T1] 
[   18.138569][    T1] context A
[   18.138754][    T1]     [S] __mutex_lock_common(system_transition_mutex:0)
[   18.139170][    T1]     [W] down_write(umhelper_sem:0)
[   18.139482][    T1]     [E] mutex_unlock(system_transition_mutex:0)
[   18.139865][    T1] 
[   18.140004][    T1] context B
[   18.140189][    T1]     [S] (unknown)(&fw_st->completion:0)
[   18.140527][    T1]     [W] __mutex_lock_common(system_transition_mutex:0)
[   18.140942][    T1]     [E] complete_all(&fw_st->completion:0)
[   18.141295][    T1] 
[   18.141434][    T1] context C
[   18.141618][    T1]     [S] down_read(umhelper_sem:0)
[   18.141926][    T1]     [W] wait_for_completion_killable_timeout(&fw_st->completion:0)
[   18.142402][    T1]     [E] up_read(umhelper_sem:0)
[   18.142699][    T1] 
[   18.142837][    T1] [S]: start of the event context
[   18.143134][    T1] [W]: the wait blocked
[   18.143379][    T1] [E]: the event not reachable
[   18.143661][    T1] ---------------------------------------------------
[   18.144063][    T1] context A's detail
[   18.144293][    T1] ---------------------------------------------------
[   18.144691][    T1] context A
[   18.144875][    T1]     [S] __mutex_lock_common(system_transition_mutex:0)
[   18.145290][    T1]     [W] down_write(umhelper_sem:0)
[   18.145602][    T1]     [E] mutex_unlock(system_transition_mutex:0)
[   18.145982][    T1] 
[   18.146120][    T1] [S] __mutex_lock_common(system_transition_mutex:0):
[   18.146519][    T1] [<ffffffff810ee14c>] __do_sys_reboot+0x11f/0x24f
[   18.146907][    T1] stacktrace:
[   18.147101][    T1]       __mutex_lock+0x1f3/0x3f3
[   18.147396][    T1]       __do_sys_reboot+0x11f/0x24f
[   18.147706][    T1]       do_syscall_64+0xd4/0xfb
[   18.148001][    T1]       entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.148379][    T1] 
[   18.148517][    T1] [W] down_write(umhelper_sem:0):
[   18.148815][    T1] [<ffffffff810d9c14>] __usermodehelper_disable+0x80/0x17f
[   18.149243][    T1] stacktrace:
[   18.149438][    T1]       __dept_wait+0x115/0x15b
[   18.149726][    T1]       dept_wait+0xcd/0xf3
[   18.149993][    T1]       down_write+0x4e/0x82
[   18.150266][    T1]       __usermodehelper_disable+0x80/0x17f
[   18.150615][    T1]       kernel_halt+0x33/0x5d
[   18.150893][    T1]       __do_sys_reboot+0x197/0x24f
[   18.151201][    T1]       do_syscall_64+0xd4/0xfb
[   18.151489][    T1]       entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.151866][    T1] 
[   18.152004][    T1] [E] mutex_unlock(system_transition_mutex:0):
[   18.152368][    T1] (N/A)
[   18.152532][    T1] ---------------------------------------------------
[   18.152931][    T1] context B's detail
[   18.153161][    T1] ---------------------------------------------------
[   18.153559][    T1] context B
[   18.153743][    T1]     [S] (unknown)(&fw_st->completion:0)
[   18.154082][    T1]     [W] __mutex_lock_common(system_transition_mutex:0)
[   18.154496][    T1]     [E] complete_all(&fw_st->completion:0)
[   18.154848][    T1] 
[   18.154987][    T1] [S] (unknown)(&fw_st->completion:0):
[   18.155310][    T1] (N/A)
[   18.155473][    T1] 
[   18.155612][    T1] [W] __mutex_lock_common(system_transition_mutex:0):
[   18.156014][    T1] [<ffffffff810ee14c>] __do_sys_reboot+0x11f/0x24f
[   18.156400][    T1] stacktrace:
[   18.156594][    T1]       __mutex_lock+0x1ce/0x3f3
[   18.156887][    T1]       __do_sys_reboot+0x11f/0x24f
[   18.157196][    T1]       do_syscall_64+0xd4/0xfb
[   18.157482][    T1]       entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.157856][    T1] 
[   18.157993][    T1] [E] complete_all(&fw_st->completion:0):
[   18.158330][    T1] [<ffffffff81c04230>] kill_pending_fw_fallback_reqs+0x66/0x95
[   18.158778][    T1] stacktrace:
[   18.158973][    T1]       complete_all+0x28/0x58
[   18.159256][    T1]       kill_pending_fw_fallback_reqs+0x66/0x95
[   18.159624][    T1]       fw_shutdown_notify+0x7/0xa
[   18.159929][    T1]       notifier_call_chain+0x4f/0x81
[   18.160246][    T1]       blocking_notifier_call_chain+0x4c/0x64
[   18.160611][    T1]       kernel_halt+0x13/0x5d
[   18.160889][    T1]       __do_sys_reboot+0x197/0x24f
[   18.161197][    T1]       do_syscall_64+0xd4/0xfb
[   18.161485][    T1]       entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.161861][    T1] ---------------------------------------------------
[   18.162260][    T1] context C's detail
[   18.162490][    T1] ---------------------------------------------------
[   18.162889][    T1] context C
[   18.163073][    T1]     [S] down_read(umhelper_sem:0)
[   18.163379][    T1]     [W] wait_for_completion_killable_timeout(&fw_st->completion:0)
[   18.163855][    T1]     [E] up_read(umhelper_sem:0)
[   18.164150][    T1] 
[   18.164288][    T1] [S] down_read(umhelper_sem:0):
[   18.164578][    T1] [<ffffffff810d8f99>] usermodehelper_read_lock_wait+0xad/0x139
[   18.165027][    T1] stacktrace:
[   18.165220][    T1]       down_read+0x74/0x85
[   18.165487][    T1]       usermodehelper_read_lock_wait+0xad/0x139
[   18.165860][    T1]       firmware_fallback_sysfs+0x118/0x521
[   18.166207][    T1]       _request_firmware+0x7ef/0x91b
[   18.166525][    T1]       request_firmware_work_func+0xb1/0x13b
[   18.166884][    T1]       process_one_work+0x4c3/0x771
[   18.167199][    T1]       worker_thread+0x37f/0x49e
[   18.167497][    T1]       kthread+0x122/0x131
[   18.167768][    T1]       ret_from_fork+0x1f/0x30
[   18.168055][    T1] 
[   18.168192][    T1] [W] wait_for_completion_killable_timeout(&fw_st->completion:0):
[   18.168650][    T1] [<ffffffff81c046b7>] firmware_fallback_sysfs+0x42a/0x521
[   18.169076][    T1] stacktrace:
[   18.169270][    T1]       wait_for_completion_killable_timeout+0x3c/0x58
[   18.169676][    T1]       firmware_fallback_sysfs+0x42a/0x521
[   18.170024][    T1]       _request_firmware+0x7ef/0x91b
[   18.170341][    T1]       request_firmware_work_func+0xb1/0x13b
[   18.170699][    T1]       process_one_work+0x4c3/0x771
[   18.171012][    T1]       worker_thread+0x37f/0x49e
[   18.171309][    T1]       kthread+0x122/0x131
[   18.171575][    T1]       ret_from_fork+0x1f/0x30
[   18.171863][    T1] 
[   18.172001][    T1] [E] up_read(umhelper_sem:0):
[   18.172281][    T1] (N/A)
[   18.172445][    T1] ---------------------------------------------------
[   18.172844][    T1] information that might be helpful
[   18.173151][    T1] ---------------------------------------------------
[   18.173549][    T1] CPU: 0 PID: 1 Comm: init Tainted: G    B             5.18.0-rc3-57979-gc2b89afca919 #2374
[   18.174144][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[   18.174687][    T1] Call Trace:
[   18.174882][    T1]  <TASK>
[   18.175056][    T1]  print_circle+0x6a2/0x6f9
[   18.175326][    T1]  ? extend_queue+0xf4/0xf4
[   18.175594][    T1]  ? __list_add_valid+0xce/0xf6
[   18.175886][    T1]  ? __list_add+0x45/0x4e
[   18.176144][    T1]  ? print_circle+0x6f9/0x6f9
[   18.176422][    T1]  cb_check_dl+0xc0/0xc7
[   18.176675][    T1]  bfs+0x1c8/0x27b
[   18.176897][    T1]  ? disconnect_class+0x24c/0x24c
[   18.177195][    T1]  ? __list_add+0x45/0x4e
[   18.177454][    T1]  add_dep+0x140/0x217
[   18.177697][    T1]  ? add_ecxt+0x33b/0x33b
[   18.177955][    T1]  ? llist_del_first+0xc/0x46
[   18.178234][    T1]  add_wait+0x58a/0x723
[   18.178482][    T1]  ? __thaw_task+0x3e/0x3e
[   18.178745][    T1]  ? add_dep+0x217/0x217
[   18.178998][    T1]  ? llist_add_batch+0x33/0x4c
[   18.179281][    T1]  ? check_new_class+0x139/0x30d
[   18.179574][    T1]  __dept_wait+0x115/0x15b
[   18.179837][    T1]  ? __usermodehelper_disable+0x80/0x17f
[   18.180170][    T1]  ? add_wait+0x723/0x723
[   18.180426][    T1]  ? lock_release+0x338/0x338
[   18.180704][    T1]  ? __usermodehelper_disable+0x80/0x17f
[   18.181037][    T1]  dept_wait+0xcd/0xf3
[   18.181280][    T1]  down_write+0x4e/0x82
[   18.181527][    T1]  ? __usermodehelper_disable+0x80/0x17f
[   18.181861][    T1]  __usermodehelper_disable+0x80/0x17f
[   18.182184][    T1]  ? __usermodehelper_set_disable_depth+0x3a/0x3a
[   18.182565][    T1]  ? dept_ecxt_exit+0x1c9/0x1f7
[   18.182854][    T1]  ? blocking_notifier_call_chain+0x57/0x64
[   18.183205][    T1]  kernel_halt+0x33/0x5d
[   18.183458][    T1]  __do_sys_reboot+0x197/0x24f
[   18.183742][    T1]  ? kernel_power_off+0x6b/0x6b
[   18.184033][    T1]  ? dept_on+0x27/0x33
[   18.184275][    T1]  ? dept_exit+0x38/0x42
[   18.184526][    T1]  ? dept_on+0x27/0x33
[   18.184767][    T1]  ? dept_on+0x27/0x33
[   18.185008][    T1]  ? dept_exit+0x38/0x42
[   18.185260][    T1]  ? dept_enirq_transition+0x25a/0x295
[   18.185582][    T1]  ? syscall_enter_from_user_mode+0x47/0x71
[   18.185930][    T1]  ? dept_aware_softirqs_disable+0x1e/0x1e
[   18.186274][    T1]  ? syscall_enter_from_user_mode+0x47/0x71
[   18.186622][    T1]  do_syscall_64+0xd4/0xfb
[   18.186883][    T1]  ? asm_exc_page_fault+0x1e/0x30
[   18.187180][    T1]  ? dept_aware_softirqs_disable+0x1e/0x1e
[   18.187526][    T1]  ? dept_kernel_enter+0x15/0x1e
[   18.187821][    T1]  ? do_syscall_64+0x13/0xfb
[   18.188094][    T1]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   18.188443][    T1] RIP: 0033:0x4485f7
[   18.188672][    T1] Code: 00 75 05 48 83 c4 28 c3 e8 26 17 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8
[   18.189822][    T1] RSP: 002b:00007ffc92206f28 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[   18.190316][    T1] RAX: ffffffffffffffda RBX: 00007ffc92207118 RCX: 00000000004485f7
[   18.190784][    T1] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
[   18.191254][    T1] RBP: 00007ffc92206f30 R08: 00000000016376a0 R09: 0000000000000000
[   18.191722][    T1] R10: 00000000004c3820 R11: 0000000000000202 R12: 0000000000000001
[   18.192194][    T1] R13: 00007ffc92207108 R14: 00000000004bf8d0 R15: 0000000000000001
[   18.192667][    T1]  </TASK>

* Re: [REPORT] syscall reboot + umh + firmware fallback
  2022-05-12  5:25   ` Byungchul Park
@ 2022-05-12  9:15     ` Tejun Heo
  -1 siblings, 0 replies; 105+ messages in thread
From: Tejun Heo @ 2022-05-12  9:15 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, holt, mcgrof, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg,
	tytso, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, paolo.valente, josef, linux-fsdevel, viro, jack,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, 42.hyeyoo

Hello,

Just took a look out of curiosity.

On Thu, May 12, 2022 at 02:25:57PM +0900, Byungchul Park wrote:
> PROCESS A	PROCESS B	WORKER C
> 
> __do_sys_reboot()
> 		__do_sys_reboot()
>  mutex_lock(&system_transition_mutex)
>  ...		 mutex_lock(&system_transition_mutex) <- stuck
> 		 ...
> 				request_firmware_work_func()
> 				 _request_firmware()
> 				  firmware_fallback_sysfs()
> 				   usermodehelper_read_lock_wait()
> 				    down_read(&umhelper_sem)
> 				   ...
> 				   fw_load_sysfs_fallback()
> 				    fw_sysfs_wait_timeout()
> 				     wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
>  kernel_halt()
>   __usermodehelper_disable()
>    down_write(&umhelper_sem) <- stuck
> 
> --------------------------------------------------------
> All the 3 contexts are stuck at this point.
> --------------------------------------------------------
> 
> PROCESS A	PROCESS B	WORKER C
> 
>    ...
>    up_write(&umhelper_sem)
>  ...
>  mutex_unlock(&system_transition_mutex) <- cannot wake up B
> 
> 		 ...
> 		 kernel_halt()
> 		  notifier_call_chain()
> 		   fw_shutdown_notify()
> 		    kill_pending_fw_fallback_reqs()
> 		     __fw_load_abort()
> 		      complete_all(&fw_st->completion) <- cannot wake up C
> 
> 				   ...
> 				   usermodehelper_read_unlock()
> 				    up_read(&umhelper_sem) <- cannot wake up A

I'm not sure I'm reading it correctly but it looks like the "process B" column
is superfluous given that it's waiting on the same lock to do the same thing
that A is already doing (besides, you can't really halt the machine twice).
What it's reporting seems to be an ABBA deadlock between A waiting on
umhelper_sem and C waiting on fw_st->completion. The report seems spurious:

1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
   to make forward progress because it will unstick itself after timeout
   expires.

2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
   The fw loader can be, and mainly should be, woken up by firmware loading
   actually completing instead of being aborted.

I guess the reason B shows up there is that the operation order is
such that just between A and C, the complete_all() takes place before
__usermodehelper_disable(), so the whole thing kinda doesn't make sense as
you can't block a past operation by a future one. Inserting process B
introduces the reverse ordering.
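
To make point 1 concrete, here is a minimal sketch of the wait's shape
(schematic only, not the loader's actual code; fw_st and timeout as in
the report above):

	long ret = wait_for_completion_killable_timeout(&fw_st->completion,
							timeout);

	if (!ret)		/* 0: timed out - unstuck without any wakeup */
		return -ETIMEDOUT;
	if (ret < 0)		/* -ERESTARTSYS: killed by a fatal signal */
		return ret;
	/* > 0: completed (loaded *or* aborted); ret is the remaining jiffies */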

Thanks.

-- 
tejun

* Re: [REPORT] syscall reboot + umh + firmware fallback
  2022-05-12  9:15     ` Tejun Heo
@ 2022-05-12 11:18       ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-12 11:18 UTC (permalink / raw)
  To: tj
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo, mcgrof, holt

Tejun wrote:
> Hello,

Hello,

> I'm not sure I'm reading it correctly but it looks like "process B" column

I think you're interpreting the report correctly.

> is superflous given that it's waiting on the same lock to do the same thing
> that A is already doing (besides, you can't really halt the machine twice).

Indeed! I've been in a daze. I thought kernel_halt() could be called twice
for two different purposes. Sorry for the noise.

> What it's reporting seems to be ABBA deadlock between A waiting on
> umhelper_sem and C waiting on fw_st->completion. The report seems spurious:
>
> 1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
>    to make forward progress because it will unstick itself after timeout
>    expires.

I have a question about this one. Yes, it would never get stuck thanks
to the timeout. However, IIUC, timeouts are not supposed to expire in
normal cases. So I thought a timeout expiration indicates an abnormal
case, and it's worth reporting in terms of dependency so as to prevent
further expirations. That's why I have been trying to track even the
timeout'ed APIs.

Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
shouldn't track the timeout APIs anymore.

> 2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
>    The fw loader can be, and mainly should be, woken up by firmware loading
>    actually completing instead of being aborted.

This is the point I'd like to ask about. In normal cases, fw_load_done()
might happen, of course, if the loading gets completed. However, I was
wondering whether the kernel ensures that either fw_load_done() or
fw_load_abort() is called by *another* context while in kernel_halt().

> Thanks.

Thank you very much!

	Byungchul

> 
> -- 
> tejun
> 

* Re: [REPORT] syscall reboot + umh + firmware fallback
  2022-05-12 11:18       ` Byungchul Park
@ 2022-05-12 13:56         ` Theodore Ts'o
  -1 siblings, 0 replies; 105+ messages in thread
From: Theodore Ts'o @ 2022-05-12 13:56 UTC (permalink / raw)
  To: Byungchul Park
  Cc: tj, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo, mcgrof, holt

On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> I have a question about this one. Yes, it would never been stuck thanks
> to timeout. However, IIUC, timeouts are not supposed to expire in normal
> cases. So I thought a timeout expiration means not a normal case so need
> to inform it in terms of dependency so as to prevent further expiraton.
> That's why I have been trying to track even timeout'ed APIs.

As I believe I've already pointed out to you previously in ext4 and
ocfs2, the jbd2 timeout every five seconds happens **all** the time
while the file system is mounted.  Commits more frequent than every
five seconds are the exception case, at least for desktop/laptop
workloads.

The only cases where we *don't* get to the timeout are when a userspace
process calls fsync(2), or if the journal was incorrectly sized by the
system administrator so that it's too small, and the workload has so
many file system mutations that we have to prematurely close the
transaction ahead of the 5 second timeout.
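
As an illustration (a simplified sketch, *not* the actual jbd2 code;
commit_wq, commit_requested() and commit_transaction() are hypothetical
stand-ins), a commit thread for which the timeout expiring is the
normal path looks roughly like:

#include <linux/kthread.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(commit_wq);
static bool commit_requested(void);	/* e.g. set by fsync(2) */
static void commit_transaction(void);

static int commit_thread(void *arg)
{
	while (!kthread_should_stop()) {
		long ret = wait_event_interruptible_timeout(commit_wq,
				commit_requested(), 5 * HZ);

		/*
		 * ret == 0: the 5 second timer expired with nothing
		 * pending - the common, healthy case - so do the
		 * periodic commit.  ret > 0: woken early, e.g. by
		 * fsync(2).  ret < 0: interrupted by a signal.
		 */
		if (ret >= 0)
			commit_transaction();
	}
	return 0;
}

A dependency tracker that treats every expiry of such a timeout as a
missed wakeup will warn on every periodic commit.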

> Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> shouldn't track the timeout APIs any more.

DEPT tracking timeouts will cause false positives in at least some
cases.  At the very least, there needs to be an easy way to suppress
these false positives on a per wait/mutex/spinlock basis.

      	       	    	     	      	   	 - Ted

* Re: [REPORT] syscall reboot + umh + firmware fallback
  2022-05-12 11:18       ` Byungchul Park
@ 2022-05-12 16:41         ` Tejun Heo
  -1 siblings, 0 replies; 105+ messages in thread
From: Tejun Heo @ 2022-05-12 16:41 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tytso, willy,
	david, amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo, mcgrof, holt

Hello,

On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> > 1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
> >    to make forward progress because it will unstick itself after timeout
> >    expires.
> 
> I have a question about this one. Yes, it would never been stuck thanks
> to timeout. However, IIUC, timeouts are not supposed to expire in normal
> cases. So I thought a timeout expiration means not a normal case so need
> to inform it in terms of dependency so as to prevent further expiraton.
> That's why I have been trying to track even timeout'ed APIs.
> 
> Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> shouldn't track the timeout APIs any more.

Without actually surveying the use cases, I can't say for sure but my
experience has been that we often get pretty creative with timeouts and it's
something people actively think about and monitor (and it's usually not
subtle). Given that, I'm skeptical about how much value it'd add for a
dependency checker to warn about timeouts. It might be a net negative rather
than a net positive.

> > 2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
> >    The fw loader can be, and mainly should be, woken up by firmware loading
> >    actually completing instead of being aborted.
> 
> This is the point I'd like to ask. In normal cases, fw_load_done() might
> happen, of course, if the loading gets completed. However, I was
> wondering if the kernel ensures either fw_load_done() or fw_load_abort()
> to be called by *another* context while kernel_halt().

We'll have to walk through the code to tell that. On a cursory look tho, up
until that point (just before shutting down usermode helper), I don't see
anything which would actively block firmware loading.

Thanks.

-- 
tejun

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-11 10:04               ` Hyeonggon Yoo
@ 2022-05-19 10:11                 ` Catalin Marinas
  -1 siblings, 0 replies; 105+ messages in thread
From: Catalin Marinas @ 2022-05-19 10:11 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Byungchul Park, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Wed, May 11, 2022 at 07:04:51PM +0900, Hyeonggon Yoo wrote:
> On Wed, May 11, 2022 at 08:39:29AM +0900, Byungchul Park wrote:
> > On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> > > On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > > > CASE 1.
> > > > 
> > > >    lock L with depth n
> > > >    lock_nested L' with depth n + 1
> > > >    ...
> > > >    unlock L'
> > > >    unlock L
> > > > 
> > > > This case is allowed by Lockdep.
> > > > This case is allowed by DEPT cuz it's not a deadlock.
> > > > 
> > > > CASE 2.
> > > > 
> > > >    lock L with depth n
> > > >    lock A
> > > >    lock_nested L' with depth n + 1
> > > >    ...
> > > >    unlock L'
> > > >    unlock A
> > > >    unlock L
> > > > 
> > > > This case is allowed by Lockdep.
> > > > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> > > 
> > > Yeah, in previous threads we discussed this [1]
> > > 
> > > And the case was:
> > > 	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> > > And dept reported:
> > > 	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> > > 	deadlock.
> > > 
> > > But IIUC - what DEPT reported happens only under scan_mutex, and it
> > > is not simple just to avoid taking them because the object can be
> > > removed from the list and freed while scanning via kmemleak_free()
> > > without kmemleak_lock and object_lock held.

The above kmemleak sequence shouldn't deadlock since those locks, even
if taken in a different order, are serialised by scan_mutex. For various
reasons, trying to reduce the latency, I ended up with some
fine-grained, per-object locking.

For object allocation (rbtree modification) and tree search, we use
kmemleak_lock. During scanning (which can take minutes under
scan_mutex), we want to prevent (a) long latencies and (b) freeing the
object being scanned. We release the locks regularly for (a) and hold
the object->lock for (b).

In another thread Byungchul mentioned:

|    context X			context Y
| 
|    lock mutex A		lock mutex A
|    lock B			lock C
|    lock C			lock B
|    unlock C			unlock B
|    unlock B			unlock C
|    unlock mutex A		unlock mutex A
| 
| In my opinion, lock B and lock C are unnecessary if they are always
| along with lock mutex A. Or we should keep correct lock order across all
| the code.

If these are the only two places, yes, locks B and C would be
unnecessary. But we have those locks acquired (not nested) on the
allocation path (kmemleak_lock) and freeing path (object->lock). We
don't want to block those paths while scan_mutex is held.

That said, we may be able to use a single kmemleak_lock for everything.
The object freeing path may be affected slightly during scanning but the
code does release it every MAX_SCAN_SIZE bytes. It may even get slightly
faster as we'd hammer a single lock (I'll do some benchmarks).

But from a correctness perspective, I think the DEPT tool should be
improved a bit to detect when such out of order locking is serialised by
an enclosing lock/mutex.
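
As a concrete illustration of that pattern (a minimal sketch with
hypothetical lock names, not the actual kmemleak code):

#include <linux/mutex.h>
#include <linux/spinlock.h>

static DEFINE_MUTEX(outer);	/* plays the role of scan_mutex */
static DEFINE_SPINLOCK(b);
static DEFINE_SPINLOCK(c);

static void path_x(void)
{
	mutex_lock(&outer);
	spin_lock(&b);
	spin_lock(&c);		/* inner order: B -> C */
	/* ... */
	spin_unlock(&c);
	spin_unlock(&b);
	mutex_unlock(&outer);
}

static void path_y(void)
{
	mutex_lock(&outer);
	spin_lock(&c);
	spin_lock(&b);		/* inner order: C -> B */
	/* ... */
	spin_unlock(&b);
	spin_unlock(&c);
	mutex_unlock(&outer);
}

An order-based checker sees both B -> C and C -> B and reports a cycle,
but the two paths can never run concurrently because each holds the
enclosing mutex for the whole critical section, so no deadlock is
possible.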

-- 
Catalin

* Re: [PATCH RFC v6 02/21] dept: Implement Dept(Dependency Tracker)
  2022-05-04  8:17   ` Byungchul Park
@ 2022-05-21  3:24     ` Hyeonggon Yoo
  -1 siblings, 0 replies; 105+ messages in thread
From: Hyeonggon Yoo @ 2022-05-21  3:24 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Wed, May 04, 2022 at 05:17:30PM +0900, Byungchul Park wrote:
> CURRENT STATUS
> +/*

[...]

> + * Ensure it has been called on ON/OFF transition.
> + */
> +void dept_enirq_transition(unsigned long ip)
> +{
> +	struct dept_task *dt = dept_task();
> +	unsigned long flags;
> +
> +	if (unlikely(READ_ONCE(dept_stop) || in_nmi()))
> +		return;
> +
> +	/*
> +	 * IRQ ON/OFF transition might happen while Dept is working.
> +	 * We cannot handle recursive entrance. Just ignore it.
> +	 * Only transitions outside of Dept will be considered.
> +	 */
> +	if (dt->recursive)
> +		return;
> +
> +	flags = dept_enter();
> +
> +	enirq_update(ip);
> +
> +	dept_exit(flags);
> +}

EXPORT_SYMBOL_GPL(dept_enirq_transition);

ERROR: modpost: "dept_enirq_transition" [arch/x86/kvm/kvm-amd.ko] undefined!
ERROR: modpost: "dept_enirq_transition" [arch/x86/kvm/kvm-intel.ko] undefined!

This function needs to be exported for modules.

Thanks.

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 07/21] dept: Apply Dept to seqlock
  2022-05-04  8:17   ` Byungchul Park
@ 2022-05-21  5:25     ` Hyeonggon Yoo
  -1 siblings, 0 replies; 105+ messages in thread
From: Hyeonggon Yoo @ 2022-05-21  5:25 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

Hello, I got a new report from DEPT, related to seqlock.
I applied the dept 1.20 series on v5.18-rc7.

Below is what DEPT reported.
I think this is bogus because a reader of p->alloc_lock cannot block
its writer. Please kindly tell me if I'm missing something ;)
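
For context, the usual seqlock read side only spins and retries, and
holds nothing the writer needs -- e.g.:

	unsigned seq;

	do {
		seq = read_seqbegin(&sl);	/* may spin while a writer
						 * is active */
		/* read the protected data */
	} while (read_seqretry(&sl, seq));	/* retry if a writer raced */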

Thanks.

[    8.032674] ===================================================
[    8.032676] DEPT: Circular dependency has been detected.
[    8.032677] 5.18.0-rc7-dept+ #10 Tainted: G            E
[    8.032677] ---------------------------------------------------
[    8.032678] summary
[    8.032678] ---------------------------------------------------
[    8.032679] *** DEADLOCK ***

[    8.032679] context A
[    8.032679]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    8.032681]     [W] __seqprop_spinlock_wait(&p->alloc_lock:0)
[    8.032681]     [E] spin_unlock(&host->lock:0)

[    8.032682] context B
[    8.032682]     [S] __raw_spin_lock(&dentry->d_lock:0)
[    8.032683]     [W] __raw_spin_lock(&host->lock:0)
[    8.032684]     [E] spin_unlock(&dentry->d_lock:0)

[    8.032684] context C
[    8.032685]     [S] __raw_spin_lock(&p->alloc_lock:0)
[    8.032685]     [W] __raw_spin_lock(&dentry->d_lock:0)
[    8.032685]     [E] spin_unlock(&p->alloc_lock:0)

[    8.032686] [S]: start of the event context
[    8.032686] [W]: the wait blocked
[    8.032687] [E]: the event not reachable
[    8.032687] ---------------------------------------------------
[    8.032687] context A's detail
[    8.032688] ---------------------------------------------------
[    8.032688] context A
[    8.032688]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    8.032689]     [W] __seqprop_spinlock_wait(&p->alloc_lock:0)
[    8.032689]     [E] spin_unlock(&host->lock:0)

[    8.032690] [S] __raw_spin_lock_irqsave(&host->lock:0):
[    8.032690] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:2734 drivers/ata/libata-scsi.c:4017) 
[    8.032694] stacktrace:
[    8.032694] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:2734 drivers/ata/libata-scsi.c:4017) 
[    8.032696] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
[    8.032697] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
[    8.032700] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 
[    8.032701] __blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:313) 
[    8.032702] blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:339) 
[    8.032703] __blk_mq_run_hw_queue (./include/linux/rcupdate.h:723 block/blk-mq.c:1974) 
[    8.032704] __blk_mq_delay_run_hw_queue (block/blk-mq.c:2052) 
[    8.032705] blk_mq_run_hw_queue (block/blk-mq.c:2103) 
[    8.032706] blk_mq_sched_insert_requests (./include/linux/rcupdate.h:692 ./include/linux/percpu-refcount.h:330 ./include/linux/percpu-refcount.h:351 block/blk-mq-sched.c:495) 
[    8.032707] blk_mq_flush_plug_list (block/blk-mq.c:2640) 
[    8.032708] __blk_flush_plug (block/blk-core.c:1247) 
[    8.032709] blk_finish_plug (block/blk-core.c:1265 block/blk-core.c:1261) 
[    8.032710] read_pages (mm/readahead.c:181) 
[    8.032712] page_cache_ra_unbounded (./include/linux/fs.h:815 mm/readahead.c:262) 
[    8.032713] page_cache_ra_order (mm/readahead.c:547) 

[    8.032714] [W] __seqprop_spinlock_wait(&p->alloc_lock:0):
[    8.032714] __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032717] stacktrace:
[    8.032717] dept_wait (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:227 kernel/dependency/dept.c:1013 kernel/dependency/dept.c:1057 kernel/dependency/dept.c:2216) 
[    8.032719] ___slab_alloc (./include/linux/seqlock.h:326 ./include/linux/cpuset.h:151 mm/slub.c:2223 mm/slub.c:2266 mm/slub.c:3000) 
[    8.032720] __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032721] kmem_cache_alloc (mm/slub.c:3183 mm/slub.c:3225 mm/slub.c:3232 mm/slub.c:3242) 
[    8.032722] alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
[    8.032724] alloc_iova_fast (drivers/iommu/iova.c:455) 
[    8.032725] iommu_dma_alloc_iova (drivers/iommu/dma-iommu.c:628) 
[    8.032726] iommu_dma_map_sg (drivers/iommu/dma-iommu.c:1201) 
[    8.032727] __dma_map_sg_attrs (kernel/dma/mapping.c:195) 
[    8.032729] dma_map_sg_attrs (kernel/dma/mapping.c:232) 
[    8.032730] ata_qc_issue (drivers/ata/libata-core.c:4530 drivers/ata/libata-core.c:4876) 
[    8.032731] __ata_scsi_queuecmd (drivers/ata/libata-scsi.c:1710 drivers/ata/libata-scsi.c:3974) 
[    8.032732] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:4019) 
[    8.032734] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
[    8.032734] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
[    8.032735] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 

[    8.032736] [E] spin_unlock(&host->lock:0):
[    8.032737] (N/A)
[    8.032737] ---------------------------------------------------
[    8.032738] context B's detail
[    8.032738] ---------------------------------------------------
[    8.032738] context B
[    8.032738]     [S] __raw_spin_lock(&dentry->d_lock:0)
[    8.032739]     [W] __raw_spin_lock(&host->lock:0)
[    8.032740]     [E] spin_unlock(&dentry->d_lock:0)

[    8.032740] [S] __raw_spin_lock(&dentry->d_lock:0):
[    8.032741] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
[    8.032743] stacktrace:
[    8.032743] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
[    8.032744] path_get (fs/namei.c:546) 
[    8.032746] do_dentry_open (fs/open.c:778) 
[    8.032747] vfs_open (fs/open.c:959) 
[    8.032748] path_openat (fs/namei.c:3583 fs/namei.c:3602) 
[    8.032749] do_filp_open (fs/namei.c:3636) 
[    8.032750] do_sys_openat2 (fs/open.c:1213) 
[    8.032751] __x64_sys_openat (fs/open.c:1240) 
[    8.032752] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
[    8.032754] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 

[    8.032756] [W] __raw_spin_lock(&host->lock:0):
[    8.032756] ahci_single_level_irq_intr (drivers/ata/libahci.c:1970) libahci
[    8.032759] stacktrace:
[    8.032760] ahci_single_level_irq_intr (drivers/ata/libahci.c:1970) libahci
[    8.032761] __handle_irq_event_percpu (kernel/irq/handle.c:158) 
[    8.032763] handle_irq_event (kernel/irq/handle.c:195 kernel/irq/handle.c:210) 
[    8.032763] handle_edge_irq (kernel/irq/chip.c:819) 
[    8.032764] __common_interrupt (./include/asm-generic/irq_regs.h:28 (discriminator 22) arch/x86/kernel/irq.c:263 (discriminator 22)) 
[    8.032766] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14)) 
[    8.032768] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636) 
[    8.032769] lock_release (kernel/locking/lockdep.c:5665) 
[    8.032771] _raw_spin_unlock (./include/linux/spinlock_api_smp.h:141 kernel/locking/spinlock.c:186) 
[    8.032772] lockref_get (lib/lockref.c:55) 
[    8.032772] path_get (fs/namei.c:546) 
[    8.032774] do_dentry_open (fs/open.c:778) 
[    8.032774] vfs_open (fs/open.c:959) 
[    8.032775] path_openat (fs/namei.c:3583 fs/namei.c:3602) 
[    8.032776] do_filp_open (fs/namei.c:3636) 
[    8.032777] do_sys_openat2 (fs/open.c:1213) 

[    8.032778] [E] spin_unlock(&dentry->d_lock:0):
[    8.032778] (N/A)
[    8.032779] ---------------------------------------------------
[    8.032779] context C's detail
[    8.032779] ---------------------------------------------------
[    8.032780] context C
[    8.032780]     [S] __raw_spin_lock(&p->alloc_lock:0)
[    8.032780]     [W] __raw_spin_lock(&dentry->d_lock:0)
[    8.032781]     [E] spin_unlock(&p->alloc_lock:0)

[    8.032781] [S] __raw_spin_lock(&p->alloc_lock:0):
[    8.032782] proc_root_link (fs/proc/base.c:177 fs/proc/base.c:208) 
[    8.032784] stacktrace:
[    8.032784] proc_root_link (fs/proc/base.c:177 fs/proc/base.c:208) 
[    8.032784] proc_pid_get_link.part.0 (fs/proc/base.c:1756) 
[    8.032785] proc_pid_get_link (fs/proc/base.c:1762) 
[    8.032786] step_into (fs/namei.c:1819 fs/namei.c:1876) 
[    8.032787] walk_component (fs/namei.c:2027) 
[    8.032788] path_lookupat (fs/namei.c:2475 fs/namei.c:2499) 
[    8.032789] filename_lookup (fs/namei.c:2528) 
[    8.032790] vfs_statx (fs/stat.c:229) 
[    8.032791] vfs_fstatat (fs/stat.c:256) 
[    8.032792] __do_sys_newfstatat (fs/stat.c:426) 
[    8.032793] __x64_sys_newfstatat (fs/stat.c:419) 
[    8.032793] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
[    8.032794] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 

[    8.032796] [W] __raw_spin_lock(&dentry->d_lock:0):
[    8.032796] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
[    8.032797] stacktrace:
[    8.032797] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
[    8.032798] path_get (fs/namei.c:546) 
[    8.032799] proc_root_link (./include/linux/spinlock.h:410 ./include/linux/fs_struct.h:32 fs/proc/base.c:178 fs/proc/base.c:208) 
[    8.032800] proc_pid_get_link.part.0 (fs/proc/base.c:1756) 
[    8.032801] proc_pid_get_link (fs/proc/base.c:1762) 
[    8.032801] step_into (fs/namei.c:1819 fs/namei.c:1876) 
[    8.032802] walk_component (fs/namei.c:2027) 
[    8.032803] path_lookupat (fs/namei.c:2475 fs/namei.c:2499) 
[    8.032805] filename_lookup (fs/namei.c:2528) 
[    8.032805] vfs_statx (fs/stat.c:229) 
[    8.032806] vfs_fstatat (fs/stat.c:256) 
[    8.032807] __do_sys_newfstatat (fs/stat.c:426) 
[    8.032808] __x64_sys_newfstatat (fs/stat.c:419) 
[    8.032808] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
[    8.032809] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 

[    8.032810] [E] spin_unlock(&p->alloc_lock:0):
[    8.032811] (N/A)
[    8.032811] ---------------------------------------------------
[    8.032811] information that might be helpful
[    8.032812] ---------------------------------------------------
[    8.032812] CPU: 4 PID: 534 Comm: systemd-tmpfile Tainted: G            E     5.18.0-rc7-dept+ #10
[    8.032814] Hardware name: ASUS System Product Name/TUF GAMING B550-PLUS (WI-FI), BIOS 1401 12/03/2020
[    8.032814] Call Trace:
[    8.032815]  <TASK>
[    8.032816] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[    8.032819] dump_stack (lib/dump_stack.c:114) 
[    8.032820] print_circle.cold (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:143 kernel/dependency/dept.c:776) 
[    8.032822] ? print_circle (kernel/dependency/dept.c:1107) 
[    8.032824] cb_check_dl (kernel/dependency/dept.c:1133) 
[    8.032825] bfs (kernel/dependency/dept.c:874) 
[    8.032826] add_dep (kernel/dependency/dept.c:1457) 
[    8.032828] add_wait (kernel/dependency/dept.c:1505) 
[    8.032829] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032831] __dept_wait (kernel/dependency/dept.c:2156 (discriminator 2)) 
[    8.032832] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032833] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032834] dept_wait (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:227 kernel/dependency/dept.c:1013 kernel/dependency/dept.c:1057 kernel/dependency/dept.c:2216) 
[    8.032836] ___slab_alloc (./include/linux/seqlock.h:326 ./include/linux/cpuset.h:151 mm/slub.c:2223 mm/slub.c:2266 mm/slub.c:3000) 
[    8.032837] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
[    8.032839] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:27 (discriminator 1)) 
[    8.032841] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
[    8.032842] __slab_alloc.constprop.0 (mm/slub.c:3092) 
[    8.032844] kmem_cache_alloc (mm/slub.c:3183 mm/slub.c:3225 mm/slub.c:3232 mm/slub.c:3242) 
[    8.032845] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
[    8.032846] alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
[    8.032847] ? dept_ecxt_exit (kernel/dependency/dept.c:2506 (discriminator 1)) 
[    8.032849] alloc_iova_fast (drivers/iommu/iova.c:455) 
[    8.032851] iommu_dma_alloc_iova (drivers/iommu/dma-iommu.c:628) 
[    8.032852] iommu_dma_map_sg (drivers/iommu/dma-iommu.c:1201) 
[    8.032854] ? ata_scsi_mode_select_xlat (drivers/ata/libata-scsi.c:1503) 
[    8.032855] __dma_map_sg_attrs (kernel/dma/mapping.c:195) 
[    8.032856] dma_map_sg_attrs (kernel/dma/mapping.c:232) 
[    8.032858] ata_qc_issue (drivers/ata/libata-core.c:4530 drivers/ata/libata-core.c:4876) 
[    8.032859] __ata_scsi_queuecmd (drivers/ata/libata-scsi.c:1710 drivers/ata/libata-scsi.c:3974) 
[    8.032861] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:4019) 
[    8.032862] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
[    8.032864] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
[    8.032866] ? sbitmap_get (lib/sbitmap.c:179 lib/sbitmap.c:206 lib/sbitmap.c:231) 
[    8.032869] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 
[    8.032871] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
[    8.032872] __blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:313) 
[    8.032874] blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:339) 
[    8.032875] __blk_mq_run_hw_queue (./include/linux/rcupdate.h:723 block/blk-mq.c:1974) 
[    8.032876] __blk_mq_delay_run_hw_queue (block/blk-mq.c:2052) 
[    8.032877] blk_mq_run_hw_queue (block/blk-mq.c:2103) 
[    8.032879] blk_mq_sched_insert_requests (./include/linux/rcupdate.h:692 ./include/linux/percpu-refcount.h:330 ./include/linux/percpu-refcount.h:351 block/blk-mq-sched.c:495) 
[    8.032880] blk_mq_flush_plug_list (block/blk-mq.c:2640) 
[    8.032882] __blk_flush_plug (block/blk-core.c:1247) 
[    8.032883] ? lock_release (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5664 (discriminator 1)) 
[    8.032885] blk_finish_plug (block/blk-core.c:1265 block/blk-core.c:1261) 
[    8.032886] read_pages (mm/readahead.c:181) 
[    8.032888] page_cache_ra_unbounded (./include/linux/fs.h:815 mm/readahead.c:262) 
[    8.032890] page_cache_ra_order (mm/readahead.c:547) 
[    8.032892] ondemand_readahead (mm/readahead.c:669) 
[    8.032893] page_cache_sync_ra (mm/readahead.c:696) 
[    8.032894] filemap_get_pages (mm/filemap.c:2613) 
[    8.032896] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
[    8.032898] filemap_read (mm/filemap.c:2698) 
[    8.032900] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
[    8.032901] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
[    8.032901] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
[    8.032903] ? sched_clock (arch/x86/kernel/tsc.c:254) 
[    8.032904] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
[    8.032905] ? lock_release (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5664 (discriminator 1)) 
[    8.032907] generic_file_read_iter (mm/filemap.c:2845) 
[    8.032908] ? aa_file_perm (security/apparmor/file.c:644) 
[    8.032910] ext4_file_read_iter (fs/ext4/file.c:133) 
[    8.032912] new_sync_read (fs/read_write.c:402 (discriminator 1)) 
[    8.032915] vfs_read (fs/read_write.c:482) 
[    8.032916] ksys_read (fs/read_write.c:620) 
[    8.032918] __x64_sys_read (fs/read_write.c:628) 
[    8.032919] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
[    8.032920] ? do_syscall_64 (arch/x86/entry/common.c:89) 
[    8.032921] ? syscall_exit_to_user_mode (kernel/entry/common.c:297) 
[    8.032922] ? do_syscall_64 (arch/x86/entry/common.c:89) 
[    8.032924] ? do_syscall_64 (arch/x86/entry/common.c:89) 
[    8.032925] ? do_syscall_64 (./arch/x86/include/asm/jump_label.h:27 arch/x86/entry/common.c:77) 
[    8.032926] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
[    8.032927] RIP: 0033:0x7f66de513932
[ 8.032928] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 3a b9 0c 00 e8 15 1a 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
All code
========
   0:	c0 e9 b2             	shr    $0xb2,%cl
   3:	fe                   	(bad)  
   4:	ff                   	(bad)  
   5:	ff 50 48             	call   *0x48(%rax)
   8:	8d 3d 3a b9 0c 00    	lea    0xcb93a(%rip),%edi        # 0xcb948
   e:	e8 15 1a 02 00       	call   0x21a28
  13:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  18:	f3 0f 1e fa          	endbr64 
  1c:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
  23:	00 
  24:	85 c0                	test   %eax,%eax
  26:	75 10                	jne    0x38
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 56                	ja     0x88
  32:	c3                   	ret    
  33:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  38:	48 83 ec 28          	sub    $0x28,%rsp
  3c:	48                   	rex.W
  3d:	89                   	.byte 0x89
  3e:	54                   	push   %rsp
  3f:	24                   	.byte 0x24

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 56                	ja     0x5e
   8:	c3                   	ret    
   9:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   e:	48 83 ec 28          	sub    $0x28,%rsp
  12:	48                   	rex.W
  13:	89                   	.byte 0x89
  14:	54                   	push   %rsp
  15:	24                   	.byte 0x24
[    8.032929] RSP: 002b:00007ffcdce2cee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[    8.032931] RAX: ffffffffffffffda RBX: 000056271b3552d0 RCX: 00007f66de513932
[    8.032932] RDX: 0000000000001000 RSI: 000056271b357f00 RDI: 0000000000000004
[    8.032932] RBP: 00007f66de616600 R08: 0000000000000004 R09: 000056271b358f00
[    8.032933] R10: 000056271b357ef0 R11: 0000000000000246 R12: 00007f66de62aec0
[    8.032934] R13: 0000000000000d68 R14: 00007f66de615a00 R15: 0000000000000d68
[    8.032936]  </TASK>

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [REPORT] syscall reboot + umh + firmware fallback
  2022-05-12 13:56         ` Theodore Ts'o
@ 2022-05-23  1:10           ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-23  1:10 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: tj, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa, 42.hyeyoo, mcgrof, holt

On Thu, May 12, 2022 at 09:56:46AM -0400, Theodore Ts'o wrote:
> On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> > I have a question about this one. Yes, it would never be stuck thanks
> > to timeout. However, IIUC, timeouts are not supposed to expire in normal
> > cases. So I thought a timeout expiration means an abnormal case, so we
> > need to report it in terms of dependency so as to prevent further
> > expiration.
> > That's why I have been trying to track even timeout'ed APIs.
> 
> As I believe I've already pointed out to you previously in ext4 and
> ocfs2, the jbd2 timeout every five seconds happens **all** the time
> while the file system is mounted.  Commits more frequently than five
> seconds is the exception case, at least for desktops/laptop workloads.

Thanks, Ted. It's easy to stop tracking APIs with timeouts. I've just
been afraid that the cases we want to suppress anyway will be skipped.

However, I should stop it if it produces too many false alarms.

> We *don't* get to the timeout only when a userspace process calls
> fsync(2), or if the journal was incorrectly sized by the system
> administrator so that it's too small, and the workload has so many
> file system mutations that we have to prematurely close the
> transaction ahead of the 5 second timeout.

Yeah... It's how journaling works. Thanks.

> > Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> > shouldn't track the timeout APIs any more.
> 
> DEPT tracking timeouts will cause false positives in at least some
> cases.  At the very least, there needs to be an easy way to suppress
> these false positives on a per wait/mutex/spinlock basis.

The easy way is to stop tracking the waits that come with a timeout,
until DEPT itself starts to take the timeout functionality into account
for waits/events.
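
Something along these lines, as a hypothetical illustration only:

	/* in the wait-tracking path */
	if (timeout != MAX_SCHEDULE_TIMEOUT)
		return;		/* a wait with a real timeout: don't
				 * record a dependency for it */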

Thanks.

	Byungchul
> 
>       	       	    	     	      	   	 - Ted

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
  2022-05-19 10:11                 ` Catalin Marinas
@ 2022-05-23  2:43                   ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-23  2:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Hyeonggon Yoo, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, tytso, willy, david, amir73il, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, May 19, 2022 at 11:11:10AM +0100, Catalin Marinas wrote:
> On Wed, May 11, 2022 at 07:04:51PM +0900, Hyeonggon Yoo wrote:
> > On Wed, May 11, 2022 at 08:39:29AM +0900, Byungchul Park wrote:
> > > On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> > > > On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > > > > CASE 1.
> > > > > 
> > > > >    lock L with depth n
> > > > >    lock_nested L' with depth n + 1
> > > > >    ...
> > > > >    unlock L'
> > > > >    unlock L
> > > > > 
> > > > > This case is allowed by Lockdep.
> > > > > This case is allowed by DEPT cuz it's not a deadlock.
> > > > > 
> > > > > CASE 2.
> > > > > 
> > > > >    lock L with depth n
> > > > >    lock A
> > > > >    lock_nested L' with depth n + 1
> > > > >    ...
> > > > >    unlock L'
> > > > >    unlock A
> > > > >    unlock L
> > > > > 
> > > > > This case is allowed by Lockdep.
> > > > > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> > > > 
> > > > Yeah, in previous threads we discussed this [1]
> > > > 
> > > > And the case was:
> > > > 	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> > > > And dept reported:
> > > > 	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> > > > 	deadlock.
> > > > 
> > > > But IIUC - What DEPT reported happens only under scan_mutex and it
> > > > is not simple just not to take them because the object can be
> > > > removed from the list and freed while scanning via kmemleak_free()
> > > > without kmemleak_lock and object_lock.
> 
> The above kmemleak sequence shouldn't deadlock since those locks, even
> if taken in a different order, are serialised by scan_mutex. For various
> reasons, trying to reduce the latency, I ended up with some
> fine-grained, per-object locking.

I understand why you introduced the fine-grained locks. However, the
reversed order should be avoided anyway. As Steven said, Lockdep should
also have detected this case; that is, it would have been reported if
Lockdep worked correctly.

Making a tool skip the reversed order when it's already protected by
another lock is not a technical issue. Because each lock has its own
purpose, as you explained, nobody knows whether a case might arise
someday that takes kmemleak_lock and object_lock without holding
scan_mutex.
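
To spell out the concern: if some future path ever took the two locks
without scan_mutex, the reversed order would become a textbook ABBA
deadlock:

	CPU 0				CPU 1
	lock object_lock		lock kmemleak_lock
	lock kmemleak_lock <waits>	lock object_lock <waits>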

I'm wondering how other folks think this case should be handled, though.

> For object allocation (rbtree modification) and tree search, we use
> kmemleak_lock. During scanning (which can take minutes under
> scan_mutex), we want to prevent (a) long latencies and (b) freeing the
> object being scanned. We release the locks regularly for (a) and hold
> the object->lock for (b).
> 
> In another thread Byungchul mentioned:
> 
> |    context X			context Y
> | 
> |    lock mutex A		lock mutex A
> |    lock B			lock C
> |    lock C			lock B
> |    unlock C			unlock B
> |    unlock B			unlock C
> |    unlock mutex A		unlock mutex A
> | 
> | In my opinion, lock B and lock C are unnecessary if they are always
> | along with lock mutex A. Or we should keep correct lock order across all
> | the code.
> 
> If these are the only two places, yes, locks B and C would be
> unnecessary. But we have those locks acquired (not nested) on the
> allocation path (kmemleak_lock) and freeing path (object->lock). We
> don't want to block those paths while scan_mutex is held.
> 
> That said, we may be able to use a single kmemleak_lock for everything.
> The object freeing path may be affected slightly during scanning but the
> code does release it every MAX_SCAN_SIZE bytes. It may even get slightly
> faster as we'd hammer a single lock (I'll do some benchmarks).
> 
> But from a correctness perspective, I think the DEPT tool should be
> improved a bit to detect when such out of order locking is serialised by
> an enclosing lock/mutex.

Again, I don't think this is a technical issue.

	Byungchul
> 
> -- 
> Catalin

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH RFC v6 07/21] dept: Apply Dept to seqlock
  2022-05-21  5:25     ` Hyeonggon Yoo
@ 2022-05-24  6:00       ` Byungchul Park
  -1 siblings, 0 replies; 105+ messages in thread
From: Byungchul Park @ 2022-05-24  6:00 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Sat, May 21, 2022 at 02:25:35PM +0900, Hyeonggon Yoo wrote:
> Hello, I got a new report from DEPT, related to seqlock.
> I applied the dept 1.20 series on v5.18-rc7.
> 
> Below is what DEPT reported.
> I think this is bogus because a reader of p->alloc_lock cannot block
> its writer. Or please kindly tell me if I'm missing something ;)

Hi,

Yes, it should've been silent. I will fix it. Thank you!

Just FYI, the seqcount read here is a wait blocking the following
event, i.e. spin_unlock(&host->lock), not something that blocks its
writer. The report says the writer is blocked by
__raw_spin_lock(&dentry->d_lock), not by its reader. I've explained it
just for your information. (:
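
For reference, a generic sketch of the wait/event view (plain seqlock_t
API, not the actual call sites from the report): the read side retries
until any concurrent writer finishes, so the read section is a wait on
the write section ending, and whatever the reader holds across the
retry loop, host->lock in context A above, cannot be released until
that wait is satisfied.

    #include <linux/seqlock.h>

    static DEFINE_SEQLOCK(sl);

    static void reader(void)
    {
            unsigned int seq;

            do {
                    seq = read_seqbegin(&sl);   /* the [W]ait: retries across writers */
                    /* read data protected by sl */
            } while (read_seqretry(&sl, seq));
    }

    static void writer(void)
    {
            write_seqlock(&sl);
            /* update data; a reader entering now will retry */
            write_sequnlock(&sl);       /* lets the reader's wait complete */
    }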

Thank you!

	Byungchul


> 
> Thanks.
> 
> [    8.032674] ===================================================
> [    8.032676] DEPT: Circular dependency has been detected.
> [    8.032677] 5.18.0-rc7-dept+ #10 Tainted: G            E
> [    8.032677] ---------------------------------------------------
> [    8.032678] summary
> [    8.032678] ---------------------------------------------------
> [    8.032679] *** DEADLOCK ***
> 
> [    8.032679] context A
> [    8.032679]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> [    8.032681]     [W] __seqprop_spinlock_wait(&p->alloc_lock:0)
> [    8.032681]     [E] spin_unlock(&host->lock:0)
> 
> [    8.032682] context B
> [    8.032682]     [S] __raw_spin_lock(&dentry->d_lock:0)
> [    8.032683]     [W] __raw_spin_lock(&host->lock:0)
> [    8.032684]     [E] spin_unlock(&dentry->d_lock:0)
> 
> [    8.032684] context C
> [    8.032685]     [S] __raw_spin_lock(&p->alloc_lock:0)
> [    8.032685]     [W] __raw_spin_lock(&dentry->d_lock:0)
> [    8.032685]     [E] spin_unlock(&p->alloc_lock:0)
> 
> [    8.032686] [S]: start of the event context
> [    8.032686] [W]: the wait blocked
> [    8.032687] [E]: the event not reachable
> [    8.032687] ---------------------------------------------------
> [    8.032687] context A's detail
> [    8.032688] ---------------------------------------------------
> [    8.032688] context A
> [    8.032688]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> [    8.032689]     [W] __seqprop_spinlock_wait(&p->alloc_lock:0)
> [    8.032689]     [E] spin_unlock(&host->lock:0)
> 
> [    8.032690] [S] __raw_spin_lock_irqsave(&host->lock:0):
> [    8.032690] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:2734 drivers/ata/libata-scsi.c:4017) 
> [    8.032694] stacktrace:
> [    8.032694] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:2734 drivers/ata/libata-scsi.c:4017) 
> [    8.032696] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
> [    8.032697] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
> [    8.032700] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 
> [    8.032701] __blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:313) 
> [    8.032702] blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:339) 
> [    8.032703] __blk_mq_run_hw_queue (./include/linux/rcupdate.h:723 block/blk-mq.c:1974) 
> [    8.032704] __blk_mq_delay_run_hw_queue (block/blk-mq.c:2052) 
> [    8.032705] blk_mq_run_hw_queue (block/blk-mq.c:2103) 
> [    8.032706] blk_mq_sched_insert_requests (./include/linux/rcupdate.h:692 ./include/linux/percpu-refcount.h:330 ./include/linux/percpu-refcount.h:351 block/blk-mq-sched.c:495) 
> [    8.032707] blk_mq_flush_plug_list (block/blk-mq.c:2640) 
> [    8.032708] __blk_flush_plug (block/blk-core.c:1247) 
> [    8.032709] blk_finish_plug (block/blk-core.c:1265 block/blk-core.c:1261) 
> [    8.032710] read_pages (mm/readahead.c:181) 
> [    8.032712] page_cache_ra_unbounded (./include/linux/fs.h:815 mm/readahead.c:262) 
> [    8.032713] page_cache_ra_order (mm/readahead.c:547) 
> 
> [    8.032714] [W] __seqprop_spinlock_wait(&p->alloc_lock:0):
> [    8.032714] __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032717] stacktrace:
> [    8.032717] dept_wait (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:227 kernel/dependency/dept.c:1013 kernel/dependency/dept.c:1057 kernel/dependency/dept.c:2216) 
> [    8.032719] ___slab_alloc (./include/linux/seqlock.h:326 ./include/linux/cpuset.h:151 mm/slub.c:2223 mm/slub.c:2266 mm/slub.c:3000) 
> [    8.032720] __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032721] kmem_cache_alloc (mm/slub.c:3183 mm/slub.c:3225 mm/slub.c:3232 mm/slub.c:3242) 
> [    8.032722] alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
> [    8.032724] alloc_iova_fast (drivers/iommu/iova.c:455) 
> [    8.032725] iommu_dma_alloc_iova (drivers/iommu/dma-iommu.c:628) 
> [    8.032726] iommu_dma_map_sg (drivers/iommu/dma-iommu.c:1201) 
> [    8.032727] __dma_map_sg_attrs (kernel/dma/mapping.c:195) 
> [    8.032729] dma_map_sg_attrs (kernel/dma/mapping.c:232) 
> [    8.032730] ata_qc_issue (drivers/ata/libata-core.c:4530 drivers/ata/libata-core.c:4876) 
> [    8.032731] __ata_scsi_queuecmd (drivers/ata/libata-scsi.c:1710 drivers/ata/libata-scsi.c:3974) 
> [    8.032732] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:4019) 
> [    8.032734] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
> [    8.032734] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
> [    8.032735] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 
> 
> [    8.032736] [E] spin_unlock(&host->lock:0):
> [    8.032737] (N/A)
> [    8.032737] ---------------------------------------------------
> [    8.032738] context B's detail
> [    8.032738] ---------------------------------------------------
> [    8.032738] context B
> [    8.032738]     [S] __raw_spin_lock(&dentry->d_lock:0)
> [    8.032739]     [W] __raw_spin_lock(&host->lock:0)
> [    8.032740]     [E] spin_unlock(&dentry->d_lock:0)
> 
> [    8.032740] [S] __raw_spin_lock(&dentry->d_lock:0):
> [    8.032741] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
> [    8.032743] stacktrace:
> [    8.032743] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
> [    8.032744] path_get (fs/namei.c:546) 
> [    8.032746] do_dentry_open (fs/open.c:778) 
> [    8.032747] vfs_open (fs/open.c:959) 
> [    8.032748] path_openat (fs/namei.c:3583 fs/namei.c:3602) 
> [    8.032749] do_filp_open (fs/namei.c:3636) 
> [    8.032750] do_sys_openat2 (fs/open.c:1213) 
> [    8.032751] __x64_sys_openat (fs/open.c:1240) 
> [    8.032752] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [    8.032754] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
> 
> [    8.032756] [W] __raw_spin_lock(&host->lock:0):
> [    8.032756] ahci_single_level_irq_intr (drivers/ata/libahci.c:1970) libahci
> [    8.032759] stacktrace:
> [    8.032760] ahci_single_level_irq_intr (drivers/ata/libahci.c:1970) libahci
> [    8.032761] __handle_irq_event_percpu (kernel/irq/handle.c:158) 
> [    8.032763] handle_irq_event (kernel/irq/handle.c:195 kernel/irq/handle.c:210) 
> [    8.032763] handle_edge_irq (kernel/irq/chip.c:819) 
> [    8.032764] __common_interrupt (./include/asm-generic/irq_regs.h:28 (discriminator 22) arch/x86/kernel/irq.c:263 (discriminator 22)) 
> [    8.032766] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14)) 
> [    8.032768] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636) 
> [    8.032769] lock_release (kernel/locking/lockdep.c:5665) 
> [    8.032771] _raw_spin_unlock (./include/linux/spinlock_api_smp.h:141 kernel/locking/spinlock.c:186) 
> [    8.032772] lockref_get (lib/lockref.c:55) 
> [    8.032772] path_get (fs/namei.c:546) 
> [    8.032774] do_dentry_open (fs/open.c:778) 
> [    8.032774] vfs_open (fs/open.c:959) 
> [    8.032775] path_openat (fs/namei.c:3583 fs/namei.c:3602) 
> [    8.032776] do_filp_open (fs/namei.c:3636) 
> [    8.032777] do_sys_openat2 (fs/open.c:1213) 
> 
> [    8.032778] [E] spin_unlock(&dentry->d_lock:0):
> [    8.032778] (N/A)
> [    8.032779] ---------------------------------------------------
> [    8.032779] context C's detail
> [    8.032779] ---------------------------------------------------
> [    8.032780] context C
> [    8.032780]     [S] __raw_spin_lock(&p->alloc_lock:0)
> [    8.032780]     [W] __raw_spin_lock(&dentry->d_lock:0)
> [    8.032781]     [E] spin_unlock(&p->alloc_lock:0)
> 
> [    8.032781] [S] __raw_spin_lock(&p->alloc_lock:0):
> [    8.032782] proc_root_link (fs/proc/base.c:177 fs/proc/base.c:208) 
> [    8.032784] stacktrace:
> [    8.032784] proc_root_link (fs/proc/base.c:177 fs/proc/base.c:208) 
> [    8.032784] proc_pid_get_link.part.0 (fs/proc/base.c:1756) 
> [    8.032785] proc_pid_get_link (fs/proc/base.c:1762) 
> [    8.032786] step_into (fs/namei.c:1819 fs/namei.c:1876) 
> [    8.032787] walk_component (fs/namei.c:2027) 
> [    8.032788] path_lookupat (fs/namei.c:2475 fs/namei.c:2499) 
> [    8.032789] filename_lookup (fs/namei.c:2528) 
> [    8.032790] vfs_statx (fs/stat.c:229) 
> [    8.032791] vfs_fstatat (fs/stat.c:256) 
> [    8.032792] __do_sys_newfstatat (fs/stat.c:426) 
> [    8.032793] __x64_sys_newfstatat (fs/stat.c:419) 
> [    8.032793] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [    8.032794] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
> 
> [    8.032796] [W] __raw_spin_lock(&dentry->d_lock:0):
> [    8.032796] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
> [    8.032797] stacktrace:
> [    8.032797] lockref_get (./include/linux/spinlock.h:410 lib/lockref.c:54) 
> [    8.032798] path_get (fs/namei.c:546) 
> [    8.032799] proc_root_link (./include/linux/spinlock.h:410 ./include/linux/fs_struct.h:32 fs/proc/base.c:178 fs/proc/base.c:208) 
> [    8.032800] proc_pid_get_link.part.0 (fs/proc/base.c:1756) 
> [    8.032801] proc_pid_get_link (fs/proc/base.c:1762) 
> [    8.032801] step_into (fs/namei.c:1819 fs/namei.c:1876) 
> [    8.032802] walk_component (fs/namei.c:2027) 
> [    8.032803] path_lookupat (fs/namei.c:2475 fs/namei.c:2499) 
> [    8.032805] filename_lookup (fs/namei.c:2528) 
> [    8.032805] vfs_statx (fs/stat.c:229) 
> [    8.032806] vfs_fstatat (fs/stat.c:256) 
> [    8.032807] __do_sys_newfstatat (fs/stat.c:426) 
> [    8.032808] __x64_sys_newfstatat (fs/stat.c:419) 
> [    8.032808] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [    8.032809] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
> 
> [    8.032810] [E] spin_unlock(&p->alloc_lock:0):
> [    8.032811] (N/A)
> [    8.032811] ---------------------------------------------------
> [    8.032811] information that might be helpful
> [    8.032812] ---------------------------------------------------
> [    8.032812] CPU: 4 PID: 534 Comm: systemd-tmpfile Tainted: G            E     5.18.0-rc7-dept+ #10
> [    8.032814] Hardware name: ASUS System Product Name/TUF GAMING B550-PLUS (WI-FI), BIOS 1401 12/03/2020
> [    8.032814] Call Trace:
> [    8.032815]  <TASK>
> [    8.032816] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
> [    8.032819] dump_stack (lib/dump_stack.c:114) 
> [    8.032820] print_circle.cold (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:143 kernel/dependency/dept.c:776) 
> [    8.032822] ? print_circle (kernel/dependency/dept.c:1107) 
> [    8.032824] cb_check_dl (kernel/dependency/dept.c:1133) 
> [    8.032825] bfs (kernel/dependency/dept.c:874) 
> [    8.032826] add_dep (kernel/dependency/dept.c:1457) 
> [    8.032828] add_wait (kernel/dependency/dept.c:1505) 
> [    8.032829] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032831] __dept_wait (kernel/dependency/dept.c:2156 (discriminator 2)) 
> [    8.032832] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032833] ? __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032834] dept_wait (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:227 kernel/dependency/dept.c:1013 kernel/dependency/dept.c:1057 kernel/dependency/dept.c:2216) 
> [    8.032836] ___slab_alloc (./include/linux/seqlock.h:326 ./include/linux/cpuset.h:151 mm/slub.c:2223 mm/slub.c:2266 mm/slub.c:3000) 
> [    8.032837] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
> [    8.032839] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:27 (discriminator 1)) 
> [    8.032841] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
> [    8.032842] __slab_alloc.constprop.0 (mm/slub.c:3092) 
> [    8.032844] kmem_cache_alloc (mm/slub.c:3183 mm/slub.c:3225 mm/slub.c:3232 mm/slub.c:3242) 
> [    8.032845] ? alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
> [    8.032846] alloc_iova (./include/linux/slab.h:704 drivers/iommu/iova.c:240 drivers/iommu/iova.c:316) 
> [    8.032847] ? dept_ecxt_exit (kernel/dependency/dept.c:2506 (discriminator 1)) 
> [    8.032849] alloc_iova_fast (drivers/iommu/iova.c:455) 
> [    8.032851] iommu_dma_alloc_iova (drivers/iommu/dma-iommu.c:628) 
> [    8.032852] iommu_dma_map_sg (drivers/iommu/dma-iommu.c:1201) 
> [    8.032854] ? ata_scsi_mode_select_xlat (drivers/ata/libata-scsi.c:1503) 
> [    8.032855] __dma_map_sg_attrs (kernel/dma/mapping.c:195) 
> [    8.032856] dma_map_sg_attrs (kernel/dma/mapping.c:232) 
> [    8.032858] ata_qc_issue (drivers/ata/libata-core.c:4530 drivers/ata/libata-core.c:4876) 
> [    8.032859] __ata_scsi_queuecmd (drivers/ata/libata-scsi.c:1710 drivers/ata/libata-scsi.c:3974) 
> [    8.032861] ata_scsi_queuecmd (drivers/ata/libata-scsi.c:4019) 
> [    8.032862] scsi_queue_rq (drivers/scsi/scsi_lib.c:1517 drivers/scsi/scsi_lib.c:1745) 
> [    8.032864] blk_mq_dispatch_rq_list (block/blk-mq.c:1858) 
> [    8.032866] ? sbitmap_get (lib/sbitmap.c:179 lib/sbitmap.c:206 lib/sbitmap.c:231) 
> [    8.032869] blk_mq_do_dispatch_sched (block/blk-mq-sched.c:173 block/blk-mq-sched.c:187) 
> [    8.032871] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
> [    8.032872] __blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:313) 
> [    8.032874] blk_mq_sched_dispatch_requests (block/blk-mq-sched.c:339) 
> [    8.032875] __blk_mq_run_hw_queue (./include/linux/rcupdate.h:723 block/blk-mq.c:1974) 
> [    8.032876] __blk_mq_delay_run_hw_queue (block/blk-mq.c:2052) 
> [    8.032877] blk_mq_run_hw_queue (block/blk-mq.c:2103) 
> [    8.032879] blk_mq_sched_insert_requests (./include/linux/rcupdate.h:692 ./include/linux/percpu-refcount.h:330 ./include/linux/percpu-refcount.h:351 block/blk-mq-sched.c:495) 
> [    8.032880] blk_mq_flush_plug_list (block/blk-mq.c:2640) 
> [    8.032882] __blk_flush_plug (block/blk-core.c:1247) 
> [    8.032883] ? lock_release (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5664 (discriminator 1)) 
> [    8.032885] blk_finish_plug (block/blk-core.c:1265 block/blk-core.c:1261) 
> [    8.032886] read_pages (mm/readahead.c:181) 
> [    8.032888] page_cache_ra_unbounded (./include/linux/fs.h:815 mm/readahead.c:262) 
> [    8.032890] page_cache_ra_order (mm/readahead.c:547) 
> [    8.032892] ondemand_readahead (mm/readahead.c:669) 
> [    8.032893] page_cache_sync_ra (mm/readahead.c:696) 
> [    8.032894] filemap_get_pages (mm/filemap.c:2613) 
> [    8.032896] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
> [    8.032898] filemap_read (mm/filemap.c:2698) 
> [    8.032900] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
> [    8.032901] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
> [    8.032901] ? lock_is_held_type (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5686 (discriminator 1)) 
> [    8.032903] ? sched_clock (arch/x86/kernel/tsc.c:254) 
> [    8.032904] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67) 
> [    8.032905] ? lock_release (./arch/x86/include/asm/paravirt.h:704 (discriminator 1) ./arch/x86/include/asm/irqflags.h:138 (discriminator 1) kernel/locking/lockdep.c:5664 (discriminator 1)) 
> [    8.032907] generic_file_read_iter (mm/filemap.c:2845) 
> [    8.032908] ? aa_file_perm (security/apparmor/file.c:644) 
> [    8.032910] ext4_file_read_iter (fs/ext4/file.c:133) 
> [    8.032912] new_sync_read (fs/read_write.c:402 (discriminator 1)) 
> [    8.032915] vfs_read (fs/read_write.c:482) 
> [    8.032916] ksys_read (fs/read_write.c:620) 
> [    8.032918] __x64_sys_read (fs/read_write.c:628) 
> [    8.032919] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) 
> [    8.032920] ? do_syscall_64 (arch/x86/entry/common.c:89) 
> [    8.032921] ? syscall_exit_to_user_mode (kernel/entry/common.c:297) 
> [    8.032922] ? do_syscall_64 (arch/x86/entry/common.c:89) 
> [    8.032924] ? do_syscall_64 (arch/x86/entry/common.c:89) 
> [    8.032925] ? do_syscall_64 (./arch/x86/include/asm/jump_label.h:27 arch/x86/entry/common.c:77) 
> [    8.032926] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
> [    8.032927] RIP: 0033:0x7f66de513932
> [ 8.032928] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 3a b9 0c 00 e8 15 1a 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
> All code
> ========
>    0:	c0 e9 b2             	shr    $0xb2,%cl
>    3:	fe                   	(bad)  
>    4:	ff                   	(bad)  
>    5:	ff 50 48             	call   *0x48(%rax)
>    8:	8d 3d 3a b9 0c 00    	lea    0xcb93a(%rip),%edi        # 0xcb948
>    e:	e8 15 1a 02 00       	call   0x21a28
>   13:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   18:	f3 0f 1e fa          	endbr64 
>   1c:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
>   23:	00 
>   24:	85 c0                	test   %eax,%eax
>   26:	75 10                	jne    0x38
>   28:	0f 05                	syscall 
>   2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
>   30:	77 56                	ja     0x88
>   32:	c3                   	ret    
>   33:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   38:	48 83 ec 28          	sub    $0x28,%rsp
>   3c:	48                   	rex.W
>   3d:	89                   	.byte 0x89
>   3e:	54                   	push   %rsp
>   3f:	24                   	.byte 0x24
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
>    6:	77 56                	ja     0x5e
>    8:	c3                   	ret    
>    9:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>    e:	48 83 ec 28          	sub    $0x28,%rsp
>   12:	48                   	rex.W
>   13:	89                   	.byte 0x89
>   14:	54                   	push   %rsp
>   15:	24                   	.byte 0x24
> [    8.032929] RSP: 002b:00007ffcdce2cee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [    8.032931] RAX: ffffffffffffffda RBX: 000056271b3552d0 RCX: 00007f66de513932
> [    8.032932] RDX: 0000000000001000 RSI: 000056271b357f00 RDI: 0000000000000004
> [    8.032932] RBP: 00007f66de616600 R08: 0000000000000004 R09: 000056271b358f00
> [    8.032933] R10: 000056271b357ef0 R11: 0000000000000246 R12: 00007f66de62aec0
> [    8.032934] R13: 0000000000000d68 R14: 00007f66de615a00 R15: 0000000000000d68
> [    8.032936]  </TASK>
> 
> -- 
> Thanks,
> Hyeonggon


end of thread

Thread overview: 105+ messages
2022-05-04  8:17 [PATCH RFC v6 00/21] DEPT(Dependency Tracker) Byungchul Park
2022-05-04  8:17 ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 01/21] llist: Move llist_{head,node} definition to types.h Byungchul Park
2022-05-04  8:17   ` [PATCH RFC v6 01/21] llist: Move llist_{head, node} " Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 02/21] dept: Implement Dept(Dependency Tracker) Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04 13:29   ` kernel test robot
2022-05-21  3:24   ` Hyeonggon Yoo
2022-05-21  3:24     ` Hyeonggon Yoo
2022-05-04  8:17 ` [PATCH RFC v6 03/21] dept: Apply Dept to spinlock Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 04/21] dept: Apply Dept to mutex families Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 05/21] dept: Apply Dept to rwlock Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 06/21] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 07/21] dept: Apply Dept to seqlock Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-21  5:25   ` Hyeonggon Yoo
2022-05-21  5:25     ` Hyeonggon Yoo
2022-05-24  6:00     ` Byungchul Park
2022-05-24  6:00       ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 08/21] dept: Apply Dept to rwsem Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 09/21] dept: Add proc knobs to show stats and dependency graph Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 10/21] dept: Introduce split map concept and new APIs for them Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 11/21] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
2022-05-04  8:17   ` [PATCH RFC v6 11/21] dept: Apply Dept to wait/event of PG_{locked, writeback} Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 12/21] dept: Apply SDT to swait Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 13/21] dept: Apply SDT to wait(waitqueue) Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 14/21] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 15/21] dept: Distinguish each syscall context from another Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 16/21] dept: Distinguish each work " Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04 11:23   ` Sergey Shtylyov
2022-05-04 11:23     ` Sergey Shtylyov
2022-05-04  8:17 ` [PATCH RFC v6 17/21] dept: Disable Dept within the wait_bit layer by default Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 18/21] dept: Disable Dept on struct crypto_larval's completion for now Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 19/21] dept: Differentiate onstack maps from others of different tasks in class Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 20/21] dept: Do not add dependencies between events within scheduler and sleeps Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04  8:17 ` [PATCH RFC v6 21/21] dept: Unstage wait when tagging a normal sleep wait Byungchul Park
2022-05-04  8:17   ` Byungchul Park
2022-05-04 18:17 ` [PATCH RFC v6 00/21] DEPT(Dependency Tracker) Linus Torvalds
2022-05-04 18:17   ` Linus Torvalds
2022-05-06  0:11   ` Byungchul Park
2022-05-06  0:11     ` Byungchul Park
2022-05-07  7:20     ` Hyeonggon Yoo
2022-05-07  7:20       ` Hyeonggon Yoo
2022-05-09  0:16       ` Byungchul Park
2022-05-09  0:16         ` Byungchul Park
2022-05-09 20:47         ` Steven Rostedt
2022-05-09 20:47           ` Steven Rostedt
2022-05-09 23:38           ` Byungchul Park
2022-05-09 23:38             ` Byungchul Park
2022-05-10 14:12             ` Steven Rostedt
2022-05-10 14:12               ` Steven Rostedt
2022-05-10 23:26               ` Byungchul Park
2022-05-10 23:26                 ` Byungchul Park
2022-05-10 11:18         ` Hyeonggon Yoo
2022-05-10 11:18           ` Hyeonggon Yoo
2022-05-10 23:39           ` Byungchul Park
2022-05-10 23:39             ` Byungchul Park
2022-05-11 10:04             ` Hyeonggon Yoo
2022-05-11 10:04               ` Hyeonggon Yoo
2022-05-19 10:11               ` Catalin Marinas
2022-05-19 10:11                 ` Catalin Marinas
2022-05-23  2:43                 ` Byungchul Park
2022-05-23  2:43                   ` Byungchul Park
2022-05-09  1:22   ` Byungchul Park
2022-05-09  1:22     ` Byungchul Park
2022-05-09 21:05 ` Theodore Ts'o
2022-05-09 21:05   ` Theodore Ts'o
2022-05-09 22:28   ` Theodore Ts'o
2022-05-09 22:28     ` Theodore Ts'o
2022-05-10  0:32     ` Byungchul Park
2022-05-10  0:32       ` Byungchul Park
2022-05-10  1:32       ` Theodore Ts'o
2022-05-10  1:32         ` Theodore Ts'o
2022-05-10  5:37         ` Byungchul Park
2022-05-10  5:37           ` Byungchul Park
2022-05-11  1:16           ` Byungchul Park
2022-05-11  1:16             ` Byungchul Park
2022-05-12  5:25 ` [REPORT] syscall reboot + umh + firmware fallback Byungchul Park
2022-05-12  5:25   ` Byungchul Park
2022-05-12  9:15   ` Tejun Heo
2022-05-12  9:15     ` Tejun Heo
2022-05-12 11:18     ` Byungchul Park
2022-05-12 11:18       ` Byungchul Park
2022-05-12 13:56       ` Theodore Ts'o
2022-05-12 13:56         ` Theodore Ts'o
2022-05-23  1:10         ` Byungchul Park
2022-05-23  1:10           ` Byungchul Park
2022-05-12 16:41       ` Tejun Heo
2022-05-12 16:41         ` Tejun Heo
