linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/16] DEPT(Dependency Tracker)
@ 2022-02-17 10:57 Byungchul Park
  2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
                   ` (17 more replies)
  0 siblings, 18 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Hi Linus and folks,

I've been developing a tool that detects deadlock possibilities by
tracking waits and events rather than lock acquisition order, so as to
cover all synchronization mechanisms. It's based on the v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.12_on_v5.17-rc1

Benefits:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstands false positives thanks to 6.
	9. Easy to tag any wait/event.
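
Point 9 can be illustrated with a toy userspace model (the names
loosely follow the SDT API appearing later in this series, but the
signatures here are simplified and hypothetical): entering an event
context pushes its class, and tagging a wait records one dependency
edge per currently-entered context, meaning "that context's event
cannot fire until this wait is woken".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model of wait/event tagging. Class identifiers are
 * small ints; model_* names are hypothetical, not the kernel API.
 */
#define MAX_CLASSES 8

static bool dep[MAX_CLASSES][MAX_CLASSES];	/* dep[e][w]: e depends on w */
static int ecxt_stack[MAX_CLASSES];
static int ecxt_top;

/* entering/leaving an event context */
static void model_ecxt_enter(int class) { ecxt_stack[ecxt_top++] = class; }
static void model_ecxt_exit(void) { ecxt_top--; }

/*
 * Tagging a wait: every event context entered at this point now
 * depends on this wait being woken.
 */
static void model_wait(int class)
{
	for (int i = 0; i < ecxt_top; i++)
		dep[ecxt_stack[i]][class] = true;
}

static bool depends_on(int e, int w) { return dep[e][w]; }
```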

Future work:

	0. To make it more stable.
	1. To separate Dept from Lockdep.
	2. To improve performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing wait/event tags in the kernel.
	5. To deduplicate stack traces.

I've got several reports from the tool. Some of them look like false
alarms and others look like real deadlock possibilities. Because of my
unfamiliarity with the domains, it's hard for me to confirm whether
each one is real. Let me add the reports to this email thread.

How to interpret a report:

	1. The E(event) in each context cannot be triggered because of
	   the W(wait) that cannot be woken.
	2. The stack trace that helps find the problematic code is
	   located in each context's detail.

Changes from RFC:

	1. Add the wait tag at __schedule() rather than at
	   prepare_to_wait().
	2. Use the try version at the lockdep_acquire_cpus_lock()
	   annotation.
	3. Distinguish each syscall context from the others.

Thanks,
Byungchul

Byungchul Park (16):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Embed Dept data in Lockdep
  dept: Apply Dept to spinlock
  dept: Apply Dept to mutex families
  dept: Apply Dept to rwlock
  dept: Apply Dept to wait_for_completion()/complete()
  dept: Apply Dept to seqlock
  dept: Apply Dept to rwsem
  dept: Add proc knobs to show stats and dependency graph
  dept: Introduce split map concept and new APIs for them
  dept: Apply Dept to wait/event of PG_{locked,writeback}
  dept: Apply SDT to swait
  dept: Apply SDT to wait(waitqueue)
  locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
  dept: Distinguish each syscall context from another

 include/linux/completion.h         |   42 +-
 include/linux/dept.h               |  523 +++++++
 include/linux/dept_page.h          |   78 ++
 include/linux/dept_sdt.h           |   62 +
 include/linux/hardirq.h            |    3 +
 include/linux/irqflags.h           |   33 +-
 include/linux/llist.h              |    8 -
 include/linux/lockdep.h            |  156 ++-
 include/linux/lockdep_types.h      |    3 +
 include/linux/mutex.h              |   31 +
 include/linux/page-flags.h         |   45 +-
 include/linux/pagemap.h            |    7 +-
 include/linux/percpu-rwsem.h       |   10 +-
 include/linux/rtmutex.h            |    7 +
 include/linux/rwlock.h             |   48 +
 include/linux/rwlock_api_smp.h     |    8 +-
 include/linux/rwlock_types.h       |    7 +
 include/linux/rwsem.h              |   31 +
 include/linux/sched.h              |    7 +
 include/linux/seqlock.h            |   59 +-
 include/linux/spinlock.h           |   24 +
 include/linux/spinlock_types_raw.h |   13 +
 include/linux/swait.h              |    4 +
 include/linux/types.h              |    8 +
 include/linux/wait.h               |    6 +-
 init/init_task.c                   |    2 +
 init/main.c                        |    4 +
 kernel/Makefile                    |    1 +
 kernel/cpu.c                       |    2 +-
 kernel/dependency/Makefile         |    5 +
 kernel/dependency/dept.c           | 2702 ++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h      |   10 +
 kernel/dependency/dept_internal.h  |   26 +
 kernel/dependency/dept_object.h    |   13 +
 kernel/dependency/dept_proc.c      |   93 ++
 kernel/entry/common.c              |    3 +
 kernel/exit.c                      |    1 +
 kernel/fork.c                      |    2 +
 kernel/locking/lockdep.c           |   12 +-
 kernel/module.c                    |    2 +
 kernel/sched/completion.c          |   12 +-
 kernel/sched/core.c                |    3 +
 kernel/sched/swait.c               |   10 +
 kernel/sched/wait.c                |   16 +
 kernel/softirq.c                   |    6 +-
 kernel/trace/trace_preemptirq.c    |   19 +-
 lib/Kconfig.debug                  |   21 +
 mm/filemap.c                       |   68 +
 mm/page_ext.c                      |    5 +
 49 files changed, 4204 insertions(+), 57 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_page.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

-- 
1.9.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 01/16] llist: Move llist_{head,node} definition to types.h
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

llist_head and llist_node can be used by very low-level primitives.
For example, Dept, which tracks dependencies, uses llist in its header.
To avoid a header dependency, move these definitions to types.h.
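
For context, llist implements a lock-less singly-linked list on top of
exactly these two structs. A single-threaded userspace sketch of the
push operation (the real llist_add() swings head->first with a cmpxchg
loop to tolerate concurrent users; the return convention is the same):

```c
#include <assert.h>
#include <stddef.h>

/* The two structs this patch moves to types.h. */
struct llist_node {
	struct llist_node *next;
};

struct llist_head {
	struct llist_node *first;
};

/*
 * Single-threaded model of llist_add(): push a node at the head and
 * report whether the list was empty before the push.
 */
static int llist_add_model(struct llist_node *new, struct llist_head *head)
{
	new->next = head->first;
	head->first = new;
	return new->next == NULL;
}
```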

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/llist.h | 8 --------
 include/linux/types.h | 8 ++++++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index 85bda2d..99cc3c3 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -53,14 +53,6 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 
-struct llist_head {
-	struct llist_node *first;
-};
-
-struct llist_node {
-	struct llist_node *next;
-};
-
 #define LLIST_HEAD_INIT(name)	{ NULL }
 #define LLIST_HEAD(name)	struct llist_head name = LLIST_HEAD_INIT(name)
 
diff --git a/include/linux/types.h b/include/linux/types.h
index ac825ad..4662d6e 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -187,6 +187,14 @@ struct hlist_node {
 	struct hlist_node *next, **pprev;
 };
 
+struct llist_head {
+	struct llist_node *first;
+};
+
+struct llist_node {
+	struct llist_node *next;
+};
+
 struct ustat {
 	__kernel_daddr_t	f_tfree;
 #ifdef CONFIG_ARCH_32BIT_USTAT_F_TINODE
-- 
1.9.1



* [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
  2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 15:54   ` Steven Rostedt
                     ` (3 more replies)
  2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
                   ` (15 subsequent siblings)
  17 siblings, 4 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

CURRENT STATUS
--------------
Lockdep tracks the acquisition order of locks to detect deadlocks, and
tracks IRQ context and IRQ enable/disable state as well to take
accidental acquisitions into account.

Lockdep has to be turned off once it detects and reports a deadlock,
since its data structures and algorithm are not reusable after the
detection because of the complex design.

PROBLEM
-------
*Waits* and the *events* that never arrive to wake them are what
eventually cause deadlocks. However, Lockdep is only interested in lock
acquisition order, forcing users to emulate lock acquisition even for
plain waits and events that have nothing to do with a real lock.

Even worse, no one likes Lockdep's false positive reports, because each
one prevents further reports that might be more valuable. That's why
all kernel developers are sensitive to Lockdep's false positives.

Besides that, because it tracks acquisition order, Lockdep cannot
correctly deal with read locks or cross-context events, e.g.
wait_for_completion()/complete(), for deadlock detection. Lockdep is no
longer a good tool for that purpose.

SOLUTION
--------
Again, *waits* and the *events* that never arrive to wake them are what
eventually cause deadlocks. The new solution, Dept(DEPendency Tracker),
focuses on waits and events themselves. Dept tracks waits and events
and reports whenever it finds an event that can never be reached.

Dept does:
   . Work with read locks in the right way.
   . Work with any wait and event, i.e. cross-context events.
   . Continue to work even after reporting multiple times.
   . Provide simple and intuitive APIs.
   . Do exactly what a dependency checker should do.
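
The check itself can be made concrete with a minimal userspace model
(illustrative only, not the actual implementation): treat "the event of
class e cannot fire until a wait of class w is woken" as a graph edge;
a candidate edge indicates a deadlock possibility exactly when the
wait's class can already reach the event's class, i.e. when the edge
would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal model of the check. dep[e][w] means "the event of class e
 * cannot fire until a wait of class w is woken". An edge that closes
 * a cycle means some event may never be reachable.
 */
#define NCLASS 8

static bool dep[NCLASS][NCLASS];

static void record(int e, int w) { dep[e][w] = true; }

/* depth-first reachability over dependency edges */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int w = 0; w < NCLASS; w++)
		if (dep[from][w] && !seen[w] && reaches(w, to, seen))
			return true;
	return false;
}

/* would recording dep[e][w] close a cycle, i.e. risk deadlock? */
static bool would_deadlock(int e, int w)
{
	bool seen[NCLASS] = { false };
	return reaches(w, e, seen);
}
```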

Q & A
-----
Q. Is this the first try ever to address the problem?
A. No. The cross-release feature (commit b09be676e0ff2 "locking/lockdep:
   Implement the 'crossrelease' feature") addressed it 2 years ago. It
   was a Lockdep extension and was merged, but reverted shortly after
   because:

   Cross-release started to report valuable hidden problems, but it
   started to give false positive reports as well. For sure, no one
   likes Lockdep's false positive reports, since each one makes Lockdep
   stop, preventing it from reporting further real problems.

Q. Why wasn't Dept developed as an extension of Lockdep?
A. Lockdep definitely embodies all the efforts great developers have
   made over a long time, which is why it is quite stable. But I had to
   design and implement Dept anew because of the following:

   1) Lockdep was designed to track lock acquisition order. Its APIs
      and implementation do not fit the wait/event model.
   2) Lockdep is turned off on any detection, including false
      positives, which is terrible and prevents developing any
      extension for stronger detection.

Q. Do you intend to totally replace Lockdep?
A. No. Lockdep also checks whether lock usage is correct. Of course,
   the dependency check routine should be replaced, but the other
   functions should still be there.

Q. Do you mean the dependency check routine should be replaced right
   away?
A. No. I admit Lockdep is stable enough thanks to the great efforts
   kernel developers have made. Both Lockdep and Dept should be in the
   kernel until Dept is considered stable.

Q. Stronger detection capability would give more false positive
   reports, which was a big problem when cross-release was introduced.
   Is it ok with Dept?
A. It's ok. Dept allows multiple reports thanks to its simple and quite
   generalized design. Of course, false positive reports should still
   be fixed, but they are no longer as critical a problem as they used
   to be.
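
A note on that last answer: allowing multiple reports only works if the
bookkeeping stays cheap after a report. One ingredient is keeping the
per-task wait history in a fixed-size ring buffer, like the wait_hist[]
array in struct dept_task in the patch below. A minimal userspace
sketch (illustrative only; the entry layout is simplified):

```c
#include <assert.h>

/*
 * Per-task wait history as a fixed-size ring buffer: recording is
 * O(1) and never fails; old entries are simply overwritten.
 */
#define MAX_WAIT_HIST 4

struct wait_hist_ent {
	unsigned int wgen;	/* system-wide wait generation id */
};

static struct wait_hist_ent wait_hist[MAX_WAIT_HIST];
static int wait_hist_pos;

static void add_wait_hist(unsigned int wgen)
{
	wait_hist[wait_hist_pos].wgen = wgen;
	wait_hist_pos = (wait_hist_pos + 1) % MAX_WAIT_HIST;
}

static unsigned int newest_wgen(void)
{
	int newest = (wait_hist_pos + MAX_WAIT_HIST - 1) % MAX_WAIT_HIST;
	return wait_hist[newest].wgen;
}
```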

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h            |  480 ++++++++
 include/linux/dept_sdt.h        |   62 +
 include/linux/hardirq.h         |    3 +
 include/linux/irqflags.h        |   33 +-
 include/linux/sched.h           |    7 +
 init/init_task.c                |    2 +
 init/main.c                     |    2 +
 kernel/Makefile                 |    1 +
 kernel/dependency/Makefile      |    4 +
 kernel/dependency/dept.c        | 2585 +++++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h   |   10 +
 kernel/dependency/dept_object.h |   13 +
 kernel/exit.c                   |    1 +
 kernel/fork.c                   |    2 +
 kernel/module.c                 |    2 +
 kernel/sched/core.c             |    3 +
 kernel/softirq.c                |    6 +-
 kernel/trace/trace_preemptirq.c |   19 +-
 lib/Kconfig.debug               |   20 +
 19 files changed, 3246 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_object.h

diff --git a/include/linux/dept.h b/include/linux/dept.h
new file mode 100644
index 0000000..2ac4bca
--- /dev/null
+++ b/include/linux/dept.h
@@ -0,0 +1,480 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DEPT(DEPendency Tracker) - runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_H
+#define __LINUX_DEPT_H
+
+#ifdef CONFIG_DEPT
+
+#include <linux/types.h>
+
+struct task_struct;
+
+#define DEPT_MAX_STACK_ENTRY		16
+#define DEPT_MAX_WAIT_HIST		64
+#define DEPT_MAX_ECXT_HELD		48
+
+#define DEPT_MAX_SUBCLASSES		16
+#define DEPT_MAX_SUBCLASSES_EVT		2
+#define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
+#define DEPT_MAX_SUBCLASSES_CACHE	2
+
+#define DEPT_SIRQ			0
+#define DEPT_HIRQ			1
+#define DEPT_IRQS_NR			2
+#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
+#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
+
+struct dept_ecxt;
+struct dept_iecxt {
+	struct dept_ecxt *ecxt;
+	int enirq;
+	bool staled; /* for preventing adding a new ecxt */
+};
+
+struct dept_wait;
+struct dept_iwait {
+	struct dept_wait *wait;
+	int irq;
+	bool staled; /* for preventing adding a new wait */
+	bool touched;
+};
+
+struct dept_class {
+	union {
+		struct llist_node pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t ref;
+	};
+
+	/*
+	 * unique information about the class
+	 */
+	const char *name;
+	unsigned long key;
+	int sub;
+
+	/*
+	 * for BFS
+	 */
+	unsigned int bfs_gen;
+	int bfs_dist;
+	struct dept_class *bfs_parent;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node hash_node;
+
+	/*
+	 * for linking all classes
+	 */
+	struct list_head all_node;
+
+	/*
+	 * for associating its dependencies
+	 */
+	struct list_head dep_head;
+	struct list_head dep_rev_head;
+
+	/*
+	 * for tracking IRQ dependencies
+	 */
+	struct dept_iecxt iecxt[DEPT_IRQS_NR];
+	struct dept_iwait iwait[DEPT_IRQS_NR];
+};
+
+struct dept_stack {
+	union {
+		struct llist_node pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t ref;
+	};
+
+	/*
+	 * backtrace entries
+	 */
+	unsigned long raw[DEPT_MAX_STACK_ENTRY];
+	int nr;
+};
+
+struct dept_ecxt {
+	union {
+		struct llist_node pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t ref;
+	};
+
+	/*
+	 * function that entered this ecxt
+	 */
+	const char *ecxt_fn;
+
+	/*
+	 * event function
+	 */
+	const char *event_fn;
+
+	/*
+	 * associated class
+	 */
+	struct dept_class *class;
+
+	/*
+	 * flag indicating which IRQ has been
+	 * enabled within the event context
+	 */
+	unsigned long enirqf;
+
+	/*
+	 * where the IRQ-enabled happened
+	 */
+	unsigned long enirq_ip[DEPT_IRQS_NR];
+	struct dept_stack *enirq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the event context started
+	 */
+	unsigned long ecxt_ip;
+	struct dept_stack *ecxt_stack;
+
+	/*
+	 * where the event triggered
+	 */
+	unsigned long event_ip;
+	struct dept_stack *event_stack;
+};
+
+struct dept_wait {
+	union {
+		struct llist_node pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t ref;
+	};
+
+	/*
+	 * function causing this wait
+	 */
+	const char *wait_fn;
+
+	/*
+	 * the associated class
+	 */
+	struct dept_class *class;
+
+	/*
+	 * which IRQ the wait was placed in
+	 */
+	unsigned long irqf;
+
+	/*
+	 * where the IRQ wait happened
+	 */
+	unsigned long irq_ip[DEPT_IRQS_NR];
+	struct dept_stack *irq_stack[DEPT_IRQS_NR];
+
+	/*
+	 * where the wait happened
+	 */
+	unsigned long wait_ip;
+	struct dept_stack *wait_stack;
+};
+
+struct dept_dep {
+	union {
+		struct llist_node pool_node;
+
+		/*
+		 * reference counter for object management
+		 */
+		atomic_t ref;
+	};
+
+	/*
+	 * key data of dependency
+	 */
+	struct dept_ecxt *ecxt;
+	struct dept_wait *wait;
+
+	/*
+	 * This object can be referenced without dept_lock
+	 * held but with IRQs disabled, e.g. for hash
+	 * lookup. So deferred deletion is needed.
+	 */
+	struct rcu_head rh;
+
+	/*
+	 * for BFS
+	 */
+	struct list_head bfs_node;
+
+	/*
+	 * for hashing this object
+	 */
+	struct hlist_node hash_node;
+
+	/*
+	 * for linking to a class object
+	 */
+	struct list_head dep_node;
+	struct list_head dep_rev_node;
+};
+
+struct dept_hash {
+	/*
+	 * hash table
+	 */
+	struct hlist_head *table;
+
+	/*
+	 * size of the table, i.e. 2^bits
+	 */
+	int bits;
+};
+
+struct dept_pool {
+	const char *name;
+
+	/*
+	 * object size
+	 */
+	size_t obj_sz;
+
+	/*
+	 * the number of objects in the static array
+	 */
+	atomic_t obj_nr;
+
+	/*
+	 * offset of ->pool_node
+	 */
+	size_t node_off;
+
+	/*
+	 * pointer to the pool
+	 */
+	void *spool;
+	struct llist_head boot_pool;
+	struct llist_head __percpu *lpool;
+};
+
+struct dept_ecxt_held {
+	/*
+	 * associated event context
+	 */
+	struct dept_ecxt *ecxt;
+
+	/*
+	 * unique key for this dept_ecxt_held
+	 */
+	unsigned long key;
+
+	/*
+	 * the wgen when the event context started
+	 */
+	unsigned int wgen;
+
+	/*
+	 * for allowing user aware nesting
+	 */
+	int nest;
+};
+
+struct dept_wait_hist {
+	/*
+	 * associated wait
+	 */
+	struct dept_wait *wait;
+
+	/*
+	 * unique id of all waits system-wise until wrapped
+	 */
+	unsigned int wgen;
+
+	/*
+	 * local context id to identify IRQ context
+	 */
+	unsigned int ctxt_id;
+};
+
+struct dept_key {
+	union {
+		/*
+		 * Each byte-wise address will be used as its key.
+		 */
+		char subkeys[DEPT_MAX_SUBCLASSES];
+
+		/*
+		 * for caching the main class pointer
+		 */
+		struct dept_class *classes[DEPT_MAX_SUBCLASSES_CACHE];
+	};
+};
+
+struct dept_map {
+	const char *name;
+	struct dept_key *keys;
+	int sub_usr;
+
+	/*
+	 * It's a local copy for fast access to the associated classes and
+	 * is also used as the dept_key instance for statically defined maps.
+	 */
+	struct dept_key keys_local;
+
+	/*
+	 * wait timestamp associated to this map
+	 */
+	unsigned int wgen;
+
+	/*
+	 * whether this map should be checked or not
+	 */
+	bool nocheck;
+};
+
+struct dept_task {
+	/*
+	 * all event contexts that have entered and before exiting
+	 */
+	struct dept_ecxt_held ecxt_held[DEPT_MAX_ECXT_HELD];
+	int ecxt_held_pos;
+
+	/*
+	 * ring buffer holding all waits that have happened
+	 */
+	struct dept_wait_hist wait_hist[DEPT_MAX_WAIT_HIST];
+	int wait_hist_pos;
+
+	/*
+	 * sequential id to identify each IRQ context
+	 */
+	unsigned int irq_id[DEPT_IRQS_NR];
+
+	/*
+	 * for tracking IRQ-enabled points with cross-event
+	 */
+	unsigned int wgen_enirq[DEPT_IRQS_NR];
+
+	/*
+	 * for keeping up-to-date IRQ-enabled points
+	 */
+	unsigned long enirq_ip[DEPT_IRQS_NR];
+
+	/*
+	 * current effective IRQ-enabled flag
+	 */
+	unsigned long eff_enirqf;
+
+	/*
+	 * for reserving a current stack instance at each operation
+	 */
+	struct dept_stack *stack;
+
+	/*
+	 * for preventing recursive call into DEPT engine
+	 */
+	int recursive;
+
+	/*
+	 * for staging data to commit a wait
+	 */
+	struct dept_map *stage_m;
+	unsigned long stage_w_f;
+	unsigned long stage_ip;
+	const char *stage_w_fn;
+	int stage_ne;
+
+	/*
+	 * for tracking IRQ-enable state
+	 */
+	bool hardirqs_enabled;
+	bool softirqs_enabled;
+};
+
+#define DEPT_TASK_INITIALIZER(t)					\
+	.dept_task.wait_hist = { { .wait = NULL, } },			\
+	.dept_task.ecxt_held_pos = 0,					\
+	.dept_task.wait_hist_pos = 0,					\
+	.dept_task.irq_id = { 0 },					\
+	.dept_task.wgen_enirq = { 0 },					\
+	.dept_task.enirq_ip = { 0 },					\
+	.dept_task.recursive = 0,					\
+	.dept_task.hardirqs_enabled = false,				\
+	.dept_task.softirqs_enabled = false,
+
+extern void dept_on(void);
+extern void dept_off(void);
+extern void dept_init(void);
+extern void dept_task_init(struct task_struct *t);
+extern void dept_task_exit(struct task_struct *t);
+extern void dept_free_range(void *start, unsigned int sz);
+extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub, const char *n);
+extern void dept_map_reinit(struct dept_map *m);
+extern void dept_map_nocheck(struct dept_map *m);
+
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_stage_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int ne);
+extern void dept_ask_event_wait_commit(void);
+extern void dept_clean_stage(void);
+extern void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *c_fn, const char *e_fn, int ne);
+extern void dept_ask_event(struct dept_map *m);
+extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
+extern void dept_ecxt_exit(struct dept_map *m, unsigned long ip);
+extern struct dept_map *dept_top_map(void);
+extern void dept_warn_on(bool cond);
+
+/*
+ * for users who want to manage external keys
+ */
+extern void dept_key_init(struct dept_key *k);
+extern void dept_key_destroy(struct dept_key *k);
+#else /* !CONFIG_DEPT */
+struct dept_key  { };
+struct dept_map  { };
+struct dept_task { };
+
+#define DEPT_TASK_INITIALIZER(t)
+
+#define dept_on()					do { } while (0)
+#define dept_off()					do { } while (0)
+#define dept_init()					do { } while (0)
+#define dept_task_init(t)				do { } while (0)
+#define dept_task_exit(t)				do { } while (0)
+#define dept_free_range(s, sz)				do { } while (0)
+#define dept_map_init(m, k, s, n)			do { (void)(n); (void)(k); } while (0)
+#define dept_map_reinit(m)				do { } while (0)
+#define dept_map_nocheck(m)				do { } while (0)
+
+#define dept_wait(m, w_f, ip, w_fn, ne)			do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, w_f, ip, w_fn, ne)		do { (void)(w_fn); } while (0)
+#define dept_ask_event_wait_commit()			do { } while (0)
+#define dept_clean_stage()				do { } while (0)
+#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, ne)	do { (void)(c_fn); (void)(e_fn); } while (0)
+#define dept_ask_event(m)				do { } while (0)
+#define dept_event(m, e_f, ip, e_fn)			do { (void)(e_fn); } while (0)
+#define dept_ecxt_exit(m, ip)				do { } while (0)
+#define dept_top_map()					NULL
+#define dept_warn_on(c)					do { } while (0)
+#define dept_key_init(k)				do { (void)(k); } while (0)
+#define dept_key_destroy(k)				do { (void)(k); } while (0)
+#endif
+#endif /* __LINUX_DEPT_H */
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
new file mode 100644
index 0000000..9e1c544
--- /dev/null
+++ b/include/linux/dept_sdt.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept Single-event Dependency Tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_SDT_H
+#define __LINUX_DEPT_SDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+#define DEPT_SDT_MAP_INIT(dname)	{ .name = #dname }
+
+/*
+ * SDT(Single-event Dependency Tracker) APIs
+ *
+ * When one dept_map instance maps to a single event, the SDT APIs
+ * can be used.
+ */
+#define sdt_map_init(m)							\
+	do {								\
+		static struct dept_key __key;				\
+		dept_map_init(m, &__key, 0, #m);			\
+	} while (0)
+#define sdt_map_init_key(m, k)		dept_map_init(m, k, 0, #m)
+
+#define sdt_wait(m)							\
+	do {								\
+		dept_ask_event(m);					\
+		dept_wait(m, 1UL, _THIS_IP_, "wait", 0);		\
+	} while (0)
+/*
+ * This wait will be committed when the task actually gets to
+ * __schedule(), where both dept_ask_event() and dept_wait() will be
+ * performed as part of the commit.
+ */
+#define sdt_wait_prepare(m)		dept_stage_wait(m, 1UL, _THIS_IP_, "wait", 0)
+#define sdt_wait_finish()		dept_clean_stage()
+#define sdt_ecxt_enter(m)		dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
+#define sdt_event(m)			dept_event(m, 1UL, _THIS_IP_, "event")
+#define sdt_ecxt_exit(m)		dept_ecxt_exit(m, _THIS_IP_)
+#else /* !CONFIG_DEPT */
+#define DEPT_SDT_MAP_INIT(dname)			{ }
+
+#define sdt_map_init(m)					do { } while (0)
+#define sdt_map_init_key(m, k)				do { (void)(k); } while (0)
+#define sdt_wait(m)					do { } while (0)
+#define sdt_wait_prepare(m)				do { } while (0)
+#define sdt_wait_finish()				do { } while (0)
+#define sdt_ecxt_enter(m)				do { } while (0)
+#define sdt_event(m)					do { } while (0)
+#define sdt_ecxt_exit(m)				do { } while (0)
+#endif
+
+#define DEFINE_DEPT_SDT(x)		\
+	struct dept_map x = DEPT_SDT_MAP_INIT(x)
+
+#endif /* __LINUX_DEPT_SDT_H */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 76878b3..07005f2 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -5,6 +5,7 @@
 #include <linux/context_tracking_state.h>
 #include <linux/preempt.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/ftrace_irq.h>
 #include <linux/sched.h>
 #include <linux/vtime.h>
@@ -114,6 +115,7 @@ static inline void rcu_nmi_exit(void) { }
  */
 #define __nmi_enter()						\
 	do {							\
+		dept_off();					\
 		lockdep_off();					\
 		arch_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
@@ -136,6 +138,7 @@ static inline void rcu_nmi_exit(void) { }
 		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		arch_nmi_exit();				\
 		lockdep_on();					\
+		dept_on();					\
 	} while (0)
 
 #define nmi_exit()						\
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 4b14093..6e7d7d2 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -31,6 +31,22 @@
   static inline void lockdep_hardirqs_off(unsigned long ip) { }
 #endif
 
+#ifdef CONFIG_DEPT
+extern void dept_hardirq_enter(void);
+extern void dept_softirq_enter(void);
+extern void dept_enable_hardirq(unsigned long ip);
+extern void dept_enable_softirq(unsigned long ip);
+extern void dept_disable_hardirq(unsigned long ip);
+extern void dept_disable_softirq(unsigned long ip);
+#else
+static inline void dept_hardirq_enter(void) { }
+static inline void dept_softirq_enter(void) { }
+static inline void dept_enable_hardirq(unsigned long ip) { }
+static inline void dept_enable_softirq(unsigned long ip) { }
+static inline void dept_disable_hardirq(unsigned long ip) { }
+static inline void dept_disable_softirq(unsigned long ip) { }
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 
 /* Per-task IRQ trace events information. */
@@ -53,15 +69,19 @@ struct irqtrace_events {
 extern void trace_hardirqs_off_finish(void);
 extern void trace_hardirqs_on(void);
 extern void trace_hardirqs_off(void);
+extern void trace_softirqs_on_caller(unsigned long ip);
+extern void trace_softirqs_off_caller(unsigned long ip);
 
 # define lockdep_hardirq_context()	(raw_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)	((p)->softirq_context)
 # define lockdep_hardirqs_enabled()	(this_cpu_read(hardirqs_enabled))
 # define lockdep_softirqs_enabled(p)	((p)->softirqs_enabled)
-# define lockdep_hardirq_enter()			\
-do {							\
-	if (__this_cpu_inc_return(hardirq_context) == 1)\
-		current->hardirq_threaded = 0;		\
+# define lockdep_hardirq_enter()				\
+do {								\
+	if (__this_cpu_inc_return(hardirq_context) == 1) {	\
+		current->hardirq_threaded = 0;			\
+		dept_hardirq_enter();				\
+	}							\
 } while (0)
 # define lockdep_hardirq_threaded()		\
 do {						\
@@ -115,6 +135,8 @@ struct irqtrace_events {
 # define trace_hardirqs_off_finish()		do { } while (0)
 # define trace_hardirqs_on()			do { } while (0)
 # define trace_hardirqs_off()			do { } while (0)
+# define trace_softirqs_on_caller(ip)		do { } while (0)
+# define trace_softirqs_off_caller(ip)		do { } while (0)
 # define lockdep_hardirq_context()		0
 # define lockdep_softirq_context(p)		0
 # define lockdep_hardirqs_enabled()		0
@@ -135,7 +157,8 @@ struct irqtrace_events {
 #if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT)
 # define lockdep_softirq_enter()		\
 do {						\
-	current->softirq_context++;		\
+	if (!current->softirq_context++)	\
+		dept_softirq_enter();		\
 } while (0)
 # define lockdep_softirq_exit()			\
 do {						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 508b91d..1526b32 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -35,6 +35,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <asm/kmap_size.h>
+#include <linux/dept.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -201,12 +202,16 @@
  */
 #define __set_current_state(state_value)				\
 	do {								\
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		WRITE_ONCE(current->__state, (state_value));		\
 	} while (0)
 
 #define set_current_state(state_value)					\
 	do {								\
+		if (state_value == TASK_RUNNING)			\
+			dept_clean_stage();				\
 		debug_normal_state_change((state_value));		\
 		smp_store_mb(current->__state, (state_value));		\
 	} while (0)
@@ -1157,6 +1162,8 @@ struct task_struct {
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
 #endif
 
+	struct dept_task		dept_task;
+
 #if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP)
 	unsigned int			in_ubsan;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f0..d530256 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -12,6 +12,7 @@
 #include <linux/audit.h>
 #include <linux/numa.h>
 #include <linux/scs.h>
+#include <linux/dept.h>
 
 #include <linux/uaccess.h>
 
@@ -193,6 +194,7 @@ struct task_struct init_task
 	.curr_chain_key = INITIAL_CHAIN_KEY,
 	.lockdep_recursion = 0,
 #endif
+	DEPT_TASK_INITIALIZER(init_task)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	.ret_stack		= NULL,
 	.tracing_graph_pause	= ATOMIC_INIT(0),
diff --git a/init/main.c b/init/main.c
index 65fa2e4..ca96e11 100644
--- a/init/main.c
+++ b/init/main.c
@@ -65,6 +65,7 @@
 #include <linux/debug_locks.h>
 #include <linux/debugobjects.h>
 #include <linux/lockdep.h>
+#include <linux/dept.h>
 #include <linux/kmemleak.h>
 #include <linux/padata.h>
 #include <linux/pid_namespace.h>
@@ -1070,6 +1071,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 		      panic_param);
 
 	lockdep_init();
+	dept_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/Makefile b/kernel/Makefile
index 56f4ee9..cef9b02 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -53,6 +53,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 obj-y += entry/
+obj-y += dependency/
 
 obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
new file mode 100644
index 0000000..9f7778e
--- /dev/null
+++ b/kernel/dependency/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DEPT) += dept.o
+
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
new file mode 100644
index 0000000..4a3ab39
--- /dev/null
+++ b/kernel/dependency/dept.c
@@ -0,0 +1,2585 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DEPT(DEPendency Tracker) - Runtime dependency tracker
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ *
+ * DEPT provides a general way to detect deadlock possibilities at
+ * runtime, and its interest is not limited to typical locks but
+ * covers every synchronization primitive.
+ *
+ * The following ideas were borrowed from LOCKDEP:
+ *
+ *    1) Use a graph to track relationship between classes.
+ *    2) Prevent performance regression using hash.
+ *
+ * The following items were enhanced from LOCKDEP:
+ *
+ *    1) Cover more deadlock cases.
+ *    2) Allow multiple reports.
+ *
+ * TODO: Both LOCKDEP and DEPT should co-exist until DEPT is considered
+ * stable. Then the dependency check routine should be replaced with
+ * DEPT. It should finally look like:
+ *
+ *
+ *
+ * As is:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    | +-------------------------------------+ |
+ *    | | Dependency check                    | |
+ *    | | (by tracking lock acquisition order)| |
+ *    | +-------------------------------------+ |
+ *    |                                         |
+ *    +-----------------------------------------+
+ *
+ *    DEPT
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ *
+ *
+ *
+ * To be:
+ *
+ *    LOCKDEP
+ *    +-----------------------------------------+
+ *    | Lock usage correctness check            | <-> locks
+ *    |                                         |
+ *    |                                         |
+ *    |       (Request dependency check)        |
+ *    |                    T                    |
+ *    +--------------------|--------------------+
+ *                         |
+ *    DEPT                 V
+ *    +-----------------------------------------+
+ *    | Dependency check                        | <-> waits/events
+ *    | (by tracking wait and event context)    |
+ *    +-----------------------------------------+
+ *
+ *
+ *
+ * ---
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you can access it online at
+ * http://www.gnu.org/licenses/gpl-2.0.html.
+ */
+
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/hash.h>
+#include <linux/dept.h>
+#include <linux/utsname.h>
+
+static int dept_stop;
+static int dept_per_cpu_ready;
+
+#define DEPT_READY_WARN (!oops_in_progress)
+
+/*
+ * Make all operations using DEPT_WARN_ON() fail when oops_in_progress
+ * is set, and suppress the warning message in that case.
+ */
+#define DEPT_WARN_ON_ONCE(c)						\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN_ONCE(c, "DEPT_WARN_ON_ONCE: " #c);	\
+		__ret;							\
+	})
+
+#define DEPT_WARN_ONCE(s...)						\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN_ONCE(1, "DEPT_WARN_ONCE: " s);		\
+	})
+
+#define DEPT_WARN_ON(c)							\
+	({								\
+		int __ret = 0;						\
+									\
+		if (likely(DEPT_READY_WARN))				\
+			__ret = WARN(c, "DEPT_WARN_ON: " #c);		\
+		__ret;							\
+	})
+
+#define DEPT_WARN(s...)							\
+	({								\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_WARN: " s);			\
+	})
+
+#define DEPT_STOP(s...)							\
+	({								\
+		WRITE_ONCE(dept_stop, 1);				\
+		if (likely(DEPT_READY_WARN))				\
+			WARN(1, "DEPT_STOP: " s);			\
+	})
+
+static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+/*
+ * The DEPT internal engine should be careful when using outside
+ * functions e.g. printk for reporting, since that kind of usage might
+ * cause an untrackable deadlock.
+ */
+static atomic_t dept_outworld = ATOMIC_INIT(0);
+
+static inline void dept_outworld_enter(void)
+{
+	atomic_inc(&dept_outworld);
+}
+
+static inline void dept_outworld_exit(void)
+{
+	atomic_dec(&dept_outworld);
+}
+
+static inline bool dept_outworld_entered(void)
+{
+	return atomic_read(&dept_outworld);
+}
+
+static inline bool dept_lock(void)
+{
+	while (!arch_spin_trylock(&dept_spin))
+		if (unlikely(dept_outworld_entered()))
+			return false;
+	return true;
+}
+
+static inline void dept_unlock(void)
+{
+	arch_spin_unlock(&dept_spin);
+}
+
+/*
+ * whether to stack-trace on every wait or every ecxt
+ */
+static bool rich_stack = true;
+
+enum bfs_ret {
+	BFS_CONTINUE,
+	BFS_CONTINUE_REV,
+	BFS_DONE,
+	BFS_SKIP,
+};
+
+static inline bool before(unsigned int a, unsigned int b)
+{
+	return (int)(a - b) < 0;
+}
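The cast-to-signed trick in before() keeps the comparison correct across counter wrap-around. A standalone userspace sketch (hypothetical, merely mirroring the helper above) shows the behavior:

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe "a happened before b" for monotonically increasing
 * unsigned counters: the signed difference stays negative across
 * the wrap as long as the two values are less than 2^31 apart.
 */
static int wrap_before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}
```

Note that a plain `a < b` comparison would get the `UINT_MAX` vs `0` case backwards once the counter wraps.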
+
+static inline bool valid_stack(struct dept_stack *s)
+{
+	return s && s->nr > 0;
+}
+
+static inline bool valid_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+static inline void inval_class(struct dept_class *c)
+{
+	c->key = 0UL;
+}
+
+static inline struct dept_ecxt *dep_e(struct dept_dep *d)
+{
+	return d->ecxt;
+}
+
+static inline struct dept_wait *dep_w(struct dept_dep *d)
+{
+	return d->wait;
+}
+
+static inline struct dept_class *dep_fc(struct dept_dep *d)
+{
+	return dep_e(d)->class;
+}
+
+static inline struct dept_class *dep_tc(struct dept_dep *d)
+{
+	return dep_w(d)->class;
+}
+
+static inline const char *irq_str(int irq)
+{
+	if (irq == DEPT_SIRQ)
+		return "softirq";
+	if (irq == DEPT_HIRQ)
+		return "hardirq";
+	return "(unknown)";
+}
+
+static inline struct dept_task *dept_task(void)
+{
+	return &current->dept_task;
+}
+
+/*
+ * Pool
+ * =====================================================================
+ * DEPT maintains pools to provide objects in a safe way.
+ *
+ *    1) Static pool is used at the beginning of booting time.
+ *    2) Local pool is tried first before the static pool. Objects that
+ *       have been freed will be placed there.
+ */
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+	#include "dept_object.h"
+#undef  OBJECT
+	OBJECT_NR,
+};
+
+#define OBJECT(id, nr)							\
+static struct dept_##id spool_##id[nr];					\
+static DEFINE_PER_CPU(struct llist_head, lpool_##id);
+	#include "dept_object.h"
+#undef  OBJECT
+
+static struct dept_pool pool[OBJECT_NR] = {
+#define OBJECT(id, nr) {						\
+	.name = #id,							\
+	.obj_sz = sizeof(struct dept_##id),				\
+	.obj_nr = ATOMIC_INIT(nr),					\
+	.node_off = offsetof(struct dept_##id, pool_node),		\
+	.spool = spool_##id,						\
+	.lpool = &lpool_##id, },
+	#include "dept_object.h"
+#undef  OBJECT
+};
+
+/*
+ * Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
+ * enabled because DEPT never races with NMI thanks to nesting control.
+ */
+static void *from_pool(enum object_t t)
+{
+	struct dept_pool *p;
+	struct llist_head *h;
+	struct llist_node *n;
+
+	/*
+	 * llist_del_first() doesn't allow concurrent access e.g.
+	 * between process and IRQ context.
+	 */
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		return NULL;
+
+	p = &pool[t];
+
+	/*
+	 * Try local pool first.
+	 */
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	n = llist_del_first(h);
+	if (n)
+		return (void *)n - p->node_off;
+
+	/*
+	 * Try static pool.
+	 */
+	if (atomic_read(&p->obj_nr) > 0) {
+		int idx = atomic_dec_return(&p->obj_nr);
+		if (idx >= 0)
+			return p->spool + (idx * p->obj_sz);
+	}
+
+	DEPT_WARN_ONCE("Pool(%s) is empty.\n", p->name);
+	return NULL;
+}
+
+static void to_pool(void *o, enum object_t t)
+{
+	struct dept_pool *p = &pool[t];
+	struct llist_head *h;
+
+	preempt_disable();
+	if (likely(dept_per_cpu_ready))
+		h = this_cpu_ptr(p->lpool);
+	else
+		h = &p->boot_pool;
+
+	llist_add(o + p->node_off, h);
+	preempt_enable();
+}
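The two-tier allocation above (freed objects first, then the static pool) can be modeled in a simplified single-threaded form. This is a hypothetical userspace sketch, not the kernel code, which additionally handles per-CPU lists, llist, and IRQ safety:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_NR 4

struct obj {
	struct obj *next;	/* freelist link (stands in for llist) */
	int payload;
};

static struct obj spool[POOL_NR];	/* static pool, consumed once */
static int obj_nr = POOL_NR;		/* remaining static objects */
static struct obj *freelist;		/* freed objects, tried first */

static struct obj *obj_from_pool(void)
{
	struct obj *o = freelist;

	if (o) {			/* 1) try the freelist first */
		freelist = o->next;
		return o;
	}
	if (obj_nr > 0)			/* 2) fall back to the static pool */
		return &spool[--obj_nr];
	return NULL;			/* pool exhausted */
}

static void obj_to_pool(struct obj *o)
{
	o->next = freelist;		/* freed objects go back on the list */
	freelist = o;
}
```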
+
+#define OBJECT(id, nr)							\
+static void (*ctor_##id)(struct dept_##id *a);				\
+static void (*dtor_##id)(struct dept_##id *a);				\
+static inline struct dept_##id *new_##id(void)				\
+{									\
+	struct dept_##id *a;						\
+									\
+	a = (struct dept_##id *)from_pool(OBJECT_##id);			\
+	if (unlikely(!a))						\
+		return NULL;						\
+									\
+	atomic_set(&a->ref, 1);						\
+									\
+	if (ctor_##id)							\
+		ctor_##id(a);						\
+									\
+	return a;							\
+}									\
+									\
+static inline struct dept_##id *get_##id(struct dept_##id *a)		\
+{									\
+	atomic_inc(&a->ref);						\
+	return a;							\
+}									\
+									\
+static inline void put_##id(struct dept_##id *a)			\
+{									\
+	if (!atomic_dec_return(&a->ref)) {				\
+		if (dtor_##id)						\
+			dtor_##id(a);					\
+		to_pool(a, OBJECT_##id);				\
+	}								\
+}									\
+									\
+static inline void del_##id(struct dept_##id *a)			\
+{									\
+	put_##id(a);							\
+}									\
+									\
+static inline bool id##_consumed(struct dept_##id *a)			\
+{									\
+	return a && atomic_read(&a->ref) > 1;				\
+}
+#include "dept_object.h"
+#undef  OBJECT
+
+#define SET_CONSTRUCTOR(id, f) \
+static void (*ctor_##id)(struct dept_##id *a) = f
+
+static void initialize_dep(struct dept_dep *d)
+{
+	INIT_LIST_HEAD(&d->bfs_node);
+	INIT_LIST_HEAD(&d->dep_node);
+	INIT_LIST_HEAD(&d->dep_rev_node);
+}
+SET_CONSTRUCTOR(dep, initialize_dep);
+
+static void initialize_class(struct dept_class *c)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_iecxt *ie = &c->iecxt[i];
+		struct dept_iwait *iw = &c->iwait[i];
+
+		ie->ecxt = NULL;
+		ie->enirq = i;
+		ie->staled = false;
+
+		iw->wait = NULL;
+		iw->irq = i;
+		iw->staled = false;
+		iw->touched = false;
+	}
+	c->bfs_gen = 0U;
+
+	INIT_LIST_HEAD(&c->all_node);
+	INIT_LIST_HEAD(&c->dep_head);
+	INIT_LIST_HEAD(&c->dep_rev_head);
+}
+SET_CONSTRUCTOR(class, initialize_class);
+
+static void initialize_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		e->enirq_stack[i] = NULL;
+		e->enirq_ip[i] = 0UL;
+	}
+	e->ecxt_ip = 0UL;
+	e->ecxt_stack = NULL;
+	e->enirqf = 0UL;
+	e->event_stack = NULL;
+}
+SET_CONSTRUCTOR(ecxt, initialize_ecxt);
+
+static void initialize_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		w->irq_stack[i] = NULL;
+		w->irq_ip[i] = 0UL;
+	}
+	w->wait_ip = 0UL;
+	w->wait_stack = NULL;
+	w->irqf = 0UL;
+}
+SET_CONSTRUCTOR(wait, initialize_wait);
+
+static void initialize_stack(struct dept_stack *s)
+{
+	s->nr = 0;
+}
+SET_CONSTRUCTOR(stack, initialize_stack);
+
+#define OBJECT(id, nr) \
+static void (*ctor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_CONSTRUCTOR
+
+#define SET_DESTRUCTOR(id, f) \
+static void (*dtor_##id)(struct dept_##id *a) = f
+
+static void destroy_dep(struct dept_dep *d)
+{
+	if (dep_e(d))
+		put_ecxt(dep_e(d));
+	if (dep_w(d))
+		put_wait(dep_w(d));
+}
+SET_DESTRUCTOR(dep, destroy_dep);
+
+static void destroy_ecxt(struct dept_ecxt *e)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (e->enirq_stack[i])
+			put_stack(e->enirq_stack[i]);
+	if (e->class)
+		put_class(e->class);
+	if (e->ecxt_stack)
+		put_stack(e->ecxt_stack);
+	if (e->event_stack)
+		put_stack(e->event_stack);
+}
+SET_DESTRUCTOR(ecxt, destroy_ecxt);
+
+static void destroy_wait(struct dept_wait *w)
+{
+	int i;
+
+	for (i = 0; i < DEPT_IRQS_NR; i++)
+		if (w->irq_stack[i])
+			put_stack(w->irq_stack[i]);
+	if (w->class)
+		put_class(w->class);
+	if (w->wait_stack)
+		put_stack(w->wait_stack);
+}
+SET_DESTRUCTOR(wait, destroy_wait);
+
+#define OBJECT(id, nr) \
+static void (*dtor_##id)(struct dept_##id *a);
+	#include "dept_object.h"
+#undef  OBJECT
+
+#undef  SET_DESTRUCTOR
+
+/*
+ * Caching and hashing
+ * =====================================================================
+ * DEPT makes use of caching and hashing to improve performance. Each
+ * object can be obtained in O(1) with its key.
+ *
+ * NOTE: Currently we assume all the objects in the hashes will never be
+ * removed. Implement it when needed.
+ */
+
+/*
+ * Some information might be lost but it's only for hashing key.
+ */
+static inline unsigned long mix(unsigned long a, unsigned long b)
+{
+	int halfbits = sizeof(unsigned long) * 8 / 2;
+	unsigned long halfmask = (1UL << halfbits) - 1UL;
+	return (a << halfbits) | (b & halfmask);
+}
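mix() deliberately drops the top half of each input, which is acceptable for a hash key where collisions are tolerable. A hypothetical standalone copy behaves as follows:

```c
#include <assert.h>

/*
 * Pack two keys into a single hashing key: the low half of b goes
 * into the low bits, the low half of a into the high bits. Some
 * information is lost, but it's only used as a hash key.
 */
static unsigned long mix_keys(unsigned long a, unsigned long b)
{
	int halfbits = sizeof(unsigned long) * 8 / 2;
	unsigned long halfmask = (1UL << halfbits) - 1UL;

	return (a << halfbits) | (b & halfmask);
}
```

Because a and b land in different halves, `mix_keys(a, b)` and `mix_keys(b, a)` generally differ, which matters since a dependency A→B is distinct from B→A.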
+
+static bool cmp_dep(struct dept_dep *d1, struct dept_dep *d2)
+{
+	return dep_fc(d1)->key == dep_fc(d2)->key &&
+	       dep_tc(d1)->key == dep_tc(d2)->key;
+}
+
+static unsigned long key_dep(struct dept_dep *d)
+{
+	return mix(dep_fc(d)->key, dep_tc(d)->key);
+}
+
+static bool cmp_class(struct dept_class *c1, struct dept_class *c2)
+{
+	return c1->key == c2->key;
+}
+
+static unsigned long key_class(struct dept_class *c)
+{
+	return c->key;
+}
+
+#define HASH(id, bits)							\
+static struct hlist_head table_##id[1UL << bits];			\
+									\
+static inline struct hlist_head *head_##id(struct dept_##id *a)		\
+{									\
+	return table_##id + hash_long(key_##id(a), bits);		\
+}									\
+									\
+static inline struct dept_##id *hash_lookup_##id(struct dept_##id *a)	\
+{									\
+	struct dept_##id *b;						\
+									\
+	hlist_for_each_entry_rcu(b, head_##id(a), hash_node)		\
+		if (cmp_##id(a, b))					\
+			return b;					\
+	return NULL;							\
+}									\
+									\
+static inline void hash_add_##id(struct dept_##id *a)			\
+{									\
+	hlist_add_head_rcu(&a->hash_node, head_##id(a));		\
+}									\
+									\
+static inline void hash_del_##id(struct dept_##id *a)			\
+{									\
+	hlist_del_rcu(&a->hash_node);					\
+}
+#include "dept_hash.h"
+#undef  HASH
+
+static inline struct dept_dep *lookup_dep(struct dept_class *fc,
+					  struct dept_class *tc)
+{
+	struct dept_ecxt onetime_e = { .class = fc };
+	struct dept_wait onetime_w = { .class = tc };
+	struct dept_dep  onetime_d = { .ecxt = &onetime_e,
+				       .wait = &onetime_w };
+	return hash_lookup_dep(&onetime_d);
+}
+
+static inline struct dept_class *lookup_class(unsigned long key)
+{
+	struct dept_class onetime_c = { .key = key };
+
+	return hash_lookup_class(&onetime_c);
+}
+
+/*
+ * Report
+ * =====================================================================
+ * DEPT prints useful information to help debugging when a problematic
+ * dependency is detected.
+ */
+
+static inline void print_ip_stack(unsigned long ip, struct dept_stack *s)
+{
+	if (ip)
+		print_ip_sym(KERN_WARNING, ip);
+
+	if (valid_stack(s)) {
+		pr_warn("stacktrace:\n");
+		stack_trace_print(s->raw, s->nr, 5);
+	}
+
+	if (!ip && !valid_stack(s))
+		pr_warn("(N/A)\n");
+}
+
+#define print_spc(spc, fmt, ...)					\
+	pr_warn("%*c" fmt, (spc) * 4, ' ', ##__VA_ARGS__)
+
+static void print_diagram(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	bool firstline = true;
+	int spc = 1;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (!firstline)
+			pr_warn("\nor\n\n");
+		firstline = false;
+
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "    <%s interrupt>\n", irq_str(irq));
+		print_spc(spc + 1, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+
+	if (!irqf) {
+		print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc->name, fc->sub);
+		print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc->name, tc->sub);
+		print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc->name, fc->sub);
+	}
+}
+
+static void print_dep(struct dept_dep *d)
+{
+	struct dept_ecxt *e = dep_e(d);
+	struct dept_wait *w = dep_w(d);
+	struct dept_class *fc = dep_fc(d);
+	struct dept_class *tc = dep_tc(d);
+	unsigned long irqf;
+	int irq;
+	const char *w_fn = w->wait_fn  ?: "(unknown)";
+	const char *e_fn = e->event_fn ?: "(unknown)";
+	const char *c_fn = e->ecxt_fn ?: "(unknown)";
+
+	irqf = e->enirqf & w->irqf;
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		pr_warn("%s has been enabled:\n", irq_str(irq));
+		print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d) in %s context:\n",
+		       w_fn, tc->name, tc->sub, irq_str(irq));
+		print_ip_stack(w->irq_ip[irq], w->irq_stack[irq]);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+
+	if (!irqf) {
+		pr_warn("[S] %s(%s:%d):\n", c_fn, fc->name, fc->sub);
+		print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+		pr_warn("\n");
+
+		pr_warn("[W] %s(%s:%d):\n", w_fn, tc->name, tc->sub);
+		print_ip_stack(w->wait_ip, w->wait_stack);
+		pr_warn("\n");
+
+		pr_warn("[E] %s(%s:%d):\n", e_fn, fc->name, fc->sub);
+		print_ip_stack(e->event_ip, e->event_stack);
+	}
+}
+
+static void save_current_stack(int skip);
+
+/*
+ * Print all classes in a circle.
+ */
+static void print_circle(struct dept_class *c)
+{
+	struct dept_class *fc = c->bfs_parent;
+	struct dept_class *tc = c;
+	int i;
+
+	dept_outworld_enter();
+	save_current_stack(6);
+
+	pr_warn("===================================================\n");
+	pr_warn("DEPT: Circular dependency has been detected.\n");
+	pr_warn("%s %.*s %s\n", init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version,
+		print_tainted());
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("summary\n");
+	pr_warn("---------------------------------------------------\n");
+
+	if (fc == tc)
+		pr_warn("*** AA DEADLOCK ***\n\n");
+	else
+		pr_warn("*** DEADLOCK ***\n\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		if (fc != c)
+			pr_warn("\n");
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("\n");
+	pr_warn("[S]: start of the event context\n");
+	pr_warn("[W]: the wait blocked\n");
+	pr_warn("[E]: the event not reachable\n");
+
+	i = 0;
+	do {
+		struct dept_dep *d = lookup_dep(fc, tc);
+
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c's detail\n", 'A' + i);
+		pr_warn("---------------------------------------------------\n");
+		pr_warn("context %c\n", 'A' + (i++));
+		print_diagram(d);
+		pr_warn("\n");
+		print_dep(d);
+
+		tc = fc;
+		fc = fc->bfs_parent;
+	} while (tc != c);
+
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("information that might be helpful\n");
+	pr_warn("---------------------------------------------------\n");
+	dump_stack();
+
+	dept_outworld_exit();
+}
+
+/*
+ * BFS(Breadth First Search)
+ * =====================================================================
+ * Whenever a new dependency is added into the graph, search the graph
+ * for a new circular dependency.
+ */
+
+static inline void enqueue(struct list_head *h, struct dept_dep *d)
+{
+	list_add_tail(&d->bfs_node, h);
+}
+
+static inline struct dept_dep *dequeue(struct list_head *h)
+{
+	struct dept_dep *d;
+
+	d = list_first_entry(h, struct dept_dep, bfs_node);
+	list_del(&d->bfs_node);
+	return d;
+}
+
+static inline bool empty(struct list_head *h)
+{
+	return list_empty(h);
+}
+
+static void extend_queue(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_head, dep_node) {
+		struct dept_class *next = dep_tc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+static void extend_queue_rev(struct list_head *h, struct dept_class *cur)
+{
+	struct dept_dep *d;
+
+	list_for_each_entry(d, &cur->dep_rev_head, dep_rev_node) {
+		struct dept_class *next = dep_fc(d);
+
+		if (cur->bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_gen = cur->bfs_gen;
+		next->bfs_dist = cur->bfs_dist + 1;
+		next->bfs_parent = cur;
+		enqueue(h, d);
+	}
+}
+
+typedef enum bfs_ret bfs_f(struct dept_dep *d, void *in, void **out);
+static unsigned int bfs_gen;
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
+{
+	LIST_HEAD(q);
+	enum bfs_ret ret;
+
+	if (DEPT_WARN_ON(!cb))
+		return;
+
+	/*
+	 * Avoid zero bfs_gen.
+	 */
+	bfs_gen = bfs_gen + 1 ?: 1;
+
+	c->bfs_gen = bfs_gen;
+	c->bfs_dist = 0;
+	c->bfs_parent = c;
+
+	ret = cb(NULL, in, out);
+	if (ret == BFS_DONE)
+		return;
+	if (ret == BFS_SKIP)
+		return;
+	if (ret == BFS_CONTINUE)
+		extend_queue(&q, c);
+	if (ret == BFS_CONTINUE_REV)
+		extend_queue_rev(&q, c);
+
+	while (!empty(&q)) {
+		struct dept_dep *d = dequeue(&q);
+
+		ret = cb(d, in, out);
+		if (ret == BFS_DONE)
+			break;
+		if (ret == BFS_SKIP)
+			continue;
+		if (ret == BFS_CONTINUE)
+			extend_queue(&q, dep_tc(d));
+		if (ret == BFS_CONTINUE_REV)
+			extend_queue_rev(&q, dep_fc(d));
+	}
+
+	while (!empty(&q))
+		dequeue(&q);
+}
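The generation counter (bfs_gen) lets each search mark visited classes without clearing state between runs, and a new dependency closes a circle exactly when its source is reachable from its target. A simplified userspace sketch of that idea (hypothetical, using a small adjacency matrix instead of the real dependency graph):

```c
#include <assert.h>

#define NR 8

/* edge[i][j] set means "i depends on j", i.e. an ecxt->wait edge. */
static int edge[NR][NR];
static unsigned int node_gen[NR];	/* per-node generation, like ->bfs_gen */
static unsigned int cur_gen;

/*
 * Would adding the edge from->to create a circle? Yes iff `from`
 * is reachable from `to`. The generation counter avoids having to
 * clear a visited[] array between searches.
 */
static int creates_circle(int from, int to)
{
	int queue[NR], head = 0, tail = 0;
	int cur, next;

	cur_gen++;
	queue[tail++] = to;
	node_gen[to] = cur_gen;

	while (head < tail) {
		cur = queue[head++];
		if (cur == from)
			return 1;	/* closed the circle: deadlock */
		for (next = 0; next < NR; next++) {
			if (edge[cur][next] && node_gen[next] != cur_gen) {
				node_gen[next] = cur_gen;
				queue[tail++] = next;
			}
		}
	}
	return 0;
}
```

The `from == to` case returns true immediately, matching the AA-deadlock short cut in cb_check_dl() above.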
+
+/*
+ * Main operations
+ * =====================================================================
+ * Add dependencies - Each new dependency is added into the graph and
+ * checked if it forms a circular dependency.
+ *
+ * Track waits - Waits are queued into the ring buffer for later use to
+ * generate appropriate dependencies with cross-event.
+ *
+ * Track event contexts(ecxt) - Event contexts are pushed into the
+ * local stack for later use to generate appropriate dependencies with
+ * waits.
+ */
+
+static inline unsigned long cur_enirqf(void);
+static inline int cur_irq(void);
+static inline unsigned int cur_ctxt_id(void);
+
+static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
+{
+	return &c->iecxt[irq];
+}
+
+static inline struct dept_iwait *iwait(struct dept_class *c, int irq)
+{
+	return &c->iwait[irq];
+}
+
+static inline void stale_iecxt(struct dept_iecxt *ie)
+{
+	if (ie->ecxt)
+		put_ecxt(ie->ecxt);
+
+	WRITE_ONCE(ie->ecxt, NULL);
+	WRITE_ONCE(ie->staled, true);
+}
+
+static inline void set_iecxt(struct dept_iecxt *ie, struct dept_ecxt *e)
+{
+	/*
+	 * ->ecxt will never be updated once getting set until the class
+	 * gets removed.
+	 */
+	if (ie->ecxt)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(ie->ecxt, get_ecxt(e));
+}
+
+static inline void stale_iwait(struct dept_iwait *iw)
+{
+	if (iw->wait)
+		put_wait(iw->wait);
+
+	WRITE_ONCE(iw->wait, NULL);
+	WRITE_ONCE(iw->staled, true);
+}
+
+static inline void set_iwait(struct dept_iwait *iw, struct dept_wait *w)
+{
+	/*
+	 * ->wait will never be updated once getting set until the class
+	 * gets removed.
+	 */
+	if (iw->wait)
+		DEPT_WARN_ON(1);
+	else
+		WRITE_ONCE(iw->wait, get_wait(w));
+
+	iw->touched = true;
+}
+
+static inline void touch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = true;
+}
+
+static inline void untouch_iwait(struct dept_iwait *iw)
+{
+	iw->touched = false;
+}
+
+static inline struct dept_stack *get_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	return s ? get_stack(s) : NULL;
+}
+
+static inline void prepare_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	/*
+	 * The dept_stack is already ready.
+	 */
+	if (s && !stack_consumed(s)) {
+		s->nr = 0;
+		return;
+	}
+
+	if (s)
+		put_stack(s);
+
+	s = dept_task()->stack = new_stack();
+	if (!s)
+		return;
+
+	get_stack(s);
+	del_stack(s);
+}
+
+static void save_current_stack(int skip)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (!s)
+		return;
+	if (valid_stack(s))
+		return;
+
+	s->nr = stack_trace_save(s->raw, DEPT_MAX_STACK_ENTRY, skip);
+}
+
+static void finish_current_stack(void)
+{
+	struct dept_stack *s = dept_task()->stack;
+
+	if (stack_consumed(s))
+		save_current_stack(2);
+}
+
+/*
+ * FIXME: For now, disable LOCKDEP while DEPT is working.
+ *
+ * Both LOCKDEP and DEPT report deadlock detections using printk,
+ * taking the risk of another deadlock that might be caused by locks
+ * of console or printk held between the inside and outside of them.
+ *
+ * For DEPT, that's no problem since multiple reports are allowed. But
+ * it would be a bad idea for LOCKDEP since it will stop even on a
+ * single report. So we need to keep LOCKDEP from reporting, to avoid
+ * the risk DEPT takes when reporting something.
+ */
+#include <linux/lockdep.h>
+
+void dept_off(void)
+{
+	dept_task()->recursive++;
+	lockdep_off();
+}
+
+void dept_on(void)
+{
+	dept_task()->recursive--;
+	lockdep_on();
+}
+
+static inline unsigned long dept_enter(void)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	dept_off();
+	prepare_current_stack();
+	return flags;
+}
+
+static inline void dept_exit(unsigned long flags)
+{
+	finish_current_stack();
+	dept_on();
+	raw_local_irq_restore(flags);
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static struct dept_dep *__add_dep(struct dept_ecxt *e,
+				  struct dept_wait *w)
+{
+	struct dept_dep *d;
+
+	if (!valid_class(e->class) || !valid_class(w->class))
+		return NULL;
+
+	if (lookup_dep(e->class, w->class))
+		return NULL;
+
+	d = new_dep();
+	if (unlikely(!d))
+		return NULL;
+
+	d->ecxt = get_ecxt(e);
+	d->wait = get_wait(w);
+
+	/*
+	 * Add the dependency into hash and graph.
+	 */
+	hash_add_dep(d);
+	list_add(&d->dep_node, &dep_fc(d)->dep_head);
+	list_add(&d->dep_rev_node, &dep_tc(d)->dep_rev_head);
+	return d;
+}
+
+static enum bfs_ret cb_check_dl(struct dept_dep *d,
+				void *in, void **out)
+{
+	struct dept_dep *new = (struct dept_dep *)in;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d) {
+		dep_tc(new)->bfs_parent = dep_fc(new);
+
+		if (dep_tc(new) != dep_fc(new))
+			return BFS_CONTINUE;
+
+		/*
+		 * An AA circle does not make any additional deadlock.
+		 * We don't have to continue this BFS search.
+		 */
+		print_circle(dep_tc(new));
+		return BFS_DONE;
+	}
+
+	/*
+	 * Allow multiple reports.
+	 */
+	if (dep_tc(d) == dep_fc(new))
+		print_circle(dep_tc(new));
+
+	return BFS_CONTINUE;
+}
+
+/*
+ * This function is actually in charge of reporting.
+ */
+static inline void check_dl_bfs(struct dept_dep *d)
+{
+	bfs(dep_tc(d), cb_check_dl, (void *)d, NULL);
+}
+
+static enum bfs_ret cb_find_iw(struct dept_dep *d, void *in, void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *fc;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE_REV;
+
+	fc = dep_fc(d);
+	iw = iwait(fc, irq);
+
+	/*
+	 * If any parent's ->wait was set, then the children would've
+	 * been touched.
+	 */
+	if (!iw->touched)
+		return BFS_SKIP;
+
+	if (!iw->wait)
+		return BFS_CONTINUE_REV;
+
+	*out = iw;
+	return BFS_DONE;
+}
+
+static struct dept_iwait *find_iw_bfs(struct dept_class *c, int irq)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iwait *found = NULL;
+
+	if (iw->wait)
+		return iw;
+
+	/*
+	 * '->touched == false' guarantees there's no parent that has
+	 * its ->wait set.
+	 */
+	if (!iw->touched)
+		return NULL;
+
+	bfs(c, cb_find_iw, (void *)&irq, (void **)&found);
+
+	if (found)
+		return found;
+
+	untouch_iwait(iw);
+	return NULL;
+}
+
+static enum bfs_ret cb_touch_iw_find_ie(struct dept_dep *d, void *in,
+					void **out)
+{
+	int irq = *(int *)in;
+	struct dept_class *tc;
+	struct dept_iecxt *ie;
+	struct dept_iwait *iw;
+
+	if (DEPT_WARN_ON(!out))
+		return BFS_DONE;
+
+	/*
+	 * initial condition for this BFS search
+	 */
+	if (!d)
+		return BFS_CONTINUE;
+
+	tc = dep_tc(d);
+	ie = iecxt(tc, irq);
+	iw = iwait(tc, irq);
+
+	touch_iwait(iw);
+
+	if (!ie->ecxt)
+		return BFS_CONTINUE;
+
+	if (!*out)
+		*out = ie;
+
+	return BFS_CONTINUE;
+}
+
+static struct dept_iecxt *touch_iw_find_ie_bfs(struct dept_class *c,
+					       int irq)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+	struct dept_iwait *iw = iwait(c, irq);
+	struct dept_iecxt *found = ie->ecxt ? ie : NULL;
+
+	touch_iwait(iw);
+	bfs(c, cb_touch_iw_find_ie, (void *)&irq, (void **)&found);
+	return found;
+}
+
+/*
+ * Should be called with dept_lock held.
+ */
+static void __add_idep(struct dept_iecxt *ie, struct dept_iwait *iw)
+{
+	struct dept_dep *new;
+
+	/*
+	 * There's nothing to do.
+	 */
+	if (!ie || !iw || !ie->ecxt || !iw->wait)
+		return;
+
+	new = __add_dep(ie->ecxt, iw->wait);
+
+	/*
+	 * A new dependency was added. Let check_dl_bfs() check for
+	 * and report any deadlock it creates.
+	 */
+	if (new) {
+		check_dl_bfs(new);
+		stale_iecxt(ie);
+		stale_iwait(iw);
+	}
+
+	/*
+	 * If !new, it means we ran out of object resources. Just let it
+	 * go and let it get checked by another chance. Retrying is
+	 * meaningless in that case.
+	 */
+}
+
+static void set_check_iecxt(struct dept_class *c, int irq,
+			    struct dept_ecxt *e)
+{
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	set_iecxt(ie, e);
+	__add_idep(ie, find_iw_bfs(c, irq));
+}
+
+static void set_check_iwait(struct dept_class *c, int irq,
+			    struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	set_iwait(iw, w);
+	__add_idep(touch_iw_find_ie_bfs(c, irq), iw);
+}
+
+static void add_iecxt(struct dept_class *c, int irq, struct dept_ecxt *e,
+		      bool stack)
+{
+	/*
+	 * This access is safe since we ensure e->class has been set locally.
+	 */
+	struct dept_task *dt = dept_task();
+	struct dept_iecxt *ie = iecxt(c, irq);
+
+	if (unlikely(READ_ONCE(ie->staled)))
+		return;
+
+	/*
+	 * Skip add_iecxt() if ie->ecxt has ever been set at least once,
+	 * which means it either has a valid ->ecxt or has been staled.
+	 */
+	if (READ_ONCE(ie->ecxt))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(ie->staled))
+		goto unlock;
+	if (ie->ecxt)
+		goto unlock;
+
+	e->enirqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time that these
+	 * enirq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(e->enirq_ip[irq]);
+	DEPT_WARN_ON(e->enirq_stack[irq]);
+
+	e->enirq_ip[irq] = dt->enirq_ip[irq];
+	e->enirq_stack[irq] = stack ? get_current_stack() : NULL;
+
+	set_check_iecxt(c, irq, e);
+unlock:
+	dept_unlock();
+}
+
+static void add_iwait(struct dept_class *c, int irq, struct dept_wait *w)
+{
+	struct dept_iwait *iw = iwait(c, irq);
+
+	if (unlikely(READ_ONCE(iw->staled)))
+		return;
+
+	/*
+	 * Skip add_iwait() if iw->wait has already been set at least
+	 * once, which means it either has a valid ->wait or has been
+	 * staled.
+	 */
+	if (READ_ONCE(iw->wait))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	if (unlikely(iw->staled))
+		goto unlock;
+	if (iw->wait)
+		goto unlock;
+
+	w->irqf |= (1UL << irq);
+
+	/*
+	 * Should be NULL since it's the first time these
+	 * irq_{ip,stack}[irq] have ever been set.
+	 */
+	DEPT_WARN_ON(w->irq_ip[irq]);
+	DEPT_WARN_ON(w->irq_stack[irq]);
+
+	w->irq_ip[irq] = w->wait_ip;
+	w->irq_stack[irq] = get_current_stack();
+
+	set_check_iwait(c, irq, w);
+unlock:
+	dept_unlock();
+}
+
+static inline struct dept_wait_hist *hist(int pos)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist + (pos % DEPT_MAX_WAIT_HIST);
+}
+
+static inline int hist_pos_next(void)
+{
+	struct dept_task *dt = dept_task();
+
+	return dt->wait_hist_pos % DEPT_MAX_WAIT_HIST;
+}
+
+static inline void hist_advance(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->wait_hist_pos++;
+	dt->wait_hist_pos %= DEPT_MAX_WAIT_HIST;
+}
+
+static inline struct dept_wait_hist *new_hist(void)
+{
+	struct dept_wait_hist *wh = hist(hist_pos_next());
+
+	hist_advance();
+	return wh;
+}
+
+static void add_hist(struct dept_wait *w, unsigned int wg, unsigned int ctxt_id)
+{
+	struct dept_wait_hist *wh = new_hist();
+
+	if (likely(wh->wait))
+		put_wait(wh->wait);
+
+	wh->wait = get_wait(w);
+	wh->wgen = wg;
+	wh->ctxt_id = ctxt_id;
+}
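[Editor's note] The per-task wait history above is a fixed-size ring buffer indexed modulo DEPT_MAX_WAIT_HIST. A minimal user-space sketch of the cursor arithmetic (HIST_MAX and the struct are illustrative stand-ins, not from the patch):

```c
#include <assert.h>

#define HIST_MAX 4	/* shrunk stand-in for DEPT_MAX_WAIT_HIST */

struct hist { int wait; };

static struct hist hist_buf[HIST_MAX];
static int hist_pos;

/* Slot the next write will land in, like hist_pos_next(). */
static int hist_pos_next(void)
{
	return hist_pos % HIST_MAX;
}

/* Advance the cursor with wraparound, like hist_advance(). */
static void hist_advance(void)
{
	hist_pos = (hist_pos + 1) % HIST_MAX;
}

/* Claim the next slot and advance, like new_hist(). */
static struct hist *new_hist(void)
{
	struct hist *wh = &hist_buf[hist_pos_next()];

	hist_advance();
	return wh;
}
```

Once the buffer fills, new_hist() hands back the oldest slot again, which is why add_hist() first put_wait()s whatever the slot still holds.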
+
+/*
+ * Should be called after setting up e's iecxt and w's iwait.
+ */
+static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
+{
+	struct dept_class *fc = e->class;
+	struct dept_class *tc = w->class;
+	struct dept_dep *d;
+	int i;
+
+	if (lookup_dep(fc, tc))
+		return;
+
+	if (unlikely(!dept_lock()))
+		return;
+
+	/*
+	 * __add_dep() will lookup_dep() again with lock held.
+	 */
+	d = __add_dep(e, w);
+	if (d) {
+		check_dl_bfs(d);
+
+		for (i = 0; i < DEPT_IRQS_NR; i++) {
+			struct dept_iwait *fiw = iwait(fc, i);
+			struct dept_iecxt *found_ie;
+			struct dept_iwait *found_iw;
+
+			/*
			 * '->touched == false' guarantees no parent
			 * has had its ->wait set.
+			 */
+			if (!fiw->touched)
+				continue;
+
+			/*
+			 * find_iw_bfs() will untouch the iwait if
+			 * not found.
+			 */
+			found_iw = find_iw_bfs(fc, i);
+
+			if (!found_iw)
+				continue;
+
+			found_ie = touch_iw_find_ie_bfs(tc, i);
+			__add_idep(found_ie, found_iw);
+		}
+	}
+	dept_unlock();
+}
+
+static atomic_t wgen = ATOMIC_INIT(1);
+
+static void add_wait(struct dept_class *c, unsigned long ip,
+		     const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait *w;
+	unsigned int wg = 0U;
+	int irq;
+	int i;
+
+	w = new_wait();
+	if (unlikely(!w))
+		return;
+
+	WRITE_ONCE(w->class, get_class(c));
+	w->wait_ip = ip;
+	w->wait_fn = w_fn;
+	w->wait_stack = get_current_stack();
+
+	irq = cur_irq();
+	if (irq < DEPT_IRQS_NR)
+		add_iwait(c, irq, w);
+
+	/*
+	 * Avoid adding a dependency between a user-aware nested ecxt
+	 * and the wait.
+	 */
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+
+		eh = dt->ecxt_held + i;
+		if (eh->ecxt->class != c || eh->nest == ne)
+			break;
+	}
+
+	for (; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+
+		eh = dt->ecxt_held + i;
+		add_dep(eh->ecxt, w);
+	}
+
+	if (!wait_consumed(w) && !rich_stack) {
+		if (w->wait_stack)
+			put_stack(w->wait_stack);
+		w->wait_stack = NULL;
+	}
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	add_hist(w, wg, cur_ctxt_id());
+
+	del_wait(w);
+}
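[Editor's note] add_wait() reserves wgen == 0 to mean "no wait recorded yet", which is why the counter is bumped a second time if the first increment wraps to zero. A stand-alone sketch of that `atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen)` idiom (plain unsigned int instead of atomic_t, counter pre-set just below the wrap point so the skip is exercised):

```c
#include <assert.h>

/* Plain counter standing in for the kernel's atomic_t wgen. */
static unsigned int wgen = ~0u;

static unsigned int inc_return(unsigned int *v)
{
	return ++*v;
}

/*
 * Mirror of `atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen)`:
 * 0 is reserved to mean "no wait recorded yet", so skip it on wrap.
 */
static unsigned int next_wgen(void)
{
	unsigned int wg = inc_return(&wgen);

	return wg ? wg : inc_return(&wgen);
}
```

The first call wraps the counter to 0 and immediately bumps it to 1, so callers never observe the reserved value.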
+
+static void add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_ecxt_held *eh;
+	struct dept_ecxt *e;
+	unsigned long irqf;
+	int irq;
+
+	if (DEPT_WARN_ON(dt->ecxt_held_pos == DEPT_MAX_ECXT_HELD))
+		return;
+
+	e = new_ecxt();
+	if (unlikely(!e))
+		return;
+
+	e->class = get_class(c);
+	e->ecxt_ip = ip;
+	e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
+	e->event_fn = e_fn;
+	e->ecxt_fn = c_fn;
+
+	eh = dt->ecxt_held + (dt->ecxt_held_pos++);
+	eh->ecxt = get_ecxt(e);
+	eh->key = (unsigned long)obj;
+	eh->wgen = atomic_read(&wgen);
+	eh->nest = ne;
+
+	irqf = cur_enirqf();
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+		add_iecxt(c, irq, e, false);
+
+	del_ecxt(e);
+}
+
+static int find_ecxt_pos(unsigned long key, bool newfirst)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	if (newfirst) {
+		for (i = dt->ecxt_held_pos - 1; i >= 0; i--)
+			if (dt->ecxt_held[i].key == key)
+				return i;
+	} else {
+		for (i = 0; i < dt->ecxt_held_pos; i++)
+			if (dt->ecxt_held[i].key == key)
+				return i;
+	}
+	return -1;
+}
+
+static void pop_ecxt(void *obj)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long key = (unsigned long)obj;
+	int pos;
+	int i;
+
+	/*
+	 * TODO: WARN on pos == -1.
+	 */
+	pos = find_ecxt_pos(key, true);
+	if (pos == -1)
+		return;
+
+	put_ecxt(dt->ecxt_held[pos].ecxt);
+	dt->ecxt_held_pos--;
+
+	for (i = pos; i < dt->ecxt_held_pos; i++)
+		dt->ecxt_held[i] = dt->ecxt_held[i + 1];
+}
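[Editor's note] pop_ecxt() removes the newest held-context entry matching the key and closes the gap by shifting the tail down, so an entry can be popped from the middle of the array, not only from the top. A user-space sketch (array size and helper names are hypothetical):

```c
#include <assert.h>

#define MAX_HELD 8	/* shrunk stand-in for DEPT_MAX_ECXT_HELD */

static unsigned long held[MAX_HELD];
static int held_pos;

/* Push a key and return the new depth. */
static int push_key(unsigned long key)
{
	held[held_pos] = key;
	return ++held_pos;
}

/* Newest-first lookup, like find_ecxt_pos(key, true). */
static int find_pos(unsigned long key)
{
	int i;

	for (i = held_pos - 1; i >= 0; i--)
		if (held[i] == key)
			return i;
	return -1;
}

/* Remove the newest matching entry and close the gap, like pop_ecxt(). */
static int pop_key(unsigned long key)
{
	int pos = find_pos(key);
	int i;

	if (pos == -1)
		return -1;

	held_pos--;
	for (i = pos; i < held_pos; i++)
		held[i] = held[i + 1];
	return 0;
}
```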
+
+static inline bool good_hist(struct dept_wait_hist *wh, unsigned int wg)
+{
+	return wh->wait != NULL && before(wg, wh->wgen);
+}
+
+/*
+ * Binary-search the ring buffer for the earliest valid wait.
+ */
+static int find_hist_pos(unsigned int wg)
+{
+	int oldest;
+	int l;
+	int r;
+	int pos;
+
+	oldest = hist_pos_next();
+	if (unlikely(good_hist(hist(oldest), wg))) {
+		DEPT_WARN_ONCE("Need to expand the ring buffer.\n");
+		return oldest;
+	}
+
+	l = oldest + 1;
+	r = oldest + DEPT_MAX_WAIT_HIST - 1;
+	for (pos = (l + r) / 2; l <= r; pos = (l + r) / 2) {
+		struct dept_wait_hist *p = hist(pos - 1);
+		struct dept_wait_hist *wh = hist(pos);
+
+		if (!good_hist(p, wg) && good_hist(wh, wg))
+			return pos % DEPT_MAX_WAIT_HIST;
+		if (good_hist(wh, wg))
+			r = pos - 1;
+		else
+			l = pos + 1;
+	}
+	return -1;
+}
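[Editor's note] find_hist_pos() relies on the ring being ordered false..false,true..true under good_hist() from the oldest slot onward, and binary-searches for the boundary. A sketch of the same search over a plain array (values and names illustrative, wgen monotonicity assumed as in the patch):

```c
#include <assert.h>

#define HIST_MAX 8

/* wgen values stored from the oldest slot onward, monotonically rising. */
static int wgen_buf[HIST_MAX] = { 10, 20, 30, 40, 50, 60, 70, 80 };
static int oldest;	/* logical start of the ring */

static int good(int logical_pos, int wg)
{
	return wgen_buf[logical_pos % HIST_MAX] > wg;
}

/*
 * Find the first position whose wgen is newer than wg, mirroring the
 * false..false,true..true boundary search in find_hist_pos().
 */
static int find_first_good(int wg)
{
	int l = oldest + 1;
	int r = oldest + HIST_MAX - 1;
	int pos;

	/* Even the oldest slot is newer: the ring was too small. */
	if (good(oldest, wg))
		return oldest % HIST_MAX;

	for (pos = (l + r) / 2; l <= r; pos = (l + r) / 2) {
		if (!good(pos - 1, wg) && good(pos, wg))
			return pos % HIST_MAX;
		if (good(pos, wg))
			r = pos - 1;
		else
			l = pos + 1;
	}
	return -1;
}
```

Working on logical positions (oldest .. oldest + HIST_MAX - 1) and taking `% HIST_MAX` only at the boundaries is what lets an ordinary binary search run over a circular buffer.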
+
+static void do_event(void *obj, struct dept_class *c, unsigned int wg,
+		     unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_wait_hist *wh;
+	struct dept_ecxt_held *eh;
+	unsigned long key = (unsigned long)obj;
+	unsigned int ctxt_id;
+	int end;
+	int pos;
+	int i;
+
+	/*
+	 * The event was triggered before the wait.
+	 */
+	if (!wg)
+		return;
+
+	pos = find_ecxt_pos(key, false);
+	if (pos == -1)
+		return;
+
+	eh = dt->ecxt_held + pos;
+	eh->ecxt->event_ip = ip;
+	eh->ecxt->event_stack = get_current_stack();
+
+	/*
+	 * The ecxt has already done what it needs.
+	 */
+	if (!before(wg, eh->wgen))
+		return;
+
+	pos = find_hist_pos(wg);
+	if (pos == -1)
+		return;
+
+	ctxt_id = cur_ctxt_id();
+	end = hist_pos_next();
+	end = end > pos ? end : end + DEPT_MAX_WAIT_HIST;
+	for (wh = hist(pos); pos < end; wh = hist(++pos)) {
+		if (wh->ctxt_id == ctxt_id)
+			add_dep(eh->ecxt, wh->wait);
+		if (!before(wh->wgen, eh->wgen))
+			break;
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		struct dept_ecxt *e;
+
+		if (before(dt->wgen_enirq[i], wg))
+			continue;
+
+		e = eh->ecxt;
+		add_iecxt(e->class, i, e, false);
+	}
+}
+
+static void del_dep_rcu(struct rcu_head *rh)
+{
+	struct dept_dep *d = container_of(rh, struct dept_dep, rh);
+
+	preempt_disable();
+	del_dep(d);
+	preempt_enable();
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_class(struct dept_class *c)
+{
+	struct dept_dep *d, *n;
+	int i;
+
+	list_for_each_entry_safe(d, n, &c->dep_head, dep_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	list_for_each_entry_safe(d, n, &c->dep_rev_head, dep_rev_node) {
+		list_del_rcu(&d->dep_node);
+		list_del_rcu(&d->dep_rev_node);
+		hash_del_dep(d);
+		call_rcu(&d->rh, del_dep_rcu);
+	}
+
+	for (i = 0; i < DEPT_IRQS_NR; i++) {
+		stale_iecxt(iecxt(c, i));
+		stale_iwait(iwait(c, i));
+	}
+}
+
+/*
+ * IRQ context control
+ * =====================================================================
+ * Whether a wait is in {hard,soft}-IRQ context and whether
+ * {hard,soft}-IRQ has been enabled on the way to an event are very
+ * important for checking dependencies. All of that should be tracked.
+ */
+
+static inline unsigned long cur_enirqf(void)
+{
+	struct dept_task *dt = dept_task();
+	int he = dt->hardirqs_enabled;
+	int se = dt->softirqs_enabled;
+
+	if (he)
+		return DEPT_HIRQF | (se ? DEPT_SIRQF : 0UL);
+	return 0UL;
+}
+
+static inline int cur_irq(void)
+{
+	if (lockdep_softirq_context(current))
+		return DEPT_SIRQ;
+	if (lockdep_hardirq_context())
+		return DEPT_HIRQ;
+	return DEPT_IRQS_NR;
+}
+
+static inline unsigned int cur_ctxt_id(void)
+{
+	struct dept_task *dt = dept_task();
+	int irq = cur_irq();
+
+	/*
+	 * Normal process context
+	 */
+	if (irq == DEPT_IRQS_NR)
+		return 0U;
+
+	return dt->irq_id[irq] | (1UL << irq);
+}
+
+static void enirq_transition(int irq)
+{
+	struct dept_task *dt = dept_task();
+	int i;
+
+	/*
+	 * If a READ wgen >= the wgen of an event with IRQ enabled has
+	 * been observed on the way to the event, the IRQ can cut in
+	 * within the ecxt. Used for cross-event detection.
+	 *
+	 *    wait context	event context(ecxt)
+	 *    ------------	-------------------
+	 *    wait event
+	 *       WRITE wgen
+	 *			observe IRQ enabled
+	 *			   READ wgen
+	 *			   keep the wgen locally
+	 *
+	 *			on the event
+	 *			   check the local wgen
+	 */
+	dt->wgen_enirq[irq] = atomic_read(&wgen);
+
+	for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+		struct dept_ecxt_held *eh;
+		struct dept_ecxt *e;
+
+		eh = dt->ecxt_held + i;
+		e = eh->ecxt;
+		add_iecxt(e->class, irq, e, true);
+	}
+}
+
+static void enirq_update(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long irqf;
+	unsigned long prev;
+	int irq;
+
+	prev = dt->eff_enirqf;
+	irqf = cur_enirqf();
+	dt->eff_enirqf = irqf;
+
+	/*
+	 * Do enirq_transition() only on an OFF -> ON transition.
+	 */
+	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+		if (prev & (1UL << irq))
+			continue;
+
+		dt->enirq_ip[irq] = ip;
+		enirq_transition(irq);
+	}
+}
+
+/*
+ * Ensure this is called only on an OFF -> ON transition.
+ */
+void dept_enable_softirq(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (DEPT_WARN_ON(early_boot_irqs_disabled))
+		goto exit;
+
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		goto exit;
+
+	dt->softirqs_enabled = true;
+	enirq_update(ip);
+exit:
+	dept_exit(flags);
+}
+
+/*
+ * Ensure this is called only on an OFF -> ON transition.
+ */
+void dept_enable_hardirq(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (DEPT_WARN_ON(early_boot_irqs_disabled))
+		goto exit;
+
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		goto exit;
+
+	dt->hardirqs_enabled = true;
+	enirq_update(ip);
+exit:
+	dept_exit(flags);
+}
+
+/*
+ * Ensure this is called only on an ON -> OFF transition.
+ */
+void dept_disable_softirq(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		goto exit;
+
+	dt->softirqs_enabled = false;
+	enirq_update(ip);
+exit:
+	dept_exit(flags);
+}
+
+/*
+ * Ensure this is called only on an ON -> OFF transition.
+ */
+void dept_disable_hardirq(unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (DEPT_WARN_ON(!irqs_disabled()))
+		goto exit;
+
+	dt->hardirqs_enabled = false;
+	enirq_update(ip);
+exit:
+	dept_exit(flags);
+}
+
+/*
+ * Ensure it's the outermost softirq context.
+ */
+void dept_softirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+}
+
+/*
+ * Ensure it's the outermost hardirq context.
+ */
+void dept_hardirq_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+}
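[Editor's note] Together, cur_ctxt_id() and dept_{soft,hard}irq_enter() encode a context id as a per-IRQ-type entry counter in the bits above DEPT_IRQS_NR, ORed with the IRQ-type bit in the low bits, with 0 reserved for process context. A sketch of that encoding (IRQS_NR shrunk, names illustrative):

```c
#include <assert.h>

#define IRQS_NR 2	/* stand-in for DEPT_IRQS_NR: softirq and hardirq */

static unsigned long irq_id[IRQS_NR];

/*
 * Like cur_ctxt_id(): the per-type entry counter lives in the high
 * bits, the IRQ-type bit in the low bits; 0 means process context.
 */
static unsigned long ctxt_id(int irq)
{
	if (irq == IRQS_NR)
		return 0UL;
	return irq_id[irq] | (1UL << irq);
}

/*
 * Like dept_softirq_enter()/dept_hardirq_enter(): bump the counter
 * above the type bits on each outermost entry, and return the id the
 * new context will use.
 */
static unsigned long irq_enter(int irq)
{
	irq_id[irq] += 1UL << IRQS_NR;
	return ctxt_id(irq);
}
```

Because the counter is bumped by `1UL << IRQS_NR`, it can never collide with the type bits, so every outermost IRQ entry of a given type yields a distinct id.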
+
+/*
+ * DEPT API
+ * =====================================================================
+ * Main DEPT APIs.
+ */
+
+static inline void clean_classes_cache(struct dept_key *k)
+{
+	int i;
+
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		k->classes[i] = NULL;
+}
+
+void dept_map_init(struct dept_map *m, struct dept_key *k, int sub,
+		   const char *n)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (DEPT_WARN_ON(sub < 0 || sub >= DEPT_MAX_SUBCLASSES_USR)) {
+		m->nocheck = true;
+		goto exit;
+	}
+
+	if (m->keys != k)
+		m->keys = k;
+	clean_classes_cache(&m->keys_local);
+
+	m->sub_usr = sub;
+	m->name = n;
+	m->wgen = 0U;
+	m->nocheck = false;
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_init);
+
+void dept_map_reinit(struct dept_map *m)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	clean_classes_cache(&m->keys_local);
+	m->wgen = 0U;
+	m->nocheck = false;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_reinit);
+
+void dept_map_nocheck(struct dept_map *m)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	m->nocheck = true;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_nocheck);
+
+static LIST_HEAD(classes);
+
+static inline bool within(const void *addr, void *start, unsigned long size)
+{
+	return addr >= start && addr < start + size;
+}
+
+void dept_free_range(void *start, unsigned int sz)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c, *n;
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	/*
+	 * dept_free_range() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_free_range() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	list_for_each_entry_safe(c, n, &classes, all_node) {
+		if (!within((void *)c->key, start, sz) &&
+		    !within(c->name, start, sz))
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+
+static inline int map_sub(struct dept_map *m, int e)
+{
+	return m->sub_usr + e * DEPT_MAX_SUBCLASSES_USR;
+}
+
+static struct dept_class *check_new_class(struct dept_key *local,
+					  struct dept_key *k, int sub,
+					  const char *n)
+{
+	struct dept_class *c = NULL;
+
+	if (DEPT_WARN_ON(sub >= DEPT_MAX_SUBCLASSES))
+		return NULL;
+
+	if (DEPT_WARN_ON(!k))
+		return NULL;
+
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE)
+		c = READ_ONCE(local->classes[sub]);
+
+	if (c)
+		return c;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (c)
+		goto caching;
+
+	if (unlikely(!dept_lock()))
+		return NULL;
+
+	c = lookup_class((unsigned long)k->subkeys + sub);
+	if (unlikely(c))
+		goto unlock;
+
+	c = new_class();
+	if (unlikely(!c))
+		goto unlock;
+
+	c->name = n;
+	c->sub = sub;
+	c->key = (unsigned long)(k->subkeys + sub);
+	hash_add_class(c);
+	list_add(&c->all_node, &classes);
+unlock:
+	dept_unlock();
+caching:
+	if (sub < DEPT_MAX_SUBCLASSES_CACHE && c)
+		WRITE_ONCE(local->classes[sub], c);
+
+	return c;
+}
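[Editor's note] check_new_class() is a double-checked creation pattern: a lockless lookup first, then the lookup repeated under dept_lock() before allocating, so two racing contexts cannot both create the class. A generic user-space sketch of that pattern (toy lock stubs; the kernel version additionally pairs READ_ONCE()/WRITE_ONCE() on the lockless cache accesses):

```c
#include <assert.h>
#include <stdlib.h>

#define NKEYS 16

static void *table[NKEYS];

/* Stand-ins for dept_lock()/dept_unlock(); a real version needs a lock. */
static void toy_lock(void)   { }
static void toy_unlock(void) { }

static void *lookup(int key)
{
	return table[key];
}

/*
 * Mirror of check_new_class(): lockless lookup, then lock, re-check,
 * and only then allocate.
 */
static void *get_or_create(int key)
{
	void *c = lookup(key);	/* lockless fast path */

	if (c)
		return c;

	toy_lock();
	c = lookup(key);	/* re-check: another context may have won */
	if (!c) {
		c = malloc(1);
		table[key] = c;
	}
	toy_unlock();
	return c;
}
```

The re-check under the lock is what makes the lockless fast path safe: a loser of the race finds the winner's object instead of allocating a duplicate.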
+
+void __dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
+		 const char *w_fn, int ne)
+{
+	int e;
+
+	/*
+	 * Be as conservative as possible. In case of multiple waits for
+	 * a single dept_map, we are going to keep only the last wait's
+	 * wgen for simplicity - keeping all wgens seems overengineering.
+	 *
+	 * Of course, this might cause some dependencies that would
+	 * rarely, probably never, happen to be missed, but it helps
+	 * avoid false positive reports.
+	 */
+	for_each_set_bit(e, &w_f, DEPT_MAX_SUBCLASSES_EVT) {
+		struct dept_class *c;
+		struct dept_key *k;
+
+		k = m->keys ?: &m->keys_local;
+		c = check_new_class(&m->keys_local, k,
+				    map_sub(m, e), m->name);
+		if (!c)
+			continue;
+
+		add_wait(c, ip, w_fn, ne);
+	}
+}
+
+void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
+	       const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait);
+
+static inline void stage_map(struct dept_task *dt, struct dept_map *m)
+{
+	dt->stage_m = m;
+}
+
+static inline void unstage_map(struct dept_task *dt)
+{
+	dt->stage_m = NULL;
+}
+
+static inline struct dept_map *staged_map(struct dept_task *dt)
+{
+	return dt->stage_m;
+}
+
+void dept_stage_wait(struct dept_map *m, unsigned long w_f,
+		     unsigned long ip, const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	stage_map(dt, m);
+
+	dt->stage_w_f = w_f;
+	dt->stage_ip = ip;
+	dt->stage_w_fn = w_fn;
+	dt->stage_ne = ne;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_stage_wait);
+
+void dept_clean_stage(void)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	unstage_map(dt);
+
+	dt->stage_w_f = 0UL;
+	dt->stage_ip = 0UL;
+	dt->stage_w_fn = NULL;
+	dt->stage_ne = 0;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_clean_stage);
+
+/*
+ * Always called from __schedule().
+ */
+void dept_ask_event_wait_commit(void)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	unsigned int wg;
+	struct dept_map *m;
+	unsigned long w_f;
+	unsigned long ip;
+	const char *w_fn;
+	int ne;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	m = staged_map(dt);
+
+	/*
+	 * Check whether current has staged a wait before __schedule().
+	 */
+	if (!m)
+		goto exit;
+
+	if (m->nocheck)
+		goto exit;
+
+	w_f = dt->stage_w_f;
+	ip = dt->stage_ip;
+	w_fn = dt->stage_w_fn;
+	ne = dt->stage_ne;
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	__dept_wait(m, w_f, ip, w_fn, ne);
+exit:
+	dept_exit(flags);
+}
+
+void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		     const char *c_fn, const char *e_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int e;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	for_each_set_bit(e, &e_f, DEPT_MAX_SUBCLASSES_EVT) {
+		struct dept_class *c;
+		struct dept_key *k;
+
+		k = m->keys ?: &m->keys_local;
+		c = check_new_class(&m->keys_local, k,
+				    map_sub(m, e), m->name);
+		if (!c)
+			continue;
+
+		add_ecxt((void *)m, c, ip, c_fn, e_fn, ne);
+	}
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_enter);
+
+void dept_ask_event(struct dept_map *m)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	unsigned int wg;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(m->wgen, wg);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ask_event);
+
+void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip,
+		const char *e_fn)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int e;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	for_each_set_bit(e, &e_f, DEPT_MAX_SUBCLASSES_EVT) {
+		struct dept_class *c;
+		struct dept_key *k;
+
+		k = m->keys ?: &m->keys_local;
+		c = check_new_class(&m->keys_local, k,
+				    map_sub(m, e), m->name);
+		if (!c)
+			continue;
+
+		add_ecxt((void *)m, c, 0UL, NULL, e_fn, 0);
+		do_event((void *)m, c, READ_ONCE(m->wgen), ip);
+		pop_ecxt((void *)m);
+	}
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event);
+
+void dept_ecxt_exit(struct dept_map *m, unsigned long ip)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (m->nocheck)
+		return;
+
+	flags = dept_enter();
+	pop_ecxt((void *)m);
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_exit);
+
+struct dept_map *dept_top_map(void)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_map *m;
+	unsigned long flags;
+	int pos;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return NULL;
+
+	flags = dept_enter();
+	pos = dt->ecxt_held_pos;
+	m = pos ? (struct dept_map *)dt->ecxt_held[pos - 1].key : NULL;
+	dept_exit(flags);
+
+	return m;
+}
+EXPORT_SYMBOL_GPL(dept_top_map);
+
+void dept_warn_on(bool cond)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+	DEPT_WARN_ON(cond);
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_warn_on);
+
+void dept_task_exit(struct task_struct *t)
+{
+	struct dept_task *dt = &t->dept_task;
+	int i;
+
+	raw_local_irq_disable();
+
+	if (dt->stack)
+		put_stack(dt->stack);
+
+	for (i = 0; i < dt->ecxt_held_pos; i++)
+		put_ecxt(dt->ecxt_held[i].ecxt);
+
+	for (i = 0; i < DEPT_MAX_WAIT_HIST; i++)
+		if (dt->wait_hist[i].wait)
+			put_wait(dt->wait_hist[i].wait);
+
+	dept_off();
+
+	raw_local_irq_enable();
+}
+
+void dept_task_init(struct task_struct *t)
+{
+	memset(&t->dept_task, 0x0, sizeof(struct dept_task));
+}
+
+void dept_key_init(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_init() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_key_init() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		DEPT_STOP("The class(%s/%d) has not been removed.\n",
+			  c->name, sub);
+		break;
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_key_init);
+
+void dept_key_destroy(struct dept_key *k)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	int sub;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	/*
+	 * dept_key_destroy() should not fail.
+	 *
+	 * FIXME: Should be fixed if dept_key_destroy() causes deadlock
+	 * with dept_lock().
+	 */
+	while (unlikely(!dept_lock()))
+		cpu_relax();
+
+	for (sub = 0; sub < DEPT_MAX_SUBCLASSES; sub++) {
+		struct dept_class *c;
+
+		c = lookup_class((unsigned long)k->subkeys + sub);
+		if (!c)
+			continue;
+
+		hash_del_class(c);
+		disconnect_class(c);
+		list_del(&c->all_node);
+		inval_class(c);
+
+		/*
+		 * Actual deletion will happen on the rcu callback
+		 * that has been added in disconnect_class().
+		 */
+		del_class(c);
+	}
+
+	dept_unlock();
+	dept_exit(flags);
+
+	/*
+	 * Wait until even lockless hash_lookup_class() for the class
+	 * returns NULL.
+	 */
+	might_sleep();
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(dept_key_destroy);
+
+static void move_llist(struct llist_head *to, struct llist_head *from)
+{
+	struct llist_node *first = llist_del_all(from);
+	struct llist_node *last;
+
+	if (!first)
+		return;
+
+	for (last = first; last->next; last = last->next);
+	llist_add_batch(first, last, to);
+}
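[Editor's note] move_llist() detaches the whole source list, walks to its tail, and splices the chain into the destination in one batch. A self-contained sketch with minimal llist stand-ins (types and names hypothetical; the real llist API is lock-free):

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };
struct head { struct node *first; };

/* Stand-in for llist_del_all(): detach the whole chain. */
static struct node *del_all(struct head *h)
{
	struct node *n = h->first;

	h->first = NULL;
	return n;
}

/* Stand-in for llist_add_batch(): splice [first, last] at the front. */
static void add_batch(struct node *first, struct node *last, struct head *h)
{
	last->next = h->first;
	h->first = first;
}

/* Like move_llist(): detach everything, find the tail, splice in one go. */
static void move_list(struct head *to, struct head *from)
{
	struct node *first = del_all(from);
	struct node *last;

	if (!first)
		return;

	for (last = first; last->next; last = last->next)
		;
	add_batch(first, last, to);
}

/* Self-check: move a two-node list and count the destination length. */
static int demo(void)
{
	static struct node a, b;
	struct head from = { &a };
	struct head to = { NULL };
	struct node *p;
	int n = 0;

	a.next = &b;
	b.next = NULL;
	move_list(&to, &from);
	for (p = to.first; p; p = p->next)
		n++;
	return from.first == NULL ? n : -1;
}
```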
+
+static void migrate_per_cpu_pool(void)
+{
+	const int boot_cpu = 0;
+	int i;
+
+	/*
+	 * The boot CPU has been using the temporary local pool so far.
+	 * Now that the per_cpu areas are ready, use the per_cpu local
+	 * pool instead.
+	 */
+	DEPT_WARN_ON(smp_processor_id() != boot_cpu);
+	for (i = 0; i < OBJECT_NR; i++) {
+		struct llist_head *from;
+		struct llist_head *to;
+
+		from = &pool[i].boot_pool;
+		to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+		move_llist(to, from);
+	}
+}
+
+#define B2KB(B) ((B) / 1024)
+
+/*
+ * Should be called after setup_per_cpu_areas() and before any
+ * non-boot CPU comes up.
+ */
+void __init dept_init(void)
+{
+	size_t mem_total = 0;
+
+	local_irq_disable();
+	dept_per_cpu_ready = 1;
+	migrate_per_cpu_pool();
+	local_irq_enable();
+
+#define OBJECT(id, nr) mem_total += sizeof(struct dept_##id) * nr;
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits) mem_total += sizeof(struct hlist_head) * (1UL << bits);
+	#include "dept_hash.h"
+#undef  HASH
+
+	pr_info("DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park\n");
+	pr_info("... DEPT_MAX_STACK_ENTRY: %d\n", DEPT_MAX_STACK_ENTRY);
+	pr_info("... DEPT_MAX_WAIT_HIST  : %d\n", DEPT_MAX_WAIT_HIST);
+	pr_info("... DEPT_MAX_ECXT_HELD  : %d\n", DEPT_MAX_ECXT_HELD);
+	pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
+#define OBJECT(id, nr)							\
+	pr_info("... memory used by %s: %zu KB\n",			\
+	       #id, B2KB(sizeof(struct dept_##id) * nr));
+	#include "dept_object.h"
+#undef  OBJECT
+#define HASH(id, bits)							\
+	pr_info("... hash list head used by %s: %zu KB\n",		\
+	       #id, B2KB(sizeof(struct hlist_head) * (1UL << bits)));
+	#include "dept_hash.h"
+#undef  HASH
+	pr_info("... total memory used by objects and hashes: %zu KB\n", B2KB(mem_total));
+	pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
+}
diff --git a/kernel/dependency/dept_hash.h b/kernel/dependency/dept_hash.h
new file mode 100644
index 0000000..fd85aab
--- /dev/null
+++ b/kernel/dependency/dept_hash.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * HASH(id, bits)
+ *
+ * id  : Id for the object of struct dept_##id.
+ * bits: 1UL << bits is the hash table size.
+ */
+
+HASH(dep, 12)
+HASH(class, 12)
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
new file mode 100644
index 0000000..ad5ff57
--- /dev/null
+++ b/kernel/dependency/dept_object.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * OBJECT(id, nr)
+ *
+ * id: Id for the object of struct dept_##id.
+ * nr: Number of objects that should be kept in the pool.
+ */
+
+OBJECT(dep, 1024 * 8)
+OBJECT(class, 1024 * 4)
+OBJECT(stack, 1024 * 32)
+OBJECT(ecxt, 1024 * 4)
+OBJECT(wait, 1024 * 32)
diff --git a/kernel/exit.c b/kernel/exit.c
index b00a25b..187ee24 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -854,6 +854,7 @@ void __noreturn do_exit(long code)
 	exit_tasks_rcu_finish();
 
 	lockdep_free_task(tsk);
+	dept_task_exit(tsk);
 	do_task_dead();
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f..8de918b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
 #include <linux/scs.h>
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
+#include <linux/dept.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -2117,6 +2118,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_LOCKDEP
 	lockdep_init_task(p);
 #endif
+	dept_task_init(p);
 
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
diff --git a/kernel/module.c b/kernel/module.c
index 24dab04..bd9376d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2205,6 +2205,7 @@ static void free_module(struct module *mod)
 
 	/* Free lock-classes; relies on the preceding sync_rcu(). */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
 	module_memfree(mod->core_layout.base);
@@ -4174,6 +4175,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
  free_module:
 	/* Free lock-classes; relies on the preceding sync_rcu() */
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
+	dept_free_range(mod->core_layout.base, mod->core_layout.size);
 
 	module_deallocate(mod, info);
  free_copy:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2e4ae00..1597d1c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6192,6 +6192,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	local_irq_disable();
 	rcu_note_context_switch(!!sched_mode);
 
+	if (sched_mode == SM_NONE)
+		dept_ask_event_wait_commit();
+
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 41f4709..a28c950 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -320,7 +320,7 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 	 * Were softirqs turned off above:
 	 */
 	if (softirq_count() == (cnt & SOFTIRQ_MASK))
-		lockdep_softirqs_off(ip);
+		trace_softirqs_off_caller(ip);
 	raw_local_irq_restore(flags);
 
 	if (preempt_count() == cnt) {
@@ -341,7 +341,7 @@ static void __local_bh_enable(unsigned int cnt)
 		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
 
 	if (softirq_count() == (cnt & SOFTIRQ_MASK))
-		lockdep_softirqs_on(_RET_IP_);
+		trace_softirqs_on_caller(_RET_IP_);
 
 	__preempt_count_sub(cnt);
 }
@@ -368,7 +368,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 	 * Are softirqs going to be turned on now:
 	 */
 	if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)
-		lockdep_softirqs_on(ip);
+		trace_softirqs_on_caller(ip);
 	/*
 	 * Keep preemption disabled until we are done with
 	 * softirq processing:
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index f493804..19cafdfb 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -19,6 +19,18 @@
 /* Per-cpu variable to prevent redundant calls when IRQs already off */
 static DEFINE_PER_CPU(int, tracing_irq_cpu);
 
+void trace_softirqs_on_caller(unsigned long ip)
+{
+	lockdep_softirqs_on(ip);
+	dept_enable_softirq(ip);
+}
+
+void trace_softirqs_off_caller(unsigned long ip)
+{
+	lockdep_softirqs_off(ip);
+	dept_disable_softirq(ip);
+}
+
 /*
  * Like trace_hardirqs_on() but without the lockdep invocation. This is
  * used in the low level entry code where the ordering vs. RCU is important
@@ -33,6 +45,7 @@ void trace_hardirqs_on_prepare(void)
 		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
+	dept_enable_hardirq(CALLER_ADDR0);
 }
 EXPORT_SYMBOL(trace_hardirqs_on_prepare);
 NOKPROBE_SYMBOL(trace_hardirqs_on_prepare);
@@ -45,6 +58,7 @@ void trace_hardirqs_on(void)
 		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
+	dept_enable_hardirq(CALLER_ADDR0);
 
 	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 	lockdep_hardirqs_on(CALLER_ADDR0);
@@ -66,7 +80,7 @@ void trace_hardirqs_off_finish(void)
 		if (!in_nmi())
 			trace_irq_disable(CALLER_ADDR0, CALLER_ADDR1);
 	}
-
+	dept_disable_hardirq(CALLER_ADDR0);
 }
 EXPORT_SYMBOL(trace_hardirqs_off_finish);
 NOKPROBE_SYMBOL(trace_hardirqs_off_finish);
@@ -81,6 +95,7 @@ void trace_hardirqs_off(void)
 		if (!in_nmi())
 			trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
 	}
+	dept_disable_hardirq(CALLER_ADDR0);
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
 NOKPROBE_SYMBOL(trace_hardirqs_off);
@@ -93,6 +108,7 @@ __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 		tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
+	dept_enable_hardirq(CALLER_ADDR0);
 
 	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
 	lockdep_hardirqs_on(CALLER_ADDR0);
@@ -110,6 +126,7 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 		if (!in_nmi())
 			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
 	}
+	dept_disable_hardirq(CALLER_ADDR0);
 }
 EXPORT_SYMBOL(trace_hardirqs_off_caller);
 NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 14b89aa..309b275 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1233,6 +1233,26 @@ config DEBUG_PREEMPT
 
 menu "Lock Debugging (spinlocks, mutexes, etc...)"
 
+config DEPT
+	bool "Dependency tracking"
+	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
+	select DEBUG_SPINLOCK
+	select DEBUG_MUTEXES
+	select DEBUG_RT_MUTEXES if RT_MUTEXES
+	select DEBUG_RWSEMS
+	select DEBUG_WW_MUTEX_SLOWPATH
+	select DEBUG_LOCK_ALLOC
+	select TRACE_IRQFLAGS
+	select STACKTRACE
+	select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
+	select KALLSYMS
+	select KALLSYMS_ALL
+	default n
+	help
+	  Check dependencies between waits and events and report a
+	  deadlock possibility when one is detected. Multiple reports
+	  are allowed if there is more than a single problem.
+
 config LOCK_DEBUGGING_SUPPORT
 	bool
 	depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread
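The trace_hardirqs_on()/trace_hardirqs_off() hunks above keep the tracer guarded by the per-cpu tracing_irq_cpu flag while calling the Dept hooks unconditionally. A minimal user-space Python sketch of that guard (function and event names are illustrative, not the kernel API):

```python
# Guard model: the tracer fires only on real hardirq state changes
# (per-cpu tracing_irq_cpu flag), while Dept is informed every time.
events = []
tracing_irq_cpu = 0  # models this_cpu_read/write(tracing_irq_cpu)

def trace_hardirqs_off():
    global tracing_irq_cpu
    if not tracing_irq_cpu:            # first disable: trace it
        tracing_irq_cpu = 1
        events.append("tracer_off")
    events.append("dept_disable_hardirq")  # Dept hook runs unconditionally

def trace_hardirqs_on():
    global tracing_irq_cpu
    if tracing_irq_cpu:                # real enable: trace it
        events.append("tracer_on")
        tracing_irq_cpu = 0
    events.append("dept_enable_hardirq")

trace_hardirqs_off()
trace_hardirqs_off()   # redundant disable: tracer skipped, Dept still told
trace_hardirqs_on()
```

The second, redundant disable reaches Dept but not the tracer, which is exactly why the dept_disable_hardirq() call sits outside the flag check in the patch.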

* [PATCH 03/16] dept: Embed Dept data in Lockdep
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
  2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Dept should work independently of Lockdep. However, there's no choice
but to rely on Lockdep code and its instances for now.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h       | 71 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/lockdep_types.h |  3 ++
 kernel/locking/lockdep.c      | 12 ++++----
 3 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b942..c56f6b6 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -20,6 +20,33 @@
 extern int prove_locking;
 extern int lock_stat;
 
+#ifdef CONFIG_DEPT
+static inline void dept_after_copy_map(struct dept_map *to,
+				       struct dept_map *from)
+{
+	int i;
+
+	if (from->keys == &from->keys_local)
+		to->keys = &to->keys_local;
+
+	if (!to->keys)
+		return;
+
+	/*
+	 * Since the class cache can be modified concurrently we could observe
+	 * half pointers (64bit arch using 32bit copy insns). Therefore clear
+	 * the caches and take the performance hit.
+	 *
+	 * XXX it doesn't work well with lockdep_set_class_and_subclass(), since
+	 *     that relies on cache abuse.
+	 */
+	for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+		to->keys->classes[i] = NULL;
+}
+#else
+#define dept_after_copy_map(t, f)	do { } while (0)
+#endif
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -43,6 +70,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
 	 */
 	for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
 		to->class_cache[i] = NULL;
+
+	dept_after_copy_map(&to->dmap, &from->dmap);
 }
 
 /*
@@ -176,8 +205,19 @@ struct held_lock {
 	current->lockdep_recursion -= LOCKDEP_OFF;	\
 } while (0)
 
-extern void lockdep_register_key(struct lock_class_key *key);
-extern void lockdep_unregister_key(struct lock_class_key *key);
+extern void __lockdep_register_key(struct lock_class_key *key);
+extern void __lockdep_unregister_key(struct lock_class_key *key);
+
+#define lockdep_register_key(k)				\
+do {							\
+	__lockdep_register_key(k);			\
+	dept_key_init(&(k)->dkey);			\
+} while (0)
+#define lockdep_unregister_key(k)			\
+do {							\
+	__lockdep_unregister_key(k);			\
+	dept_key_destroy(&(k)->dkey);			\
+} while (0)
 
 /*
  * These methods are used by specific locking variants (spinlocks,
@@ -185,9 +225,18 @@ struct held_lock {
  * to lockdep:
  */
 
-extern void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+extern void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 	struct lock_class_key *key, int subclass, u8 inner, u8 outer, u8 lock_type);
 
+#define lockdep_init_map_type(l, n, k, s, i, o, t)		\
+do {								\
+	__lockdep_init_map_type(l, n, k, s, i, o, t);		\
+	if ((k) == &__lockdep_no_validate__)			\
+		dept_map_nocheck(&(l)->dmap);			\
+	else							\
+		dept_map_init(&(l)->dmap, &(k)->dkey, s, n);	\
+} while (0)
+
 static inline void
 lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
 		       struct lock_class_key *key, int subclass, u8 inner, u8 outer)
@@ -431,13 +480,27 @@ enum xhlock_context_t {
 	XHLOCK_CTX_NR,
 };
 
+#ifdef CONFIG_DEPT
+/*
+ * TODO: There are cases, e.g. in workqueue, where an address other
+ * than a real key's is passed as _key. So for now, we cannot use the
+ * assignment like '.dmap.keys = &(_key)->dkey' until that is fixed.
+ */
+#define STATIC_DEPT_MAP_INIT(_name, _key) .dmap = {		\
+	.name = (_name),					\
+	.keys = NULL },
+#else
+#define STATIC_DEPT_MAP_INIT(_name, _key)
+#endif
+
 #define lockdep_init_map_crosslock(m, n, k, s) do {} while (0)
 /*
  * To initialize a lockdep_map statically use this macro.
  * Note that _name must not be NULL.
  */
 #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
-	{ .name = (_name), .key = (void *)(_key), }
+	{ .name = (_name), .key = (void *)(_key), \
+	STATIC_DEPT_MAP_INIT(_name, _key) }
 
 static inline void lockdep_invariant_state(bool force) {}
 static inline void lockdep_free_task(struct task_struct *task) {}
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d224308..50c8879 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
 #define __LINUX_LOCKDEP_TYPES_H
 
 #include <linux/types.h>
+#include <linux/dept.h>
 
 #define MAX_LOCKDEP_SUBCLASSES		8UL
 
@@ -76,6 +77,7 @@ struct lock_class_key {
 		struct hlist_node		hash_entry;
 		struct lockdep_subclass_key	subkeys[MAX_LOCKDEP_SUBCLASSES];
 	};
+	struct dept_key				dkey;
 };
 
 extern struct lock_class_key __lockdep_no_validate__;
@@ -185,6 +187,7 @@ struct lockdep_map {
 	int				cpu;
 	unsigned long			ip;
 #endif
+	struct dept_map			dmap;
 };
 
 struct pin_cookie { unsigned int val; };
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4a882f8..a85468d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1184,7 +1184,7 @@ static inline struct hlist_head *keyhashentry(const struct lock_class_key *key)
 }
 
 /* Register a dynamically allocated key. */
-void lockdep_register_key(struct lock_class_key *key)
+void __lockdep_register_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head;
 	struct lock_class_key *k;
@@ -1207,7 +1207,7 @@ void lockdep_register_key(struct lock_class_key *key)
 restore_irqs:
 	raw_local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(lockdep_register_key);
+EXPORT_SYMBOL_GPL(__lockdep_register_key);
 
 /* Check whether a key has been registered as a dynamic key. */
 static bool is_dynamic_key(const struct lock_class_key *key)
@@ -4771,7 +4771,7 @@ static inline int check_wait_context(struct task_struct *curr,
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 			    struct lock_class_key *key, int subclass,
 			    u8 inner, u8 outer, u8 lock_type)
 {
@@ -4831,7 +4831,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map_type);
+EXPORT_SYMBOL_GPL(__lockdep_init_map_type);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -6291,7 +6291,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 }
 
 /* Unregister a dynamically allocated key. */
-void lockdep_unregister_key(struct lock_class_key *key)
+void __lockdep_unregister_key(struct lock_class_key *key)
 {
 	struct hlist_head *hash_head = keyhashentry(key);
 	struct lock_class_key *k;
@@ -6326,7 +6326,7 @@ void lockdep_unregister_key(struct lock_class_key *key)
 	/* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
 	synchronize_rcu();
 }
-EXPORT_SYMBOL_GPL(lockdep_unregister_key);
+EXPORT_SYMBOL_GPL(__lockdep_unregister_key);
 
 void __init lockdep_init(void)
 {
-- 
1.9.1


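The dept_after_copy_map() helper in this patch mirrors lockdep_copy_map(): after a raw map copy, the class cache is cleared because a concurrent updater could leave half-written pointers behind. A rough Python model of the same idea (DeptMap and its fields are illustrative stand-ins, not the real structures):

```python
DEPT_MAX_SUBCLASSES_CACHE = 8

class DeptMap:
    """Toy stand-in for struct dept_map (illustrative, not the real layout)."""
    def __init__(self, name):
        self.name = name
        # Embedded per-map key cache, models 'keys_local'.
        self.keys_local = [object()] * DEPT_MAX_SUBCLASSES_CACHE
        self.keys = self.keys_local

def dept_after_copy_map(to, frm):
    # If the source used its embedded cache, the copy must use its own
    # embedded cache, not alias the source's.
    if frm.keys is frm.keys_local:
        to.keys = to.keys_local
    if to.keys is None:
        return
    # A concurrent writer may leave half pointers in a raw copy, so
    # clear the cache and take the performance hit.
    for i in range(DEPT_MAX_SUBCLASSES_CACHE):
        to.keys[i] = None

src = DeptMap("src")
dst = DeptMap("dst")
dst.keys = src.keys          # models the raw field-by-field copy
dept_after_copy_map(dst, src)
```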

* [PATCH 04/16] dept: Apply Dept to spinlock
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (2 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies created by spinlocks.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h            | 18 +++++++++++++++---
 include/linux/spinlock.h           | 24 ++++++++++++++++++++++++
 include/linux/spinlock_types_raw.h | 13 +++++++++++++
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c56f6b6..1da8b95 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -582,9 +582,21 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define lock_acquire_shared(l, s, t, n, i)		lock_acquire(l, s, t, 1, 1, n, i)
 #define lock_acquire_shared_recursive(l, s, t, n, i)	lock_acquire(l, s, t, 2, 1, n, i)
 
-#define spin_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define spin_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define spin_release(l, i)			lock_release(l, i)
+#define spin_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_spin_lock(&(l)->dmap, s, t, NULL, "spin_unlock", i);	\
+} while (0)
+#define spin_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_spin_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "spin_unlock", i); \
+} while (0)
+#define spin_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_spin_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwlock_acquire_read(l, s, t, i)					\
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 5c0c517..eaffc9f 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -95,6 +95,30 @@
 # include <linux/spinlock_up.h>
 #endif
 
+#ifdef CONFIG_DEPT
+#define dept_spin_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	} else if (n) {							\
+		dept_warn_on(dept_top_map() != (n));			\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	}								\
+} while (0)
+#define dept_spin_unlock(m, ip)						\
+do {									\
+	dept_event(m, 1UL, ip, __func__);				\
+	dept_ecxt_exit(m, ip);						\
+} while (0)
+#else
+#define dept_spin_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_spin_unlock(m, ip)			do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 				   struct lock_class_key *key, short inner);
diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
index 91cb36b..5a9b25d 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -26,16 +26,28 @@
 
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
+#ifdef CONFIG_DEPT
+# define RAW_SPIN_DMAP_INIT(lockname)	.dmap = { .name = #lockname },
+# define SPIN_DMAP_INIT(lockname)	.dmap = { .name = #lockname },
+# define LOCAL_SPIN_DMAP_INIT(lockname)	.dmap = { .name = #lockname },
+#else
+# define RAW_SPIN_DMAP_INIT(lockname)
+# define SPIN_DMAP_INIT(lockname)
+# define LOCAL_SPIN_DMAP_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define RAW_SPIN_DEP_MAP_INIT(lockname)		\
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_SPIN,	\
+		RAW_SPIN_DMAP_INIT(lockname)		\
 	}
 # define SPIN_DEP_MAP_INIT(lockname)			\
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_CONFIG,	\
+		SPIN_DMAP_INIT(lockname)		\
 	}
 
 # define LOCAL_SPIN_DEP_MAP_INIT(lockname)		\
@@ -43,6 +55,7 @@
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_CONFIG,	\
 		.lock_type = LD_LOCK_PERCPU,		\
+		LOCAL_SPIN_DMAP_INIT(lockname)		\
 	}
 #else
 # define RAW_SPIN_DEP_MAP_INIT(lockname)
-- 
1.9.1


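The dept_spin_lock()/dept_spin_unlock() pairing above is what lets Dept build a wait/event dependency graph. A simplified single-threaded Python model of how such annotations expose an ABBA deadlock possibility (the graph and cycle check are a sketch, far simpler than Dept's real engine):

```python
from collections import defaultdict

edges = defaultdict(set)  # event context -> maps waited on while it was held
held = []                 # stack of entered event contexts

def dept_wait(m):
    for h in held:        # waiting for m while h's event is pending: h -> m
        edges[h].add(m)

def dept_ecxt_enter(m):
    held.append(m)

def dept_ecxt_exit(m):
    held.remove(m)

def spin_lock(m):
    dept_wait(m)          # we may wait for m's unlock event...
    dept_ecxt_enter(m)    # ...and then provide that event ourselves

def spin_unlock(m):
    dept_ecxt_exit(m)

def has_cycle():
    def dfs(n, path):
        if n in path:
            return True
        return any(dfs(t, path | {n}) for t in edges[n])
    return any(dfs(n, set()) for n in list(edges))

# Two code paths taking the locks in opposite order, A-B then B-A:
spin_lock("A"); spin_lock("B"); spin_unlock("B"); spin_unlock("A")
spin_lock("B"); spin_lock("A"); spin_unlock("A"); spin_unlock("B")
```

After both paths run, the graph holds A -> B and B -> A, so the cycle check reports a deadlock possibility even though no deadlock actually occurred.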

* [PATCH 05/16] dept: Apply Dept to mutex families
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (3 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies created by the mutex families.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h | 18 +++++++++++++++---
 include/linux/mutex.h   | 31 +++++++++++++++++++++++++++++++
 include/linux/rtmutex.h |  7 +++++++
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1da8b95..4c6c2a1 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -613,9 +613,21 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
 #define seqcount_release(l, i)			lock_release(l, i)
 
-#define mutex_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define mutex_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define mutex_release(l, i)			lock_release(l, i)
+#define mutex_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_mutex_lock(&(l)->dmap, s, t, NULL, "mutex_unlock", i);	\
+} while (0)
+#define mutex_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_mutex_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "mutex_unlock", i);\
+} while (0)
+#define mutex_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_mutex_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 8f226d4..536ef42 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,11 +20,18 @@
 #include <linux/osq_lock.h>
 #include <linux/debug_locks.h>
 
+#ifdef CONFIG_DEPT
+# define DMAP_MUTEX_INIT(lockname)	.dmap = { .name = #lockname },
+#else
+# define DMAP_MUTEX_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
 		, .dep_map = {					\
 			.name = #lockname,			\
 			.wait_type_inner = LD_WAIT_SLEEP,	\
+			DMAP_MUTEX_INIT(lockname)		\
 		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
@@ -75,6 +82,30 @@ struct mutex {
 #endif
 };
 
+#ifdef CONFIG_DEPT
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	} else if (n) {							\
+		dept_warn_on(dept_top_map() != (n));			\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	}								\
+} while (0)
+#define dept_mutex_unlock(m, ip)					\
+do {									\
+	dept_event(m, 1UL, ip, __func__);				\
+	dept_ecxt_exit(m, ip);						\
+} while (0)
+#else
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_mutex_unlock(m, ip)		do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_MUTEXES
 
 #define __DEBUG_MUTEX_INITIALIZER(lockname)				\
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 7d04988..60cebb0 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -76,11 +76,18 @@ static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { }
 	__rt_mutex_init(mutex, __func__, &__key); \
 } while (0)
 
+#ifdef CONFIG_DEPT
+#define DMAP_RT_MUTEX_INIT(mutexname)	.dmap = { .name = #mutexname },
+#else
+#define DMAP_RT_MUTEX_INIT(mutexname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 #define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)	\
 	.dep_map = {					\
 		.name = #mutexname,			\
 		.wait_type_inner = LD_WAIT_SLEEP,	\
+		DMAP_RT_MUTEX_INIT(mutexname)		\
 	}
 #else
 #define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
-- 
1.9.1


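dept_mutex_lock() takes the same three-way branch as the spinlock variant: trylock, nested acquire, or plain lock. A Python sketch of that branching, with illustrative recording functions standing in for the real Dept calls:

```python
calls = []

def dept_wait(m):        calls.append(("wait", m))
def dept_ecxt_enter(m):  calls.append(("ecxt_enter", m))
def dept_ask_event(m):   calls.append(("ask_event", m))
def dept_top_map():      return "outer"   # illustrative current top map

def dept_mutex_lock(m, trylock=False, nest=None):
    if trylock:
        # Trylock can't block: no wait edge, only the event context.
        dept_ecxt_enter(m); dept_ask_event(m)
    elif nest is not None:
        # Nested acquire: only sanity-check against the current top map
        # (models dept_warn_on(dept_top_map() != n)).
        assert dept_top_map() == nest
    else:
        # Plain lock: record the wait, then enter the event context.
        dept_wait(m); dept_ecxt_enter(m); dept_ask_event(m)

dept_mutex_lock("m1", trylock=True)   # no wait edge recorded
dept_mutex_lock("m2")                 # full wait + event context
dept_mutex_lock("m3", nest="outer")   # sanity check only
```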

* [PATCH 06/16] dept: Apply Dept to rwlock
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (4 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies created by rwlocks.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h        | 25 +++++++++++++++++-----
 include/linux/rwlock.h         | 48 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/rwlock_api_smp.h |  8 +++----
 include/linux/rwlock_types.h   |  7 ++++++
 4 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4c6c2a1..306c22d 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -598,16 +598,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_spin_unlock(&(l)->dmap, i);				\
 } while (0)
 
-#define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
+#define rwlock_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwlock_wlock(&(l)->dmap, s, t, NULL, "write_unlock", i);	\
+} while (0)
 #define rwlock_acquire_read(l, s, t, i)					\
 do {									\
-	if (read_lock_is_recursive())					\
+	if (read_lock_is_recursive()) {				\
 		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
-	else								\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 0);\
+	} else {							\
 		lock_acquire_shared(l, s, t, NULL, i);			\
+		dept_rwlock_rlock(&(l)->dmap, s, t, NULL, "read_unlock", i, 1);\
+	}								\
+} while (0)
+#define rwlock_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_wunlock(&(l)->dmap, i);				\
+} while (0)
+#define rwlock_release_read(l, i)					\
+do {									\
+	lock_release(l, i);						\
+	dept_rwlock_runlock(&(l)->dmap, i);				\
 } while (0)
-
-#define rwlock_release(l, i)			lock_release(l, i)
 
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
 #define seqcount_acquire_read(l, s, t, i)	lock_acquire_shared_recursive(l, s, t, NULL, i)
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 8f416c5..0cc75bc 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -28,6 +28,54 @@
 	do { *(lock) = __RW_LOCK_UNLOCKED(lock); } while (0)
 #endif
 
+#ifdef CONFIG_DEPT
+#define DEPT_EVT_RWLOCK_R		1UL
+#define DEPT_EVT_RWLOCK_W		(1UL << 1)
+#define DEPT_EVT_RWLOCK_RW		(DEPT_EVT_RWLOCK_R | DEPT_EVT_RWLOCK_W)
+
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)			\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
+		dept_ask_event(m);					\
+	} else if (n) {							\
+		dept_warn_on(dept_top_map() != (n));			\
+	} else {							\
+		dept_wait(m, DEPT_EVT_RWLOCK_RW, ip, __func__, ne);	\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_W, ip, __func__, e_fn, ne);\
+		dept_ask_event(m);					\
+	}								\
+} while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)			\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
+		dept_ask_event(m);					\
+	} else if (n) {							\
+		dept_warn_on(dept_top_map() != (n));			\
+	} else {							\
+		dept_wait(m, (q) ? DEPT_EVT_RWLOCK_RW : DEPT_EVT_RWLOCK_W, ip, __func__, ne);\
+		dept_ecxt_enter(m, DEPT_EVT_RWLOCK_R, ip, __func__, e_fn, ne);\
+		dept_ask_event(m);					\
+	}								\
+} while (0)
+#define dept_rwlock_wunlock(m, ip)					\
+do {									\
+	dept_event(m, DEPT_EVT_RWLOCK_W, ip, __func__);			\
+	dept_ecxt_exit(m, ip);						\
+} while (0)
+#define dept_rwlock_runlock(m, ip)					\
+do {									\
+	dept_event(m, DEPT_EVT_RWLOCK_R, ip, __func__);			\
+	dept_ecxt_exit(m, ip);						\
+} while (0)
+#else
+#define dept_rwlock_wlock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_rwlock_rlock(m, ne, t, n, e_fn, ip, q)	do { } while (0)
+#define dept_rwlock_wunlock(m, ip)			do { } while (0)
+#define dept_rwlock_runlock(m, ip)			do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void do_raw_read_lock(rwlock_t *lock) __acquires(lock);
  extern int do_raw_read_trylock(rwlock_t *lock);
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index dceb0a5..a222cf1 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -228,7 +228,7 @@ static inline void __raw_write_unlock(rwlock_t *lock)
 
 static inline void __raw_read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	preempt_enable();
 }
@@ -236,7 +236,7 @@ static inline void __raw_read_unlock(rwlock_t *lock)
 static inline void
 __raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	local_irq_restore(flags);
 	preempt_enable();
@@ -244,7 +244,7 @@ static inline void __raw_read_unlock(rwlock_t *lock)
 
 static inline void __raw_read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	local_irq_enable();
 	preempt_enable();
@@ -252,7 +252,7 @@ static inline void __raw_read_unlock_irq(rwlock_t *lock)
 
 static inline void __raw_read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, _RET_IP_);
+	rwlock_release_read(&lock->dep_map, _RET_IP_);
 	do_raw_read_unlock(lock);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 1948442..6dddc5b 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -5,11 +5,18 @@
 # error "Do not include directly, include spinlock_types.h"
 #endif
 
+#ifdef CONFIG_DEPT
+# define RW_DMAP_INIT(lockname) .dmap = { .name = #lockname },
+#else
+# define RW_DMAP_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define RW_DEP_MAP_INIT(lockname)					\
 	.dep_map = {							\
 		.name = #lockname,					\
 		.wait_type_inner = LD_WAIT_CONFIG,			\
+		RW_DMAP_INIT(lockname)					\
 	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
-- 
1.9.1


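The rwlock annotations distinguish read and write events with a bitmask, so a recursive reader waits only on the write-unlock event while a non-recursive (queued) reader may also wait behind other readers. A small sketch of that mask selection, under the reading that 'q' in dept_rwlock_rlock() means the queued variant:

```python
# Event classes from the patch: read unlock and write unlock are
# distinct events, so a wait can name exactly what it waits for.
DEPT_EVT_RWLOCK_R  = 1 << 0
DEPT_EVT_RWLOCK_W  = 1 << 1
DEPT_EVT_RWLOCK_RW = DEPT_EVT_RWLOCK_R | DEPT_EVT_RWLOCK_W

def rlock_wait_mask(queued):
    # Recursive readers (queued == False, q == 0 in the patch) can
    # never be blocked by other readers, so they wait only on the
    # write-unlock event; queued readers may also wait behind readers,
    # hence the combined mask.
    return DEPT_EVT_RWLOCK_RW if queued else DEPT_EVT_RWLOCK_W
```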

* [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete()
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (5 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 19:46   ` kernel test robot
  2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies created through
wait_for_completion()/complete().

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/completion.h | 42 ++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/completion.c  | 12 ++++++++++--
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 51d9ab0..0beaa17 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -26,14 +26,48 @@
 struct completion {
 	unsigned int done;
 	struct swait_queue_head wait;
+	struct dept_map dmap;
 };
 
+#ifdef CONFIG_DEPT
+#define dept_wfc_init(m, k, s, n)		dept_map_init(m, k, s, n)
+#define dept_wfc_reinit(m)			dept_map_reinit(m)
+#define dept_wfc_wait(m, ip)						\
+do {									\
+	dept_ask_event(m);						\
+	dept_wait(m, 1UL, ip, __func__, 0);				\
+} while (0)
+#define dept_wfc_complete(m, ip)		dept_event(m, 1UL, ip, __func__)
+#define dept_wfc_enter(m, ip)			dept_ecxt_enter(m, 1UL, ip, "completion_context_enter", "complete", 0)
+#define dept_wfc_exit(m, ip)			dept_ecxt_exit(m, ip)
+#else
+#define dept_wfc_init(m, k, s, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_wfc_reinit(m)			do { } while (0)
+#define dept_wfc_wait(m, ip)			do { } while (0)
+#define dept_wfc_complete(m, ip)		do { } while (0)
+#define dept_wfc_enter(m, ip)			do { } while (0)
+#define dept_wfc_exit(m, ip)			do { } while (0)
+#endif
+
+#ifdef CONFIG_DEPT
+#define WFC_DEPT_MAP_INIT(work) .dmap = { .name = #work }
+#else
+#define WFC_DEPT_MAP_INIT(work)
+#endif
+
+#define init_completion(x)					\
+	do {							\
+		static struct dept_key __dkey;			\
+		__init_completion(x, &__dkey, #x);		\
+	} while (0)
+
 #define init_completion_map(x, m) init_completion(x)
 static inline void complete_acquire(struct completion *x) {}
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+	WFC_DEPT_MAP_INIT(work) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -81,9 +115,12 @@ static inline void complete_release(struct completion *x) {}
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x,
+				     struct dept_key *dkey,
+				     const char *name)
 {
 	x->done = 0;
+	dept_wfc_init(&x->dmap, dkey, 0, name);
 	init_swait_queue_head(&x->wait);
 }
 
@@ -97,6 +134,7 @@ static inline void init_completion(struct completion *x)
 static inline void reinit_completion(struct completion *x)
 {
 	x->done = 0;
+	dept_wfc_reinit(&x->dmap);
 }
 
 extern void wait_for_completion(struct completion *);
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index a778554..6e31cc0 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,6 +29,7 @@ void complete(struct completion *x)
 {
 	unsigned long flags;
 
+	dept_wfc_complete(&x->dmap, _RET_IP_);
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
@@ -58,6 +59,7 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
+	dept_wfc_complete(&x->dmap, _RET_IP_);
 	lockdep_assert_RT_in_threaded_ctx();
 
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
@@ -112,17 +114,23 @@ void complete_all(struct completion *x)
 }
 
 static long __sched
-wait_for_common(struct completion *x, long timeout, int state)
+_wait_for_common(struct completion *x, long timeout, int state)
 {
 	return __wait_for_common(x, schedule_timeout, timeout, state);
 }
 
 static long __sched
-wait_for_common_io(struct completion *x, long timeout, int state)
+_wait_for_common_io(struct completion *x, long timeout, int state)
 {
 	return __wait_for_common(x, io_schedule_timeout, timeout, state);
 }
 
+#define wait_for_common(x, t, s)					\
+({ dept_wfc_wait(&(x)->dmap, _RET_IP_); _wait_for_common(x, t, s); })
+
+#define wait_for_common_io(x, t, s)					\
+({ dept_wfc_wait(&(x)->dmap, _RET_IP_); _wait_for_common_io(x, t, s); })
+
 /**
  * wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 08/16] dept: Apply Dept to seqlock
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (6 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies on seqlocks by adding a wait
annotation on the read side of the seqlock.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/seqlock.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 37ded6b..6e8ecd7 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -23,6 +23,25 @@
 
 #include <asm/processor.h>
 
+#ifdef CONFIG_DEPT
+#define DEPT_EVT_ALL		((1UL << DEPT_MAX_SUBCLASSES_EVT) - 1)
+#define dept_seq_wait(m, ip)	dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0)
+#define dept_seq_writebegin(m, ip)				\
+do {								\
+	dept_ecxt_enter(m, 1UL, ip, __func__, "write_seqcount_end", 0);\
+	dept_ask_event(m);					\
+} while (0)
+#define dept_seq_writeend(m, ip)				\
+do {								\
+	dept_event(m, 1UL, ip, __func__);			\
+	dept_ecxt_exit(m, ip);					\
+} while (0)
+#else
+#define dept_seq_wait(m, ip)		do { } while (0)
+#define dept_seq_writebegin(m, ip)	do { } while (0)
+#define dept_seq_writeend(m, ip)	do { } while (0)
+#endif
+
 /*
  * The seqlock seqcount_t interface does not prescribe a precise sequence of
  * read begin/retry/end. For readers, typically there is a call to
@@ -148,7 +167,7 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
  * This lock-unlock technique must be implemented for all of PREEMPT_RT
  * sleeping locks.  See Documentation/locking/locktypes.rst
  */
-#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEPT) || defined(CONFIG_PREEMPT_RT)
 #define __SEQ_LOCK(expr)	expr
 #else
 #define __SEQ_LOCK(expr)
@@ -203,6 +222,22 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 	__SEQ_LOCK(locktype	*lock);					\
 } seqcount_##lockname##_t;						\
 									\
+static __always_inline void						\
+__seqprop_##lockname##_wait(const seqcount_##lockname##_t *s)		\
+{									\
+	__SEQ_LOCK(dept_seq_wait(&(lockmember)->dep_map.dmap, _RET_IP_));\
+}									\
+									\
+static __always_inline void						\
+__seqprop_##lockname##_writebegin(const seqcount_##lockname##_t *s)	\
+{									\
+}									\
+									\
+static __always_inline void						\
+__seqprop_##lockname##_writeend(const seqcount_##lockname##_t *s)	\
+{									\
+}									\
+									\
 static __always_inline seqcount_t *					\
 __seqprop_##lockname##_ptr(seqcount_##lockname##_t *s)			\
 {									\
@@ -271,6 +306,21 @@ static inline void __seqprop_assert(const seqcount_t *s)
 	lockdep_assert_preemption_disabled();
 }
 
+static inline void __seqprop_wait(seqcount_t *s)
+{
+	dept_seq_wait(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writebegin(seqcount_t *s)
+{
+	dept_seq_writebegin(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writeend(seqcount_t *s)
+{
+	dept_seq_writeend(&s->dep_map.dmap, _RET_IP_);
+}
+
 #define __SEQ_RT	IS_ENABLED(CONFIG_PREEMPT_RT)
 
 SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t,  false,    s->lock,        raw_spin, raw_spin_lock(s->lock))
@@ -311,6 +361,9 @@ static inline void __seqprop_assert(const seqcount_t *s)
 #define seqprop_sequence(s)		__seqprop(s, sequence)
 #define seqprop_preemptible(s)		__seqprop(s, preemptible)
 #define seqprop_assert(s)		__seqprop(s, assert)
+#define seqprop_dept_wait(s)		__seqprop(s, wait)
+#define seqprop_dept_writebegin(s)	__seqprop(s, writebegin)
+#define seqprop_dept_writeend(s)	__seqprop(s, writeend)
 
 /**
  * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -360,6 +413,7 @@ static inline void __seqprop_assert(const seqcount_t *s)
 #define read_seqcount_begin(s)						\
 ({									\
 	seqcount_lockdep_reader_access(seqprop_ptr(s));			\
+	seqprop_dept_wait(s);						\
 	raw_read_seqcount_begin(s);					\
 })
 
@@ -512,6 +566,7 @@ static inline void do_raw_write_seqcount_end(seqcount_t *s)
 		preempt_disable();					\
 									\
 	do_write_seqcount_begin_nested(seqprop_ptr(s), subclass);	\
+	seqprop_dept_writebegin(s);					\
 } while (0)
 
 static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
@@ -538,6 +593,7 @@ static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 		preempt_disable();					\
 									\
 	do_write_seqcount_begin(seqprop_ptr(s));			\
+	seqprop_dept_writebegin(s);					\
 } while (0)
 
 static inline void do_write_seqcount_begin(seqcount_t *s)
@@ -554,6 +610,7 @@ static inline void do_write_seqcount_begin(seqcount_t *s)
  */
 #define write_seqcount_end(s)						\
 do {									\
+	seqprop_dept_writeend(s);					\
 	do_write_seqcount_end(seqprop_ptr(s));				\
 									\
 	if (seqprop_preemptible(s))					\
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 09/16] dept: Apply Dept to rwsem
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (7 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies on rwsems.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/lockdep.h      | 24 ++++++++++++++++++++----
 include/linux/percpu-rwsem.h | 10 +++++++++-
 include/linux/rwsem.h        | 31 +++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 306c22d..6aab26c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -644,10 +644,26 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 	dept_mutex_unlock(&(l)->dmap, i);				\
 } while (0)
 
-#define rwsem_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwsem_acquire_nest(l, s, t, n, i)	lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i)		lock_acquire_shared(l, s, t, NULL, i)
-#define rwsem_release(l, i)			lock_release(l, i)
+#define rwsem_acquire(l, s, t, i)					\
+do {									\
+	lock_acquire_exclusive(l, s, t, NULL, i);			\
+	dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_write", i);		\
+} while (0)
+#define rwsem_acquire_nest(l, s, t, n, i)				\
+do {									\
+	lock_acquire_exclusive(l, s, t, n, i);				\
+	dept_rwsem_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "up_write", i);\
+} while (0)
+#define rwsem_acquire_read(l, s, t, i)					\
+do {									\
+	lock_acquire_shared(l, s, t, NULL, i);				\
+	dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_read", i);		\
+} while (0)
+#define rwsem_release(l, i)						\
+do {									\
+	lock_release(l, i);						\
+	dept_rwsem_unlock(&(l)->dmap, i);				\
+} while (0)
 
 #define lock_map_acquire(l)			lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_read(l)		lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5fda40f..7ec5625 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -20,8 +20,16 @@ struct percpu_rw_semaphore {
 #endif
 };
 
+#ifdef CONFIG_DEPT
+#define __PERCPU_RWSEM_DMAP_INIT(lockname) .dmap = { .name = #lockname }
+#else
+#define __PERCPU_RWSEM_DMAP_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname },
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)	.dep_map = {	\
+	.name = #lockname,					\
+	__PERCPU_RWSEM_DMAP_INIT(lockname) },
 #else
 #define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
 #endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index f934876..1011eca 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -16,11 +16,18 @@
 #include <linux/atomic.h>
 #include <linux/err.h>
 
+#ifdef CONFIG_DEPT
+# define RWSEM_DMAP_INIT(lockname)	.dmap = { .name = #lockname },
+#else
+# define RWSEM_DMAP_INIT(lockname)
+#endif
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __RWSEM_DEP_MAP_INIT(lockname)			\
 	.dep_map = {					\
 		.name = #lockname,			\
 		.wait_type_inner = LD_WAIT_SLEEP,	\
+		RWSEM_DMAP_INIT(lockname)		\
 	},
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
@@ -32,6 +39,30 @@
 #include <linux/osq_lock.h>
 #endif
 
+#ifdef CONFIG_DEPT
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip)				\
+do {									\
+	if (t) {							\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	} else if (n) {							\
+		dept_warn_on(dept_top_map() != (n));			\
+	} else {							\
+		dept_wait(m, 1UL, ip, __func__, ne);			\
+		dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne);	\
+		dept_ask_event(m);					\
+	}								\
+} while (0)
+#define dept_rwsem_unlock(m, ip)					\
+do {									\
+	dept_event(m, 1UL, ip, __func__);				\
+	dept_ecxt_exit(m, ip);						\
+} while (0)
+#else
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip)	do { } while (0)
+#define dept_rwsem_unlock(m, ip)		do { } while (0)
+#endif
+
 /*
  * For an uncontended rwsem, count and owner are the only fields a task
  * needs to touch when acquiring the rwsem. So they are put next to each
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (8 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 15:55   ` Steven Rostedt
  2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

It'd be useful to show Dept's internal stats and the dependency graph
at runtime via procfs for better observability. Introduce the knobs to
do so.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/dependency/Makefile        |  1 +
 kernel/dependency/dept.c          | 24 ++++------
 kernel/dependency/dept_internal.h | 26 +++++++++++
 kernel/dependency/dept_proc.c     | 93 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 129 insertions(+), 15 deletions(-)
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_proc.c

diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index 9f7778e..49152ad 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_DEPT) += dept.o
+obj-$(CONFIG_DEPT) += dept_proc.o
 
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 4a3ab39..b8d56075 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -90,6 +90,7 @@
 #include <linux/hash.h>
 #include <linux/dept.h>
 #include <linux/utsname.h>
+#include "dept_internal.h"
 
 static int dept_stop;
 static int dept_per_cpu_ready;
@@ -250,20 +251,13 @@ static inline struct dept_task *dept_task(void)
  *       have been freed will be placed.
  */
 
-enum object_t {
-#define OBJECT(id, nr) OBJECT_##id,
-	#include "dept_object.h"
-#undef  OBJECT
-	OBJECT_NR,
-};
-
 #define OBJECT(id, nr)							\
 static struct dept_##id spool_##id[nr];					\
 static DEFINE_PER_CPU(struct llist_head, lpool_##id);
 	#include "dept_object.h"
 #undef  OBJECT
 
-static struct dept_pool pool[OBJECT_NR] = {
+struct dept_pool dept_pool[OBJECT_NR] = {
 #define OBJECT(id, nr) {						\
 	.name = #id,							\
 	.obj_sz = sizeof(struct dept_##id),				\
@@ -292,7 +286,7 @@ static void *from_pool(enum object_t t)
 	if (DEPT_WARN_ON(!irqs_disabled()))
 		return NULL;
 
-	p = &pool[t];
+	p = &dept_pool[t];
 
 	/*
 	 * Try local pool first.
@@ -321,7 +315,7 @@ static void *from_pool(enum object_t t)
 
 static void to_pool(void *o, enum object_t t)
 {
-	struct dept_pool *p = &pool[t];
+	struct dept_pool *p = &dept_pool[t];
 	struct llist_head *h;
 
 	preempt_disable();
@@ -1996,7 +1990,7 @@ void dept_map_nocheck(struct dept_map *m)
 }
 EXPORT_SYMBOL_GPL(dept_map_nocheck);
 
-static LIST_HEAD(classes);
+LIST_HEAD(dept_classes);
 
 static inline bool within(const void *addr, void *start, unsigned long size)
 {
@@ -2023,7 +2017,7 @@ void dept_free_range(void *start, unsigned int sz)
 	while (unlikely(!dept_lock()))
 		cpu_relax();
 
-	list_for_each_entry_safe(c, n, &classes, all_node) {
+	list_for_each_entry_safe(c, n, &dept_classes, all_node) {
 		if (!within((void *)c->key, start, sz) &&
 		    !within(c->name, start, sz))
 			continue;
@@ -2092,7 +2086,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
 	c->sub = sub;
 	c->key = (unsigned long)(k->subkeys + sub);
 	hash_add_class(c);
-	list_add(&c->all_node, &classes);
+	list_add(&c->all_node, &dept_classes);
 unlock:
 	dept_unlock();
 caching:
@@ -2537,8 +2531,8 @@ static void migrate_per_cpu_pool(void)
 		struct llist_head *from;
 		struct llist_head *to;
 
-		from = &pool[i].boot_pool;
-		to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+		from = &dept_pool[i].boot_pool;
+		to = per_cpu_ptr(dept_pool[i].lpool, boot_cpu);
 		move_llist(to, from);
 	}
 }
diff --git a/kernel/dependency/dept_internal.h b/kernel/dependency/dept_internal.h
new file mode 100644
index 0000000..007c1ee
--- /dev/null
+++ b/kernel/dependency/dept_internal.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept(DEPendency Tracker) - runtime dependency tracker internal header
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __DEPT_INTERNAL_H
+#define __DEPT_INTERNAL_H
+
+#ifdef CONFIG_DEPT
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+	#include "dept_object.h"
+#undef  OBJECT
+	OBJECT_NR,
+};
+
+extern struct list_head dept_classes;
+extern struct dept_pool dept_pool[];
+
+#endif
+#endif /* __DEPT_INTERNAL_H */
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
new file mode 100644
index 0000000..465a91c
--- /dev/null
+++ b/kernel/dependency/dept_proc.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Procfs knobs for Dept(DEPendency Tracker)
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (C) 2021 LG Electronics, Inc. , Byungchul Park
+ */
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/dept.h>
+#include "dept_internal.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	return seq_list_next(v, &dept_classes, pos);
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	return seq_list_start_head(&dept_classes, *pos);
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+	struct dept_class *fc = list_entry(v, struct dept_class, all_node);
+	struct dept_dep *d;
+
+	if (v == &dept_classes) {
+		seq_puts(m, "All classes:\n\n");
+		return 0;
+	}
+
+	seq_printf(m, "[%p] %s\n", (void *)fc->key, fc->name);
+
+	/*
+	 * XXX: Serialize list traversal if needed. The following might
+	 * give wrong information under contention.
+	 */
+	list_for_each_entry(d, &fc->dep_head, dep_node) {
+		struct dept_class *tc = d->wait->class;
+
+		seq_printf(m, " -> [%p] %s\n", (void *)tc->key, tc->name);
+	}
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
+static const struct seq_operations dept_deps_ops = {
+	.start	= l_start,
+	.next	= l_next,
+	.stop	= l_stop,
+	.show	= l_show,
+};
+
+static int dept_stats_show(struct seq_file *m, void *v)
+{
+	int r;
+
+	seq_puts(m, "Availability in the static pools:\n\n");
+#define OBJECT(id, nr)							\
+	r = atomic_read(&dept_pool[OBJECT_##id].obj_nr);		\
+	if (r < 0)							\
+		r = 0;							\
+	seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+	#include "dept_object.h"
+#undef  OBJECT
+
+	return 0;
+}
+
+static int __init dept_proc_init(void)
+{
+	proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
+	proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
+	return 0;
+}
+
+__initcall(dept_proc_init);
+
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 11/16] dept: Introduce split map concept and new APIs for them
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (9 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

There are cases where the total size of all maps used for one type of
wait/event is very large. For instance, struct page can be such a type
for (un)lock_page(). If each struct page kept its own map all the way,
the additional memory would be 'the # of pages * sizeof(struct
dept_map)', which might be too big for some systems to accept.

It'd be better to have a split map, where one part is per instance and
the other is per type and commonly shared, along with new APIs using
them. So introduce the split map and new APIs for it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h     |  36 ++++++++++++++
 kernel/dependency/dept.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 2ac4bca..531065a 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -351,6 +351,30 @@ struct dept_map {
 	bool nocheck;
 };
 
+struct dept_map_each {
+	/*
+	 * wait timestamp associated to this map
+	 */
+	unsigned int wgen;
+};
+
+struct dept_map_common {
+	const char *name;
+	struct dept_key *keys;
+	int sub_usr;
+
+	/*
+	 * It's a local copy for fast access to the associated classes. It's
+	 * also used as the dept_key instance for statically defined maps.
+	 */
+	struct dept_key keys_local;
+
+	/*
+	 * whether this map should be going to be checked or not
+	 */
+	bool nocheck;
+};
+
 struct dept_task {
 	/*
 	 * all event contexts that have entered and before exiting
@@ -441,6 +465,11 @@ struct dept_task {
 extern void dept_ecxt_exit(struct dept_map *m, unsigned long ip);
 extern struct dept_map *dept_top_map(void);
 extern void dept_warn_on(bool cond);
+extern void dept_split_map_each_init(struct dept_map_each *me);
+extern void dept_split_map_common_init(struct dept_map_common *mc, struct dept_key *k, const char *n);
+extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
+extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
+extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
 
 /*
  * for users who want to manage external keys
@@ -450,6 +479,8 @@ struct dept_task {
 #else /* !CONFIG_DEPT */
 struct dept_key  { };
 struct dept_map  { };
+struct dept_map_each    { };
+struct dept_map_common  { };
 struct dept_task { };
 
 #define DEPT_TASK_INITIALIZER(t)
@@ -474,6 +505,11 @@ struct dept_task {
 #define dept_ecxt_exit(m, ip)				do { } while (0)
 #define dept_top_map()					NULL
 #define dept_warn_on(c)					do { } while (0)
+#define dept_split_map_each_init(me)			do { } while (0)
+#define dept_split_map_common_init(mc, k, n)		do { (void)(n); (void)(k); } while (0)
+#define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
+#define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
+#define dept_ask_event_split_map(me, mc)		do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
 #endif
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index b8d56075..4510dbb 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2339,6 +2339,128 @@ void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip,
 }
 EXPORT_SYMBOL_GPL(dept_event);
 
+void dept_split_map_each_init(struct dept_map_each *me)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	me->wgen = 0U;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_split_map_each_init);
+
+void dept_split_map_common_init(struct dept_map_common *mc,
+				struct dept_key *k, const char *n)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	flags = dept_enter();
+
+	if (mc->keys != k)
+		mc->keys = k;
+	clean_classes_cache(&mc->keys_local);
+
+	/*
+	 * sub_usr is not used with split map.
+	 */
+	mc->sub_usr = 0;
+	mc->name = n;
+	mc->nocheck = false;
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_split_map_common_init);
+
+void dept_wait_split_map(struct dept_map_each *me,
+			 struct dept_map_common *mc,
+			 unsigned long ip, const char *w_fn, int ne)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c;
+	struct dept_key *k;
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (mc->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	k = mc->keys ?: &mc->keys_local;
+	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+	if (c)
+		add_wait(c, ip, w_fn, ne);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait_split_map);
+
+void dept_ask_event_split_map(struct dept_map_each *me,
+			      struct dept_map_common *mc)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+	unsigned int wg;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (mc->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	/*
+	 * Avoid zero wgen.
+	 */
+	wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+	WRITE_ONCE(me->wgen, wg);
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ask_event_split_map);
+
+void dept_event_split_map(struct dept_map_each *me,
+			  struct dept_map_common *mc,
+			  unsigned long ip, const char *e_fn)
+{
+	struct dept_task *dt = dept_task();
+	struct dept_class *c;
+	struct dept_key *k;
+	unsigned long flags;
+
+	if (READ_ONCE(dept_stop) || dt->recursive)
+		return;
+
+	if (mc->nocheck)
+		return;
+
+	flags = dept_enter();
+
+	k = mc->keys ?: &mc->keys_local;
+	c = check_new_class(&mc->keys_local, k, 0, mc->name);
+	if (c) {
+		add_ecxt((void *)me, c, 0UL, NULL, e_fn, 0);
+		do_event((void *)me, c, READ_ONCE(me->wgen), ip);
+		pop_ecxt((void *)me);
+	}
+
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event_split_map);
+
 void dept_ecxt_exit(struct dept_map *m, unsigned long ip)
 {
 	struct dept_task *dt = dept_task();
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback}
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (10 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Make Dept able to track dependencies on PG_{locked,writeback}. For
instance, (un)lock_page() generates that type of dependency.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept_page.h       | 78 +++++++++++++++++++++++++++++++++++++++++
 include/linux/page-flags.h      | 45 ++++++++++++++++++++++--
 include/linux/pagemap.h         |  7 +++-
 init/main.c                     |  2 ++
 kernel/dependency/dept_object.h |  2 +-
 lib/Kconfig.debug               |  1 +
 mm/filemap.c                    | 68 +++++++++++++++++++++++++++++++++++
 mm/page_ext.c                   |  5 +++
 8 files changed, 204 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/dept_page.h

diff --git a/include/linux/dept_page.h b/include/linux/dept_page.h
new file mode 100644
index 0000000..d2d093d
--- /dev/null
+++ b/include/linux/dept_page.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_DEPT_PAGE_H
+#define __LINUX_DEPT_PAGE_H
+
+#ifdef CONFIG_DEPT
+#include <linux/dept.h>
+
+extern struct page_ext_operations dept_pglocked_ops;
+extern struct page_ext_operations dept_pgwriteback_ops;
+extern struct dept_map_common pglocked_mc;
+extern struct dept_map_common pgwriteback_mc;
+
+extern void dept_page_init(void);
+extern struct dept_map_each *get_pglocked_me(struct page *page);
+extern struct dept_map_each *get_pgwriteback_me(struct page *page);
+
+#define dept_pglocked_wait(f)					\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_wait_split_map(me, &pglocked_mc, _RET_IP_, \
+				    __func__, 0);		\
+} while (0)
+
+#define dept_pglocked_set_bit(f)				\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_ask_event_split_map(me, &pglocked_mc);	\
+} while (0)
+
+#define dept_pglocked_event(f)					\
+do {								\
+	struct dept_map_each *me = get_pglocked_me(&(f)->page);	\
+								\
+	if (likely(me))						\
+		dept_event_split_map(me, &pglocked_mc, _RET_IP_,\
+				     __func__);			\
+} while (0)
+
+#define dept_pgwriteback_wait(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_wait_split_map(me, &pgwriteback_mc, _RET_IP_,\
+				    __func__, 0);		\
+} while (0)
+
+#define dept_pgwriteback_set_bit(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_ask_event_split_map(me, &pgwriteback_mc);\
+} while (0)
+
+#define dept_pgwriteback_event(f)				\
+do {								\
+	struct dept_map_each *me = get_pgwriteback_me(&(f)->page);\
+								\
+	if (likely(me))						\
+		dept_event_split_map(me, &pgwriteback_mc, _RET_IP_,\
+				     __func__);			\
+} while (0)
+#else
+#define dept_page_init()		do { } while (0)
+#define dept_pglocked_wait(f)		do { } while (0)
+#define dept_pglocked_set_bit(f)	do { } while (0)
+#define dept_pglocked_event(f)		do { } while (0)
+#define dept_pgwriteback_wait(f)	do { } while (0)
+#define dept_pgwriteback_set_bit(f)	do { } while (0)
+#define dept_pgwriteback_event(f)	do { } while (0)
+#endif
+
+#endif /* __LINUX_DEPT_PAGE_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1c3b6e5..066b6a5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -411,7 +411,6 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
 #define TESTSCFLAG_FALSE(uname, lname)					\
 	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
@@ -459,7 +458,6 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
  * risky: they bypass page accounting.
  */
 TESTPAGEFLAG(Writeback, writeback, PF_NO_TAIL)
-	TESTSCFLAG(Writeback, writeback, PF_NO_TAIL)
 PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
@@ -542,6 +540,49 @@ static __always_inline bool PageSwapCache(struct page *page)
 PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
 #endif
 
+#ifdef CONFIG_DEPT
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+__CLEARPAGEFLAG(Locked, locked, PF_NO_TAIL)
+TESTCLEARFLAG(Writeback, writeback, PF_NO_TAIL)
+
+#include <linux/dept_page.h>
+
+static __always_inline
+void __folio_set_locked(struct folio *folio)
+{
+	dept_pglocked_set_bit(folio);
+	__set_bit(PG_locked, folio_flags(folio, FOLIO_PF_NO_TAIL));
+}
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+	dept_pglocked_set_bit(page_folio(page));
+	__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+}
+
+static __always_inline
+bool folio_test_set_writeback(struct folio *folio)
+{
+	bool ret = test_and_set_bit(PG_writeback, folio_flags(folio, FOLIO_PF_NO_TAIL));
+
+	if (!ret)
+		dept_pgwriteback_set_bit(folio);
+	return ret;
+}
+
+static __always_inline int TestSetPageWriteback(struct page *page)
+{
+	int ret = test_and_set_bit(PG_writeback, &PF_NO_TAIL(page, 1)->flags);
+
+	if (!ret)
+		dept_pgwriteback_set_bit(page_folio(page));
+	return ret;
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+TESTSCFLAG(Writeback, writeback, PF_NO_TAIL)
+#endif
+
 /*
  * PageReported() is used to track reported free pages within the Buddy
  * allocator. We can use the non-atomic version of the test and set
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 270bf51..9ff11a1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -15,6 +15,7 @@
 #include <linux/bitops.h>
 #include <linux/hardirq.h> /* for in_interrupt() */
 #include <linux/hugetlb_inline.h>
+#include <linux/dept_page.h>
 
 struct folio_batch;
 
@@ -761,7 +762,11 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
 
 static inline bool folio_trylock(struct folio *folio)
 {
-	return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
+	int ret = test_and_set_bit_lock(PG_locked, folio_flags(folio, 0));
+
+	if (likely(!ret))
+		dept_pglocked_set_bit(folio);
+	return likely(!ret);
 }
 
 /*
diff --git a/init/main.c b/init/main.c
index ca96e11..4818c75 100644
--- a/init/main.c
+++ b/init/main.c
@@ -100,6 +100,7 @@
 #include <linux/kcsan.h>
 #include <linux/init_syscalls.h>
 #include <linux/stackdepot.h>
+#include <linux/pagemap.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -1072,6 +1073,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 
 	lockdep_init();
 	dept_init();
+	dept_page_init();
 
 	/*
 	 * Need to run this when irqs are enabled, because it wants
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
index ad5ff57..f3f1cfe 100644
--- a/kernel/dependency/dept_object.h
+++ b/kernel/dependency/dept_object.h
@@ -6,7 +6,7 @@
  * nr: # of the object that should be kept in the pool.
  */
 
-OBJECT(dep, 1024 * 8)
+OBJECT(dep, 1024 * 16)
 OBJECT(class, 1024 * 4)
 OBJECT(stack, 1024 * 32)
 OBJECT(ecxt, 1024 * 4)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 309b275..c7c2510 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1242,6 +1242,7 @@ config DEPT
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
+	select PAGE_EXTENSION
 	select TRACE_IRQFLAGS
 	select STACKTRACE
 	select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
diff --git a/mm/filemap.c b/mm/filemap.c
index ad8c39d..4f004c4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1148,6 +1148,11 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	unsigned long flags;
 	wait_queue_entry_t bookmark;
 
+	if (bit_nr == PG_locked)
+		dept_pglocked_event(folio);
+	else if (bit_nr == PG_writeback)
+		dept_pgwriteback_event(folio);
+
 	key.folio = folio;
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
@@ -1227,6 +1232,10 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
 	if (wait->flags & WQ_FLAG_EXCLUSIVE) {
 		if (test_and_set_bit(bit_nr, &folio->flags))
 			return false;
+		else if (bit_nr == PG_locked)
+			dept_pglocked_set_bit(folio);
+		else if (bit_nr == PG_writeback)
+			dept_pgwriteback_set_bit(folio);
 	} else if (test_bit(bit_nr, &folio->flags))
 		return false;
 
@@ -1248,6 +1257,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	bool delayacct = false;
 	unsigned long pflags;
 
+	if (bit_nr == PG_locked)
+		dept_pglocked_wait(folio);
+	else if (bit_nr == PG_writeback)
+		dept_pgwriteback_wait(folio);
+
 	if (bit_nr == PG_locked &&
 	    !folio_test_uptodate(folio) && folio_test_workingset(folio)) {
 		if (!folio_test_swapbacked(folio)) {
@@ -1340,6 +1354,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio, 0))))
 			goto repeat;
 
+		if (bit_nr == PG_locked)
+			dept_pglocked_set_bit(folio);
+		else if (bit_nr == PG_writeback)
+			dept_pgwriteback_set_bit(folio);
+
 		wait->flags |= WQ_FLAG_DONE;
 		break;
 	}
@@ -3960,3 +3979,52 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 	return try_to_free_buffers(&folio->page);
 }
 EXPORT_SYMBOL(filemap_release_folio);
+
+#ifdef CONFIG_DEPT
+static bool need_dept_pglocked(void)
+{
+	return true;
+}
+
+struct page_ext_operations dept_pglocked_ops = {
+	.size = sizeof(struct dept_map_each),
+	.need = need_dept_pglocked,
+};
+
+struct dept_map_each *get_pglocked_me(struct page *p)
+{
+	struct page_ext *e = lookup_page_ext(p);
+
+	return e ? (void *)e + dept_pglocked_ops.offset : NULL;
+}
+EXPORT_SYMBOL(get_pglocked_me);
+
+static bool need_dept_pgwriteback(void)
+{
+	return true;
+}
+
+struct page_ext_operations dept_pgwriteback_ops = {
+	.size = sizeof(struct dept_map_each),
+	.need = need_dept_pgwriteback,
+};
+
+struct dept_map_each *get_pgwriteback_me(struct page *p)
+{
+	struct page_ext *e = lookup_page_ext(p);
+
+	return e ? (void *)e + dept_pgwriteback_ops.offset : NULL;
+}
+EXPORT_SYMBOL(get_pgwriteback_me);
+
+struct dept_map_common pglocked_mc;
+EXPORT_SYMBOL(pglocked_mc);
+struct dept_map_common pgwriteback_mc;
+EXPORT_SYMBOL(pgwriteback_mc);
+
+void dept_page_init(void)
+{
+	dept_split_map_common_init(&pglocked_mc, NULL, "pglocked");
+	dept_split_map_common_init(&pgwriteback_mc, NULL, "pgwriteback");
+}
+#endif
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 2e66d93..b7f5b0d 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -9,6 +9,7 @@
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
 #include <linux/page_table_check.h>
+#include <linux/dept_page.h>
 
 /*
  * struct page extension
@@ -79,6 +80,10 @@ static bool need_page_idle(void)
 #ifdef CONFIG_PAGE_TABLE_CHECK
 	&page_table_check_ops,
 #endif
+#ifdef CONFIG_DEPT
+	&dept_pglocked_ops,
+	&dept_pgwriteback_ops,
+#endif
 };
 
 unsigned long page_ext_size = sizeof(struct page_ext);
-- 
1.9.1



* [PATCH 13/16] dept: Apply SDT to swait
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (11 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds

Make SDT able to track dependencies incurred by swait.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/swait.h |  4 ++++
 kernel/sched/swait.c  | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b..dbdf2ce 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,7 @@
 #include <linux/stddef.h>
 #include <linux/spinlock.h>
 #include <linux/wait.h>
+#include <linux/dept_sdt.h>
 #include <asm/current.h>
 
 /*
@@ -43,6 +44,7 @@
 struct swait_queue_head {
 	raw_spinlock_t		lock;
 	struct list_head	task_list;
+	struct dept_map		dmap;
 };
 
 struct swait_queue {
@@ -61,6 +63,7 @@ struct swait_queue {
 #define __SWAIT_QUEUE_HEAD_INITIALIZER(name) {				\
 	.lock		= __RAW_SPIN_LOCK_UNLOCKED(name.lock),		\
 	.task_list	= LIST_HEAD_INIT((name).task_list),		\
+	.dmap		= DEPT_SDT_MAP_INIT(name),			\
 }
 
 #define DECLARE_SWAIT_QUEUE_HEAD(name)					\
@@ -72,6 +75,7 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
 #define init_swait_queue_head(q)				\
 	do {							\
 		static struct lock_class_key __key;		\
+		sdt_map_init(&(q)->dmap);			\
 		__init_swait_queue_head((q), #q, &__key);	\
 	} while (0)
 
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index e1c655f..4ca7d6e 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -27,6 +27,7 @@ void swake_up_locked(struct swait_queue_head *q)
 		return;
 
 	curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
+	sdt_event(&q->dmap);
 	wake_up_process(curr->task);
 	list_del_init(&curr->task_list);
 }
@@ -69,6 +70,7 @@ void swake_up_all(struct swait_queue_head *q)
 	while (!list_empty(&tmp)) {
 		curr = list_first_entry(&tmp, typeof(*curr), task_list);
 
+		sdt_event(&q->dmap);
 		wake_up_state(curr->task, TASK_NORMAL);
 		list_del_init(&curr->task_list);
 
@@ -97,6 +99,9 @@ void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *
 	__prepare_to_swait(q, wait);
 	set_current_state(state);
 	raw_spin_unlock_irqrestore(&q->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&q->dmap);
 }
 EXPORT_SYMBOL(prepare_to_swait_exclusive);
 
@@ -119,12 +124,16 @@ long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait
 	}
 	raw_spin_unlock_irqrestore(&q->lock, flags);
 
+	if (!ret && state & TASK_NORMAL)
+		sdt_wait_prepare(&q->dmap);
+
 	return ret;
 }
 EXPORT_SYMBOL(prepare_to_swait_event);
 
 void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 	if (!list_empty(&wait->task_list))
 		list_del_init(&wait->task_list);
@@ -134,6 +143,7 @@ void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
 	unsigned long flags;
 
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 
 	if (!list_empty_careful(&wait->task_list)) {
-- 
1.9.1



* [PATCH 14/16] dept: Apply SDT to wait(waitqueue)
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (12 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread Byungchul Park
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds

Make SDT able to track dependencies incurred by wait(waitqueue).

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/wait.h |  6 +++++-
 kernel/sched/wait.c  | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07d..2133998 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -7,6 +7,7 @@
 #include <linux/list.h>
 #include <linux/stddef.h>
 #include <linux/spinlock.h>
+#include <linux/dept_sdt.h>
 
 #include <asm/current.h>
 #include <uapi/linux/wait.h>
@@ -37,6 +38,7 @@ struct wait_queue_entry {
 struct wait_queue_head {
 	spinlock_t		lock;
 	struct list_head	head;
+	struct dept_map		dmap;
 };
 typedef struct wait_queue_head wait_queue_head_t;
 
@@ -56,7 +58,8 @@ struct wait_queue_head {
 
 #define __WAIT_QUEUE_HEAD_INITIALIZER(name) {					\
 	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),			\
-	.head		= LIST_HEAD_INIT(name.head) }
+	.head		= LIST_HEAD_INIT(name.head),				\
+	.dmap		= DEPT_SDT_MAP_INIT(name) }
 
 #define DECLARE_WAIT_QUEUE_HEAD(name) \
 	struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
@@ -67,6 +70,7 @@ struct wait_queue_head {
 	do {									\
 		static struct lock_class_key __key;				\
 										\
+		sdt_map_init(&(wq_head)->dmap);					\
 		__init_waitqueue_head((wq_head), #wq_head, &__key);		\
 	} while (0)
 
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index eca3810..fc5a16a 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -105,6 +105,7 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 		if (flags & WQ_FLAG_BOOKMARK)
 			continue;
 
+		sdt_event(&wq_head->dmap);
 		ret = curr->func(curr, mode, wake_flags, key);
 		if (ret < 0)
 			break;
@@ -268,6 +269,9 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
 		__add_wait_queue(wq_head, wq_entry);
 	set_current_state(state);
 	spin_unlock_irqrestore(&wq_head->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
 }
 EXPORT_SYMBOL(prepare_to_wait);
 
@@ -286,6 +290,10 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
 	}
 	set_current_state(state);
 	spin_unlock_irqrestore(&wq_head->lock, flags);
+
+	if (state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
+
 	return was_empty;
 }
 EXPORT_SYMBOL(prepare_to_wait_exclusive);
@@ -331,6 +339,9 @@ long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_en
 	}
 	spin_unlock_irqrestore(&wq_head->lock, flags);
 
+	if (!ret && state & TASK_NORMAL)
+		sdt_wait_prepare(&wq_head->dmap);
+
 	return ret;
 }
 EXPORT_SYMBOL(prepare_to_wait_event);
@@ -352,7 +363,9 @@ int do_wait_intr(wait_queue_head_t *wq, wait_queue_entry_t *wait)
 		return -ERESTARTSYS;
 
 	spin_unlock(&wq->lock);
+	sdt_wait_prepare(&wq->dmap);
 	schedule();
+	sdt_wait_finish();
 	spin_lock(&wq->lock);
 
 	return 0;
@@ -369,7 +382,9 @@ int do_wait_intr_irq(wait_queue_head_t *wq, wait_queue_entry_t *wait)
 		return -ERESTARTSYS;
 
 	spin_unlock_irq(&wq->lock);
+	sdt_wait_prepare(&wq->dmap);
 	schedule();
+	sdt_wait_finish();
 	spin_lock_irq(&wq->lock);
 
 	return 0;
@@ -389,6 +404,7 @@ void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_en
 {
 	unsigned long flags;
 
+	sdt_wait_finish();
 	__set_current_state(TASK_RUNNING);
 	/*
 	 * We can check for list emptiness outside the lock
-- 
1.9.1



* [PATCH 15/16] locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (13 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds

Commit cb92173d1f0 ("locking/lockdep, cpu/hotplug: Annotate AP thread") was
introduced to make lockdep_assert_cpus_held() work in the AP thread.

However, the annotation is stronger than necessary for that purpose; a
try-lock annotation is sufficient.

Furthermore, now that Dept has been introduced, the stronger annotation
causes false positive reports. Replace it with a try-lock annotation.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 407a256..1f92a42 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -355,7 +355,7 @@ int lockdep_is_cpus_held(void)
 
 static void lockdep_acquire_cpus_lock(void)
 {
-	rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
+	rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 1, _THIS_IP_);
 }
 
 static void lockdep_release_cpus_lock(void)
-- 
1.9.1



* [PATCH 16/16] dept: Distinguish each syscall context from another
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (14 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread Byungchul Park
@ 2022-02-17 10:57 ` Byungchul Park
  2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
  2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
  17 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 10:57 UTC (permalink / raw)
  To: torvalds

The kernel is entered on each syscall, and from Dept's point of view each
syscall handling should be considered independent of the others. Otherwise,
Dept may wrongly track dependencies across different syscalls.

Such a cross-syscall dependency might be real from user mode's perspective.
However, since Dept has only just started working, conservatively keep it
from tracking dependencies across different syscalls.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 include/linux/dept.h     | 39 ++++++++++++++++------------
 kernel/dependency/dept.c | 67 ++++++++++++++++++++++++------------------------
 kernel/entry/common.c    |  3 +++
 3 files changed, 60 insertions(+), 49 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 531065a..acf4db0 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -25,11 +25,16 @@
 #define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
 #define DEPT_MAX_SUBCLASSES_CACHE	2
 
-#define DEPT_SIRQ			0
-#define DEPT_HIRQ			1
-#define DEPT_IRQS_NR			2
-#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
-#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
+enum {
+	DEPT_CXT_SIRQ = 0,
+	DEPT_CXT_HIRQ,
+	DEPT_CXT_IRQS_NR,
+	DEPT_CXT_PROCESS = DEPT_CXT_IRQS_NR,
+	DEPT_CXTS_NR
+};
+
+#define DEPT_SIRQF			(1UL << DEPT_CXT_SIRQ)
+#define DEPT_HIRQF			(1UL << DEPT_CXT_HIRQ)
 
 struct dept_ecxt;
 struct dept_iecxt {
@@ -89,8 +94,8 @@ struct dept_class {
 	/*
 	 * for tracking IRQ dependencies
 	 */
-	struct dept_iecxt iecxt[DEPT_IRQS_NR];
-	struct dept_iwait iwait[DEPT_IRQS_NR];
+	struct dept_iecxt iecxt[DEPT_CXT_IRQS_NR];
+	struct dept_iwait iwait[DEPT_CXT_IRQS_NR];
 };
 
 struct dept_stack {
@@ -144,8 +149,8 @@ struct dept_ecxt {
 	/*
 	 * where the IRQ-enabled happened
 	 */
-	unsigned long enirq_ip[DEPT_IRQS_NR];
-	struct dept_stack *enirq_stack[DEPT_IRQS_NR];
+	unsigned long enirq_ip[DEPT_CXT_IRQS_NR];
+	struct dept_stack *enirq_stack[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * where the event context started
@@ -188,8 +193,8 @@ struct dept_wait {
 	/*
 	 * where the IRQ wait happened
 	 */
-	unsigned long irq_ip[DEPT_IRQS_NR];
-	struct dept_stack *irq_stack[DEPT_IRQS_NR];
+	unsigned long irq_ip[DEPT_CXT_IRQS_NR];
+	struct dept_stack *irq_stack[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * where the wait happened
@@ -389,19 +394,19 @@ struct dept_task {
 	int wait_hist_pos;
 
 	/*
-	 * sequential id to identify each IRQ context
+	 * sequential id to identify each context
 	 */
-	unsigned int irq_id[DEPT_IRQS_NR];
+	unsigned int cxt_id[DEPT_CXTS_NR];
 
 	/*
 	 * for tracking IRQ-enabled points with cross-event
 	 */
-	unsigned int wgen_enirq[DEPT_IRQS_NR];
+	unsigned int wgen_enirq[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * for keeping up-to-date IRQ-enabled points
 	 */
-	unsigned long enirq_ip[DEPT_IRQS_NR];
+	unsigned long enirq_ip[DEPT_CXT_IRQS_NR];
 
 	/*
 	 * current effective IRQ-enabled flag
@@ -438,7 +443,7 @@ struct dept_task {
 	.dept_task.wait_hist = { { .wait = NULL, } },			\
 	.dept_task.ecxt_held_pos = 0,					\
 	.dept_task.wait_hist_pos = 0,					\
-	.dept_task.irq_id = { 0 },					\
+	.dept_task.cxt_id = { 0 },					\
 	.dept_task.wgen_enirq = { 0 },					\
 	.dept_task.enirq_ip = { 0 },					\
 	.dept_task.recursive = 0,					\
@@ -470,6 +475,7 @@ struct dept_task {
 extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
 extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
 extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
+extern void dept_kernel_enter(void);
 
 /*
  * for users who want to manage external keys
@@ -510,6 +516,7 @@ struct dept_task {
 #define dept_wait_split_map(me, mc, ip, w_fn, ne)	do { } while (0)
 #define dept_event_split_map(me, mc, ip, e_fn)		do { } while (0)
 #define dept_ask_event_split_map(me, mc)		do { } while (0)
+#define dept_kernel_enter()				do { } while (0)
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
 #endif
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 4510dbb..f728500 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -229,9 +229,9 @@ static inline struct dept_class *dep_tc(struct dept_dep *d)
 
 static inline const char *irq_str(int irq)
 {
-	if (irq == DEPT_SIRQ)
+	if (irq == DEPT_CXT_SIRQ)
 		return "softirq";
-	if (irq == DEPT_HIRQ)
+	if (irq == DEPT_CXT_HIRQ)
 		return "hardirq";
 	return "(unknown)";
 }
@@ -389,7 +389,7 @@ static void initialize_class(struct dept_class *c)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		struct dept_iecxt *ie = &c->iecxt[i];
 		struct dept_iwait *iw = &c->iwait[i];
 
@@ -414,7 +414,7 @@ static void initialize_ecxt(struct dept_ecxt *e)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		e->enirq_stack[i] = NULL;
 		e->enirq_ip[i] = 0UL;
 	}
@@ -429,7 +429,7 @@ static void initialize_wait(struct dept_wait *w)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		w->irq_stack[i] = NULL;
 		w->irq_ip[i] = 0UL;
 	}
@@ -468,7 +468,7 @@ static void destroy_ecxt(struct dept_ecxt *e)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++)
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
 		if (e->enirq_stack[i])
 			put_stack(e->enirq_stack[i]);
 	if (e->class)
@@ -484,7 +484,7 @@ static void destroy_wait(struct dept_wait *w)
 {
 	int i;
 
-	for (i = 0; i < DEPT_IRQS_NR; i++)
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
 		if (w->irq_stack[i])
 			put_stack(w->irq_stack[i]);
 	if (w->class)
@@ -628,7 +628,7 @@ static void print_diagram(struct dept_dep *d)
 	const char *c_fn = e->ecxt_fn ?: "(unknown)";
 
 	irqf = e->enirqf & w->irqf;
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		if (!firstline)
 			pr_warn("\nor\n\n");
 		firstline = false;
@@ -659,7 +659,7 @@ static void print_dep(struct dept_dep *d)
 	const char *c_fn = e->ecxt_fn ?: "(unknown)";
 
 	irqf = e->enirqf & w->irqf;
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		pr_warn("%s has been enabled:\n", irq_str(irq));
 		print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
 		pr_warn("\n");
@@ -885,7 +885,7 @@ static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
  */
 
 static inline unsigned long cur_enirqf(void);
-static inline int cur_irq(void);
+static inline int cur_cxt(void);
 static inline unsigned int cur_ctxt_id(void);
 
 static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
@@ -1411,7 +1411,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
 	if (d) {
 		check_dl_bfs(d);
 
-		for (i = 0; i < DEPT_IRQS_NR; i++) {
+		for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 			struct dept_iwait *fiw = iwait(fc, i);
 			struct dept_iecxt *found_ie;
 			struct dept_iwait *found_iw;
@@ -1447,7 +1447,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	struct dept_task *dt = dept_task();
 	struct dept_wait *w;
 	unsigned int wg = 0U;
-	int irq;
+	int cxt;
 	int i;
 
 	w = new_wait();
@@ -1459,9 +1459,9 @@ static void add_wait(struct dept_class *c, unsigned long ip,
 	w->wait_fn = w_fn;
 	w->wait_stack = get_current_stack();
 
-	irq = cur_irq();
-	if (irq < DEPT_IRQS_NR)
-		add_iwait(c, irq, w);
+	cxt = cur_cxt();
+	if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
+		add_iwait(c, cxt, w);
 
 	/*
 	 * Avoid adding dependency between user aware nested ecxt and
@@ -1526,7 +1526,7 @@ static void add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
 	eh->nest = ne;
 
 	irqf = cur_enirqf();
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR)
 		add_iecxt(c, irq, e, false);
 
 	del_ecxt(e);
@@ -1653,7 +1653,7 @@ static void do_event(void *obj, struct dept_class *c, unsigned int wg,
 			break;
 	}
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		struct dept_ecxt *e;
 
 		if (before(dt->wgen_enirq[i], wg))
@@ -1695,7 +1695,7 @@ static void disconnect_class(struct dept_class *c)
 		call_rcu(&d->rh, del_dep_rcu);
 	}
 
-	for (i = 0; i < DEPT_IRQS_NR; i++) {
+	for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
 		stale_iecxt(iecxt(c, i));
 		stale_iwait(iwait(c, i));
 	}
@@ -1720,27 +1720,21 @@ static inline unsigned long cur_enirqf(void)
 	return 0UL;
 }
 
-static inline int cur_irq(void)
+static inline int cur_cxt(void)
 {
 	if (lockdep_softirq_context(current))
-		return DEPT_SIRQ;
+		return DEPT_CXT_SIRQ;
 	if (lockdep_hardirq_context())
-		return DEPT_HIRQ;
-	return DEPT_IRQS_NR;
+		return DEPT_CXT_HIRQ;
+	return DEPT_CXT_PROCESS;
 }
 
 static inline unsigned int cur_ctxt_id(void)
 {
 	struct dept_task *dt = dept_task();
-	int irq = cur_irq();
+	int cxt = cur_cxt();
 
-	/*
-	 * Normal process context
-	 */
-	if (irq == DEPT_IRQS_NR)
-		return 0U;
-
-	return dt->irq_id[irq] | (1UL << irq);
+	return dt->cxt_id[cxt] | (1UL << cxt);
 }
 
 static void enirq_transition(int irq)
@@ -1790,7 +1784,7 @@ static void enirq_update(unsigned long ip)
 	/*
 	 * Do enirq_transition() only on an OFF -> ON transition.
 	 */
-	for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+	for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
 		if (prev & (1UL << irq))
 			continue;
 
@@ -1893,6 +1887,13 @@ void dept_disable_hardirq(unsigned long ip)
 	dept_exit(flags);
 }
 
+void dept_kernel_enter(void)
+{
+	struct dept_task *dt = dept_task();
+
+	dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);
+}
+
 /*
  * Ensure it's the outmost softirq context.
  */
@@ -1900,7 +1901,7 @@ void dept_softirq_enter(void)
 {
 	struct dept_task *dt = dept_task();
 
-	dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+	dt->cxt_id[DEPT_CXT_SIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
 /*
@@ -1910,7 +1911,7 @@ void dept_hardirq_enter(void)
 {
 	struct dept_task *dt = dept_task();
 
-	dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+	dt->cxt_id[DEPT_CXT_HIRQ] += (1UL << DEPT_CXTS_NR);
 }
 
 /*
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad7136..1826508 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/dept.h>
 
 #include "common.h"
 
@@ -102,6 +103,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
 	long ret;
 
 	__enter_from_user_mode(regs);
+	dept_kernel_enter();
 
 	instrumentation_begin();
 	local_irq_enable();
@@ -114,6 +116,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
 noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
 {
 	__enter_from_user_mode(regs);
+	dept_kernel_enter();
 	instrumentation_begin();
 	local_irq_enable();
 	instrumentation_end();
-- 
1.9.1



* Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (15 preceding siblings ...)
  2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
@ 2022-02-17 11:10 ` Byungchul Park
  2022-02-17 11:10   ` Report 2 " Byungchul Park
                     ` (2 more replies)
  2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
  17 siblings, 3 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 11:10 UTC (permalink / raw)
  To: torvalds

[    7.009608] ===================================================
[    7.009613] DEPT: Circular dependency has been detected.
[    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
[    7.009616] ---------------------------------------------------
[    7.009617] summary
[    7.009618] ---------------------------------------------------
[    7.009618] *** DEADLOCK ***
[    7.009618]
[    7.009619] context A
[    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[    7.009621]     [W] down_write(&ei->i_data_sem:0)
[    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
[    7.009624]
[    7.009625] context B
[    7.009625]     [S] down_read(&ei->i_data_sem:0)
[    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
[    7.009627]     [E] up_read(&ei->i_data_sem:0)
[    7.009628]
[    7.009629] [S]: start of the event context
[    7.009629] [W]: the wait blocked
[    7.009630] [E]: the event not reachable
[    7.009631] ---------------------------------------------------
[    7.009631] context A's detail
[    7.009632] ---------------------------------------------------
[    7.009632] context A
[    7.009633]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
[    7.009634]     [W] down_write(&ei->i_data_sem:0)
[    7.009635]     [E] event(&(bit_wait_table + i)->dmap:0)
[    7.009636]
[    7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
[    7.009638] (N/A)
[    7.009638]
[    7.009639] [W] down_write(&ei->i_data_sem:0):
[    7.009639] ext4_truncate (fs/ext4/inode.c:4187) 
[    7.009645] stacktrace:
[    7.009646] down_write (kernel/locking/rwsem.c:1514) 
[    7.009648] ext4_truncate (fs/ext4/inode.c:4187) 
[    7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009652] generic_perform_write (mm/filemap.c:3784) 
[    7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009657] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009662] vfs_write (fs/read_write.c:590) 
[    7.009663] ksys_write (fs/read_write.c:644) 
[    7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009669]
[    7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
[    7.009671] __wake_up_common (kernel/sched/wait.c:108) 
[    7.009673] stacktrace:
[    7.009674] dept_event (kernel/dependency/dept.c:2337) 
[    7.009677] __wake_up_common (kernel/sched/wait.c:109) 
[    7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127) 
[    7.009681] ext4_orphan_del (fs/ext4/orphan.c:282) 
[    7.009683] ext4_truncate (fs/ext4/inode.c:4212) 
[    7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009687] generic_perform_write (mm/filemap.c:3784) 
[    7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009690] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009694] vfs_write (fs/read_write.c:590) 
[    7.009695] ksys_write (fs/read_write.c:644) 
[    7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009700] ---------------------------------------------------
[    7.009700] context B's detail
[    7.009701] ---------------------------------------------------
[    7.009702] context B
[    7.009702]     [S] down_read(&ei->i_data_sem:0)
[    7.009703]     [W] wait(&(bit_wait_table + i)->dmap:0)
[    7.009704]     [E] up_read(&ei->i_data_sem:0)
[    7.009705]
[    7.009706] [S] down_read(&ei->i_data_sem:0):
[    7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
[    7.009709] stacktrace:
[    7.009709] down_read (kernel/locking/rwsem.c:1461) 
[    7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
[    7.009712] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009714] ext4_bread (fs/ext4/inode.c:903) 
[    7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009718] dx_probe (fs/ext4/namei.c:789) 
[    7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009722] __ext4_find_entry (fs/ext4/namei.c:1571) 
[    7.009723] ext4_lookup (fs/ext4/namei.c:1770) 
[    7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
[    7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
[    7.009729] do_filp_open (fs/namei.c:3637) 
[    7.009731] do_sys_openat2 (fs/open.c:1215) 
[    7.009732] do_sys_open (fs/open.c:1231) 
[    7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009738]
[    7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
[    7.009739] prepare_to_wait (kernel/sched/wait.c:275) 
[    7.009741] stacktrace:
[    7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418) 
[    7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210) 
[    7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49) 
[    7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65) 
[    7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201) 
[    7.009752] __read_extent_tree_block (fs/ext4/extents.c:545) 
[    7.009754] ext4_find_extent (fs/ext4/extents.c:928) 
[    7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099) 
[    7.009757] ext4_map_blocks (fs/ext4/inode.c:563) 
[    7.009759] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009760] ext4_bread (fs/ext4/inode.c:903) 
[    7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009764] dx_probe (fs/ext4/namei.c:789) 
[    7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009767]
[    7.009768] [E] up_read(&ei->i_data_sem:0):
[    7.009769] ext4_map_blocks (fs/ext4/inode.c:593) 
[    7.009771] stacktrace:
[    7.009771] up_read (kernel/locking/rwsem.c:1556) 
[    7.009774] ext4_map_blocks (fs/ext4/inode.c:593) 
[    7.009775] ext4_getblk (fs/ext4/inode.c:851) 
[    7.009777] ext4_bread (fs/ext4/inode.c:903) 
[    7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117) 
[    7.009780] dx_probe (fs/ext4/namei.c:789) 
[    7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
[    7.009784] __ext4_find_entry (fs/ext4/namei.c:1571) 
[    7.009786] ext4_lookup (fs/ext4/namei.c:1770) 
[    7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
[    7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
[    7.009791] do_filp_open (fs/namei.c:3637) 
[    7.009792] do_sys_openat2 (fs/open.c:1215) 
[    7.009794] do_sys_open (fs/open.c:1231) 
[    7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009799] ---------------------------------------------------
[    7.009800] information that might be helpful
[    7.009800] ---------------------------------------------------
[    7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G        W         5.17.0-rc1-00014-g8a599299c0cb-dirty #30
[    7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    7.009805] Call Trace:
[    7.009806]  <TASK>
[    7.009807] dump_stack_lvl (lib/dump_stack.c:107) 
[    7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
[    7.009812] ? print_circle (kernel/dependency/dept.c:1086) 
[    7.009814] cb_check_dl (kernel/dependency/dept.c:1104) 
[    7.009815] bfs (kernel/dependency/dept.c:860) 
[    7.009818] add_dep (kernel/dependency/dept.c:1423) 
[    7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650) 
[    7.009822] ? __wake_up_common (kernel/sched/wait.c:108) 
[    7.009824] dept_event (kernel/dependency/dept.c:2337) 
[    7.009826] __wake_up_common (kernel/sched/wait.c:109) 
[    7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127) 
[    7.009832] ext4_orphan_del (fs/ext4/orphan.c:282) 
[    7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478) 
[    7.009837] ext4_truncate (fs/ext4/inode.c:4212) 
[    7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    7.009842] generic_perform_write (mm/filemap.c:3784) 
[    7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    7.009848] ext4_file_write_iter (fs/ext4/file.c:677) 
[    7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    7.009854] vfs_write (fs/read_write.c:590) 
[    7.009856] ksys_write (fs/read_write.c:644) 
[    7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65) 
[    7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    7.009865] RIP: 0033:0x7f3b160b335d
[ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
All code
========
   0:	e1 20                	loope  0x22
   2:	00 00                	add    %al,(%rax)
   4:	75 10                	jne    0x16
   6:	b8 01 00 00 00       	mov    $0x1,%eax
   b:	0f 05                	syscall 
   d:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
  13:	73 31                	jae    0x46
  15:	c3                   	retq   
  16:	48 83 ec 08          	sub    $0x8,%rsp
  1a:	e8 ce fa ff ff       	callq  0xfffffffffffffaed
  1f:	48 89 04 24          	mov    %rax,(%rsp)
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 8b 3c 24          	mov    (%rsp),%rdi		<-- trapping instruction
  2e:	48 89 c2             	mov    %rax,%rdx
  31:	e8 17 fb ff ff       	callq  0xfffffffffffffb4d
  36:	48 89 d0             	mov    %rdx,%rax
  39:	48 83 c4 08          	add    $0x8,%rsp
  3d:	48                   	rex.W
  3e:	3d                   	.byte 0x3d
  3f:	01                   	.byte 0x1

Code starting with the faulting instruction
===========================================
   0:	48 8b 3c 24          	mov    (%rsp),%rdi
   4:	48 89 c2             	mov    %rax,%rdx
   7:	e8 17 fb ff ff       	callq  0xfffffffffffffb23
   c:	48 89 d0             	mov    %rdx,%rax
   f:	48 83 c4 08          	add    $0x8,%rsp
  13:	48                   	rex.W
  14:	3d                   	.byte 0x3d
  15:	01                   	.byte 0x1
[    7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[    7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
[    7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
[    7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
[    7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
[    7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
[    7.009879]  </TASK>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
@ 2022-02-17 11:10   ` Byungchul Park
  2022-02-21 19:02     ` Jan Kara
  2022-02-17 13:27   ` Report 1 " Matthew Wilcox
  2022-02-22  8:27   ` Jan Kara
  2 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-17 11:10 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

[    9.008161] ===================================================
[    9.008163] DEPT: Circular dependency has been detected.
[    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
[    9.008166] ---------------------------------------------------
[    9.008167] summary
[    9.008167] ---------------------------------------------------
[    9.008168] *** DEADLOCK ***
[    9.008168]
[    9.008168] context A
[    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
[    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008173]
[    9.008173] context B
[    9.008174]     [S] down_write(mapping.invalidate_lock:0)
[    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008176]     [E] up_write(mapping.invalidate_lock:0)
[    9.008177]
[    9.008178] context C
[    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[    9.008180]     [W] down_write(mapping.invalidate_lock:0)
[    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
[    9.008181]
[    9.008182] [S]: start of the event context
[    9.008183] [W]: the wait blocked
[    9.008183] [E]: the event not reachable
[    9.008184] ---------------------------------------------------
[    9.008184] context A's detail
[    9.008185] ---------------------------------------------------
[    9.008186] context A
[    9.008186]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008187]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
[    9.008188]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008189]
[    9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008191] (N/A)
[    9.008191]
[    9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0):
[    9.008193] prepare_to_wait (kernel/sched/wait.c:275) 
[    9.008197] stacktrace:
[    9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    9.008201] kjournald2 (fs/jbd2/journal.c:250) 
[    9.008203] kthread (kernel/kthread.c:377) 
[    9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008209]
[    9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008210] __wake_up_common (kernel/sched/wait.c:108) 
[    9.008212] stacktrace:
[    9.008213] dept_event (kernel/dependency/dept.c:2337) 
[    9.008215] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
[    9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
[    9.008223] kthread (kernel/kthread.c:377) 
[    9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008226] ---------------------------------------------------
[    9.008226] context B's detail
[    9.008227] ---------------------------------------------------
[    9.008228] context B
[    9.008228]     [S] down_write(mapping.invalidate_lock:0)
[    9.008229]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
[    9.008230]     [E] up_write(mapping.invalidate_lock:0)
[    9.008231]
[    9.008232] [S] down_write(mapping.invalidate_lock:0):
[    9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008237] stacktrace:
[    9.008237] down_write (kernel/locking/rwsem.c:1514) 
[    9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008241] generic_perform_write (mm/filemap.c:3784) 
[    9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008245] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008250] vfs_write (fs/read_write.c:590) 
[    9.008251] ksys_write (fs/read_write.c:644) 
[    9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008258]
[    9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0):
[    9.008259] prepare_to_wait (kernel/sched/wait.c:275) 
[    9.008261] stacktrace:
[    9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
[    9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
[    9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184) 
[    9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3)) 
[    9.008267] start_this_handle (fs/jbd2/transaction.c:427) 
[    9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526) 
[    9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) 
[    9.008273] ext4_truncate (fs/ext4/inode.c:4164) 
[    9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
[    9.008276] generic_perform_write (mm/filemap.c:3784) 
[    9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008279] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008283] vfs_write (fs/read_write.c:590) 
[    9.008284] ksys_write (fs/read_write.c:644) 
[    9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008287]
[    9.008288] [E] up_write(mapping.invalidate_lock:0):
[    9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829) 
[    9.008291] ---------------------------------------------------
[    9.008291] context C's detail
[    9.008292] ---------------------------------------------------
[    9.008292] context C
[    9.008293]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
[    9.008294]     [W] down_write(mapping.invalidate_lock:0)
[    9.008295]     [E] event(&(&journal->j_wait_commit)->dmap:0)
[    9.008296]
[    9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0):
[    9.008298] (N/A)
[    9.008298]
[    9.008299] [W] down_write(mapping.invalidate_lock:0):
[    9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008302] stacktrace:
[    9.008302] down_write (kernel/locking/rwsem.c:1514) 
[    9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
[    9.008305] generic_perform_write (mm/filemap.c:3784) 
[    9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008309] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008312] vfs_write (fs/read_write.c:590) 
[    9.008314] ksys_write (fs/read_write.c:644) 
[    9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008318]
[    9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0):
[    9.008320] __wake_up_common (kernel/sched/wait.c:108) 
[    9.008321] stacktrace:
[    9.008322] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508) 
[    9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527) 
[    9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560) 
[    9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583) 
[    9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3)) 
[    9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1)) 
[    9.008334] generic_perform_write (mm/filemap.c:3784) 
[    9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269) 
[    9.008337] ext4_file_write_iter (fs/ext4/file.c:677) 
[    9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
[    9.008341] vfs_write (fs/read_write.c:590) 
[    9.008342] ksys_write (fs/read_write.c:644) 
[    9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
[    9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
[    9.008347] ---------------------------------------------------
[    9.008348] information that might be helpful
[    9.008348] ---------------------------------------------------
[    9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G        W         5.17.0-rc1-00015-gb94f67143867-dirty #2
[    9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    9.008353] Call Trace:
[    9.008354]  <TASK>
[    9.008355] dump_stack_lvl (lib/dump_stack.c:107) 
[    9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
[    9.008360] ? print_circle (kernel/dependency/dept.c:1086) 
[    9.008362] cb_check_dl (kernel/dependency/dept.c:1104) 
[    9.008364] bfs (kernel/dependency/dept.c:860) 
[    9.008366] add_dep (kernel/dependency/dept.c:1423) 
[    9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651) 
[    9.008370] ? __wake_up_common (kernel/sched/wait.c:108) 
[    9.008372] dept_event (kernel/dependency/dept.c:2337) 
[    9.008374] __wake_up_common (kernel/sched/wait.c:109) 
[    9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
[    9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
[    9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24) 
[    9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843) 
[    9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) 
[    9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) 
[    9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239) 
[    9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
[    9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431) 
[    9.008400] ? commit_timeout (fs/jbd2/journal.c:173) 
[    9.008402] kthread (kernel/kthread.c:377) 
[    9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332) 
[    9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301) 
[    9.008410]  </TASK>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
  2022-02-17 11:10   ` Report 2 " Byungchul Park
@ 2022-02-17 13:27   ` Matthew Wilcox
  2022-02-18  0:41     ` Byungchul Park
  2022-02-22  8:27   ` Jan Kara
  2 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2022-02-17 13:27 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 08:10:03PM +0900, Byungchul Park wrote:
> [    7.009608] ===================================================
> [    7.009613] DEPT: Circular dependency has been detected.
> [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> [    7.009616] ---------------------------------------------------
> [    7.009617] summary
> [    7.009618] ---------------------------------------------------
> [    7.009618] *** DEADLOCK ***
> [    7.009618]
> [    7.009619] context A
> [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)

Why is the context unknown here?  I don't see a way to debug this
without knowing where we acquired the bit wait lock.

> [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> [    7.009624]
> [    7.009625] context B
> [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> [    7.009628]
> [    7.009629] [S]: start of the event context
> [    7.009629] [W]: the wait blocked
> [    7.009630] [E]: the event not reachable
> [    7.009631] ---------------------------------------------------
> [    7.009631] context A's detail
> [    7.009632] ---------------------------------------------------
> [    7.009632] context A
> [    7.009633]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> [    7.009634]     [W] down_write(&ei->i_data_sem:0)
> [    7.009635]     [E] event(&(bit_wait_table + i)->dmap:0)
> [    7.009636]
> [    7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
> [    7.009638] (N/A)
> [    7.009638]
> [    7.009639] [W] down_write(&ei->i_data_sem:0):
> [    7.009639] ext4_truncate (fs/ext4/inode.c:4187) 
> [    7.009645] stacktrace:
> [    7.009646] down_write (kernel/locking/rwsem.c:1514) 
> [    7.009648] ext4_truncate (fs/ext4/inode.c:4187) 
> [    7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> [    7.009652] generic_perform_write (mm/filemap.c:3784) 
> [    7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    7.009657] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    7.009662] vfs_write (fs/read_write.c:590) 
> [    7.009663] ksys_write (fs/read_write.c:644) 
> [    7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    7.009669]
> [    7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
> [    7.009671] __wake_up_common (kernel/sched/wait.c:108) 
> [    7.009673] stacktrace:
> [    7.009674] dept_event (kernel/dependency/dept.c:2337) 
> [    7.009677] __wake_up_common (kernel/sched/wait.c:109) 
> [    7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> [    7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> [    7.009681] ext4_orphan_del (fs/ext4/orphan.c:282) 
> [    7.009683] ext4_truncate (fs/ext4/inode.c:4212) 
> [    7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> [    7.009687] generic_perform_write (mm/filemap.c:3784) 
> [    7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    7.009690] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    7.009694] vfs_write (fs/read_write.c:590) 
> [    7.009695] ksys_write (fs/read_write.c:644) 
> [    7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    7.009700] ---------------------------------------------------
> [    7.009700] context B's detail
> [    7.009701] ---------------------------------------------------
> [    7.009702] context B
> [    7.009702]     [S] down_read(&ei->i_data_sem:0)
> [    7.009703]     [W] wait(&(bit_wait_table + i)->dmap:0)
> [    7.009704]     [E] up_read(&ei->i_data_sem:0)
> [    7.009705]
> [    7.009706] [S] down_read(&ei->i_data_sem:0):
> [    7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> [    7.009709] stacktrace:
> [    7.009709] down_read (kernel/locking/rwsem.c:1461) 
> [    7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> [    7.009712] ext4_getblk (fs/ext4/inode.c:851) 
> [    7.009714] ext4_bread (fs/ext4/inode.c:903) 
> [    7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> [    7.009718] dx_probe (fs/ext4/namei.c:789) 
> [    7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> [    7.009722] __ext4_find_entry (fs/ext4/namei.c:1571) 
> [    7.009723] ext4_lookup (fs/ext4/namei.c:1770) 
> [    7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> [    7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> [    7.009729] do_filp_open (fs/namei.c:3637) 
> [    7.009731] do_sys_openat2 (fs/open.c:1215) 
> [    7.009732] do_sys_open (fs/open.c:1231) 
> [    7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    7.009738]
> [    7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
> [    7.009739] prepare_to_wait (kernel/sched/wait.c:275) 
> [    7.009741] stacktrace:
> [    7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> [    7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> [    7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418) 
> [    7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210) 
> [    7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49) 
> [    7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65) 
> [    7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201) 
> [    7.009752] __read_extent_tree_block (fs/ext4/extents.c:545) 
> [    7.009754] ext4_find_extent (fs/ext4/extents.c:928) 
> [    7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099) 
> [    7.009757] ext4_map_blocks (fs/ext4/inode.c:563) 
> [    7.009759] ext4_getblk (fs/ext4/inode.c:851) 
> [    7.009760] ext4_bread (fs/ext4/inode.c:903) 
> [    7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> [    7.009764] dx_probe (fs/ext4/namei.c:789) 
> [    7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> [    7.009767]
> [    7.009768] [E] up_read(&ei->i_data_sem:0):
> [    7.009769] ext4_map_blocks (fs/ext4/inode.c:593) 
> [    7.009771] stacktrace:
> [    7.009771] up_read (kernel/locking/rwsem.c:1556) 
> [    7.009774] ext4_map_blocks (fs/ext4/inode.c:593) 
> [    7.009775] ext4_getblk (fs/ext4/inode.c:851) 
> [    7.009777] ext4_bread (fs/ext4/inode.c:903) 
> [    7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> [    7.009780] dx_probe (fs/ext4/namei.c:789) 
> [    7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> [    7.009784] __ext4_find_entry (fs/ext4/namei.c:1571) 
> [    7.009786] ext4_lookup (fs/ext4/namei.c:1770) 
> [    7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> [    7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> [    7.009791] do_filp_open (fs/namei.c:3637) 
> [    7.009792] do_sys_openat2 (fs/open.c:1215) 
> [    7.009794] do_sys_open (fs/open.c:1231) 
> [    7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    7.009799] ---------------------------------------------------
> [    7.009800] information that might be helpful
> [    7.009800] ---------------------------------------------------
> [    7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G        W         5.17.0-rc1-00014-g8a599299c0cb-dirty #30
> [    7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [    7.009805] Call Trace:
> [    7.009806]  <TASK>
> [    7.009807] dump_stack_lvl (lib/dump_stack.c:107) 
> [    7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
> [    7.009812] ? print_circle (kernel/dependency/dept.c:1086) 
> [    7.009814] cb_check_dl (kernel/dependency/dept.c:1104) 
> [    7.009815] bfs (kernel/dependency/dept.c:860) 
> [    7.009818] add_dep (kernel/dependency/dept.c:1423) 
> [    7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650) 
> [    7.009822] ? __wake_up_common (kernel/sched/wait.c:108) 
> [    7.009824] dept_event (kernel/dependency/dept.c:2337) 
> [    7.009826] __wake_up_common (kernel/sched/wait.c:109) 
> [    7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> [    7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> [    7.009832] ext4_orphan_del (fs/ext4/orphan.c:282) 
> [    7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478) 
> [    7.009837] ext4_truncate (fs/ext4/inode.c:4212) 
> [    7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> [    7.009842] generic_perform_write (mm/filemap.c:3784) 
> [    7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    7.009848] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    7.009854] vfs_write (fs/read_write.c:590) 
> [    7.009856] ksys_write (fs/read_write.c:644) 
> [    7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65) 
> [    7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    7.009865] RIP: 0033:0x7f3b160b335d
> [ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
> All code
> ========
>    0:	e1 20                	loope  0x22
>    2:	00 00                	add    %al,(%rax)
>    4:	75 10                	jne    0x16
>    6:	b8 01 00 00 00       	mov    $0x1,%eax
>    b:	0f 05                	syscall 
>    d:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>   13:	73 31                	jae    0x46
>   15:	c3                   	retq   
>   16:	48 83 ec 08          	sub    $0x8,%rsp
>   1a:	e8 ce fa ff ff       	callq  0xfffffffffffffaed
>   1f:	48 89 04 24          	mov    %rax,(%rsp)
>   23:	b8 01 00 00 00       	mov    $0x1,%eax
>   28:	0f 05                	syscall 
>   2a:*	48 8b 3c 24          	mov    (%rsp),%rdi		<-- trapping instruction
>   2e:	48 89 c2             	mov    %rax,%rdx
>   31:	e8 17 fb ff ff       	callq  0xfffffffffffffb4d
>   36:	48 89 d0             	mov    %rdx,%rax
>   39:	48 83 c4 08          	add    $0x8,%rsp
>   3d:	48                   	rex.W
>   3e:	3d                   	.byte 0x3d
>   3f:	01                   	.byte 0x1
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 8b 3c 24          	mov    (%rsp),%rdi
>    4:	48 89 c2             	mov    %rax,%rdx
>    7:	e8 17 fb ff ff       	callq  0xfffffffffffffb23
>    c:	48 89 d0             	mov    %rdx,%rax
>    f:	48 83 c4 08          	add    $0x8,%rsp
>   13:	48                   	rex.W
>   14:	3d                   	.byte 0x3d
>   15:	01                   	.byte 0x1
> [    7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> [    7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
> [    7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
> [    7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
> [    7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
> [    7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
> [    7.009879]  </TASK>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
                   ` (16 preceding siblings ...)
  2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
@ 2022-02-17 15:51 ` Theodore Ts'o
  2022-02-17 17:00   ` Steven Rostedt
  2022-02-19  9:54   ` Byungchul Park
  17 siblings, 2 replies; 67+ messages in thread
From: Theodore Ts'o @ 2022-02-17 15:51 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
> 
> I've got several reports from the tool. Some of them look like false
> alarms and some others look like real deadlock possibility. Because of
> my unfamiliarity of the domain, it's hard to confirm if it's a real one.
> Let me add the reports on this email thread.

The problem is we have so many potentially invalid, or
so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-
grand-scheme-of-all-of-the-fires-burning-on-maintainers'-laps reports
that it's really not reasonable to ask maintainers to determine
whether something is a false alarm or not.  If I want more of these
unreliable potential bug reports to investigate, there is a huge
backlog in Syzkaller.  :-)

Looking at the second ext4 report, it doesn't make any sense.  Context
A is the kjournald thread.  We don't do a commit until (a) the timeout
expires, or (b) someone explicitly requests that a commit happen by
waking up j_wait_commit.  I'm guessing the complaint here is that
DEPT thinks nothing is explicitly requesting a wake-up.  But note that
after 5 seconds (or whatever journal->j_commit_interval is configured
to be) we *will* always start a commit.  So ergo, there can't be a deadlock.

At a higher level of discussion, it's an unfair tax on maintainers'
time to ask maintainers to help you debug DEPT for you.  Tools like
Syzkaller and DEPT are useful insofar as they save us time in making
our subsystems better.  But until you can prove that it's not going to
be a massive denial-of-service attack on maintainers' time, at the
*very* least keep an RFC on the patch, or add massive warnings that
more often than not DEPT is going to be sending maintainers on a wild
goose chase.

If you know that there "appear to be false positives", you need to
make sure you've tracked them all down before trying to ask that this
be merged.

You may also want to add some documentation about why we should trust
this; in particular for wait channels, when a process calls schedule()
there may be multiple reasons why the thread will wake up --- in the
worst case, such as in the select(2) or epoll(2) system call, there
may be literally thousands of reasons (one for every file descriptor
the select is waiting on) --- why the process will wake up and thus
resolve the potential "deadlock" that DEPT is worrying about.  How is
DEPT going to handle those cases?  If the answer is that things need
to be tagged, then at least disclose potential reasons why DEPT might
be untrustworthy to save your reviewers time.

I know that you're trying to help us, but this tool needs to be far
better than Lockdep before we should think about merging it.  Even if
it finds 5% more potential deadlocks, if it creates 95% more false
positive reports --- and the ones it finds are crazy things that
rarely actually happen in practice --- are the costs worth the benefits?
And who is bearing the costs, and who is receiving the benefits?

Regards,

					- Ted

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
@ 2022-02-17 15:54   ` Steven Rostedt
  2022-02-17 17:36   ` Steven Rostedt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2022-02-17 15:54 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, 17 Feb 2022 19:57:38 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
> new file mode 100644
> index 0000000..9f7778e
> --- /dev/null
> +++ b/kernel/dependency/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_DEPT) += dept.o
> +

FYI, git complains about the extra new line at the end of the file.

-- Steve

> diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph
  2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
@ 2022-02-17 15:55   ` Steven Rostedt
  0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2022-02-17 15:55 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, 17 Feb 2022 19:57:46 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> +static int __init dept_proc_init(void)
> +{
> +	proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
> +	proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
> +	return 0;
> +}
> +
> +__initcall(dept_proc_init);
> +

And git complains about this extra line at the end of the file too.

-- Steve

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
@ 2022-02-17 17:00   ` Steven Rostedt
  2022-02-17 17:06     ` Matthew Wilcox
                       ` (2 more replies)
  2022-02-19  9:54   ` Byungchul Park
  1 sibling, 3 replies; 67+ messages in thread
From: Steven Rostedt @ 2022-02-17 17:00 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Byungchul Park, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, 17 Feb 2022 10:51:09 -0500
"Theodore Ts'o" <tytso@mit.edu> wrote:

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it.  Even if
> it finds 5% more potential deadlocks, if it creates 95% more false
> positive reports --- and the ones it finds are crazy things that
> rarely actually happen in practice --- are the costs worth the benefits?
> And who is bearing the costs, and who is receiving the benefits?

I personally believe that there's potential that this can be helpful and we
will want to merge it.

But, what I believe Ted is trying to say is, if you do not know if the
report is a bug or not, please do not ask the maintainers to determine it
for you. This is a good opportunity for you to look into why your tool
reported an issue, and to learn that subsystem. Look at whether this is
really a bug or not, and investigate why.

The likely/unlikely tracing I do finds issues all over the kernel. But
before I report anything, I look at the subsystem and determine *why* it's
reporting what it does. In some cases, it's just a config issue, where I
may submit a patch saying "this is 100% wrong in X config, and we should
just remove the 'unlikely'". But I did the due diligence to find out exactly
what the issue is, and why the tooling reported what it reported.

I want to stress that your Dept tooling looks to have the potential of
being something that will be worthwhile including. But the false-positive
rate needs to be down to that of lockdep. As Ted said, if
it's reporting 95% false positives, nobody is going to look at the 5% of
real bugs that it finds.

-- Steve

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 17:00   ` Steven Rostedt
@ 2022-02-17 17:06     ` Matthew Wilcox
  2022-02-19 10:05       ` Byungchul Park
  2022-02-18  4:19     ` Theodore Ts'o
  2022-02-19 10:18     ` Byungchul Park
  2 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2022-02-17 17:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Theodore Ts'o, Byungchul Park, torvalds, damien.lemoal,
	linux-ide, adilger.kernel, linux-ext4, mingo, linux-kernel,
	peterz, will, tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> On Thu, 17 Feb 2022 10:51:09 -0500
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
> > I know that you're trying to help us, but this tool needs to be far
> > better than Lockdep before we should think about merging it.  Even if
> > it finds 5% more potential deadlocks, if it creates 95% more false
> > positive reports --- and the ones it finds are crazy things that
> > rarely actually happen in practice --- are the costs worth the benefits?
> > And who is bearing the costs, and who is receiving the benefits?
> 
> I personally believe that there's potential that this can be helpful and we
> will want to merge it.
> 
> But, what I believe Ted is trying to say is, if you do not know if the
> report is a bug or not, please do not ask the maintainers to determine it
> for you. This is a good opportunity for you to look into why your tool
> reported an issue, and to learn that subsystem. Look at whether this is
> really a bug or not, and investigate why.

I agree with Steven here, to the point where I'm willing to invest some
time being a beta-tester for this, so if you focus your efforts on
filesystem/mm kinds of problems, I can continue looking at them and
tell you what's helpful and what's unhelpful in the reports.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
  2022-02-17 15:54   ` Steven Rostedt
@ 2022-02-17 17:36   ` Steven Rostedt
  2022-02-18  6:09     ` Byungchul Park
  2022-02-17 19:46   ` kernel test robot
  2022-02-17 19:46   ` kernel test robot
  3 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2022-02-17 17:36 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, 17 Feb 2022 19:57:38 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> diff --git a/include/linux/dept.h b/include/linux/dept.h
> new file mode 100644
> index 0000000..2ac4bca
> --- /dev/null
> +++ b/include/linux/dept.h
> @@ -0,0 +1,480 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * DEPT(DEPendency Tracker) - runtime dependency tracker
> + *
> + * Started by Byungchul Park <max.byungchul.park@gmail.com>:
> + *
> + *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
> + */
> +
> +#ifndef __LINUX_DEPT_H
> +#define __LINUX_DEPT_H
> +
> +#ifdef CONFIG_DEPT
> +
> +#include <linux/types.h>
> +
> +struct task_struct;
> +
> +#define DEPT_MAX_STACK_ENTRY		16
> +#define DEPT_MAX_WAIT_HIST		64
> +#define DEPT_MAX_ECXT_HELD		48
> +
> +#define DEPT_MAX_SUBCLASSES		16
> +#define DEPT_MAX_SUBCLASSES_EVT		2
> +#define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
> +#define DEPT_MAX_SUBCLASSES_CACHE	2
> +
> +#define DEPT_SIRQ			0
> +#define DEPT_HIRQ			1
> +#define DEPT_IRQS_NR			2
> +#define DEPT_SIRQF			(1UL << DEPT_SIRQ)
> +#define DEPT_HIRQF			(1UL << DEPT_HIRQ)
> +
> +struct dept_ecxt;
> +struct dept_iecxt {
> +	struct dept_ecxt *ecxt;
> +	int enirq;
> +	bool staled; /* to prevent adding a new ecxt */
> +};
> +
> +struct dept_wait;
> +struct dept_iwait {
> +	struct dept_wait *wait;
> +	int irq;
> +	bool staled; /* to prevent adding a new wait */
> +	bool touched;
> +};

Nit. It makes it easier to read (and then review) if structures are
spaced so that their fields all line up:

struct dept_iecxt {
	struct dept_ecxt		*ecxt;
	int				enirq;
	bool				staled;
};

struct dept_iwait {
	struct dept_wait		*wait;
	int				irq;
	bool				staled;
	bool				touched;
};

See, the fields stand out, and it is nicer on the eyes. Especially for those
of us that are getting up in age, whose eyes do not work as well as they
used to ;-)

> +
> +struct dept_class {
> +	union {
> +		struct llist_node pool_node;
> +
> +		/*
> +		 * reference counter for object management
> +		 */
> +		atomic_t ref;
> +	};
> +
> +	/*
> +	 * unique information about the class
> +	 */
> +	const char *name;
> +	unsigned long key;
> +	int sub;
> +
> +	/*
> +	 * for BFS
> +	 */
> +	unsigned int bfs_gen;
> +	int bfs_dist;
> +	struct dept_class *bfs_parent;
> +
> +	/*
> +	 * for hashing this object
> +	 */
> +	struct hlist_node hash_node;
> +
> +	/*
> +	 * for linking all classes
> +	 */
> +	struct list_head all_node;
> +
> +	/*
> +	 * for associating its dependencies
> +	 */
> +	struct list_head dep_head;
> +	struct list_head dep_rev_head;
> +
> +	/*
> +	 * for tracking IRQ dependencies
> +	 */
> +	struct dept_iecxt iecxt[DEPT_IRQS_NR];
> +	struct dept_iwait iwait[DEPT_IRQS_NR];
> +};
> +


> diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
> new file mode 100644
> index 0000000..4a3ab39
> --- /dev/null
> +++ b/kernel/dependency/dept.c
> @@ -0,0 +1,2585 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * DEPT(DEPendency Tracker) - Runtime dependency tracker
> + *
> + * Started by Byungchul Park <max.byungchul.park@gmail.com>:
> + *
> + *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
> + *
> + * DEPT provides a general way to detect deadlock possibilities at
> + * runtime, and its interest is not limited to typical locks but
> + * extends to all synchronization primitives.
> + *
[..]

> + *
> + *
> + * ---
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, you can access it online at
> + * http://www.gnu.org/licenses/gpl-2.0.html.

The SPDX at the top of the file is all that is needed. Please remove this
boiler plate. We do not use GPL boiler plates in the kernel anymore. The
SPDX code supersedes that.

> + */
> +
> +#include <linux/sched.h>
> +#include <linux/stacktrace.h>
> +#include <linux/spinlock.h>
> +#include <linux/kallsyms.h>
> +#include <linux/hash.h>
> +#include <linux/dept.h>
> +#include <linux/utsname.h>
> +
> +static int dept_stop;
> +static int dept_per_cpu_ready;
> +
> +#define DEPT_READY_WARN (!oops_in_progress)
> +
> +/*
> + * Make all operations using DEPT_WARN_ON() fail on oops_in_progress and
> + * prevent warning message.
> + */
> +#define DEPT_WARN_ON_ONCE(c)						\
> +	({								\
> +		int __ret = 0;						\
> +									\
> +		if (likely(DEPT_READY_WARN))				\
> +			__ret = WARN_ONCE(c, "DEPT_WARN_ON_ONCE: " #c);	\
> +		__ret;							\
> +	})
> +
> +#define DEPT_WARN_ONCE(s...)						\
> +	({								\
> +		if (likely(DEPT_READY_WARN))				\
> +			WARN_ONCE(1, "DEPT_WARN_ONCE: " s);		\
> +	})
> +
> +#define DEPT_WARN_ON(c)							\
> +	({								\
> +		int __ret = 0;						\
> +									\
> +		if (likely(DEPT_READY_WARN))				\
> +			__ret = WARN(c, "DEPT_WARN_ON: " #c);		\
> +		__ret;							\
> +	})
> +
> +#define DEPT_WARN(s...)							\
> +	({								\
> +		if (likely(DEPT_READY_WARN))				\
> +			WARN(1, "DEPT_WARN: " s);			\
> +	})
> +
> +#define DEPT_STOP(s...)							\
> +	({								\
> +		WRITE_ONCE(dept_stop, 1);				\
> +		if (likely(DEPT_READY_WARN))				\
> +			WARN(1, "DEPT_STOP: " s);			\
> +	})
> +
> +static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> +
> +/*
> + * DEPT internal engine should be careful in using outside functions
> + * e.g. printk at reporting since that kind of usage might cause
> + * untrackable deadlock.
> + */
> +static atomic_t dept_outworld = ATOMIC_INIT(0);
> +
> +static inline void dept_outworld_enter(void)
> +{
> +	atomic_inc(&dept_outworld);
> +}
> +
> +static inline void dept_outworld_exit(void)
> +{
> +	atomic_dec(&dept_outworld);
> +}
> +
> +static inline bool dept_outworld_entered(void)
> +{
> +	return atomic_read(&dept_outworld);
> +}
> +
> +static inline bool dept_lock(void)
> +{
> +	while (!arch_spin_trylock(&dept_spin))
> +		if (unlikely(dept_outworld_entered()))
> +			return false;
> +	return true;
> +}
> +
> +static inline void dept_unlock(void)
> +{
> +	arch_spin_unlock(&dept_spin);
> +}
> +
> +/*
> + * whether to stack-trace on every wait or every ecxt
> + */
> +static bool rich_stack = true;
> +
> +enum bfs_ret {
> +	BFS_CONTINUE,
> +	BFS_CONTINUE_REV,
> +	BFS_DONE,
> +	BFS_SKIP,
> +};
> +
> +static inline bool before(unsigned int a, unsigned int b)
> +{
> +	return (int)(a - b) < 0;
> +}
> +
> +static inline bool valid_stack(struct dept_stack *s)
> +{
> +	return s && s->nr > 0;
> +}
> +
> +static inline bool valid_class(struct dept_class *c)
> +{
> +	return c->key;
> +}
> +
> +static inline void inval_class(struct dept_class *c)
> +{
> +	c->key = 0UL;
> +}
> +
> +static inline struct dept_ecxt *dep_e(struct dept_dep *d)
> +{
> +	return d->ecxt;
> +}
> +
> +static inline struct dept_wait *dep_w(struct dept_dep *d)
> +{
> +	return d->wait;
> +}
> +
> +static inline struct dept_class *dep_fc(struct dept_dep *d)
> +{
> +	return dep_e(d)->class;
> +}
> +
> +static inline struct dept_class *dep_tc(struct dept_dep *d)
> +{
> +	return dep_w(d)->class;
> +}
> +
> +static inline const char *irq_str(int irq)
> +{
> +	if (irq == DEPT_SIRQ)
> +		return "softirq";
> +	if (irq == DEPT_HIRQ)
> +		return "hardirq";
> +	return "(unknown)";
> +}
> +
> +static inline struct dept_task *dept_task(void)
> +{
> +	return &current->dept_task;
> +}
> +
> +/*
> + * Pool
> + * =====================================================================
> + * DEPT maintains pools to provide objects in a safe way.
> + *
> + *    1) Static pool is used at the beginning of boot time.
> + *    2) Local pool is tried first, before the static pool. Objects
> + *       that have been freed are placed here.
> + */
> +
> +enum object_t {
> +#define OBJECT(id, nr) OBJECT_##id,
> +	#include "dept_object.h"
> +#undef  OBJECT
> +	OBJECT_NR,
> +};
> +
> +#define OBJECT(id, nr)							\
> +static struct dept_##id spool_##id[nr];					\
> +static DEFINE_PER_CPU(struct llist_head, lpool_##id);
> +	#include "dept_object.h"
> +#undef  OBJECT
> +
> +static struct dept_pool pool[OBJECT_NR] = {
> +#define OBJECT(id, nr) {						\
> +	.name = #id,							\
> +	.obj_sz = sizeof(struct dept_##id),				\
> +	.obj_nr = ATOMIC_INIT(nr),					\
> +	.node_off = offsetof(struct dept_##id, pool_node),		\
> +	.spool = spool_##id,						\
> +	.lpool = &lpool_##id, },
> +	#include "dept_object.h"
> +#undef  OBJECT
> +};
> +
> +/*
> + * Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
> + * enabled because DEPT never race with NMI by nesting control.

                         "never races with"

Although, I'm confused by what you mean with "by nesting control".

> + */
> +static void *from_pool(enum object_t t)
> +{
> +	struct dept_pool *p;
> +	struct llist_head *h;
> +	struct llist_node *n;
> +
> +	/*
> +	 * llist_del_first() doesn't allow concurrent access e.g.
> +	 * between process and IRQ context.
> +	 */
> +	if (DEPT_WARN_ON(!irqs_disabled()))
> +		return NULL;
> +
> +	p = &pool[t];
> +
> +	/*
> +	 * Try local pool first.
> +	 */
> +	if (likely(dept_per_cpu_ready))
> +		h = this_cpu_ptr(p->lpool);
> +	else
> +		h = &p->boot_pool;
> +
> +	n = llist_del_first(h);
> +	if (n)
> +		return (void *)n - p->node_off;
> +
> +	/*
> +	 * Try static pool.
> +	 */
> +	if (atomic_read(&p->obj_nr) > 0) {
> +		int idx = atomic_dec_return(&p->obj_nr);
> +		if (idx >= 0)
> +			return p->spool + (idx * p->obj_sz);
> +	}
> +
> +	DEPT_WARN_ONCE("Pool(%s) is empty.\n", p->name);
> +	return NULL;
> +}
> +
> +static void to_pool(void *o, enum object_t t)
> +{
> +	struct dept_pool *p = &pool[t];
> +	struct llist_head *h;
> +
> +	preempt_disable();
> +	if (likely(dept_per_cpu_ready))
> +		h = this_cpu_ptr(p->lpool);
> +	else
> +		h = &p->boot_pool;
> +
> +	llist_add(o + p->node_off, h);
> +	preempt_enable();
> +}
> +
> +#define OBJECT(id, nr)							\
> +static void (*ctor_##id)(struct dept_##id *a);				\
> +static void (*dtor_##id)(struct dept_##id *a);				\
> +static inline struct dept_##id *new_##id(void)				\
> +{									\
> +	struct dept_##id *a;						\
> +									\
> +	a = (struct dept_##id *)from_pool(OBJECT_##id);			\
> +	if (unlikely(!a))						\
> +		return NULL;						\
> +									\
> +	atomic_set(&a->ref, 1);						\
> +									\
> +	if (ctor_##id)							\
> +		ctor_##id(a);						\
> +									\
> +	return a;							\
> +}									\
> +									\
> +static inline struct dept_##id *get_##id(struct dept_##id *a)		\
> +{									\
> +	atomic_inc(&a->ref);						\
> +	return a;							\
> +}									\
> +									\
> +static inline void put_##id(struct dept_##id *a)			\
> +{									\
> +	if (!atomic_dec_return(&a->ref)) {				\
> +		if (dtor_##id)						\
> +			dtor_##id(a);					\
> +		to_pool(a, OBJECT_##id);				\
> +	}								\
> +}									\
> +									\
> +static inline void del_##id(struct dept_##id *a)			\
> +{									\
> +	put_##id(a);							\
> +}									\
> +									\
> +static inline bool id##_consumed(struct dept_##id *a)			\
> +{									\
> +	return a && atomic_read(&a->ref) > 1;				\
> +}
> +#include "dept_object.h"
> +#undef  OBJECT
> +
> +#define SET_CONSTRUCTOR(id, f) \
> +static void (*ctor_##id)(struct dept_##id *a) = f
> +
> +static void initialize_dep(struct dept_dep *d)
> +{
> +	INIT_LIST_HEAD(&d->bfs_node);
> +	INIT_LIST_HEAD(&d->dep_node);
> +	INIT_LIST_HEAD(&d->dep_rev_node);
> +}
> +SET_CONSTRUCTOR(dep, initialize_dep);
> +
> +static void initialize_class(struct dept_class *c)
> +{
> +	int i;
> +
> +	for (i = 0; i < DEPT_IRQS_NR; i++) {
> +		struct dept_iecxt *ie = &c->iecxt[i];
> +		struct dept_iwait *iw = &c->iwait[i];
> +
> +		ie->ecxt = NULL;
> +		ie->enirq = i;
> +		ie->staled = false;
> +
> +		iw->wait = NULL;
> +		iw->irq = i;
> +		iw->staled = false;
> +		iw->touched = false;
> +	}
> +	c->bfs_gen = 0U;

Is the U really necessary?

> +
> +	INIT_LIST_HEAD(&c->all_node);
> +	INIT_LIST_HEAD(&c->dep_head);
> +	INIT_LIST_HEAD(&c->dep_rev_head);
> +}
> +SET_CONSTRUCTOR(class, initialize_class);
> +
> +static void initialize_ecxt(struct dept_ecxt *e)
> +{
> +	int i;
> +
> +	for (i = 0; i < DEPT_IRQS_NR; i++) {
> +		e->enirq_stack[i] = NULL;
> +		e->enirq_ip[i] = 0UL;
> +	}
> +	e->ecxt_ip = 0UL;

Even UL is not necessary. Zero is zero.

> +	e->ecxt_stack = NULL;
> +	e->enirqf = 0UL;
> +	e->event_stack = NULL;
> +}
> +SET_CONSTRUCTOR(ecxt, initialize_ecxt);
> +

-- Steve

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
  2022-02-17 15:54   ` Steven Rostedt
  2022-02-17 17:36   ` Steven Rostedt
@ 2022-02-17 19:46   ` kernel test robot
  2022-02-17 19:46   ` kernel test robot
  3 siblings, 0 replies; 67+ messages in thread
From: kernel test robot @ 2022-02-17 19:46 UTC (permalink / raw)
  To: Byungchul Park, torvalds
  Cc: kbuild-all, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes

Hi Byungchul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on hnaz-mm/master linux/master linus/master v5.17-rc4]
[cannot apply to tip/locking/core next-20220217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 3624ba7b5e2acc02b01301ea5fd3534971eb9896
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20220217/202202172345.rVBMAj8W-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4d0434b0b917f4374a09f3b75cbcadf148cfa711
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
        git checkout 4d0434b0b917f4374a09f3b75cbcadf148cfa711
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=sh SHELL=/bin/bash kernel/dependency/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/dependency/dept.c:2105:6: warning: no previous prototype for '__dept_wait' [-Wmissing-prototypes]
    2105 | void __dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
         |      ^~~~~~~~~~~
   In file included from include/asm-generic/bug.h:22,
                    from arch/sh/include/asm/bug.h:112,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/current.h:5,
                    from ./arch/sh/include/generated/asm/current.h:1,
                    from include/linux/sched.h:12,
                    from kernel/dependency/dept.c:86:
   kernel/dependency/dept_hash.h: In function 'dept_init':
>> include/linux/kern_levels.h:5:25: warning: format '%zu' expects argument of type 'size_t', but argument 3 has type 'long unsigned int' [-Wformat=]
       5 | #define KERN_SOH        "\001"          /* ASCII Start Of Header */
         |                         ^~~~~~
   include/linux/printk.h:418:25: note: in definition of macro 'printk_index_wrap'
     418 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                         ^~~~
   include/linux/printk.h:519:9: note: in expansion of macro 'printk'
     519 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |         ^~~~~~
   include/linux/kern_levels.h:14:25: note: in expansion of macro 'KERN_SOH'
      14 | #define KERN_INFO       KERN_SOH "6"    /* informational */
         |                         ^~~~~~~~
   include/linux/printk.h:519:16: note: in expansion of macro 'KERN_INFO'
     519 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |                ^~~~~~~~~
   kernel/dependency/dept.c:2579:9: note: in expansion of macro 'pr_info'
    2579 |         pr_info("... hash list head used by %s: %zu KB\n",              \
         |         ^~~~~~~
   kernel/dependency/dept_hash.h:9:1: note: in expansion of macro 'HASH'
       9 | HASH(dep, 12)
         | ^~~~
>> include/linux/kern_levels.h:5:25: warning: format '%zu' expects argument of type 'size_t', but argument 3 has type 'long unsigned int' [-Wformat=]
       5 | #define KERN_SOH        "\001"          /* ASCII Start Of Header */
         |                         ^~~~~~
   include/linux/printk.h:418:25: note: in definition of macro 'printk_index_wrap'
     418 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                         ^~~~
   include/linux/printk.h:519:9: note: in expansion of macro 'printk'
     519 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |         ^~~~~~
   include/linux/kern_levels.h:14:25: note: in expansion of macro 'KERN_SOH'
      14 | #define KERN_INFO       KERN_SOH "6"    /* informational */
         |                         ^~~~~~~~
   include/linux/printk.h:519:16: note: in expansion of macro 'KERN_INFO'
     519 |         printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
         |                ^~~~~~~~~
   kernel/dependency/dept.c:2579:9: note: in expansion of macro 'pr_info'
    2579 |         pr_info("... hash list head used by %s: %zu KB\n",              \
         |         ^~~~~~~
   kernel/dependency/dept_hash.h:10:1: note: in expansion of macro 'HASH'
      10 | HASH(class, 12)
         | ^~~~


vim +/__dept_wait +2105 kernel/dependency/dept.c

  2104	
> 2105	void __dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip,
  2106			 const char *w_fn, int ne)
  2107	{
  2108		int e;
  2109	
  2110		/*
  2111		 * Be as conservative as possible. In case of multiple waits for
  2112		 * a single dept_map, we are going to keep only the last wait's
  2113		 * wgen for simplicity - keeping all wgens seems overengineering.
  2114		 *
  2115		 * Of course, it might miss some dependencies that would
  2116		 * rarely, probably never, happen, but it helps avoid
  2117		 * false positive reports.
  2118		 */
  2119		for_each_set_bit(e, &w_f, DEPT_MAX_SUBCLASSES_EVT) {
  2120			struct dept_class *c;
  2121			struct dept_key *k;
  2122	
  2123			k = m->keys ?: &m->keys_local;
  2124			c = check_new_class(&m->keys_local, k,
  2125					    map_sub(m, e), m->name);
  2126			if (!c)
  2127				continue;
  2128	
  2129			add_wait(c, ip, w_fn, ne);
  2130		}
  2131	}
  2132	

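[Editor's note: both warnings quoted above are routine W=1 fixes. The
missing-prototype one is usually resolved by declaring __dept_wait() in a
shared header, or making it static if it is only used inside dept.c; the
format one by casting the argument to size_t. A standalone sketch of the
format fix follows - the function and buffer names are illustrative, not
taken from the patch:]

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* On ILP32 targets such as sh, a sizeof-derived expression like
 * (1UL << 12) * sizeof(struct hlist_head) has type unsigned long, not
 * size_t, so handing it to a %zu conversion trips -Wformat. Casting the
 * argument to size_t at the call site is the minimal conventional fix;
 * this snprintf() stands in for the pr_info() call in dept.c. */
static int format_hash_usage(char *buf, size_t len,
			     const char *name, unsigned long bytes)
{
	return snprintf(buf, len, "... hash list head used by %s: %zu KB\n",
			name, (size_t)(bytes / 1024));
}
```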
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete()
  2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
@ 2022-02-17 19:46   ` kernel test robot
  0 siblings, 0 replies; 67+ messages in thread
From: kernel test robot @ 2022-02-17 19:46 UTC (permalink / raw)
  To: Byungchul Park, torvalds
  Cc: kbuild-all, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes

Hi Byungchul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linux/master linus/master v5.17-rc4]
[cannot apply to tip/locking/core hnaz-mm/master next-20220217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 3624ba7b5e2acc02b01301ea5fd3534971eb9896
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220218/202202180000.zICbPUhq-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8ef6cb09cb67a0c5cd7ba4f25f4825d0a5b269b0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
        git checkout 8ef6cb09cb67a0c5cd7ba4f25f4825d0a5b269b0
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/scsi/qla2xxx/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/scsi/qla2xxx/qla_dfs.c: In function 'qla2x00_dfs_tgt_port_database_show':
>> drivers/scsi/qla2xxx/qla_dfs.c:227:1: warning: the frame size of 1028 bytes is larger than 1024 bytes [-Wframe-larger-than=]
     227 | }
         | ^


vim +227 drivers/scsi/qla2xxx/qla_dfs.c

36c7845282eef01 Quinn Tran       2016-02-04  174  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  175  static int
c423437e3ff41b8 Himanshu Madhani 2017-03-15  176  qla2x00_dfs_tgt_port_database_show(struct seq_file *s, void *unused)
c423437e3ff41b8 Himanshu Madhani 2017-03-15  177  {
c423437e3ff41b8 Himanshu Madhani 2017-03-15  178  	scsi_qla_host_t *vha = s->private;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  179  	struct qla_hw_data *ha = vha->hw;
4e5a05d1ecd92ce Arun Easi        2020-09-03  180  	struct gid_list_info *gid_list;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  181  	dma_addr_t gid_list_dma;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  182  	fc_port_t fc_port;
4e5a05d1ecd92ce Arun Easi        2020-09-03  183  	char *id_iter;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  184  	int rc, i;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  185  	uint16_t entries, loop_id;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  186  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  187  	seq_printf(s, "%s\n", vha->host_str);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  188  	gid_list = dma_alloc_coherent(&ha->pdev->dev,
c423437e3ff41b8 Himanshu Madhani 2017-03-15  189  				      qla2x00_gid_list_size(ha),
c423437e3ff41b8 Himanshu Madhani 2017-03-15  190  				      &gid_list_dma, GFP_KERNEL);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  191  	if (!gid_list) {
83548fe2fcbb78a Quinn Tran       2017-06-02  192  		ql_dbg(ql_dbg_user, vha, 0x7018,
c423437e3ff41b8 Himanshu Madhani 2017-03-15  193  		       "DMA allocation failed for %u\n",
c423437e3ff41b8 Himanshu Madhani 2017-03-15  194  		       qla2x00_gid_list_size(ha));
c423437e3ff41b8 Himanshu Madhani 2017-03-15  195  		return 0;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  196  	}
c423437e3ff41b8 Himanshu Madhani 2017-03-15  197  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  198  	rc = qla24xx_gidlist_wait(vha, gid_list, gid_list_dma,
c423437e3ff41b8 Himanshu Madhani 2017-03-15  199  				  &entries);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  200  	if (rc != QLA_SUCCESS)
c423437e3ff41b8 Himanshu Madhani 2017-03-15  201  		goto out_free_id_list;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  202  
4e5a05d1ecd92ce Arun Easi        2020-09-03  203  	id_iter = (char *)gid_list;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  204  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  205  	seq_puts(s, "Port Name	Port ID		Loop ID\n");
c423437e3ff41b8 Himanshu Madhani 2017-03-15  206  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  207  	for (i = 0; i < entries; i++) {
4e5a05d1ecd92ce Arun Easi        2020-09-03  208  		struct gid_list_info *gid =
4e5a05d1ecd92ce Arun Easi        2020-09-03  209  			(struct gid_list_info *)id_iter;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  210  		loop_id = le16_to_cpu(gid->loop_id);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  211  		memset(&fc_port, 0, sizeof(fc_port_t));
c423437e3ff41b8 Himanshu Madhani 2017-03-15  212  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  213  		fc_port.loop_id = loop_id;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  214  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  215  		rc = qla24xx_gpdb_wait(vha, &fc_port, 0);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  216  		seq_printf(s, "%8phC  %02x%02x%02x  %d\n",
c423437e3ff41b8 Himanshu Madhani 2017-03-15  217  			   fc_port.port_name, fc_port.d_id.b.domain,
c423437e3ff41b8 Himanshu Madhani 2017-03-15  218  			   fc_port.d_id.b.area, fc_port.d_id.b.al_pa,
c423437e3ff41b8 Himanshu Madhani 2017-03-15  219  			   fc_port.loop_id);
4e5a05d1ecd92ce Arun Easi        2020-09-03  220  		id_iter += ha->gid_list_info_size;
c423437e3ff41b8 Himanshu Madhani 2017-03-15  221  	}
c423437e3ff41b8 Himanshu Madhani 2017-03-15  222  out_free_id_list:
c423437e3ff41b8 Himanshu Madhani 2017-03-15  223  	dma_free_coherent(&ha->pdev->dev, qla2x00_gid_list_size(ha),
c423437e3ff41b8 Himanshu Madhani 2017-03-15  224  			  gid_list, gid_list_dma);
c423437e3ff41b8 Himanshu Madhani 2017-03-15  225  
c423437e3ff41b8 Himanshu Madhani 2017-03-15  226  	return 0;
c423437e3ff41b8 Himanshu Madhani 2017-03-15 @227  }
c423437e3ff41b8 Himanshu Madhani 2017-03-15  228  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
                     ` (2 preceding siblings ...)
  2022-02-17 19:46   ` kernel test robot
@ 2022-02-17 19:46   ` kernel test robot
  3 siblings, 0 replies; 67+ messages in thread
From: kernel test robot @ 2022-02-17 19:46 UTC (permalink / raw)
  To: Byungchul Park, torvalds
  Cc: kbuild-all, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes

Hi Byungchul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on hnaz-mm/master linux/master linus/master v5.17-rc4]
[cannot apply to tip/locking/core next-20220217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 3624ba7b5e2acc02b01301ea5fd3534971eb9896
config: xtensa-allyesconfig (https://download.01.org/0day-ci/archive/20220218/202202180059.SibYSAt1-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4d0434b0b917f4374a09f3b75cbcadf148cfa711
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Byungchul-Park/DEPT-Dependency-Tracker/20220217-190040
        git checkout 4d0434b0b917f4374a09f3b75cbcadf148cfa711
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=xtensa SHELL=/bin/bash drivers/net/ethernet/sfc/falcon/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/delay.h:23,
                    from drivers/net/ethernet/sfc/falcon/falcon.c:9:
   drivers/net/ethernet/sfc/falcon/falcon.c: In function 'falcon_spi_slow_wait':
>> drivers/net/ethernet/sfc/falcon/falcon.c:750:58: warning: '?:' using integer constants in boolean context [-Wint-in-bool-context]
     750 |                                     TASK_UNINTERRUPTIBLE : TASK_INTERRUPTIBLE);
   include/linux/sched.h:205:21: note: in definition of macro '__set_current_state'
     205 |                 if (state_value == TASK_RUNNING)                        \
         |                     ^~~~~~~~~~~


vim +750 drivers/net/ethernet/sfc/falcon/falcon.c

4a5b504d0c582db drivers/net/sfc/falcon.c                 Ben Hutchings 2008-09-01  738  
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  739  static int
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  740  falcon_spi_slow_wait(struct falcon_mtd_partition *part, bool uninterruptible)
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  741  {
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  742  	const struct falcon_spi_device *spi = part->spi;
5a6681e22c14090 drivers/net/ethernet/sfc/falcon/falcon.c Edward Cree   2016-11-28  743  	struct ef4_nic *efx = part->common.mtd.priv;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  744  	u8 status;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  745  	int rc, i;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  746  
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  747  	/* Wait up to 4s for flash/EEPROM to finish a slow operation. */
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  748  	for (i = 0; i < 40; i++) {
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  749  		__set_current_state(uninterruptible ?
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28 @750  				    TASK_UNINTERRUPTIBLE : TASK_INTERRUPTIBLE);
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  751  		schedule_timeout(HZ / 10);
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  752  		rc = falcon_spi_cmd(efx, spi, SPI_RDSR, -1, NULL,
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  753  				    &status, sizeof(status));
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  754  		if (rc)
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  755  			return rc;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  756  		if (!(status & SPI_STATUS_NRDY))
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  757  			return 0;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  758  		if (signal_pending(current))
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  759  			return -EINTR;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  760  	}
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  761  	pr_err("%s: timed out waiting for %s\n",
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  762  	       part->common.name, part->common.dev_type_name);
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  763  	return -ETIMEDOUT;
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  764  }
45a3fd55acc8989 drivers/net/ethernet/sfc/falcon.c        Ben Hutchings 2012-11-28  765  

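[Editor's note: the warning fires because the bare '?:' of integer
constants reaches a macro that tests its argument. One conventional fix
is to hoist the conditional into a variable first. The sketch below uses
a stand-in macro whose body only mirrors the shape of the expansion -
the real __set_current_state() differs:]

```c
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002

/* Stand-in for __set_current_state(): a macro whose body compares its
 * argument, which is what makes GCC flag integer constants from a '?:'
 * passed in directly. The state values mirror the kernel's; the macro
 * body is only illustrative. */
static int current_state;
#define set_state(state_value)					\
	do {							\
		if ((state_value) == TASK_RUNNING)		\
			current_state = TASK_RUNNING;		\
		else						\
			current_state = (state_value);		\
	} while (0)

/* Hoisting the conditional into a plain variable silences the warning
 * without changing behaviour. */
static void set_wait_state(int uninterruptible)
{
	int state = uninterruptible ? TASK_UNINTERRUPTIBLE
				    : TASK_INTERRUPTIBLE;
	set_state(state);
}
```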
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-17 13:27   ` Report 1 " Matthew Wilcox
@ 2022-02-18  0:41     ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-18  0:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 01:27:42PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 08:10:03PM +0900, Byungchul Park wrote:
> > [    7.009608] ===================================================
> > [    7.009613] DEPT: Circular dependency has been detected.
> > [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> > [    7.009616] ---------------------------------------------------
> > [    7.009617] summary
> > [    7.009618] ---------------------------------------------------
> > [    7.009618] *** DEADLOCK ***
> > [    7.009618]
> > [    7.009619] context A
> > [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> 
> Why is the context unknown here?  I don't see a way to debug this
> without knowing where we acquired the bit wait lock.

ideal view
------------------

context X			context A

request event E to context A	...
   write REQUESTEVENT		if (can see REQUESTEVENT written)
...				   notice the request from X [S]
wait for the event [W]		...
   write barrier
   write WAITSTART		
   actual wait			if (can see REQUESTEVENT written)
				   consider it's on the way to the event
...				
				...
				finally the event [E]

Dept works with the above view of wait and event. The [S] point varies
depending on how context A is made to notice the request so that the
event context can start. Of course, by putting more effort into
identifying each such mechanism in the kernel, we could figure out the
exact point.

Here, Dept conservatively checks WAITSTART with a read barrier in
context A instead of REQUESTEVENT. That's how Dept works.

Dept's view
------------------

context X			context A

request event E to context A	...
   write REQUESTEVENT		if (can see REQUESTEVENT written)
...				   notice the request from X [S]
wait for the event [W]		...
   write barrier
   write WAITSTART		read barrier
   actual wait			if (can see WAITSTART written)
				   consider it's on the way to the event
...				
				...
				finally the event [E]
				   consider all waits in context A so far
				   that could see WAITSTART written

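[Editor's note: the protocol in the two diagrams can be modelled with
C11 release/acquire atomics. This is a standalone sketch of the ordering
argument, not Dept's implementation; the flag names are taken from the
diagrams:]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Flags modelling the protocol in the diagrams; the names REQUESTEVENT
 * and WAITSTART come from the diagrams, not from Dept's code. */
static atomic_bool requestevent;
static atomic_bool waitstart;

/* Context X: request the event, then publish the start of the wait.
 * The release store plays the role of the "write barrier" above. */
static void context_x_wait(void)
{
	atomic_store_explicit(&requestevent, true, memory_order_relaxed);
	atomic_store_explicit(&waitstart, true, memory_order_release);
	/* ... the actual wait would block here ... */
}

/* Context A: the acquire load plays the role of the "read barrier".
 * If WAITSTART is visible, REQUESTEVENT must be visible too, so it is
 * safe to consider this context "on the way to the event". */
static bool context_a_sees_wait(void)
{
	if (atomic_load_explicit(&waitstart, memory_order_acquire))
		return atomic_load_explicit(&requestevent,
					    memory_order_relaxed);
	return false;
}
```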
Thanks,
Byungchul

> > [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009624]
> > [    7.009625] context B
> > [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009628]
> > [    7.009629] [S]: start of the event context
> > [    7.009629] [W]: the wait blocked
> > [    7.009630] [E]: the event not reachable
> > [    7.009631] ---------------------------------------------------
> > [    7.009631] context A's detail
> > [    7.009632] ---------------------------------------------------
> > [    7.009632] context A
> > [    7.009633]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> > [    7.009634]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009635]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009636]
> > [    7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
> > [    7.009638] (N/A)
> > [    7.009638]
> > [    7.009639] [W] down_write(&ei->i_data_sem:0):
> > [    7.009639] ext4_truncate (fs/ext4/inode.c:4187) 
> > [    7.009645] stacktrace:
> > [    7.009646] down_write (kernel/locking/rwsem.c:1514) 
> > [    7.009648] ext4_truncate (fs/ext4/inode.c:4187) 
> > [    7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009652] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009657] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009662] vfs_write (fs/read_write.c:590) 
> > [    7.009663] ksys_write (fs/read_write.c:644) 
> > [    7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009669]
> > [    7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
> > [    7.009671] __wake_up_common (kernel/sched/wait.c:108) 
> > [    7.009673] stacktrace:
> > [    7.009674] dept_event (kernel/dependency/dept.c:2337) 
> > [    7.009677] __wake_up_common (kernel/sched/wait.c:109) 
> > [    7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> > [    7.009681] ext4_orphan_del (fs/ext4/orphan.c:282) 
> > [    7.009683] ext4_truncate (fs/ext4/inode.c:4212) 
> > [    7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009687] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009690] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009694] vfs_write (fs/read_write.c:590) 
> > [    7.009695] ksys_write (fs/read_write.c:644) 
> > [    7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009700] ---------------------------------------------------
> > [    7.009700] context B's detail
> > [    7.009701] ---------------------------------------------------
> > [    7.009702] context B
> > [    7.009702]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009703]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009704]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009705]
> > [    7.009706] [S] down_read(&ei->i_data_sem:0):
> > [    7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> > [    7.009709] stacktrace:
> > [    7.009709] down_read (kernel/locking/rwsem.c:1461) 
> > [    7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> > [    7.009712] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009714] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009718] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009722] __ext4_find_entry (fs/ext4/namei.c:1571) 
> > [    7.009723] ext4_lookup (fs/ext4/namei.c:1770) 
> > [    7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> > [    7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> > [    7.009729] do_filp_open (fs/namei.c:3637) 
> > [    7.009731] do_sys_openat2 (fs/open.c:1215) 
> > [    7.009732] do_sys_open (fs/open.c:1231) 
> > [    7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009738]
> > [    7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
> > [    7.009739] prepare_to_wait (kernel/sched/wait.c:275) 
> > [    7.009741] stacktrace:
> > [    7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> > [    7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> > [    7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418) 
> > [    7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210) 
> > [    7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49) 
> > [    7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65) 
> > [    7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201) 
> > [    7.009752] __read_extent_tree_block (fs/ext4/extents.c:545) 
> > [    7.009754] ext4_find_extent (fs/ext4/extents.c:928) 
> > [    7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099) 
> > [    7.009757] ext4_map_blocks (fs/ext4/inode.c:563) 
> > [    7.009759] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009760] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009764] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009767]
> > [    7.009768] [E] up_read(&ei->i_data_sem:0):
> > [    7.009769] ext4_map_blocks (fs/ext4/inode.c:593) 
> > [    7.009771] stacktrace:
> > [    7.009771] up_read (kernel/locking/rwsem.c:1556) 
> > [    7.009774] ext4_map_blocks (fs/ext4/inode.c:593) 
> > [    7.009775] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009777] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009780] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009784] __ext4_find_entry (fs/ext4/namei.c:1571) 
> > [    7.009786] ext4_lookup (fs/ext4/namei.c:1770) 
> > [    7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> > [    7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> > [    7.009791] do_filp_open (fs/namei.c:3637) 
> > [    7.009792] do_sys_openat2 (fs/open.c:1215) 
> > [    7.009794] do_sys_open (fs/open.c:1231) 
> > [    7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009799] ---------------------------------------------------
> > [    7.009800] information that might be helpful
> > [    7.009800] ---------------------------------------------------
> > [    7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G        W         5.17.0-rc1-00014-g8a599299c0cb-dirty #30
> > [    7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [    7.009805] Call Trace:
> > [    7.009806]  <TASK>
> > [    7.009807] dump_stack_lvl (lib/dump_stack.c:107) 
> > [    7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
> > [    7.009812] ? print_circle (kernel/dependency/dept.c:1086) 
> > [    7.009814] cb_check_dl (kernel/dependency/dept.c:1104) 
> > [    7.009815] bfs (kernel/dependency/dept.c:860) 
> > [    7.009818] add_dep (kernel/dependency/dept.c:1423) 
> > [    7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650) 
> > [    7.009822] ? __wake_up_common (kernel/sched/wait.c:108) 
> > [    7.009824] dept_event (kernel/dependency/dept.c:2337) 
> > [    7.009826] __wake_up_common (kernel/sched/wait.c:109) 
> > [    7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> > [    7.009832] ext4_orphan_del (fs/ext4/orphan.c:282) 
> > [    7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478) 
> > [    7.009837] ext4_truncate (fs/ext4/inode.c:4212) 
> > [    7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009842] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009848] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009854] vfs_write (fs/read_write.c:590) 
> > [    7.009856] ksys_write (fs/read_write.c:644) 
> > [    7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65) 
> > [    7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009865] RIP: 0033:0x7f3b160b335d
> > [ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
> > All code
> > ========
> >    0:	e1 20                	loope  0x22
> >    2:	00 00                	add    %al,(%rax)
> >    4:	75 10                	jne    0x16
> >    6:	b8 01 00 00 00       	mov    $0x1,%eax
> >    b:	0f 05                	syscall 
> >    d:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
> >   13:	73 31                	jae    0x46
> >   15:	c3                   	retq   
> >   16:	48 83 ec 08          	sub    $0x8,%rsp
> >   1a:	e8 ce fa ff ff       	callq  0xfffffffffffffaed
> >   1f:	48 89 04 24          	mov    %rax,(%rsp)
> >   23:	b8 01 00 00 00       	mov    $0x1,%eax
> >   28:	0f 05                	syscall 
> >   2a:*	48 8b 3c 24          	mov    (%rsp),%rdi		<-- trapping instruction
> >   2e:	48 89 c2             	mov    %rax,%rdx
> >   31:	e8 17 fb ff ff       	callq  0xfffffffffffffb4d
> >   36:	48 89 d0             	mov    %rdx,%rax
> >   39:	48 83 c4 08          	add    $0x8,%rsp
> >   3d:	48                   	rex.W
> >   3e:	3d                   	.byte 0x3d
> >   3f:	01                   	.byte 0x1
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >    0:	48 8b 3c 24          	mov    (%rsp),%rdi
> >    4:	48 89 c2             	mov    %rax,%rdx
> >    7:	e8 17 fb ff ff       	callq  0xfffffffffffffb23
> >    c:	48 89 d0             	mov    %rdx,%rax
> >    f:	48 83 c4 08          	add    $0x8,%rsp
> >   13:	48                   	rex.W
> >   14:	3d                   	.byte 0x3d
> >   15:	01                   	.byte 0x1
> > [    7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > [    7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
> > [    7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
> > [    7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
> > [    7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
> > [    7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
> > [    7.009879]  </TASK>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 17:00   ` Steven Rostedt
  2022-02-17 17:06     ` Matthew Wilcox
@ 2022-02-18  4:19     ` Theodore Ts'o
  2022-02-19 10:34       ` Byungchul Park
  2022-02-19 10:18     ` Byungchul Park
  2 siblings, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-02-18  4:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Byungchul Park, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> 
> I personally believe that there's potential that this can be helpful and we
> will want to merge it.
> 
> But, what I believe Ted is trying to say is, if you do not know if the
> report is a bug or not, please do not ask the maintainers to determine it
> for you. This is a good opportunity for you to look to see why your tool
> reported an issue, and learn that subsystem. Look at if this is really a
> bug or not, and investigate why.

I agree there's potential here, or I would have ignored the ext4 "bug
report".

When we can get rid of the false positives, I think it should be
merged; I'd just rather it not be merged until after the false
positives are fixed, since otherwise, someone well-meaning will start
using it with Syzkaller, and noise that maintainers need to deal with
(with people requesting reverts of two year old commits, etc) will
increase by a factor of ten or more.  (With Syzbot reproducers that
set up random cgroups, IP tunnels with wireguard enabled, FUSE stress
testers, etc., that file system maintainers will be asked to try to
disentangle.)

So from a maintainer's perspective, false positives are highly
negative.  It may be that from some people's POV, one bug found and 20
false positives might still be "useful".  But if your tool gains a
reputation of not valuing maintainers' time, it's just going to make
us (or at least me :-) cranky, and it's going to be very hard to
recover from that perception.  So it's probably better to be very
conservative and careful in polishing it before asking for it to be
merged.

Cheers,

						- Ted


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/16] dept: Implement Dept(Dependency Tracker)
  2022-02-17 17:36   ` Steven Rostedt
@ 2022-02-18  6:09     ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-18  6:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 12:36:56PM -0500, Steven Rostedt wrote:
> > +struct dept_ecxt;
> > +struct dept_iecxt {
> > +	struct dept_ecxt *ecxt;
> > +	int enirq;
> > +	bool staled; /* for preventing to add a new ecxt */
> > +};
> > +
> > +struct dept_wait;
> > +struct dept_iwait {
> > +	struct dept_wait *wait;
> > +	int irq;
> > +	bool staled; /* for preventing to add a new wait */
> > +	bool touched;
> > +};
> 
> Nit. It makes it easier to read (and then review) if structures are spaced
> where their fields are all lined up:
> 
> struct dept_iecxt {
> 	struct dept_ecxt		*ecxt;
> 	int				enirq;
> 	bool				staled;
> };
> 
> struct dept_iwait {
> 	struct dept_wait		*wait;
> 	int				irq;
> 	bool				staled;
> 	bool				touched;
> };
> 
> See, the fields stand out, and is nicer on the eyes. Especially for those
> of us that are getting up in age, and our eyes do not work as well as they
> use to ;-)

Sure! I will apply this.

> > + * ---
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your ootion) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, you can access it online at
> > + * http://www.gnu.org/licenses/gpl-2.0.html.
> 
> The SPDX at the top of the file is all that is needed. Please remove this
> boiler plate. We do not use GPL boiler plates in the kernel anymore. The
> SPDX code supersedes that.

Thank you for letting me know!

> > +/*
> > + * Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
> > + * enabled because DEPT never race with NMI by nesting control.
> 
>                          "never races with"

Good eyes!

> Although, I'm confused by what you mean with "by nesting control".

I should've expressed it more clearly. I meant that an NMI and other
contexts never run inside Dept concurrently on the same CPU, because
reentrance is prevented.

> > +static void initialize_class(struct dept_class *c)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < DEPT_IRQS_NR; i++) {
> > +		struct dept_iecxt *ie = &c->iecxt[i];
> > +		struct dept_iwait *iw = &c->iwait[i];
> > +
> > +		ie->ecxt = NULL;
> > +		ie->enirq = i;
> > +		ie->staled = false;
> > +
> > +		iw->wait = NULL;
> > +		iw->irq = i;
> > +		iw->staled = false;
> > +		iw->touched = false;
> > +	}
> > +	c->bfs_gen = 0U;
> 
> Is the U really necessary?

Is it really harmful? I'd like to keep it if it's harmless, because
the U suffix lets us see the data type of ->bfs_gen correctly at a
glance. Or am I missing some reason why I should remove it?

Thank you very much, Steven.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
  2022-02-17 17:00   ` Steven Rostedt
@ 2022-02-19  9:54   ` Byungchul Park
  1 sibling, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-19  9:54 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente,
	josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams,
	hch, djwong, dri-devel, airlied, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 10:51:09AM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 07:57:36PM +0900, Byungchul Park wrote:
> > 
> > I've got several reports from the tool. Some of them look like false
> > alarms and some others look like real deadlock possibility. Because of
> > my unfamiliarity with the domain, it's hard to confirm if it's a real one.
> > Let me add the reports on this email thread.
> 
> The problem is we have so many potentially invalid, or
> so-rare-as-to-be-not-worth-the-time-to-investigate-in-the-
> grand-scheme-of-all-of-the-fires-burning-on-maintainers laps that it's
> really not reasonable to ask maintainers to determine whether

Even though I may have been wrong, and may be wrong again, you come
across as arrogant. You were hasty to judge and tried to walk over me.

I reported it because I thought it was a real problem but couldn't
confirm it. The other reports, which I thought were not real, I didn't
even mention. If you are talking about the previous report, then I'm
sorry, as I told you. I had only skimmed through the part about the
waits...

Basically, I respect you and appreciate your feedback. I hope you
don't get me wrong.

> Looking at the second ext4 report, it doesn't make any sense.  Context
> A is the kjournald thread.  We don't do a commit until (a) the timeout
> expires, or (b) someone explicitly requests that a commit happen
> waking up j_wait_commit.  I'm guessing that complaint here is that
> DEPT thinks nothing is explicitly requesting a wake up.  But note that
> after 5 seconds (or whatever journal->j_commit_interval) is configured
> to be we *will* always start a commit.  So ergo, there can't be a deadlock.

Yeah, it might not be a *deadlock deadlock*, because the wait will be
woken up anyway by one of the wake-up points you mentioned. However,
the dependency still looks problematic, because the three contexts
participating in the dependency chain would be stuck for a while until
one of them eventually wakes the others up. I bet that is not what you
intended.

Again, it's not critical, but it is problematic. Or am I missing something?

> At a higher level of discussion, it's an unfair tax on maintainer's
> times to ask maintainers to help you debug DEPT for you.  Tools like
> Syzkaller and DEPT are useful insofar as they save us time in making
> our subsystems better.  But until you can prove that it's not going to
> be a massive denial of service attack on maintainer's time, at the

I partially agree. I'd understand even if you don't support Dept until
you think it's valuable enough. However, let me keep asking things of
the fs folks rather than you, even though I'd cc you on it.

> If you know there there "appear to be false positives", you need to
> make sure you've tracked them all down before trying to ask that this
> be merged.

To track them all down, I need to ask on LKML, because Dept runs
cleanly on my own system. I don't want it to be merged with a lot of
false positives still in there, either.

> You may also want to add some documentation about why we should trust
> this; in particular for wait channels, when a process calls schedule()
> there may be multiple reasons why the thread will wake up --- in the
> worst case, such as in the select(2) or epoll(2) system call, there
> may be literally thousands of reasons (one for every file descriptor
> the select is waiting on) --- why the process will wake up and thus
> resolve the potential "deadlock" that DEPT is worrying about.  How is
> DEPT going to handle those cases?  If the answer is that things need

Thank you for the information, but I don't understand which case
concerns you. I'd like to ask you for a specific scenario so that we
can discuss it further - I guess I could probably answer it, but I
won't press you. Just give me an instance if you think it's worthwhile.

You look like someone who unconditionally blames new things before
understanding them, rather than asking and discussing. Again, I also
think nobody has to spend their time on what they think is not
worthwhile.

> I know that you're trying to help us, but this tool needs to be far
> better than Lockdep before we should think about merging it.  Even if
> it finds 5% more potential deadlocks, if it creates 95% more false

If so, it certainly should not get merged, but that sounds too
sarcastic. Let's see whether it really creates 95% false positives. If
that's true and I can't fix it, I will give up. That's what I should do.

There are a lot of factors in judging how valuable Dept is. Dept would
be especially useful in the middle of development, rather than on the
final state in the tree. I'd appreciate it if you'd consider that side
of it, too.

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 17:06     ` Matthew Wilcox
@ 2022-02-19 10:05       ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-19 10:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Steven Rostedt, Theodore Ts'o, torvalds, damien.lemoal,
	linux-ide, adilger.kernel, linux-ext4, mingo, linux-kernel,
	peterz, will, tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Feb 17, 2022 at 05:06:18PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> > On Thu, 17 Feb 2022 10:51:09 -0500
> > "Theodore Ts'o" <tytso@mit.edu> wrote:
> > 
> > > I know that you're trying to help us, but this tool needs to be far
> > > better than Lockdep before we should think about merging it.  Even if
> > > it finds 5% more potential deadlocks, if it creates 95% more false
> > > positive reports --- and the ones it finds are crazy things that
> > > rarely actually happen in practice, are the costs worth the benefits?
> > > And who is bearing the costs, and who is receiving the benefits?
> > 
> > I personally believe that there's potential that this can be helpful and we
> > will want to merge it.
> > 
> > But, what I believe Ted is trying to say is, if you do not know if the
> > report is a bug or not, please do not ask the maintainers to determine it
> > for you. This is a good opportunity for you to look to see why your tool
> > reported an issue, and learn that subsystem. Look at if this is really a
> > bug or not, and investigate why.
> 
> I agree with Steven here, to the point where I'm willing to invest some
> time being a beta-tester for this, so if you focus your efforts on
> filesystem/mm kinds of problems, I can continue looking at them and
> tell you what's helpful and what's unhelpful in the reports.

I appreciate your support. I'll do my best to make it *THE* perfect
tool for that purpose. I'd feel great if it ends up helping a lot and
saving you guys' time, even though it might not do so yet.

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-17 17:00   ` Steven Rostedt
  2022-02-17 17:06     ` Matthew Wilcox
  2022-02-18  4:19     ` Theodore Ts'o
@ 2022-02-19 10:18     ` Byungchul Park
  2 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-19 10:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Theodore Ts'o, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> On Thu, 17 Feb 2022 10:51:09 -0500
> "Theodore Ts'o" <tytso@mit.edu> wrote:
> 
> > I know that you're trying to help us, but this tool needs to be far
> > better than Lockdep before we should think about merging it.  Even if
> > it finds 5% more potential deadlocks, if it creates 95% more false
> > positive reports --- and the ones it finds are crazy things that
> > rarely actually happen in practice, are the costs worth the benefits?
> > And who is bearing the costs, and who is receiving the benefits?
> 
> I personally believe that there's potential that this can be helpful and we
> will want to merge it.
> 
> But, what I believe Ted is trying to say is, if you do not know if the
> report is a bug or not, please do not ask the maintainers to determine it
> for you. This is a good opportunity for you to look to see why your tool
> reported an issue, and learn that subsystem. Look at if this is really a
> bug or not, and investigate why.

I appreciate your feedback. I'll be more careful in reporting things,
and I think I need to make the tool more conservative...

> The likely/unlikely tracing I do finds issues all over the kernel. But
> before I report anything, I look at the subsystem and determine *why* it's
> reporting what it does. In some cases, it's just a config issue. Where, I
> may submit a patch saying "this is 100% wrong in X config, and we should
> just remove the "unlikely". But I did the due diligence to find out exactly
> what the issue is, and why the tooling reported what it reported.

I'll try my best to do things that way. However, the thing is that
there are few reports on my own system... That's why I shared Dept on
LKML.

> I want to stress that your Dept tooling looks to have the potential of
> being something that will be worth while including. But the false positives
> needs to be down to the rate of lockdep false positives. As Ted said, if
> it's reporting 95% false positives, nobody is going to look at the 5% of
> real bugs that it finds.

Agreed. Dept should not be merged if so. I'm not pushing ahead, but
I'm convinced that Dept does what a dependency tracker should do.
Let's see how valuable it is, especially in the middle of developing
something in the kernel.

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/16] DEPT(Dependency Tracker)
  2022-02-18  4:19     ` Theodore Ts'o
@ 2022-02-19 10:34       ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-19 10:34 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Steven Rostedt, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel,
	viro, jack, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Feb 17, 2022 at 11:19:15PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 12:00:05PM -0500, Steven Rostedt wrote:
> > 
> > I personally believe that there's potential that this can be helpful and we
> > will want to merge it.
> > 
> > But, what I believe Ted is trying to say is, if you do not know if the
> > report is a bug or not, please do not ask the maintainers to determine it
> > for you. This is a good opportunity for you to look to see why your tool
> > reported an issue, and learn that subsystem. Look at if this is really a
> > bug or not, and investigate why.
> 
> I agree there's potential here, or I would have ignored the ext4 "bug
> report".

I just checked this one. Appreciate it...

> When we can get rid of the false positives, I think it should be

Of course, false positives should be removed once they're found. I
will try my best to remove all of them on my own as much as possible.
However, the thing is I can't see any beyond what I can see on my own
system.

> merged; I'd just rather it not be merged until after the false
> positives are fixed, since otherwise, someone well-meaning will start
> using it with Syzkaller, and noise that maintainers need to deal with
> (with people requesting reverts of two year old commits, etc) will
> increase by a factor of ten or more.  (With Syzbot reproducers that

Agree.

> set up random cgroups, IP tunnels with wiregaurd enabled, FUSE stress
> testers, etc., that file system maintainers will be asked to try to
> disentangle.)
> 
> So from a maintainer's perspective, false positives are highly
> negative.  It may be that from some people's POV, one bug found and 20
> false positive might still be "useful".  But if your tool gains a
> reputation of not valuing maintainers' time, it's just going to make
> us (or at least me :-) cranky, and it's going to be very hard to

Agree.

> recover from perception.  So it's probably better to be very
> conservative and careful in polishing it before asking for it to be
> merged.

If it's true that there are too many false positives, like 95%, then
I'll fix those first for sure before asking for it to be merged. Let's
see.

To kernel developers,

I'd appreciate it if you'd let us know whether you see more real ones
than false positives while developing something in the kernel, so that
it's actually useful. Otherwise, it's hard to measure how many false
positives it reports, how valuable it is, and so on...

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-17 11:10   ` Report 2 " Byungchul Park
@ 2022-02-21 19:02     ` Jan Kara
  2022-02-23  0:35       ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Kara @ 2022-02-21 19:02 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa


So I have been trying to understand what this report is about for some
time, but honestly I have failed...

On Thu 17-02-22 20:10:04, Byungchul Park wrote:
> [    9.008161] ===================================================
> [    9.008163] DEPT: Circular dependency has been detected.
> [    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
> [    9.008166] ---------------------------------------------------
> [    9.008167] summary
> [    9.008167] ---------------------------------------------------
> [    9.008168] *** DEADLOCK ***
> [    9.008168]
> [    9.008168] context A
> [    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> [    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008173]
> [    9.008173] context B
> [    9.008174]     [S] down_write(mapping.invalidate_lock:0)
> [    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008176]     [E] up_write(mapping.invalidate_lock:0)
> [    9.008177]
> [    9.008178] context C
> [    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> [    9.008180]     [W] down_write(mapping.invalidate_lock:0)
> [    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> [    9.008181]
> [    9.008182] [S]: start of the event context
> [    9.008183] [W]: the wait blocked
> [    9.008183] [E]: the event not reachable

So what situation is your tool complaining about here? Can you perhaps show
it here in more common visualization like:

TASK1				TASK2
				does foo, grabs Z
does X, grabs lock Y
blocks on Z
				blocks on Y

or something like that? Because I was not able to decipher this from the
report even after trying for some time...

								Honza

				

> [    9.008184] ---------------------------------------------------
> [    9.008184] context A's detail
> [    9.008185] ---------------------------------------------------
> [    9.008186] context A
> [    9.008186]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008187]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> [    9.008188]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008189]
> [    9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0):
> [    9.008191] (N/A)
> [    9.008191]
> [    9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0):
> [    9.008193] prepare_to_wait (kernel/sched/wait.c:275) 
> [    9.008197] stacktrace:
> [    9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> [    9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> [    9.008201] kjournald2 (fs/jbd2/journal.c:250) 
> [    9.008203] kthread (kernel/kthread.c:377) 
> [    9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> [    9.008209]
> [    9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0):
> [    9.008210] __wake_up_common (kernel/sched/wait.c:108) 
> [    9.008212] stacktrace:
> [    9.008213] dept_event (kernel/dependency/dept.c:2337) 
> [    9.008215] __wake_up_common (kernel/sched/wait.c:109) 
> [    9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> [    9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
> [    9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
> [    9.008223] kthread (kernel/kthread.c:377) 
> [    9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> [    9.008226] ---------------------------------------------------
> [    9.008226] context B's detail
> [    9.008227] ---------------------------------------------------
> [    9.008228] context B
> [    9.008228]     [S] down_write(mapping.invalidate_lock:0)
> [    9.008229]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> [    9.008230]     [E] up_write(mapping.invalidate_lock:0)
> [    9.008231]
> [    9.008232] [S] down_write(mapping.invalidate_lock:0):
> [    9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> [    9.008237] stacktrace:
> [    9.008237] down_write (kernel/locking/rwsem.c:1514) 
> [    9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> [    9.008241] generic_perform_write (mm/filemap.c:3784) 
> [    9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    9.008245] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    9.008250] vfs_write (fs/read_write.c:590) 
> [    9.008251] ksys_write (fs/read_write.c:644) 
> [    9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    9.008258]
> [    9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0):
> [    9.008259] prepare_to_wait (kernel/sched/wait.c:275) 
> [    9.008261] stacktrace:
> [    9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> [    9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> [    9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184) 
> [    9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3)) 
> [    9.008267] start_this_handle (fs/jbd2/transaction.c:427) 
> [    9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526) 
> [    9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) 
> [    9.008273] ext4_truncate (fs/ext4/inode.c:4164) 
> [    9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> [    9.008276] generic_perform_write (mm/filemap.c:3784) 
> [    9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    9.008279] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    9.008283] vfs_write (fs/read_write.c:590) 
> [    9.008284] ksys_write (fs/read_write.c:644) 
> [    9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    9.008287]
> [    9.008288] [E] up_write(mapping.invalidate_lock:0):
> [    9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829) 
> [    9.008291] ---------------------------------------------------
> [    9.008291] context C's detail
> [    9.008292] ---------------------------------------------------
> [    9.008292] context C
> [    9.008293]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> [    9.008294]     [W] down_write(mapping.invalidate_lock:0)
> [    9.008295]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> [    9.008296]
> [    9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0):
> [    9.008298] (N/A)
> [    9.008298]
> [    9.008299] [W] down_write(mapping.invalidate_lock:0):
> [    9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> [    9.008302] stacktrace:
> [    9.008302] down_write (kernel/locking/rwsem.c:1514) 
> [    9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> [    9.008305] generic_perform_write (mm/filemap.c:3784) 
> [    9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    9.008309] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    9.008312] vfs_write (fs/read_write.c:590) 
> [    9.008314] ksys_write (fs/read_write.c:644) 
> [    9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    9.008316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    9.008318]
> [    9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0):
> [    9.008320] __wake_up_common (kernel/sched/wait.c:108) 
> [    9.008321] stacktrace:
> [    9.008322] __wake_up_common (kernel/sched/wait.c:109) 
> [    9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> [    9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508) 
> [    9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527) 
> [    9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560) 
> [    9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583) 
> [    9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3)) 
> [    9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1)) 
> [    9.008334] generic_perform_write (mm/filemap.c:3784) 
> [    9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> [    9.008337] ext4_file_write_iter (fs/ext4/file.c:677) 
> [    9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> [    9.008341] vfs_write (fs/read_write.c:590) 
> [    9.008342] ksys_write (fs/read_write.c:644) 
> [    9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> [    9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> [    9.008347] ---------------------------------------------------
> [    9.008348] information that might be helpful
> [    9.008348] ---------------------------------------------------
> [    9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G        W         5.17.0-rc1-00015-gb94f67143867-dirty #2
> [    9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [    9.008353] Call Trace:
> [    9.008354]  <TASK>
> [    9.008355] dump_stack_lvl (lib/dump_stack.c:107) 
> [    9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
> [    9.008360] ? print_circle (kernel/dependency/dept.c:1086) 
> [    9.008362] cb_check_dl (kernel/dependency/dept.c:1104) 
> [    9.008364] bfs (kernel/dependency/dept.c:860) 
> [    9.008366] add_dep (kernel/dependency/dept.c:1423) 
> [    9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651) 
> [    9.008370] ? __wake_up_common (kernel/sched/wait.c:108) 
> [    9.008372] dept_event (kernel/dependency/dept.c:2337) 
> [    9.008374] __wake_up_common (kernel/sched/wait.c:109) 
> [    9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> [    9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
> [    9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24) 
> [    9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301) 
> [    9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843) 
> [    9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) 
> [    9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) 
> [    9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239) 
> [    9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
> [    9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431) 
> [    9.008400] ? commit_timeout (fs/jbd2/journal.c:173) 
> [    9.008402] kthread (kernel/kthread.c:377) 
> [    9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332) 
> [    9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> [    9.008410]  </TASK>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
  2022-02-17 11:10   ` Report 2 " Byungchul Park
  2022-02-17 13:27   ` Report 1 " Matthew Wilcox
@ 2022-02-22  8:27   ` Jan Kara
  2022-02-23  1:40     ` Byungchul Park
  2022-02-23  3:30     ` Byungchul Park
  2 siblings, 2 replies; 67+ messages in thread
From: Jan Kara @ 2022-02-22  8:27 UTC (permalink / raw)
  To: Byungchul Park
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu 17-02-22 20:10:03, Byungchul Park wrote:
> [    7.009608] ===================================================
> [    7.009613] DEPT: Circular dependency has been detected.
> [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> [    7.009616] ---------------------------------------------------
> [    7.009617] summary
> [    7.009618] ---------------------------------------------------
> [    7.009618] *** DEADLOCK ***
> [    7.009618]
> [    7.009619] context A
> [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> [    7.009624]
> [    7.009625] context B
> [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> [    7.009628]

Looking into this I have noticed that Dept here tracks bitlocks (buffer
locks in particular) but it apparently treats locks on all buffers as one
locking class, so it conflates a lock on a superblock buffer with a lock
on an extent tree block buffer. These are vastly different locks with
different locking constraints. So to avoid false positives in filesystems
we will need to add annotations to differentiate locks on different
buffers (based on what the block is used for), similarly to how we e.g.
annotate i_rwsem for different inodes.
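
The effect of conflating lock classes can be sketched with a toy
dependency graph (illustrative only - this is not Dept's actual data
structure, and the lock names are just stand-ins): two dependencies that
are harmless when the buffer locks are distinct classes become a reported
cycle once both buffers collapse into one class.

```python
# Toy model: an edge X -> Y means "while holding/awaiting X, we waited
# on Y". A cycle in this graph is reported as a potential deadlock.
from collections import defaultdict

class DepGraph:
    def __init__(self):
        self.edges = defaultdict(set)

    def add_dep(self, held, waited):
        self.edges[held].add(waited)

    def has_cycle(self):
        # DFS with three colors; a GRAY successor is a back edge.
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def dfs(node):
            color[node] = GRAY
            for nxt in self.edges[node]:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE and dfs(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(dfs(n) for n in list(self.edges) if color[n] == WHITE)

# Separate classes: superblock buffer lock and extent-tree buffer lock
# are distinct nodes, and the two dependencies do not form a cycle.
fine = DepGraph()
fine.add_dep("sb_buffer_lock", "i_data_sem")
fine.add_dep("i_data_sem", "extent_buffer_lock")
assert not fine.has_cycle()

# Conflated: both buffer locks share one "buffer_lock" class, and the
# very same two dependencies now read as a cycle -> false positive.
conflated = DepGraph()
conflated.add_dep("buffer_lock", "i_data_sem")
conflated.add_dep("i_data_sem", "buffer_lock")
assert conflated.has_cycle()
```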

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-21 19:02     ` Jan Kara
@ 2022-02-23  0:35       ` Byungchul Park
  2022-02-23 14:48         ` Jan Kara
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-23  0:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote:
> On Thu 17-02-22 20:10:04, Byungchul Park wrote:
> > [    9.008161] ===================================================
> > [    9.008163] DEPT: Circular dependency has been detected.
> > [    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
> > [    9.008166] ---------------------------------------------------
> > [    9.008167] summary
> > [    9.008167] ---------------------------------------------------
> > [    9.008168] *** DEADLOCK ***
> > [    9.008168]
> > [    9.008168] context A
> > [    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008173]
> > [    9.008173] context B
> > [    9.008174]     [S] down_write(mapping.invalidate_lock:0)
> > [    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008176]     [E] up_write(mapping.invalidate_lock:0)
> > [    9.008177]
> > [    9.008178] context C
> > [    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008180]     [W] down_write(mapping.invalidate_lock:0)
> > [    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008181]
> > [    9.008182] [S]: start of the event context
> > [    9.008183] [W]: the wait blocked
> > [    9.008183] [E]: the event not reachable
> 
> So what situation is your tool complaining about here? Can you perhaps show
> it here in more common visualization like:

Sure.

> TASK1				TASK2
> 				does foo, grabs Z
> does X, grabs lock Y
> blocks on Z
> 				blocks on Y
> 
> or something like that? Because I was not able to decipher this from the
> report even after trying for some time...

KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)

wait A
--- stuck
			wait B
			--- stuck
						wait C
						--- stuck

wake up B		wake up C		wake up A

where:
A is a wait_queue, j_wait_commit
B is a wait_queue, j_wait_transaction_locked
C is a rwsem, mapping.invalidate_lock

The above is the simplest form. And it's worth noting that Dept focuses
on the waits and events themselves rather than on grabbing and releasing
things like locks. The following is a more descriptive form of it.

KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)

wait @j_wait_commit
			ext4_truncate_failed_write()
			   down_write(mapping.invalidate_lock)

			   ext4_truncate()
			      ...
			      wait @j_wait_transaction_locked

						ext4_truncate_failed_write()
						   down_write(mapping.invalidate_lock)

						ext4_should_retry_alloc()
						   ...
						   __jbd2_log_start_commit()
						      wake_up(j_wait_commit)
jbd2_journal_commit_transaction()
   wake_up(j_wait_transaction_locked)
			   up_write(mapping.invalidate_lock)

I hope this would help you understand the report.
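
The circular shape of the report reduces to a small wait-for graph
between the three contexts. A toy sketch (the object names are taken
from the report; the graph code itself is illustrative only, not how
Dept works internally):

```python
# Each context waits on one object and is expected to deliver another
# object's event, per the report's summary:
#   kjournald2: waits on j_wait_commit, delivers j_wait_transaction_locked
#   task1:      waits on j_wait_transaction_locked, delivers invalidate_lock
#   task2:      waits on invalidate_lock, delivers j_wait_commit
waits_on = {
    "kjournald2": "j_wait_commit",
    "task1": "j_wait_transaction_locked",
    "task2": "invalidate_lock",
}
delivers = {
    "kjournald2": "j_wait_transaction_locked",
    "task1": "invalidate_lock",
    "task2": "j_wait_commit",
}

# Edge x -> y: y waits on the object that only x is expected to deliver,
# so y cannot make progress until x does.
blocks = {
    x: {y for y in waits_on if waits_on[y] == delivers[x] and y != x}
    for x in delivers
}

def in_cycle(start):
    seen, frontier = set(), {start}
    while frontier:
        node = frontier.pop()
        for nxt in blocks[node]:
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.add(nxt)
    return False

# All three contexts sit on one cycle: each one's event can only be
# delivered after the context waiting for that event makes progress.
assert all(in_cycle(c) for c in ("kjournald2", "task1", "task2"))
```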

Yeah... This is what Dept complained about. And as Ted said, the kthread
would be woken up by another wakeup, so it's not a deadlock in the strict
sense. However, these three threads, and any other tasks waiting on any
of the events A, B or C, would be stuck for a while until that wakeup
eventually wakes up the kthread, kjournald2.

I guess it's not what ext4 is meant to do. Honestly, ext4 and the journal
system look so complicated that I'm not convinced, though...

Thanks,
Byungchul

> 
> 								Honza
> 
> 				
> 
> > [    9.008184] ---------------------------------------------------
> > [    9.008184] context A's detail
> > [    9.008185] ---------------------------------------------------
> > [    9.008186] context A
> > [    9.008186]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008187]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008188]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008189]
> > [    9.008190] [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0):
> > [    9.008191] (N/A)
> > [    9.008191]
> > [    9.008192] [W] wait(&(&journal->j_wait_commit)->dmap:0):
> > [    9.008193] prepare_to_wait (kernel/sched/wait.c:275) 
> > [    9.008197] stacktrace:
> > [    9.008198] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> > [    9.008200] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> > [    9.008201] kjournald2 (fs/jbd2/journal.c:250) 
> > [    9.008203] kthread (kernel/kthread.c:377) 
> > [    9.008206] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> > [    9.008209]
> > [    9.008209] [E] event(&(&journal->j_wait_transaction_locked)->dmap:0):
> > [    9.008210] __wake_up_common (kernel/sched/wait.c:108) 
> > [    9.008212] stacktrace:
> > [    9.008213] dept_event (kernel/dependency/dept.c:2337) 
> > [    9.008215] __wake_up_common (kernel/sched/wait.c:109) 
> > [    9.008217] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    9.008218] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
> > [    9.008221] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
> > [    9.008223] kthread (kernel/kthread.c:377) 
> > [    9.008224] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> > [    9.008226] ---------------------------------------------------
> > [    9.008226] context B's detail
> > [    9.008227] ---------------------------------------------------
> > [    9.008228] context B
> > [    9.008228]     [S] down_write(mapping.invalidate_lock:0)
> > [    9.008229]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> > [    9.008230]     [E] up_write(mapping.invalidate_lock:0)
> > [    9.008231]
> > [    9.008232] [S] down_write(mapping.invalidate_lock:0):
> > [    9.008233] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> > [    9.008237] stacktrace:
> > [    9.008237] down_write (kernel/locking/rwsem.c:1514) 
> > [    9.008239] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> > [    9.008241] generic_perform_write (mm/filemap.c:3784) 
> > [    9.008243] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    9.008245] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    9.008247] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    9.008250] vfs_write (fs/read_write.c:590) 
> > [    9.008251] ksys_write (fs/read_write.c:644) 
> > [    9.008253] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    9.008255] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    9.008258]
> > [    9.008258] [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0):
> > [    9.008259] prepare_to_wait (kernel/sched/wait.c:275) 
> > [    9.008261] stacktrace:
> > [    9.008261] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> > [    9.008263] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> > [    9.008264] wait_transaction_locked (fs/jbd2/transaction.c:184) 
> > [    9.008266] add_transaction_credits (fs/jbd2/transaction.c:248 (discriminator 3)) 
> > [    9.008267] start_this_handle (fs/jbd2/transaction.c:427) 
> > [    9.008269] jbd2__journal_start (fs/jbd2/transaction.c:526) 
> > [    9.008271] __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) 
> > [    9.008273] ext4_truncate (fs/ext4/inode.c:4164) 
> > [    9.008274] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    9.008276] generic_perform_write (mm/filemap.c:3784) 
> > [    9.008277] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    9.008279] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    9.008281] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    9.008283] vfs_write (fs/read_write.c:590) 
> > [    9.008284] ksys_write (fs/read_write.c:644) 
> > [    9.008285] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    9.008287]
> > [    9.008288] [E] up_write(mapping.invalidate_lock:0):
> > [    9.008288] ext4_da_get_block_prep (fs/ext4/inode.c:1795 fs/ext4/inode.c:1829) 
> > [    9.008291] ---------------------------------------------------
> > [    9.008291] context C's detail
> > [    9.008292] ---------------------------------------------------
> > [    9.008292] context C
> > [    9.008293]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008294]     [W] down_write(mapping.invalidate_lock:0)
> > [    9.008295]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> > [    9.008296]
> > [    9.008297] [S] (unknown)(&(&journal->j_wait_commit)->dmap:0):
> > [    9.008298] (N/A)
> > [    9.008298]
> > [    9.008299] [W] down_write(mapping.invalidate_lock:0):
> > [    9.008299] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> > [    9.008302] stacktrace:
> > [    9.008302] down_write (kernel/locking/rwsem.c:1514) 
> > [    9.008304] ext4_da_write_begin (fs/ext4/truncate.h:21 fs/ext4/inode.c:2963) 
> > [    9.008305] generic_perform_write (mm/filemap.c:3784) 
> > [    9.008307] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    9.008309] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    9.008311] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    9.008312] vfs_write (fs/read_write.c:590) 
> > [    9.008314] ksys_write (fs/read_write.c:644) 
> > [    9.008315] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    9.008316] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    9.008318]
> > [    9.008319] [E] event(&(&journal->j_wait_commit)->dmap:0):
> > [    9.008320] __wake_up_common (kernel/sched/wait.c:108) 
> > [    9.008321] stacktrace:
> > [    9.008322] __wake_up_common (kernel/sched/wait.c:109) 
> > [    9.008323] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    9.008324] __jbd2_log_start_commit (fs/jbd2/journal.c:508) 
> > [    9.008326] jbd2_log_start_commit (fs/jbd2/journal.c:527) 
> > [    9.008327] __jbd2_journal_force_commit (fs/jbd2/journal.c:560) 
> > [    9.008329] jbd2_journal_force_commit_nested (fs/jbd2/journal.c:583) 
> > [    9.008331] ext4_should_retry_alloc (fs/ext4/balloc.c:670 (discriminator 3)) 
> > [    9.008332] ext4_da_write_begin (fs/ext4/inode.c:2965 (discriminator 1)) 
> > [    9.008334] generic_perform_write (mm/filemap.c:3784) 
> > [    9.008335] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    9.008337] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    9.008339] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    9.008341] vfs_write (fs/read_write.c:590) 
> > [    9.008342] ksys_write (fs/read_write.c:644) 
> > [    9.008343] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    9.008345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    9.008347] ---------------------------------------------------
> > [    9.008348] information that might be helpful
> > [    9.008348] ---------------------------------------------------
> > [    9.008349] CPU: 0 PID: 89 Comm: jbd2/sda1-8 Tainted: G        W         5.17.0-rc1-00015-gb94f67143867-dirty #2
> > [    9.008352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [    9.008353] Call Trace:
> > [    9.008354]  <TASK>
> > [    9.008355] dump_stack_lvl (lib/dump_stack.c:107) 
> > [    9.008358] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
> > [    9.008360] ? print_circle (kernel/dependency/dept.c:1086) 
> > [    9.008362] cb_check_dl (kernel/dependency/dept.c:1104) 
> > [    9.008364] bfs (kernel/dependency/dept.c:860) 
> > [    9.008366] add_dep (kernel/dependency/dept.c:1423) 
> > [    9.008368] do_event.isra.25 (kernel/dependency/dept.c:1651) 
> > [    9.008370] ? __wake_up_common (kernel/sched/wait.c:108) 
> > [    9.008372] dept_event (kernel/dependency/dept.c:2337) 
> > [    9.008374] __wake_up_common (kernel/sched/wait.c:109) 
> > [    9.008376] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    9.008379] jbd2_journal_commit_transaction (fs/jbd2/commit.c:583) 
> > [    9.008381] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24) 
> > [    9.008385] ? ret_from_fork (arch/x86/entry/entry_64.S:301) 
> > [    9.008387] ? dept_enable_hardirq (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:1843) 
> > [    9.008389] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:45 ./arch/x86/include/asm/irqflags.h:80 ./arch/x86/include/asm/irqflags.h:138 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) 
> > [    9.008392] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) 
> > [    9.008394] ? try_to_del_timer_sync (kernel/time/timer.c:1239) 
> > [    9.008396] kjournald2 (fs/jbd2/journal.c:214 (discriminator 3)) 
> > [    9.008398] ? prepare_to_wait_exclusive (kernel/sched/wait.c:431) 
> > [    9.008400] ? commit_timeout (fs/jbd2/journal.c:173) 
> > [    9.008402] kthread (kernel/kthread.c:377) 
> > [    9.008404] ? kthread_complete_and_exit (kernel/kthread.c:332) 
> > [    9.008407] ret_from_fork (arch/x86/entry/entry_64.S:301) 
> > [    9.008410]  </TASK>
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-22  8:27   ` Jan Kara
@ 2022-02-23  1:40     ` Byungchul Park
  2022-02-23  3:30     ` Byungchul Park
  1 sibling, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-23  1:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Tue, Feb 22, 2022 at 09:27:23AM +0100, Jan Kara wrote:
> On Thu 17-02-22 20:10:03, Byungchul Park wrote:
> > [    7.009608] ===================================================
> > [    7.009613] DEPT: Circular dependency has been detected.
> > [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> > [    7.009616] ---------------------------------------------------
> > [    7.009617] summary
> > [    7.009618] ---------------------------------------------------
> > [    7.009618] *** DEADLOCK ***
> > [    7.009618]
> > [    7.009619] context A
> > [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> > [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009624]
> > [    7.009625] context B
> > [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009628]
> 
> Looking into this I have noticed that Dept here tracks bitlocks (buffer
> locks in particular) but it apparently treats locks on all buffers as one
> locking class so it conflates lock on superblock buffer with a lock on
> extent tree block buffer. These are vastly different locks with different
> locking constraints. So to avoid false positives in filesystems we will
> need to add annotations to differentiate locks on different buffers (based
> on what the block is used for). Similarly how we e.g. annotate i_rwsem for

Exactly, yes. All synchronization objects should be classified by what
they are used for. They are already classified by the location of the
code initializing the object, and roughly speaking we can expect objects
initialized at the same place to share the same constraints - but in
practice we assign different constraints depending on subtle design
details, especially in file systems.

It would also help the code have better documentation ;-) I'm willing to
add annotations for that to fs.

> different inodes.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: Report 1 in ext4 and journal based on v5.17-rc1
  2022-02-22  8:27   ` Jan Kara
  2022-02-23  1:40     ` Byungchul Park
@ 2022-02-23  3:30     ` Byungchul Park
  1 sibling, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-02-23  3:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Tue, Feb 22, 2022 at 09:27:23AM +0100, Jan Kara wrote:
> On Thu 17-02-22 20:10:03, Byungchul Park wrote:
> > [    7.009608] ===================================================
> > [    7.009613] DEPT: Circular dependency has been detected.
> > [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> > [    7.009616] ---------------------------------------------------
> > [    7.009617] summary
> > [    7.009618] ---------------------------------------------------
> > [    7.009618] *** DEADLOCK ***
> > [    7.009618]
> > [    7.009619] context A
> > [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> > [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009624]
> > [    7.009625] context B
> > [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009628]
> 
> Looking into this I have noticed that Dept here tracks bitlocks (buffer
> locks in particular) but it apparently treats locks on all buffers as one
> locking class so it conflates lock on superblock buffer with a lock on
> extent tree block buffer. These are vastly different locks with different
> locking constraints. So to avoid false positives in filesystems we will
> need to add annotations to differentiate locks on different buffers (based
> on what the block is used for). Similarly how we e.g. annotate i_rwsem for
> different inodes.

Hi Jan Kara,

I just understood why some guys in this space got mad at Dept's reports.
I barely got reports involving the lock you mentioned on my system -
precisely speaking, only one, even though I've been rebooting my system
many times. But another report that someone gave me showed there were a
lot of reports from that lock.

Your comment and the report are very helpful. I need to assign each
buffer lock its own class first. Thank you very much.

Thanks,
Byungchul

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-23  0:35       ` Byungchul Park
@ 2022-02-23 14:48         ` Jan Kara
  2022-02-24  1:11           ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Kara @ 2022-02-23 14:48 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	tytso, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Wed 23-02-22 09:35:34, Byungchul Park wrote:
> On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote:
> > On Thu 17-02-22 20:10:04, Byungchul Park wrote:
> > > [    9.008161] ===================================================
> > > [    9.008163] DEPT: Circular dependency has been detected.
> > > [    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
> > > [    9.008166] ---------------------------------------------------
> > > [    9.008167] summary
> > > [    9.008167] ---------------------------------------------------
> > > [    9.008168] *** DEADLOCK ***
> > > [    9.008168]
> > > [    9.008168] context A
> > > [    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > [    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> > > [    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > [    9.008173]
> > > [    9.008173] context B
> > > [    9.008174]     [S] down_write(mapping.invalidate_lock:0)
> > > [    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > [    9.008176]     [E] up_write(mapping.invalidate_lock:0)
> > > [    9.008177]
> > > [    9.008178] context C
> > > [    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> > > [    9.008180]     [W] down_write(mapping.invalidate_lock:0)
> > > [    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> > > [    9.008181]
> > > [    9.008182] [S]: start of the event context
> > > [    9.008183] [W]: the wait blocked
> > > [    9.008183] [E]: the event not reachable
> > 
> > So what situation is your tool complaining about here? Can you perhaps show
> > it here in more common visualization like:
> 
> Sure.
> 
> > TASK1				TASK2
> > 				does foo, grabs Z
> > does X, grabs lock Y
> > blocks on Z
> > 				blocks on Y
> > 
> > or something like that? Because I was not able to decipher this from the
> > report even after trying for some time...
> 
> KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> 
> wait A
> --- stuck
> 			wait B
> 			--- stuck
> 						wait C
> 						--- stuck
> 
> wake up B		wake up C		wake up A
> 
> where:
> A is a wait_queue, j_wait_commit
> B is a wait_queue, j_wait_transaction_locked
> C is a rwsem, mapping.invalidate_lock

I see. But a situation like this is not necessarily a guarantee of a
deadlock, is it? I mean there can be task D that will eventually call say
'wake up B' and unblock everything and this is how things were designed to
work? Multiple sources of wakeups are quite common I'd say... What does
Dept do to prevent false reports in cases like this?

> The above is the simplest form. And it's worth noting that Dept focuses
> on wait and event itself rather than grabbing and releasing things like
> lock. The following is the more descriptive form of it.
> 
> KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> 
> wait @j_wait_commit
> 			ext4_truncate_failed_write()
> 			   down_write(mapping.invalidate_lock)
> 
> 			   ext4_truncate()
> 			      ...
> 			      wait @j_wait_transaction_locked
> 
> 						ext4_truncate_failed_write()
> 						   down_write(mapping.invalidate_lock)
> 
> 						ext4_should_retry_alloc()
> 						   ...
> 						   __jbd2_log_start_commit()
> 						      wake_up(j_wait_commit)
> jbd2_journal_commit_transaction()
>    wake_up(j_wait_transaction_locked)
> 			   up_write(mapping.invalidate_lock)
> 
> I hope this would help you understand the report.

I see, thanks for explanation! So the above scenario is impossible because
for anyone to block on @j_wait_transaction_locked the transaction must be
committing, which is done only by kjournald2 kthread and so that thread
cannot be waiting at @j_wait_commit. Essentially blocking on
@j_wait_transaction_locked means @j_wait_commit wakeup was already done.

I guess this shows there can be non-trivial dependencies between wait
queues which are difficult to track in an automated way and without such
tracking we are going to see false positives...
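
The point that a cycle alone does not prove a deadlock can be shown on
the same kind of toy wait-for model (again illustrative only, not Dept's
algorithm): a waiter is truly stuck only if every potential waker of the
object it waits on is itself transitively stuck, so one independent,
unblocked wakeup source anywhere on the cycle dissolves the "deadlock".

```python
def stuck(ctx, waits_on, wakers_of):
    """ctx is stuck iff every potential waker of the object it waits on
    is itself stuck; a context that waits on nothing is never stuck."""
    def helper(c, on_path):
        if waits_on.get(c) is None:
            return False                    # runnable: can deliver events
        if c in on_path:
            return True                     # only in-cycle wakers so far
        wakers = wakers_of.get(waits_on[c], [])
        if not wakers:
            return True                     # nobody will ever wake it
        return all(helper(w, on_path | {c}) for w in wakers)
    return helper(ctx, frozenset())

# A, B, C stand in for kjournald2, TASK1, TASK2 from the report.
waits_on = {"A": "commit", "B": "txn_locked", "C": "inval_lock"}

# Only in-cycle wakers: A wakes txn_locked, B wakes inval_lock,
# C wakes commit -> a genuine circular deadlock.
cycle_only = {"txn_locked": ["A"], "inval_lock": ["B"], "commit": ["C"]}
assert stuck("A", waits_on, cycle_only)

# Add an independent task D (blocked on nothing) that may also wake
# commit: the cycle still exists, but nobody is permanently stuck.
with_d = {"txn_locked": ["A"], "inval_lock": ["B"], "commit": ["C", "D"]}
assert not stuck("A", waits_on, with_d)
```

This is exactly the hard part: the tracker would need to know that such
an extra wakeup source exists and will eventually run.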

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-23 14:48         ` Jan Kara
@ 2022-02-24  1:11           ` Byungchul Park
  2022-02-24 10:22             ` Jan Kara
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-24  1:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> On Wed 23-02-22 09:35:34, Byungchul Park wrote:
> > On Mon, Feb 21, 2022 at 08:02:04PM +0100, Jan Kara wrote:
> > > On Thu 17-02-22 20:10:04, Byungchul Park wrote:
> > > > [    9.008161] ===================================================
> > > > [    9.008163] DEPT: Circular dependency has been detected.
> > > > [    9.008164] 5.17.0-rc1-00015-gb94f67143867-dirty #2 Tainted: G        W
> > > > [    9.008166] ---------------------------------------------------
> > > > [    9.008167] summary
> > > > [    9.008167] ---------------------------------------------------
> > > > [    9.008168] *** DEADLOCK ***
> > > > [    9.008168]
> > > > [    9.008168] context A
> > > > [    9.008169]     [S] (unknown)(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > > [    9.008171]     [W] wait(&(&journal->j_wait_commit)->dmap:0)
> > > > [    9.008172]     [E] event(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > > [    9.008173]
> > > > [    9.008173] context B
> > > > [    9.008174]     [S] down_write(mapping.invalidate_lock:0)
> > > > [    9.008175]     [W] wait(&(&journal->j_wait_transaction_locked)->dmap:0)
> > > > [    9.008176]     [E] up_write(mapping.invalidate_lock:0)
> > > > [    9.008177]
> > > > [    9.008178] context C
> > > > [    9.008179]     [S] (unknown)(&(&journal->j_wait_commit)->dmap:0)
> > > > [    9.008180]     [W] down_write(mapping.invalidate_lock:0)
> > > > [    9.008181]     [E] event(&(&journal->j_wait_commit)->dmap:0)
> > > > [    9.008181]
> > > > [    9.008182] [S]: start of the event context
> > > > [    9.008183] [W]: the wait blocked
> > > > [    9.008183] [E]: the event not reachable
> > > 
> > > So what situation is your tool complaining about here? Can you perhaps show
> > > it here in more common visualization like:
> > 
> > Sure.
> > 
> > > TASK1				TASK2
> > > 				does foo, grabs Z
> > > does X, grabs lock Y
> > > blocks on Z
> > > 				blocks on Y
> > > 
> > > or something like that? Because I was not able to decipher this from the
> > > report even after trying for some time...
> > 
> > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > 
> > wait A
> > --- stuck
> > 			wait B
> > 			--- stuck
> > 						wait C
> > 						--- stuck
> > 
> > wake up B		wake up C		wake up A
> > 
> > where:
> > A is a wait_queue, j_wait_commit
> > B is a wait_queue, j_wait_transaction_locked
> > C is a rwsem, mapping.invalidate_lock
> 
> I see. But a situation like this is not necessarily a guarantee of a
> deadlock, is it? I mean there can be task D that will eventually call say
> 'wake up B' and unblock everything and this is how things were designed to
> work? Multiple sources of wakeups are quite common I'd say... What does

Yes. At the very beginning when I designed Dept, I was thinking for quite
a long time about whether to support multiple wakeup sources or not.
Supporting it would be a better option to avoid non-critical reports.
However, I thought we'd better fix it anyway - not urgent though - if
there's any single circular dependency. That's why I decided not to
support it for now and wanted to gather the kernel guys' opinions. The
question is which policy we should go with.

> Dept do to prevent false reports in cases like this?
> 
> > The above is the simplest form. And it's worth noting that Dept focuses
> > on wait and event itself rather than grabbing and releasing things like
> > lock. The following is the more descriptive form of it.
> > 
> > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > 
> > wait @j_wait_commit
> > 			ext4_truncate_failed_write()
> > 			   down_write(mapping.invalidate_lock)
> > 
> > 			   ext4_truncate()
> > 			      ...
> > 			      wait @j_wait_transaction_locked
> > 
> > 						ext4_truncate_failed_write()
> > 						   down_write(mapping.invalidate_lock)
> > 
> > 						ext4_should_retry_alloc()
> > 						   ...
> > 						   __jbd2_log_start_commit()
> > 						      wake_up(j_wait_commit)
> > jbd2_journal_commit_transaction()
> >    wake_up(j_wait_transaction_locked)
> > 			   up_write(mapping.invalidate_lock)
> > 
> > I hope this would help you understand the report.
> 
> I see, thanks for explanation! So the above scenario is impossible because

My pleasure.

> for anyone to block on @j_wait_transaction_locked the transaction must be
> committing, which is done only by kjournald2 kthread and so that thread
> cannot be waiting at @j_wait_commit. Essentially blocking on
> @j_wait_transaction_locked means @j_wait_commit wakeup was already done.

kjournald2 repeatedly does the wait and the wake_up, so the above scenario
looks possible to me even based on what you explained. Maybe I should
understand better how the journal works for further discussion. Your
explanation is really helpful. Thank you.

Thanks,
Byungchul

> I guess this shows there can be non-trivial dependencies between wait
> queues which are difficult to track in an automated way and without such
> tracking we are going to see false positives...
> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-24  1:11           ` Byungchul Park
@ 2022-02-24 10:22             ` Jan Kara
  2022-02-28  9:28               ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Jan Kara @ 2022-02-24 10:22 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	tytso, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > 
> > > wait A
> > > --- stuck
> > > 			wait B
> > > 			--- stuck
> > > 						wait C
> > > 						--- stuck
> > > 
> > > wake up B		wake up C		wake up A
> > > 
> > > where:
> > > A is a wait_queue, j_wait_commit
> > > B is a wait_queue, j_wait_transaction_locked
> > > C is a rwsem, mapping.invalidate_lock
> > 
> > I see. But a situation like this is not necessarily a guarantee of a
> > deadlock, is it? I mean there can be task D that will eventually call say
> > 'wake up B' and unblock everything and this is how things were designed to
> > work? Multiple sources of wakeups are quite common I'd say... What does
> 
> Yes. At the very beginning when I designed Dept, I was thinking for quite
> a long time about whether to support multiple wakeup sources or not.
> Supporting it would be a better option to avoid non-critical reports.
> However, I thought we'd better fix it anyway - not urgent though - if
> there's any single circular dependency. That's why I decided not to
> support it for now and wanted to gather the kernel guys' opinions. The
> question is which policy we should go with.

I see. So supporting only a single wakeup source is fine for locks I guess.
But for general wait queues or other synchronization mechanisms, I'm afraid
it will lead to quite some false positive reports. Just my 2c.

> > Dept do to prevent false reports in cases like this?
> > 
> > > The above is the simplest form. And it's worth noting that Dept focuses
> > > on wait and event itself rather than grabbing and releasing things like
> > > lock. The following is the more descriptive form of it.
> > > 
> > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > 
> > > wait @j_wait_commit
> > > 			ext4_truncate_failed_write()
> > > 			   down_write(mapping.invalidate_lock)
> > > 
> > > 			   ext4_truncate()
> > > 			      ...
> > > 			      wait @j_wait_transaction_locked
> > > 
> > > 						ext4_truncate_failed_write()
> > > 						   down_write(mapping.invalidate_lock)
> > > 
> > > 						ext4_should_retry_alloc()
> > > 						   ...
> > > 						   __jbd2_log_start_commit()
> > > 						      wake_up(j_wait_commit)
> > > jbd2_journal_commit_transaction()
> > >    wake_up(j_wait_transaction_locked)
> > > 			   up_write(mapping.invalidate_lock)
> > > 
> > > I hope this would help you understand the report.
> > 
> > I see, thanks for explanation! So the above scenario is impossible because
> 
> My pleasure.
> 
> > for anyone to block on @j_wait_transaction_locked the transaction must be
> > committing, which is done only by kjournald2 kthread and so that thread
> > cannot be waiting at @j_wait_commit. Essentially blocking on
> > @j_wait_transaction_locked means @j_wait_commit wakeup was already done.
> 
> kjournald2 repeatedly does the wait and the wake_up, so the above scenario
> looks possible to me even based on what you explained. Maybe I should
> understand better how the journal works for further discussion. Your
> explanation is really helpful. Thank you.

OK, let me provide you with more details for better understanding :) In
jbd2 we have an object called 'transaction'. This object can go through
many states, but what is important for our case is that a transaction is
moved into and out of the T_LOCKED state only while the
jbd2_journal_commit_transaction() function is executing, and that waiting
on the j_wait_transaction_locked waitqueue is exactly waiting for a
transaction to get out of the T_LOCKED state. The function
jbd2_journal_commit_transaction() is executed only by kjournald. Hence
anyone can see a transaction in the T_LOCKED state only if kjournald is
running inside jbd2_journal_commit_transaction(), and thus kjournald
cannot be sleeping on j_wait_commit at the same time. Does this explain
things?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-24 10:22             ` Jan Kara
@ 2022-02-28  9:28               ` Byungchul Park
  2022-02-28 10:14                 ` Jan Kara
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-02-28  9:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote:
> On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > > 
> > > > wait A
> > > > --- stuck
> > > > 			wait B
> > > > 			--- stuck
> > > > 						wait C
> > > > 						--- stuck
> > > > 
> > > > wake up B		wake up C		wake up A
> > > > 
> > > > where:
> > > > A is a wait_queue, j_wait_commit
> > > > B is a wait_queue, j_wait_transaction_locked
> > > > C is a rwsem, mapping.invalidate_lock
> > > 
> > > I see. But a situation like this is not necessarily a guarantee of a
> > > deadlock, is it? I mean there can be task D that will eventually call say
> > > 'wake up B' and unblock everything and this is how things were designed to
> > > work? Multiple sources of wakeups are quite common I'd say... What does
> > 
> > Yes. At the very beginning when I designed Dept, I was thinking for quite
> > a long time about whether to support multiple wakeup sources or not.
> > Supporting it would be a better option to avoid non-critical reports.
> > However, I thought we'd better fix it anyway - not urgent though - if
> > there's any single circular dependency. That's why I decided not to
> > support it for now and wanted to gather the kernel guys' opinions. The
> > question is which policy we should go with.
> 
> I see. So supporting only a single wakeup source is fine for locks I guess.
> But for general wait queues or other synchronization mechanisms, I'm afraid
> it will lead to quite some false positive reports. Just my 2c.

Thank you for your feedback.

I realized we've been using "false positive" differently. There exist
three types of code in terms of dependency and deadlock. It's worth
noting that, in Dept, dependencies are built between waits and events.

---

case 1. Code with an actual circular dependency, but not deadlock.

   A circular dependency can be broken by a rescue wakeup source, e.g. a
   timeout. It's not a deadlock if it's okay that the contexts
   participating in the circular dependency, and others waiting for the
   events in the circle, are stuck until it gets broken. Otherwise, say,
   if that's not intended, then it's problematic anyway.

   1-1. What if we judge this code is problematic?
   1-2. What if we judge this code is good?

case 2. Code with an actual circular dependency, and deadlock.

   There's no other wakeup source than those within the circular
   dependency. Literally a deadlock. It's problematic and critical.

   2-1. What if we judge this code is problematic?
   2-2. What if we judge this code is good?

case 3. Code with no actual circular dependency, and not deadlock.

   Must be good.

   3-1. What if we judge this code is problematic?
   3-2. What if we judge this code is good?

---

I call only 3-1 a "false positive" circular dependency. And you call 1-1
and 3-1 "false positive" deadlocks.

I've been wondering whether the kernel guys, esp. Linus, consider code
with any circular dependency problematic or not, even if it won't lead
to a deadlock, say, case 1. Even though I designed Dept based on what I
believe is right, of course, I'm willing to change the design according
to the majority opinion.

However, I would never allow case 1 if I were the owner of the kernel,
for better stability, even though the code works okay for now anyway.

Thanks,
Byungchul

> > > Dept do to prevent false reports in cases like this?
> > > 
> > > > The above is the simplest form. And it's worth noting that Dept focuses
> > > > on wait and event itself rather than grabbing and releasing things like
> > > > lock. The following is the more descriptive form of it.
> > > > 
> > > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > > 
> > > > wait @j_wait_commit
> > > > 			ext4_truncate_failed_write()
> > > > 			   down_write(mapping.invalidate_lock)
> > > > 
> > > > 			   ext4_truncate()
> > > > 			      ...
> > > > 			      wait @j_wait_transaction_locked
> > > > 
> > > > 						ext4_truncate_failed_write()
> > > > 						   down_write(mapping.invalidate_lock)
> > > > 
> > > > 						ext4_should_retry_alloc()
> > > > 						   ...
> > > > 						   __jbd2_log_start_commit()
> > > > 						      wake_up(j_wait_commit)
> > > > jbd2_journal_commit_transaction()
> > > >    wake_up(j_wait_transaction_locked)
> > > > 			   up_write(mapping.invalidate_lock)
> > > > 
> > > > I hope this would help you understand the report.
> > > 
> > > I see, thanks for explanation! So the above scenario is impossible because
> > 
> > My pleasure.
> > 
> > > for anyone to block on @j_wait_transaction_locked the transaction must be
> > > committing, which is done only by kjournald2 kthread and so that thread
> > > cannot be waiting at @j_wait_commit. Essentially blocking on
> > > @j_wait_transaction_locked means @j_wait_commit wakeup was already done.
> > 
> > kjournald2 repeatedly does the wait and the wake_up, so the above scenario
> > looks possible to me even based on what you explained. Maybe I should
> > understand better how the journal works for further discussion. Your
> > explanation is really helpful. Thank you.
> 
> OK, let me provide you with more details for better understanding :) In
> jbd2 we have an object called 'transaction'. This object can go through
> many states, but what is important for our case is that a transaction is
> moved into and out of the T_LOCKED state only while the
> jbd2_journal_commit_transaction() function is executing, and that waiting
> on the j_wait_transaction_locked waitqueue is exactly waiting for a
> transaction to get out of the T_LOCKED state. The function
> jbd2_journal_commit_transaction() is executed only by kjournald. Hence
> anyone can see a transaction in the T_LOCKED state only if kjournald is
> running inside jbd2_journal_commit_transaction(), and thus kjournald
> cannot be sleeping on j_wait_commit at the same time. Does this explain
> things?
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-28  9:28               ` Byungchul Park
@ 2022-02-28 10:14                 ` Jan Kara
  2022-02-28 21:25                   ` Theodore Ts'o
  2022-03-03  1:00                   ` Byungchul Park
  0 siblings, 2 replies; 67+ messages in thread
From: Jan Kara @ 2022-02-28 10:14 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	tytso, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote:
> > On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> > > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > > > 
> > > > > wait A
> > > > > --- stuck
> > > > > 			wait B
> > > > > 			--- stuck
> > > > > 						wait C
> > > > > 						--- stuck
> > > > > 
> > > > > wake up B		wake up C		wake up A
> > > > > 
> > > > > where:
> > > > > A is a wait_queue, j_wait_commit
> > > > > B is a wait_queue, j_wait_transaction_locked
> > > > > C is a rwsem, mapping.invalidate_lock
> > > > 
> > > > I see. But a situation like this is not necessarily a guarantee of a
> > > > deadlock, is it? I mean there can be task D that will eventually call say
> > > > 'wake up B' and unblock everything and this is how things were designed to
> > > > work? Multiple sources of wakeups are quite common I'd say... What does
> > > 
> > > Yes. At the very beginning when I designed Dept, I was thinking for quite
> > > a long time about whether to support multiple wakeup sources or not.
> > > Supporting it would be a better option to avoid non-critical reports.
> > > However, I thought we'd better fix it anyway - not urgent though - if
> > > there's any single circular dependency. That's why I decided not to
> > > support it for now and wanted to gather the kernel guys' opinions. The
> > > question is which policy we should go with.
> > 
> > I see. So supporting only a single wakeup source is fine for locks I guess.
> > But for general wait queues or other synchronization mechanisms, I'm afraid
> > it will lead to quite some false positive reports. Just my 2c.
> 
> Thank you for your feedback.
> 
> I realized we've been using "false positive" differently. There exist
> three types of code in terms of dependency and deadlock. It's worth
> noting that, in Dept, dependencies are built between waits and events.
> 
> ---
> 
> case 1. Code with an actual circular dependency, but not deadlock.
> 
>    A circular dependency can be broken by a rescue wakeup source, e.g. a
>    timeout. It's not a deadlock if it's okay that the contexts
>    participating in the circular dependency, and others waiting for the
>    events in the circle, are stuck until it gets broken. Otherwise, say,
>    if that's not intended, then it's problematic anyway.
> 
>    1-1. What if we judge this code is problematic?
>    1-2. What if we judge this code is good?
> 
> case 2. Code with an actual circular dependency, and deadlock.
> 
>    There's no other wakeup source than those within the circular
>    dependency. Literally deadlock. It's problematic and critical.
> 
>    2-1. What if we judge this code is problematic?
>    2-2. What if we judge this code is good?
> 
> case 3. Code with no actual circular dependency, and not deadlock.
> 
>    Must be good.
> 
>    3-1. What if we judge this code is problematic?
>    3-2. What if we judge this code is good?
> 
> ---
> 
> I call only 3-1 "false positive" circular dependency. And you call 1-1
> and 3-1 "false positive" deadlock.
> 
> I've been wondering whether the kernel guys, esp. Linus, consider code
> with any circular dependency problematic or not, even if it won't lead
> to a deadlock, say, case 1. Even though I designed Dept based on what I
> believe is right, of course, I'm willing to change the design according
> to the majority opinion.
> 
> However, I would never allow case 1 if I were the owner of the kernel
> for better stability, even though the code works anyway okay for now.

So yes, I call a report for the situation "There is circular dependency but
deadlock is not possible." a false positive. And that is because in my
opinion your definition of circular dependency includes schemes that are
useful and used in the kernel.

Your example in case 1 is kind of borderline (I personally would consider
that a bug as well) but there are other more valid schemes with multiple
wakeup sources like:

We have a queue of work to do Q protected by lock L. Consumer process has
code like:

while (1) {
	lock L
	prepare_to_wait(work_queued);
	if (no work) {
		unlock L
		sleep
	} else {
		unlock L
		do work
		wake_up(work_done)
	}
}

AFAIU Dept will create a dependency here that 'wakeup work_done' is after
'wait for work_queued'. Producer has code like:

while (1) {
	lock L
	prepare_to_wait(work_done)
	if (too much work queued) {
		unlock L
		sleep
	} else {
		queue work
		unlock L
		wake_up(work_queued)
	}
}

And Dept will create a dependency here that 'wakeup work_queued' is after
'wait for work_done'. And thus we have a trivial cycle in the dependencies
despite the code being perfectly valid and safe.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-28 10:14                 ` Jan Kara
@ 2022-02-28 21:25                   ` Theodore Ts'o
  2022-03-03  1:36                     ` Byungchul Park
  2022-03-03  1:00                   ` Byungchul Park
  1 sibling, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-02-28 21:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Byungchul Park, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will,
	tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel,
	viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> > 
> >    A circular dependency can be broken by a rescue wakeup source, e.g. a
> >    timeout. It's not a deadlock if it's okay that the contexts
> >    participating in the circular dependency, and others waiting for the
> >    events in the circle, are stuck until it gets broken. Otherwise, say,
> >    if that's not intended, then it's problematic anyway.
> > 
> >    1-1. What if we judge this code is problematic?
> >    1-2. What if we judge this code is good?
> > 
> > I've been wondering whether the kernel guys, esp. Linus, consider code
> > with any circular dependency problematic or not, even if it won't lead
> > to a deadlock, say, case 1. Even though I designed Dept based on what I
> > believe is right, of course, I'm willing to change the design according
> > to the majority opinion.
> > 
> > However, I would never allow case 1 if I were the owner of the kernel
> > for better stability, even though the code works anyway okay for now.

Note, I used the example of the timeout as the most obvious way of
explaining that a deadlock is not possible.  There is also the much
more complex explanation which Jan was trying to give, which is what
leads to the circular dependency.  It can happen that when trying to
start a handle, if either (a) there is not enough space in the journal
for new handles, or (b) the current transaction is so large that if we
don't close the transaction and start a new one, we will end up
running out of space in the future, and so in that case,
start_this_handle() will block starting any more handles, and then
wake up the commit thread.  The commit thread then waits for the
currently running threads to complete, before it allows new handles to
start, and then it will complete the commit.  In the case of (a) we
then need to do a journal checkpoint, which is more work to release
space in the journal, and only then, can we allow new handles to start.

The bottom line is (a) it works, (b) there aren't significant delays,
and for DEPT to complain that this is somehow wrong and we need to
completely rearchitect perfectly working code because it doesn't
conform to DEPT's idea of what is "correct" is not acceptable.

> We have a queue of work to do Q protected by lock L. Consumer process has
> code like:
> 
> while (1) {
> 	lock L
> 	prepare_to_wait(work_queued);
> 	if (no work) {
> 		unlock L
> 		sleep
> 	} else {
> 		unlock L
> 		do work
> 		wake_up(work_done)
> 	}
> }
> 
> AFAIU Dept will create dependency here that 'wakeup work_done' is after
> 'wait for work_queued'. Producer has code like:
> 
> while (1) {
> 	lock L
> 	prepare_to_wait(work_done)
> 	if (too much work queued) {
> 		unlock L
> 		sleep
> 	} else {
> 		queue work
> 		unlock L
> 		wake_up(work_queued)
> 	}
> }
> 
> And Dept will create dependency here that 'wakeup work_queued' is after
> 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> despite the code being perfectly valid and safe.

Cheers,

							- Ted

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-28 10:14                 ` Jan Kara
  2022-02-28 21:25                   ` Theodore Ts'o
@ 2022-03-03  1:00                   ` Byungchul Park
  2022-03-03  2:32                     ` Theodore Ts'o
  2022-03-03  9:54                     ` Jan Kara
  1 sibling, 2 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-03  1:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> > 
> >    A circular dependency can be broken by a rescue wakeup source, e.g. a
> >    timeout. It's not a deadlock if it's okay that the contexts
> >    participating in the circular dependency, and others waiting for the
> >    events in the circle, are stuck until it gets broken. Otherwise, say,
> >    if that's not intended, then it's problematic anyway.
> > 
> >    1-1. What if we judge this code is problematic?
> >    1-2. What if we judge this code is good?
> > 
> > case 2. Code with an actual circular dependency, and deadlock.
> > 
> >    There's no other wakeup source than those within the circular
> >    dependency. Literally deadlock. It's problematic and critical.
> > 
> >    2-1. What if we judge this code is problematic?
> >    2-2. What if we judge this code is good?
> > 
> > case 3. Code with no actual circular dependency, and not deadlock.
> > 
> >    Must be good.
> > 
> >    3-1. What if we judge this code is problematic?
> >    3-2. What if we judge this code is good?
> > 
> > ---
> > 
> > I call only 3-1 "false positive" circular dependency. And you call 1-1
> > and 3-1 "false positive" deadlock.
> > 
> > I've been wondering whether the kernel guys, esp. Linus, consider code
> > with any circular dependency problematic or not, even if it won't lead
> > to a deadlock, say, case 1. Even though I designed Dept based on what I
> > believe is right, of course, I'm willing to change the design according
> > to the majority opinion.
> > 
> > However, I would never allow case 1 if I were the owner of the kernel
> > for better stability, even though the code works anyway okay for now.
> 
> So yes, I call a report for the situation "There is circular dependency but
> deadlock is not possible." a false positive. And that is because in my
> opinion your definition of circular dependency includes schemes that are
> useful and used in the kernel.
> 
> Your example in case 1 is kind of borderline (I personally would consider
> that bug as well) but there are other more valid schemes with multiple
> wakeup sources like:
> 
> We have a queue of work to do Q protected by lock L. Consumer process has
> code like:
> 
> while (1) {
> 	lock L
> 	prepare_to_wait(work_queued);
> 	if (no work) {
> 		unlock L
> 		sleep
> 	} else {
> 		unlock L
> 		do work
> 		wake_up(work_done)
> 	}
> }
> 
> AFAIU Dept will create dependency here that 'wakeup work_done' is after
> 'wait for work_queued'. Producer has code like:

First of all, thank you for this good example.

> while (1) {
> 	lock L
> 	prepare_to_wait(work_done)
> 	if (too much work queued) {
> 		unlock L
> 		sleep
> 	} else {
> 		queue work
> 		unlock L
> 		wake_up(work_queued)
> 	}
> }
> 
> And Dept will create dependency here that 'wakeup work_queued' is after
> 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> despite the code being perfectly valid and safe.

Unfortunately, it's neither perfect nor safe without another wakeup
source - a rescue wakeup source.

   consumer			producer

				lock L
				(too much work queued == true)
				unlock L
				--- preempted
   lock L
   unlock L
   do work
   lock L
   unlock L
   do work
   ...
   (no work == true)
   sleep
				--- scheduled in
				sleep

This code leads to a deadlock without another wakeup source, that is,
it's not safe.

But yes, I also think this code should be allowed if it anyway runs
alongside another wakeup source. For that case, Dept should instead
track the rescue wakeup source, whose absence is what would lead to an
actual deadlock.

I will correct the code to make Dept track its rescue wakeup source
whenever it finds such a case.

Lastly, just for your information, let me explain how Dept works a
little more so that Dept is not misunderstood.

Assuming the consumer and producer are guaranteed not to deadlock, like
the following, Dept won't report it as a problem:

   consumer			producer

				sleep
   wakeup work_done
				queue work
   sleep
				wakeup work_queued
   do work
				sleep
   wakeup work_done
				queue work
   sleep
				wakeup work_queued
   do work
				sleep
   ...				...

Dept does not consider all waits preceding an event, but only the
waits that might lead to a deadlock. In this case, Dept handles each
region independently.

   consumer			producer

				sleep <- initiates region 1
   --- region 1 starts
   ...				...
   --- region 1 ends
   wakeup work_done
   ...				...
				queue work
   ...				...
   sleep <- initiates region 2
				--- region 2 starts
   ...				...
				--- region 2 ends
				wakeup work_queued
   ...				...
   do work
   ...				...
				sleep <- initiates region 3
   --- region 3 starts
   ...				...
   --- region 3 ends
   wakeup work_done
   ...				...
				queue work
   ...				...
   sleep <- initiates region 4
				--- region 4 starts
   ...				...
				--- region 4 ends
				wakeup work_queued
   ...				...
   do work
   ...				...

That is, Dept does not build dependencies across different regions, so
you don't have to worry that much about unreasonable false positives.

Thoughts?

Thanks,
Byungchul

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-02-28 21:25                   ` Theodore Ts'o
@ 2022-03-03  1:36                     ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-03  1:36 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > > 
> > >    A circular dependency can be broken by a rescue wakeup source e.g.
> > >    timeout. It's not a deadlock. If it's okay that the contexts
> > >    participating in the circular dependency and others waiting for the
> > >    events in the circle are stuck until it gets broken. Otherwise, say,
> > >    if it's not meant, then it's anyway problematic.
> > > 
> > >    1-1. What if we judge this code is problematic?
> > >    1-2. What if we judge this code is good?
> > > 
> > > I've been wondering if the kernel guys esp. Linus considers code with
> > > any circular dependency is problematic or not, even if it won't lead to
> > > a deadlock, say, case 1. Even though I designed Dept based on what I
> > > believe is right, of course, I'm willing to change the design according
> > > to the majority opinion.
> > > 
> > > However, I would never allow case 1 if I were the owner of the kernel
> > > for better stability, even though the code works anyway okay for now.
> 
> Note, I used the example of the timeout as the most obvious way of
> explaining that a deadlock is not possible.  There is also the much
> more complex explanation which Jan was trying to give, which is what
> leads to the circular dependency.  It can happen that when trying to
> start a handle, if either (a) there is not enough space in the journal
> for new handles, or (b) the current transaction is so large that if we
> don't close the transaction and start a new one, we will end up
> running out of space in the future, and so in that case,
> start_this_handle() will block starting any more handles, and then
> wake up the commit thread.  The commit thread then waits for the
> currently running threads to complete, before it allows new handles to
> start, and then it will complete the commit.  In the case of (a) we
> then need to do a journal checkpoint, which is more work to release
> space in the journal, and only then, can we allow new handles to start.

Thank you for the full explanation of how the journal works.

> The bottom line is (a) it works, (b) there aren't significant delays,
> and for DEPT to complain that this is somehow wrong and we need to
> completely rearchitect perfectly working code because it doesn't
> conform to DEPT's idea of what is "correct" is not acceptable.

Thanks to you and Jan Kara, I realized it's not a real dependency in
the consumer and producer scenario, but again *ONLY IF* there is a
rescue wakeup source. Dept should instead track the rescue wakeup
source in that case.

I won't ask you to rearchitect the working code. The code looks sane.

Thanks a lot.

Thanks,
Byungchul

> > We have a queue of work to do Q protected by lock L. Consumer process has
> > code like:
> > 
> > while (1) {
> > 	lock L
> > 	prepare_to_wait(work_queued);
> > 	if (no work) {
> > 		unlock L
> > 		sleep
> > 	} else {
> > 		unlock L
> > 		do work
> > 		wake_up(work_done)
> > 	}
> > }
> > 
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after
> > 'wait for work_queued'. Producer has code like:
> > 
> > while (1) {
> > 	lock L
> > 	prepare_to_wait(work_done)
> > 	if (too much work queued) {
> > 		unlock L
> > 		sleep
> > 	} else {
> > 		queue work
> > 		unlock L
> > 		wake_up(work_queued)
> > 	}
> > }
> > 
> > And Dept will create dependency here that 'wakeup work_queued' is after
> > 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> > despite the code being perfectly valid and safe.
> 
> Cheers,
> 
> 							- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03  1:00                   ` Byungchul Park
@ 2022-03-03  2:32                     ` Theodore Ts'o
  2022-03-03  5:23                       ` Byungchul Park
  2022-03-03  9:54                     ` Jan Kara
  1 sibling, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-03  2:32 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> 
> Unfortunately, it's neither perfect nor safe without another wakeup
> source - rescue wakeup source.
> 
>    consumer			producer
> 
> 				lock L
> 				(too much work queued == true)
> 				unlock L
> 				--- preempted
>    lock L
>    unlock L
>    do work
>    lock L
>    unlock L
>    do work
>    ...
>    (no work == true)
>    sleep
> 				--- scheduled in
> 				sleep

That's not how things work in ext4.  It's **way** more complicated
than that.  We have multiple wait channels: one wakes up the consumer
(read: the commit thread), and one wakes up any processes waiting for
the commit thread to have made forward progress.  We also have two
spin-lock protected sequence numbers, one indicating the current
committed transaction #, and one indicating the transaction # that
needs to be committed.

On the commit thread, it will sleep on j_wait_commit, and when it is
woken up, it will check to see if there is work to be done
(j_commit_sequence != j_commit_request), and if so, do the work, and
then wake up processes waiting on the wait_queue j_wait_done_commit.
(Again, all of this uses the pattern: "prepare to wait", then check to
see if we should sleep; if we do need to sleep, unlock j_state_lock,
then sleep.  This prevents any races leading to lost wakeups.)

On the start_this_handle() thread, if the current transaction is too
full, we set j_commit_request to its transaction id to indicate that
we want the current transaction to be committed, and then we wake up
the j_wait_commit wait queue.  Then we enter a loop where we do a
prepare_to_wait on j_wait_done_commit, check to see if
j_commit_sequence == the transaction id that we want to be completed,
and if it's not done yet, we unlock the j_state_lock spinlock and go
to sleep.  Again, because of the prepare_to_wait, there is no chance
of a lost wakeup.

So there really is no "consumer" and "producer" here.  If you really
insist on using this model, which really doesn't apply, for one
thread, it's the consumer with respect to one wait queue, and the
producer with respect to the *other* wait queue.  For the other
thread, the consumer and producer roles are reversed.

And of course, this is a highly simplified model, since we also have a
wait queue used by the commit thread to wait for the number of active
handles on a particular transaction to go to zero, and
stop_this_handle() will wake up commit thread via this wait queue when
the last active handle on a particular transaction is retired.  (And
yes, that parameter is also protected by a different spin lock which
is per-transaction).

So it seems to me that a fundamental flaw in DEPT's model is assuming
that the only waiting paradigm that can be used is consumer/producer,
and that's simply not true.  The fact that you use the term "lock" is
also going to lead to a misleading line of reasoning, because properly
speaking, they aren't really locks.  We are simply using wait channels
to wake up processes as necessary, and then they will check other
variables to decide whether or not they need to sleep or not, and we
have an invariant that when these variables change indicating forward
progress, the associated wait channel will be woken up.

Cheers,

						- Ted


P.S.  This model is also highly simplified since there are other
reasons why the commit thread can be woken up, some of which might be
via a timeout, and some of which are via the j_wait_commit wait channel,
not because j_commit_request has been changed but because the file system
is being unmounted, or the file system is being frozen in preparation
of a snapshot, etc.  These are *not* necessary to prevent a deadlock,
because under normal circumstances the two wake channels are
sufficient of themselves.  So please don't think of them as "rescue
wakeup sources"; again, that's highly misleading and the wrong way to
think of them.

And to make things even more complicated, we have more than two wait
channels --- we have *six*:

	/**
	 * @j_wait_transaction_locked:
	 *
	 * Wait queue for waiting for a locked transaction to start committing,
	 * or for a barrier lock to be released.
	 */
	wait_queue_head_t	j_wait_transaction_locked;

	/**
	 * @j_wait_done_commit: Wait queue for waiting for commit to complete.
	 */
	wait_queue_head_t	j_wait_done_commit;

	/**
	 * @j_wait_commit: Wait queue to trigger commit.
	 */
	wait_queue_head_t	j_wait_commit;

	/**
	 * @j_wait_updates: Wait queue to wait for updates to complete.
	 */
	wait_queue_head_t	j_wait_updates;

	/**
	 * @j_wait_reserved:
	 *
	 * Wait queue to wait for reserved buffer credits to drop.
	 */
	wait_queue_head_t	j_wait_reserved;

	/**
	 * @j_fc_wait:
	 *
	 * Wait queue to wait for completion of async fast commits.
	 */
	wait_queue_head_t	j_fc_wait;


"There are more things in heaven and Earth, Horatio,
 Than are dreamt of in your philosophy."
      	  	    - William Shakespeare, Hamlet


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03  2:32                     ` Theodore Ts'o
@ 2022-03-03  5:23                       ` Byungchul Park
  2022-03-03 14:36                         ` Theodore Ts'o
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-03-03  5:23 UTC (permalink / raw)
  To: tytso
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

Ted wrote:
> On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> > 
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> > 
> >    consumer			producer
> > 
> >				lock L
> >				(too much work queued == true)
> >				unlock L
> >				--- preempted
> >    lock L
> >    unlock L
> >    do work
> >    lock L
> >    unlock L
> >    do work
> >    ...
> >    (no work == true)
> >    sleep
> >				--- scheduled in
> >				sleep
> 
> That's not how things work in ext4.  It's **way** more complicated

I think you've misread me. This example is the one Jan Kara gave me. I
just tried to explain things based on Jan Kara's example, leaving all
the statements that Jan Kara wrote. Plus, the example was very helpful.
Thanks, Jan Kara.

> than that.  We have multiple wait channels, one wake up the consumer
> (read: the commit thread), and one which wakes up any processes
> waiting for commit thread to have made forward progress.  We also have
> two spin-lock protected sequence numbers, one which indicates the
> current committed transaction #, and one indicating the transaction #
> that needs to be committed.
> 
> On the commit thread, it will sleep on j_wait_commit, and when it is
> woken up, it will check to see if there is work to be done
> (j_commit_sequence != j_commit_request), and if so, do the work, and
> then wake up processes waiting on the wait_queue j_wait_done_commit.
> (Again, all of this uses the pattern, "prepare to wait", then check to
> see if we should sleep, if we do need to sleep, unlock j_state_lock,
> then sleep.   So this prevents any races leading to lost wakeups.
> 
> On the start_this_handle() thread, if we current transaction is too
> full, we set j_commit_request to its transaction id to indicate that
> we want the current transaction to be committed, and then we wake up
> the j_wait_commit wait queue and then we enter a loop where do a
> prepare_to_wait in j_wait_done_commit, check to see if
> j_commit_sequence == the transaction id that we want to be completed,
> and if it's not done yet, we unlock the j_state_lock spinlock, and go
> to sleep.  Again, because of the prepare_to_wait, there is no chance
> of a lost wakeup.

The above explanation gives me a clear view of how journal
synchronization works. I appreciate it.

> So there really is no "consumer" and "producer" here.  If you really
> insist on using this model, which really doesn't apply, for one

Dept does not assume a "consumer" and "producer" model at all; Dept
works with general waits and events. *That model is just one of them.*

> thread, it's the consumer with respect to one wait queue, and the
> producer with respect to the *other* wait queue.  For the other
> thread, the consumer and producer roles are reversed.
> 
> And of course, this is a highly simplified model, since we also have a
> wait queue used by the commit thread to wait for the number of active
> handles on a particular transaction to go to zero, and
> stop_this_handle() will wake up commit thread via this wait queue when
> the last active handle on a particular transaction is retired.  (And
> yes, that parameter is also protected by a different spin lock which
> is per-transaction).

This one also gives me a clear view. Thanks a lot.

> So it seems to me that a fundamental flaw in DEPT's model is assuming
> that the only waiting paradigm that can be used is consumer/producer,

No, Dept does not.

> and that's simply not true.  The fact that you use the term "lock" is
> also going to lead a misleading line of reasoning, because properly

"lock/unlock L" comes from Jan Kara's example. It has almost nothing
to do with the explanation; I just left "lock/unlock L" in as a
statement from Jan Kara's example.

> speaking, they aren't really locks.  We are simply using wait channels

I totally agree with you. *They aren't really locks; it's just waits
and wakeups.* That's exactly why I decided to develop Dept. Dept is not
interested in locks, unlike Lockdep, but focuses on waits and wakeup
sources themselves. I think you misunderstand Dept in several ways.
Please ask me more if you have doubts about Dept.

Thanks,
Byungchul


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03  1:00                   ` Byungchul Park
  2022-03-03  2:32                     ` Theodore Ts'o
@ 2022-03-03  9:54                     ` Jan Kara
  2022-03-04  1:56                       ` Byungchul Park
  1 sibling, 1 reply; 67+ messages in thread
From: Jan Kara @ 2022-03-03  9:54 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Jan Kara, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj,
	tytso, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > > 
> > >    A circular dependency can be broken by a rescue wakeup source e.g.
> > >    timeout. It's not a deadlock. If it's okay that the contexts
> > >    participating in the circular dependency and others waiting for the
> > >    events in the circle are stuck until it gets broken. Otherwise, say,
> > >    if it's not meant, then it's anyway problematic.
> > > 
> > >    1-1. What if we judge this code is problematic?
> > >    1-2. What if we judge this code is good?
> > > 
> > > case 2. Code with an actual circular dependency, and deadlock.
> > > 
> > >    There's no other wakeup source than those within the circular
> > >    dependency. Literally deadlock. It's problematic and critical.
> > > 
> > >    2-1. What if we judge this code is problematic?
> > >    2-2. What if we judge this code is good?
> > > 
> > > case 3. Code with no actual circular dependency, and not deadlock.
> > > 
> > >    Must be good.
> > > 
> > >    3-1. What if we judge this code is problematic?
> > >    3-2. What if we judge this code is good?
> > > 
> > > ---
> > > 
> > > I call only 3-1 "false positive" circular dependency. And you call 1-1
> > > and 3-1 "false positive" deadlock.
> > > 
> > > I've been wondering if the kernel guys esp. Linus considers code with
> > > any circular dependency is problematic or not, even if it won't lead to
> > > a deadlock, say, case 1. Even though I designed Dept based on what I
> > > believe is right, of course, I'm willing to change the design according
> > > to the majority opinion.
> > > 
> > > However, I would never allow case 1 if I were the owner of the kernel
> > > for better stability, even though the code works anyway okay for now.
> > 
> > So yes, I call a report for the situation "There is circular dependency but
> > deadlock is not possible." a false positive. And that is because in my
> > opinion your definition of circular dependency includes schemes that are
> > useful and used in the kernel.
> > 
> > Your example in case 1 is kind of borderline (I personally would consider
> > that bug as well) but there are other more valid schemes with multiple
> > wakeup sources like:
> > 
> > We have a queue of work to do Q protected by lock L. Consumer process has
> > code like:
> > 
> > while (1) {
> > 	lock L
> > 	prepare_to_wait(work_queued);
> > 	if (no work) {
> > 		unlock L
> > 		sleep
> > 	} else {
> > 		unlock L
> > 		do work
> > 		wake_up(work_done)
> > 	}
> > }
> > 
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after
> > 'wait for work_queued'. Producer has code like:
> 
> First of all, thank you for this good example.
> 
> > while (1) {
> > 	lock L
> > 	prepare_to_wait(work_done)
> > 	if (too much work queued) {
> > 		unlock L
> > 		sleep
> > 	} else {
> > 		queue work
> > 		unlock L
> > 		wake_up(work_queued)
> > 	}
> > }
> > 
> > And Dept will create dependency here that 'wakeup work_queued' is after
> > 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> > despite the code being perfectly valid and safe.
> 
> Unfortunately, it's neither perfect nor safe without another wakeup
> source - rescue wakeup source.
> 
>    consumer			producer
> 
> 				lock L
> 				(too much work queued == true)
> 				unlock L
> 				--- preempted
>    lock L
>    unlock L
>    do work
>    lock L
>    unlock L
>    do work
>    ...
>    (no work == true)
>    sleep
> 				--- scheduled in
> 				sleep
> 
> This code leads a deadlock without another wakeup source, say, not safe.

So the scenario you describe above is indeed possible. But the trick is
that the wakeup from 'consumer' as it is doing work will remove 'producer'
from the wait queue and change the 'producer' process state to
'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
scheduler will just treat this as another preemption point and the
'producer' will immediately or soon continue to run. So indeed we can think
of this as "another wakeup source" but the source is in the CPU scheduler
itself. This is the standard way how waitqueues are used in the kernel...

> Lastly, just for your information, I need to explain how Dept works a
> little more for you not to misunderstand Dept.
> 
> Assuming the consumer and producer guarantee not to lead a deadlock like
> the following, Dept won't report it a problem:
> 
>    consumer			producer
> 
> 				sleep
>    wakeup work_done
> 				queue work
>    sleep
> 				wakeup work_queued
>    do work
> 				sleep
>    wakeup work_done
> 				queue work
>    sleep
> 				wakeup work_queued
>    do work
> 				sleep
>    ...				...
> 
> > Dept does not consider all waits preceding an event but only waits that
> might lead a deadlock. In this case, Dept works with each region
> independently.
> 
>    consumer			producer
> 
> 				sleep <- initiates region 1
>    --- region 1 starts
>    ...				...
>    --- region 1 ends
>    wakeup work_done
>    ...				...
> 				queue work
>    ...				...
>    sleep <- initiates region 2
> 				--- region 2 starts
>    ...				...
> 				--- region 2 ends
> 				wakeup work_queued
>    ...				...
>    do work
>    ...				...
> 				sleep <- initiates region 3
>    --- region 3 starts
>    ...				...
>    --- region 3 ends
>    wakeup work_done
>    ...				...
> 				queue work
>    ...				...
>    sleep <- initiates region 4
> 				--- region 4 starts
>    ...				...
> 				--- region 4 ends
> 				wakeup work_queued
>    ...				...
>    do work
>    ...				...
> 
> That is, Dept does not build dependencies across different regions. So
> you don't have to worry about unreasonable false positives that much.
> 
> Thoughts?

Thanks for explanation! And what exactly defines the 'regions'? When some
process goes to sleep on some waitqueue, this defines a start of a region
at the place where all the other processes are at that moment and wakeup of
the waitqueue is an end of the region?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03  5:23                       ` Byungchul Park
@ 2022-03-03 14:36                         ` Theodore Ts'o
  2022-03-04  0:42                           ` Byungchul Park
  2022-03-04  3:20                           ` Byungchul Park
  0 siblings, 2 replies; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-03 14:36 UTC (permalink / raw)
  To: Byungchul Park
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> I totally agree with you. *They aren't really locks but it's just waits
> and wakeups.* That's exactly why I decided to develop Dept. Dept is not
> interested in locks unlike Lockdep, but focuses on waits and wakeup
> sources itself. I think you get Dept wrong a lot. Please ask me more if
> you have things you doubt about Dept.

So the question is this --- do you now understand why, even though
there is a circular dependency, nothing gets stalled in the
interactions between the two wait channels?

						- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03 14:36                         ` Theodore Ts'o
@ 2022-03-04  0:42                           ` Byungchul Park
  2022-03-05  3:26                             ` Theodore Ts'o
  2022-03-04  3:20                           ` Byungchul Park
  1 sibling, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-03-04  0:42 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Mar 03, 2022 at 09:36:25AM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> > I totally agree with you. *They aren't really locks but it's just waits
> > and wakeups.* That's exactly why I decided to develop Dept. Dept is not
> > interested in locks unlike Lockdep, but focuses on waits and wakeup
> > sources itself. I think you get Dept wrong a lot. Please ask me more if
> > you have things you doubt about Dept.
> 
> So the question is this --- do you now understand why, even though
> there is a circular dependency, nothing gets stalled in the
> interactions between the two wait channels?

I'm afraid I don't follow you.

All contexts waiting for any of the events in the circular dependency
chain will definitely be stuck if there is a circular dependency, as I
explained. So we need another wakeup source to break the circle. In the
ext4 code, you might have the wakeup source for breaking the circle.

What I agreed with is:

   The case where 1) the circular dependency is unavoidable, 2) there
   is another wakeup source for breaking the circle, and 3) the
   duration of the sleep is short enough, should be acceptable.

Sounds good?

Thanks,
Byungchul


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03  9:54                     ` Jan Kara
@ 2022-03-04  1:56                       ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-04  1:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: torvalds, damien.lemoal, linux-ide, adilger.kernel, linux-ext4,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> > 
> >    consumer			producer
> > 
> > 				lock L
> > 				(too much work queued == true)
> > 				unlock L
> > 				--- preempted
> >    lock L
> >    unlock L
> >    do work
> >    lock L
> >    unlock L
> >    do work
> >    ...
> >    (no work == true)
> >    sleep
> > 				--- scheduled in
> > 				sleep
> > 
> > This code leads a deadlock without another wakeup source, say, not safe.
> 
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...

Nice! Thanks for the explanation. I will take it into account if needed.

> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> > 
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> > 
> >    consumer			producer
> > 
> > 				sleep
> >    wakeup work_done
> > 				queue work
> >    sleep
> > 				wakeup work_queued
> >    do work
> > 				sleep
> >    wakeup work_done
> > 				queue work
> >    sleep
> > 				wakeup work_queued
> >    do work
> > 				sleep
> >    ...				...
> > 
> > Dept does not consider all waits preceding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> > 
> >    consumer			producer
> > 
> > 				sleep <- initiates region 1
> >    --- region 1 starts
> >    ...				...
> >    --- region 1 ends
> >    wakeup work_done
> >    ...				...
> > 				queue work
> >    ...				...
> >    sleep <- initiates region 2
> > 				--- region 2 starts
> >    ...				...
> > 				--- region 2 ends
> > 				wakeup work_queued
> >    ...				...
> >    do work
> >    ...				...
> > 				sleep <- initiates region 3
> >    --- region 3 starts
> >    ...				...
> >    --- region 3 ends
> >    wakeup work_done
> >    ...				...
> > 				queue work
> >    ...				...
> >    sleep <- initiates region 4
> > 				--- region 4 starts
> >    ...				...
> > 				--- region 4 ends
> > 				wakeup work_queued
> >    ...				...
> >    do work
> >    ...				...
> > 
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> > 
> > Thoughts?
> 
> Thanks for the explanation! And what exactly defines the 'regions'? When
> some process goes to sleep on some waitqueue, does that define the start
> of a region at the place where all the other processes are at that moment,
> and is a wakeup of the waitqueue the end of the region?

Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)


   ideal view
   -----------
   context X			context Y

   request event E		...
      write REQUESTEVENT	when (notice REQUESTEVENT written)
   ...				   notice the request from X [S]

				--- ideally region 1 starts here
   wait for the event		...
      sleep			if (can see REQUESTEVENT written)
   				   it's on the way to the event
   ...				
   				...
				--- ideally region 1 ends here

				finally the event [E]

Dept basically works with the above view with regard to waits and events.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies the [S] point by checking WAITSTART with memory barriers
like the following, which makes Dept work conservatively.


   Dept's view
   ------------
   context X			context Y

   request event E		...
      write REQUESTEVENT	when (notice REQUESTEVENT written)
   ...				   notice the request from X

				--- region 2 (which Dept gives up) starts
   wait for the event		...
      write barrier
      write WAITSTART		read barrier
      sleep			when (notice WAITSTART written)
				   ensure the request has come [S]

				--- region 2 (which Dept gives up) ends
				--- region 3 starts here
				...
				if (can see WAITSTART written)
				   it's on the way to the event
   ...				
   				...
				--- region 3 ends here

   				finally the event [E]

In short, Dept works with region 3.

Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-03 14:36                         ` Theodore Ts'o
  2022-03-04  0:42                           ` Byungchul Park
@ 2022-03-04  3:20                           ` Byungchul Park
  2022-03-05  3:40                             ` Theodore Ts'o
  1 sibling, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-03-04  3:20 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Thu, Mar 03, 2022 at 09:36:25AM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> > I totally agree with you. *They aren't really locks; they're just waits
> > and wakeups.* That's exactly why I decided to develop Dept. Dept is not
> > interested in locks, unlike Lockdep, but focuses on waits and wakeup
> > sources themselves. I think you get Dept wrong a lot. Please ask me more if
> > you have things you doubt about Dept.
> 
> So the question is this --- do you now understand why, even though
> there is a circular dependency, nothing gets stalled in the
> interactions between the two wait channels?

I found a point where the two wait channels don't lead to a deadlock in
some cases, thanks to Jan Kara. I will fix it so that Dept won't
complain about it.

Thanks,
Byungchul

> 
> 						- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-04  0:42                           ` Byungchul Park
@ 2022-03-05  3:26                             ` Theodore Ts'o
  2022-03-05 14:15                               ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-05  3:26 UTC (permalink / raw)
  To: Byungchul Park
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
> 
> All contexts waiting for any of the events in the circular dependency
> chain will definitely be stuck if there is a circular dependency, as I
> explained. So we need another wakeup source to break the circle. In the
> ext4 code, you might have the wakeup source for breaking the circle.
> 
> What I agreed with is:
> 
>    The case where 1) the circular dependency is inevitable, 2) there is
>    another wakeup source for breaking the circle, and 3) the duration
>    of the sleep is short enough, should be acceptable.
> 
> Sounds good?

These dependencies are part of every single ext4 metadata update,
and if there were any unnecessary sleeps, this would be a major
performance gap, and this is a very well studied part of ext4.

There are some places where we sleep, sure.  In some cases
start_this_handle() needs to wait for a commit to complete, and the
commit thread might need to sleep for I/O to complete.  But the moment
the thing that we're waiting for is complete, we wake up all of the
processes on the wait queue.  And in the case where we wait for I/O
completion, that wakeup is coming from the device driver, when it
receives the I/O completion interrupt from the hard drive.  Is
that considered an "external source"?  Maybe DEPT doesn't recognize
that this is certain to happen just as day follows night?  (Well,
the I/O completion interrupt might not happen if the disk drive
bursts into flames --- but then, you've got bigger problems. :-)

In any case, if DEPT is going to report these "circular dependencies"
as bugs that MUST be fixed, it's going to be pure noise and I will
ignore all DEPT reports, and will push back on having Lockdep replaced
by DEPT --- because Lockdep gives us actionable reports, and if DEPT
can't tell the difference between a valid programming pattern and a
bug, then it's worse than useless.

Sounds good?

							- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-04  3:20                           ` Byungchul Park
@ 2022-03-05  3:40                             ` Theodore Ts'o
  2022-03-05 14:55                               ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-05  3:40 UTC (permalink / raw)
  To: Byungchul Park
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
> 
> I found a point where the two wait channels don't lead to a deadlock in
> some cases, thanks to Jan Kara. I will fix it so that Dept won't
> complain about it.

I sent my last (admittedly cranky) message before you sent this.  I'm
glad you finally understood Jan's explanation.  I was trying to tell
you the same thing, but apparently I failed to communicate in a
sufficiently clear manner.  In any case, what Jan described is a
fundamental part of how wait queues work, and I'm kind of amazed that
you were able to implement DEPT without understanding it.  (But maybe
that is why some of the DEPT reports were completely incomprehensible
to me; I couldn't interpret why in the world DEPT was saying there was
a problem.)

In any case, the thing I would ask is a little humility.  We regularly
use lockdep, and we run a huge number of stress tests, throughout each
development cycle.

So if DEPT is issuing lots of reports about apparently circular
dependencies, please try to be open to the thought that the fault is
in DEPT, and don't try to argue with maintainers that their code MUST
be buggy --- but since you don't understand our code, and DEPT must be
theoretically perfect, it is up to the maintainers to prove to
you that their code is correct.

I am going to gently suggest that it is at least as likely, if not
more likely, that the failure is in DEPT or in your understanding of
how kernel wait channels and locking work.  After all, why would it
be that we haven't found these problems via our other QA practices?

Cheers,

						- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05  3:26                             ` Theodore Ts'o
@ 2022-03-05 14:15                               ` Byungchul Park
  2022-03-05 15:05                                 ` Joel Fernandes
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-03-05 14:15 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Fri, Mar 04, 2022 at 10:26:23PM -0500, Theodore Ts'o wrote:
> On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
> > 
> > All contexts waiting for any of the events in the circular dependency
> > chain will be definitely stuck if there is a circular dependency as I
> > explained. So we need another wakeup source to break the circle. In
> > ext4 code, you might have the wakeup source for breaking the circle.
> > 
> > What I agreed with is:
> > 
> >    The case where 1) the circular dependency is inevitable, 2) there is
> >    another wakeup source for breaking the circle, and 3) the duration
> >    of the sleep is short enough, should be acceptable.
> > 
> > Sounds good?
> 
> These dependencies are part of every single ext4 metadata update,
> and if there were any unnecessary sleeps, this would be a major
> performance gap, and this is a very well studied part of ext4.
> 
> There are some places where we sleep, sure.  In some cases
> start_this_handle() needs to wait for a commit to complete, and the
> commit thread might need to sleep for I/O to complete.  But the moment
> the thing that we're waiting for is complete, we wake up all of the
> processes on the wait queue.  And in the case where we wait for I/O
> completion, that wakeup is coming from the device driver, when it
> receives the I/O completion interrupt from the hard drive.  Is
> that considered an "external source"?  Maybe DEPT doesn't recognize
> that this is certain to happen just as day follows night?  (Well,
> the I/O completion interrupt might not happen if the disk drive
> bursts into flames --- but then, you've got bigger problems. :-)

Almost everything you've been blaming on Dept is total nonsense. Based on
what you're saying, I'm convinced that you don't understand even 1% of
how Dept works. You don't even try to understand it before blaming it.

You don't have to understand or support it. But I can't respond to you
if you keep saying silly things that way.

> In any case, if DEPT is going to report these "circular dependencies"
> as bugs that MUST be fixed, it's going to be pure noise and I will
> ignore all DEPT reports, and will push back on having Lockdep replaced

Dept is going to be improved so that what you are concerned about won't
be reported.

> by DEPT --- because Lockdep gives us actionable reports, and if DEPT

Right. Dept should give actionable reports, too.

> can't tell the difference between a valid programming pattern and a
> bug, then it's worse than useless.

Needless to say.



* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05  3:40                             ` Theodore Ts'o
@ 2022-03-05 14:55                               ` Byungchul Park
  2022-03-05 15:12                                 ` Reimar Döffinger
  2022-03-06  3:30                                 ` Theodore Ts'o
  0 siblings, 2 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-05 14:55 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Fri, Mar 04, 2022 at 10:40:35PM -0500, Theodore Ts'o wrote:
> On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
> > 
> > I found a point where the two wait channels don't lead to a deadlock in
> > some cases, thanks to Jan Kara. I will fix it so that Dept won't
> > complain about it.
> 
> I sent my last (admittedly cranky) message before you sent this.  I'm
> glad you finally understood Jan's explanation.  I was trying to tell

Not finally. I've understood him whenever he tried to tell me something.

> you the same thing, but apparently I failed to communicate in a

I don't think so. Your point and Jan's point are different. All he has
said makes sense. But yours does not.

> sufficiently clear manner.  In any case, what Jan described is a
> fundamental part of how wait queues work, and I'm kind of amazed that
> you were able to implement DEPT without understanding it.  (But maybe

Of course, it was possible because all that Dept has to know for its
basic operation is waits and events. The subtle things like what Jan told
me help make Dept better.

> that is why some of the DEPT reports were completely incomprehensible

It's because you are blindly blaming it without understanding how
Dept works at all. I will fix the things that must be fixed. Don't worry.

> to me; I couldn't interpret why in the world DEPT was saying there was
> a problem.)

I can tell you if you really want to understand why. But I can't if you
are like this.

> In any case, the thing I would ask is a little humility.  We regularly
> use lockdep, and we run a huge number of stress tests, throughout each
> development cycle.

Sure.

> So if DEPT is issuing lots of reports about apparently circular
> dependencies, please try to be open to the thought that the fault is

No one ever claimed that Dept doesn't have faults. I think you worry
too much.

> in DEPT, and don't try to argue with maintainers that their code MUST
> be buggy --- but since you don't understand our code, and DEPT must be

No one argued that their code must be buggy, either. So I don't think
you have to worry about what's never happened.

> theoretically perfect, it is up to the maintainers to prove to
> you that their code is correct.
> 
> I am going to gently suggest that it is at least as likely, if not
> more likely, that the failure is in DEPT or in your understanding of

No doubt. I already think so. But that doesn't mean I have to keep
quiet instead of discussing how to improve Dept. I will keep improving
Dept in a reasonable way.

> how kernel wait channels and locking work.  After all, why would it
> be that we haven't found these problems via our other QA practices?

Let's talk more once you understand at least 10% of how Dept works.
Otherwise, I don't think we can talk in a productive way.



* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05 14:15                               ` Byungchul Park
@ 2022-03-05 15:05                                 ` Joel Fernandes
  2022-03-07  2:43                                   ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Joel Fernandes @ 2022-03-05 15:05 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Theodore Ts'o, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, torvalds, mingo, linux-kernel, peterz, will, tglx,
	rostedt, sashal, daniel.vetter, chris, duyuyang, johannes.berg,
	tj, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, paolo.valente, josef, linux-fsdevel, viro, jack,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, paulmck

On Sat, Mar 05, 2022 at 11:15:38PM +0900, Byungchul Park wrote:
> On Fri, Mar 04, 2022 at 10:26:23PM -0500, Theodore Ts'o wrote:
> > On Fri, Mar 04, 2022 at 09:42:37AM +0900, Byungchul Park wrote:
> > > 
> > > All contexts waiting for any of the events in the circular dependency
> > > chain will be definitely stuck if there is a circular dependency as I
> > > explained. So we need another wakeup source to break the circle. In
> > > ext4 code, you might have the wakeup source for breaking the circle.
> > > 
> > > What I agreed with is:
> > > 
> > >    The case where 1) the circular dependency is inevitable, 2) there is
> > >    another wakeup source for breaking the circle, and 3) the duration
> > >    of the sleep is short enough, should be acceptable.
> > > 
> > > Sounds good?
> > 
> > These dependencies are part of every single ext4 metadata update,
> > and if there were any unnecessary sleeps, this would be a major
> > performance gap, and this is a very well studied part of ext4.
> > 
> > There are some places where we sleep, sure.  In some cases
> > start_this_handle() needs to wait for a commit to complete, and the
> > commit thread might need to sleep for I/O to complete.  But the moment
> > the thing that we're waiting for is complete, we wake up all of the
> > processes on the wait queue.  And in the case where we wait for I/O
> > completion, that wakeup is coming from the device driver, when it
> > receives the I/O completion interrupt from the hard drive.  Is
> > that considered an "external source"?  Maybe DEPT doesn't recognize
> > that this is certain to happen just as day follows night?  (Well,
> > the I/O completion interrupt might not happen if the disk drive
> > bursts into flames --- but then, you've got bigger problems. :-)
> 
> Almost everything you've been blaming on Dept is total nonsense. Based on
> what you're saying, I'm convinced that you don't understand even 1% of
> how Dept works. You don't even try to understand it before blaming it.
> 
> You don't have to understand or support it. But I can't respond to you
> if you keep saying silly things that way.

Byungchul, other than ext4, have there been any DEPT reports that other
subsystem maintainers agree were valid use cases?

Regarding false positives, just to note, lockdep is not without its share
of false positives. Just that (as you know) the signal-to-noise ratio
should be high for it to be useful. I've put up with lockdep's false
positives just because it occasionally saves me from catastrophe.

> > In any case, if DEPT is going to report these "circular dependencies
> > as bugs that MUST be fixed", it's going to be pure noise and I will
> > ignore all DEPT reports, and will push back on having Lockdep replaced
> 
> Dept is going to be improved so that what you are concerned about won't
> be reported.

Yeah, I am looking forward to learning more about it. However, I was
wondering about the following: lockdep can already be used for modeling
"resource acquire/release" and "resource wait" semantics that are
unrelated to locks, like we do in mm reclaim. I am wondering why we cannot
just use those existing lockdep mechanisms for the wait/wake use cases
(assuming we can agree that circular dependencies related to wait/wake are
a bad thing). Or perhaps there's a reason why Peter Zijlstra did not use
lockdep for wait/wake dependencies (such as multiple wake sources),
considering he wrote a lot of that code.

Keep kicking ass brother, you're doing great.

Thanks,

     Joel



* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05 14:55                               ` Byungchul Park
@ 2022-03-05 15:12                                 ` Reimar Döffinger
  2022-03-06  3:30                                 ` Theodore Ts'o
  1 sibling, 0 replies; 67+ messages in thread
From: Reimar Döffinger @ 2022-03-05 15:12 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Theodore Ts'o, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, torvalds, mingo, linux-kernel, peterz, will, tglx,
	rostedt, joel, sashal, daniel.vetter, chris, duyuyang,
	johannes.berg, tj, willy, david, amir73il, bfields, gregkh,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, paolo.valente, josef, linux-fsdevel, viro,
	jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel,
	airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Hi,
Sorry to butt in as an outsider, but this seems like a shockingly disrespectful discussion for such a wide CC list.
I don't want to make rules how you discuss things (I very rarely contribute), and I see the value in a frank discussion, but maybe you could continue with a reduced CC list?
I find it unlikely that I am the only one who could do without this.

Best regards,
Reimar Döffinger

> On 5 Mar 2022, at 15:55, Byungchul Park <byungchul.park@lge.com> wrote:
> 
> On Fri, Mar 04, 2022 at 10:40:35PM -0500, Theodore Ts'o wrote:
>> On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
>>> 
>>> I found a point where the two wait channels don't lead to a deadlock in
>>> some cases, thanks to Jan Kara. I will fix it so that Dept won't
>>> complain about it.
>> 
>> I sent my last (admittedly cranky) message before you sent this.  I'm
>> glad you finally understood Jan's explanation.  I was trying to tell
> 
> Not finally. I've understood him whenever he tried to tell me something.
> 
>> you the same thing, but apparently I failed to communicate in a
> 
> I don't think so. Your point and Jan's point are different. All he has
> said makes sense. But yours does not.
> 
>> sufficiently clear manner.  In any case, what Jan described is a
>> fundamental part of how wait queues work, and I'm kind of amazed that
>> you were able to implement DEPT without understanding it.  (But maybe
> 
> Of course, it was possible because all that Dept has to know for its
> basic operation is waits and events. The subtle things like what Jan
> told me help make Dept better.
> 
>> that is why some of the DEPT reports were completely incomprehensible
> 
> It's because you are blindly blaming it without understanding how
> Dept works at all. I will fix the things that must be fixed. Don't worry.
> 
>> to me; I couldn't interpret why in the world DEPT was saying there was
>> a problem.)
> 
> I can tell you if you really want to understand why. But I can't if you
> are like this.
> 
>> In any case, the thing I would ask is a little humility.  We regularly
>> use lockdep, and we run a huge number of stress tests, throughout each
>> development cycle.
> 
> Sure.
> 
>> So if DEPT is issuing lots of reports about apparently circular
>> dependencies, please try to be open to the thought that the fault is
> 
> No one ever claimed that Dept doesn't have faults. I think you worry
> too much.
> 
>> in DEPT, and don't try to argue with maintainers that their code MUST
>> be buggy --- but since you don't understand our code, and DEPT must be
> 
> No one argued that their code must be buggy, either. So I don't think
> you have to worry about what's never happened.
> 
>> theoretically perfect, it is up to the maintainers to prove to
>> you that their code is correct.
>> 
>> I am going to gently suggest that it is at least as likely, if not
>> more likely, that the failure is in DEPT or in your understanding of
> 
> No doubt. I already think so. But that doesn't mean I have to keep
> quiet instead of discussing how to improve Dept. I will keep improving
> Dept in a reasonable way.
> 
>> how kernel wait channels and locking work.  After all, why would it
>> be that we haven't found these problems via our other QA practices?
> 
> Let's talk more once you understand at least 10% of how Dept works.
> Otherwise, I don't think we can talk in a productive way.
> 



* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05 14:55                               ` Byungchul Park
  2022-03-05 15:12                                 ` Reimar Döffinger
@ 2022-03-06  3:30                                 ` Theodore Ts'o
  2022-03-06 10:51                                   ` Byungchul Park
  1 sibling, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-06  3:30 UTC (permalink / raw)
  To: Byungchul Park
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Sat, Mar 05, 2022 at 11:55:34PM +0900, Byungchul Park wrote:
> > that is why some of the DEPT reports were completely incomprehensible
> 
> It's because you are blindly blaming it without understanding how
> Dept works at all. I will fix the things that must be fixed. Don't worry.

Users of DEPT must not have to understand how DEPT works in order to
understand and use DEPT reports.  If you think I don't understand how
DEPT works, I'm going to gently suggest that this means DEPT reports
are not clear enough, and/or DEPT documentation needs to be
*substantially* improved, or both --- and these need to happen before
DEPT is ready to be merged.

> > So if DEPT is issuing lots of reports about apparently circular
> > dependencies, please try to be open to the thought that the fault is
> 
> No one ever claimed that Dept doesn't have faults. I think you worry
> too much.

In that case, may I ask that you add back an RFC to the subject prefix
(e.g., [PATCH RFC -v5])?  Or maybe even add the subject prefix NOT YET
READY?  I have seen cases where, after a patch series gets to PATCH -v22,
people assume that it *must* be ready, as opposed to what it
really means, which is "the author is just persistently reposting and
rebasing the patch series over and over again".  It would be helpful
if you directly acknowledged, in each patch submission, that it is not
yet ready for prime time.

After all, right now, DEPT has generated two reports in ext4, both of
which were false positives, and both of which have required a lot of
maintainer time to prove to you that they were in fact false
positives.  So are we all agreed that DEPT is not ready for prime
time?

> No one argued that their code must be buggy, either. So I don't think
> you have to worry about what's never happened.

Well, you kept on insisting that ext4 must have a circular dependency,
and that depending on a "rescue wakeup" is bad programming practice,
but you'll reluctantly agree to make DEPT accept "rescue wakeups" if
that is the will of the developers.  My concern here is that the
fundamental concept of "rescue wakeups" is wrong; I don't see how you
can distinguish between a valid wakeup and one that you and DEPT are
going to somehow characterize as dodgy.

Consider: a process can first subscribe to multiple wait queues, and
arrange to be woken up by a timeout, and then call schedule() to go to
sleep.  So it is not waiting on a single wait channel, but potentially
on *multiple* wakeup sources.  If you are going to prove that the kernel
is not going to make forward progress, you need to prove that *all* the
ways that process might be woken up aren't going to happen for some reason.

Just because one wakeup source seems to form a circular dependency
proves nothing, since another wakeup source might be the designed and
architected way that code makes forward progress.

You seem to be assuming that one wakeup source is somehow the
"correct" one, and the other ways that process could be woken up are a
"rescue wakeup source", and you seem to believe that relying on a
"rescue wakeup source" is bad.  But in the case of a process which has
called prepare-to-wait on more than one wait queue, how is DEPT going
to distinguish between your "morally correct" wakeup source and the
"rescue wakeup source"?

> No doubt. I already think so. But it doesn't mean that I have to keep
> quiet without discussing to imporve Dept. I will keep improving Dept in
> a reasonable way.

Well, I don't want to be in a position of having to prove that every
single DEPT report in my code that doesn't make sense to me is
nonsense, or else DEPT will get merged.

So maybe we need to reverse the burden of proof.

Instead of just sending a DEPT report, and then asking the maintainers
to explain why it is a false positive, how about if *you* use the DEPT
report to examine the subsystem code, and then explain, in plain English,
how you think this could trigger in real life, or cause a performance
problem in real life, or perhaps provide a script or C reproducer that
triggers the supposed deadlock?

Yes, that means you will need to understand the code in question, but
hopefully the DEPT reports should be clear enough that someone who
isn't a deep expert in the code should be able to spot the bug.  If
not, and if only a few deep experts of code in question will be able
to decipher the DEPT report and figure out a fix, that's really not
ideal.

If DEPT can find a real bug and you can show that Lockdep wouldn't
have been able to find it, then that would be proof that it is making
a real contribution.  That would be a real benefit.  At the same time,
DEPT will hopefully be able to demonstrate a false positive rate which
is low enough that the benefits clearly outweigh the costs.

At the moment, I believe the scoreboard for DEPT with respect to ext4
is zero real bugs found, and two false positives, both of which have
required significant rounds of e-mail before the subsystem maintainers
were able to prove to you that it was, indeed, DEPT reporting a false
positive.

Do you now understand why I am so concerned that you aren't putting an
RFC or NOT YET READY in the subject line?

						- Ted

P.S.  If DEPT had a CONFIG_EXPERIMENTAL, with a disclaimer in the
Kconfig that some of its reports might be false positives, that might
be another way of easing my fears that this will get used by
Syzkaller and generate a lot of burdensome triage work for the
maintainers.


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-06  3:30                                 ` Theodore Ts'o
@ 2022-03-06 10:51                                   ` Byungchul Park
  2022-03-06 14:19                                     ` Theodore Ts'o
  0 siblings, 1 reply; 67+ messages in thread
From: Byungchul Park @ 2022-03-06 10:51 UTC (permalink / raw)
  To: tytso
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

Ted wrote:
> On Sat, Mar 05, 2022 at 11:55:34PM +0900, Byungchul Park wrote:
> > > that is why some of the DEPT reports were completely incomprehensible
> > 
> > It's because you blame it blindly without understanding how Dept
> > works at all. I will fix what must be fixed. Don't worry.
> 
> Users of DEPT must not have to understand how DEPT works in order to

Users must not have to understand how Dept works, for sure, and haters
must not blame things based on wrong guesses.

> understand and use DEPT reports.  If you think I don't understand how
> DEPT works, I'm going to gently suggest that this means DEPT reports
> aren't clear enough, and/or DEPT documentation needs to be
> *substantially* improved, or both --- and this needs to happen before
> DEPT is ready to be merged.

Okay.

> > > So if DEPT is issuing lots of reports about apparently circular
> > > dependencies, please try to be open to the thought that the fault is
> > 
> > No one was convinced that Dept doesn't have a fault. I think your
> > worries are too much.
> 
> In that case, may I ask that you add back a RFC to the subject prefix
> (e.g., [PATCH RFC -v5]?)  Or maybe even add the subject prefix NOT YET

I will.

> READY?  I have seen cases where, after a patch series gets to PATCH
> -v22, people assume that it *must* be ready, as opposed to what it
> really means, which is "the author is just persistently reposting and
> rebasing the patch series over and over again".  It would be helpful
> if you directly acknowledge, in each patch submission, that it is not
> yet ready for prime time.
> 
> After all, right now, DEPT has generated two reports in ext4, both of
> which were false positives, and both of which have required a lot of
> maintainer times to prove to you that they were in fact false
> positives.  So are we all agreed that DEPT is not ready for prime
> time?

Yes.

> > No one argued that their code must be buggy, either. So I don't think
> > you have to worry about what's never happened.
> 
> Well, you kept on insisting that ext4 must have a circular dependency,
> and that depending on a "rescue wakeup" is bad programming practice,
> but you'll reluctantly agree to make DEPT accept "rescue wakeups" if
> that is the will of the developers.  My concern here is that the
> fundamental concept of "rescue wakeups" is wrong; I don't see how you
> can distinguish between a valid wakeup and one that you and DEPT are
> going to somehow characterize as dodgy.

Your concern about it makes sense. I need to explain my thinking on it
more, but not now, because I guess the other folks are already tired
enough. Let's talk about it later when needed again.

> Consider: a process can first subscribe to multiple wait queues, and
> arrange to be woken up by a timeout, and then call schedule() to go to
> sleep.  So it is not waiting on a single wait channel, but potentially
> *multiple* wakeup sources.  If you are going to prove that kernel is
> not going to make forward progress, you need to prove that *all* ways
> that process might not wake up aren't going to happen for some reason.
> 
> Just because one wakeup source seems to form a circular dependency
> proves nothing, since another wakeup source might be the designed and
> architected way that code makes forward progress.

I also think it's legitimate if that design is intended, but it's not
if it isn't. This is what I meant. Again, it's not about ext4.

> You seem to be assuming that one wakeup source is somehow the
> "correct" one, and the other ways that the process could be woken up
> are a "rescue wakeup source", and you seem to believe that relying on
> a "rescue wakeup source" is bad.  But in the case of a process which has

It depends on whether or not it's intended IMHO.

> called prepare-to-wait on more than one wait queue, how is DEPT going
> to distinguish between your "morally correct" wakeup source and the
> "rescue wakeup source"?

Sure, it should be done manually. I should do it on my own when that
kind of issue arises. Again, ext4 is not the case because, based on
what Jan explained, there's no real circular dependency wrt the commit
wq, the done wq and so on.

> > No doubt. I already think so. But it doesn't mean that I have to keep
> > quiet without discussing how to improve Dept. I will keep improving
> > Dept in a reasonable way.
> 
> Well, I don't want to be in a position of having to prove that every
> single DEPT report in my code that doesn't make sense to me is
> nonsense, or else DEPT will get merged.
> 
> So maybe we need to reverse the burden of proof.

I will keep that in mind.

> Instead of just sending a DEPT report, and then asking the maintainers
> to explain why it is a false positive, how about if *you* use the DEPT
> report to examine the subsystem code, and then explain, in plain
> English, how you think this could trigger in real life, or cause a
> performance problem in real life, or perhaps provide a script or C
> reproducer that triggers the supposed deadlock?

Makes sense. Let me try.

> Yes, that means you will need to understand the code in question, but
> hopefully the DEPT reports should be clear enough that someone who
> isn't a deep expert in the code should be able to spot the bug.  If
> not, and if only a few deep experts of the code in question will be
> able to decipher the DEPT report and figure out a fix, that's really
> not ideal.

Agree. Just FYI, I've never blamed you for not being an expert on Dept.

> If DEPT can find a real bug and you can show that Lockdep wouldn't
> have been able to find it, then that would be proof that it is making
> a real contribution.  That would be a real benefit.  At the same time,
> DEPT will hopefully be able to demonstrate a false positive rate which
> is low enough that the benefits clearly outweigh the costs.
> 
> At the moment, I believe the scoreboard for DEPT with respect to ext4
> is zero real bugs found, and two false positives, both of which have
> required significant rounds of e-mail before the subsystem maintainers
> were able to prove to you that it was, indeed, DEPT reporting a false
> positive.

Right. But we've barely talked in a productive way. We've talked about
other things far more than about proving what you're mentioning.

> Do you now understand why I am so concerned that you aren't putting an
> RFC or NOT YET READY in the subject line?

Yes. I will.

> - Ted
> 
> P.S.  If DEPT had a CONFIG_EXPERIMENTAL, with a disclaimer in the
> Kconfig that some of its reports might be false positives, that might
> be another way of easing my fears that this will get picked up by
> Syzkaller and generate a lot of burdensome triage work for the
> maintainers.

Thank you for the straightforward explanation this time,
Byungchul


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-06 10:51                                   ` Byungchul Park
@ 2022-03-06 14:19                                     ` Theodore Ts'o
  2022-03-10  1:45                                       ` Byungchul Park
  0 siblings, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2022-03-06 14:19 UTC (permalink / raw)
  To: Byungchul Park
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Sun, Mar 06, 2022 at 07:51:42PM +0900, Byungchul Park wrote:
> > 
> > Users of DEPT must not have to understand how DEPT works in order to
> 
> Users must not have to understand how Dept works, for sure, and haters
> must not blame things based on wrong guesses.

For the record, I don't hate DEPT.  I *fear* that DEPT will result in
my getting spammed with a huge number of false positives once automated
testing systems like Syzkaller, the zero-day test robot, etc., get
hold of it after it gets merged and start generating hundreds of
automated reports.

And when I tried to read the DEPT reports and the DEPT documentation,
I found that the explanation for why ext4 had a circular dependency
simply did not make sense.  If my struggles to understand why DEPT was
issuing a false positive are "guessing", then how do we have
discussions over how to make DEPT better?

> > called prepare-to-wait on more than one wait queue, how is DEPT going
> > to distinguish between your "morally correct" wakeup source and the
> > "rescue wakeup source"?
> 
> Sure, it should be done manually. I should do it on my own when that
> kind of issue arises.

The question here is how often it will need to be done, and how easy
will it be to "do it manually"?  Suppose we mark all of the DEPT false
positives before it gets merged.  How easy will it be to suppress
future false positives as the kernel evolves?

Perhaps one method is to have a way to take a particular wait queue,
or call to schedule(), or at the level of an entire kernel source
file, and opt it out from DEPT analysis?  That way, if DEPT gets
merged, and a maintainer starts getting spammed by bogus (or
incomprehensible) reports, there is a simple way they can annotate
their source code to prevent DEPT from analyzing code that it is
apparently not able to understand correctly.

That way we don't necessarily need to have a debate over how close to
zero percent false positives is necessary before DEPT can get merged.
And we avoid needing to force maintainers to prove that a DEPT report
is a false positive, which, from my experience, is hard to do, since
they get accused of being DEPT haters and of not understanding DEPT.

						- Ted


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-05 15:05                                 ` Joel Fernandes
@ 2022-03-07  2:43                                   ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-07  2:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Theodore Ts'o, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, torvalds, mingo, linux-kernel, peterz, will, tglx,
	rostedt, sashal, daniel.vetter, chris, duyuyang, johannes.berg,
	tj, willy, david, amir73il, bfields, gregkh, kernel-team,
	linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj,
	jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta,
	linux-block, paolo.valente, josef, linux-fsdevel, viro, jack,
	jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, paulmck

On Sat, Mar 05, 2022 at 03:05:23PM +0000, Joel Fernandes wrote:
> On Sat, Mar 05, 2022 at 11:15:38PM +0900, Byungchul Park wrote:
> > Almost everything you've been blaming Dept for is total nonsense.
> > Based on what you're saying, I'm convinced that you don't understand
> > how Dept works even 1%. You don't even try to understand it before
> > blaming.
> > 
> > You don't have to understand and support it. But I can't respond to
> > you if you keep saying silly things that way.
> 
> Byungchul, other than ext4 have there been any DEPT reports that other
> subsystem maintainers' agree were valid usecases?

Not yet.

> Regarding false-positives, just to note lockdep is not without its share of
> false-positives. Just that (as you know), the signal-to-noise ratio should be
> high for it to be useful. I've put up with lockdep's false positives just
> because it occasionally saves me from catastrophe.

I love your insight. Agree. A tool is useful only when it's *actually*
helpful. I hope Dept will be.

> > > In any case, if DEPT is going to report these "circular dependencies
> > > as bugs that MUST be fixed", it's going to be pure noise and I will
> > > ignore all DEPT reports, and will push back on having Lockdep replaced
> > 
> > Dept is going to be improved so that what you are concerning about won't
> > be reported.
> 
> Yeah I am looking forward to learning more about it however I was wondering
> about the following: lockdep can already be used for modeling "resource
> acquire/release" and "resource wait" semantics that are unrelated to locks,
> like we do in mm reclaim. I am wondering why we cannot just use those existing
> lockdep mechanisms for the wait/wake usecases (Assuming that we can agree

1. Lockdep basically can't work with general waits/events happening
   across contexts. To get over this, manual tagging of
   acquire/release can be used at each section that we suspect. But
   unfortunately, we cannot use that method if we cannot easily
   identify the sections. Furthermore, it's inevitable that we miss
   sections that shouldn't be missed.

2. Some cases should be correctly tracked via a wait/event model, not
   an acquisition-order model. For example, read-lock in rwlock should
   be defined as a waiter waiting for write-unlock, and write-lock in
   rwlock as a waiter waiting for either read-unlock or write-unlock.
   Otherwise, if we try to track those cases using acquisition order,
   it cannot work completely. Don't you think that looks weird?

3. Tracking what we didn't track before means both stronger detection
   and the emergence of new false positives, exactly the same as
   Lockdep at its beginning, when it started to track what we hadn't
   tracked before. Even though that emergence was tolerated back then,
   now that Lockdep has got stable enough, folks will be more strict
   about new false positives. It gets even worse if valid reports are
   suppressed by false positives.

   For that reason, multi-reporting functionality is essential. I was
   thinking of improving Lockdep to allow multi-reporting. But that
   might require more changes than developing a new tool from scratch.
   Plus it might be even more difficult because Lockdep already works
   reasonably well. So even for Lockdep's sake, I thought the new thing
   should be developed independently, leaving Lockdep as it is.

4. (minor reason) The concept and naming of acquisition and release
   are not meant for general wait/event. The design and implementation
   are not, either. I wanted to address the issue as soon as possible,
   before we squeeze Lockdep further to cover general wait/event and
   the kernel code gets weird. Of course, it doesn't mean Dept is more
   stable than Lockdep. However, I can tell that Dept does what a
   dependency tool should do, and we need to make the code go right.

> that circular dependencies related to wait/wake are a bad thing. Or
> perhaps there's a reason why Peter Zijlstra did not use lockdep for
> wait/wake dependencies (such as multiple wake sources), considering
> he wrote a lot of that code.
> 
> Keep kicking ass brother, you're doing great.

Thank you! I'll go about this in the right way so as not to disappoint
you!

Thanks,
Byungchul


* Re: Report 2 in ext4 and journal based on v5.17-rc1
  2022-03-06 14:19                                     ` Theodore Ts'o
@ 2022-03-10  1:45                                       ` Byungchul Park
  0 siblings, 0 replies; 67+ messages in thread
From: Byungchul Park @ 2022-03-10  1:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, torvalds,
	mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, willy, david,
	amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, paolo.valente, josef,
	linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch,
	djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw,
	hamohammed.sa

On Sun, Mar 06, 2022 at 09:19:10AM -0500, Theodore Ts'o wrote:
> On Sun, Mar 06, 2022 at 07:51:42PM +0900, Byungchul Park wrote:
> > > 
> > > Users of DEPT must not have to understand how DEPT works in order to
> > 
> > Users must not have to understand how Dept works, for sure, and haters
> > must not blame things based on wrong guesses.
> 
> For the record, I don't hate DEPT.  I *fear* that DEPT will result in
> my getting spammed with a huge number of false positives once automated
> testing systems like Syzkaller, the zero-day test robot, etc., get
> hold of it after it gets merged and start generating hundreds of
> automated reports.

Agree. Dept should not be part of an *automated testing system* until
it finally matches Lockdep in terms of false positives. However, it's
impossible to achieve that out of tree.

Besides automated testing systems, Dept works great in the middle of
developing something that is complicated in terms of synchronization.
Developers no longer have to worry about real reports, ones that should
be reported, being masked by a false positive.

I will explicitly describe EXPERIMENTAL and "Dept may raise false
alarms" in Kconfig until it's considered a tool with few false alarms.
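A minimal sketch of what such a Kconfig entry might look like (the
symbol name, dependency, and help text here are illustrative, not the
actual submission):

```kconfig
config DEPT
	bool "Dependency tracking (EXPERIMENTAL)"
	depends on DEBUG_KERNEL
	help
	  Enable wait/event-based deadlock possibility detection.

	  This feature is EXPERIMENTAL: some of its reports may be
	  false positives.  Do not treat every report as a confirmed
	  bug, and avoid enabling it on automated testing systems
	  that file reports unattended.
```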

> > Sure, it should be done manually. I should do it on my own when that
> > kind of issue arises.
> 
> The question here is how often will it need to be done, and how easy

I guess it will rarely happen. I want to see, too.

> will it be to "do it manually"?  Suppose we mark all of the DEPT false

Very easy. Equal to or easier than the way we do it for Lockdep. But
the objects of interest would be wait/event objects rather than locks.

> positives before it gets merged.  How easy will it be to suppress
> future false positives as the kernel evolves?

Same as - or even better than - what we do for Lockdep.

And we'd better consider those activities as code documentation. Not
only making things work, but also organizing the code and documenting
it in code, is very meaningful.

> Perhaps one method is to have a way to take a particular wait queue,
> or call to schedule(), or at the level of an entire kernel source
> file, and opt it out from DEPT analysis?  That way, if DEPT gets
> merged, and a maintainer starts getting spammed by bogus (or

Hopefully, once Dept gets stable - and given that Dept works very
conservatively - there might not be as many false positives as you
fear. The situation is under control.

> That way we don't necessarily need to have a debate over how close to
> zero percent false positives is necessary before DEPT can get merged.

Nonsense. I would agree with you if that had been required when
Lockdep was merged. But I'll try to achieve almost zero false
positives; again, it's impossible to do that out of tree.

> And we avoid needing to force maintainers to prove that a DEPT report

So... it'd be okay if Dept is not made part of an automated testing
system. Right?

> is a false positive, which is from my experience hard to do, since
> they get accused of being DEPT haters and not understanding DEPT.

Honestly, the problem isn't that they don't understand domains other
than the ones they're familiar with; it's another issue. I won't
mention it.

And it sounds like you'd do nothing unless it turns out to be 100%
problematic. That's definitely the *easiest* way to maintain something,
because it's the same as not checking its correctness at all.

Even if it's hard to do, repeatedly checking whether the code is
really safe is surely what should be done. Again, I understand it
would be freaking hard. But that doesn't mean we should avoid it.

Here, there seem to be two points you'd like to make:

1. Fundamental question. Does Dept track wait and event correctly?
2. Even if so, can Dept consider all the subtle things in the kernel?

For 1, I'm willing to respond to whatever it is. And not only me: we
can make it work perfectly if the concept and direction are *right*.
For 2, I need to ask questions and try my best to fix whatever exists.

Again, Dept won't be part of an *automated testing system* until it
finally matches Lockdep in terms of false positives. Hopefully you are
okay with that.

---
Byungchul


end of thread, other threads:[~2022-03-10  1:46 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
2022-02-17 15:54   ` Steven Rostedt
2022-02-17 17:36   ` Steven Rostedt
2022-02-18  6:09     ` Byungchul Park
2022-02-17 19:46   ` kernel test robot
2022-02-17 19:46   ` kernel test robot
2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2022-02-17 19:46   ` kernel test robot
2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
2022-02-17 15:55   ` Steven Rostedt
2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread Byungchul Park
2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
2022-02-17 11:10   ` Report 2 " Byungchul Park
2022-02-21 19:02     ` Jan Kara
2022-02-23  0:35       ` Byungchul Park
2022-02-23 14:48         ` Jan Kara
2022-02-24  1:11           ` Byungchul Park
2022-02-24 10:22             ` Jan Kara
2022-02-28  9:28               ` Byungchul Park
2022-02-28 10:14                 ` Jan Kara
2022-02-28 21:25                   ` Theodore Ts'o
2022-03-03  1:36                     ` Byungchul Park
2022-03-03  1:00                   ` Byungchul Park
2022-03-03  2:32                     ` Theodore Ts'o
2022-03-03  5:23                       ` Byungchul Park
2022-03-03 14:36                         ` Theodore Ts'o
2022-03-04  0:42                           ` Byungchul Park
2022-03-05  3:26                             ` Theodore Ts'o
2022-03-05 14:15                               ` Byungchul Park
2022-03-05 15:05                                 ` Joel Fernandes
2022-03-07  2:43                                   ` Byungchul Park
2022-03-04  3:20                           ` Byungchul Park
2022-03-05  3:40                             ` Theodore Ts'o
2022-03-05 14:55                               ` Byungchul Park
2022-03-05 15:12                                 ` Reimar Döffinger
2022-03-06  3:30                                 ` Theodore Ts'o
2022-03-06 10:51                                   ` Byungchul Park
2022-03-06 14:19                                     ` Theodore Ts'o
2022-03-10  1:45                                       ` Byungchul Park
2022-03-03  9:54                     ` Jan Kara
2022-03-04  1:56                       ` Byungchul Park
2022-02-17 13:27   ` Report 1 " Matthew Wilcox
2022-02-18  0:41     ` Byungchul Park
2022-02-22  8:27   ` Jan Kara
2022-02-23  1:40     ` Byungchul Park
2022-02-23  3:30     ` Byungchul Park
2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
2022-02-17 17:00   ` Steven Rostedt
2022-02-17 17:06     ` Matthew Wilcox
2022-02-19 10:05       ` Byungchul Park
2022-02-18  4:19     ` Theodore Ts'o
2022-02-19 10:34       ` Byungchul Park
2022-02-19 10:18     ` Byungchul Park
2022-02-19  9:54   ` Byungchul Park
