On Fri, Jan 20, 2023 at 07:44:01PM -0800, Boqun Feng wrote: > On Sat, Jan 21, 2023 at 12:28:14PM +0900, Byungchul Park wrote: > > On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote: > > > On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote: > > > > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote: > > > > [...] > > > > > > > T0 T1 T2 > > > > > -- -- -- > > > > > unfair_read_lock(A); > > > > > write_lock(B); > > > > > write_lock(A); > > > > > write_lock(B); > > > > > fair_read_lock(A); > > > > > write_unlock(B); > > > > > read_unlock(A); > > > > > read_unlock(A); > > > > > write_unlock(B); > > > > > write_unlock(A); > > > > > > > > > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner > > > > > not doing either write_unlock(B) or read_unlock(B). In other words: > > > > > > > > > > 1. read_unlock(A) happening depends on write_unlock(B) happening. > > > > > 2. read_unlock(A) happening depends on read_unlock(B) happening. > > > > > > > > > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A > > > > > owner not doing either write_unlock(A) or read_unlock(A). In other > > > > > words: > > > > > > > > > > 3. write_unlock(B) happening depends on write_unlock(A) happening. > > > > > 4. write_unlock(B) happening depends on read_unlock(A) happening. > > > > > > > > > > 1, 2, 3 and 4 give the following dependencies: > > > > > > > > > > 1. read_unlock(A) -> write_unlock(B) > > > > > 2. read_unlock(A) -> read_unlock(B) > > > > > 3. write_unlock(B) -> write_unlock(A) > > > > > 4. write_unlock(B) -> read_unlock(A) > > > > > > > > > > With 1 and 4, there's a circular dependency so DEPT definitely report > > > > > this as a problem. > > > > > > > > > > REMIND: DEPT focuses on waits and events. > > > > > > > > Do you have the test cases showing DEPT can detect this? > > > > > > > > > > Just tried the following on your latest GitHub branch, I commented all > > > but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it. > > > Feel free to double check. > > > > I tried the 'queued read lock' test cases with DEPT on. I can see DEPT > > detect and report it. But yeah.. it's too verbose now. It's because DEPT > > is not aware of the test environment so it's just working hard to report > > every case. > > > > To make DEPT work with the selftest better, some works are needed. I > > will work on it later or you please work on it. > > > > The corresponding report is the following. > > > [...] > > [ 4.593037] context A's detail > > [ 4.593351] --------------------------------------------------- > > [ 4.593944] context A > > [ 4.594182] [S] lock(&rwlock_A:0) > > [ 4.594577] [W] lock(&rwlock_B:0) > > [ 4.594952] [E] unlock(&rwlock_A:0) > > [ 4.595341] > > [ 4.595501] [S] lock(&rwlock_A:0): > > [ 4.595848] [] queued_read_lock_hardirq_ER_rE+0xf4/0x170 > > [ 4.596547] stacktrace: > > [ 4.596797] _raw_read_lock+0xcf/0x110 > > [ 4.597215] queued_read_lock_hardirq_ER_rE+0xf4/0x170 > > [ 4.597766] dotest+0x30/0x7bc > > [ 4.598118] locking_selftest+0x2c6f/0x2ead > > [ 4.598602] start_kernel+0x5aa/0x6d5 > > [ 4.599017] secondary_startup_64_no_verify+0xe0/0xeb > > [ 4.599562] > [...] > > [ 4.608427] [] queued_read_lock_hardirq_RE_Er+0xf4/0x170 > > [ 4.609113] stacktrace: > > [ 4.609366] _raw_write_lock+0xc3/0xd0 > > [ 4.609788] queued_read_lock_hardirq_RE_Er+0xf4/0x170 > > [ 4.610371] dotest+0x30/0x7bc > > [ 4.610730] locking_selftest+0x2c41/0x2ead > > [ 4.611195] start_kernel+0x5aa/0x6d5 > > [ 4.611615] secondary_startup_64_no_verify+0xe0/0xeb > > [ 4.612164] > > [ 4.612325] [W] lock(&rwlock_A:0): > > [ 4.612671] [] queued_read_lock_hardirq_RE_Er+0x100/0x170 > > [ 4.613369] stacktrace: > > [ 4.613622] _raw_read_lock+0xac/0x110 > > [ 4.614047] queued_read_lock_hardirq_RE_Er+0x100/0x170 > > [ 4.614652] dotest+0x30/0x7bc > > [ 4.615007] locking_selftest+0x2c41/0x2ead > > [ 4.615468] start_kernel+0x5aa/0x6d5 > > [ 4.615879] secondary_startup_64_no_verify+0xe0/0xeb > > [ 4.616607] > [...] > > > As I told you, DEPT treats a queued lock as a normal type lock, no > > matter whether it's a read lock. That's why it prints just > > 'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm > > gonna change the format. > > > > I checked the selftest code and found, LOCK(B) is transformed like: > > > > LOCK(B) -> WL(B) -> write_lock(&rwlock_B) > > > > That's why '&rwlock_B' is printed instead of just 'B', JFYI. > > > > Nah, you output shows that you've run at least both function > > queued_read_lock_hardirq_RE_Er() > queued_read_lock_hardirq_ER_rE() > > but if you apply my diff > > https://lore.kernel.org/lkml/Y8oFj9A19cw3enHB@boqun-archlinux/ > > you should only run > > queued_read_lock_hardirq_RE_Er() > > one test. > > One of the reason that DEPT "detect" this is that DEPT doesn't reset > between tests, so old dependencies from previous run get carried over. > > > > Plus, for your information, you should turn on CONFIG_DEPT to use it. > > > > Yes I turn that config on. FWIW, the branch is at: https://github.com/fbq/linux-rust/tree/dept-test .config attached, and be sure to run with kernel command line "debug_locks_verbose=2" to see lockdep warning: [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 0.000000] ... MAX_LOCK_DEPTH: 48 [ 0.000000] ... MAX_LOCKDEP_KEYS: 8192 [ 0.000000] ... CLASSHASH_SIZE: 4096 [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 32768 [ 0.000000] ... MAX_LOCKDEP_CHAINS: 65536 [ 0.000000] ... CHAINHASH_SIZE: 32768 [ 0.000000] memory used by lock dependency info: 6365 kB [ 0.000000] memory used for stack traces: 4224 kB [ 0.000000] per task-struct memory footprint: 1920 bytes [ 0.000000] DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park [ 0.000000] ... DEPT_MAX_STACK_ENTRY: 16 [ 0.000000] ... DEPT_MAX_WAIT_HIST : 64 [ 0.000000] ... DEPT_MAX_ECXT_HELD : 48 [ 0.000000] ... DEPT_MAX_SUBCLASSES : 16 [ 0.000000] ... memory initially used by dep: 1664 KB [ 0.000000] ... memory initially used by class: 1472 KB [ 0.000000] ... memory initially used by stack: 9216 KB [ 0.000000] ... memory initially used by ecxt: 1664 KB [ 0.000000] ... memory initially used by wait: 2816 KB [ 0.000000] ... hash list head used by dep: 32 KB [ 0.000000] ... hash list head used by class: 32 KB [ 0.000000] ... total memory initially used by objects and hashs: 8480 KB [ 0.000000] ... per task memory footprint: 2720 bytes [ 0.000000] ------------------------ [ 0.000000] | Locking API testsuite: [ 0.000000] ---------------------------------------------------------------------------- [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem |rtmutex [ 0.000000] -------------------------------------------------------------------------- [ 0.000000] -------------------------------------------------------------------------- [ 0.000000] | queued read lock tests | [ 0.000000] --------------------------- [ 0.000000] hardirq read-lock/lock-read: [ 0.000000] [ 0.000000] ====================================================== [ 0.000000] WARNING: possible circular locking dependency detected [ 0.000000] 6.2.0-rc2+ #15 Not tainted [ 0.000000] ------------------------------------------------------ [ 0.000000] swapper/0/0 is trying to acquire lock: [ 0.000000] ffffffffb6600278 (rwlock_A){.-..}-{2:2}, at: locking_selftest+0x20f/0xa64 [ 0.000000] [ 0.000000] but task is already holding lock: [ 0.000000] ffffffffb66001f8 (rwlock_B){-...}-{2:2}, at: locking_selftest+0x203/0xa64 [ 0.000000] [ 0.000000] which lock already depends on the new lock. [ 0.000000] [ 0.000000] [ 0.000000] the existing dependency chain (in reverse order) is: [ 0.000000] [ 0.000000] -> #1 (rwlock_B){-...}-{2:2}: [ 0.000000] _raw_write_lock+0x7e/0xd0 [ 0.000000] locking_selftest+0x191/0xa64 [ 0.000000] start_kernel+0x5b0/0x6e9 [ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb [ 0.000000] [ 0.000000] -> #0 (rwlock_A){.-..}-{2:2}: [ 0.000000] __lock_acquire+0x149f/0x2620 [ 0.000000] lock_acquire+0xdb/0x300 [ 0.000000] _raw_read_lock+0xf9/0x110 [ 0.000000] locking_selftest+0x20f/0xa64 [ 0.000000] start_kernel+0x5b0/0x6e9 [ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb [ 0.000000] [ 0.000000] other info that might help us debug this: [ 0.000000] [ 0.000000] Possible unsafe locking scenario: [ 0.000000] [ 0.000000] CPU0 CPU1 [ 0.000000] ---- ---- [ 0.000000] lock(rwlock_B); [ 0.000000] lock(rwlock_A); [ 0.000000] lock(rwlock_B); [ 0.000000] lock(rwlock_A); [ 0.000000] [ 0.000000] *** DEADLOCK *** [ 0.000000] [ 0.000000] 1 lock held by swapper/0/0: [ 0.000000] #0: ffffffffb66001f8 (rwlock_B){-...}-{2:2}, at: locking_selftest+0x203/0xa64 [ 0.000000] [ 0.000000] stack backtrace: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc2+ #15 [ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.1-1-1 04/01/2014 [ 0.000000] Call Trace: [ 0.000000] [ 0.000000] dump_stack_lvl+0x66/0x89 [ 0.000000] check_noncircular+0x102/0x120 [ 0.000000] __lock_acquire+0x149f/0x2620 [ 0.000000] ? dept_enter+0x6b/0xe0 [ 0.000000] lock_acquire+0xdb/0x300 [ 0.000000] ? locking_selftest+0x20f/0xa64 [ 0.000000] ? dept_ecxt_enter+0x139/0x1a0 [ 0.000000] _raw_read_lock+0xf9/0x110 [ 0.000000] ? locking_selftest+0x20f/0xa64 [ 0.000000] locking_selftest+0x20f/0xa64 [ 0.000000] start_kernel+0x5b0/0x6e9 [ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb [ 0.000000] [ 0.000000] ok | lockclass mask: 2, debug_locks: 0, expected: 0 [ 0.000000] [ 0.000000] ------------------------------------------------------- [ 0.000000] Good, all 1 testcases passed! | [ 0.000000] --------------------------------- [ 0.000000] ACPI: Core revision 20221020 [ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns [ 0.001000] APIC: Switch to symmetric I/O mode setup [ 0.002000] Switched APIC routing to physical flat. [ 0.003000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 0.011000] tsc: Unable to calibrate against PIT [ 0.012000] tsc: using HPET reference calibration [ 0.013000] tsc: Detected 2394.216 MHz processor Regards, Boqun