* [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks @ 2018-11-19 18:55 Waiman Long
  2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long
  ` (16 more replies)
  0 siblings, 17 replies; 40+ messages in thread
From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
      Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

v1->v2:
 - Mark more locks as terminal.
 - Add a patch to remove the unused version field from lock_class.
 - Steal some bits from pin_count of held_lock for flags.
 - Add a patch to warn if a task holding a raw spinlock is acquiring a
   non-raw lock.

The purpose of this patchset is to add a new class of locks called terminal locks and convert some of the low-level raw or regular spinlocks to terminal locks. A terminal lock has no forward dependency: it won't allow a lock or unlock operation on another lock. Two-level nesting of terminal locks is allowed, though. Only spinlocks that are acquired with the _irq/_irqsave variants or acquired in an IRQ-disabled context should be classified as terminal locks.

Because of the restrictions on terminal locks, we can do simple checks on them without using the lockdep lock validation machinery. The advantages of making these changes are as follows:

 1) It saves table entries used by the validation code and hence makes
    it harder to overflow those tables.
 2) The lockdep check in __lock_acquire() will be a little bit faster
    for terminal locks without using the lock validation code.

In fact, it is possible to overflow some of the tables by running a variety of different workloads on a debug kernel. I have seen bug reports about exhausting MAX_LOCKDEP_KEYS, MAX_LOCKDEP_ENTRIES and MAX_STACK_TRACE_ENTRIES. This patchset will help to reduce the chance of overflowing some of those tables.
Below are selected output lines from the lockdep_stats files of the patched and unpatched kernels after bootup and running parallel kernel builds and some perf benchmarks.

  Item                     Unpatched kernel  Patched kernel  % Change
  ----                     ----------------  --------------  --------
  direct dependencies            9924             9032         -9.0%
  dependency chains             18258            16326        -10.6%
  dependency chain hlocks       73198            64927        -11.3%
  stack-trace entries          110502           103225         -6.6%

There were some reductions in the size of the lockdep tables. They were not significant, but it is still a good start to rein in the number of entries in those tables and make it harder to overflow them.

In terms of performance, there aren't many noticeable differences in either the kernel build or the perf benchmark. A low-level locking microbenchmark with 4 locking threads shows the following locking rates on a Haswell system:

  Kernel            Lock           Rate
  ------            ----           ----
  Unpatched kernel  Regular lock   2,288 kop/s
  Patched kernel    Regular lock   2,352 kop/s
                    Terminal lock  2,512 kop/s

I was not sure why there was a slight performance improvement with the patched kernel. However, comparing the regular and terminal lock results, there was a slight 7% improvement in locking throughput for terminal locks.

Looking at the perf output for the regular lock:

  5.43%  run-locktest  [kernel.vmlinux]  [k] __lock_acquire
  4.65%  run-locktest  [kernel.vmlinux]  [k] lock_contended
  2.80%  run-locktest  [kernel.vmlinux]  [k] lock_acquired
  2.53%  run-locktest  [kernel.vmlinux]  [k] lock_release
  1.42%  run-locktest  [kernel.vmlinux]  [k] mark_lock

For the terminal lock:

  5.00%  run-locktest  [kernel.vmlinux]  [k] __lock_acquire
  4.66%  run-locktest  [kernel.vmlinux]  [k] lock_contended
  2.88%  run-locktest  [kernel.vmlinux]  [k] lock_acquired
  2.55%  run-locktest  [kernel.vmlinux]  [k] lock_release
  1.34%  run-locktest  [kernel.vmlinux]  [k] mark_lock

The __lock_acquire() function ran a bit faster with the terminal lock, but the other lockdep functions remained more or less the same in terms of performance.
In terms of internal lockdep structure sizes, there should be no size increase for 64-bit architectures with CONFIG_LOCK_STAT defined. The lockdep_map structure will increase in size for 32-bit architectures or when CONFIG_LOCK_STAT isn't defined.

Waiman Long (17):
  locking/lockdep: Remove version from lock_class structure
  locking/lockdep: Rework lockdep_set_novalidate_class()
  locking/lockdep: Add a new terminal lock type
  locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros
  printk: Mark logbuf_lock & console_owner_lock as terminal locks
  debugobjects: Mark pool_lock as a terminal lock
  debugobjects: Move printk out of db lock critical sections
  locking/lockdep: Add support for nestable terminal locks
  debugobjects: Make object hash locks nestable terminal locks
  lib/stackdepot: Make depot_lock a terminal spinlock
  locking/rwsem: Mark rwsem.wait_lock as a terminal lock
  cgroup: Mark the rstat percpu lock as terminal
  mm/kasan: Make quarantine_lock a terminal lock
  dma-debug: Mark free_entries_lock as terminal
  kernfs: Mark kernfs_open_node_lock as terminal lock
  delay_acct: Mark task's delays->lock as terminal spinlock
  locking/lockdep: Check raw/non-raw locking conflicts

 fs/kernfs/file.c                   |  2 +-
 include/linux/lockdep.h            | 59 +++++++++++++++++++++++----
 include/linux/rwsem.h              | 11 +++++-
 include/linux/spinlock_types.h     | 34 ++++++++++------
 kernel/cgroup/rstat.c              |  9 ++++-
 kernel/delayacct.c                 |  4 +-
 kernel/dma/debug.c                 |  2 +-
 kernel/locking/lockdep.c           | 81 ++++++++++++++++++++++++++++++++------
 kernel/locking/lockdep_internals.h |  5 +++
 kernel/locking/lockdep_proc.c      | 11 +++++-
 kernel/locking/rwsem-xadd.c        |  1 +
 kernel/locking/spinlock_debug.c    |  1 +
 kernel/printk/printk.c             |  4 +-
 kernel/printk/printk_safe.c        |  2 +-
 lib/debugobjects.c                 | 70 ++++++++++++++++++++++----------
 lib/stackdepot.c                   |  2 +-
 mm/kasan/quarantine.c              |  2 +-
 17 files changed, 235 insertions(+), 65 deletions(-)

--
1.8.3.1

^ permalink raw reply	[flat|nested] 40+ messages in thread
* [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-12-11 15:22 ` [tip:locking/core] locking/lockdep: Remove ::version " tip-bot for Waiman Long 2018-11-19 18:55 ` [PATCH v2 02/17] locking/lockdep: Rework lockdep_set_novalidate_class() Waiman Long ` (15 subsequent siblings) 16 siblings, 1 reply; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long It turns out the version field in the lock_class structure isn't used anywhere. Just remove it. Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/lockdep.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 1fd82ff..c5335df 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -97,8 +97,6 @@ struct lock_class { * Generation counter, when doing certain classes of graph walking, * to ensure that we check one node only once: */ - unsigned int version; - int name_version; const char *name; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [tip:locking/core] locking/lockdep: Remove ::version from lock_class structure 2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long @ 2018-12-11 15:22 ` tip-bot for Waiman Long 0 siblings, 0 replies; 40+ messages in thread From: tip-bot for Waiman Long @ 2018-12-11 15:22 UTC (permalink / raw) To: linux-tip-commits Cc: aryabinin, akpm, will.deacon, longman, peterz, mingo, sasha.levin, tj, sergey.senozhatsky, hpa, tglx, pmladek, torvalds, linux-kernel Commit-ID: 2421b7f3573babfe1673a5ffee1677a5013e6df1 Gitweb: https://git.kernel.org/tip/2421b7f3573babfe1673a5ffee1677a5013e6df1 Author: Waiman Long <longman@redhat.com> AuthorDate: Mon, 19 Nov 2018 13:55:10 -0500 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 11 Dec 2018 14:54:46 +0100 locking/lockdep: Remove ::version from lock_class structure It turns out the version field in the lock_class structure isn't used anywhere. Just remove it. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: iommu@lists.linux-foundation.org Cc: kasan-dev@googlegroups.com Link: https://lkml.kernel.org/r/1542653726-5655-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- include/linux/lockdep.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 1fd82ff99c65..c5335df2372f 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -97,8 +97,6 @@ struct lock_class { * Generation counter, when doing 
certain classes of graph walking, * to ensure that we check one node only once: */ - unsigned int version; - int name_version; const char *name; ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v2 02/17] locking/lockdep: Rework lockdep_set_novalidate_class()
  2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long
  2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long
@ 2018-11-19 18:55 ` Waiman Long
  2018-11-19 18:55 ` [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type Waiman Long
  ` (14 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
      Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

The current lockdep_set_novalidate_class() implementation is something of a hack. It assigns a special class key to the lock and calls lockdep_init_map() twice. This patch changes the implementation to use a flag bit instead. This will allow other special locking class types to be defined and used in the lockdep code.

A new "flags" field is added to both the lockdep_map and lock_class structures. The new field can now be used to designate a lock and a class object as novalidate. The lockdep_set_novalidate_class() call, however, should be made only after lock initialization, which calls lockdep_init_map().

For 64-bit architectures, the new flags field won't increase the size of the lock_class structure. The lockdep_map structure won't change either for 64-bit architectures with CONFIG_LOCK_STAT configured.

Please note that lockdep_set_novalidate_class() should not be used at all unless there is an overwhelming reason to do so. Hopefully we can retire it in the near future.
Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/lockdep.h | 17 ++++++++++++++--- kernel/locking/lockdep.c | 14 +++++++------- 2 files changed, 21 insertions(+), 10 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index c5335df..8fe5b4f 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -58,8 +58,6 @@ struct lock_class_key { struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES]; }; -extern struct lock_class_key __lockdep_no_validate__; - #define LOCKSTAT_POINTS 4 /* @@ -94,6 +92,11 @@ struct lock_class { struct list_head locks_after, locks_before; /* + * Lock class type flags + */ + unsigned int flags; + + /* * Generation counter, when doing certain classes of graph walking, * to ensure that we check one node only once: */ @@ -140,6 +143,12 @@ struct lock_class_stats { #endif /* + * Lockdep class type flags + * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks. + */ +#define LOCKDEP_FLAG_NOVALIDATE (1 << 0) + +/* * Map the lock object (the lock instance) to the lock-class object. 
* This is embedded into specific lock instances: */ @@ -147,6 +156,7 @@ struct lockdep_map { struct lock_class_key *key; struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES]; const char *name; + unsigned int flags; #ifdef CONFIG_LOCK_STAT int cpu; unsigned long ip; @@ -294,7 +304,8 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name, (lock)->dep_map.key, sub) #define lockdep_set_novalidate_class(lock) \ - lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock) + do { (lock)->dep_map.flags |= LOCKDEP_FLAG_NOVALIDATE; } while (0) + /* * Compare locking classes */ diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 1efada2..493b567 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -692,10 +692,11 @@ static int count_matching_names(struct lock_class *new_class) hlist_for_each_entry_rcu(class, hash_head, hash_entry) { if (class->key == key) { /* - * Huh! same key, different name? Did someone trample - * on some memory? We're most confused. + * Huh! same key, different name or flags? Did someone + * trample on some memory? We're most confused. 
*/ - WARN_ON_ONCE(class->name != lock->name); + WARN_ON_ONCE((class->name != lock->name) || + (class->flags != lock->flags)); return class; } } @@ -788,6 +789,7 @@ static bool assign_lock_key(struct lockdep_map *lock) debug_atomic_inc(nr_unused_locks); class->key = key; class->name = lock->name; + class->flags = lock->flags; class->subclass = subclass; INIT_LIST_HEAD(&class->lock_entry); INIT_LIST_HEAD(&class->locks_before); @@ -3108,6 +3110,7 @@ static void __lockdep_init_map(struct lockdep_map *lock, const char *name, return; } + lock->flags = 0; lock->name = name; /* @@ -3152,9 +3155,6 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name, } EXPORT_SYMBOL_GPL(lockdep_init_map); -struct lock_class_key __lockdep_no_validate__; -EXPORT_SYMBOL_GPL(__lockdep_no_validate__); - static int print_lock_nested_lock_not_held(struct task_struct *curr, struct held_lock *hlock, @@ -3215,7 +3215,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, if (unlikely(!debug_locks)) return 0; - if (!prove_locking || lock->key == &__lockdep_no_validate__) + if (!prove_locking || (lock->flags & LOCKDEP_FLAG_NOVALIDATE)) check = 0; if (subclass < NR_LOCKDEP_CACHING_CLASSES) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type
  2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long
  2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long
  2018-11-19 18:55 ` [PATCH v2 02/17] locking/lockdep: Rework lockdep_set_novalidate_class() Waiman Long
@ 2018-11-19 18:55 ` Waiman Long
  2018-12-07  9:19 ` Peter Zijlstra
  2018-11-19 18:55 ` [PATCH v2 04/17] locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros Waiman Long
  ` (13 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
      Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

A terminal lock is a lock where further locking or unlocking on another lock is not allowed. IOW, no forward dependency is permitted.

With such a restriction in place, we don't really need to do a full validation of the lock chain involving a terminal lock. Instead, we just check if there is any further locking or unlocking on another lock when a terminal lock is being held.

Only spinlocks which are acquired by the _irq or _irqsave variants or in an IRQ-disabled context should be classified as terminal locks.

By adding this new lock type, we can save entries in lock_chains[], chain_hlocks[], list_entries[] and stack_trace[]. By marking suitable locks as terminal, we reduce the chance of overflowing those tables, allowing them to focus on locks that can have both forward and backward dependencies.

Four bits are stolen from the pin_count of the held_lock structure to hold a new 4-bit flags field. The pin_count field is essentially a summation of 16-bit random cookie values. Removing 4 bits still allows the pin_count to accumulate up to almost 4096 of those cookie values.
Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/lockdep.h | 29 ++++++++++++++++++++--- kernel/locking/lockdep.c | 47 ++++++++++++++++++++++++++++++++------ kernel/locking/lockdep_internals.h | 5 ++++ kernel/locking/lockdep_proc.c | 11 +++++++-- 4 files changed, 80 insertions(+), 12 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 8fe5b4f..a146bca 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -144,9 +144,20 @@ struct lock_class_stats { /* * Lockdep class type flags + * * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks. + * 2) LOCKDEP_FLAG_TERMINAL: This is a terminal lock where lock/unlock on + * another lock within its critical section is not allowed. + * + * Only the least significant 4 bits of the flags will be copied to the + * held_lock structure. */ -#define LOCKDEP_FLAG_NOVALIDATE (1 << 0) +#define LOCKDEP_FLAG_TERMINAL (1 << 0) +#define LOCKDEP_FLAG_NOVALIDATE (1 << 4) + +#define LOCKDEP_HLOCK_FLAGS_MASK 0x0f +#define LOCKDEP_NOCHECK_FLAGS (LOCKDEP_FLAG_NOVALIDATE |\ + LOCKDEP_FLAG_TERMINAL) /* * Map the lock object (the lock instance) to the lock-class object. @@ -263,7 +274,16 @@ struct held_lock { unsigned int check:1; /* see lock_acquire() comment */ unsigned int hardirqs_off:1; unsigned int references:12; /* 32 bits */ - unsigned int pin_count; + /* + * Four bits are stolen from pin_count for flags so as not to + * increase the size of the structure. The stolen bits may not + * be enough in the future as more flag bits are added. However, + * not all of them may need to be checked in the held_lock + * structure. We just have to make sure that the the relevant + * ones will be in the 4 least significant bits. 
+ */ + unsigned int pin_count:28; + unsigned int flags:4; }; /* @@ -305,6 +325,8 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name, #define lockdep_set_novalidate_class(lock) \ do { (lock)->dep_map.flags |= LOCKDEP_FLAG_NOVALIDATE; } while (0) +#define lockdep_set_terminal_class(lock) \ + do { (lock)->dep_map.flags |= LOCKDEP_FLAG_TERMINAL; } while (0) /* * Compare locking classes @@ -420,7 +442,8 @@ static inline void lockdep_on(void) do { (void)(key); } while (0) #define lockdep_set_subclass(lock, sub) do { } while (0) -#define lockdep_set_novalidate_class(lock) do { } while (0) +#define lockdep_set_novalidate_class(lock) do { } while (0) +#define lockdep_set_terminal_class(lock) do { } while (0) /* * We don't define lockdep_match_class() and lockdep_match_key() for !LOCKDEP diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 493b567..40894c1 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3020,6 +3020,11 @@ static inline int separate_irq_context(struct task_struct *curr, #endif /* defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) */ +static int hlock_is_terminal(struct held_lock *hlock) +{ + return flags_is_terminal(hlock->flags); +} + /* * Mark a lock with a usage bit, and validate the state transition: */ @@ -3047,7 +3052,11 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this, hlock_class(this)->usage_mask |= new_mask; - if (!save_trace(hlock_class(this)->usage_traces + new_bit)) + /* + * We don't need to save the stack trace for terminal locks. 
+ */ + if (!hlock_is_terminal(this) && + !save_trace(hlock_class(this)->usage_traces + new_bit)) return 0; switch (new_bit) { @@ -3215,9 +3224,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, if (unlikely(!debug_locks)) return 0; - if (!prove_locking || (lock->flags & LOCKDEP_FLAG_NOVALIDATE)) - check = 0; - if (subclass < NR_LOCKDEP_CACHING_CLASSES) class = lock->class_cache[subclass]; /* @@ -3229,6 +3235,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, return 0; } + if (!prove_locking || (class->flags & LOCKDEP_NOCHECK_FLAGS)) + check = 0; + debug_class_ops_inc(class); if (very_verbose(class)) { @@ -3255,6 +3264,13 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, if (depth) { hlock = curr->held_locks + depth - 1; + + /* + * Warn if the previous lock is a terminal lock. + */ + if (DEBUG_LOCKS_WARN_ON(hlock_is_terminal(hlock))) + return 0; + if (hlock->class_idx == class_idx && nest_lock) { if (hlock->references) { /* @@ -3294,6 +3310,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, hlock->holdtime_stamp = lockstat_clock(); #endif hlock->pin_count = pin_count; + hlock->flags = class->flags & LOCKDEP_HLOCK_FLAGS_MASK; if (check && !mark_irqflags(curr, hlock)) return 0; @@ -3636,6 +3653,14 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip) if (i == depth-1) return 1; + /* + * Unlock of an outer lock is not allowed while holding a terminal + * lock. + */ + hlock = curr->held_locks + depth - 1; + if (DEBUG_LOCKS_WARN_ON(hlock_is_terminal(hlock))) + return 0; + if (reacquire_held_locks(curr, depth, i + 1)) return 0; @@ -3688,7 +3713,7 @@ static struct pin_cookie __lock_pin_lock(struct lockdep_map *lock) /* * Grab 16bits of randomness; this is sufficient to not * be guessable and still allows some pin nesting in - * our u32 pin_count. + * our 28-bit pin_count. 
*/ cookie.val = 1 + (prandom_u32() >> 16); hlock->pin_count += cookie.val; @@ -4013,7 +4038,7 @@ void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie cookie) } static void -__lock_acquired(struct lockdep_map *lock, unsigned long ip) +__lock_acquired(struct lockdep_map *lock, unsigned long ip, unsigned long flags) { struct task_struct *curr = current; struct held_lock *hlock; @@ -4039,6 +4064,13 @@ void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie cookie) if (hlock->instance != lock) return; + /* + * A terminal lock should only be used with IRQ disabled. + */ + if (DEBUG_LOCKS_WARN_ON(hlock_is_terminal(hlock) && + !irqs_disabled_flags(flags))) + return; + cpu = smp_processor_id(); if (hlock->waittime_stamp) { now = lockstat_clock(); @@ -4093,9 +4125,10 @@ void lock_acquired(struct lockdep_map *lock, unsigned long ip) return; raw_local_irq_save(flags); + check_flags(flags); current->lockdep_recursion = 1; - __lock_acquired(lock, ip); + __lock_acquired(lock, ip, flags); current->lockdep_recursion = 0; raw_local_irq_restore(flags); } diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h index 88c847a..271fba8 100644 --- a/kernel/locking/lockdep_internals.h +++ b/kernel/locking/lockdep_internals.h @@ -212,3 +212,8 @@ static inline unsigned long debug_class_ops_read(struct lock_class *class) # define debug_atomic_read(ptr) 0 # define debug_class_ops_inc(ptr) do { } while (0) #endif + +static inline unsigned int flags_is_terminal(unsigned int flags) +{ + return flags & LOCKDEP_FLAG_TERMINAL; +} diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c index 3d31f9b..37fbd41 100644 --- a/kernel/locking/lockdep_proc.c +++ b/kernel/locking/lockdep_proc.c @@ -78,7 +78,10 @@ static int l_show(struct seq_file *m, void *v) get_usage_chars(class, usage); seq_printf(m, " %s", usage); - seq_printf(m, ": "); + /* + * Print terminal lock status + */ + seq_printf(m, "%c: ", flags_is_terminal(class->flags) ? 
'T' : ' '); print_name(m, class); seq_puts(m, "\n"); @@ -208,7 +211,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v) nr_irq_read_safe = 0, nr_irq_read_unsafe = 0, nr_softirq_read_safe = 0, nr_softirq_read_unsafe = 0, nr_hardirq_read_safe = 0, nr_hardirq_read_unsafe = 0, - sum_forward_deps = 0; + nr_nocheck = 0, sum_forward_deps = 0; list_for_each_entry(class, &all_lock_classes, lock_entry) { @@ -240,6 +243,8 @@ static int lockdep_stats_show(struct seq_file *m, void *v) nr_hardirq_read_safe++; if (class->usage_mask & LOCKF_ENABLED_HARDIRQ_READ) nr_hardirq_read_unsafe++; + if (class->flags & LOCKDEP_NOCHECK_FLAGS) + nr_nocheck++; #ifdef CONFIG_PROVE_LOCKING sum_forward_deps += lockdep_count_forward_deps(class); @@ -318,6 +323,8 @@ static int lockdep_stats_show(struct seq_file *m, void *v) nr_uncategorized); seq_printf(m, " unused locks: %11lu\n", nr_unused); + seq_printf(m, " unchecked locks: %11lu\n", + nr_nocheck); seq_printf(m, " max locking depth: %11u\n", max_lockdep_depth); #ifdef CONFIG_PROVE_LOCKING -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type 2018-11-19 18:55 ` [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type Waiman Long @ 2018-12-07 9:19 ` Peter Zijlstra 0 siblings, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2018-12-07 9:19 UTC (permalink / raw) To: Waiman Long Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Mon, Nov 19, 2018 at 01:55:12PM -0500, Waiman Long wrote: > A terminal lock is a lock where further locking or unlocking on another > lock is not allowed. IOW, no forward dependency is permitted. > > With such a restriction in place, we don't really need to do a full > validation of the lock chain involving a terminal lock. Instead, > we just check if there is any further locking or unlocking on another > lock when a terminal lock is being held. > > Only spinlocks which are acquired by the _irq or _irqsave variants > or in IRQ disabled context should be classified as terminal locks. > > By adding this new lock type, we can save entries in lock_chains[], > chain_hlocks[], list_entries[] and stack_trace[]. By marking suitable > locks as terminal, we reduce the chance of overflowing those tables > allowing them to focus on locks that can have both forward and backward > dependencies. > > Four bits are stolen from the pin_count of the held_lock structure > to hold a new 4-bit flags field. The pin_count field is essentially a > summation of 16-bit random cookie values. Removing 4 bits still allow > the pin_count to accumulate up to almost 4096 of those cookie values. 
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  include/linux/lockdep.h            | 29 ++++++++++++++++++++---
>  kernel/locking/lockdep.c           | 47 ++++++++++++++++++++++++++++++++------
>  kernel/locking/lockdep_internals.h |  5 ++++
>  kernel/locking/lockdep_proc.c      | 11 +++++++--
>  4 files changed, 80 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 8fe5b4f..a146bca 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -144,9 +144,20 @@ struct lock_class_stats {
>
>  /*
>   * Lockdep class type flags
> + *
>   * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks.
> + * 2) LOCKDEP_FLAG_TERMINAL: This is a terminal lock where lock/unlock on
> + *    another lock within its critical section is not allowed.
> + *
> + * Only the least significant 4 bits of the flags will be copied to the
> + * held_lock structure.
>   */
> -#define LOCKDEP_FLAG_NOVALIDATE	(1 << 0)
> +#define LOCKDEP_FLAG_TERMINAL	(1 << 0)
> +#define LOCKDEP_FLAG_NOVALIDATE	(1 << 4)

Just leave the novalidate thing alone, then you don't get to do silly things like this.

Also, I have a pending patch (that I never quite finished) that tests lock nesting type (ie. raw_spinlock_t < spinlock_t < struct mutex) that wanted to use many of these same holes you took. I think we can easily fit the lot together in bitfields though, since you really don't need that many flags.

I refreshed the below patch a number of months ago (no idea if it still applies, I think it was before Paul munged all of RCU). You need to kill printk and lift a few RT patches for the below to 'work' IIRC.

---
Subject: lockdep: Introduce wait-type checks
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 19 Nov 2013 21:45:48 +0100

This patch extends lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired) fits in the current wait-context (as generated by the held stack). This ensures that we do not try and acquire mutices while holding spinlocks, do not attempt to acquire spinlocks while holding raw_spinlocks and so on. In other words, it's a fancier might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single value test because we can acquire RCU in (pretty much) any context and while it presents a context to nested locks it is not the same as it got acquired in. Therefore we needed to split the wait_type into two values, one representing the acquire (outer) and one representing the nested context (inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that: .outer == INV means .outer == .inner; because INV == 0. ]

It further means that we need to find the minimal .inner of the held stack to compare against the outer of the new lock; because while 'normal' RCU presents a CONFIG type to nested locks, if it is taken while already holding a SPIN type it obviously doesn't relax the rules.

Below is an example output; generated by the trivial example:

	raw_spin_lock(&foo);
	spin_lock(&bar);
	spin_unlock(&bar);
	raw_spin_unlock(&foo);

The way to read it is to look at the new -{n,m} part in the lock description; -{3:3} for our attempted lock, and try and match that up to the held locks, which in this case is the one: -{2,2}. This tells us that the acquiring lock requires a more relaxed environment than presented by the lock stack.

Currently only the normal locks and RCU are converted; the rest of the lockdep users default to .inner = INV, which is ignored. More conversions can be done when desired.
Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/irqflags.h | 8 ++ include/linux/lockdep.h | 66 +++++++++++++++---- include/linux/mutex.h | 7 +- include/linux/rwlock_types.h | 6 + include/linux/rwsem.h | 6 + include/linux/sched.h | 1 include/linux/spinlock.h | 35 +++++++--- include/linux/spinlock_types.h | 24 +++++- kernel/irq/handle.c | 7 ++ kernel/locking/lockdep.c | 138 ++++++++++++++++++++++++++++++++++++---- kernel/locking/mutex-debug.c | 2 kernel/locking/rwsem-spinlock.c | 2 kernel/locking/rwsem-xadd.c | 2 kernel/locking/spinlock_debug.c | 6 - kernel/rcu/update.c | 24 +++++- 15 files changed, 280 insertions(+), 54 deletions(-) --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -37,7 +37,12 @@ # define trace_softirqs_enabled(p) ((p)->softirqs_enabled) # define trace_hardirq_enter() \ do { \ - current->hardirq_context++; \ + if (!current->hardirq_context++) \ + current->hardirq_threaded = 0; \ +} while (0) +# define trace_hardirq_threaded() \ +do { \ + current->hardirq_threaded = 1; \ } while (0) # define trace_hardirq_exit() \ do { \ @@ -59,6 +64,7 @@ do { \ # define trace_hardirqs_enabled(p) 0 # define trace_softirqs_enabled(p) 0 # define trace_hardirq_enter() do { } while (0) +# define trace_hardirq_threaded() do { } while (0) # define trace_hardirq_exit() do { } while (0) # define lockdep_softirq_enter() do { } while (0) # define lockdep_softirq_exit() do { } while (0) --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -107,6 +107,9 @@ struct lock_class { const char *name; int name_version; + short wait_type_inner; + short wait_type_outer; + #ifdef CONFIG_LOCK_STAT unsigned long contention_point[LOCKSTAT_POINTS]; unsigned long contending_point[LOCKSTAT_POINTS]; @@ -146,6 +149,17 @@ struct lock_class_stats lock_stats(struc void clear_lock_stats(struct lock_class *class); 
#endif +enum lockdep_wait_type { + LD_WAIT_INV = 0, /* not checked, catch all */ + + LD_WAIT_FREE, /* wait free, rcu etc.. */ + LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */ + LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */ + LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */ + + LD_WAIT_MAX, /* must be last */ +}; + /* * Map the lock object (the lock instance) to the lock-class object. * This is embedded into specific lock instances: @@ -154,6 +168,8 @@ struct lockdep_map { struct lock_class_key *key; struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES]; const char *name; + short wait_type_outer; /* can be taken in this context */ + short wait_type_inner; /* presents this context */ #ifdef CONFIG_LOCK_STAT int cpu; unsigned long ip; @@ -281,8 +297,21 @@ extern void lockdep_on(void); * to lockdep: */ -extern void lockdep_init_map(struct lockdep_map *lock, const char *name, - struct lock_class_key *key, int subclass); +extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name, + struct lock_class_key *key, int subclass, short inner, short outer); + +static inline void +lockdep_init_map_wait(struct lockdep_map *lock, const char *name, + struct lock_class_key *key, int subclass, short inner) +{ + lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV); +} + +static inline void lockdep_init_map(struct lockdep_map *lock, const char *name, + struct lock_class_key *key, int subclass) +{ + lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV); +} /* * Reinitialize a lock key - for cases where there is special locking or @@ -290,18 +319,29 @@ extern void lockdep_init_map(struct lock * of dependencies wrong: they are either too broad (they need a class-split) * or they are too narrow (they suffer from a false class-split): */ -#define lockdep_set_class(lock, key) \ - lockdep_init_map(&(lock)->dep_map, #key, key, 0) -#define lockdep_set_class_and_name(lock, key, name) \ - lockdep_init_map(&(lock)->dep_map, 
name, key, 0) -#define lockdep_set_class_and_subclass(lock, key, sub) \ - lockdep_init_map(&(lock)->dep_map, #key, key, sub) -#define lockdep_set_subclass(lock, sub) \ - lockdep_init_map(&(lock)->dep_map, #lock, \ - (lock)->dep_map.key, sub) +#define lockdep_set_class(lock, key) \ + lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0, \ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer) + +#define lockdep_set_class_and_name(lock, key, name) \ + lockdep_init_map_waits(&(lock)->dep_map, name, key, 0, \ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer) + +#define lockdep_set_class_and_subclass(lock, key, sub) \ + lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer) + +#define lockdep_set_subclass(lock, sub) \ + lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer) #define lockdep_set_novalidate_class(lock) \ lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock) + /* * Compare locking classes */ @@ -407,6 +447,10 @@ static inline void lockdep_on(void) # define lock_set_class(l, n, k, s, i) do { } while (0) # define lock_set_subclass(l, s, i) do { } while (0) # define lockdep_init() do { } while (0) +# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \ + do { (void)(name); (void)(key); } while (0) +# define lockdep_init_map_wait(lock, name, key, sub, inner) \ + do { (void)(name); (void)(key); } while (0) # define lockdep_init_map(lock, name, key, sub) \ do { (void)(name); (void)(key); } while (0) # define lockdep_set_class(lock, key) do { (void)(key); } while (0) --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -119,8 +119,11 @@ do { \ } while (0) #ifdef CONFIG_DEBUG_LOCK_ALLOC -# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \ - , .dep_map = { .name = #lockname } +# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \ + , 
.dep_map = { \ + .name = #lockname, \ + .wait_type_inner = LD_WAIT_SLEEP, \ + } #else # define __DEP_MAP_MUTEX_INITIALIZER(lockname) #endif --- a/include/linux/rwlock_types.h +++ b/include/linux/rwlock_types.h @@ -22,7 +22,11 @@ typedef struct { #define RWLOCK_MAGIC 0xdeaf1eed #ifdef CONFIG_DEBUG_LOCK_ALLOC -# define RW_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } +# define RW_DEP_MAP_INIT(lockname) \ + .dep_map = { \ + .name = #lockname, \ + .wait_type_inner = LD_WAIT_CONFIG, \ + } #else # define RW_DEP_MAP_INIT(lockname) #endif --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -72,7 +72,11 @@ static inline int rwsem_is_locked(struct /* Common initializer macros and functions */ #ifdef CONFIG_DEBUG_LOCK_ALLOC -# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname } +# define __RWSEM_DEP_MAP_INIT(lockname) \ + , .dep_map = { \ + .name = #lockname, \ + .wait_type_inner = LD_WAIT_SLEEP, \ + } #else # define __RWSEM_DEP_MAP_INIT(lockname) #endif --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -920,6 +920,7 @@ struct task_struct { #ifdef CONFIG_TRACE_IRQFLAGS unsigned int irq_events; + unsigned int hardirq_threaded; unsigned long hardirq_enable_ip; unsigned long hardirq_disable_ip; unsigned int hardirq_enable_event; --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -92,12 +92,13 @@ #ifdef CONFIG_DEBUG_SPINLOCK extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, - struct lock_class_key *key); -# define raw_spin_lock_init(lock) \ -do { \ - static struct lock_class_key __key; \ - \ - __raw_spin_lock_init((lock), #lock, &__key); \ + struct lock_class_key *key, short inner); + +# define raw_spin_lock_init(lock) \ +do { \ + static struct lock_class_key __key; \ + \ + __raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN); \ } while (0) #else @@ -318,12 +319,26 @@ static __always_inline raw_spinlock_t *s return &lock->rlock; } -#define spin_lock_init(_lock) \ -do { \ - 
spinlock_check(_lock); \ - raw_spin_lock_init(&(_lock)->rlock); \ +#ifdef CONFIG_DEBUG_SPINLOCK + +# define spin_lock_init(lock) \ +do { \ + static struct lock_class_key __key; \ + \ + __raw_spin_lock_init(spinlock_check(lock), \ + #lock, &__key, LD_WAIT_CONFIG); \ +} while (0) + +#else + +# define spin_lock_init(_lock) \ +do { \ + spinlock_check(_lock); \ + *(lock) = __SPIN_LOCK_UNLOCKED(lock); \ } while (0) +#endif + static __always_inline void spin_lock(spinlock_t *lock) { raw_spin_lock(&lock->rlock); --- a/include/linux/spinlock_types.h +++ b/include/linux/spinlock_types.h @@ -33,8 +33,18 @@ typedef struct raw_spinlock { #define SPINLOCK_OWNER_INIT ((void *)-1L) #ifdef CONFIG_DEBUG_LOCK_ALLOC -# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } +# define RAW_SPIN_DEP_MAP_INIT(lockname) \ + .dep_map = { \ + .name = #lockname, \ + .wait_type_inner = LD_WAIT_SPIN, \ + } +# define SPIN_DEP_MAP_INIT(lockname) \ + .dep_map = { \ + .name = #lockname, \ + .wait_type_inner = LD_WAIT_CONFIG, \ + } #else +# define RAW_SPIN_DEP_MAP_INIT(lockname) # define SPIN_DEP_MAP_INIT(lockname) #endif @@ -51,7 +61,7 @@ typedef struct raw_spinlock { { \ .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ SPIN_DEBUG_INIT(lockname) \ - SPIN_DEP_MAP_INIT(lockname) } + RAW_SPIN_DEP_MAP_INIT(lockname) } #define __RAW_SPIN_LOCK_UNLOCKED(lockname) \ (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname) @@ -72,11 +82,17 @@ typedef struct spinlock { }; } spinlock_t; +#define ___SPIN_LOCK_INITIALIZER(lockname) \ + { \ + .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ + SPIN_DEBUG_INIT(lockname) \ + SPIN_DEP_MAP_INIT(lockname) } + #define __SPIN_LOCK_INITIALIZER(lockname) \ - { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } } + { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } } #define __SPIN_LOCK_UNLOCKED(lockname) \ - (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname) + (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname) #define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x) --- 
a/kernel/irq/handle.c +++ b/kernel/irq/handle.c @@ -145,6 +145,13 @@ irqreturn_t __handle_irq_event_percpu(st for_each_action_of_desc(desc, action) { irqreturn_t res; + /* + * If this IRQ would be threaded under force_irqthreads, mark it so. + */ + if (irq_settings_can_thread(desc) && + !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))) + trace_hardirq_threaded(); + trace_irq_handler_entry(irq, action); res = action->handler(irq, action->dev_id); trace_irq_handler_exit(irq, action, res); --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -519,7 +519,9 @@ static void print_lock_name(struct lock_ printk(KERN_CONT " ("); __print_lock_name(class); - printk(KERN_CONT "){%s}", usage); + printk("){%s}-{%hd:%hd}", usage, + class->wait_type_outer ?: class->wait_type_inner, + class->wait_type_inner); } static void print_lockdep_cache(struct lockdep_map *lock) @@ -793,6 +795,8 @@ register_lock_class(struct lockdep_map * INIT_LIST_HEAD(&class->locks_before); INIT_LIST_HEAD(&class->locks_after); class->name_version = count_matching_names(class); + class->wait_type_inner = lock->wait_type_inner; + class->wait_type_outer = lock->wait_type_outer; /* * We use RCU's safe list-add method to make * parallel walking of the hash-list safe: @@ -3156,8 +3160,9 @@ static int mark_lock(struct task_struct /* * Initialize a lock instance's lock-class mapping info: */ -static void __lockdep_init_map(struct lockdep_map *lock, const char *name, - struct lock_class_key *key, int subclass) +void lockdep_init_map_waits(struct lockdep_map *lock, const char *name, + struct lock_class_key *key, int subclass, + short inner, short outer) { int i; @@ -3178,6 +3183,9 @@ static void __lockdep_init_map(struct lo lock->name = name; + lock->wait_type_outer = outer; + lock->wait_type_inner = inner; + /* * No key, no joy, we need to hash something. 
*/ @@ -3212,13 +3220,7 @@ static void __lockdep_init_map(struct lo raw_local_irq_restore(flags); } } - -void lockdep_init_map(struct lockdep_map *lock, const char *name, - struct lock_class_key *key, int subclass) -{ - __lockdep_init_map(lock, name, key, subclass); -} -EXPORT_SYMBOL_GPL(lockdep_init_map); +EXPORT_SYMBOL_GPL(lockdep_init_map_waits); struct lock_class_key __lockdep_no_validate__; EXPORT_SYMBOL_GPL(__lockdep_no_validate__); @@ -3257,6 +3259,113 @@ print_lock_nested_lock_not_held(struct t return 0; } +static int +print_lock_invalid_wait_context(struct task_struct *curr, + struct held_lock *hlock) +{ + if (!debug_locks_off()) + return 0; + if (debug_locks_silent) + return 0; + + printk("\n"); + printk("=============================\n"); + printk("[ BUG: Invalid wait context ]\n"); + print_kernel_ident(); + printk("-----------------------------\n"); + + printk("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr)); + print_lock(hlock); + + printk("\nother info that might help us debug this:\n"); + lockdep_print_held_locks(curr); + + printk("\nstack backtrace:\n"); + dump_stack(); + + return 0; +} + +/* + * Verify the wait_type context. + * + * This check validates we takes locks in the right wait-type order; that is it + * ensures that we do not take mutexes inside spinlocks and do not attempt to + * acquire spinlocks inside raw_spinlocks and the sort. + * + * The entire thing is slightly more complex because of RCU, RCU is a lock that + * can be taken from (pretty much) any context but also has constraints. + * However when taken in a stricter environment the RCU lock does not loosen + * the constraints. + * + * Therefore we must look for the strictest environment in the lock stack and + * compare that to the lock we're trying to acquire. 
+ */ +static int check_wait_context(struct task_struct *curr, struct held_lock *next) +{ + short next_inner = hlock_class(next)->wait_type_inner; + short next_outer = hlock_class(next)->wait_type_outer; + short curr_inner; + int depth; + + if (!curr->lockdep_depth || !next_inner || next->trylock) + return 0; + + if (!next_outer) + next_outer = next_inner; + + /* + * Find start of current irq_context.. + */ + for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) { + struct held_lock *prev = curr->held_locks + depth; + if (prev->irq_context != next->irq_context) + break; + } + depth++; + + /* + * Set appropriate wait type for the context; for IRQs we have to take + * into account force_irqthread as that is implied by PREEMPT_RT. + */ + if (curr->hardirq_context) { + /* + * Check if force_irqthreads will run us threaded. + */ + if (curr->hardirq_threaded) + curr_inner = LD_WAIT_CONFIG; + else + curr_inner = LD_WAIT_SPIN; + } else if (curr->softirq_context) { + /* + * Softirqs are always threaded. + */ + curr_inner = LD_WAIT_CONFIG; + } else { + curr_inner = LD_WAIT_MAX; + } + + for (; depth < curr->lockdep_depth; depth++) { + struct held_lock *prev = curr->held_locks + depth; + short prev_inner = hlock_class(prev)->wait_type_inner; + + if (prev_inner) { + /* + * We can have a bigger inner than a previous one + * when outer is smaller than inner, as with RCU. + * + * Also due to trylocks. 
+ */ + curr_inner = min(curr_inner, prev_inner); + } + } + + if (next_outer > curr_inner) + return print_lock_invalid_wait_context(curr, next); + + return 0; +} + static int __lock_is_held(const struct lockdep_map *lock, int read); /* @@ -3323,7 +3432,7 @@ static int __lock_acquire(struct lockdep class_idx = class - lock_classes + 1; - if (depth) { + if (depth) { /* we're holding locks */ hlock = curr->held_locks + depth - 1; if (hlock->class_idx == class_idx && nest_lock) { if (hlock->references) { @@ -3365,6 +3474,9 @@ static int __lock_acquire(struct lockdep #endif hlock->pin_count = pin_count; + if (check_wait_context(curr, hlock)) + return 0; + if (check && !mark_irqflags(curr, hlock)) return 0; @@ -3579,7 +3691,9 @@ __lock_set_class(struct lockdep_map *loc if (!hlock) return print_unlock_imbalance_bug(curr, lock, ip); - lockdep_init_map(lock, name, key, 0); + lockdep_init_map_waits(lock, name, key, 0, + lock->wait_type_inner, + lock->wait_type_outer); class = register_lock_class(lock, subclass, 0); hlock->class_idx = class - lock_classes + 1; --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -85,7 +85,7 @@ void debug_mutex_init(struct mutex *lock * Make sure we are not reinitializing a held lock: */ debug_check_no_locks_freed((void *)lock, sizeof(*lock)); - lockdep_init_map(&lock->dep_map, name, key, 0); + lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP); #endif lock->magic = lock; } --- a/kernel/locking/rwsem-spinlock.c +++ b/kernel/locking/rwsem-spinlock.c @@ -46,7 +46,7 @@ void __init_rwsem(struct rw_semaphore *s * Make sure we are not reinitializing a held semaphore: */ debug_check_no_locks_freed((void *)sem, sizeof(*sem)); - lockdep_init_map(&sem->dep_map, name, key, 0); + lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP); #endif sem->count = 0; raw_spin_lock_init(&sem->wait_lock); --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -81,7 +81,7 @@ void __init_rwsem(struct 
rw_semaphore *s * Make sure we are not reinitializing a held semaphore: */ debug_check_no_locks_freed((void *)sem, sizeof(*sem)); - lockdep_init_map(&sem->dep_map, name, key, 0); + lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP); #endif atomic_long_set(&sem->count, RWSEM_UNLOCKED_VALUE); raw_spin_lock_init(&sem->wait_lock); --- a/kernel/locking/spinlock_debug.c +++ b/kernel/locking/spinlock_debug.c @@ -14,14 +14,14 @@ #include <linux/export.h> void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, - struct lock_class_key *key) + struct lock_class_key *key, short inner) { #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * Make sure we are not reinitializing a held lock: */ debug_check_no_locks_freed((void *)lock, sizeof(*lock)); - lockdep_init_map(&lock->dep_map, name, key, 0); + lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner); #endif lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; lock->magic = SPINLOCK_MAGIC; @@ -39,7 +39,7 @@ void __rwlock_init(rwlock_t *lock, const * Make sure we are not reinitializing a held lock: */ debug_check_no_locks_freed((void *)lock, sizeof(*lock)); - lockdep_init_map(&lock->dep_map, name, key, 0); + lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG); #endif lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED; lock->magic = RWLOCK_MAGIC; --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -228,18 +228,30 @@ core_initcall(rcu_set_runtime_mode); #ifdef CONFIG_DEBUG_LOCK_ALLOC static struct lock_class_key rcu_lock_key; -struct lockdep_map rcu_lock_map = - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key); +struct lockdep_map rcu_lock_map = { + .name = "rcu_read_lock", + .key = &rcu_lock_key, + .wait_type_outer = LD_WAIT_FREE, + .wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? 
*/ +}; EXPORT_SYMBOL_GPL(rcu_lock_map); static struct lock_class_key rcu_bh_lock_key; -struct lockdep_map rcu_bh_lock_map = - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key); +struct lockdep_map rcu_bh_lock_map = { + .name = "rcu_read_lock_bh", + .key = &rcu_bh_lock_key, + .wait_type_outer = LD_WAIT_FREE, + .wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */ +}; EXPORT_SYMBOL_GPL(rcu_bh_lock_map); static struct lock_class_key rcu_sched_lock_key; -struct lockdep_map rcu_sched_lock_map = - STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key); +struct lockdep_map rcu_sched_lock_map = { + .name = "rcu_read_lock_sched", + .key = &rcu_sched_lock_key, + .wait_type_outer = LD_WAIT_FREE, + .wait_type_inner = LD_WAIT_SPIN, +}; EXPORT_SYMBOL_GPL(rcu_sched_lock_map); static struct lock_class_key rcu_callback_key; ^ permalink raw reply [flat|nested] 40+ messages in thread
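The core of check_wait_context() above boils down to one comparison: fold the held stack down to its strictest non-INV .inner, then require the new lock's .outer to be no more relaxed than that. The following is a plain userspace sketch of that invariant, not the kernel code; min_inner() and wait_context_ok() are made-up helper names for illustration.

```c
#include <assert.h>

/* Wait-type values mirroring the patch's enum lockdep_wait_type. */
enum lockdep_wait_type {
	LD_WAIT_INV = 0,	/* not checked, catch all */
	LD_WAIT_FREE,		/* wait free, RCU etc. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc. */
	LD_WAIT_CONFIG,		/* spinlock_t etc., preemptible on PREEMPT_RT */
	LD_WAIT_SLEEP,		/* sleeping locks, mutexes etc. */
	LD_WAIT_MAX,		/* must be last */
};

/*
 * Fold one held lock into the running "strictest inner" of the stack,
 * as the depth loop in check_wait_context() does; LD_WAIT_INV entries
 * are ignored.
 */
static short min_inner(short curr_inner, short held_inner)
{
	if (held_inner && held_inner < curr_inner)
		return held_inner;
	return curr_inner;
}

/*
 * The final comparison: the new lock's .outer must not be more relaxed
 * than the strictest .inner already held. Returns 1 if the acquisition
 * is valid. Start from LD_WAIT_MAX for plain task context.
 */
static int wait_context_ok(short curr_inner, short next_outer)
{
	return next_outer <= curr_inner;
}
```

For the changelog's trivial example: holding a raw_spinlock (inner SPIN == 2) and acquiring a spinlock_t (outer CONFIG == 3) gives 3 > 2, so the check fires; taking RCU (outer FREE == 1) under the same raw_spinlock is fine.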
* [PATCH v2 04/17] locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (2 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-19 18:55 ` [PATCH v2 05/17] printk: Mark logbuf_lock & console_owner_lock as terminal locks Waiman Long ` (12 subsequent siblings) 16 siblings, 0 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long Add new DEFINE_RAW_TERMINAL_SPINLOCK() and DEFINE_TERMINAL_SPINLOCK() macro to define a raw terminal spinlock and a terminal spinlock. Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/spinlock_types.h | 34 +++++++++++++++++++++++----------- kernel/printk/printk_safe.c | 2 +- 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h index 24b4e6f..6a8086e 100644 --- a/include/linux/spinlock_types.h +++ b/include/linux/spinlock_types.h @@ -33,9 +33,10 @@ #define SPINLOCK_OWNER_INIT ((void *)-1L) #ifdef CONFIG_DEBUG_LOCK_ALLOC -# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } +# define SPIN_DEP_MAP_INIT(lockname, f) .dep_map = { .name = #lockname, \ + .flags = f } #else -# define SPIN_DEP_MAP_INIT(lockname) +# define SPIN_DEP_MAP_INIT(lockname, f) #endif #ifdef CONFIG_DEBUG_SPINLOCK @@ -47,16 +48,22 @@ # define SPIN_DEBUG_INIT(lockname) #endif -#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \ - { \ - .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ - SPIN_DEBUG_INIT(lockname) \ - SPIN_DEP_MAP_INIT(lockname) } +#define __RAW_SPIN_LOCK_INITIALIZER(lockname, f) \ + { \ + .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, 
\ + SPIN_DEBUG_INIT(lockname) \ + SPIN_DEP_MAP_INIT(lockname, f) } #define __RAW_SPIN_LOCK_UNLOCKED(lockname) \ - (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname) + (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname, 0) + +#define __RAW_TERMINAL_SPIN_LOCK_UNLOCKED(lockname) \ + (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname, \ + LOCKDEP_FLAG_TERMINAL) #define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x) +#define DEFINE_RAW_TERMINAL_SPINLOCK(x) \ + raw_spinlock_t x = __RAW_TERMINAL_SPIN_LOCK_UNLOCKED(x) typedef struct spinlock { union { @@ -72,13 +79,18 @@ }; } spinlock_t; -#define __SPIN_LOCK_INITIALIZER(lockname) \ - { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } } +#define __SPIN_LOCK_INITIALIZER(lockname, f) \ + { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname, f) } } #define __SPIN_LOCK_UNLOCKED(lockname) \ - (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname) + (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname, 0) + +#define __TERMINAL_SPIN_LOCK_UNLOCKED(lockname) \ + (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname, LOCKDEP_FLAG_TERMINAL) #define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x) +#define DEFINE_TERMINAL_SPINLOCK(x) \ + spinlock_t x = __TERMINAL_SPIN_LOCK_UNLOCKED(x) #include <linux/rwlock_types.h> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 0913b4d..8ff1033 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -192,7 +192,7 @@ static void report_message_lost(struct printk_safe_seq_buf *s) static void __printk_safe_flush(struct irq_work *work) { static raw_spinlock_t read_lock = - __RAW_SPIN_LOCK_INITIALIZER(read_lock); + __RAW_SPIN_LOCK_UNLOCKED(read_lock); struct printk_safe_seq_buf *s = container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
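The macro plumbing in this patch follows a common C idiom: the DEFINE_* macro stringifies the lock name with `#lockname` and bakes the extra flag into a static designated initializer, so terminal locks cost nothing at runtime to set up. A minimal userspace model of that idiom follows; the struct layout, LOCKDEP_FLAG_TERMINAL's value, and the RAW_LOCK_INIT()/DEFINE_TERMINAL_LOCK() names are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's dep_map flag plumbing. */
#define LOCKDEP_FLAG_TERMINAL	0x01

struct dep_map  { const char *name; unsigned int flags; };
struct raw_lock { int locked; struct dep_map dep_map; };

/*
 * Mirrors how SPIN_DEP_MAP_INIT(lockname, f) stringifies the lock name
 * and records the flags inside a static initializer.
 */
#define RAW_LOCK_INIT(lockname, f) \
	{ .locked = 0, .dep_map = { .name = #lockname, .flags = (f) } }

#define DEFINE_TERMINAL_LOCK(x) \
	struct raw_lock x = RAW_LOCK_INIT(x, LOCKDEP_FLAG_TERMINAL)

/* Usage, analogous to DEFINE_RAW_TERMINAL_SPINLOCK(pool_lock): */
DEFINE_TERMINAL_LOCK(pool_lock);
```

This also shows why the patch threads an `f` parameter through __RAW_SPIN_LOCK_INITIALIZER(): the flag has to reach the innermost initializer while the existing DEFINE_RAW_SPINLOCK() callers keep passing 0.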
* [PATCH v2 05/17] printk: Mark logbuf_lock & console_owner_lock as terminal locks 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (3 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 04/17] locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-19 18:55 ` [PATCH v2 06/17] debugobjects: Mark pool_lock as a terminal lock Waiman Long ` (11 subsequent siblings) 16 siblings, 0 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long By marking logbuf_lock and console_owner_lock as terminal locks, it reduces the performance overhead when those locks are used with lockdep enabled. Signed-off-by: Waiman Long <longman@redhat.com> --- kernel/printk/printk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1b2a029..bdbbe31 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -367,7 +367,7 @@ __packed __aligned(4) * within the scheduler's rq lock. It must be released before calling * console_unlock() or anything else that might wake up a process. */ -DEFINE_RAW_SPINLOCK(logbuf_lock); +DEFINE_RAW_TERMINAL_SPINLOCK(logbuf_lock); /* * Helper macros to lock/unlock logbuf_lock and switch between @@ -1568,7 +1568,7 @@ int do_syslog(int type, char __user *buf, int len, int source) }; #endif -static DEFINE_RAW_SPINLOCK(console_owner_lock); +static DEFINE_RAW_TERMINAL_SPINLOCK(console_owner_lock); static struct task_struct *console_owner; static bool console_waiter; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v2 06/17] debugobjects: Mark pool_lock as a terminal lock 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (4 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 05/17] printk: Mark logbuf_lock & console_owner_lock as terminal locks Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long ` (10 subsequent siblings) 16 siblings, 0 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long By marking the internal pool_lock as a terminal lock, lockdep will be able to skip full validation to improve locking performance. Signed-off-by: Waiman Long <longman@redhat.com> --- lib/debugobjects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 70935ed..403dd95 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -39,7 +39,7 @@ struct debug_bucket { static struct debug_obj obj_static_pool[ODEBUG_POOL_SIZE] __initdata; -static DEFINE_RAW_SPINLOCK(pool_lock); +static DEFINE_RAW_TERMINAL_SPINLOCK(pool_lock); static HLIST_HEAD(obj_pool); static HLIST_HEAD(obj_to_free); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
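The rule that makes these conversions safe, per the cover letter, is that a terminal lock has no forward dependencies: while one is held, only another terminal lock may be acquired, so lockdep can skip the dependency-graph update for it. A toy model of that decision is sketched below; acquire_ok() and needs_full_validation() are invented names, and the two-level/nestable-terminal details of the later patches are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_kind { LOCK_NORMAL, LOCK_TERMINAL };

/*
 * Toy model of the terminal-lock rule: while a terminal lock is held,
 * acquiring a non-terminal lock is a bug; acquiring another terminal
 * lock is allowed (the series permits two-level nesting).
 */
static bool acquire_ok(bool holding_terminal, enum lock_kind next)
{
	if (holding_terminal)
		return next == LOCK_TERMINAL;
	return true;
}

/*
 * The payoff: terminal acquisitions never add graph edges, so they can
 * bypass the expensive validation - which is what saves the table
 * entries and the __lock_acquire() time quoted in the cover letter.
 */
static bool needs_full_validation(enum lock_kind next)
{
	return next == LOCK_NORMAL;
}
```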
* [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (5 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 06/17] debugobjects: Mark pool_lock as a terminal lock Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-21 16:49 ` Waiman Long 2018-12-07 9:21 ` Peter Zijlstra 2018-11-19 18:55 ` [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks Waiman Long ` (9 subsequent siblings) 16 siblings, 2 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long The db->lock is a raw spinlock and so the lock hold time is supposed to be short. This will not be the case when printk() is being involved in some of the critical sections. In order to avoid the long hold time, in case some messages need to be printed, the debug_object_is_on_stack() and debug_print_object() calls are now moved out of those critical sections. 
Signed-off-by: Waiman Long <longman@redhat.com> --- lib/debugobjects.c | 61 +++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 42 insertions(+), 19 deletions(-) diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 403dd95..4216d3d 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -376,6 +376,8 @@ static void debug_object_is_on_stack(void *addr, int onstack) struct debug_bucket *db; struct debug_obj *obj; unsigned long flags; + bool debug_printobj = false; + bool debug_chkstack = false; fill_pool(); @@ -392,7 +394,7 @@ static void debug_object_is_on_stack(void *addr, int onstack) debug_objects_oom(); return; } - debug_object_is_on_stack(addr, onstack); + debug_chkstack = true; } switch (obj->state) { @@ -403,20 +405,25 @@ static void debug_object_is_on_stack(void *addr, int onstack) break; case ODEBUG_STATE_ACTIVE: - debug_print_object(obj, "init"); state = obj->state; raw_spin_unlock_irqrestore(&db->lock, flags); + debug_print_object(obj, "init"); debug_object_fixup(descr->fixup_init, addr, state); return; case ODEBUG_STATE_DESTROYED: - debug_print_object(obj, "init"); + debug_printobj = true; break; default: break; } raw_spin_unlock_irqrestore(&db->lock, flags); + if (debug_chkstack) + debug_object_is_on_stack(addr, onstack); + if (debug_printobj) + debug_print_object(obj, "init"); + } /** @@ -474,6 +481,8 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) obj = lookup_object(addr, db); if (obj) { + bool debug_printobj = false; + switch (obj->state) { case ODEBUG_STATE_INIT: case ODEBUG_STATE_INACTIVE: @@ -482,14 +491,14 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) break; case ODEBUG_STATE_ACTIVE: - debug_print_object(obj, "activate"); state = obj->state; raw_spin_unlock_irqrestore(&db->lock, flags); + debug_print_object(obj, "activate"); ret = debug_object_fixup(descr->fixup_activate, addr, state); return ret ? 
0 : -EINVAL; case ODEBUG_STATE_DESTROYED: - debug_print_object(obj, "activate"); + debug_printobj = true; ret = -EINVAL; break; default: @@ -497,10 +506,13 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) break; } raw_spin_unlock_irqrestore(&db->lock, flags); + if (debug_printobj) + debug_print_object(obj, "activate"); return ret; } raw_spin_unlock_irqrestore(&db->lock, flags); + /* * We are here when a static object is activated. We * let the type specific code confirm whether this is @@ -532,6 +544,7 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr) struct debug_bucket *db; struct debug_obj *obj; unsigned long flags; + bool debug_printobj = false; if (!debug_objects_enabled) return; @@ -549,24 +562,27 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr) if (!obj->astate) obj->state = ODEBUG_STATE_INACTIVE; else - debug_print_object(obj, "deactivate"); + debug_printobj = true; break; case ODEBUG_STATE_DESTROYED: - debug_print_object(obj, "deactivate"); + debug_printobj = true; break; default: break; } - } else { + } + + raw_spin_unlock_irqrestore(&db->lock, flags); + if (!obj) { struct debug_obj o = { .object = addr, .state = ODEBUG_STATE_NOTAVAILABLE, .descr = descr }; debug_print_object(&o, "deactivate"); + } else if (debug_printobj) { + debug_print_object(obj, "deactivate"); } - - raw_spin_unlock_irqrestore(&db->lock, flags); } EXPORT_SYMBOL_GPL(debug_object_deactivate); @@ -581,6 +597,7 @@ void debug_object_destroy(void *addr, struct debug_obj_descr *descr) struct debug_bucket *db; struct debug_obj *obj; unsigned long flags; + bool debug_printobj = false; if (!debug_objects_enabled) return; @@ -600,20 +617,22 @@ void debug_object_destroy(void *addr, struct debug_obj_descr *descr) obj->state = ODEBUG_STATE_DESTROYED; break; case ODEBUG_STATE_ACTIVE: - debug_print_object(obj, "destroy"); state = obj->state; raw_spin_unlock_irqrestore(&db->lock, flags); + debug_print_object(obj, "destroy"); 
debug_object_fixup(descr->fixup_destroy, addr, state); return; case ODEBUG_STATE_DESTROYED: - debug_print_object(obj, "destroy"); + debug_printobj = true; break; default: break; } out_unlock: raw_spin_unlock_irqrestore(&db->lock, flags); + if (debug_printobj) + debug_print_object(obj, "destroy"); } EXPORT_SYMBOL_GPL(debug_object_destroy); @@ -642,9 +661,9 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr) switch (obj->state) { case ODEBUG_STATE_ACTIVE: - debug_print_object(obj, "free"); state = obj->state; raw_spin_unlock_irqrestore(&db->lock, flags); + debug_print_object(obj, "free"); debug_object_fixup(descr->fixup_free, addr, state); return; default: @@ -717,6 +736,7 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr) struct debug_bucket *db; struct debug_obj *obj; unsigned long flags; + bool debug_printobj = false; if (!debug_objects_enabled) return; @@ -732,22 +752,25 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr) if (obj->astate == expect) obj->astate = next; else - debug_print_object(obj, "active_state"); + debug_printobj = true; break; default: - debug_print_object(obj, "active_state"); + debug_printobj = true; break; } - } else { + } + + raw_spin_unlock_irqrestore(&db->lock, flags); + if (!obj) { struct debug_obj o = { .object = addr, .state = ODEBUG_STATE_NOTAVAILABLE, .descr = descr }; debug_print_object(&o, "active_state"); + } else if (debug_printobj) { + debug_print_object(obj, "active_state"); } - - raw_spin_unlock_irqrestore(&db->lock, flags); } EXPORT_SYMBOL_GPL(debug_object_active_state); @@ -783,10 +806,10 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size) switch (obj->state) { case ODEBUG_STATE_ACTIVE: - debug_print_object(obj, "free"); descr = obj->descr; state = obj->state; raw_spin_unlock_irqrestore(&db->lock, flags); + debug_print_object(obj, "free"); debug_object_fixup(descr->fixup_free, (void *) oaddr, state); goto repeat; -- 1.8.3.1 ^ 
permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long @ 2018-11-21 16:49 ` Waiman Long 2018-11-22 2:04 ` Sergey Senozhatsky 2018-12-07 9:21 ` Peter Zijlstra 1 sibling, 1 reply; 40+ messages in thread From: Waiman Long @ 2018-11-21 16:49 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On 11/19/2018 01:55 PM, Waiman Long wrote: > The db->lock is a raw spinlock and so the lock hold time is supposed > to be short. This will not be the case when printk() is being involved > in some of the critical sections. In order to avoid the long hold time, > in case some messages need to be printed, the debug_object_is_on_stack() > and debug_print_object() calls are now moved out of those critical > sections. > > Signed-off-by: Waiman Long <longman@redhat.com> > --- > lib/debugobjects.c | 61 +++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 42 insertions(+), 19 deletions(-) > > diff --git a/lib/debugobjects.c b/lib/debugobjects.c > index 403dd95..4216d3d 100644 > --- a/lib/debugobjects.c > +++ b/lib/debugobjects.c > @@ -376,6 +376,8 @@ static void debug_object_is_on_stack(void *addr, int onstack) > struct debug_bucket *db; > struct debug_obj *obj; > unsigned long flags; > + bool debug_printobj = false; > + bool debug_chkstack = false; > > fill_pool(); > > @@ -392,7 +394,7 @@ static void debug_object_is_on_stack(void *addr, int onstack) > debug_objects_oom(); > return; > } > - debug_object_is_on_stack(addr, onstack); > + debug_chkstack = true; > } > > switch (obj->state) { > @@ -403,20 +405,25 @@ static void debug_object_is_on_stack(void *addr, int onstack) > break; > > case ODEBUG_STATE_ACTIVE: > - debug_print_object(obj, "init"); > state = obj->state; > 
raw_spin_unlock_irqrestore(&db->lock, flags); > + debug_print_object(obj, "init"); > debug_object_fixup(descr->fixup_init, addr, state); > return; > > case ODEBUG_STATE_DESTROYED: > - debug_print_object(obj, "init"); > + debug_printobj = true; > break; > default: > break; > } > > raw_spin_unlock_irqrestore(&db->lock, flags); > + if (debug_chkstack) > + debug_object_is_on_stack(addr, onstack); > + if (debug_printobj) > + debug_print_object(obj, "init"); > + > } > > /** > @@ -474,6 +481,8 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) > > obj = lookup_object(addr, db); > if (obj) { > + bool debug_printobj = false; > + > switch (obj->state) { > case ODEBUG_STATE_INIT: > case ODEBUG_STATE_INACTIVE: > @@ -482,14 +491,14 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) > break; > > case ODEBUG_STATE_ACTIVE: > - debug_print_object(obj, "activate"); > state = obj->state; > raw_spin_unlock_irqrestore(&db->lock, flags); > + debug_print_object(obj, "activate"); > ret = debug_object_fixup(descr->fixup_activate, addr, state); > return ret ? 0 : -EINVAL; > > case ODEBUG_STATE_DESTROYED: > - debug_print_object(obj, "activate"); > + debug_printobj = true; > ret = -EINVAL; > break; > default: > @@ -497,10 +506,13 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr) > break; > } > raw_spin_unlock_irqrestore(&db->lock, flags); > + if (debug_printobj) > + debug_print_object(obj, "activate"); > return ret; > } > > raw_spin_unlock_irqrestore(&db->lock, flags); > + > /* > * We are here when a static object is activated. 
We > * let the type specific code confirm whether this is > @@ -532,6 +544,7 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr) > struct debug_bucket *db; > struct debug_obj *obj; > unsigned long flags; > + bool debug_printobj = false; > > if (!debug_objects_enabled) > return; > @@ -549,24 +562,27 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr) > if (!obj->astate) > obj->state = ODEBUG_STATE_INACTIVE; > else > - debug_print_object(obj, "deactivate"); > + debug_printobj = true; > break; > > case ODEBUG_STATE_DESTROYED: > - debug_print_object(obj, "deactivate"); > + debug_printobj = true; > break; > default: > break; > } > - } else { > + } > + > + raw_spin_unlock_irqrestore(&db->lock, flags); > + if (!obj) { > struct debug_obj o = { .object = addr, > .state = ODEBUG_STATE_NOTAVAILABLE, > .descr = descr }; > > debug_print_object(&o, "deactivate"); > + } else if (debug_printobj) { > + debug_print_object(obj, "deactivate"); > } > - > - raw_spin_unlock_irqrestore(&db->lock, flags); > } > EXPORT_SYMBOL_GPL(debug_object_deactivate); > > @@ -581,6 +597,7 @@ void debug_object_destroy(void *addr, struct debug_obj_descr *descr) > struct debug_bucket *db; > struct debug_obj *obj; > unsigned long flags; > + bool debug_printobj = false; > > if (!debug_objects_enabled) > return; > @@ -600,20 +617,22 @@ void debug_object_destroy(void *addr, struct debug_obj_descr *descr) > obj->state = ODEBUG_STATE_DESTROYED; > break; > case ODEBUG_STATE_ACTIVE: > - debug_print_object(obj, "destroy"); > state = obj->state; > raw_spin_unlock_irqrestore(&db->lock, flags); > + debug_print_object(obj, "destroy"); > debug_object_fixup(descr->fixup_destroy, addr, state); > return; > > case ODEBUG_STATE_DESTROYED: > - debug_print_object(obj, "destroy"); > + debug_printobj = true; > break; > default: > break; > } > out_unlock: > raw_spin_unlock_irqrestore(&db->lock, flags); > + if (debug_printobj) > + debug_print_object(obj, "destroy"); > } > 
EXPORT_SYMBOL_GPL(debug_object_destroy); > > @@ -642,9 +661,9 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr) > > switch (obj->state) { > case ODEBUG_STATE_ACTIVE: > - debug_print_object(obj, "free"); > state = obj->state; > raw_spin_unlock_irqrestore(&db->lock, flags); > + debug_print_object(obj, "free"); > debug_object_fixup(descr->fixup_free, addr, state); > return; > default: > @@ -717,6 +736,7 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr) > struct debug_bucket *db; > struct debug_obj *obj; > unsigned long flags; > + bool debug_printobj = false; > > if (!debug_objects_enabled) > return; > @@ -732,22 +752,25 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr) > if (obj->astate == expect) > obj->astate = next; > else > - debug_print_object(obj, "active_state"); > + debug_printobj = true; > break; > > default: > - debug_print_object(obj, "active_state"); > + debug_printobj = true; > break; > } > - } else { > + } > + > + raw_spin_unlock_irqrestore(&db->lock, flags); > + if (!obj) { > struct debug_obj o = { .object = addr, > .state = ODEBUG_STATE_NOTAVAILABLE, > .descr = descr }; > > debug_print_object(&o, "active_state"); > + } else if (debug_printobj) { > + debug_print_object(obj, "active_state"); > } > - > - raw_spin_unlock_irqrestore(&db->lock, flags); > } > EXPORT_SYMBOL_GPL(debug_object_active_state); > > @@ -783,10 +806,10 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size) > > switch (obj->state) { > case ODEBUG_STATE_ACTIVE: > - debug_print_object(obj, "free"); > descr = obj->descr; > state = obj->state; > raw_spin_unlock_irqrestore(&db->lock, flags); > + debug_print_object(obj, "free"); > debug_object_fixup(descr->fixup_free, > (void *) oaddr, state); > goto repeat; As a side note, one of the test systems that I used generated a debugobjects splat in the bootup process and the system hanged afterward. 
Applying this patch alone fixed the hanging problem and the system booted up successfully. So it is not really a good idea to call printk() while holding a raw spinlock. Cheers, Longman ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-21 16:49 ` Waiman Long @ 2018-11-22 2:04 ` Sergey Senozhatsky 2018-11-22 10:16 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Sergey Senozhatsky @ 2018-11-22 2:04 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On (11/21/18 11:49), Waiman Long wrote: [..] > > case ODEBUG_STATE_ACTIVE: > > - debug_print_object(obj, "init"); > > state = obj->state; > > raw_spin_unlock_irqrestore(&db->lock, flags); > > + debug_print_object(obj, "init"); > > debug_object_fixup(descr->fixup_init, addr, state); > > return; > > > > case ODEBUG_STATE_DESTROYED: > > - debug_print_object(obj, "init"); > > + debug_printobj = true; > > break; > > default: > > break; > > } > > > > raw_spin_unlock_irqrestore(&db->lock, flags); > > + if (debug_chkstack) > > + debug_object_is_on_stack(addr, onstack); > > + if (debug_printobj) > > + debug_print_object(obj, "init"); > > [..] > > As a side note, one of the test systems that I used generated a > debugobjects splat in the bootup process and the system hanged > afterward. Applying this patch alone fix the hanging problem and the > system booted up successfully. So it is not really a good idea to call > printk() while holding a raw spinlock. Right, I like this patch. And I think that we, maybe, can go even further. Some serial consoles call mod_timer(). So what we could have with the debug objects enabled was mod_timer() lock_timer_base() debug_activate() printk() call_console_drivers() foo_console() mod_timer() lock_timer_base() << deadlock That's one possible scenario. The other one can involve console's IRQ handler, uart port spinlock, mod_timer, debug objects, printk, and an eventual deadlock on the uart port spinlock. 
This one can be mitigated with printk_safe. But mod_timer() deadlock will require a different fix. So maybe we need to switch debug objects print-outs to _always_ printk_deferred(). Debug objects can be used in code which cannot do direct printk() - timekeeping is just one example. -ss ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 2:04 ` Sergey Senozhatsky @ 2018-11-22 10:16 ` Peter Zijlstra 2018-11-23 2:40 ` Sergey Senozhatsky 2018-11-22 16:02 ` Petr Mladek 2018-11-22 19:57 ` Waiman Long 2 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2018-11-22 10:16 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Waiman Long, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Thu, Nov 22, 2018 at 11:04:22AM +0900, Sergey Senozhatsky wrote: > Some serial consoles call mod_timer(). So what we could have with the > debug objects enabled was > > mod_timer() > lock_timer_base() > debug_activate() > printk() > call_console_drivers() > foo_console() > mod_timer() > lock_timer_base() << deadlock > > That's one possible scenario. The other one can involve console's > IRQ handler, uart port spinlock, mod_timer, debug objects, printk, > and an eventual deadlock on the uart port spinlock. This one can > be mitigated with printk_safe. But mod_timer() deadlock will require > a different fix. > > So maybe we need to switch debug objects print-outs to _always_ > printk_deferred(). Debug objects can be used in code which cannot > do direct printk() - timekeeping is just one example. No, printk_deferred() is a disease, it needs to be eradicated, not spread around. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 10:16 ` Peter Zijlstra @ 2018-11-23 2:40 ` Sergey Senozhatsky 2018-11-23 11:48 ` Petr Mladek 0 siblings, 1 reply; 40+ messages in thread From: Sergey Senozhatsky @ 2018-11-23 2:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Sergey Senozhatsky, Waiman Long, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Steven Rostedt On (11/22/18 11:16), Peter Zijlstra wrote: > > So maybe we need to switch debug objects print-outs to _always_ > > printk_deferred(). Debug objects can be used in code which cannot > > do direct printk() - timekeeping is just one example. > > No, printk_deferred() is a disease, it needs to be eradicated, not > spread around. deadlock-free printk() is deferred, but OK. Another idea then: --- diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 70935ed91125..3928c2b2f77c 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -323,10 +323,13 @@ static void debug_print_object(struct debug_obj *obj, char *msg) void *hint = descr->debug_hint ? descr->debug_hint(obj->object) : NULL; limit++; + + bust_spinlocks(1); WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) " "object type: %s hint: %pS\n", msg, obj_states[obj->state], obj->astate, descr->name, hint); + bust_spinlocks(0); } debug_objects_warnings++; } --- This should make serial consoles re-entrant. So printk->console_driver_write() hopefully will not deadlock. IOW, this turns serial consoles into early_consoles, for a while ;) -ss ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-23 2:40 ` Sergey Senozhatsky @ 2018-11-23 11:48 ` Petr Mladek 2018-11-26 4:57 ` Sergey Senozhatsky 0 siblings, 1 reply; 40+ messages in thread From: Petr Mladek @ 2018-11-23 11:48 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Steven Rostedt On Fri 2018-11-23 11:40:48, Sergey Senozhatsky wrote: > On (11/22/18 11:16), Peter Zijlstra wrote: > > > So maybe we need to switch debug objects print-outs to _always_ > > > printk_deferred(). Debug objects can be used in code which cannot > > > do direct printk() - timekeeping is just one example. > > > > No, printk_deferred() is a disease, it needs to be eradicated, not > > spread around. > > deadlock-free printk() is deferred, but OK. The best solution would be lockless console drivers. Sigh. > Another idea then: > > --- > > diff --git a/lib/debugobjects.c b/lib/debugobjects.c > index 70935ed91125..3928c2b2f77c 100644 > --- a/lib/debugobjects.c > +++ b/lib/debugobjects.c > @@ -323,10 +323,13 @@ static void debug_print_object(struct debug_obj *obj, char *msg) > void *hint = descr->debug_hint ? > descr->debug_hint(obj->object) : NULL; > limit++; > + > + bust_spinlocks(1); > WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) " > "object type: %s hint: %pS\n", > msg, obj_states[obj->state], obj->astate, > descr->name, hint); > + bust_spinlocks(0); > } > debug_objects_warnings++; > } > > --- > > This should make serial consoles re-entrant. > So printk->console_driver_write() hopefully will not deadlock. Is the re-entrance safe? Some risk might be acceptable in Oops/panic situations. It is much less acceptable for random warnings. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-23 11:48 ` Petr Mladek @ 2018-11-26 4:57 ` Sergey Senozhatsky 2018-11-29 13:09 ` Petr Mladek 0 siblings, 1 reply; 40+ messages in thread From: Sergey Senozhatsky @ 2018-11-26 4:57 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Peter Zijlstra, Waiman Long, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Steven Rostedt On (11/23/18 12:48), Petr Mladek wrote: [..] > > This should make serial consoles re-entrant. > > So printk->console_driver_write() hopefully will not deadlock. > > Is the re-entrance safe? Some risk might be acceptable in Oops/panic > situations. It is much less acceptable for random warnings. Good question. But what's the alternative? A deadlock in a serial console driver, such that even panic() is not guaranteed to make it through (at least as of now). Debug objects are used from code which cannot re-enter console drivers. bust_spinlocks() is called from various paths, not only panic. git grep bust_spinlocks | wc -l 62 So we already switch to re-entrant consoles (and accept the risks) in mm/fault.c, kernel/traps.c and so on. Which, I guess, makes us a little more confident, since faults/traps happen often enough. It seems that, more or less, serial consoles are ready to handle it. UART consoles in ->write() callbacks just do a bunch of writel() [for every char + \r\n]. -ss ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-26 4:57 ` Sergey Senozhatsky @ 2018-11-29 13:09 ` Petr Mladek 0 siblings, 0 replies; 40+ messages in thread From: Petr Mladek @ 2018-11-29 13:09 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Waiman Long, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Steven Rostedt On Mon 2018-11-26 13:57:09, Sergey Senozhatsky wrote: > On (11/23/18 12:48), Petr Mladek wrote: > [..] > > > This should make serial consoles re-entrant. > > > So printk->console_driver_write() hopefully will not deadlock. > > > > Is the re-entrance safe? Some risk might be acceptable in Oops/panic > > situations. It is much less acceptable for random warnings. > > Good question. > > But what's the alternative? A deadlock in a serial console driver; such > that even panic() is not guaranteed to make through it (at least of now). > debug objects are used from the code which cannot re-entrant console > drivers. > > bust_spinlock is called from various paths, not only panic. > git grep bust_spinlocks | wc -l > 62 bust_spinlocks() is followed by die() in several situations. The rests seems to be Oops situations where we an invalid address is being accessed. There is a nontrivial chance that the system would die anyway. Now, if I look into Documentation/core-api/debug-objects.rst, the API is used to detect: - Activation of uninitialized objects - Initialization of active objects - Usage of freed/destroyed objects Of course, all the above situations might lead to the system crash. But even in the worst case, use-after-free, there is a non-trivial chance that the data still would be valid and the system would survive. There might be many other warnings of the same severity. > So we already switch to re-entrant consoles (and accept the risks) in > mm/fault.c, kernel/traps.c and so on. 
Which, I guess, makes us a little > more confident, faults/traps happen often enough. Where is the border line, please? Do we want to have the kernel sources full of bust_spinlocks() callers? > It seems, that, more or less, serial consoles are ready to handle it. > UART consoles in ->write() callbacks just do a bunch of writel() [for > every char + \r\n]. But oops_in_progress does not affect only serial (UART) consoles. We want safe lockless consoles. We do not want to run most-of-the-time-safe code too often. BTW: I have heard that someone from the RT people is working on a big printk() rewrite. One part is a lockless buffer. Another part should be a different handling of safe (lockless) and more complicated consoles. It was presented at some recent conference (I forgot which one). I do not know any details. But the first version should be sent in the near future. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 2:04 ` Sergey Senozhatsky 2018-11-22 10:16 ` Peter Zijlstra @ 2018-11-22 16:02 ` Petr Mladek 2018-11-22 20:29 ` Waiman Long 2018-11-22 19:57 ` Waiman Long 2 siblings, 1 reply; 40+ messages in thread From: Petr Mladek @ 2018-11-22 16:02 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Thu 2018-11-22 11:04:22, Sergey Senozhatsky wrote: > On (11/21/18 11:49), Waiman Long wrote: > [..] > > > case ODEBUG_STATE_ACTIVE: > > > - debug_print_object(obj, "init"); > > > state = obj->state; > > > raw_spin_unlock_irqrestore(&db->lock, flags); > > > + debug_print_object(obj, "init"); > > > debug_object_fixup(descr->fixup_init, addr, state); > > > return; > > > > > > case ODEBUG_STATE_DESTROYED: > > > - debug_print_object(obj, "init"); > > > + debug_printobj = true; > > > break; > > > default: > > > break; > > > } > > > > > > raw_spin_unlock_irqrestore(&db->lock, flags); > > > + if (debug_chkstack) > > > + debug_object_is_on_stack(addr, onstack); > > > + if (debug_printobj) > > > + debug_print_object(obj, "init"); > > > > [..] > > > > As a side note, one of the test systems that I used generated a > > debugobjects splat in the bootup process and the system hanged > > afterward. Applying this patch alone fix the hanging problem and the > > system booted up successfully. So it is not really a good idea to call > > printk() while holding a raw spinlock. Please, was the system hang reproducible? I wonder if it was a deadlock described by Sergey below. The commit message is right. printk() might take too long and cause softlockup or livelock. But it does not explain why the system could competely hang. 
Also note that printk() should no longer block a single process indefinitely thanks to the commit dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes"). > Some serial consoles call mod_timer(). So what we could have with the > debug objects enabled was > > mod_timer() > lock_timer_base() > debug_activate() > printk() > call_console_drivers() > foo_console() > mod_timer() > lock_timer_base() << deadlock Anyway, I wonder what was the primary motivation for this patch. Was it the system hang? Or was it a lockdep report about nesting two terminal locks: db->lock, pool_lock with logbuf_lock? Printk is problematic. It might delay any atomic context where it is used. I would like to better understand the problem here before we start spreading printk-related hacks. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 16:02 ` Petr Mladek @ 2018-11-22 20:29 ` Waiman Long 2018-11-23 11:36 ` Petr Mladek 0 siblings, 1 reply; 40+ messages in thread From: Waiman Long @ 2018-11-22 20:29 UTC (permalink / raw) To: Petr Mladek, Sergey Senozhatsky Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On 11/22/2018 11:02 AM, Petr Mladek wrote: > On Thu 2018-11-22 11:04:22, Sergey Senozhatsky wrote: >> On (11/21/18 11:49), Waiman Long wrote: >> [..] >>>> case ODEBUG_STATE_ACTIVE: >>>> - debug_print_object(obj, "init"); >>>> state = obj->state; >>>> raw_spin_unlock_irqrestore(&db->lock, flags); >>>> + debug_print_object(obj, "init"); >>>> debug_object_fixup(descr->fixup_init, addr, state); >>>> return; >>>> >>>> case ODEBUG_STATE_DESTROYED: >>>> - debug_print_object(obj, "init"); >>>> + debug_printobj = true; >>>> break; >>>> default: >>>> break; >>>> } >>>> >>>> raw_spin_unlock_irqrestore(&db->lock, flags); >>>> + if (debug_chkstack) >>>> + debug_object_is_on_stack(addr, onstack); >>>> + if (debug_printobj) >>>> + debug_print_object(obj, "init"); >>>> >> [..] >>> As a side note, one of the test systems that I used generated a >>> debugobjects splat in the bootup process and the system hanged >>> afterward. Applying this patch alone fix the hanging problem and the >>> system booted up successfully. So it is not really a good idea to call >>> printk() while holding a raw spinlock. > Please, was the system hang reproducible? I wonder if it was a > deadlock described by Sergey below. Yes, it is 100% reproducible on the testing system that I used. > The commit message is right. printk() might take too long and > cause softlockup or livelock. But it does not explain why > the system could competely hang. 
> > Also note that prinkt() should not longer block a single process > indefinitely thanks to the commit dbdda842fe96f8932 ("printk: > Add console owner and waiter logic to load balance console writes"). The problem might have been caused by the fact that IRQ was also disabled in the lock critical section. >> Some serial consoles call mod_timer(). So what we could have with the >> debug objects enabled was >> >> mod_timer() >> lock_timer_base() >> debug_activate() >> printk() >> call_console_drivers() >> foo_console() >> mod_timer() >> lock_timer_base() << deadlock > Anyway, I wonder what was the primary motivation for this patch. > Was it the system hang? Or was it lockdep report about nesting > two terminal locks: db->lock, pool_lock with logbuf_lock? The primary motivation was to make sure that printk() won't be called while holding either db->lock or pool_lock in the debugobjects code. In the determination of which locks can be made terminal, I focused on local spinlocks that won't cross the boundary into an unrelated subsystem, as that would greatly complicate the analysis. I didn't realize that it fixed a hang problem that I was seeing until I did bisection to find out that it was caused by the patch that caused the debugobjects splat in the first place a few days ago. I was comparing the performance of the pre- and post-patched kernels. The pre-patch kernel failed to boot on one of my test systems, but the post-patched kernel could. I narrowed it down to this patch as the fix for the hang problem. Cheers, Longman ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 20:29 ` Waiman Long @ 2018-11-23 11:36 ` Petr Mladek 0 siblings, 0 replies; 40+ messages in thread From: Petr Mladek @ 2018-11-23 11:36 UTC (permalink / raw) To: Waiman Long Cc: Sergey Senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Thu 2018-11-22 15:29:35, Waiman Long wrote: > On 11/22/2018 11:02 AM, Petr Mladek wrote: > > Anyway, I wonder what was the primary motivation for this patch. > > Was it the system hang? Or was it lockdep report about nesting > > two terminal locks: db->lock, pool_lock with logbuf_lock? > > The primary motivation was to make sure that printk() won't be called > while holding either db->lock or pool_lock in the debugobjects code. In > the determination of which locks can be made terminal, I focused on > local spinlocks that won't cause boundary to an unrelated subsystem as > it will greatly complicate the analysis. Then please mention this as the real reason in the commit message. The reason that printk might take too long in IRQ context does not explain why we need the patch. printk() is called in IRQ context in many other locations. I do not see why this place should be special. The delay is the price for the chance to see the problem. > I didn't realize that it fixed a hang problem that I was seeing until I > did bisection to find out that it was caused by the patch that cause the > debugobjects splat in the first place a few days ago. I was comparing > the performance status of the pre and post patched kernel. The pre-patch > kernel failed to boot in the one of my test systems, but the > post-patched kernel could. I narrowed it down to this patch as the fix > for the hang problem. The hang is a mystery. My guess is that there is a race somewhere. 
The patch might have reduced the race window but it probably did not fix the race itself. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 2:04 ` Sergey Senozhatsky 2018-11-22 10:16 ` Peter Zijlstra 2018-11-22 16:02 ` Petr Mladek @ 2018-11-22 19:57 ` Waiman Long 2018-11-23 2:35 ` Sergey Senozhatsky 2018-11-23 11:20 ` Petr Mladek 2 siblings, 2 replies; 40+ messages in thread From: Waiman Long @ 2018-11-22 19:57 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On 11/21/2018 09:04 PM, Sergey Senozhatsky wrote: > On (11/21/18 11:49), Waiman Long wrote: > [..] >>> case ODEBUG_STATE_ACTIVE: >>> - debug_print_object(obj, "init"); >>> state = obj->state; >>> raw_spin_unlock_irqrestore(&db->lock, flags); >>> + debug_print_object(obj, "init"); >>> debug_object_fixup(descr->fixup_init, addr, state); >>> return; >>> >>> case ODEBUG_STATE_DESTROYED: >>> - debug_print_object(obj, "init"); >>> + debug_printobj = true; >>> break; >>> default: >>> break; >>> } >>> >>> raw_spin_unlock_irqrestore(&db->lock, flags); >>> + if (debug_chkstack) >>> + debug_object_is_on_stack(addr, onstack); >>> + if (debug_printobj) >>> + debug_print_object(obj, "init"); >>> > [..] >> As a side note, one of the test systems that I used generated a >> debugobjects splat in the bootup process and the system hanged >> afterward. Applying this patch alone fix the hanging problem and the >> system booted up successfully. So it is not really a good idea to call >> printk() while holding a raw spinlock. > Right, I like this patch. > And I think that we, maybe, can go even further. > > Some serial consoles call mod_timer(). So what we could have with the > debug objects enabled was > > mod_timer() > lock_timer_base() > debug_activate() > printk() > call_console_drivers() > foo_console() > mod_timer() > lock_timer_base() << deadlock > > That's one possible scenario. 
The other one can involve console's > IRQ handler, uart port spinlock, mod_timer, debug objects, printk, > and an eventual deadlock on the uart port spinlock. This one can > be mitigated with printk_safe. But mod_timer() deadlock will require > a different fix. > > So maybe we need to switch debug objects print-outs to _always_ > printk_deferred(). Debug objects can be used in code which cannot > do direct printk() - timekeeping is just one example. > > -ss Actually, I don't think that was the cause of the hang. The debugobjects splat was caused by debug_object_is_on_stack(), below was the output: [ 6.890048] ODEBUG: object (____ptrval____) is NOT on stack (____ptrval____), but annotated. [ 6.891000] WARNING: CPU: 28 PID: 1 at lib/debugobjects.c:369 __debug_object_init.cold.11+0x51/0x2d6 [ 6.891000] Modules linked in: [ 6.891000] CPU: 28 PID: 1 Comm: swapper/0 Not tainted 4.18.0-41.el8.bz1651764_cgroup_debug.x86_64+debug #1 [ 6.891000] Hardware name: HPE ProLiant DL120 Gen10/ProLiant DL120 Gen10, BIOS U36 11/14/2017 [ 6.891000] RIP: 0010:__debug_object_init.cold.11+0x51/0x2d6 [ 6.891000] Code: ea 03 80 3c 02 00 0f 85 85 02 00 00 49 8b 54 24 18 48 89 de 4c 89 44 24 10 48 c7 c7 00 ce 22 94 e8 73 18 62 ff 4c 8b 44 24 10 <0f> 0b e9 60 db ff ff 41 83 c4 01 b8 ff ff 37 00 44 89 25 ce 46 f9 [ 6.891000] RSP: 0000:ffff880104187960 EFLAGS: 00010086 [ 6.891000] RAX: 0000000000000050 RBX: ffffffff9764c570 RCX: 0000000000000000 [ 6.891000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880104178ca8 [ 6.891000] RBP: 1ffff10020830f34 R08: ffff8807ce68a1d0 R09: fffffbfff2923554 [ 6.891000] R10: fffffbfff2923554 R11: ffffffff9491aaa3 R12: ffff880104178000 [ 6.891000] R13: ffffffff96c809b8 R14: 000000000000a370 R15: ffff8807ce68a1c0 [ 6.891000] FS: 0000000000000000(0000) GS:ffff8807d4200000(0000) knlGS:0000000000000000 [ 6.891000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.891000] CR2: 0000000000000000 CR3: 000000028de16001 CR4: 00000000007606e0 [ 6.891000] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6.891000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6.891000] PKRU: 00000000 [ 6.891000] Call Trace: [ 6.891000] ? debug_object_fixup+0x30/0x30 [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] ? __lockdep_init_map+0x12f/0x510 [ 6.891000] ? __lockdep_init_map+0x12f/0x510 [ 6.891000] virt_efi_get_next_variable+0xa2/0x160 [ 6.891000] efivar_init+0x1c4/0x6d7 [ 6.891000] ? efivar_ssdt_setup+0x3b/0x3b [ 6.891000] ? efivar_entry_iter+0x120/0x120 [ 6.891000] ? find_held_lock+0x3a/0x1c0 [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 [ 6.891000] ? kmsg_dump_rewind_nolock+0xd9/0xd9 [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] ? trace_hardirqs_on_caller+0x381/0x570 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] efisubsys_init+0x1be/0x4ae [ 6.891000] ? kernfs_get.part.8+0x4c/0x60 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] ? __kernfs_create_file+0x235/0x2e0 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] do_one_initcall+0xe9/0x5fd [ 6.891000] ? perf_trace_initcall_level+0x450/0x450 [ 6.891000] ? __wake_up_common+0x5a0/0x5a0 [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 [ 6.891000] kernel_init_freeable+0x51a/0x5f2 [ 6.891000] ? start_kernel+0x7b8/0x7b8 [ 6.891000] ? finish_task_switch+0x19a/0x690 [ 6.891000] ? __switch_to_asm+0x40/0x70 [ 6.891000] ? __switch_to_asm+0x34/0x70 [ 6.891000] ? rest_init+0xe9/0xe9 [ 6.891000] kernel_init+0xc/0x110 [ 6.891000] ? 
rest_init+0xe9/0xe9 [ 6.891000] ret_from_fork+0x24/0x50 [ 6.891000] irq event stamp: 1081352 [ 6.891000] hardirqs last enabled at (1081351): [<ffffffff93af7dab>] _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] hardirqs last disabled at (1081352): [<ffffffff93af85c2>] _raw_spin_lock_irqsave+0x22/0x81 [ 6.891000] softirqs last enabled at (1081334): [<ffffffff93e006f9>] __do_softirq+0x6f9/0xaa0 [ 6.891000] softirqs last disabled at (1081325): [<ffffffff921b993f>] irq_exit+0x27f/0x2d0 [ 6.891000] ---[ end trace 15e1083fc009a526 ]--- All the messages above were printed while holding a raw spinlock with IRQ disabled. Further down the bootup sequence, the system appeared to hang: 11.270654] systemd[1]: systemd 239 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy) [ 11.311307] systemd[1]: Detected architecture x86-64. [ 11.316420] systemd[1]: Running in initial RAM disk. Welcome to The system is not responsive at this point. I am not totally sure what caused this. Maybe it was caused by disabling IRQ for too long leading to some kind of corruption. Anyway, moving debug_object_is_on_stack() outside of the IRQ disabled lock critical section seemed to fix the hang problem. Cheers, Longman ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 19:57 ` Waiman Long @ 2018-11-23 2:35 ` Sergey Senozhatsky 2018-11-23 11:20 ` Petr Mladek 1 sibling, 0 replies; 40+ messages in thread From: Sergey Senozhatsky @ 2018-11-23 2:35 UTC (permalink / raw) To: Waiman Long Cc: Sergey Senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On (11/22/18 14:57), Waiman Long wrote: > > [..] > >> As a side note, one of the test systems that I used generated a > >> debugobjects splat in the bootup process and the system hanged > >> afterward. Applying this patch alone fix the hanging problem and the > >> system booted up successfully. So it is not really a good idea to call > >> printk() while holding a raw spinlock. > > Right, I like this patch. > > And I think that we, maybe, can go even further. > > > > Some serial consoles call mod_timer(). So what we could have with the > > debug objects enabled was > > > > mod_timer() > > lock_timer_base() > > debug_activate() > > printk() > > call_console_drivers() > > foo_console() > > mod_timer() > > lock_timer_base() << deadlock > > > > That's one possible scenario. The other one can involve console's > > IRQ handler, uart port spinlock, mod_timer, debug objects, printk, > > and an eventual deadlock on the uart port spinlock. This one can > > be mitigated with printk_safe. But mod_timer() deadlock will require > > a different fix. > > > > So maybe we need to switch debug objects print-outs to _always_ > > printk_deferred(). Debug objects can be used in code which cannot > > do direct printk() - timekeeping is just one example. > > Actually, I don't think that was the cause of the hang. Oh, I didn't suggest that this was the case. Just talked about more problems with printk in debug objects. 
Serial consoles call mod_timer, mod_timer calls debug objects, debug objects call printk and end up in serial console again. Serial consoles are not re-entrant at this point. > The debugobjects splat was caused by debug_object_is_on_stack(), below > was the output: > > [ 6.890048] ODEBUG: object (____ptrval____) is NOT on stack > (____ptrval____), but annotated. > [ 6.891000] WARNING: CPU: 28 PID: 1 at lib/debugobjects.c:369 > __debug_object_init.cold.11+0x51/0x2d6 [..] > 11.270654] systemd[1]: systemd 239 running in system mode. (+PAM > +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP > +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN > +PCRE2 default-hierarchy=legacy) > [ 11.311307] systemd[1]: Detected architecture x86-64. > [ 11.316420] systemd[1]: Running in initial RAM disk. > > Welcome to > > The system is not responsive at this point. > > I am not totally sure what caused this. Hmm, me neither. -ss ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-22 19:57 ` Waiman Long 2018-11-23 2:35 ` Sergey Senozhatsky @ 2018-11-23 11:20 ` Petr Mladek 1 sibling, 0 replies; 40+ messages in thread From: Petr Mladek @ 2018-11-23 11:20 UTC (permalink / raw) To: Waiman Long Cc: Sergey Senozhatsky, Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Thu 2018-11-22 14:57:02, Waiman Long wrote: > On 11/21/2018 09:04 PM, Sergey Senozhatsky wrote: > > On (11/21/18 11:49), Waiman Long wrote: > > [..] > >>> case ODEBUG_STATE_ACTIVE: > >>> - debug_print_object(obj, "init"); > >>> state = obj->state; > >>> raw_spin_unlock_irqrestore(&db->lock, flags); > >>> + debug_print_object(obj, "init"); > >>> debug_object_fixup(descr->fixup_init, addr, state); > >>> return; > >>> > >>> case ODEBUG_STATE_DESTROYED: > >>> - debug_print_object(obj, "init"); > >>> + debug_printobj = true; > >>> break; > >>> default: > >>> break; > >>> } > >>> > >>> raw_spin_unlock_irqrestore(&db->lock, flags); > >>> + if (debug_chkstack) > >>> + debug_object_is_on_stack(addr, onstack); > >>> + if (debug_printobj) > >>> + debug_print_object(obj, "init"); > >>> > > [..] > >> As a side note, one of the test systems that I used generated a > >> debugobjects splat in the bootup process and the system hanged > >> afterward. Applying this patch alone fix the hanging problem and the > >> system booted up successfully. So it is not really a good idea to call > >> printk() while holding a raw spinlock. > > Right, I like this patch. > > And I think that we, maybe, can go even further. > > > > Some serial consoles call mod_timer(). 
So what we could have with the > > debug objects enabled was > > > > mod_timer() > > lock_timer_base() > > debug_activate() > > printk() > > call_console_drivers() > > foo_console() > > mod_timer() > > lock_timer_base() << deadlock > > > > That's one possible scenario. The other one can involve console's > > IRQ handler, uart port spinlock, mod_timer, debug objects, printk, > > and an eventual deadlock on the uart port spinlock. This one can > > be mitigated with printk_safe. But mod_timer() deadlock will require > > a different fix. > > > > So maybe we need to switch debug objects print-outs to _always_ > > printk_deferred(). Debug objects can be used in code which cannot > > do direct printk() - timekeeping is just one example. > > > > -ss > > Actually, I don't think that was the cause of the hang. The debugobjects > splat was caused by debug_object_is_on_stack(), below was the output: > > [ 6.890048] ODEBUG: object (____ptrval____) is NOT on stack > (____ptrval____), but annotated. > [ 6.891000] WARNING: CPU: 28 PID: 1 at lib/debugobjects.c:369 > __debug_object_init.cold.11+0x51/0x2d6 > [ 6.891000] Modules linked in: > [ 6.891000] CPU: 28 PID: 1 Comm: swapper/0 Not tainted > 4.18.0-41.el8.bz1651764_cgroup_debug.x86_64+debug #1 > [ 6.891000] Hardware name: HPE ProLiant DL120 Gen10/ProLiant DL120 > Gen10, BIOS U36 11/14/2017 > [ 6.891000] RIP: 0010:__debug_object_init.cold.11+0x51/0x2d6 > [ 6.891000] Code: ea 03 80 3c 02 00 0f 85 85 02 00 00 49 8b 54 24 18 > 48 89 de 4c 89 44 24 10 48 c7 c7 00 ce 22 94 e8 73 18 62 ff 4c 8b 44 24 > 10 <0f> 0b e9 60 db ff ff 41 83 c4 01 b8 ff ff 37 00 44 89 25 ce 46 f9 > [ 6.891000] RSP: 0000:ffff880104187960 EFLAGS: 00010086 > [ 6.891000] RAX: 0000000000000050 RBX: ffffffff9764c570 RCX: > 0000000000000000 > [ 6.891000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > ffff880104178ca8 > [ 6.891000] RBP: 1ffff10020830f34 R08: ffff8807ce68a1d0 R09: > fffffbfff2923554 > [ 6.891000] R10: fffffbfff2923554 R11: ffffffff9491aaa3 R12: > 
ffff880104178000 > [ 6.891000] R13: ffffffff96c809b8 R14: 000000000000a370 R15: > ffff8807ce68a1c0 > [ 6.891000] FS: 0000000000000000(0000) GS:ffff8807d4200000(0000) > knlGS:0000000000000000 > [ 6.891000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 6.891000] CR2: 0000000000000000 CR3: 000000028de16001 CR4: > 00000000007606e0 > [ 6.891000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 6.891000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 6.891000] PKRU: 00000000 > [ 6.891000] Call Trace: > [ 6.891000] ? debug_object_fixup+0x30/0x30 > [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 > [ 6.891000] ? __lockdep_init_map+0x12f/0x510 > [ 6.891000] ? __lockdep_init_map+0x12f/0x510 > [ 6.891000] virt_efi_get_next_variable+0xa2/0x160 > [ 6.891000] efivar_init+0x1c4/0x6d7 > [ 6.891000] ? efivar_ssdt_setup+0x3b/0x3b > [ 6.891000] ? efivar_entry_iter+0x120/0x120 > [ 6.891000] ? find_held_lock+0x3a/0x1c0 > [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 > [ 6.891000] ? kmsg_dump_rewind_nolock+0xd9/0xd9 > [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 > [ 6.891000] ? trace_hardirqs_on_caller+0x381/0x570 > [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 > [ 6.891000] efisubsys_init+0x1be/0x4ae > [ 6.891000] ? kernfs_get.part.8+0x4c/0x60 > [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 > [ 6.891000] ? __kernfs_create_file+0x235/0x2e0 > [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 > [ 6.891000] do_one_initcall+0xe9/0x5fd > [ 6.891000] ? perf_trace_initcall_level+0x450/0x450 > [ 6.891000] ? __wake_up_common+0x5a0/0x5a0 > [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 > [ 6.891000] kernel_init_freeable+0x51a/0x5f2 > [ 6.891000] ? start_kernel+0x7b8/0x7b8 > [ 6.891000] ? finish_task_switch+0x19a/0x690 > [ 6.891000] ? __switch_to_asm+0x40/0x70 > [ 6.891000] ? __switch_to_asm+0x34/0x70 > [ 6.891000] ? rest_init+0xe9/0xe9 > [ 6.891000] kernel_init+0xc/0x110 > [ 6.891000] ? 
rest_init+0xe9/0xe9 > [ 6.891000] ret_from_fork+0x24/0x50 > [ 6.891000] irq event stamp: 1081352 > [ 6.891000] hardirqs last enabled at (1081351): [<ffffffff93af7dab>] > _raw_spin_unlock_irqrestore+0x4b/0x60 > [ 6.891000] hardirqs last disabled at (1081352): [<ffffffff93af85c2>] > _raw_spin_lock_irqsave+0x22/0x81 > [ 6.891000] softirqs last enabled at (1081334): [<ffffffff93e006f9>] > __do_softirq+0x6f9/0xaa0 > [ 6.891000] softirqs last disabled at (1081325): [<ffffffff921b993f>] > irq_exit+0x27f/0x2d0 > [ 6.891000] ---[ end trace 15e1083fc009a526 ]--- > > All the messages above were printed while holding a raw spinlock with > IRQ disabled. Further down the bootup sequence, the system appeared to hang: > > 11.270654] systemd[1]: systemd 239 running in system mode. (+PAM > +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP > +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN > +PCRE2 default-hierarchy=legacy) > [ 11.311307] systemd[1]: Detected architecture x86-64. > [ 11.316420] systemd[1]: Running in initial RAM disk. > > Welcome to > > The system is not responsive at this point. > > I am not totally sure what caused this. Maybe it was caused by disabling > IRQ for too long leading to some kind of corruption. Anyway, moving > debug_object_is_on_stack() outside of the IRQ disabled lock critical > section seemed to fix the hang problem. It is hard to say why printing the above with disabled interrupts would break anything. efivar_init() itself should get delayed the same way with and without the patch. Some clue might be in the rest of the log. It would be interesting to compare full logs of non-patched and patched system. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections 2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long 2018-11-21 16:49 ` Waiman Long @ 2018-12-07 9:21 ` Peter Zijlstra 1 sibling, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2018-12-07 9:21 UTC (permalink / raw) To: Waiman Long Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Mon, Nov 19, 2018 at 01:55:16PM -0500, Waiman Long wrote: > The db->lock is a raw spinlock and so the lock hold time is supposed > to be short. This will not be the case when printk() is being involved > in some of the critical sections. In order to avoid the long hold time, > in case some messages need to be printed, the debug_object_is_on_stack() > and debug_print_object() calls are now moved out of those critical > sections. That's not why you did this patch though; you want to make these locks terminal locks and that means no printk() inside, as that uses locks again. Please write relevant changelogs. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (6 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-12-07 9:22 ` Peter Zijlstra 2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long ` (8 subsequent siblings) 16 siblings, 1 reply; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long There are use cases where we want to allow nesting of one terminal lock underneath another terminal-like lock. That new lock type is called nestable terminal lock which can optionally allow the acquisition of no more than one regular (non-nestable) terminal lock underneath it. Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/lockdep.h | 9 ++++++++- kernel/locking/lockdep.c | 15 +++++++++++++-- kernel/locking/lockdep_internals.h | 2 +- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index a146bca..b9435fb 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -148,16 +148,20 @@ struct lock_class_stats { * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks. * 2) LOCKDEP_FLAG_TERMINAL: This is a terminal lock where lock/unlock on * another lock within its critical section is not allowed. + * 3) LOCKDEP_FLAG_TERMINAL_NESTABLE: This is a terminal lock that can + * allow one more regular terminal lock to be nested underneath it. * * Only the least significant 4 bits of the flags will be copied to the * held_lock structure. 
*/ #define LOCKDEP_FLAG_TERMINAL (1 << 0) +#define LOCKDEP_FLAG_TERMINAL_NESTABLE (1 << 1) #define LOCKDEP_FLAG_NOVALIDATE (1 << 4) #define LOCKDEP_HLOCK_FLAGS_MASK 0x0f #define LOCKDEP_NOCHECK_FLAGS (LOCKDEP_FLAG_NOVALIDATE |\ - LOCKDEP_FLAG_TERMINAL) + LOCKDEP_FLAG_TERMINAL |\ + LOCKDEP_FLAG_TERMINAL_NESTABLE) /* * Map the lock object (the lock instance) to the lock-class object. @@ -327,6 +331,8 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name, do { (lock)->dep_map.flags |= LOCKDEP_FLAG_NOVALIDATE; } while (0) #define lockdep_set_terminal_class(lock) \ do { (lock)->dep_map.flags |= LOCKDEP_FLAG_TERMINAL; } while (0) +#define lockdep_set_terminal_nestable_class(lock) \ + do { (lock)->dep_map.flags |= LOCKDEP_FLAG_TERMINAL_NESTABLE; } while (0) /* * Compare locking classes @@ -444,6 +450,7 @@ static inline void lockdep_on(void) #define lockdep_set_novalidate_class(lock) do { } while (0) #define lockdep_set_terminal_class(lock) do { } while (0) +#define lockdep_set_terminal_nestable_class(lock) do { } while (0) /* * We don't define lockdep_match_class() and lockdep_match_key() for !LOCKDEP diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 40894c1..5a853a6 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3263,13 +3263,24 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, class_idx = class - lock_classes + 1; if (depth) { + int prev_type; + hlock = curr->held_locks + depth - 1; /* - * Warn if the previous lock is a terminal lock. + * Warn if the previous lock is a terminal lock or the + * previous lock is a nestable terminal lock and the current + * one isn't a regular terminal lock. 
*/ - if (DEBUG_LOCKS_WARN_ON(hlock_is_terminal(hlock))) + prev_type = hlock_is_terminal(hlock); + if (DEBUG_LOCKS_WARN_ON((prev_type == LOCKDEP_FLAG_TERMINAL) || + ((prev_type == LOCKDEP_FLAG_TERMINAL_NESTABLE) && + (flags_is_terminal(class->flags) != + LOCKDEP_FLAG_TERMINAL)))) { + pr_warn("Terminal lock error: prev lock = %s, curr lock = %s\n", + hlock->instance->name, class->name); return 0; + } if (hlock->class_idx == class_idx && nest_lock) { if (hlock->references) { diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h index 271fba8..51fa141 100644 --- a/kernel/locking/lockdep_internals.h +++ b/kernel/locking/lockdep_internals.h @@ -215,5 +215,5 @@ static inline unsigned long debug_class_ops_read(struct lock_class *class) static inline unsigned int flags_is_terminal(unsigned int flags) { - return flags & LOCKDEP_FLAG_TERMINAL; + return flags & (LOCKDEP_FLAG_TERMINAL|LOCKDEP_FLAG_TERMINAL_NESTABLE); } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks 2018-11-19 18:55 ` [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks Waiman Long @ 2018-12-07 9:22 ` Peter Zijlstra 2018-12-07 9:52 ` Peter Zijlstra 0 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2018-12-07 9:22 UTC (permalink / raw) To: Waiman Long Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > There are use cases where we want to allow nesting of one terminal lock > underneath another terminal-like lock. That new lock type is called > nestable terminal lock which can optionally allow the acquisition of > no more than one regular (non-nestable) terminal lock underneath it. I think I asked for a more coherent changelog on this. The above is still self-contradictory and doesn't explain why you'd ever want such a 'misfeature' :-) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks 2018-12-07 9:22 ` Peter Zijlstra @ 2018-12-07 9:52 ` Peter Zijlstra 0 siblings, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2018-12-07 9:52 UTC (permalink / raw) To: Waiman Long Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Fri, Dec 07, 2018 at 10:22:52AM +0100, Peter Zijlstra wrote: > On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > > There are use cases where we want to allow nesting of one terminal lock > > underneath another terminal-like lock. That new lock type is called > > nestable terminal lock which can optionally allow the acquisition of > > no more than one regular (non-nestable) terminal lock underneath it. > > I think I asked for a more coherent changelog on this. The above is > still self contradictory and doesn't explain why you'd ever want such a > 'misfeature' :-) So maybe call the thing penterminal (contraction of penultimate and terminal) locks and explain why this annotation is safe -- in great detail. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (7 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-22 15:33 ` Petr Mladek 2018-12-07 9:47 ` Peter Zijlstra 2018-11-19 18:55 ` [PATCH v2 10/17] lib/stackdepot: Make depot_lock a terminal spinlock Waiman Long ` (7 subsequent siblings) 16 siblings, 2 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long By making the object hash locks nestable terminal locks, we can avoid a bunch of unnecessary lockdep validations as well as saving space in the lockdep tables. Signed-off-by: Waiman Long <longman@redhat.com> --- lib/debugobjects.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 4216d3d..c6f3967 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -1129,8 +1129,13 @@ void __init debug_objects_early_init(void) { int i; - for (i = 0; i < ODEBUG_HASH_SIZE; i++) + /* + * Make the obj_hash locks nestable terminal locks. + */ + for (i = 0; i < ODEBUG_HASH_SIZE; i++) { raw_spin_lock_init(&obj_hash[i].lock); + lockdep_set_terminal_nestable_class(&obj_hash[i].lock); + } for (i = 0; i < ODEBUG_POOL_SIZE; i++) hlist_add_head(&obj_static_pool[i].node, &obj_pool); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks 2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long @ 2018-11-22 15:33 ` Petr Mladek 2018-11-22 20:17 ` Waiman Long 2018-12-07 9:47 ` Peter Zijlstra 1 sibling, 1 reply; 40+ messages in thread From: Petr Mladek @ 2018-11-22 15:33 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Mon 2018-11-19 13:55:18, Waiman Long wrote: > By making the object hash locks nestable terminal locks, we can avoid > a bunch of unnecessary lockdep validations as well as saving space > in the lockdep tables. Please, explain which terminal lock might be nested. Hmm, it would hide eventual nesting of other terminal locks. It might reduce lockdep reliability. I wonder if the space optimization is worth it. Finally, it might be good to add a short explanation what (nested) terminal locks mean into each commit. It would help people to understand the effect without digging into the lockdep code, ... Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks 2018-11-22 15:33 ` Petr Mladek @ 2018-11-22 20:17 ` Waiman Long 2018-11-23 9:29 ` Petr Mladek 0 siblings, 1 reply; 40+ messages in thread From: Waiman Long @ 2018-11-22 20:17 UTC (permalink / raw) To: Petr Mladek Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On 11/22/2018 10:33 AM, Petr Mladek wrote: > On Mon 2018-11-19 13:55:18, Waiman Long wrote: >> By making the object hash locks nestable terminal locks, we can avoid >> a bunch of unnecessary lockdep validations as well as saving space >> in the lockdep tables. > Please, explain which terminal lock might be nested. > > Hmm, it would hide eventual nesting of other terminal locks. > It might reduce lockdep reliability. I wonder if the space > optimization is worth it. > > Finally, it might be good to add a short explanation what (nested) > terminal locks mean into each commit. It would help people to > understand the effect without digging into the lockdep code, ... > > Best Regards, > Petr The nestable terminal lock is currently only used in the debugobjects code. It should only be used on a case-by-case basis. In the case of the debugobjects code, the locking patterns are: (1) raw_spin_lock(&db_lock); ... raw_spin_unlock(&db_lock); (2) raw_spin_lock(&db_lock); ... raw_spin_lock(&pool_lock); ... raw_spin_unlock(&pool_lock); ... raw_spin_unlock(&db_lock); (3) raw_spin_lock(&pool_lock); ... raw_spin_unlock(&pool_lock); So the db_lock is made to be nestable so that it can allow acquisition of pool_lock (a regular terminal lock) underneath it. Cheers, Longman ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks 2018-11-22 20:17 ` Waiman Long @ 2018-11-23 9:29 ` Petr Mladek 0 siblings, 0 replies; 40+ messages in thread From: Petr Mladek @ 2018-11-23 9:29 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Thu 2018-11-22 15:17:52, Waiman Long wrote: > On 11/22/2018 10:33 AM, Petr Mladek wrote: > > On Mon 2018-11-19 13:55:18, Waiman Long wrote: > >> By making the object hash locks nestable terminal locks, we can avoid > >> a bunch of unnecessary lockdep validations as well as saving space > >> in the lockdep tables. > > Please, explain which terminal lock might be nested. > > So the db_lock is made to be nestable that it can allow acquisition of > pool_lock (a regular terminal lock) underneath it. Please, mention this in the commit message. Best Regards, Petr ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks 2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long 2018-11-22 15:33 ` Petr Mladek @ 2018-12-07 9:47 ` Peter Zijlstra 1 sibling, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2018-12-07 9:47 UTC (permalink / raw) To: Waiman Long Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton On Mon, Nov 19, 2018 at 01:55:18PM -0500, Waiman Long wrote: > By making the object hash locks nestable terminal locks, we can avoid > a bunch of unnecessary lockdep validations as well as saving space > in the lockdep tables. So the 'problem'; which you've again not explained; is that debugobjects has the following lock order: &db->lock &pool_lock And you seem to want to tag '&db->lock' as terminal, which is obviously a big fat lie. You've also not explained why it is safe to do this (I think it actually is, but you really should've spelled it out). Furthermore; you've not justified any of this 'insanity' with numbers. What do we gain with this nestable madness that justifies the crazy? ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 10/17] lib/stackdepot: Make depot_lock a terminal spinlock 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (8 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-19 18:55 ` [PATCH v2 11/17] locking/rwsem: Mark rwsem.wait_lock as a terminal lock Waiman Long ` (6 subsequent siblings) 16 siblings, 0 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long By defining depot_lock as a terminal spinlock, it reduces the lockdep overhead when this lock is being used. Signed-off-by: Waiman Long <longman@redhat.com> --- lib/stackdepot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/stackdepot.c b/lib/stackdepot.c index e513459..fb17888 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -78,7 +78,7 @@ struct stack_record { static int depot_index; static int next_slab_inited; static size_t depot_offset; -static DEFINE_SPINLOCK(depot_lock); +static DEFINE_TERMINAL_SPINLOCK(depot_lock); static bool init_stack_slab(void **prealloc) { -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v2 11/17] locking/rwsem: Mark rwsem.wait_lock as a terminal lock 2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long ` (9 preceding siblings ...) 2018-11-19 18:55 ` [PATCH v2 10/17] lib/stackdepot: Make depot_lock a terminal spinlock Waiman Long @ 2018-11-19 18:55 ` Waiman Long 2018-11-19 18:55 ` [PATCH v2 12/17] cgroup: Mark the rstat percpu lock as terminal Waiman Long ` (5 subsequent siblings) 16 siblings, 0 replies; 40+ messages in thread From: Waiman Long @ 2018-11-19 18:55 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek, Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long The wait_lock in a rwsem is always acquired with IRQ disabled. For the rwsem-xadd.c implementation, no other lock will be called while holding the wait_lock. So it satisfies the condition of being a terminal lock. By marking it as terminal, the lockdep overhead will be reduced. Signed-off-by: Waiman Long <longman@redhat.com> --- include/linux/rwsem.h | 11 ++++++++++- kernel/locking/rwsem-xadd.c | 1 + 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 67dbb57..a2a2385 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -77,16 +77,25 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) # define __RWSEM_DEP_MAP_INIT(lockname) #endif +/* + * The wait_lock is marked as a terminal lock to reduce lockdep overhead + * when the rwsem-xadd.c is used. This is implied when + * CONFIG_RWSEM_SPIN_ON_OWNER is true. The rwsem-spinlock.c implementation + * allows calling wake_up_process() while holding the wait_lock. So it + * can't be marked as terminal in this case. 
+ */ #ifdef CONFIG_RWSEM_SPIN_ON_OWNER #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL +#define __RWSEM_WAIT_LOCK_INIT(x) __RAW_TERMINAL_SPIN_LOCK_UNLOCKED(x) #else #define __RWSEM_OPT_INIT(lockname) +#define __RWSEM_WAIT_LOCK_INIT(x) __RAW_SPIN_LOCK_UNLOCKED(x) #endif #define __RWSEM_INITIALIZER(name) \ { __RWSEM_INIT_COUNT(name), \ .wait_list = LIST_HEAD_INIT((name).wait_list), \ - .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) \ + .wait_lock = __RWSEM_WAIT_LOCK_INIT(name.wait_lock) \ __RWSEM_OPT_INIT(name) \ __RWSEM_DEP_MAP_INIT(name) } diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 09b1800..3dbe593 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -85,6 +85,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, #endif atomic_long_set(&sem->count, RWSEM_UNLOCKED_VALUE); raw_spin_lock_init(&sem->wait_lock); + lockdep_set_terminal_class(&sem->wait_lock); INIT_LIST_HEAD(&sem->wait_list); #ifdef CONFIG_RWSEM_SPIN_ON_OWNER sem->owner = NULL; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v2 12/17] cgroup: Mark the rstat percpu lock as terminal

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

Classifying the cgroup rstat percpu locks as terminal locks reduces the
lockdep overhead when these locks are used.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/rstat.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d503d1a..47f7ffb 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -291,8 +291,13 @@ void __init cgroup_rstat_boot(void)
 {
 	int cpu;

-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+	for_each_possible_cpu(cpu) {
+		raw_spinlock_t *cgroup_rstat_percpu_lock =
+			per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+		raw_spin_lock_init(cgroup_rstat_percpu_lock);
+		lockdep_set_terminal_class(cgroup_rstat_percpu_lock);
+	}

 	BUG_ON(cgroup_rstat_init(&cgrp_dfl_root.cgrp));
 }
--
1.8.3.1
* [PATCH v2 13/17] mm/kasan: Make quarantine_lock a terminal lock

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

Making quarantine_lock a terminal spinlock reduces the lockdep overhead
when this lock is used.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kasan/quarantine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index b209dba..c9d36ab 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -103,7 +103,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
 static int quarantine_tail;
 /* Total size of all objects in global_quarantine across all batches. */
 static unsigned long quarantine_size;
-static DEFINE_RAW_SPINLOCK(quarantine_lock);
+static DEFINE_RAW_TERMINAL_SPINLOCK(quarantine_lock);
 DEFINE_STATIC_SRCU(remove_cache_srcu);

 /* Maximum size of the global queue. */
--
1.8.3.1
* [PATCH v2 14/17] dma-debug: Mark free_entries_lock as terminal

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

Making free_entries_lock a terminal spinlock reduces the lockdep
overhead when this lock is used.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca46..f891688 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -106,7 +106,7 @@ struct hash_bucket {
 /* List of pre-allocated dma_debug_entry's */
 static LIST_HEAD(free_entries);
 /* Lock for the list above */
-static DEFINE_SPINLOCK(free_entries_lock);
+static DEFINE_TERMINAL_SPINLOCK(free_entries_lock);

 /* Global disable flag - will be set in case of an error */
 static bool global_disable __read_mostly;
--
1.8.3.1
* [PATCH v2 15/17] kernfs: Mark kernfs_open_node_lock as terminal lock

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

Making kernfs_open_node_lock a terminal spinlock reduces the lockdep
overhead when this lock is used.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/kernfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index dbf5bc2..a86fe22 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -29,7 +29,7 @@
  * kernfs_open_file. kernfs_open_files are chained at
  * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
  */
-static DEFINE_SPINLOCK(kernfs_open_node_lock);
+static DEFINE_TERMINAL_SPINLOCK(kernfs_open_node_lock);
 static DEFINE_MUTEX(kernfs_open_file_mutex);

 struct kernfs_open_node {
--
1.8.3.1
* [PATCH v2 16/17] delay_acct: Mark task's delays->lock as terminal spinlock

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

Making a task's delays->lock a terminal spinlock reduces the lockdep
overhead when this lock is used.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/delayacct.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index 2a12b98..49dd8d3 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -43,8 +43,10 @@ void delayacct_init(void)
 void __delayacct_tsk_init(struct task_struct *tsk)
 {
 	tsk->delays = kmem_cache_zalloc(delayacct_cache, GFP_KERNEL);
-	if (tsk->delays)
+	if (tsk->delays) {
 		raw_spin_lock_init(&tsk->delays->lock);
+		lockdep_set_terminal_class(&tsk->delays->lock);
+	}
 }

 /*
--
1.8.3.1
* [PATCH v2 17/17] locking/lockdep: Check raw/non-raw locking conflicts

From: Waiman Long @ 2018-11-19 18:55 UTC
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
Cc: linux-kernel, kasan-dev, linux-mm, iommu, Petr Mladek,
    Sergey Senozhatsky, Andrey Ryabinin, Tejun Heo, Andrew Morton, Waiman Long

A task holding a raw spinlock should not acquire a non-raw lock, as
that will break a PREEMPT_RT kernel. A check is now added and a lockdep
warning is printed if that happens.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/lockdep.h         |  6 ++++++
 include/linux/spinlock_types.h  |  4 ++--
 kernel/locking/lockdep.c        | 15 +++++++++++++--
 kernel/locking/spinlock_debug.c |  1 +
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b9435fb..9a6fe0e 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -150,12 +150,15 @@ struct lock_class_stats {
  *    another lock within its critical section is not allowed.
  * 3) LOCKDEP_FLAG_TERMINAL_NESTABLE: This is a terminal lock that can
  *    allow one more regular terminal lock to be nested underneath it.
+ * 4) LOCKDEP_FLAG_RAW: This is a raw spinlock. A task holding a raw
+ *    spinlock should not acquire a non-raw lock.
  *
  * Only the least significant 4 bits of the flags will be copied to the
  * held_lock structure.
  */
 #define LOCKDEP_FLAG_TERMINAL		(1 << 0)
 #define LOCKDEP_FLAG_TERMINAL_NESTABLE	(1 << 1)
+#define LOCKDEP_FLAG_RAW		(1 << 2)
 #define LOCKDEP_FLAG_NOVALIDATE		(1 << 4)
 #define LOCKDEP_HLOCK_FLAGS_MASK	0x0f

@@ -333,6 +336,8 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
 	do { (lock)->dep_map.flags |= LOCKDEP_FLAG_TERMINAL; } while (0)
 #define lockdep_set_terminal_nestable_class(lock) \
 	do { (lock)->dep_map.flags |= LOCKDEP_FLAG_TERMINAL_NESTABLE; } while (0)
+#define lockdep_set_raw_class(lock) \
+	do { (lock)->dep_map.flags |= LOCKDEP_FLAG_RAW; } while (0)

 /*
  * Compare locking classes
@@ -448,6 +453,7 @@ static inline void lockdep_on(void)
 		do { (void)(key); } while (0)

 #define lockdep_set_subclass(lock, sub)		do { } while (0)
+#define lockdep_set_raw_class(lock)		do { } while (0)
 #define lockdep_set_novalidate_class(lock)	do { } while (0)
 #define lockdep_set_terminal_class(lock)	do { } while (0)
 #define lockdep_set_terminal_nestable_class(lock) do { } while (0)

diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 6a8086e..1d2114b 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -55,11 +55,11 @@
 	SPIN_DEP_MAP_INIT(lockname, f) }

 #define __RAW_SPIN_LOCK_UNLOCKED(lockname)	\
-	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname, 0)
+	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname, LOCKDEP_FLAG_RAW)

 #define __RAW_TERMINAL_SPIN_LOCK_UNLOCKED(lockname)	\
 	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname,	\
-		LOCKDEP_FLAG_TERMINAL)
+		LOCKDEP_FLAG_TERMINAL|LOCKDEP_FLAG_RAW)

 #define DEFINE_RAW_SPINLOCK(x)	raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
 #define DEFINE_RAW_TERMINAL_SPINLOCK(x)	\

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5a853a6..efafd2d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3273,8 +3273,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
		 * one isn't a regular terminal lock.
		 */
		prev_type = hlock_is_terminal(hlock);
-		if (DEBUG_LOCKS_WARN_ON((prev_type == LOCKDEP_FLAG_TERMINAL) ||
-		   ((prev_type == LOCKDEP_FLAG_TERMINAL_NESTABLE) &&
+		if (DEBUG_LOCKS_WARN_ON((prev_type & LOCKDEP_FLAG_TERMINAL) ||
+		   ((prev_type & LOCKDEP_FLAG_TERMINAL_NESTABLE) &&
		    (flags_is_terminal(class->flags) != LOCKDEP_FLAG_TERMINAL)))) {
			pr_warn("Terminal lock error: prev lock = %s, curr lock = %s\n",
@@ -3282,6 +3282,17 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
			return 0;
		}

+		/*
+		 * A task holding a raw spinlock should not acquire another
+		 * non-raw lock.
+		 */
+		if (DEBUG_LOCKS_WARN_ON((prev_type & LOCKDEP_FLAG_RAW) &&
+					!(class->flags & LOCKDEP_FLAG_RAW))) {
+			pr_warn("Raw lock error: prev lock = %s, curr lock = %s\n",
+				hlock->instance->name, class->name);
+			return 0;
+		}
+
		if (hlock->class_idx == class_idx && nest_lock) {
			if (hlock->references) {

diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 9aa0fcc..1794d47 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -22,6 +22,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
	 */
	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_set_raw_class(lock);
 #endif
	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
	lock->magic = SPINLOCK_MAGIC;
--
1.8.3.1
end of thread, other threads:[~2018-12-11 15:23 UTC | newest]

Thread overview: 40+ messages
2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long
2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long
2018-12-11 15:22   ` [tip:locking/core] locking/lockdep: Remove ::version " tip-bot for Waiman Long
2018-11-19 18:55 ` [PATCH v2 02/17] locking/lockdep: Rework lockdep_set_novalidate_class() Waiman Long
2018-11-19 18:55 ` [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type Waiman Long
2018-12-07  9:19   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 04/17] locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros Waiman Long
2018-11-19 18:55 ` [PATCH v2 05/17] printk: Mark logbuf_lock & console_owner_lock as terminal locks Waiman Long
2018-11-19 18:55 ` [PATCH v2 06/17] debugobjects: Mark pool_lock as a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long
2018-11-21 16:49   ` Waiman Long
2018-11-22  2:04     ` Sergey Senozhatsky
2018-11-22 10:16       ` Peter Zijlstra
2018-11-23  2:40         ` Sergey Senozhatsky
2018-11-23 11:48           ` Petr Mladek
2018-11-26  4:57             ` Sergey Senozhatsky
2018-11-29 13:09               ` Petr Mladek
2018-11-22 16:02   ` Petr Mladek
2018-11-22 20:29     ` Waiman Long
2018-11-23 11:36       ` Petr Mladek
2018-11-22 19:57   ` Waiman Long
2018-11-23  2:35     ` Sergey Senozhatsky
2018-11-23 11:20       ` Petr Mladek
2018-12-07  9:21   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks Waiman Long
2018-12-07  9:22   ` Peter Zijlstra
2018-12-07  9:52   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long
2018-11-22 15:33   ` Petr Mladek
2018-11-22 20:17     ` Waiman Long
2018-11-23  9:29       ` Petr Mladek
2018-12-07  9:47   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 10/17] lib/stackdepot: Make depot_lock a terminal spinlock Waiman Long
2018-11-19 18:55 ` [PATCH v2 11/17] locking/rwsem: Mark rwsem.wait_lock as a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 12/17] cgroup: Mark the rstat percpu lock as terminal Waiman Long
2018-11-19 18:55 ` [PATCH v2 13/17] mm/kasan: Make quarantine_lock a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 14/17] dma-debug: Mark free_entries_lock as terminal Waiman Long
2018-11-19 18:55 ` [PATCH v2 15/17] kernfs: Mark kernfs_open_node_lock as terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 16/17] delay_acct: Mark task's delays->lock as terminal spinlock Waiman Long
2018-11-19 18:55 ` [PATCH v2 17/17] locking/lockdep: Check raw/non-raw locking conflicts Waiman Long