* 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
@ 2021-08-14  4:33 Chris Murphy
  2021-08-16  7:17 ` Johannes Thumshirn
  2021-08-16 10:20 ` David Sterba
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Murphy @ 2021-08-14  4:33 UTC (permalink / raw)
  To: Btrfs BTRFS

I get the following call trace about 0.6s after dnf completes an
update which I imagine deletes many files. I'll try to reproduce and
get /proc/lock_stat

[   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
[   95.676537] kernel: turning off the locking correctness validator.
[   95.676537] kernel: Please attach the output of /proc/lock_stat to
the bug report
[   95.676538] kernel: CPU: 5 PID: 6045 Comm: systemd Not tainted
5.14.0-0.rc5.42.fc35.x86_64+debug #1
[   95.676540] kernel: Hardware name: LENOVO 20QDS3E200/20QDS3E200,
BIOS N2HET63W (1.46 ) 06/01/2021
[   95.681911] kernel: Call Trace:
[   95.681913] kernel:  dump_stack_lvl+0x57/0x72
[   95.681919] kernel:  __lock_acquire.cold+0x11/0x2b8
[   95.681923] kernel:  lock_acquire+0xc4/0x2e0
[   95.681926] kernel:  ? block_group_cache_tree_search+0x24/0x110
[   95.681929] kernel:  ? rcu_read_lock_sched_held+0x3f/0x80
[   95.681931] kernel:  ? check_ref_cleanup+0x154/0x160
[   95.681933] kernel:  ? kmem_cache_free+0x2c7/0x380
[   95.681936] kernel:  ? __mutex_unlock_slowpath+0x35/0x270
[   95.681940] kernel:  _raw_spin_lock+0x31/0x80
[   95.681941] kernel:  ? block_group_cache_tree_search+0x24/0x110
[   95.681943] kernel:  block_group_cache_tree_search+0x24/0x110
[   95.681944] kernel:  btrfs_free_tree_block+0x126/0x300
[   95.681947] kernel:  __btrfs_cow_block+0x308/0x5d0
[   95.681950] kernel:  btrfs_cow_block+0xf8/0x220
[   95.681951] kernel:  btrfs_search_slot+0x508/0x9e0
[   95.681954] kernel:  btrfs_lookup_dir_item+0x6c/0xb0
[   95.681956] kernel:  __btrfs_unlink_inode+0x9a/0x4e0
[   95.681960] kernel:  btrfs_rmdir+0xdf/0x1d0
[   95.681962] kernel:  vfs_rmdir+0x7e/0x190
[   95.681965] kernel:  do_rmdir+0x16a/0x190
[   95.681967] kernel:  do_syscall_64+0x38/0x90
[   95.681970] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   95.681972] kernel: RIP: 0033:0x7f07bfcb3e8b
[   95.681975] kernel: Code: 73 01 c3 48 8b 0d ed ff 0c 00 f7 d8 64 89
01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 07
01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ff 0c 00 f7 d8
64 89 01 48
[   95.681976] kernel: RSP: 002b:00007f07be6647a8 EFLAGS: 00000246
ORIG_RAX: 0000000000000107
[   95.681978] kernel: RAX: ffffffffffffffda RBX: 0000000000000006
RCX: 00007f07bfcb3e8b
[   95.681979] kernel: RDX: 0000000000000200 RSI: 00007f07b0000bd3
RDI: 0000000000000026
[   95.681980] kernel: RBP: 0000000000000026 R08: 0000000000000000
R09: 00007f07b0000080
[   95.681980] kernel: R10: 0000000000000078 R11: 0000000000000246
R12: 0000000000000000
[   95.681981] kernel: R13: 00007f07b0000bd3 R14: 0000000000000200
R15: 00007f07be6653f8

mount options:
/dev/nvme0n1p5 on / type btrfs
(rw,noatime,seclabel,compress=zstd:1,ssd,space_cache=v2,subvolid=268,subvol=/root34)

features:
csum_type        1 (xxhash64)
num_devices        1
compat_flags        0x0
compat_ro_flags        0x3
            ( FREE_SPACE_TREE |
              FREE_SPACE_TREE_VALID )
incompat_flags        0x355
            ( MIXED_BACKREF |
              MIXED_GROUPS |
              COMPRESS_ZSTD |
              EXTENDED_IREF |
              SKINNY_METADATA |
              NO_HOLES )

Full dmesg:
https://drive.google.com/file/d/1cLuWfvM_K4E6ML44KgAbNljF16bT3nSi/view?usp=sharing

-- 
Chris Murphy


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-14  4:33 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode Chris Murphy
@ 2021-08-16  7:17 ` Johannes Thumshirn
  2021-08-17  3:21   ` Chris Murphy
  2021-08-16 10:20 ` David Sterba
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2021-08-16  7:17 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On 14/08/2021 06:33, Chris Murphy wrote:
> I get the following call trace about 0.6s after dnf completes an
> update which I imagine deletes many files. I'll try to reproduce and
> get /proc/lock_stat
> 
> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> [   95.676537] kernel: turning off the locking correctness validator.
> [   95.676537] kernel: Please attach the output of /proc/lock_stat to
> the bug report

That can be "fixed" by bumping CONFIG_LOCKDEP_CHAINS_BITS.
I have it set to 18 here.
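
For reference, a rough sketch of where that knob ends up (paraphrasing
lib/Kconfig.debug and kernel/locking/lockdep_internals.h from memory,
so double-check the exact numbers against your tree):

# .config fragment; the option depends on LOCKDEP:
CONFIG_LOCKDEP_CHAINS_BITS=18

/* lockdep derives the chain table sizes from it: */
#define MAX_LOCKDEP_CHAINS_BITS		CONFIG_LOCKDEP_CHAINS_BITS
#define MAX_LOCKDEP_CHAINS		(1UL << MAX_LOCKDEP_CHAINS_BITS)
#define MAX_LOCKDEP_CHAIN_HLOCKS	(MAX_LOCKDEP_CHAINS * 5)

So going from the default of 16 to 18 quadruples the budget for lock
chains and the held-lock entries hanging off them.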


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-14  4:33 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode Chris Murphy
  2021-08-16  7:17 ` Johannes Thumshirn
@ 2021-08-16 10:20 ` David Sterba
  2021-08-16 10:27   ` Nikolay Borisov
  1 sibling, 1 reply; 10+ messages in thread
From: David Sterba @ 2021-08-16 10:20 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
> I get the following call trace about 0.6s after dnf completes an
> update which I imagine deletes many files. I'll try to reproduce and
> get /proc/lock_stat
> 
> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
lockdep and not a btrfs problem, but it appears from time to time; as
Johannes said, either increase the config variable or ignore it.


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 10:20 ` David Sterba
@ 2021-08-16 10:27   ` Nikolay Borisov
  2021-08-16 10:52     ` David Sterba
  2021-08-16 15:11     ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Nikolay Borisov @ 2021-08-16 10:27 UTC (permalink / raw)
  To: dsterba, Chris Murphy, Btrfs BTRFS; +Cc: Peter Zijlstra



On 16.08.21 13:20, David Sterba wrote:
> On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
>> I get the following call trace about 0.6s after dnf completes an
>> update which I imagine deletes many files. I'll try to reproduce and
>> get /proc/lock_stat
>>
>> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> 
> The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
> lockdep and not a btrfs problem, but it appears from time to time; as
> Johannes said, either increase the config variable or ignore it.
> 

But isn't it a bug if code triggers it? I.e. I think it's a signal of
misbehaving code. CC'ed PeterZ, who can hopefully shed some light on this.


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 10:27   ` Nikolay Borisov
@ 2021-08-16 10:52     ` David Sterba
  2021-08-16 15:11     ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: David Sterba @ 2021-08-16 10:52 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, Chris Murphy, Btrfs BTRFS, Peter Zijlstra

On Mon, Aug 16, 2021 at 01:27:35PM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.08.21 13:20, David Sterba wrote:
> > On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
> >> I get the following call trace about 0.6s after dnf completes an
> >> update which I imagine deletes many files. I'll try to reproduce and
> >> get /proc/lock_stat
> >>
> >> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> > 
> > The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
> > lockdep and not a btrfs problem, but it appears from time to time; as
> > Johannes said, either increase the config variable or ignore it.
> > 
> 
> But isn't it a bug if code triggers it? I.e. I think it's a signal of
> misbehaving code. CC'ed PeterZ, who can hopefully shed some light on this.

Maybe it's a bug, maybe it's a misconfigured build. Somebody (I don't
remember who) asked for logs and how often it happened; I provided
those, but nothing came of it and the warning still shows up.


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 10:27   ` Nikolay Borisov
  2021-08-16 10:52     ` David Sterba
@ 2021-08-16 15:11     ` Peter Zijlstra
  2021-08-16 15:40       ` Matthew Wilcox
  2021-08-16 20:43       ` David Sterba
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-08-16 15:11 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: dsterba, Chris Murphy, Btrfs BTRFS, mingo, will, longman,
	boqun.feng, Thomas Gleixner, willy

On Mon, Aug 16, 2021 at 01:27:35PM +0300, Nikolay Borisov wrote:
> On 16.08.21 13:20, David Sterba wrote:
> > On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
> >> I get the following call trace about 0.6s after dnf completes an
> >> update which I imagine deletes many files. I'll try to reproduce and
> >> get /proc/lock_stat
> >>
> >> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> > 
> > The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
> > lockdep and not a btrfs problem, but it appears from time to time; as
> > Johannes said, either increase the config variable or ignore it.
> > 
> 
> But isn't it a bug if code triggers it? I.e. I think it's a signal of
> misbehaving code. CC'ed PeterZ, who can hopefully shed some light on this.

Yeah, it's lockdep running out of (memory) resources... the immediate
thing I noticed is that btrfs_set_buffer_lockdep_class() creates 12*8
lock classes, but when you consider enum btrfs_lock_nesting that becomes
12*8*8, which is a somewhat silly number. When the number of classes
increases like that, the number of unique lock chains also increases and
consequently we seem to run out of hlocks.
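
Roughly, for illustration (mirroring fs/btrfs/disk-io.c and
fs/btrfs/locking.h from memory, so take the exact shapes with a grain
of salt):

/* One lock_class_key per (root objectid, tree level):
 * ~12 keysets x BTRFS_MAX_LEVEL (8) keys. */
static struct btrfs_lockdep_keyset {
	u64 id;				/* root objectid */
	const char *name_stub;
	char names[BTRFS_MAX_LEVEL][20];
	struct lock_class_key keys[BTRFS_MAX_LEVEL];
} btrfs_lockdep_keysets[];

void btrfs_set_buffer_lockdep_class(u64 objectid, struct extent_buffer *eb,
				    int level)
{
	struct btrfs_lockdep_keyset *ks;

	/* the last entry (id == 0) is the catch-all */
	for (ks = btrfs_lockdep_keysets; ks->id; ks++)
		if (ks->id == objectid)
			break;

	lockdep_set_class_and_name(&eb->lock, &ks->keys[level],
				   ks->names[level]);
}

Each acquisition then also passes one of the 8 enum btrfs_lock_nesting
values as the lockdep subclass, e.g. down_read_nested(&eb->lock,
BTRFS_NESTING_LEFT), and every subclass registers a lock class of its
own, hence 12*8*8.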

(possibly there's something else that also creates a ton of classes on
that system, but -ENOINFO)

One thing I've been contemplating is using an annotation scheme like
ww_mutex for trees. Then the tree root gets a nest class and all the
tree nodes get the same lock class. The down-side is that any actual
tree lock order violations go unreported. But the current
class-per-level annotation already has some of that anyway.

One thing I've pondered is to augment nest_lock with a custom validation
function which is called in the 'reference' case which validates if the
current 'instance' nests in the previous instance.

Something a little like the *completely* untested below....

Then I imagine someone could do something like:

struct my_tree_node {
	...
	spinlock_t lock;
};

struct my_tree_root {
	...
	struct my_tree_node *node;
	struct lockdep_map dep_map;
};

bool my_tree_validate_order(struct lockdep_map *prev, struct lockdep_map *next)
{
	struct my_tree_node *parent = container_of(prev, struct my_tree_node, lock.dep_map);
	struct my_tree_node *child = container_of(next, struct my_tree_node, lock.dep_map);

	return is_child_of(parent, child);
}

void my_tree_init_root(struct my_tree_root *root)
{
	lockdep_init_map_type(&root->dep_map, "my_tree_root", &my_tree_validate_order, 0,
			      LD_WAIT_INV, LD_WAIT_INV, LD_LOCK_NEST_FUNC);
}

void my_tree_lock_prepare(struct my_tree_root *root)
{
	mutex_acquire(&root->dep_map, 0, 0, _RET_IP_);
}

void my_tree_lock_finish(struct my_tree_root *root)
{
	mutex_release(&root->dep_map, _RET_IP_);
}

void my_tree_lock_node(struct my_tree_root *root, struct my_tree_node *node)
{
	spin_lock_nest_lock(&node->lock, root);
}

void my_tree_unlock_node(struct my_tree_node *node)
{
	spin_unlock(&node->lock);
}

And the whole 'tree' thing collapses into a single class and single hold
entry.
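
E.g. a hypothetical caller of the API above would look like:

	my_tree_lock_prepare(root);	/* acquire the nest 'lock' */

	my_tree_lock_node(root, parent);
	my_tree_lock_node(root, child);	/* validated against the previous
					 * instance via my_tree_validate_order() */
	/* ... */
	my_tree_unlock_node(child);
	my_tree_unlock_node(parent);

	my_tree_lock_finish(root);	/* release the nest 'lock' */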

---
 include/linux/lockdep_types.h |  1 +
 kernel/locking/lockdep.c      | 68 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 3e726ace5c62..0afc46a1fd31 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -33,6 +33,7 @@ enum lockdep_wait_type {
 enum lockdep_lock_type {
 	LD_LOCK_NORMAL = 0,	/* normal, catch all */
 	LD_LOCK_PERCPU,		/* percpu */
+	LD_LOCK_NEST_FUNC,
 	LD_LOCK_MAX,
 };
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8a509672a4cc..7e6fcd09a34c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -723,7 +723,7 @@ static void print_lockdep_cache(struct lockdep_map *lock)
 	printk(KERN_CONT "%s", name);
 }
 
-static void print_lock(struct held_lock *hlock)
+static void __print_lock(struct held_lock *hlock, struct lockdep_map *instance, unsigned long ip)
 {
 	/*
 	 * We can be called locklessly through debug_show_all_locks() so be
@@ -742,9 +742,14 @@ static void print_lock(struct held_lock *hlock)
 		return;
 	}
 
-	printk(KERN_CONT "%px", hlock->instance);
+	printk(KERN_CONT "%px", instance);
 	print_lock_name(lock);
-	printk(KERN_CONT ", at: %pS\n", (void *)hlock->acquire_ip);
+	printk(KERN_CONT ", at: %pS\n", (void *)ip);
+}
+
+static void print_lock(struct held_lock *hlock)
+{
+	__print_lock(hlock, hlock->instance, hlock->acquire_ip);
 }
 
 static void lockdep_print_held_locks(struct task_struct *p)
@@ -806,6 +811,9 @@ static int static_obj(const void *obj)
 	if (arch_is_kernel_data(addr))
 		return 1;
 
+	if (arch_is_kernel_text(addr))
+		return 1;
+
 	/*
 	 * in-kernel percpu var?
 	 */
@@ -4846,8 +4854,34 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
 	pr_warn("\nbut this task is not holding:\n");
 	pr_warn("%s\n", hlock->nest_lock->name);
 
+	pr_warn("\nother info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
 	pr_warn("\nstack backtrace:\n");
 	dump_stack();
+}
+
+static void
+print_lock_nested_invalid(struct task_struct *curr,
+			  struct held_lock *hlock,
+			  struct lockdep_map *lock,
+			  unsigned long ip)
+{
+	if (!debug_locks_off())
+		return;
+	if (debug_locks_silent)
+		return;
+
+	pr_warn("\n");
+	pr_warn("============================\n");
+	pr_warn("WARNING: Nested lock invalid\n");
+	print_kernel_ident();
+	pr_warn("----------------------------\n");
+
+	pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
+	__print_lock(hlock, lock, ip);
+	pr_warn("but %ps() fails with already held lock:\n", (void*)hlock->nest_lock->key);
+	print_lock(hlock);
 
 	pr_warn("\nother info that might help us debug this:\n");
 	lockdep_print_held_locks(curr);
@@ -4858,6 +4892,13 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
 
 static int __lock_is_held(const struct lockdep_map *lock, int read);
 
+typedef bool (*nest_f)(struct lockdep_map *prev, struct lockdep_map *next);
+
+bool __nest_func_true(struct lockdep_map *prev, struct lockdep_map *next)
+{
+	return true;
+}
+
 /*
  * This gets called for every mutex_lock*()/spin_lock*() operation.
  * We maintain the dependency maps and validate the locking attempt:
@@ -4871,6 +4912,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  struct lockdep_map *nest_lock, unsigned long ip,
 			  int references, int pin_count)
 {
+	nest_f nest_func = __nest_func_true;
 	struct task_struct *curr = current;
 	struct lock_class *class = NULL;
 	struct held_lock *hlock;
@@ -4920,6 +4962,15 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes;
 
+	if (nest_lock) {
+		if (!__lock_is_held(nest_lock, -1)) {
+			print_lock_nested_lock_not_held(curr, hlock, ip);
+			return 0;
+		}
+		if (nest_lock->type == LD_LOCK_NEST_FUNC)
+			nest_func = (nest_f)nest_lock->key;
+	}
+
 	if (depth) { /* we're holding locks */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
@@ -4929,6 +4980,12 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			if (!hlock->references)
 				hlock->references++;
 
+			if (!nest_func(hlock->instance, lock)) {
+				print_lock_nested_invalid(curr, hlock, lock, ip);
+				return 0;
+			}
+
+			hlock->instance = lock;
 			hlock->references += references;
 
 			/* Overflow */
@@ -5002,11 +5059,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	}
 	chain_key = iterate_chain_key(chain_key, hlock_id(hlock));
 
-	if (nest_lock && !__lock_is_held(nest_lock, -1)) {
-		print_lock_nested_lock_not_held(curr, hlock, ip);
-		return 0;
-	}
-
 	if (!debug_locks_silent) {
 		WARN_ON_ONCE(depth && !hlock_class(hlock - 1)->key);
 		WARN_ON_ONCE(!hlock_class(hlock)->key);


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 15:11     ` Peter Zijlstra
@ 2021-08-16 15:40       ` Matthew Wilcox
  2021-08-16 20:43       ` David Sterba
  1 sibling, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2021-08-16 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nikolay Borisov, dsterba, Chris Murphy, Btrfs BTRFS, mingo, will,
	longman, boqun.feng, Thomas Gleixner, linux-xfs

On Mon, Aug 16, 2021 at 05:11:55PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 16, 2021 at 01:27:35PM +0300, Nikolay Borisov wrote:
> > On 16.08.21 13:20, David Sterba wrote:
> > > On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
> > >> I get the following call trace about 0.6s after dnf completes an
> > >> update which I imagine deletes many files. I'll try to reproduce and
> > >> get /proc/lock_stat
> > >>
> > >> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> > > 
> > > The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
> > > lockdep and not a btrfs problem, but it appears from time to time; as
> > > Johannes said, either increase the config variable or ignore it.
> > > 
> > 
> > But isn't it a bug if code triggers it? I.e. I think it's a signal of
> > misbehaving code. CC'ed PeterZ, who can hopefully shed some light on this.
> 
> Yeah, it's lockdep running out of (memory) resources... the immediate
> thing I noticed is that btrfs_set_buffer_lockdep_class() creates 12*8
> lock classes, but when you consider enum btrfs_lock_nesting that becomes
> 12*8*8, which is a somewhat silly number. When the number of classes
> increases like that, the number of unique lock chains also increases and
> consequently we seem to run out of hlocks.

This looks a bit like some of the things the XFS people complained about
(and led to XFS basically disabling lockdep for their own locks).
Maybe the annotation scheme you outline below would help them too?

> (possibly there's something else that also creates a ton of classes on
> that system, but -ENOINFO)
> 
> One thing I've been contemplating is using an annotation scheme like
> ww_mutex for trees. Then the tree root gets a nest class and all the
> tree nodes get the same lock class. The down-side is that any actual
> tree lock order violations go unreported. But the current
> class-per-level annotation already has some of that anyway.
> 
> One thing I've pondered is to augment nest_lock with a custom validation
> function which is called in the 'reference' case which validates if the
> current 'instance' nests in the previous instance.
> 
> Something a little like the *completely* untested below....
> 
> Then I imagine someone could do something like:
> 
> struct my_tree_node {
> 	...
> 	spinlock_t lock;
> };
> 
> struct my_tree_root {
> 	...
> 	struct my_tree_node *node;
> 	struct lockdep_map dep_map;
> };
> 
> bool my_tree_validate_order(struct lockdep_map *prev, struct lockdep_map *next)
> {
> 	struct my_tree_node *parent = container_of(prev, struct my_tree_node, lock.dep_map);
> 	struct my_tree_node *child = container_of(next, struct my_tree_node, lock.dep_map);
> 
> 	return is_child_of(parent, child);
> }
> 
> void my_tree_init_root(struct my_tree_root *root)
> {
> 	lockdep_init_map_type(&root->dep_map, "my_tree_root", &my_tree_validate_order, 0,
> 			      LD_WAIT_INV, LD_WAIT_INV, LD_LOCK_NEST_FUNC);
> }
> 
> void my_tree_lock_prepare(struct my_tree_root *root)
> {
> 	mutex_acquire(&root->dep_map, 0, 0, _RET_IP_);
> }
> 
> void my_tree_lock_finish(struct my_tree_root *root)
> {
> 	mutex_release(&root->dep_map, _RET_IP_);
> }
> 
> void my_tree_lock_node(struct my_tree_root *root, struct my_tree_node *node)
> {
> 	spin_lock_nest_lock(&node->lock, root);
> }
> 
> void my_tree_unlock_node(struct my_tree_node *node)
> {
> 	spin_unlock(&node->lock);
> }
> 
> And the whole 'tree' thing collapses into a single class and single hold
> entry.
> 
[... patch snipped, quoted in full in Peter's mail upthread ...]


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 15:11     ` Peter Zijlstra
  2021-08-16 15:40       ` Matthew Wilcox
@ 2021-08-16 20:43       ` David Sterba
  2021-08-17  8:46         ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: David Sterba @ 2021-08-16 20:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nikolay Borisov, dsterba, Chris Murphy, Btrfs BTRFS, mingo, will,
	longman, boqun.feng, Thomas Gleixner, willy

On Mon, Aug 16, 2021 at 05:11:55PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 16, 2021 at 01:27:35PM +0300, Nikolay Borisov wrote:
> > On 16.08.21 13:20, David Sterba wrote:
> > > On Fri, Aug 13, 2021 at 10:33:05PM -0600, Chris Murphy wrote:
> > >> I get the following call trace about 0.6s after dnf completes an
> > >> update which I imagine deletes many files. I'll try to reproduce and
> > >> get /proc/lock_stat
> > >>
> > >> [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> > > 
> > > The message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" is related to
> > > lockdep and not a btrfs problem, but it appears from time to time; as
> > > Johannes said, either increase the config variable or ignore it.
> > > 
> > 
> > But isn't it a bug if code triggers it? I.e. I think it's a signal of
> > misbehaving code. CC'ed PeterZ, who can hopefully shed some light on this.
> 
> Yeah, it's lockdep running out of (memory) resources... the immediate
> thing I noticed is that btrfs_set_buffer_lockdep_class() creates 12*8
> lock classes, but when you consider enum btrfs_lock_nesting that becomes
> 12*8*8, which is a somewhat silly number. When the number of classes
> increases like that, the number of unique lock chains also increases and
> consequently we seem to run out of hlocks.
> 
> (possibly there's something else that also creates a ton of classes on
> that system, but -ENOINFO)

Yeah, it can't be just btrfs taking all the hlocks space; in the past
the warning popped up at random tests. Recently it has consistently
started to appear at a test that's #11 in a list of about ~880, so we
lose lockdep coverage for the rest of the run. I remember most if not
all of the past reports being valid and leading to fixes.

The number 12*8*8 is perhaps silly and more than would be sane, but at
least it's fixed and bounded, so CONFIG_LOCKDEP_CHAINS_BITS can be
adjusted accordingly; Johannes pointed out in another reply that 18
works for him. I verified that on my setup and started to see normal
lockdep reports again. That's an improvement we can live with for
debugging builds.

> One thing I've been contemplating is using an annotation scheme like
> ww_mutex for trees. Then the tree root gets a nest class and all the
> tree nodes get the same lock class. The down-side is that any actual
> tree lock order violations go unreported. But the current
> class-per-level annotation already has some of that anyway.
> 
> One thing I've pondered is to augment nest_lock with a custom validation
> function which is called in the 'reference' case which validates if the
> current 'instance' nests in the previous instance.
> 
> Something a little like the *completely* untested below....
> 
> Then I imagine someone could do something like:
> 
> struct my_tree_node {
> 	...
> 	spinlock_t lock;

Is this meant to work only for spinlocks or does it work for rwsem too?
I guess with lockdep it does not matter but we use rwsem for tree locks.

> };
> 
> struct my_tree_root {
> 	...
> 	struct my_tree_node *node;
> 	struct lockdep_map dep_map;
> };
> 
> bool my_tree_validate_order(struct lockdep_map *prev, struct lockdep_map *next)
> {
> 	struct my_tree_node *parent = container_of(prev, struct my_tree_node, lock.dep_map);
> 	struct my_tree_node *child = container_of(next, struct my_tree_node, lock.dep_map);
> 
> 	return is_child_of(parent, child);
> }
> 
> void my_tree_init_root(struct my_tree_root *root)
> {
> 	lockdep_init_map_type(&root->dep_map, "my_tree_root", &my_tree_validate_order, 0,
> 			      LD_WAIT_INV, LD_WAIT_INV, LD_LOCK_NEST_FUNC);
> }
> 
> void my_tree_lock_prepare(struct my_tree_root *root)
> {
> 	mutex_acquire(&root->dep_map, 0, 0, _RET_IP_);
> }
> 
> void my_tree_lock_finish(struct my_tree_root *root)
> {
> 	mutex_release(&root->dep_map, _RET_IP_);
> }
> 
> void my_tree_lock_node(struct my_tree_root *root, struct my_tree_node *node)
> {
> 	spin_lock_nest_lock(&node->lock, root);
> }
> 
> void my_tree_unlock_node(struct my_tree_node *node)
> {
> 	spin_unlock(&node->lock);
> }
> 
> And the whole 'tree' thing collapses into a single class and single hold
> entry.

I'll try to adapt the btrfs tree locking to the proposed API.

[... patch snipped down to the hunk in question ...]
> @@ -4920,6 +4962,15 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>  
>  	class_idx = class - lock_classes;
>  
> +	if (nest_lock) {
> +		if (!__lock_is_held(nest_lock, -1)) {
> +			print_lock_nested_lock_not_held(curr, hlock, ip);
> +			return 0;
> +		}
> +		if (nest_lock->type == LD_LOCK_NEST_FUNC)

That's probably nest_lock->lock_type; it does not compile otherwise.

[... rest of patch snipped ...]


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16  7:17 ` Johannes Thumshirn
@ 2021-08-17  3:21   ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-08-17  3:21 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Aug 16, 2021 at 1:17 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 14/08/2021 06:33, Chris Murphy wrote:
> > I get the following call trace about 0.6s after dnf completes an
> > update which I imagine deletes many files. I'll try to reproduce and
> > get /proc/lock_stat
> >
> > [   95.674507] kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
> > [   95.676537] kernel: turning off the locking correctness validator.
> > [   95.676537] kernel: Please attach the output of /proc/lock_stat to
> > the bug report
>
> That can be "fixed" by bumping CONFIG_LOCKDEP_CHAINS_BITS.
> I have it set to 18 here.

Fedora is using CONFIG_LOCKDEP_CHAINS_BITS=16

If the consensus is we should use 18 here, I'll get a PR submitted to
the kernel dev team.

Thanks,

-- 
Chris Murphy


* Re: 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode
  2021-08-16 20:43       ` David Sterba
@ 2021-08-17  8:46         ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2021-08-17  8:46 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, Chris Murphy, Btrfs BTRFS, mingo, will,
	longman, boqun.feng, Thomas Gleixner, willy

On Mon, Aug 16, 2021 at 10:43:54PM +0200, David Sterba wrote:
> On Mon, Aug 16, 2021 at 05:11:55PM +0200, Peter Zijlstra wrote:
> > Something a little like the *completely* untested below....
> > 
> > Then I imagine someone could do something like:
> > 
> > struct my_tree_node {
> > 	...
> > 	spinlock_t lock;
> 
> Is this meant to work only for spinlocks or does it work for rwsem too?
> I guess with lockdep it does not matter but we use rwsem for tree locks.

Anything with a ->dep_map member works; courtesy of macros :-) That very
much includes rwsems.
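
E.g., a minimal rwsem variant of the node helpers from my earlier mail,
assuming the node embeds a struct rw_semaphore; note that only the
write side has a _nest_lock helper, the read side would need
down_read_nested() or a similar new wrapper:

struct my_tree_node {
	...
	struct rw_semaphore lock;
};

void my_tree_lock_node(struct my_tree_root *root, struct my_tree_node *node)
{
	/* expands to _down_write_nest_lock(&node->lock, &root->dep_map) */
	down_write_nest_lock(&node->lock, root);
}

void my_tree_unlock_node(struct my_tree_node *node)
{
	up_write(&node->lock);
}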

> > @@ -4920,6 +4962,15 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> >  
> >  	class_idx = class - lock_classes;
> >  
> > +	if (nest_lock) {
> > +		if (!__lock_is_held(nest_lock, -1)) {
> > +			print_lock_nested_lock_not_held(curr, hlock, ip);
> > +			return 0;
> > +		}
> > +		if (nest_lock->type == LD_LOCK_NEST_FUNC)
> 
> That's probably nest_lock->lock_type; it does not compile otherwise.

Yes, sorry, typing so hard :-)


end of thread

Thread overview: 10+ messages
2021-08-14  4:33 5.14-0-rc5, splat in block_group_cache_tree_search while __btrfs_unlink_inode Chris Murphy
2021-08-16  7:17 ` Johannes Thumshirn
2021-08-17  3:21   ` Chris Murphy
2021-08-16 10:20 ` David Sterba
2021-08-16 10:27   ` Nikolay Borisov
2021-08-16 10:52     ` David Sterba
2021-08-16 15:11     ` Peter Zijlstra
2021-08-16 15:40       ` Matthew Wilcox
2021-08-16 20:43       ` David Sterba
2021-08-17  8:46         ` Peter Zijlstra
