linux-kernel.vger.kernel.org archive mirror
* [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
@ 2020-05-26 17:43 Waiman Long
  2020-05-26 18:58 ` Qian Cai
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2020-05-26 17:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Qian Cai, Waiman Long

It was found by Qian Cai that a lockdep splat sometimes appears with the
"BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
guest running a v5.7-rc7 based kernel, I looked at how many entries in the
various lockdep tables were being used after bootup and after a parallel
kernel build (make -j32). The tables below show the usage statistics.

  After bootup:

  Table               Used        Max      %age
  -----               ----        ---      ----
  lock_classes[]      1834       8192      22.4
  list_entries[]     15646      32768      47.7
  lock_chains[]      20873      65536      31.8
  chain_hlocks[]     83199     327680      25.4
  stack_trace[]     146177     524288      27.9

  After parallel kernel build:

  Table               Used        Max      %age
  -----               ----        ---      ----
  lock_classes[]      1864       8192      22.8
  list_entries[]     17134      32768      52.3
  lock_chains[]      25196      65536      38.4
  chain_hlocks[]    106321     327680      32.4
  stack_trace[]     158700     524288      30.3

Percentage-wise, it can be seen that the list_entries[] table for direct
dependencies is used much more heavily than the other tables. So it is
also the table most likely to run out of space when running a complex
workload.

To reduce the likelihood of running out of table entries, we can increase
MAX_LOCKDEP_ENTRIES by 50% from 16384/32768 to 24576/49152.  On a 64-bit
architecture, that represents an increase in memory consumption of
917504 bytes (16384 additional list_entries[] elements at 56 bytes each).
With that change, the percentage usage of list_entries[] after bootup and
after the kernel build will fall to 31.8% and 34.9% respectively, more in
line with the other tables.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep_internals.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index baca699b94e9..6108d2fbe775 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -89,12 +89,12 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
  * table (if it's not there yet), and we check it for lock order
  * conflicts and deadlocks.
  */
-#define MAX_LOCKDEP_ENTRIES	16384UL
+#define MAX_LOCKDEP_ENTRIES	24576UL
 #define MAX_LOCKDEP_CHAINS_BITS	15
 #define MAX_STACK_TRACE_ENTRIES	262144UL
 #define STACK_TRACE_HASH_SIZE	8192
 #else
-#define MAX_LOCKDEP_ENTRIES	32768UL
+#define MAX_LOCKDEP_ENTRIES	49152UL
 
 #define MAX_LOCKDEP_CHAINS_BITS	16
 
-- 
2.18.1
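
The usage figures in the tables above match the format of
/proc/lockdep_stats (shown in full later in this thread). Assuming that
is where they were collected, a minimal user-space sketch that
reproduces the Used/Max/%age columns could look like this; it is
illustrative only, not part of the patch:

/*
 * Illustrative helper, not part of the patch: print used/max/%age for
 * each bounded lockdep table, parsed from /proc/lockdep_stats lines
 * such as " direct dependencies:  15646 [max: 32768]".
 */
#include <stdio.h>

int main(void)
{
        char line[256], name[64];
        unsigned long used, max;
        FILE *f = fopen("/proc/lockdep_stats", "r");

        if (!f) {
                perror("/proc/lockdep_stats");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                /* Only the sized tables carry a "[max: N]" bound. */
                if (sscanf(line, " %63[^:]: %lu [max: %lu]",
                           name, &used, &max) == 3 && max)
                        printf("%-30s %10lu %10lu %6.1f\n",
                               name, used, max, 100.0 * used / max);
        }
        fclose(f);
        return 0;
}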



* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 17:43 [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half Waiman Long
@ 2020-05-26 18:58 ` Qian Cai
  2020-05-26 19:28   ` Qian Cai
  2020-05-26 19:56   ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Qian Cai @ 2020-05-26 18:58 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 01:43:49PM -0400, Waiman Long wrote:
> It was found by Qian Cai that a lockdep splat sometimes appears with the
> "BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
> guest running a v5.7-rc7 based kernel, I looked at how many entries in the
> various lockdep tables were being used after bootup and after a parallel
> kernel build (make -j32). The tables below show the usage statistics.

I think this approach could help the situation on this AMD server.
However, I don't understand why this particular system starts to exceed
MAX_LOCKDEP_ENTRIES. All the other Intel, powerpc, arm64 and s390 systems
running the same workload end up with only < 20k direct dependencies,
but on this AMD server:

After system boot: ~13000 (which is similar to other arches.)
After LTP MM: +2000
After LTP Syscalls: +5000
After reading all sysfs [1] (except debugfs): +10000
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/fs/read_all/read_all.c
# read_all -d /sys -q -r 10

I still don't understand why reading all sysfs files on this system
could increase the count that much, but here is the lockdep file after
running the sysfs read, in case you can spot anything obvious:

https://cailca.github.io/files/lockdep.txt

 lock-classes:                         2114 [max: 8192]
 direct dependencies:                 28568 [max: 32768]
 indirect dependencies:              126882
 all direct dependencies:            304192
 dependency chains:                   38269 [max: 65536]
 dependency chain hlocks used:       141480 [max: 327680]
 dependency chain hlocks lost:            0
 in-hardirq chains:                     166
 in-softirq chains:                     818
 in-process chains:                   37285
 stack-trace entries:                175558 [max: 524288]
 number of stack traces:               8355
 number of stack hash chains:          6545
 combined max dependencies:       804750782
 hardirq-safe locks:                     80
 hardirq-unsafe locks:                 1048
 softirq-safe locks:                    169
 softirq-unsafe locks:                  957
 irq-safe locks:                        183
 irq-unsafe locks:                     1048
 hardirq-read-safe locks:                 4
 hardirq-read-unsafe locks:             943
 softirq-read-safe locks:                12
 softirq-read-unsafe locks:             940
 irq-read-safe locks:                    12
 irq-read-unsafe locks:                 943
 uncategorized locks:                   255
 unused locks:                            0
 max locking depth:                      19
 max bfs queue depth:                  1136
 chain lookup misses:                 43195
 chain lookup hits:              3151662027
 cyclic checks:                       44009
 redundant checks:                        0
 redundant links:                         0
 find-mask forwards checks:            7726
 find-mask backwards checks:           8095
 hardirq on events:              3503795225
 hardirq off events:             3503795224
 redundant hardirq ons:              453772
 redundant hardirq offs:         1856554342
 softirq on events:                51104551
 softirq off events:               51104551
 redundant softirq ons:                   0
 redundant softirq offs:                  0
 debug_locks:                             1

 zapped classes:                       1101
 zapped lock chains:                   1117
 large chain blocks:                      1

> 
>   After bootup:
> 
>   Table               Used        Max      %age
>   -----               ----        ---      ----
>   lock_classes[]      1834       8192      22.4
>   list_entries[]     15646      32768      47.7
>   lock_chains[]      20873      65536      31.8
>   chain_hlocks[]     83199     327680      25.4
>   stack_trace[]     146177     524288      27.9
> 
>   After parallel kernel build:
> 
>   Table               Used        Max      %age
>   -----               ----        ---      ----
>   lock_classes[]      1864       8192      22.8
>   list_entries[]     17134      32768      52.3
>   lock_chains[]      25196      65536      38.4
>   chain_hlocks[]    106321     327680      32.4
>   stack_trace[]     158700     524288      30.3
> 
> Percentage-wise, it can be seen that the list_entries[] table for direct
> dependencies is used much more heavily than the other tables. So it is
> also the table most likely to run out of space when running a complex
> workload.
> 
> To reduce the likelihood of running out of table entries, we can increase
> MAX_LOCKDEP_ENTRIES by 50% from 16384/32768 to 24576/49152.  On a 64-bit
> architecture, that represents an increase in memory consumption of
> 917504 bytes (16384 additional list_entries[] elements at 56 bytes each).
> With that change, the percentage usage of list_entries[] after bootup and
> after the kernel build will fall to 31.8% and 34.9% respectively, more in
> line with the other tables.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/locking/lockdep_internals.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
> index baca699b94e9..6108d2fbe775 100644
> --- a/kernel/locking/lockdep_internals.h
> +++ b/kernel/locking/lockdep_internals.h
> @@ -89,12 +89,12 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
>   * table (if it's not there yet), and we check it for lock order
>   * conflicts and deadlocks.
>   */
> -#define MAX_LOCKDEP_ENTRIES	16384UL
> +#define MAX_LOCKDEP_ENTRIES	24576UL
>  #define MAX_LOCKDEP_CHAINS_BITS	15
>  #define MAX_STACK_TRACE_ENTRIES	262144UL
>  #define STACK_TRACE_HASH_SIZE	8192
>  #else
> -#define MAX_LOCKDEP_ENTRIES	32768UL
> +#define MAX_LOCKDEP_ENTRIES	49152UL
>  
>  #define MAX_LOCKDEP_CHAINS_BITS	16
>  
> -- 
> 2.18.1
> 


* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 18:58 ` Qian Cai
@ 2020-05-26 19:28   ` Qian Cai
  2020-05-26 19:56   ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Qian Cai @ 2020-05-26 19:28 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 02:58:53PM -0400, Qian Cai wrote:
> On Tue, May 26, 2020 at 01:43:49PM -0400, Waiman Long wrote:
> > It was found by Qian Cai that a lockdep splat sometimes appears with the
> > "BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
> > guest running a v5.7-rc7 based kernel, I looked at how many entries in the
> > various lockdep tables were being used after bootup and after a parallel
> > kernel build (make -j32). The tables below show the usage statistics.
> 
> I think this approach could help the situation on this AMD server.
> However, I don't understand why this particular system starts to exceed
> MAX_LOCKDEP_ENTRIES. All the other Intel, powerpc, arm64 and s390 systems
> running the same workload end up with only < 20k direct dependencies,
> but on this AMD server:

Correction -- Intel is also quite close to exceeding it.

direct dependencies:                 29836 [max: 32768]

arm64:

direct dependencies:                 25268 [max: 32768]



* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 18:58 ` Qian Cai
  2020-05-26 19:28   ` Qian Cai
@ 2020-05-26 19:56   ` Peter Zijlstra
  2020-05-26 20:17     ` Qian Cai
  2020-05-26 20:30     ` Waiman Long
  1 sibling, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2020-05-26 19:56 UTC (permalink / raw)
  To: Qian Cai; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:

> I still don't understand why reading all sysfs files on this system
> could increase the count that much, but here is the lockdep file after
> running the sysfs read, in case you can spot anything obvious:
> 
> https://cailca.github.io/files/lockdep.txt

00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834

is that somewhere near the number of CPUs you have?

Anyway, there's very long "kn->active#..." chains in there, which seems
to suggest some annotation is all sorts of buggered.


* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 19:56   ` Peter Zijlstra
@ 2020-05-26 20:17     ` Qian Cai
  2020-05-26 20:30     ` Waiman Long
  1 sibling, 0 replies; 9+ messages in thread
From: Qian Cai @ 2020-05-26 20:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 09:56:30PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
> 
> > I still don't understand why reading all sysfs files on this system
> > could increase the count that much, but here is the lockdep file after
> > running the sysfs read, in case you can spot anything obvious:
> > 
> > https://cailca.github.io/files/lockdep.txt
> 
> 00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834
> 
> is that somewhere near the number of CPUs you have?

834? No.

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7601 32-Core Processor
Stepping:            2
CPU MHz:             2932.801
CPU max MHz:         2200.0000
CPU min MHz:         1200.0000
BogoMIPS:            4391.87
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7,64-71
NUMA node1 CPU(s):   8-15,72-79
NUMA node2 CPU(s):   16-23,80-87
NUMA node3 CPU(s):   24-31,88-95
NUMA node4 CPU(s):   32-39,96-103
NUMA node5 CPU(s):   40-47,104-111
NUMA node6 CPU(s):   48-55,112-119
NUMA node7 CPU(s):   56-63,120-127

> 
> Anyway, there's very long "kn->active#..." chains in there, which seems
> to suggest some annotation is all sorts of buggered.


* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 19:56   ` Peter Zijlstra
  2020-05-26 20:17     ` Qian Cai
@ 2020-05-26 20:30     ` Waiman Long
  2020-05-26 21:24       ` Qian Cai
  2020-05-26 21:27       ` Peter Zijlstra
  1 sibling, 2 replies; 9+ messages in thread
From: Waiman Long @ 2020-05-26 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Qian Cai; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
>
>> I still don't understand why reading all sysfs files on this system
>> could increase the count that much, but here is the lockdep file after
>> running the sysfs read, in case you can spot anything obvious:
>>
>> https://cailca.github.io/files/lockdep.txt
> 00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834
>
> is that somewhere near the number of CPUs you have?
>
> Anyway, there's very long "kn->active#..." chains in there, which seems
> to suggest some annotation is all sorts of buggered.
>
It is actually one active lock per instance of the kernfs_node
structure. That means more than 800 sysfs files were accessed in some
way. As we can have far more than 800 sysfs files in the system, we
could easily overwhelm the lockdep tables if we really try to access
all of them.

Cheers,
Longman
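
(For reference, the annotation being discussed works roughly as below;
this is a simplified sketch paraphrased from fs/kernfs/file.c and
fs/kernfs/dir.c around v5.7, not a verbatim copy. Each kernfs_node
created with a lockdep key gets its own dep_map named "kn->active",
and taking an active reference registers as a read-acquire on it:)

/* At file creation time, inside __kernfs_create_file() (simplified): */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        if (key) {
                lockdep_init_map(&kn->dep_map, "kn->active", key, 0);
                kn->flags |= KERNFS_LOCKDEP;
        }
#endif

/* Taking an active reference then shows up to lockdep as a
 * read-acquire of that per-key class (simplified): */
struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
{
        if (unlikely(!kn))
                return NULL;
        if (!atomic_inc_unless_negative(&kn->active))
                return NULL;
        if (kernfs_lockdep(kn))
                rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
        return kn;
}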



* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 20:30     ` Waiman Long
@ 2020-05-26 21:24       ` Qian Cai
  2020-05-26 21:27       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Qian Cai @ 2020-05-26 21:24 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> > On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
> > 
> > > I still don't understand why reading all sysfs files on this system
> > > could increase the count that much, but here is the lockdep file after
> > > running the sysfs read, in case you can spot anything obvious:
> > > 
> > > https://cailca.github.io/files/lockdep.txt
> > 00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834
> > 
> > is that somewhere near the number of CPUs you have?
> > 
> > Anyway, there's very long "kn->active#..." chains in there, which seems
> > to suggest some annotation is all sorts of buggered.
> > 
> It is actually one active lock per instance of the kernfs_node structure.
> That means more than 800 sysfs files were accessed in some way. As we can
> have far more than 800 sysfs files in the system, we could easily
> overwhelm the lockdep tables if we really try to access all of them.

Yes, there are a lot of those on large systems: NUMA, percpu, slab, etc.
Isn't it better to extend MAX_LOCKDEP_ENTRIES dynamically? There is
plenty of memory out there.
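
(For context on why that is not trivial: the table is a compile-time
static array in kernel/locking/lockdep.c, roughly as sketched below,
and lockdep must work from very early boot and from contexts where
calling into the allocator is unsafe, so resizing it at runtime would
take real surgery:)

/* Sketch of the fixed-size tables in kernel/locking/lockdep.c: */
static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
static DECLARE_BITMAP(list_entries_in_use, MAX_LOCKDEP_ENTRIES);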


* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 20:30     ` Waiman Long
  2020-05-26 21:24       ` Qian Cai
@ 2020-05-26 21:27       ` Peter Zijlstra
  2020-05-26 22:27         ` Waiman Long
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2020-05-26 21:27 UTC (permalink / raw)
  To: Waiman Long; +Cc: Qian Cai, Ingo Molnar, Will Deacon, linux-kernel

On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> > On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
> > 
> > > I still don't understand why reading all sysfs files on this system
> > > could increase the count that much, but here is the lockdep file after
> > > running the sysfs read, in case you can spot anything obvious:
> > > 
> > > https://cailca.github.io/files/lockdep.txt
> > 00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834
> > 
> > is that somewhere near the number of CPUs you have?
> > 
> > Anyway, there's very long "kn->active#..." chains in there, which seems
> > to suggest some annotation is all sorts of buggered.
> > 
> It is actually one active lock per instance of the kernfs_node structure.
> That means more than 800 sysfs files were accessed in some way. As we can
> have far more than 800 sysfs files in the system, we could easily
> overwhelm the lockdep tables if we really try to access all of them.

A lock per instance is crazy, that's not what lockdep is made for.
Fixing this seems like a far better idea than increasing the numbers.


* Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half
  2020-05-26 21:27       ` Peter Zijlstra
@ 2020-05-26 22:27         ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2020-05-26 22:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Qian Cai, Ingo Molnar, Will Deacon, linux-kernel

On 5/26/20 5:27 PM, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
>> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
>>> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
>>>
>>>> I still don't understand why reading all sysfs files on this system
>>>> could increase the count that much, but here is the lockdep file after
>>>> running the sysfs read, in case you can spot anything obvious:
>>>>
>>>> https://cailca.github.io/files/lockdep.txt
>>> 00000000f011a2a5 OPS:      20 FD:   45 BD:    1 .+.+: kn->active#834
>>>
>>> is that somewhere near the number of CPUs you have?
>>>
>>> Anyway, there's very long "kn->active#..." chains in there, which seems
>>> to suggest some annotation is all sorts of buggered.
>>>
>> It is actually one active lock per instance of the kernfs_node structure.
>> That means more than 800 sysfs files were accessed in some way. As we can
>> have far more than 800 sysfs files in the system, we could easily
>> overwhelm the lockdep tables if we really try to access all of them.
> A lock per instance is crazy, that's not what lockdep is made for.
> Fixing this seems like a far better idea than increasing the numbers.

Thinking about it, one lock per sysfs file does make sense, as accesses
to those sysfs files can invoke vastly different code paths, and
unifying them into one single lock class can easily lead to false
positive warnings.

Cheers,
Longman
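
(For completeness, the per-file keying comes from each attribute
carrying its own static lock_class_key: a lightly paraphrased sketch of
sysfs_attr_init() from include/linux/sysfs.h, under
CONFIG_DEBUG_LOCK_ALLOC, is below. Every distinct call site yields a
distinct key, and hence a distinct "kn->active" lock class:)

#define sysfs_attr_init(attr)                           \
do {                                                    \
        static struct lock_class_key __key;            \
                                                        \
        (attr)->key = &__key;                           \
} while (0)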



Thread overview: 9 messages
2020-05-26 17:43 [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half Waiman Long
2020-05-26 18:58 ` Qian Cai
2020-05-26 19:28   ` Qian Cai
2020-05-26 19:56   ` Peter Zijlstra
2020-05-26 20:17     ` Qian Cai
2020-05-26 20:30     ` Waiman Long
2020-05-26 21:24       ` Qian Cai
2020-05-26 21:27       ` Peter Zijlstra
2020-05-26 22:27         ` Waiman Long
