linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] debugobjects: Fix potential hard lockup by disabling lockdep
@ 2018-09-25 14:41 Waiman Long
  2018-09-25 14:41 ` [PATCH v2 1/2] locking/lockdep: Don't warn class/lock name mismatch for novalidate class Waiman Long
  2018-09-25 14:41 ` [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks Waiman Long
  0 siblings, 2 replies; 8+ messages in thread
From: Waiman Long @ 2018-09-25 14:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Will Deacon
  Cc: linux-kernel, Yang Shi, Arnd Bergmann, chuhu, Waiman Long

 v2:
  - Add a lockdep patch to fix a dmesg warning message due to the use
    of the novalidate class.

The only locking used by the debugobjects code is just 2 sets of raw
spinlocks for synchronization purpose. When lockdep is enabled, the
locking operation itself will become much more expensive especially
if a number of locks have already been acquired previously. In some
extreme cases, it may lead to hard lockup.

As there isn't much value in debugging the debugobjects internal locks,
lockdep checking is now disabled for those internal locks. That will
speed up system operation without compromising the lock checking
operation of production code.

Waiman Long (2):
  locking/lockdep: Don't warn class/lock name mismatch for novalidate
    class
  debugobjects: Disable lockdep tracking of debugobjects internal locks

 kernel/locking/lockdep.c | 7 +++++--
 lib/debugobjects.c       | 9 ++++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] locking/lockdep: Don't warn class/lock name mismatch for novalidate class
  2018-09-25 14:41 [PATCH v2 0/2] debugobjects: Fix potential hard lockup by disabling lockdep Waiman Long
@ 2018-09-25 14:41 ` Waiman Long
  2018-09-25 14:41 ` [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks Waiman Long
  1 sibling, 0 replies; 8+ messages in thread
From: Waiman Long @ 2018-09-25 14:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Will Deacon
  Cc: linux-kernel, Yang Shi, Arnd Bergmann, chuhu, Waiman Long

For the special novalidate class (lockdep_set_novalidate_class), multiple
locks with different names may use the same class. The WARN_ON_ONCE()
check in look_up_lock_class() will then report an unnecessary warning.

The current users of lockdep_set_novalidate_class() includes the
device mutex, one mutex and one rwsem in drivers/md/bcache/btree.c.
So the novalidate class is now excluded from the check.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index dd13f865ad40..10babc35953b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -693,9 +693,12 @@ look_up_lock_class(const struct lockdep_map *lock, unsigned int subclass)
 		if (class->key == key) {
 			/*
 			 * Huh! same key, different name? Did someone trample
-			 * on some memory? We're most confused.
+			 * on some memory? We're most confused unless it is
+			 * __lockdep_no_validate__ where different locks can
+			 * use the same class.
 			 */
-			WARN_ON_ONCE(class->name != lock->name);
+			WARN_ON_ONCE((key != __lockdep_no_validate__.subkeys) &&
+				     (class->name != lock->name));
 			return class;
 		}
 	}
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 14:41 [PATCH v2 0/2] debugobjects: Fix potential hard lockup by disabling lockdep Waiman Long
  2018-09-25 14:41 ` [PATCH v2 1/2] locking/lockdep: Don't warn class/lock name mismatch for novalidate class Waiman Long
@ 2018-09-25 14:41 ` Waiman Long
  2018-09-25 15:32   ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Waiman Long @ 2018-09-25 14:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Will Deacon
  Cc: linux-kernel, Yang Shi, Arnd Bergmann, chuhu, Waiman Long

It was discovered that running the ltp openposix_testsuite sigqueue-09-1
test on a certain 8-sock IvyBridge system caused it to have a hard lockup
with a full debug kernel.

 [89981.861500] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
   :
 [89981.939812] irq event stamp: 1166122
 [89981.943799] hardirqs last  enabled at (1166121): [<ffffffffb1048342>] kprobe_ftrace_handler+0x52/0x170
 [89981.954215] hardirqs last disabled at (1166122): [<ffffffffb08a3725>] tasklist_write_lock_irq+0x15/0x50
   :
 [89982.103134]  [<ffffffffb093a399>] lock_acquire+0x99/0x1e0
 [89982.109163]  [<ffffffffb0c0d19b>] ? debug_check_no_obj_freed+0xfb/0x270
 [89982.116562]  [<ffffffffb1042e1e>] _raw_spin_lock_irqsave+0x5e/0xa0
 [89982.123470]  [<ffffffffb0c0d19b>] ? debug_check_no_obj_freed+0xfb/0x270
 [89982.130851]  [<ffffffffb0c0d19b>] debug_check_no_obj_freed+0xfb/0x270
 [89982.138045]  [<ffffffffb08bf9f3>] ? __sigqueue_free.part.11+0x33/0x40
 [89982.145239]  [<ffffffffb0a6868a>] kmem_cache_free+0xca/0x390
 [89982.151553]  [<ffffffffb08bf9f3>] __sigqueue_free.part.11+0x33/0x40
 [89982.158552]  [<ffffffffb08c07b0>] flush_sigqueue+0x50/0x60
 [89982.164673]  [<ffffffffb08ad102>] release_task+0x3e2/0x6d0
   :

IRQ was disabled by the tasklist_write_lock_irq() call in
release_task(). The lockup is probably caused by the debug code running
for too long. We certainly want the debug code to verify the correctness
of the production code. However, there isn't too much value for one
piece of the debug code to verify the correctness of another piece of
debug code. In this particular case, the lockdep code is verifying the
correctness of the raw debug bucket lock within the debugobjects code.

The use of spin locks in the debugobjects code for synchronization is
pretty standard and looks to be correct to me.  This patch disables
the checking of the debugobjects internal locks by lockdep. In fact,
with this change, the hard lockup was not observed anymore.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 70935ed91125..68d72ed9ca22 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -1106,8 +1106,15 @@ void __init debug_objects_early_init(void)
 {
 	int i;
 
-	for (i = 0; i < ODEBUG_HASH_SIZE; i++)
+	/*
+	 * We don't need lockdep to verify correctness of debugobjects
+	 * internal locks.
+	 */
+	lockdep_set_novalidate_class(&pool_lock);
+	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
 		raw_spin_lock_init(&obj_hash[i].lock);
+		lockdep_set_novalidate_class(&obj_hash[i].lock);
+	}
 
 	for (i = 0; i < ODEBUG_POOL_SIZE; i++)
 		hlist_add_head(&obj_static_pool[i].node, &obj_pool);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 14:41 ` [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks Waiman Long
@ 2018-09-25 15:32   ` Peter Zijlstra
  2018-09-25 16:20     ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-09-25 15:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, linux-kernel,
	Yang Shi, Arnd Bergmann, chuhu

On Tue, Sep 25, 2018 at 10:41:09AM -0400, Waiman Long wrote:
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 70935ed91125..68d72ed9ca22 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -1106,8 +1106,15 @@ void __init debug_objects_early_init(void)
>  {
>  	int i;
>  
> -	for (i = 0; i < ODEBUG_HASH_SIZE; i++)
> +	/*
> +	 * We don't need lockdep to verify correctness of debugobjects
> +	 * internal locks.
> +	 */
> +	lockdep_set_novalidate_class(&pool_lock);
> +	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
>  		raw_spin_lock_init(&obj_hash[i].lock);
> +		lockdep_set_novalidate_class(&obj_hash[i].lock);
> +	}
>  
>  	for (i = 0; i < ODEBUG_POOL_SIZE; i++)
>  		hlist_add_head(&obj_static_pool[i].node, &obj_pool);

NAK, we do not _EVER_ set novalidate if it can at all be avoided.

If there is a severe performance problem with lockdep, try and cure
that. But really, who runs lockdep kernels on 8 sockets?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 15:32   ` Peter Zijlstra
@ 2018-09-25 16:20     ` Waiman Long
  2018-09-25 16:31       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2018-09-25 16:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, linux-kernel,
	Yang Shi, Arnd Bergmann, chuhu

On 09/25/2018 11:32 AM, Peter Zijlstra wrote:
> On Tue, Sep 25, 2018 at 10:41:09AM -0400, Waiman Long wrote:
>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>> index 70935ed91125..68d72ed9ca22 100644
>> --- a/lib/debugobjects.c
>> +++ b/lib/debugobjects.c
>> @@ -1106,8 +1106,15 @@ void __init debug_objects_early_init(void)
>>  {
>>  	int i;
>>  
>> -	for (i = 0; i < ODEBUG_HASH_SIZE; i++)
>> +	/*
>> +	 * We don't need lockdep to verify correctness of debugobjects
>> +	 * internal locks.
>> +	 */
>> +	lockdep_set_novalidate_class(&pool_lock);
>> +	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
>>  		raw_spin_lock_init(&obj_hash[i].lock);
>> +		lockdep_set_novalidate_class(&obj_hash[i].lock);
>> +	}
>>  
>>  	for (i = 0; i < ODEBUG_POOL_SIZE; i++)
>>  		hlist_add_head(&obj_static_pool[i].node, &obj_pool);
> NAK, we do not _EVER_ set novalidate if it can at all be avoided.
>
> If there is a severe performance problem with lockdep, try and cure
> that. But really, who runs lockdep kernels on 8 sockets?

We do. It is part of our testing process to run both production and
debug kernels on a variety of different machines to see if anything
breaks. Some of them just happen to be 8-socket systems.

The internal locks in the debugobjects code don't interact with other
locks at all as memory allocation isn't called with those lock held. So
disabling lockdep for those locks won't materially affect the accuracy
of the lockdep code.

How about the ability to declare a class of locks as terminal in the
sense that no further lock acquisition is allowed while holding a
terminal lock? That will allow the the lockdep code to fast track the
handling of those locks and hopefully prevent hard lockup problem like that.

Cheers,
Longman





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 16:20     ` Waiman Long
@ 2018-09-25 16:31       ` Peter Zijlstra
  2018-09-25 16:36         ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-09-25 16:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, linux-kernel,
	Yang Shi, Arnd Bergmann, chuhu

On Tue, Sep 25, 2018 at 12:20:05PM -0400, Waiman Long wrote:
> On 09/25/2018 11:32 AM, Peter Zijlstra wrote:
> > On Tue, Sep 25, 2018 at 10:41:09AM -0400, Waiman Long wrote:
> >> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> >> index 70935ed91125..68d72ed9ca22 100644
> >> --- a/lib/debugobjects.c
> >> +++ b/lib/debugobjects.c
> >> @@ -1106,8 +1106,15 @@ void __init debug_objects_early_init(void)
> >>  {
> >>  	int i;
> >>  
> >> -	for (i = 0; i < ODEBUG_HASH_SIZE; i++)
> >> +	/*
> >> +	 * We don't need lockdep to verify correctness of debugobjects
> >> +	 * internal locks.
> >> +	 */
> >> +	lockdep_set_novalidate_class(&pool_lock);
> >> +	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
> >>  		raw_spin_lock_init(&obj_hash[i].lock);
> >> +		lockdep_set_novalidate_class(&obj_hash[i].lock);
> >> +	}
> >>  
> >>  	for (i = 0; i < ODEBUG_POOL_SIZE; i++)
> >>  		hlist_add_head(&obj_static_pool[i].node, &obj_pool);
> > NAK, we do not _EVER_ set novalidate if it can at all be avoided.
> >
> > If there is a severe performance problem with lockdep, try and cure
> > that. But really, who runs lockdep kernels on 8 sockets?
> 
> We do. It is part of our testing process to run both production and
> debug kernels on a variety of different machines to see if anything
> breaks. Some of them just happen to be 8-socket systems.
> 
> The internal locks in the debugobjects code don't interact with other
> locks at all as memory allocation isn't called with those lock held. So
> disabling lockdep for those locks won't materially affect the accuracy
> of the lockdep code.
> 
> How about the ability to declare a class of locks as terminal in the
> sense that no further lock acquisition is allowed while holding a
> terminal lock? That will allow the the lockdep code to fast track the
> handling of those locks and hopefully prevent hard lockup problem like that.

Who guarantees they stay leaf locks? What if someone mucks up the
debugobject locking because its Monday morning and they haven't had
their coffee yet. You want lockdep to catch all that.

Also, where is the performance benefit here. The normal lock_acquire
path doesn't change gobal state (it will not see new lock ordering in
99%+ of the cases).

Typically we only push the lock on the task local lock stack, compute
the new hash, and do the hash lookup, find it already exists and return.

So what is the expensive part, and can we do something about that?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 16:31       ` Peter Zijlstra
@ 2018-09-25 16:36         ` Waiman Long
  2018-09-28 18:33           ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2018-09-25 16:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, linux-kernel,
	Yang Shi, Arnd Bergmann, chuhu

On 09/25/2018 12:31 PM, Peter Zijlstra wrote:
> On Tue, Sep 25, 2018 at 12:20:05PM -0400, Waiman Long wrote:
>> On 09/25/2018 11:32 AM, Peter Zijlstra wrote:
>>> On Tue, Sep 25, 2018 at 10:41:09AM -0400, Waiman Long wrote:
>>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>>> index 70935ed91125..68d72ed9ca22 100644
>>>> --- a/lib/debugobjects.c
>>>> +++ b/lib/debugobjects.c
>>>> @@ -1106,8 +1106,15 @@ void __init debug_objects_early_init(void)
>>>>  {
>>>>  	int i;
>>>>  
>>>> -	for (i = 0; i < ODEBUG_HASH_SIZE; i++)
>>>> +	/*
>>>> +	 * We don't need lockdep to verify correctness of debugobjects
>>>> +	 * internal locks.
>>>> +	 */
>>>> +	lockdep_set_novalidate_class(&pool_lock);
>>>> +	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
>>>>  		raw_spin_lock_init(&obj_hash[i].lock);
>>>> +		lockdep_set_novalidate_class(&obj_hash[i].lock);
>>>> +	}
>>>>  
>>>>  	for (i = 0; i < ODEBUG_POOL_SIZE; i++)
>>>>  		hlist_add_head(&obj_static_pool[i].node, &obj_pool);
>>> NAK, we do not _EVER_ set novalidate if it can at all be avoided.
>>>
>>> If there is a severe performance problem with lockdep, try and cure
>>> that. But really, who runs lockdep kernels on 8 sockets?
>> We do. It is part of our testing process to run both production and
>> debug kernels on a variety of different machines to see if anything
>> breaks. Some of them just happen to be 8-socket systems.
>>
>> The internal locks in the debugobjects code don't interact with other
>> locks at all as memory allocation isn't called with those lock held. So
>> disabling lockdep for those locks won't materially affect the accuracy
>> of the lockdep code.
>>
>> How about the ability to declare a class of locks as terminal in the
>> sense that no further lock acquisition is allowed while holding a
>> terminal lock? That will allow the the lockdep code to fast track the
>> handling of those locks and hopefully prevent hard lockup problem like that.
> Who guarantees they stay leaf locks? What if someone mucks up the
> debugobject locking because its Monday morning and they haven't had
> their coffee yet. You want lockdep to catch all that.

Lockdep will need to do checking to guarantee that.

> Also, where is the performance benefit here. The normal lock_acquire
> path doesn't change gobal state (it will not see new lock ordering in
> 99%+ of the cases).
>
> Typically we only push the lock on the task local lock stack, compute
> the new hash, and do the hash lookup, find it already exists and return.
>
> So what is the expensive part, and can we do something about that?

I will need to do some experimentation to find out. Will update this
thread with my finding.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks
  2018-09-25 16:36         ` Waiman Long
@ 2018-09-28 18:33           ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2018-09-28 18:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Will Deacon, linux-kernel,
	Yang Shi, Arnd Bergmann, chuhu

On 09/25/2018 12:36 PM, Waiman Long wrote:
> On 09/25/2018 12:31 PM, Peter Zijlstra wrote:
>> Also, where is the performance benefit here. The normal lock_acquire
>> path doesn't change gobal state (it will not see new lock ordering in
>> 99%+ of the cases).
>>
>> Typically we only push the lock on the task local lock stack, compute
>> the new hash, and do the hash lookup, find it already exists and return.
>>
>> So what is the expensive part, and can we do something about that?
> I will need to do some experimentation to find out. Will update this
> thread with my finding.
>

I have sent out a patchset on improving lockdep performance that
hopefully will help to alleviate the problems that I saw. I still have
other ideas about further changes I want to investigate, but that will
wait until I am done with the current lockdep patchset.

-Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-28 18:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-25 14:41 [PATCH v2 0/2] debugobjects: Fix potential hard lockup by disabling lockdep Waiman Long
2018-09-25 14:41 ` [PATCH v2 1/2] locking/lockdep: Don't warn class/lock name mismatch for novalidate class Waiman Long
2018-09-25 14:41 ` [PATCH v2 2/2] debugobjects: Disable lockdep tracking of debugobjects internal locks Waiman Long
2018-09-25 15:32   ` Peter Zijlstra
2018-09-25 16:20     ` Waiman Long
2018-09-25 16:31       ` Peter Zijlstra
2018-09-25 16:36         ` Waiman Long
2018-09-28 18:33           ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).