linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Byungchul Park <byungchul.park@lge.com>,
	torvalds@linux-foundation.org, peterz@infradead.org,
	mingo@redhat.com, will@kernel.org, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, joel@joelfernandes.org,
	alexander.levin@microsoft.com, daniel.vetter@ffwll.ch,
	chris@chris-wilson.co.uk, duyuyang@gmail.com,
	johannes.berg@intel.com, tj@kernel.org, tytso@mit.edu,
	willy@infradead.org, david@fromorbit.com, amir73il@gmail.com,
	bfields@fieldses.org, gregkh@linuxfoundation.org,
	kernel-team@lge.com
Subject: Re: [RFC] Are you good with Lockdep?
Date: Wed, 11 Nov 2020 09:36:09 -0500	[thread overview]
Message-ID: <20201111093609.1bd2b637@gandalf.local.home> (raw)
In-Reply-To: <20201111105441.GA78848@gmail.com>

On Wed, 11 Nov 2020 11:54:41 +0100
Ingo Molnar <mingo@kernel.org> wrote:

> * Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > PROBLEM 1) First of all, Lockdep gets disabled on the first detection.  
> 
> Lockdep disabling itself after the first warning was an intentional 
> and deliberate design decision. (See more details below.)
> 

[..]

> Making the code simpler is always welcome, but I disagree with 
> enabling multiple warnings, for the technical reasons outlined above.

I 100% agree with Ingo. I wish we could stop *all* warnings after the first
one. The number of times people sent me bug reports with warnings without
showing me previous warnings that caused me to go on wild goose chases is
too many to count. The first warning is the *only* thing I look at.
Anything after that is likely to be caused by the integrity of the system
being compromised by the first bug.

And this is especially true with lockdep, because lockdep only detects the
deadlock, it doesn't tell you which lock was the incorrect locking.

For example. If we have a locking chain of:

 A -> B -> D

 A -> C -> D

Which on a correct system looks like this:

 lock(A)
 lock(B)
 unlock(B)
 unlock(A)

 lock(B)
 lock(D)
 unlock(D)
 unlock(B)

 lock(A)
 lock(C)
 unlock(C)
 unlock(A)

 lock(C)
 lock(D)
 unlock(D)
 unlock(C)

which creates the above chains in that order.

But, lets say we have a bug and the system boots up doing:

 lock(D)
 lock(A)
 unlock(A)
 unlock(D)

which creates the incorrect chain.

 D -> A


Now you do the correct locking:

 lock(A)
 lock(B)

Creates A -> B

 lock(A)
 lock(C)

Creates A -> C

 lock(B)
 lock(D)

Creates B -> D and lockdep detects:

 D -> A -> B -> D

and gives us the lockdep splat!!!

But we don't disable lockdep. We let it continue...

 lock(C)
 lock(D)

Which creates C -> D

Now it explodes with D -> A -> C -> D

Which it already reported. And it can be much more complex when dealing
with interrupt contexts and longer chains. That is, perhaps a different
chain had a missing irq disable, now you might get 5 or 6 more lockdep
splats because of that one bug.

The point I'm making is that the lockdep splats after the first one may
just be another version of the same bug and not a new one. Worse, if you
only look at the later lockdep splats, it may be much more difficult to
find the original bug than if you just had the first one. Believe me, I've
been down that road too many times!

And it can be very difficult to know if new lockdep splats are not the same
bug, and this will waste a lot of developers time!

This is why the decision to disable lockdep after the first splat was made.
There were times I wanted to check locking somewhere, but is was using
linux-next which had a lockdep splat that I didn't care about. So I
made it not disable lockdep. And then I hit this exact scenario, that the
one incorrect chain was causing reports all over the place. To solve it, I
had to patch the incorrect chain to do raw locking to have lockdep ignore
it ;-) Then I was able to test the code I was interested in.

> 
> > PROBLEM 2) Lockdep forces us to emulate lock acquisition for non-lock.  
> 
> >    I have the patch set. Let me share it with you in a few days.  
> 
> Not sure I understand the "problem 2)" outlined here, but I'm looking 
> forward to your patchset!
> 

I think I understand it. For things like completions and other "wait for
events" we have lockdep annotation, but it is rather awkward to implement.
Having something that says "lockdep_wait_event()" and
"lockdep_exec_event()" wrappers would be useful.

-- Steve

  reply	other threads:[~2020-11-11 14:36 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-11  5:05 [RFC] Are you good with Lockdep? Byungchul Park
2020-11-11 10:54 ` Ingo Molnar
2020-11-11 14:36   ` Steven Rostedt [this message]
2020-11-11 23:16     ` Thomas Gleixner
2020-11-12  8:10       ` Byungchul Park
2020-11-12 14:26         ` Steven Rostedt
2020-11-12 14:52           ` Matthew Wilcox
2020-11-16  8:57             ` Byungchul Park
2020-11-16 15:37               ` Matthew Wilcox
2020-11-18  1:45                 ` Boqun Feng
2020-11-18  3:30                   ` Matthew Wilcox
2020-11-23 13:15                 ` Byungchul Park
2020-11-12 14:58           ` Byungchul Park
2020-11-16  9:05             ` Byungchul Park
2020-11-23 10:45               ` Byungchul Park
2020-11-12 10:32     ` Byungchul Park
2020-11-12 13:56       ` Daniel Vetter
2020-11-16  8:45         ` Byungchul Park
2020-11-12  6:15   ` Byungchul Park
2020-11-12  8:51     ` Byungchul Park
2020-11-12  9:46       ` Byungchul Park
2020-11-23 11:05 ` [RFC] Dept(Dependency Tracker) Implementation Byungchul Park
2020-11-23 11:36   ` [RFC 1/6] dept: Implement Dept(Dependency Tracker) Byungchul Park
2020-11-23 11:36     ` [RFC 2/6] dept: Apply Dept to spinlock Byungchul Park
2020-11-23 11:36     ` [RFC 3/6] dept: Apply Dept to mutex families Byungchul Park
2020-11-23 11:36     ` [RFC 4/6] dept: Apply Dept to rwlock Byungchul Park
2020-11-23 11:36     ` [RFC 5/6] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2020-11-23 11:36     ` [RFC 6/6] dept: Assign custom dept_keys or disable to avoid false positives Byungchul Park
2020-11-23 12:29   ` [RFC] Dept(Dependency Tracker) Implementation Byungchul Park
2020-11-23 11:13 ` [RFC] Dept(Dependency Tracker) Report Example Byungchul Park
2020-11-23 12:14   ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201111093609.1bd2b637@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=alexander.levin@microsoft.com \
    --cc=amir73il@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=byungchul.park@lge.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel.vetter@ffwll.ch \
    --cc=david@fromorbit.com \
    --cc=duyuyang@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=joel@joelfernandes.org \
    --cc=johannes.berg@intel.com \
    --cc=kernel-team@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).