linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org,
	tip-bot2 for Peter Zijlstra <tip-bot2@linutronix.de>,
	Qian Cai <cai@redhat.com>, x86 <x86@kernel.org>
Subject: Re: [tip: locking/core] lockdep: Fix usage_traceoverflow
Date: Tue, 27 Oct 2020 13:29:10 +0000	[thread overview]
Message-ID: <160380535006.10461.1259632375207276085@build.alporthouse.com> (raw)
In-Reply-To: <20201027123056.GE2651@hirez.programming.kicks-ass.net>

Quoting Peter Zijlstra (2020-10-27 12:30:56)
> On Tue, Oct 27, 2020 at 12:59:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Oct 27, 2020 at 11:29:35AM +0000, Chris Wilson wrote:
> > > Quoting tip-bot2 for Peter Zijlstra (2020-10-07 17:20:13)
> > > > The following commit has been merged into the locking/core branch of tip:
> > > > 
> > > > Commit-ID:     24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Gitweb:        https://git.kernel.org/tip/24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Author:        Peter Zijlstra <peterz@infradead.org>
> > > > AuthorDate:    Wed, 30 Sep 2020 11:49:37 +02:00
> > > > Committer:     Peter Zijlstra <peterz@infradead.org>
> > > > CommitterDate: Wed, 07 Oct 2020 18:14:17 +02:00
> > > > 
> > > > lockdep: Fix usage_traceoverflow
> > > > 
> > > > Basically print_lock_class_header()'s for loop is out of sync with the
> > > > the size of of ->usage_traces[].
> > > 
> > > We're hitting a problem,
> > > 
> > >     $ cat /proc/lockdep_stats
> > > 
> > > upon boot generates:
> > > 
> > > [   29.465702] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != nr_unused)
> > > [   29.465716] WARNING: CPU: 0 PID: 488 at kernel/locking/lockdep_proc.c:256 lockdep_stats_show+0xa33/0xac0
> > > 
> > > that bisected to this patch. Only just completed the bisection and
> > > thought you would like a heads up.
> > 
> > Oh hey, that's 'curious'... it does indeed trivially reproduce, let me
> > have a poke.
> 
> This seems to make it happy. Not quite sure that's the best solution.

Finished the first round of testing on this patch (will try the second
in a second). It solves the nr_unused_locks issue, but we find something
else:

<4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 547d92e9ec2ab9af
<4> [304.908897] WARNING: CPU: 0 PID: 5658 at kernel/locking/lockdep.c:3679 check_chain_key+0x1a4/0x1f0
<4> [304.908898] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel btusb snd_intel_dspcfg btrtl snd_hda_codec btbcm btintel ghash_clmulni_intel snd_hwdep bluetooth snd_hda_core e1000e snd_pcm cdc_ether ptp usbnet mei_me mii pps_core mei ecdh_generic ecc intel_lpss_pci prime_numbers
<4> [304.908920] CPU: 0 PID: 5658 Comm: kms_psr Not tainted 5.10.0-rc1-CI-Trybot_7174+ #1
<4> [304.908922] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [304.908923] RIP: 0010:check_chain_key+0x1a4/0x1f0
<4> [304.908925] Code: a5 d8 08 00 00 74 e7 e8 7a eb 96 00 8b b5 e0 08 00 00 4c 89 e1 89 da 4c 8b 85 d8 08 00 00 48 c7 c7 d0 8f 30 82 e8 5f 2c 92 00 <0f> 0b 5b 5d 41 5c 41 5d c3 49 89 d5 49 c7 c4 ff ff ff ff 31 db e8
<4> [304.908926] RSP: 0018:ffffc90000ba7af0 EFLAGS: 00010086
<4> [304.908928] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000002
<4> [304.908929] RDX: 0000000080000002 RSI: ffffffff82348c47 RDI: 00000000ffffffff
<4> [304.908930] RBP: ffff88812d7dc040 R08: 0000000000000000 R09: c000000100002c92
<4> [304.908931] R10: 00000000003b5380 R11: ffffc90000ba7900 R12: 3425cfea6ff31f7f
<4> [304.908931] R13: ffff88812d7dc9f0 R14: 0000000000000003 R15: ffff88812d7dc9f0
<4> [304.908933] FS:  00007f51722bb300(0000) GS:ffff88849fa00000(0000) knlGS:0000000000000000
<4> [304.908934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [304.908935] CR2: 00007ffd197adff0 CR3: 000000011d9ee004 CR4: 0000000000770ef0
<4> [304.908935] PKRU: 55555554
<4> [304.908936] Call Trace:
<4> [304.908939]  __lock_acquire+0x5d0/0x2740
<4> [304.908941]  lock_acquire+0xdc/0x3c0
<4> [304.908944]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908947]  __ww_mutex_lock.constprop.18+0xd0/0x1010
<4> [304.908949]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908951]  ? drm_modeset_lock+0xf6/0x110
<4> [304.908953]  ? ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908954]  ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908956]  drm_modeset_lock+0xf6/0x110
<4> [304.908958]  drm_atomic_get_connector_state+0x28/0x180
<4> [304.909003]  intel_psr_fastset_force+0x76/0x170 [i915]
<4> [304.909034]  i915_edp_psr_debug_set+0x53/0x70 [i915]
<4> [304.909037]  simple_attr_write+0xb1/0xd0
<4> [304.909040]  full_proxy_write+0x51/0x80
<4> [304.909042]  vfs_write+0xc4/0x230
<4> [304.909043]  ksys_write+0x5a/0xd0
<4> [304.909045]  do_syscall_64+0x33/0x80
<4> [304.909046]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [304.909047] RIP: 0033:0x7f517180d281
<4> [304.909049] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [304.909050] RSP: 002b:00007ffd197b0728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [304.909051] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f517180d281
<4> [304.909052] RDX: 0000000000000003 RSI: 00007f5171cb0dee RDI: 0000000000000009
<4> [304.909053] RBP: 0000000000000003 R08: 00007ffd197eb1b0 R09: 000000000005d270
<4> [304.909054] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5171cb0dee
<4> [304.909055] R13: 0000000000000009 R14: 000055bad33d98c0 R15: 0000000000000000
<4> [304.909057] CPU: 0 PID: 5658 Comm: kms_psr Not tainted 5.10.0-rc1-CI-Trybot_7174+ #1
<4> [304.909058] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [304.909059] Call Trace:
<4> [304.909061]  dump_stack+0x77/0x97
<4> [304.909064]  __warn.cold.14+0xe/0x4b
<4> [304.909066]  ? check_chain_key+0x1a4/0x1f0
<4> [304.909068]  report_bug+0xbd/0xf0
<4> [304.909070]  handle_bug+0x3f/0x70
<4> [304.909071]  exc_invalid_op+0x13/0x60
<4> [304.909072]  asm_exc_invalid_op+0x12/0x20
<4> [304.909074] RIP: 0010:check_chain_key+0x1a4/0x1f0
<4> [304.909075] Code: a5 d8 08 00 00 74 e7 e8 7a eb 96 00 8b b5 e0 08 00 00 4c 89 e1 89 da 4c 8b 85 d8 08 00 00 48 c7 c7 d0 8f 30 82 e8 5f 2c 92 00 <0f> 0b 5b 5d 41 5c 41 5d c3 49 89 d5 49 c7 c4 ff ff ff ff 31 db e8
<4> [304.909076] RSP: 0018:ffffc90000ba7af0 EFLAGS: 00010086
<4> [304.909077] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000002
<4> [304.909078] RDX: 0000000080000002 RSI: ffffffff82348c47 RDI: 00000000ffffffff
<4> [304.909079] RBP: ffff88812d7dc040 R08: 0000000000000000 R09: c000000100002c92
<4> [304.909080] R10: 00000000003b5380 R11: ffffc90000ba7900 R12: 3425cfea6ff31f7f
<4> [304.909081] R13: ffff88812d7dc9f0 R14: 0000000000000003 R15: ffff88812d7dc9f0
<4> [304.909083]  __lock_acquire+0x5d0/0x2740
<4> [304.909086]  lock_acquire+0xdc/0x3c0
<4> [304.909087]  ? drm_modeset_lock+0xf6/0x110
<4> [304.909090]  __ww_mutex_lock.constprop.18+0xd0/0x1010
<4> [304.909091]  ? drm_modeset_lock+0xf6/0x110
<4> [304.909093]  ? drm_modeset_lock+0xf6/0x110
<4> [304.909095]  ? ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.909096]  ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.909098]  drm_modeset_lock+0xf6/0x110
<4> [304.909100]  drm_atomic_get_connector_state+0x28/0x180
<4> [304.909149]  intel_psr_fastset_force+0x76/0x170 [i915]
<4> [304.909179]  i915_edp_psr_debug_set+0x53/0x70 [i915]
<4> [304.909181]  simple_attr_write+0xb1/0xd0
<4> [304.909183]  full_proxy_write+0x51/0x80
<4> [304.909184]  vfs_write+0xc4/0x230
<4> [304.909185]  ksys_write+0x5a/0xd0
<4> [304.909187]  do_syscall_64+0x33/0x80
<4> [304.909188]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [304.909189] RIP: 0033:0x7f517180d281
<4> [304.909190] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [304.909191] RSP: 002b:00007ffd197b0728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [304.909193] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f517180d281
<4> [304.909193] RDX: 0000000000000003 RSI: 00007f5171cb0dee RDI: 0000000000000009
<4> [304.909194] RBP: 0000000000000003 R08: 00007ffd197eb1b0 R09: 000000000005d270
<4> [304.909195] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5171cb0dee
<4> [304.909196] R13: 0000000000000009 R14: 000055bad33d98c0 R15: 0000000000000000
<4> [304.909198] irq event stamp: 93035
<4> [304.909200] hardirqs last  enabled at (93035): [<ffffffff81a8b8b2>] _raw_spin_unlock_irqrestore+0x42/0x50
<4> [304.909201] hardirqs last disabled at (93034): [<ffffffff81a8b687>] _raw_spin_lock_irqsave+0x47/0x50
<4> [304.909203] softirqs last  enabled at (92760): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
<4> [304.909204] softirqs last disabled at (92753): [<ffffffff81c00f4f>] asm_call_irq_on_stack+0xf/0x20

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-icl-u2/igt@kms_psr@primary_mmap_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-kbl-r/igt@kms_psr@primary_page_flip.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-cml-s/igt@kms_psr@primary_mmap_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-cml-u2/igt@kms_psr@primary_page_flip.html

It is suspicious that those are all the same tests midway through the
run, so it might be our bug, but it is one we haven't seen before.
-Chris

  parent reply	other threads:[~2020-10-27 13:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-29 14:31 lockdep null-ptr-deref Qian Cai
2020-09-29 23:08 ` Boqun Feng
2020-09-30  9:16   ` Peter Zijlstra
2020-09-30  9:49     ` Peter Zijlstra
2020-09-30 12:18       ` Boqun Feng
2020-09-30 19:02         ` Peter Zijlstra
2020-10-02 12:36           ` Boqun Feng
2020-10-02 13:09             ` Peter Zijlstra
2020-10-02 13:35               ` Boqun Feng
2020-10-02 10:06       ` Peter Zijlstra
2020-10-02 13:40         ` Qian Cai
2020-10-07 16:20       ` [tip: locking/core] lockdep: Fix usage_traceoverflow tip-bot2 for Peter Zijlstra
2020-10-27 11:29         ` Chris Wilson
2020-10-27 11:59           ` Peter Zijlstra
2020-10-27 12:30             ` Peter Zijlstra
2020-10-27 12:48               ` Peter Zijlstra
2020-10-27 14:13                 ` Chris Wilson
2020-10-31 11:30                 ` [tip: locking/urgent] lockdep: Fix nr_unused_locks accounting tip-bot2 for Peter Zijlstra
2020-10-27 13:29               ` Chris Wilson [this message]
2020-10-27 15:45                 ` [tip: locking/core] lockdep: Fix usage_traceoverflow Peter Zijlstra
2020-10-27 16:34                   ` Chris Wilson
2020-10-28 17:40                     ` Chris Wilson
2020-10-28 18:06                       ` Chris Wilson
2020-10-28 19:42                       ` Peter Zijlstra
2020-10-28 19:59                         ` Peter Zijlstra
2020-10-30  3:51                           ` Boqun Feng
2020-10-30  9:38                             ` Peter Zijlstra
2020-10-30  9:55                               ` Peter Zijlstra
2020-11-02  5:37                               ` [PATCH 1/2] lockdep: Avoid to modify chain keys in validate_chain() Boqun Feng
2020-11-02  5:37                                 ` [PATCH 2/2] lockdep/selftest: Add spin_nest_lock test Boqun Feng
2020-12-03 10:35                                   ` [tip: locking/core] " tip-bot2 for Boqun Feng
2020-11-05  6:25                                 ` [PATCH 1/2] lockdep: Avoid to modify chain keys in validate_chain() Boqun Feng
2020-11-10 17:28                                 ` Peter Zijlstra
2020-11-11  8:23                                 ` [tip: locking/urgent] " tip-bot2 for Boqun Feng
2020-10-09  7:58       ` [tip: locking/core] lockdep: Fix usage_traceoverflow tip-bot2 for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=160380535006.10461.1259632375207276085@build.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=cai@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tip-bot2@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).