* [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference @ 2015-07-12 8:03 Jörg Otte 2015-07-12 16:33 ` Jörg Otte 2015-07-12 16:52 ` Linus Torvalds 0 siblings, 2 replies; 10+ messages in thread From: Jörg Otte @ 2015-07-12 8:03 UTC (permalink / raw) To: David Airlie, Linus Torvalds, Daniel Vetter, dri-devel, Linux Kernel Mailing List 4.2.0-rc1-00201-g59c3cb5 introducued a null pointer derefence and a system freeze when Xorg is started ( 4.2.0-rc1-00062-gc4b5fd3 was fine) : BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb PGD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1290 Comm: Xorg Not tainted 4.2.0-rc1-00201-g59c3cb5 #6 Hardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012 task: ffff8802149d6c00 ti: ffff880206df4000 task.ti: ffff880206df4000 RIP: 0010:[<ffffffffbd3447bb>] [<ffffffffbd3447bb>] 0xffffffffbd3447bb RSP: 0018:ffff880206df7b08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88021578f480 RCX: ffff88021578f4d0 RDX: 0000000000000000 RSI: ffff88021630b000 RDI: ffff880214a68000 RBP: ffff88021630b000 R08: ffff88021578f4e0 R09: ffff88021578f4f0 R10: 0000000000003c18 R11: 00000000fffffff2 R12: ffff880214a68000 R13: ffff88021634e800 R14: 0000000000000000 R15: 0000000000000000 FS: 00007ff3caa60880(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000009 CR3: 0000000206e07000 CR4: 00000000001407e0 Stack: ffff880200010000 ffff880200010000 ffff880200000000 ffff880200000001 ffff88021578f500 ffffffffbd2df135 ffff880213f71c00 ffff880214a68000 0000000000000000 ffff880214a70000 0000000000000001 ffff880214a68000 Call Trace: [<ffffffffbd2df135>] ? 0xffffffffbd2df135 [<ffffffffbd2b6ca8>] ? 0xffffffffbd2b6ca8 [<ffffffffbd33cc7e>] ? 0xffffffffbd33cc7e [<ffffffffbd343673>] ? 0xffffffffbd343673 [<ffffffffbd2d0728>] ? 0xffffffffbd2d0728 [<ffffffffbd2d088e>] ? 0xffffffffbd2d088e [<ffffffffbd2d10c5>] ? 0xffffffffbd2d10c5 [<ffffffffbd2c6976>] ? 0xffffffffbd2c6976 [<ffffffffbd2d0fe0>] ? 0xffffffffbd2d0fe0 [<ffffffffbd0c6a1f>] ? 0xffffffffbd0c6a1f [<ffffffffbd0e79e1>] ? 0xffffffffbd0e79e1 [<ffffffffbd0e7ed1>] ? 0xffffffffbd0e7ed1 [<ffffffffbd6df557>] ? 0xffffffffbd6df557 Code: 48 89 54 24 20 48 8b 54 24 40 48 89 ee 89 0c 24 4c 89 f9 c7 44 24 18 01 00 00 00 89 44 24 08 e8 bc 1f f7 ff 85 c0 41 89 c7 75 67 <41> 80 7e 09 00 74 56 49 8b 84 24 38 02 00 00 c6 85 d0 08 00 00 RIP [<ffffffffbd3447bb>] 0xffffffffbd3447bb RSP <ffff880206df7b08> CR2: 0000000000000009 ---[ end trace dd0931f7f0d43d12 ] --- ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-12 8:03 [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference Jörg Otte @ 2015-07-12 16:33 ` Jörg Otte 2015-07-12 16:52 ` Linus Torvalds 1 sibling, 0 replies; 10+ messages in thread From: Jörg Otte @ 2015-07-12 16:33 UTC (permalink / raw) To: David Airlie, Linus Torvalds, Daniel Vetter, dri-devel, Linux Kernel Mailing List, Ander Conselvan de Oliveira 2015-07-12 10:03 GMT+02:00 Jörg Otte <jrg.otte@gmail.com>: > 4.2.0-rc1-00201-g59c3cb5 introducued a null pointer derefence and a > system freeze > when Xorg is started ( 4.2.0-rc1-00062-gc4b5fd3 was fine) : > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 > IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb > PGD 0 > Oops: 0000 [#1] SMP > CPU: 1 PID: 1290 Comm: Xorg Not tainted 4.2.0-rc1-00201-g59c3cb5 #6 > Hardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012 > task: ffff8802149d6c00 ti: ffff880206df4000 task.ti: ffff880206df4000 > RIP: 0010:[<ffffffffbd3447bb>] [<ffffffffbd3447bb>] 0xffffffffbd3447bb > RSP: 0018:ffff880206df7b08 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffff88021578f480 RCX: ffff88021578f4d0 > RDX: 0000000000000000 RSI: ffff88021630b000 RDI: ffff880214a68000 > RBP: ffff88021630b000 R08: ffff88021578f4e0 R09: ffff88021578f4f0 > R10: 0000000000003c18 R11: 00000000fffffff2 R12: ffff880214a68000 > R13: ffff88021634e800 R14: 0000000000000000 R15: 0000000000000000 > FS: 00007ff3caa60880(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000009 CR3: 0000000206e07000 CR4: 00000000001407e0 > Stack: > ffff880200010000 ffff880200010000 ffff880200000000 ffff880200000001 > ffff88021578f500 ffffffffbd2df135 ffff880213f71c00 ffff880214a68000 > 0000000000000000 ffff880214a70000 0000000000000001 ffff880214a68000 > Call Trace: > [<ffffffffbd2df135>] ? 0xffffffffbd2df135 > [<ffffffffbd2b6ca8>] ? 0xffffffffbd2b6ca8 > [<ffffffffbd33cc7e>] ? 0xffffffffbd33cc7e > [<ffffffffbd343673>] ? 0xffffffffbd343673 > [<ffffffffbd2d0728>] ? 0xffffffffbd2d0728 > [<ffffffffbd2d088e>] ? 0xffffffffbd2d088e > [<ffffffffbd2d10c5>] ? 0xffffffffbd2d10c5 > [<ffffffffbd2c6976>] ? 0xffffffffbd2c6976 > [<ffffffffbd2d0fe0>] ? 0xffffffffbd2d0fe0 > [<ffffffffbd0c6a1f>] ? 0xffffffffbd0c6a1f > [<ffffffffbd0e79e1>] ? 0xffffffffbd0e79e1 > [<ffffffffbd0e7ed1>] ? 0xffffffffbd0e7ed1 > [<ffffffffbd6df557>] ? 0xffffffffbd6df557 > Code: 48 89 54 24 20 48 8b 54 24 40 48 89 ee 89 0c 24 4c 89 f9 c7 44 > 24 18 01 00 00 00 89 44 24 08 e8 bc 1f f7 ff 85 c0 41 89 c7 75 67 <41> > 80 7e 09 00 74 56 49 8b 84 24 38 02 00 00 c6 85 d0 08 00 00 > RIP [<ffffffffbd3447bb>] 0xffffffffbd3447bb > RSP <ffff880206df7b08> > CR2: 0000000000000009 > ---[ end trace dd0931f7f0d43d12 ] --- I can fix the problem for me by reverting: commit dec4f799d0a4c9edae20512fa60b0a36f3299ca2 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jul 7 11:15:47 2015 +0200 drm/i915: Use crtc_state->active in primary check_plane func Since commit 8c7b5ccb729870e606321b3703e2c2e698c49a95 Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Date: Tue Apr 21 17:13:19 2015 +0300 drm/i915: Use atomic helpers for computing changed flags Thanks, Jörg ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-12 8:03 [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference Jörg Otte 2015-07-12 16:33 ` Jörg Otte @ 2015-07-12 16:52 ` Linus Torvalds 2015-07-13 5:56 ` Maarten Lankhorst 2015-07-13 6:22 ` Daniel Vetter 1 sibling, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2015-07-12 16:52 UTC (permalink / raw) To: Jörg Otte, Daniel Vetter Cc: David Airlie, DRI, Linux Kernel Mailing List, Maarten Lankhorst On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 > IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb Ugh. Please enable KALLSYMS to get sane symbols. But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", so it's pretty clearly just that change frm - if (intel_crtc->active) { + if (crtc_state->base.active) { and "crtc_state" is NULL. And the code very much knows that crtc_state can be NULL, since it's initialized with crtc_state = state->base.state ? intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL; Tssk. Daniel? Should I just revert that commit dec4f799d0a4 ("drm/i915: Use crtc_state->active in primary check_plane func") for now, or is there a better fix? Like just checking crtc_state for NULL? Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-12 16:52 ` Linus Torvalds @ 2015-07-13 5:56 ` Maarten Lankhorst 2015-07-13 6:22 ` Daniel Vetter 1 sibling, 0 replies; 10+ messages in thread From: Maarten Lankhorst @ 2015-07-13 5:56 UTC (permalink / raw) To: Linus Torvalds, Jörg Otte, Daniel Vetter Cc: David Airlie, DRI, Linux Kernel Mailing List Op 12-07-15 om 18:52 schreef Linus Torvalds: > On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 >> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb > Ugh. Please enable KALLSYMS to get sane symbols. > > But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", > so it's pretty clearly just that change frm > > - if (intel_crtc->active) { > + if (crtc_state->base.active) { > > and "crtc_state" is NULL. > > And the code very much knows that crtc_state can be NULL, since it's > initialized with > > crtc_state = state->base.state ? > intel_atomic_get_crtc_state(state->base.state, > intel_crtc) : NULL; > > Tssk. Daniel? Should I just revert that commit dec4f799d0a4 > ("drm/i915: Use crtc_state->active in primary check_plane func") for > now, or is there a better fix? Like just checking crtc_state for NULL? > > Linus More symbols would be nice. With the transitional helpers when crtc_state == NULL you don't want to update the scalers or funny things happen. Fix is probably something like this: diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index ba9321998a41..830e07b23a15 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13276,7 +13276,7 @@ intel_check_primary_plane(struct drm_plane *plane, if (ret) return ret; - if (crtc_state->base.active) { + if (crtc_state ? crtc_state->base.active || crtc->state->active) { struct intel_plane_state *old_state = to_intel_plane_state(plane->state); ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-12 16:52 ` Linus Torvalds 2015-07-13 5:56 ` Maarten Lankhorst @ 2015-07-13 6:22 ` Daniel Vetter 2015-07-13 7:23 ` Maarten Lankhorst 1 sibling, 1 reply; 10+ messages in thread From: Daniel Vetter @ 2015-07-13 6:22 UTC (permalink / raw) To: Linus Torvalds Cc: Jörg Otte, Daniel Vetter, David Airlie, DRI, Linux Kernel Mailing List, Maarten Lankhorst On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote: > On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 > > IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb > > Ugh. Please enable KALLSYMS to get sane symbols. > > But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", > so it's pretty clearly just that change frm > > - if (intel_crtc->active) { > + if (crtc_state->base.active) { > > and "crtc_state" is NULL. > > And the code very much knows that crtc_state can be NULL, since it's > initialized with > > crtc_state = state->base.state ? > intel_atomic_get_crtc_state(state->base.state, > intel_crtc) : NULL; > > Tssk. Daniel? Should I just revert that commit dec4f799d0a4 > ("drm/i915: Use crtc_state->active in primary check_plane func") for > now, or is there a better fix? Like just checking crtc_state for NULL? Indeed embarrassing. I've missed that we still have 1 caller left that's using the transitional helpers, and those don't fill out plane_state->state backpointers to the global atomic update since there is no global atomic update for transitional helpers. Below diff should fix this - we need to preferentially check crts_state->active and if that's not set intel_crtc->active should yield the right result for the one remaining caller (it's in the crtc_disable paths). For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision to transition to a different QA team ("we'll do this in 1 week without upfront planing") I essentially don't have proper QA support for 1-2 months by now. The other trouble in this area specifically is that this code is already completely changed in -next again, so any testing done on integration trees (like -next or drm-intel-nightly) won't test any patches for 4.2. -Daniel Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you decide to apply this right away. --- diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index ba9321998a41..85ac6d85dc39 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13276,7 +13276,7 @@ intel_check_primary_plane(struct drm_plane *plane, if (ret) return ret; - if (crtc_state->base.active) { + if (crtc_state ? crtc_state->base.active : intel_crtc->active) { struct intel_plane_state *old_state = to_intel_plane_state(plane->state); -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-13 6:22 ` Daniel Vetter @ 2015-07-13 7:23 ` Maarten Lankhorst 2015-07-13 7:42 ` Jörg Otte 0 siblings, 1 reply; 10+ messages in thread From: Maarten Lankhorst @ 2015-07-13 7:23 UTC (permalink / raw) To: Linus Torvalds, Jörg Otte, David Airlie, DRI, Linux Kernel Mailing List Op 13-07-15 om 08:22 schreef Daniel Vetter: > On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote: >> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: >>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 >>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb >> Ugh. Please enable KALLSYMS to get sane symbols. >> >> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", >> so it's pretty clearly just that change frm >> >> - if (intel_crtc->active) { >> + if (crtc_state->base.active) { >> >> and "crtc_state" is NULL. >> >> And the code very much knows that crtc_state can be NULL, since it's >> initialized with >> >> crtc_state = state->base.state ? >> intel_atomic_get_crtc_state(state->base.state, >> intel_crtc) : NULL; >> >> Tssk. Daniel? Should I just revert that commit dec4f799d0a4 >> ("drm/i915: Use crtc_state->active in primary check_plane func") for >> now, or is there a better fix? Like just checking crtc_state for NULL? > Indeed embarrassing. I've missed that we still have 1 caller left that's > using the transitional helpers, and those don't fill out > plane_state->state backpointers to the global atomic update since there is > no global atomic update for transitional helpers. Below diff should fix > this - we need to preferentially check crts_state->active and if that's > not set intel_crtc->active should yield the right result for the one > remaining caller (it's in the crtc_disable paths). > > For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision > to transition to a different QA team ("we'll do this in 1 week without > upfront planing") I essentially don't have proper QA support for 1-2 > months by now. The other trouble in this area specifically is that this > code is already completely changed in -next again, so any testing done on > integration trees (like -next or drm-intel-nightly) won't test any patches > for 4.2. > -Daniel > > Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you > decide to apply this right away. > Well your version has the benefit of compiling without errors. :-) Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-13 7:23 ` Maarten Lankhorst @ 2015-07-13 7:42 ` Jörg Otte 2015-07-13 7:58 ` Maarten Lankhorst 0 siblings, 1 reply; 10+ messages in thread From: Jörg Otte @ 2015-07-13 7:42 UTC (permalink / raw) To: Maarten Lankhorst Cc: Linus Torvalds, David Airlie, DRI, Linux Kernel Mailing List 2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>: > Op 13-07-15 om 08:22 schreef Daniel Vetter: >> On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote: >>> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: >>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 >>>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb >>> Ugh. Please enable KALLSYMS to get sane symbols. >>> >>> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", >>> so it's pretty clearly just that change frm >>> >>> - if (intel_crtc->active) { >>> + if (crtc_state->base.active) { >>> >>> and "crtc_state" is NULL. >>> >>> And the code very much knows that crtc_state can be NULL, since it's >>> initialized with >>> >>> crtc_state = state->base.state ? >>> intel_atomic_get_crtc_state(state->base.state, >>> intel_crtc) : NULL; >>> >>> Tssk. Daniel? Should I just revert that commit dec4f799d0a4 >>> ("drm/i915: Use crtc_state->active in primary check_plane func") for >>> now, or is there a better fix? Like just checking crtc_state for NULL? >> Indeed embarrassing. I've missed that we still have 1 caller left that's >> using the transitional helpers, and those don't fill out >> plane_state->state backpointers to the global atomic update since there is >> no global atomic update for transitional helpers. Below diff should fix >> this - we need to preferentially check crts_state->active and if that's >> not set intel_crtc->active should yield the right result for the one >> remaining caller (it's in the crtc_disable paths). >> >> For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision >> to transition to a different QA team ("we'll do this in 1 week without >> upfront planing") I essentially don't have proper QA support for 1-2 >> months by now. The other trouble in this area specifically is that this >> code is already completely changed in -next again, so any testing done on >> integration trees (like -next or drm-intel-nightly) won't test any patches >> for 4.2. >> -Daniel >> >> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you >> decide to apply this right away. >> > Well your version has the benefit of compiling without errors. :-) > > Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Just noticed another problem: On each resume I get the following error: -----------[ cut here ]------------ WARNING: CPU: 2 PID: 2663 at /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319 0xffffffff9a33d5e9() WARN_ON(!crtc->state->enable) CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15 ardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012 orkqueue: events_unbound 0xffffffff9a055750 0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000 ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000 ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664 all Trace: [<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2 [<ffffffff9a03c416>] ? 0xffffffff9a03c416 [<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5 [<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9 [<ffffffff9a343ac3>] ? 0xffffffff9a343ac3 [<ffffffff9a34444a>] ? 0xffffffff9a34444a [<ffffffff9a345518>] ? 0xffffffff9a345518 [<ffffffff9a3246f0>] ? 0xffffffff9a3246f0 [<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8 [<ffffffff9a236170>] ? 0xffffffff9a236170 [<ffffffff9a38b28d>] ? 0xffffffff9a38b28d [<ffffffff9a38b784>] ? 0xffffffff9a38b784 [<ffffffff9a38baa4>] ? 0xffffffff9a38baa4 [<ffffffff9a05577d>] ? 0xffffffff9a05577d [<ffffffff9a04dc47>] ? 0xffffffff9a04dc47 [<ffffffff9a04dfab>] ? 0xffffffff9a04dfab [<ffffffff9a04dea0>] ? 0xffffffff9a04dea0 [<ffffffff9a05331c>] ? 0xffffffff9a05331c [<ffffffff9a053260>] ? 0xffffffff9a053260 [<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f [<ffffffff9a053260>] ? 0xffffffff9a053260 --[ end trace 1b6d28ee34071679 ]--- Nervertheless resume works, so it doesn't hurt me. BTW: I get also up to 40..50! compile warnings like: i915/i915_drv.h: In function 'i915_debugfs_connector_add': i915/i915_drv.h:3119:53: warning: no return statement in function returning non-void [-Wreturn-type] which may cause yet uncovered troubles. Thanks, Jörg ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-13 7:42 ` Jörg Otte @ 2015-07-13 7:58 ` Maarten Lankhorst 2015-07-13 8:50 ` Jörg Otte 0 siblings, 1 reply; 10+ messages in thread From: Maarten Lankhorst @ 2015-07-13 7:58 UTC (permalink / raw) To: Jörg Otte Cc: Linus Torvalds, David Airlie, DRI, Linux Kernel Mailing List Op 13-07-15 om 09:42 schreef Jörg Otte: > 2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>: >> Op 13-07-15 om 08:22 schreef Daniel Vetter: >>> On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote: >>>> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: >>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 >>>>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb >>>> Ugh. Please enable KALLSYMS to get sane symbols. >>>> >>>> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", >>>> so it's pretty clearly just that change frm >>>> >>>> - if (intel_crtc->active) { >>>> + if (crtc_state->base.active) { >>>> >>>> and "crtc_state" is NULL. >>>> >>>> And the code very much knows that crtc_state can be NULL, since it's >>>> initialized with >>>> >>>> crtc_state = state->base.state ? >>>> intel_atomic_get_crtc_state(state->base.state, >>>> intel_crtc) : NULL; >>>> >>>> Tssk. Daniel? Should I just revert that commit dec4f799d0a4 >>>> ("drm/i915: Use crtc_state->active in primary check_plane func") for >>>> now, or is there a better fix? Like just checking crtc_state for NULL? >>> Indeed embarrassing. I've missed that we still have 1 caller left that's >>> using the transitional helpers, and those don't fill out >>> plane_state->state backpointers to the global atomic update since there is >>> no global atomic update for transitional helpers. Below diff should fix >>> this - we need to preferentially check crts_state->active and if that's >>> not set intel_crtc->active should yield the right result for the one >>> remaining caller (it's in the crtc_disable paths). >>> >>> For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision >>> to transition to a different QA team ("we'll do this in 1 week without >>> upfront planing") I essentially don't have proper QA support for 1-2 >>> months by now. The other trouble in this area specifically is that this >>> code is already completely changed in -next again, so any testing done on >>> integration trees (like -next or drm-intel-nightly) won't test any patches >>> for 4.2. >>> -Daniel >>> >>> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you >>> decide to apply this right away. >>> >> Well your version has the benefit of compiling without errors. :-) >> >> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Just noticed another problem: > On each resume I get the following error: > -----------[ cut here ]------------ > WARNING: CPU: 2 PID: 2663 at > /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319 > 0xffffffff9a33d5e9() > WARN_ON(!crtc->state->enable) > CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15 > ardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012 > orkqueue: events_unbound 0xffffffff9a055750 > 0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000 > ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000 > ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664 > all Trace: > [<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2 > [<ffffffff9a03c416>] ? 0xffffffff9a03c416 > [<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5 > [<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9 > [<ffffffff9a343ac3>] ? 0xffffffff9a343ac3 > [<ffffffff9a34444a>] ? 0xffffffff9a34444a > [<ffffffff9a345518>] ? 0xffffffff9a345518 > [<ffffffff9a3246f0>] ? 0xffffffff9a3246f0 > [<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8 > [<ffffffff9a236170>] ? 0xffffffff9a236170 > [<ffffffff9a38b28d>] ? 0xffffffff9a38b28d > [<ffffffff9a38b784>] ? 0xffffffff9a38b784 > [<ffffffff9a38baa4>] ? 0xffffffff9a38baa4 > [<ffffffff9a05577d>] ? 0xffffffff9a05577d > [<ffffffff9a04dc47>] ? 0xffffffff9a04dc47 > [<ffffffff9a04dfab>] ? 0xffffffff9a04dfab > [<ffffffff9a04dea0>] ? 0xffffffff9a04dea0 > [<ffffffff9a05331c>] ? 0xffffffff9a05331c > [<ffffffff9a053260>] ? 0xffffffff9a053260 > [<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f > [<ffffffff9a053260>] ? 0xffffffff9a053260 > --[ end trace 1b6d28ee34071679 ]--- > > Nervertheless resume works, so it doesn't hurt me. > > > BTW: I get also up to 40..50! compile warnings like: > i915/i915_drv.h: In function 'i915_debugfs_connector_add': > i915/i915_drv.h:3119:53: warning: no return statement in function > returning non-void [-Wreturn-type] > > which may cause yet uncovered troubles. > > Thanks, Jörg kallsyms please! Looks like intel_crtc_disable being called with a mode change on a already disabled crtc, it's gone in 4.3 because of the atomic rework. Does something like below work? diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index ba9321998a41..725d2b727704 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -6315,9 +6315,6 @@ static void intel_crtc_disable(struct drm_crtc *crtc) struct drm_connector *connector; struct drm_i915_private *dev_priv = dev->dev_private; - /* crtc should still be enabled when we disable it. */ - WARN_ON(!crtc->state->enable); - intel_crtc_disable_planes(crtc); dev_priv->display.crtc_disable(crtc); dev_priv->display.off(crtc); @@ -12591,7 +12588,8 @@ static int __intel_set_mode(struct drm_crtc *modeset_crtc, continue; if (!crtc_state->enable) { - intel_crtc_disable(crtc); + if (crtc->state->enable) + intel_crtc_disable(crtc); } else if (crtc->state->enable) { intel_crtc_disable_planes(crtc); dev_priv->display.crtc_disable(crtc); ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference 2015-07-13 7:58 ` Maarten Lankhorst @ 2015-07-13 8:50 ` Jörg Otte 2015-07-14 11:00 ` [PATCH] drm/i915: Do not call intel_crtc_disable if the crtc is already disabled Maarten Lankhorst 0 siblings, 1 reply; 10+ messages in thread From: Jörg Otte @ 2015-07-13 8:50 UTC (permalink / raw) To: Maarten Lankhorst Cc: Linus Torvalds, David Airlie, DRI, Linux Kernel Mailing List 2015-07-13 9:58 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>: > Op 13-07-15 om 09:42 schreef Jörg Otte: >> 2015-07-13 9:23 GMT+02:00 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>: >>> Op 13-07-15 om 08:22 schreef Daniel Vetter: >>>> On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote: >>>>> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <jrg.otte@gmail.com> wrote: >>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000009 >>>>>> IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb >>>>> Ugh. Please enable KALLSYMS to get sane symbols. >>>>> >>>>> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state", >>>>> so it's pretty clearly just that change frm >>>>> >>>>> - if (intel_crtc->active) { >>>>> + if (crtc_state->base.active) { >>>>> >>>>> and "crtc_state" is NULL. >>>>> >>>>> And the code very much knows that crtc_state can be NULL, since it's >>>>> initialized with >>>>> >>>>> crtc_state = state->base.state ? >>>>> intel_atomic_get_crtc_state(state->base.state, >>>>> intel_crtc) : NULL; >>>>> >>>>> Tssk. Daniel? Should I just revert that commit dec4f799d0a4 >>>>> ("drm/i915: Use crtc_state->active in primary check_plane func") for >>>>> now, or is there a better fix? Like just checking crtc_state for NULL? >>>> Indeed embarrassing. I've missed that we still have 1 caller left that's >>>> using the transitional helpers, and those don't fill out >>>> plane_state->state backpointers to the global atomic update since there is >>>> no global atomic update for transitional helpers. Below diff should fix >>>> this - we need to preferentially check crts_state->active and if that's >>>> not set intel_crtc->active should yield the right result for the one >>>> remaining caller (it's in the crtc_disable paths). >>>> >>>> For cheap excuses why i915 is so crap in 4.2: Thanks to a hipshot decision >>>> to transition to a different QA team ("we'll do this in 1 week without >>>> upfront planing") I essentially don't have proper QA support for 1-2 >>>> months by now. The other trouble in this area specifically is that this >>>> code is already completely changed in -next again, so any testing done on >>>> integration trees (like -next or drm-intel-nightly) won't test any patches >>>> for 4.2. >>>> -Daniel >>>> >>>> Oh and Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> in case you >>>> decide to apply this right away. >>>> >>> Well your version has the benefit of compiling without errors. :-) >>> >>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Just noticed another problem: >> On each resume I get the following error: >> -----------[ cut here ]------------ >> WARNING: CPU: 2 PID: 2663 at >> /data/kernel/linux/drivers/gpu/drm/i915/intel_display.c:6319 >> 0xffffffff9a33d5e9() >> WARN_ON(!crtc->state->enable) >> CPU: 2 PID: 2663 Comm: kworker/u8:80 Not tainted 4.2.0-rc2 #15 >> ardware name: FUJITSU LIFEBOOK AH532/FJNBB1C, BIOS Version 1.09 05/22/2012 >> orkqueue: events_unbound 0xffffffff9a055750 >> 0000000000000000 ffffffff9a98ea28 ffffffff9a6d84d2 0000000000000000 >> ffffffff9a03c416 ffff88020951c4e0 0000000000000000 0000000000000000 >> ffff8802141cb800 ffff88021630c000 ffffffff9a03c4d5 ffffffff9a9c3664 >> all Trace: >> [<ffffffff9a6d84d2>] ? 0xffffffff9a6d84d2 >> [<ffffffff9a03c416>] ? 0xffffffff9a03c416 >> [<ffffffff9a03c4d5>] ? 0xffffffff9a03c4d5 >> [<ffffffff9a33d5e9>] ? 0xffffffff9a33d5e9 >> [<ffffffff9a343ac3>] ? 0xffffffff9a343ac3 >> [<ffffffff9a34444a>] ? 0xffffffff9a34444a >> [<ffffffff9a345518>] ? 0xffffffff9a345518 >> [<ffffffff9a3246f0>] ? 0xffffffff9a3246f0 >> [<ffffffff9a2e1ce8>] ? 0xffffffff9a2e1ce8 >> [<ffffffff9a236170>] ? 0xffffffff9a236170 >> [<ffffffff9a38b28d>] ? 0xffffffff9a38b28d >> [<ffffffff9a38b784>] ? 0xffffffff9a38b784 >> [<ffffffff9a38baa4>] ? 0xffffffff9a38baa4 >> [<ffffffff9a05577d>] ? 0xffffffff9a05577d >> [<ffffffff9a04dc47>] ? 0xffffffff9a04dc47 >> [<ffffffff9a04dfab>] ? 0xffffffff9a04dfab >> [<ffffffff9a04dea0>] ? 0xffffffff9a04dea0 >> [<ffffffff9a05331c>] ? 0xffffffff9a05331c >> [<ffffffff9a053260>] ? 0xffffffff9a053260 >> [<ffffffff9a6dfa0f>] ? 0xffffffff9a6dfa0f >> [<ffffffff9a053260>] ? 0xffffffff9a053260 >> --[ end trace 1b6d28ee34071679 ]--- >> >> Nervertheless resume works, so it doesn't hurt me. >> >> >> BTW: I get also up to 40..50! compile warnings like: >> i915/i915_drv.h: In function 'i915_debugfs_connector_add': >> i915/i915_drv.h:3119:53: warning: no return statement in function >> returning non-void [-Wreturn-type] >> >> which may cause yet uncovered troubles. >> >> Thanks, Jörg > kallsyms please! > > Looks like intel_crtc_disable being called with a mode change on a already disabled crtc, it's gone in 4.3 because of the atomic rework. > > Does something like below work? > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > index ba9321998a41..725d2b727704 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -6315,9 +6315,6 @@ static void intel_crtc_disable(struct drm_crtc *crtc) > struct drm_connector *connector; > struct drm_i915_private *dev_priv = dev->dev_private; > > - /* crtc should still be enabled when we disable it. */ > - WARN_ON(!crtc->state->enable); > - > intel_crtc_disable_planes(crtc); > dev_priv->display.crtc_disable(crtc); > dev_priv->display.off(crtc); > @@ -12591,7 +12588,8 @@ static int __intel_set_mode(struct drm_crtc *modeset_crtc, > continue; > > if (!crtc_state->enable) { > - intel_crtc_disable(crtc); > + if (crtc->state->enable) > + intel_crtc_disable(crtc); > } else if (crtc->state->enable) { > intel_crtc_disable_planes(crtc); > dev_priv->display.crtc_disable(crtc); > The patch works for me. Thanks, Jörg ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] drm/i915: Do not call intel_crtc_disable if the crtc is already disabled. 2015-07-13 8:50 ` Jörg Otte @ 2015-07-14 11:00 ` Maarten Lankhorst 0 siblings, 0 replies; 10+ messages in thread From: Maarten Lankhorst @ 2015-07-14 11:00 UTC (permalink / raw) To: Jörg Otte Cc: Linus Torvalds, David Airlie, DRI, Linux Kernel Mailing List When resuming with dpms off, the following warn can happen: [ 118.334082] ------------[ cut here ]------------ [ 118.334105] WARNING: CPU: 2 PID: 2274 at drivers/gpu/drm/i915/intel_display.c:6319 __intel_set_mode+0xae5/0xb90 [i915]() [ 118.334106] WARN_ON(!crtc->state->enable) [ 118.334137] Modules linked in: i915 [ 118.334139] CPU: 2 PID: 2274 Comm: kworker/u16:117 Not tainted 4.2.0-rc2-fixes+ #4148 [ 118.334140] Hardware name: LENOVO 2349AV8/2349AV8, BIOS G1ETA5WW (2.65 ) 04/15/2014 [ 118.334144] Workqueue: events_unbound async_run_entry_fn [ 118.334147] ffffffffc017eef0 ffff8800ada93998 ffffffff817aa62a 0000000080000001 [ 118.334149] ffff8800ada939e8 ffff8800ada939d8 ffffffff810807e1 ffff8800ada939c8 [ 118.334151] ffff8800cea3b3d8 0000000000000000 ffff8800ad86b008 ffff880117705668 [ 118.334151] Call Trace: [ 118.334155] [<ffffffff817aa62a>] dump_stack+0x4f/0x7b [ 118.334157] [<ffffffff810807e1>] warn_slowpath_common+0x81/0xc0 [ 118.334158] [<ffffffff81080861>] warn_slowpath_fmt+0x41/0x50 [ 118.334173] [<ffffffffc0120375>] __intel_set_mode+0xae5/0xb90 [i915] [ 118.334188] [<ffffffffc0121312>] ? intel_modeset_compute_config+0x52/0xb40 [i915] [ 118.334191] [<ffffffff8144de53>] ? drm_atomic_set_fb_for_plane+0x63/0x80 [ 118.334205] [<ffffffffc01269d9>] intel_set_mode+0x29/0x60 [i915] [ 118.334219] [<ffffffffc012730a>] intel_crtc_restore_mode+0x13a/0x1f0 [i915] [ 118.334232] [<ffffffffc0101160>] ? gen6_write16+0x250/0x250 [i915] [ 118.334246] [<ffffffffc01283ec>] intel_modeset_setup_hw_state+0x89c/0xcd0 [i915] [ 118.334248] [<ffffffff8137d260>] ? pci_pm_thaw+0x90/0x90 [ 118.334255] [<ffffffffc00ac11b>] i915_drm_resume+0xcb/0x160 [i915] [ 118.334262] [<ffffffffc00ac1d2>] i915_pm_resume+0x22/0x30 [i915] [ 118.334263] [<ffffffff8137d2c3>] pci_pm_resume+0x63/0xa0 [ 118.334266] [<ffffffff81467550>] dpm_run_callback+0x70/0x420 [ 118.334267] [<ffffffff81467cbd>] device_resume+0x9d/0x1c0 [ 118.334269] [<ffffffff814673d0>] ? initcall_debug_start+0x60/0x60 [ 118.334270] [<ffffffff81467dfc>] async_resume+0x1c/0x50 [ 118.334271] [<ffffffff810a6a94>] async_run_entry_fn+0x34/0xd0 [ 118.334273] [<ffffffff8109d4ad>] process_one_work+0x1dd/0x7e0 [ 118.334275] [<ffffffff8109d41a>] ? process_one_work+0x14a/0x7e0 [ 118.334276] [<ffffffff8109daf9>] worker_thread+0x49/0x450 [ 118.334278] [<ffffffff8109dab0>] ? process_one_work+0x7e0/0x7e0 [ 118.334280] [<ffffffff810a3cb9>] kthread+0xf9/0x110 [ 118.334282] [<ffffffff810a3bc0>] ? insert_kthread_work+0x90/0x90 [ 118.334284] [<ffffffff817b414f>] ret_from_fork+0x3f/0x70 [ 118.334286] [<ffffffff810a3bc0>] ? insert_kthread_work+0x90/0x90 [ 118.334287] ---[ end trace 01f2cf6371b82d7a ]--- This warn is harmless, and can be fixed by not calling intel_crtc_disable when the crtc is already disabled. Reported-and-Tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> --- diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 85ac6d85dc39..30e0f54ba19d 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -6315,9 +6315,6 @@ static void intel_crtc_disable(struct drm_crtc *crtc) struct drm_connector *connector; struct drm_i915_private *dev_priv = dev->dev_private; - /* crtc should still be enabled when we disable it. */ - WARN_ON(!crtc->state->enable); - intel_crtc_disable_planes(crtc); dev_priv->display.crtc_disable(crtc); dev_priv->display.off(crtc); @@ -12591,7 +12588,8 @@ static int __intel_set_mode(struct drm_crtc *modeset_crtc, continue; if (!crtc_state->enable) { - intel_crtc_disable(crtc); + if (crtc->state->enable) + intel_crtc_disable(crtc); } else if (crtc->state->enable) { intel_crtc_disable_planes(crtc); dev_priv->display.crtc_disable(crtc); ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-07-14 11:00 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-07-12 8:03 [4.2.0-rc1-00201-g59c3cb5] Regression: kernel NULL pointer dereference Jörg Otte 2015-07-12 16:33 ` Jörg Otte 2015-07-12 16:52 ` Linus Torvalds 2015-07-13 5:56 ` Maarten Lankhorst 2015-07-13 6:22 ` Daniel Vetter 2015-07-13 7:23 ` Maarten Lankhorst 2015-07-13 7:42 ` Jörg Otte 2015-07-13 7:58 ` Maarten Lankhorst 2015-07-13 8:50 ` Jörg Otte 2015-07-14 11:00 ` [PATCH] drm/i915: Do not call intel_crtc_disable if the crtc is already disabled Maarten Lankhorst
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).