All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
	Matt Roper <matthew.d.roper@intel.com>,
	Lucas De Marchi <lucas.demarchi@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Caz Yokoyama <caz.yokoyama@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [BUG 5.15-rc3] kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
Date: Sat, 2 Oct 2021 03:17:29 -0700 (PDT)	[thread overview]
Message-ID: <259ff554-76b8-8523-033-9e996f549c70@google.com> (raw)
In-Reply-To: <20211002020257.34a0e882@oasis.local.home>

On Sat, 2 Oct 2021, Steven Rostedt wrote:

> When I tried to test patches applied to v5.15-rc3, I hit this bug (and
> hence can not test my code), on 32 bit x86.
> 
> ------------[ cut here ]------------
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc1-test+ #456
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Call Trace:
>  intel_context_init+0x112/0x145
>  intel_context_create+0x29/0x37
>  intel_ring_submission_setup+0x3cb/0x5a8
>  ? kfree+0x135/0x1c6
>  ? wa_init_finish+0x32/0x59
>  ? wa_init_finish+0x4f/0x59
>  ? intel_engine_init_ctx_wa+0x39a/0x3b3
>  intel_engines_init+0x2dd/0x4d0
>  ? gen6_bsd_submit_request+0x97/0x97
>  intel_gt_init+0x122/0x20d
>  i915_gem_init+0x80/0xef
>  i915_driver_probe+0x880/0xa90
>  ? i915_pci_remove+0x27/0x27
>  i915_pci_probe+0xdd/0xf6
>  ? __pm_runtime_resume+0x63/0x6b
>  ? i915_pci_remove+0x27/0x27
>  pci_device_probe+0xbc/0x11e
>  really_probe+0x13e/0x328
>  __driver_probe_device+0x140/0x176
>  driver_probe_device+0x1f/0x71
>  __driver_attach+0xf6/0x109
>  ? __device_attach_driver+0xbd/0xbd
>  bus_for_each_dev+0x5b/0x88
>  driver_attach+0x19/0x1b
>  ? __device_attach_driver+0xbd/0xbd
>  bus_add_driver+0xf2/0x199
>  driver_register+0x8c/0xbe
>  __pci_register_driver+0x5b/0x60
>  i915_register_pci_driver+0x19/0x1b
>  i915_init+0x15/0x67
>  ? radeon_module_init+0x6a/0x6a
>  do_one_initcall+0xce/0x21c
>  ? rcu_read_lock_sched_held+0x35/0x6d
>  ? trace_initcall_level+0x5f/0x99
>  kernel_init_freeable+0x1fb/0x247
>  ? rest_init+0x129/0x129
>  kernel_init+0x17/0xfd
>  ret_from_fork+0x1c/0x28
> Modules linked in:
> ---[ end trace 791dc89810d853da ]---
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> Attached is the dmesg and the config.
> 
> I bisected it down to this commit:
> 
> 3ffe82d701a4 ("drm/i915/xehp: handle new steering options")

Yes (though bisection doesn't work right on this one): the fix
https://lore.kernel.org/lkml/1f955bff-fd9e-d2ee-132a-f758add9e9cb@google.com/
seems to have got lost in the system: it has not even appeared in
linux-next yet. I was going to send a reminder later this weekend.

Here it is again (but edited to replace "__aligned(4)" in the original
by the official "__i915_sw_fence_call" I discovered afterwards; and
ignoring recent discussions of where __attributes ought to appear :-)


[PATCH] drm/i915: fix blank screen booting crashes

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and suprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 drivers/gpu/drm/i915/gt/intel_context.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct
 	return 0;
 }
 
+__i915_sw_fence_call	/* Respect the I915_SW_FENCE_MASK */
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 				 enum i915_sw_fence_notify state)
 {

WARNING: multiple messages have this Message-ID (diff)
From: Hugh Dickins <hughd@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
	 Matt Roper <matthew.d.roper@intel.com>,
	 Lucas De Marchi <lucas.demarchi@intel.com>,
	 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	 Caz Yokoyama <caz.yokoyama@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	 Jani Nikula <jani.nikula@linux.intel.com>,
	 Matthew Brost <matthew.brost@intel.com>,
	intel-gfx@lists.freedesktop.org,
	 dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [BUG 5.15-rc3] kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
Date: Sat, 2 Oct 2021 03:17:29 -0700 (PDT)	[thread overview]
Message-ID: <259ff554-76b8-8523-033-9e996f549c70@google.com> (raw)
In-Reply-To: <20211002020257.34a0e882@oasis.local.home>

On Sat, 2 Oct 2021, Steven Rostedt wrote:

> When I tried to test patches applied to v5.15-rc3, I hit this bug (and
> hence can not test my code), on 32 bit x86.
> 
> ------------[ cut here ]------------
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc1-test+ #456
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Call Trace:
>  intel_context_init+0x112/0x145
>  intel_context_create+0x29/0x37
>  intel_ring_submission_setup+0x3cb/0x5a8
>  ? kfree+0x135/0x1c6
>  ? wa_init_finish+0x32/0x59
>  ? wa_init_finish+0x4f/0x59
>  ? intel_engine_init_ctx_wa+0x39a/0x3b3
>  intel_engines_init+0x2dd/0x4d0
>  ? gen6_bsd_submit_request+0x97/0x97
>  intel_gt_init+0x122/0x20d
>  i915_gem_init+0x80/0xef
>  i915_driver_probe+0x880/0xa90
>  ? i915_pci_remove+0x27/0x27
>  i915_pci_probe+0xdd/0xf6
>  ? __pm_runtime_resume+0x63/0x6b
>  ? i915_pci_remove+0x27/0x27
>  pci_device_probe+0xbc/0x11e
>  really_probe+0x13e/0x328
>  __driver_probe_device+0x140/0x176
>  driver_probe_device+0x1f/0x71
>  __driver_attach+0xf6/0x109
>  ? __device_attach_driver+0xbd/0xbd
>  bus_for_each_dev+0x5b/0x88
>  driver_attach+0x19/0x1b
>  ? __device_attach_driver+0xbd/0xbd
>  bus_add_driver+0xf2/0x199
>  driver_register+0x8c/0xbe
>  __pci_register_driver+0x5b/0x60
>  i915_register_pci_driver+0x19/0x1b
>  i915_init+0x15/0x67
>  ? radeon_module_init+0x6a/0x6a
>  do_one_initcall+0xce/0x21c
>  ? rcu_read_lock_sched_held+0x35/0x6d
>  ? trace_initcall_level+0x5f/0x99
>  kernel_init_freeable+0x1fb/0x247
>  ? rest_init+0x129/0x129
>  kernel_init+0x17/0xfd
>  ret_from_fork+0x1c/0x28
> Modules linked in:
> ---[ end trace 791dc89810d853da ]---
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> Attached is the dmesg and the config.
> 
> I bisected it down to this commit:
> 
> 3ffe82d701a4 ("drm/i915/xehp: handle new steering options")

Yes (though bisection doesn't work right on this one): the fix
https://lore.kernel.org/lkml/1f955bff-fd9e-d2ee-132a-f758add9e9cb@google.com/
seems to have got lost in the system: it has not even appeared in
linux-next yet. I was going to send a reminder later this weekend.

Here it is again (but edited to replace "__aligned(4)" in the original
by the official "__i915_sw_fence_call" I discovered afterwards; and
ignoring recent discussions of where __attributes ought to appear :-)


[PATCH] drm/i915: fix blank screen booting crashes

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and suprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 drivers/gpu/drm/i915/gt/intel_context.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct
 	return 0;
 }
 
+__i915_sw_fence_call	/* Respect the I915_SW_FENCE_MASK */
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 				 enum i915_sw_fence_notify state)
 {

  reply	other threads:[~2021-10-02 10:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-02  6:02 [BUG 5.15-rc3] kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245! Steven Rostedt
2021-10-02  6:02 ` [Intel-gfx] " Steven Rostedt
2021-10-02 10:17 ` Hugh Dickins [this message]
2021-10-02 10:17   ` Hugh Dickins
2021-10-02 10:17   ` [Intel-gfx] " Hugh Dickins
2021-10-02 12:17   ` Steven Rostedt
2021-10-02 12:17     ` [Intel-gfx] " Steven Rostedt
2021-10-02 16:49     ` Linus Torvalds
2021-10-02 16:49       ` Linus Torvalds
2021-10-02 16:49       ` [Intel-gfx] " Linus Torvalds
2021-10-02 17:10       ` Hugh Dickins
2021-10-02 17:10         ` [Intel-gfx] " Hugh Dickins
2021-10-02 17:10         ` Hugh Dickins
2021-10-04  7:39         ` Jani Nikula
2021-10-04  7:39           ` [Intel-gfx] " Jani Nikula
2021-10-02 10:52 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2021-10-02 11:26 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-10-02 12:46 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=259ff554-76b8-8523-033-9e996f549c70@google.com \
    --to=hughd@google.com \
    --cc=caz.yokoyama@intel.com \
    --cc=daniele.ceraolospurio@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=matthew.d.roper@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.