All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: José Roberto de Souza <jose.souza@intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: Jani Nikula <jani.nikula@intel.com>
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915: Add missing deinitialization cases of load failure
Date: Wed, 15 Apr 2020 20:51:14 +0100	[thread overview]
Message-ID: <158698027435.24667.4844508474670812797@build.alporthouse.com> (raw)
In-Reply-To: <20200415191408.82574-1-jose.souza@intel.com>

Quoting José Roberto de Souza (2020-04-15 20:14:08)
> The intel_display_power_put_async() used in TC cold sequences made
> easy to hit the missing deinitialization of driver in case of load
> failure as seen in the stack trace bellow.
> 
> intel_modeset_driver_remove_noirq() had to be removed from
> i915_driver_modeset_remove_noirq() as those are different
> initialialition steps with IRQ and GEM initialization in between then.
> 
> [drm:__intel_engine_init_ctx_wa [i915]] Initialized 3 context workarounds on rcs'0
> [drm:__i915_inject_probe_error [i915]] Injecting failure -19 at checkpoint 36 [__uc_init:294]
> [drm:i915_hdcp_component_unbind [i915]] I915 HDCP comp unbind
> [drm:edp_panel_vdd_off_sync [i915]] Turning [ENCODER:275:DDI A] VDD off
> [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x00000000 PP_CONTROL: 0x00000060
> [drm:intel_power_well_disable [i915]] disabling AUX A
> general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 3 PID: 1142 Comm: kworker/u16:20 Tainted: G     U            5.6.0-CI-Patchwork_17226+ #1
> Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> Workqueue: events_unbound intel_display_power_put_async_work [i915]
> RIP: 0010:__intel_display_power_put_domain+0xa5/0x180 [i915]
> Code: 48 85 c0 78 54 44 89 e1 41 bd 01 00 00 00 49 c7 c4 80 44 41 a0 49 d3 e5 eb 0d 48 83 eb 10 48 3b 9d 08 ad 00 00 78 32 48 8b 03 <4c> 85 68 10 74 ea 8b 53 08 85 d2 74 2d 83 ea 01 85 d2 89 53 08 75
> RSP: 0018:ffffc9000061fdb0 EFLAGS: 00010206
> RAX: 6b6b6b6b6b6b6b6b RBX: ffff8884948f5df0 RCX: 000000000000003d
> RDX: 0000000080000001 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff888479be0000 R08: ffff88849a180920 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0414480
> R13: 2000000000000000 R14: ffff888479beb320 R15: 2000000000000000
> FS:  0000000000000000(0000) GS:ffff88849ff80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005634fa8ed670 CR3: 0000000005610004 CR4: 0000000000760ee0
> PKRU: 55555554
> Call Trace:
>  release_async_put_domains+0x9b/0x110 [i915]
>  intel_display_power_put_async_work+0x91/0xf0 [i915]
>  process_one_work+0x260/0x600
>  ? worker_thread+0xc9/0x380
>  worker_thread+0x37/0x380
>  ? process_one_work+0x600/0x600
>  kthread+0x119/0x130
>  ? kthread_park+0x80/0x80
>  ret_from_fork+0x24/0x50
> Modules linked in: i915(+) vgem snd_hda_codec_hdmi mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul cdc_ether usbnet mii snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core e1000e ptp mei_me snd_pcm pps_core mei intel_lpss_pci prime_numbers [last unloaded: i915]
> ---[ end trace b402d1b4060f8b97 ]---
> BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1142, name: kworker/u16:20
> INFO: lockdep is turned off.
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 3 PID: 1142 Comm: kworker/u16:20 Tainted: G     UD           5.6.0-CI-Patchwork_17226+ #1
> Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> Workqueue: events_unbound intel_display_power_put_async_work [i915]
> Call Trace:
>  dump_stack+0x71/0x9b
>  ___might_sleep+0x178/0x260
>  wait_for_completion+0x37/0x1a0
>  virt_efi_query_variable_info+0x161/0x1b0
>  efi_query_variable_store+0xb3/0x1a0
>  ? efivar_entry_set_safe+0x19c/0x220
>  efivar_entry_set_safe+0x19c/0x220
>  ? efi_pstore_write+0x10b/0x150
>  ? efi_pstore_write+0xa0/0x150
>  efi_pstore_write+0x10b/0x150
>  pstore_dump+0x123/0x340
>  kmsg_dump+0x87/0x1b0
>  oops_end+0x3e/0x90
>  do_general_protection+0x1c3/0x2f0
>  general_protection+0x2d/0x40
> RIP: 0010:__intel_display_power_put_domain+0xa5/0x180 [i915]
> Code: 48 85 c0 78 54 44 89 e1 41 bd 01 00 00 00 49 c7 c4 80 44 41 a0 49 d3 e5 eb 0d 48 83 eb 10 48 3b 9d 08 ad 00 00 78 32 48 8b 03 <4c> 85 68 10 74 ea 8b 53 08 85 d2 74 2d 83 ea 01 85 d2 89 53 08 75
> RSP: 0018:ffffc9000061fdb0 EFLAGS: 00010206
> RAX: 6b6b6b6b6b6b6b6b RBX: ffff8884948f5df0 RCX: 000000000000003d
> RDX: 0000000080000001 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff888479be0000 R08: ffff88849a180920 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0414480
> R13: 2000000000000000 R14: ffff888479beb320 R15: 2000000000000000
>  release_async_put_domains+0x9b/0x110 [i915]
>  intel_display_power_put_async_work+0x91/0xf0 [i915]
>  process_one_work+0x260/0x600
>  ? worker_thread+0xc9/0x380
>  worker_thread+0x37/0x380
>  ? process_one_work+0x600/0x600
>  kthread+0x119/0x130
>  ? kthread_park+0x80/0x80
>  ret_from_fork+0x24/0x50
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 1142 at kernel/rcu/tree_plugin.h:293 rcu_note_context_switch+0x87/0x650
> Modules linked in: i915(+) vgem snd_hda_codec_hdmi mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul cdc_ether usbnet mii snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core e1000e ptp mei_me snd_pcm pps_core mei intel_lpss_pci prime_numbers [last unloaded: i915]
> 
> v2:
> - fixed handling in case of failure in drm_vblank_init()
> - moved i915_gem_driver_remove() call to before
> i915_driver_modeset_remove_noirq() this match initialization order too
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1647
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 641f5e03b661..e31535744060 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -228,7 +228,7 @@ static int i915_driver_modeset_probe_noirq(struct drm_i915_private *i915)
>                 ret = drm_vblank_init(&i915->drm,
>                                       INTEL_NUM_PIPES(i915));
>                 if (ret)
> -                       goto out;
> +                       return ret;
>         }
>  
>         intel_bios_init(i915);
> @@ -248,8 +248,11 @@ static int i915_driver_modeset_probe_noirq(struct drm_i915_private *i915)
>         return 0;
>  
>  cleanup_vga_client:
> +       intel_csr_ucode_fini(i915);
> +       intel_power_domains_driver_remove(i915);
>         intel_vga_unregister(i915);
>  out:

Someone might quibble that the label should be more descriptive of its
new role.

> +       intel_bios_driver_remove(i915);
>         return ret;
>  }
>  
> @@ -308,13 +311,13 @@ static void i915_driver_modeset_remove(struct drm_i915_private *i915)
>  /* part #2: call after irq uninstall */
>  static void i915_driver_modeset_remove_noirq(struct drm_i915_private *i915)
>  {
> -       intel_modeset_driver_remove_noirq(i915);

moved to caller

> +       intel_csr_ucode_fini(i915);

reordered

>  
> -       intel_bios_driver_remove(i915);

reordered

> +       intel_power_domains_driver_remove(i915);

pulled from caller

>  
>         intel_vga_unregister(i915);
>  
> -       intel_csr_ucode_fini(i915);
> +       intel_bios_driver_remove(i915);

reordered.

>  }
>  
>  static void intel_init_dpio(struct drm_i915_private *dev_priv)
> @@ -992,7 +995,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  out_cleanup_irq:
>         intel_irq_uninstall(i915);
>  out_cleanup_modeset:
> -       /* FIXME */
> +       i915_driver_modeset_remove_noirq(i915);
>  out_cleanup_hw:
>         i915_driver_hw_remove(i915);
>         intel_memory_regions_driver_release(i915);
> @@ -1029,12 +1032,13 @@ void i915_driver_remove(struct drm_i915_private *i915)
>  
>         intel_irq_uninstall(i915);
>  
> -       i915_driver_modeset_remove_noirq(i915);
> +       intel_modeset_driver_remove_noirq(i915);
>  
> -       i915_reset_error_state(i915);
>         i915_gem_driver_remove(i915);
>  
> -       intel_power_domains_driver_remove(i915);
> +       i915_driver_modeset_remove_noirq(i915);
> +
> +       i915_reset_error_state(i915);

Ok, so the net result is just that the sequence is reordered (plus the
FIXME). Aiui, fixing the FIXME is the critical part for
reload-with-fault-injection.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2020-04-15 19:51 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-15 19:14 [Intel-gfx] [PATCH v2] drm/i915: Add missing deinitialization cases of load failure José Roberto de Souza
2020-04-15 19:22 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2020-04-15 19:36 ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-04-16 19:10   ` Souza, Jose
2020-04-15 19:39 ` [Intel-gfx] ✗ Fi.CI.DOCS: warning for " Patchwork
2020-04-15 19:45 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-04-15 19:51 ` Chris Wilson [this message]
2020-04-16 16:08 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2020-04-16 16:42 ` [Intel-gfx] [PATCH v2] " Imre Deak
2020-04-16 17:03   ` Souza, Jose
2020-04-16 17:06     ` Imre Deak
2020-04-16 17:20       ` Souza, Jose
2020-04-16 17:28         ` Imre Deak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=158698027435.24667.4844508474670812797@build.alporthouse.com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jani.nikula@intel.com \
    --cc=jose.souza@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.