* [PATCH v4 00/27] Recover from failure to probe GPU
From: Mario Limonciello @ 2023-01-03 22:18 UTC
  To: Alex Deucher
  Cc: Javier Martinez Canillas, Carlos Soriano Sanchez, amd-gfx,
	dri-devel, David Airlie, Daniel Vetter, christian.koenig,
	Lazar Lijo, Mario Limonciello, linux-kernel

One of the first things that KMS drivers do during initialization is
destroy the system firmware framebuffer by means of
`drm_aperture_remove_conflicting_pci_framebuffers`.

This means that if the GPU fails to probe for any reason, the user
will be stuck with, at best, a screen frozen on the last thing that
was shown before the KMS driver continued its probe.

The problem is most pronounced when new GPU support is introduced
because users will need to have a recent linux-firmware snapshot
on their system when they boot a kernel with matching support.

However, the problem is further exacerbated in the case of amdgpu because
it has migrated to "IP discovery" where amdgpu will attempt to load
on "ALL" AMD GPUs even if the driver is missing support for IP blocks
contained in that GPU.

IP discovery requires some probing and isn't run until after the
framebuffer has been destroyed.

This means a situation can occur where a user purchases a new GPU not
yet supported by a distribution and, when booting the installer, the screen
will "freeze" even though the distribution's kernel has no matching support
for those IP blocks.

The perfect example of this is Ubuntu 22.10 and the new dGPUs just
launched by AMD.  The installation media ships with kernel 5.19 (which
has IP discovery) but the amdgpu support for those IP blocks landed in
kernel 6.0. The matching linux-firmware was released after 22.10's launch.
The screen will freeze without nomodeset. Even if a user manages to install
and then upgrades to kernel 6.0 after install they'll still have the
problem of missing firmware, and the same experience.

This is quite jarring for users, particularly if they don't know
that they have to use "nomodeset" to install.

To help the situation, make changes to GPU discovery:
1) Delay releasing the firmware framebuffer until after early_init has
completed.  This will help the situation of an older kernel that doesn't
yet support the IP blocks probing a new GPU. IP discovery will have failed.
2) Request loading all PSP, VCN, SDMA, SMU, DMCUB, MES and GC microcode
into memory during early_init. This will help the situation of a kernel new
enough for the IP discovery phase to otherwise pass, but with microcode
missing from linux-firmware.git.
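
As a rough userspace illustration of the ordering described in 1) and 2),
the sketch below models a probe that only removes the firmware framebuffer
once early_init has succeeded. It is not driver code: every name in it
(`model_gpu`, `model_early_init`, `model_remove_firmware_fb`, `model_probe`)
is hypothetical, and the use of -ENODEV as the failure code is an assumption
made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a GPU being probed; not a real amdgpu structure. */
struct model_gpu {
	bool ip_blocks_supported; /* kernel knows every IP block on the GPU */
	bool microcode_present;   /* linux-firmware provides the needed files */
	bool firmware_fb_alive;   /* system firmware framebuffer still usable */
};

/* Stand-in for early_init: IP discovery first, then microcode requests. */
static int model_early_init(const struct model_gpu *gpu)
{
	assert(gpu != NULL);
	if (!gpu->ip_blocks_supported)
		return -ENODEV; /* case 1: older kernel, newer GPU (assumed errno) */
	if (!gpu->microcode_present)
		return -ENODEV; /* case 2: newer kernel, older linux-firmware */
	return 0;
}

/* Stand-in for removing the conflicting firmware framebuffer. */
static void model_remove_firmware_fb(struct model_gpu *gpu)
{
	gpu->firmware_fb_alive = false;
}

/* Proposed ordering: run early_init, and only then release the framebuffer. */
int model_probe(struct model_gpu *gpu)
{
	int ret = model_early_init(gpu);

	if (ret)
		return ret; /* bail out with the firmware framebuffer intact */
	model_remove_firmware_fb(gpu);
	return 0;
}
```

In this model a failed probe returns an error while `firmware_fb_alive`
stays true, which corresponds to the user keeping a working display instead
of a frozen one.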

v3->v4:
 * Rework to delay framebuffer release until early_init is done
 * Make individual IPs load microcode during early init phase
 * Add SMU and DMCUB cases for early_init loading
 * Add some new helper code for wrapping request_firmware calls (needed for
   early_init to return something besides -ENOENT)
v2->v3:
 * Pick up tags for patches 1-10
 * Rework patch 11 to not validate during discovery
 * Fix bugs with GFX9 due to gfx.num_gfx_rings not being set during discovery
 * Fix naming scheme for SDMA on dGPUs
v1->v2:
 * Take the suggestion from v1 thread to delay the framebuffer release until
   IP discovery is done. This patch is CC'd to stable so that older stable
   kernels with IP discovery won't try to probe unknown IP.
 * Drop changes to drm aperture.
 * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
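
The v3->v4 note about wrapping request_firmware calls can be pictured with
the userspace sketch below. It is only an illustration: `model_fw_loader`,
`model_ucode_load` and the two example loaders are hypothetical names, and
both the reason for avoiding -ENOENT (the caller giving that value a
different meaning than "probe failed") and the choice of -ENODEV as the
replacement are assumptions, not taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical loader callback standing in for a firmware request. */
typedef int (*model_fw_loader)(const char *name);

/*
 * Wrap a firmware request so that a missing file is never reported as
 * -ENOENT. Assumption: the early_init caller would not treat -ENOENT as
 * a probe failure, so it is remapped to -ENODEV here.
 */
int model_ucode_load(model_fw_loader load, const char *name)
{
	int ret;

	assert(load != NULL && name != NULL);
	ret = load(name);
	if (ret == -ENOENT)
		return -ENODEV;
	return ret; /* success and all other errors pass through unchanged */
}

/* Example loaders used to exercise the wrapper. */
int model_loader_missing(const char *name) { (void)name; return -ENOENT; }
int model_loader_ok(const char *name) { (void)name; return 0; }
```

With a wrapper of this shape each IP block can call one helper during
early_init and propagate its return value directly.
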
Mario Limonciello (27):
  drm/amd: Delay removal of the firmware framebuffer
  drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
  drm/amd: Convert SMUv11 microcode to use
    `amdgpu_ucode_ip_version_decode`
  drm/amd: Convert SMUv13 microcode to use
    `amdgpu_ucode_ip_version_decode`
  drm/amd: Add a new helper for loading/validating microcode
  drm/amd: Use `amdgpu_ucode_load` helper for SDMA
  drm/amd: Convert SDMA to use `amdgpu_ucode_ip_version_decode`
  drm/amd: Make SDMA firmware load failures less noisy.
  drm/amd: Use `amdgpu_ucode_load` helper for VCN
  drm/amd: Load VCN microcode during early_init
  drm/amd: Load MES microcode during early_init
  drm/amd: Use `amdgpu_ucode_load` helper for MES
  drm/amd: Remove superfluous assignment for `adev->mes.adev`
  drm/amd: Use `amdgpu_ucode_load` helper for GFX9
  drm/amd: Load GFX9 microcode during early_init
  drm/amd: Use `amdgpu_ucode_load` helper for GFX10
  drm/amd: Load GFX10 microcode during early_init
  drm/amd: Use `amdgpu_ucode_load` helper for GFX11
  drm/amd: Load GFX11 microcode during early_init
  drm/amd: Parse both v1 and v2 TA microcode headers using same function
  drm/amd: Avoid BUG() for case of SRIOV missing IP version
  drm/amd: Load PSP microcode during early_init
  drm/amd: Use `amdgpu_ucode_load` helper for PSP
  drm/amd/display: Load DMUB microcode during early_init
  drm/amd: Use `amdgpu_ucode_load` helper for SMU
  drm/amd: Load SMU microcode during early_init
  drm/amd: Optimize SRIOV switch/case for PSP microcode load

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  60 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       | 276 +++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h       |  15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  18 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     | 245 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h     |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       | 103 +++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h       |   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        | 117 ++------
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 101 +++----
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 101 ++-----
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |  98 ++-----
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  89 ++----
 drivers/gpu/drm/amd/amdgpu/psp_v10_0.c        |  80 +----
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c        | 129 +-------
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c        |  75 +----
 drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  27 +-
 drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c      |  14 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c         |  16 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  47 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  30 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  55 +---
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  25 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c         |   5 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  89 ++++--
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     |  12 +-
 .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  40 +--
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  17 +-
 36 files changed, 751 insertions(+), 1174 deletions(-)

-- 
2.34.1


Thread overview: 102+ messages
2023-01-03 22:18 [PATCH v4 00/27] Recover from failure to probe GPU Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 01/27] drm/amd: Delay removal of the firmware framebuffer Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 02/27] drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode" Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 03/27] drm/amd: Convert SMUv11 microcode to use `amdgpu_ucode_ip_version_decode` Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 04/27] drm/amd: Convert SMUv13 microcode to use `amdgpu_ucode_ip_version_decode` Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 05/27] drm/amd: Add a new helper for loading/validating microcode Mario Limonciello
2023-01-04  4:53   ` Lazar, Lijo
2023-01-04  9:37     ` Christian König
2023-01-03 22:18 ` [PATCH v4 06/27] drm/amd: Use `amdgpu_ucode_load` helper for SDMA Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 07/27] drm/amd: Convert SDMA to use `amdgpu_ucode_ip_version_decode` Mario Limonciello
2023-01-04  4:54   ` Lazar, Lijo
2023-01-03 22:18 ` [PATCH v4 08/27] drm/amd: Make SDMA firmware load failures less noisy Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 09/27] drm/amd: Use `amdgpu_ucode_load` helper for VCN Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 10/27] drm/amd: Load VCN microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 11/27] drm/amd: Load MES microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 12/27] drm/amd: Use `amdgpu_ucode_load` helper for MES Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 13/27] drm/amd: Remove superfluous assignment for `adev->mes.adev` Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 14/27] drm/amd: Use `amdgpu_ucode_load` helper for GFX9 Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 15/27] drm/amd: Load GFX9 microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 16/27] drm/amd: Use `amdgpu_ucode_load` helper for GFX10 Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 17/27] drm/amd: Load GFX10 microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 18/27] drm/amd: Use `amdgpu_ucode_load` helper for GFX11 Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 19/27] drm/amd: Load GFX11 microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 20/27] drm/amd: Parse both v1 and v2 TA microcode headers using same function Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 21/27] drm/amd: Avoid BUG() for case of SRIOV missing IP version Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 22/27] drm/amd: Load PSP microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 23/27] drm/amd: Use `amdgpu_ucode_load` helper for PSP Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 24/27] drm/amd/display: Load DMUB microcode during early_init Mario Limonciello
2023-01-04 15:52   ` Harry Wentland
2023-01-03 22:18 ` [PATCH v4 25/27] drm/amd: Use `amdgpu_ucode_load` helper for SMU Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 26/27] drm/amd: Load SMU microcode during early_init Mario Limonciello
2023-01-03 22:18 ` [PATCH v4 27/27] drm/amd: Optimize SRIOV switch/case for PSP microcode load Mario Limonciello
2023-01-04 13:18   ` Christian König
2023-01-04 15:42     ` Limonciello, Mario
