* amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
@ 2021-10-24 14:12 PGNet Dev
  2021-10-25 13:48 ` PGNet Dev
  0 siblings, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-24 14:12 UTC (permalink / raw)
  To: dri-devel

i've a dual gpu system

	inxi -GS
		System:    Host: ws05 Kernel: 5.14.13-200.fc34.x86_64 x86_64 bits: 64 Console: tty pts/0
		           Distro: Fedora release 34 (Thirty Four)
(1)		Graphics:  Device-1: NVIDIA GK208B [GeForce GT 710] driver: nvidia v: 470.74
(2)		           Device-2: Advanced Micro Devices [AMD/ATI] Cezanne driver: N/A
		           Display: server: X.org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,vesa
		           Message: Advanced graphics data unavailable for root.

running on

	cpu:    Ryzen 5 5600G
	mobo:   ASRockRack X470D4U
	bios:   vP4.20, 04/14/2021
	kernel: 5.14.13-200.fc34.x86_64 x86_64

where,

	the nvidia is a PCIe card
	the amdgpu is the Ryzen-integrated gpu

the nvidia PCI is currently my primary
it's screen-attached, and boots/functions correctly

	lsmod | grep nvidia
		nvidia_drm             69632  0
		nvidia_modeset       1200128  1 nvidia_drm
		nvidia              35332096  1 nvidia_modeset
		drm_kms_helper        303104  2 amdgpu,nvidia_drm
		drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm

	dmesg | grep -i nvidia
		[    5.755494] nvidia: loading out-of-tree module taints kernel.
		[    5.755503] nvidia: module license 'NVIDIA' taints kernel.
		[    5.759769] nvidia: module verification failed: signature and/or required key missing - tainting kernel
		[    5.774894] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
		[    5.775299] nvidia 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
		[    5.975449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.74  Mon Sep 13 23:09:15 UTC 2021
		[    6.013181] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.74  Mon Sep 13 22:59:50 UTC 2021
		[    6.016444] [drm] [nvidia-drm] [GPU ID 0x00001000] Loading driver
		[    6.227295] caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
		[    6.954906] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:00.0 on minor 0
		[   16.820758] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input13
		[   16.820776] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input14
		[   16.820808] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input15
		[   16.820826] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input16
		[   16.820841] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input17

the amdgpu is not (currently/yet) in use; no attached screen

in BIOS, currently,

	'PCI Express' (nvidia gpu) is selected as primary
	'HybridGraphics' is enabled
	'OnBoard VGA' is enabled


on boot, mods are loaded

	lsmod | grep gpu
		amdgpu               7802880  0
		drm_ttm_helper         16384  1 amdgpu
		ttm                    81920  2 amdgpu,drm_ttm_helper
		iommu_v2               24576  1 amdgpu
		gpu_sched              45056  1 amdgpu
		drm_kms_helper        303104  2 amdgpu,nvidia_drm
		drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
		i2c_algo_bit           16384  2 igb,amdgpu

but i see a 'fatal error' and 'failed' probe,

	dmesg | grep -i amdgpu
		[    5.161923] [drm] amdgpu kernel modesetting enabled.
		[    5.162097] amdgpu: Virtual CRAT table created for CPU
		[    5.162104] amdgpu: Topology: Add CPU node
		[    5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
		[    5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
		[    5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
		[    5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
		[    5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
		[    5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
		[    5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
		[    5.174463]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
		[    5.174594]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
		[    5.174706]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
		[    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22


are specific configs from

	https://www.kernel.org/doc/html/latest/gpu/amdgpu.html

required to avoid/workaround the init error?  or known bug?
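
For triage, the failing lines in the amdgpu log above can be picked out mechanically. The following is only a sketch: the marker strings are copied from the dmesg output above, while the function name and the short labels are made up for illustration and are not part of any kernel or amdgpu tooling.

```python
import re

# Substrings copied from the dmesg output in this message.
# The labels on the right are illustrative names, not kernel terminology.
MARKERS = {
    "Unable to locate a BIOS ROM": "vbios-not-found",
    "Fatal error during GPU init": "init-fatal",
    "failed with error": "probe-failed",
}

# Matches a leading dmesg timestamp such as "[    5.174413] ".
TIMESTAMP = re.compile(r"^\s*\[\s*\d+\.\d+\]\s*")


def summarize_amdgpu_failures(dmesg_text):
    """Return (label, message) pairs for amdgpu lines containing a known marker."""
    hits = []
    for line in dmesg_text.splitlines():
        if "amdgpu" not in line:
            continue
        message = TIMESTAMP.sub("", line).strip()
        for needle, label in MARKERS.items():
            if needle in message:
                hits.append((label, message))
    return hits
```

Feeding it the text of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) should list the three failure lines shown above in log order.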


* amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-24 14:12 amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13 PGNet Dev
@ 2021-10-25 13:48 ` PGNet Dev
  2021-10-25 13:58   ` PGNet Dev
  2021-10-25 14:15   ` Alex Deucher
  0 siblings, 2 replies; 13+ messages in thread
From: PGNet Dev @ 2021-10-25 13:48 UTC (permalink / raw)
  To: amd-gfx

( cc'ing this here, OP -> dri-devel@ )



* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 13:48 ` PGNet Dev
@ 2021-10-25 13:58   ` PGNet Dev
  2021-10-25 14:15   ` Alex Deucher
  1 sibling, 0 replies; 13+ messages in thread
From: PGNet Dev @ 2021-10-25 13:58 UTC (permalink / raw)
  To: amd-gfx, dri-devel

adding a trace,

...
[    5.201715] [drm] amdgpu kernel modesetting enabled.
[    5.201902] amdgpu: Virtual CRAT table created for CPU
[    5.201909] amdgpu: Topology: Add CPU node
[    5.201968] checking generic (e1000000 1d5000) vs hw (c0000000 10000000)
[    5.201969] checking generic (e1000000 1d5000) vs hw (d0000000 200000)
[    5.201970] checking generic (e1000000 1d5000) vs hw (fc500000 80000)
[    5.201988] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
[    5.202020] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1002:0x1636 0xC9).
[    5.202024] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    5.202033] [drm] register mmio base: 0xFC500000
[    5.202033] [drm] register mmio size: 524288
[    5.202035] [drm] PCIE atomic ops is not supported
[    5.203075] [drm] add ip block number 0 <soc15_common>
[    5.203076] [drm] add ip block number 1 <gmc_v9_0>
[    5.203077] [drm] add ip block number 2 <vega10_ih>
[    5.203078] [drm] add ip block number 3 <psp>
[    5.203078] [drm] add ip block number 4 <smu>
[    5.203079] [drm] add ip block number 5 <gfx_v9_0>
[    5.203079] [drm] add ip block number 6 <sdma_v4_0>
[    5.203080] [drm] add ip block number 7 <dm>
[    5.203081] [drm] add ip block number 8 <vcn_v2_0>
[    5.203081] [drm] add ip block number 9 <jpeg_v2_0>
[    5.208784] [drm] BIOS signature incorrect 0 0
[    5.208789] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    5.214038] [drm] BIOS signature incorrect 0 0
[    5.214042] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
[    5.214044] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
[    5.214045] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
[    5.214048] ------------[ cut here ]------------
[    5.214049] WARNING: CPU: 5 PID: 539 at kernel/workqueue.c:3044 __flush_work.isra.0+0x1ef/0x200
[    5.214054] Modules linked in: fjes(-) amdgpu(+) raid1 ast drm_vram_helper drm_ttm_helper iommu_v2 ttm gpu_sched drm_kms_helper igb crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel cec dca i2c_algo_bit sp5100_tco ccp drm uas usb_storage wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
[    5.214065] CPU: 5 PID: 539 Comm: systemd-udevd Not tainted 5.14.13-200.fc34.x86_64 #1
[    5.214067] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P4.20 04/14/2021
[    5.214068] RIP: 0010:__flush_work.isra.0+0x1ef/0x200
[    5.214070] Code: 8b 4d 00 48 8b 55 08 83 e1 08 48 0f ba 6d 00 03 80 c9 f0 e9 37 ff ff ff 0f 0b 48 83 c4 48 44 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 e9 46 ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00
[    5.214071] RSP: 0018:ffff9d5f00f0fa80 EFLAGS: 00010246
[    5.214073] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000027
[    5.214074] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88bc91e25ab8
[    5.214074] RBP: ffff88bc91e25ab8 R08: 0000000000000000 R09: ffff9d5f00f0f898
[    5.214075] R10: ffff9d5f00f0f890 R11: ffff88c39e1fcfe8 R12: 0000000000000001
[    5.214075] R13: ffff88bc92622800 R14: ffff88bc91e20000 R15: ffff9d5f00f0fde0
[    5.214076] FS:  00007f231d7deb40(0000) GS:ffff88c37df40000(0000) knlGS:0000000000000000
[    5.214077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.214078] CR2: 00007fa1bbfa5ff0 CR3: 0000000104b94000 CR4: 0000000000750ea0
[    5.214078] PKRU: 55555554
[    5.214079] Call Trace:
[    5.214082]  ? dev_printk_emit+0x3e/0x40
[    5.214085]  __cancel_work_timer+0xea/0x170
[    5.214086]  ? del_timer_sync+0x57/0x80
[    5.214089]  ttm_bo_lock_delayed_workqueue+0x11/0x20 [ttm]
[    5.214093]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
[    5.214225]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
[    5.214338]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
[    5.214420]  local_pci_probe+0x42/0x80
[    5.214423]  ? __cond_resched+0x16/0x40
[    5.214426]  pci_device_probe+0xd9/0x190
[    5.214427]  really_probe+0x1f5/0x3f0
[    5.214429]  __driver_probe_device+0xfe/0x180
[    5.214430]  driver_probe_device+0x1e/0x90
[    5.214431]  __driver_attach+0xc0/0x1c0
[    5.214433]  ? __device_attach_driver+0xe0/0xe0
[    5.214434]  ? __device_attach_driver+0xe0/0xe0
[    5.214434]  bus_for_each_dev+0x64/0x90
[    5.214436]  bus_add_driver+0x12b/0x1e0
[    5.214438]  driver_register+0x8f/0xe0
[    5.214439]  ? 0xffffffffc0d62000
[    5.214440]  do_one_initcall+0x44/0x1d0
[    5.214443]  ? kmem_cache_alloc_trace+0x15c/0x280
[    5.214445]  do_init_module+0x5c/0x270
[    5.214448]  __do_sys_init_module+0x11d/0x180
[    5.214450]  do_syscall_64+0x3b/0x90
[    5.214452]  ? handle_mm_fault+0xcf/0x2a0
[    5.214454]  ? do_user_addr_fault+0x1d5/0x680
[    5.214457]  ? syscall_exit_to_user_mode+0x18/0x40
[    5.214458]  ? exc_page_fault+0x72/0x150
[    5.214459]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    5.214461] RIP: 0033:0x7f231e42a0fe
[    5.214463] Code: 48 8b 0d 7d 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4a 1d 0c 00 f7 d8 64 89 01 48
[    5.214463] RSP: 002b:00007ffd8d40e9e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    5.214465] RAX: ffffffffffffffda RBX: 000055f6d5e2b7f0 RCX: 00007f231e42a0fe
[    5.214465] RDX: 00007f231e57d32c RSI: 0000000000d4ebde RDI: 00007f231bc53010
[    5.214466] RBP: 00007f231bc53010 R08: 000055f6d5e0e050 R09: 0000000000d4ebf0
[    5.214466] R10: 000055f38ab3874e R11: 0000000000000246 R12: 00007f231e57d32c
[    5.214467] R13: 000055f6d5e0f840 R14: 0000000000000007 R15: 000055f6d5deff30
[    5.214468] ---[ end trace ce1b3e6fbbcac425 ]---
[    5.214481] x86/PAT: systemd-udevd:539 freeing invalid memtype [mem 0x00000000-0xffffffffffffffff]
[    5.214539] amdgpu: probe of 0000:30:00.0 failed with error -22
[
...
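
To see which driver functions sit on the warning's call path, the "Call Trace:" section of a log like the one above can be reduced to its frames. This is a rough sketch with hypothetical names (`parse_call_trace`, the dict keys). It assumes the `symbol+0xoff/0xlen [module]` frame layout seen in this trace, and it treats a leading `?` as marking a frame the unwinder could not verify.

```python
import re

# One stack frame, e.g. "[    5.214093]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]"
FRAME = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?"                    # optional dmesg timestamp
    r"(?P<unreliable>\?\s+)?"                         # '?' = frame not verified by the unwinder
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # symbol+offset/length
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"                # optional [module]
)


def parse_call_trace(log_text):
    """Return the frames between 'Call Trace:' and the next 'RIP:' / 'end trace' line."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "RIP:" in line or "end trace" in line:
            break
        match = FRAME.match(line)
        if match is None:
            continue  # e.g. a bare address such as "? 0xffffffffc0d62000"
        frames.append({
            "symbol": match.group("symbol"),
            "module": match.group("module"),
            "reliable": match.group("unreliable") is None,
        })
    return frames
```

Filtering the result on `module == "amdgpu"` leaves the frames inside the driver, which for the trace above are the three `amdgpu_*` functions.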


* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 13:48 ` PGNet Dev
  2021-10-25 13:58   ` PGNet Dev
@ 2021-10-25 14:15   ` Alex Deucher
  2021-10-25 15:15     ` Lazar, Lijo
  2021-10-30 15:24     ` amdgpu on Ryzen 5600G -- 'purple' background [WAS: Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13} PGNet Dev
  1 sibling, 2 replies; 13+ messages in thread
From: Alex Deucher @ 2021-10-25 14:15 UTC (permalink / raw)
  To: PGNet Dev; +Cc: amd-gfx list

On Mon, Oct 25, 2021 at 9:48 AM PGNet Dev <pgnet.dev@gmail.com> wrote:
>
> ( cc'ing this here, OP -> dri-devel@ )
>
> but i see a 'fatal error' and 'failed' probe,
>
>         dmesg | grep -i amdgpu
>                 [    5.161923] [drm] amdgpu kernel modesetting enabled.
>                 [    5.162097] amdgpu: Virtual CRAT table created for CPU
>                 [    5.162104] amdgpu: Topology: Add CPU node
>                 [    5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
>                 [    5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
>                 [    5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
>                 [    5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
>                 [    5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
>                 [    5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
>                 [    5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
>                 [    5.174463]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
>                 [    5.174594]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
>                 [    5.174706]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
>                 [    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22
>
>
> are specific configs from
>
>         https://www.kernel.org/doc/html/latest/gpu/amdgpu.html
>
> required to avoid/workaround the init error?  or known bug?

The driver is not able to find the vbios image which is required for
the driver to properly enumerate the hardware.  I would guess it's a
platform issue.  Is there a newer sbios image available for your
platform?  You might try that or check if there are any options in the
sbios regarding the behavior of the integrated graphics when an
external GPU is present.  I suspect one of the following is the
problem:
1. The sbios should disable the integrated graphics when a dGPU is
present, but due to a bug in the sbios or a particular sbios setting
it has failed to.
2. The sbios should be providing a vbios image for the integrated
graphics, but due to a bug in the sbios or a particular sbios setting
it has failed to.
3. The platform uses some alternative method to provide access to the
vbios image for the integrated graphics that Linux does not yet
handle.

I would start with an sbios update if possible.

Alex
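
One way to probe the "no vbios image" possibilities from userspace is to look at whatever the PCI `rom` attribute in sysfs exposes for the device. This is a sketch under assumptions, not the driver's own lookup. A PCI expansion ROM image begins with the bytes 0x55 0xAA, which would fit the "BIOS signature incorrect 0 0" line in the earlier trace if both bytes read back as zero. `rom_signature_ok` and `read_pci_rom` are hypothetical helper names. Reading the attribute needs root, and an empty or missing ROM is not conclusive on its own, since an integrated GPU's image may be supplied some other way (item 3 above).

```python
from pathlib import Path


def rom_signature_ok(rom_bytes):
    """True if the image starts with the PCI expansion ROM signature 0x55 0xAA."""
    return len(rom_bytes) >= 2 and rom_bytes[0] == 0x55 and rom_bytes[1] == 0xAA


def read_pci_rom(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read a device's ROM via sysfs (hypothetical helper; run as root).

    Returns None when the device has no 'rom' attribute, and b"" when the
    attribute exists but the ROM cannot be read.
    """
    rom = Path(sysfs_root) / bdf / "rom"
    if not rom.exists():
        return None
    rom.write_text("1")          # sysfs convention: write "1" to enable ROM reads
    try:
        return rom.read_bytes()
    except OSError:
        return b""
    finally:
        rom.write_text("0")      # disable access again
```

For the device in this thread the call would be `read_pci_rom("0000:30:00.0")`, followed by `rom_signature_ok(...)` on any bytes returned.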


* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 14:15   ` Alex Deucher
@ 2021-10-25 15:15     ` Lazar, Lijo
  2021-10-25 16:32       ` PGNet Dev
  2021-10-29 22:13       ` PGNet Dev
  2021-10-30 15:24     ` amdgpu on Ryzen 5600G -- 'purple' background [WAS: Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13} PGNet Dev
  1 sibling, 2 replies; 13+ messages in thread
From: Lazar, Lijo @ 2021-10-25 15:15 UTC (permalink / raw)
  To: Alex Deucher, PGNet Dev; +Cc: amd-gfx list



On 10/25/2021 7:45 PM, Alex Deucher wrote:
> On Mon, Oct 25, 2021 at 9:48 AM PGNet Dev <pgnet.dev@gmail.com> wrote:
>>
>> ( cc'ing this here, OP -> dri-devel@ )
>>
>> but i see a 'fatal error' and 'failed' probe,
>>
>>          dmesg | grep -i amdgpu
>>                  [    5.161923] [drm] amdgpu kernel modesetting enabled.
>>                  [    5.162097] amdgpu: Virtual CRAT table created for CPU
>>                  [    5.162104] amdgpu: Topology: Add CPU node
>>                  [    5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
>>                  [    5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
>>                  [    5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
>>                  [    5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
>>                  [    5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
>>                  [    5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
>>                  [    5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
>>                  [    5.174463]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
>>                  [    5.174594]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
>>                  [    5.174706]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
>>                  [    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22
>>
>>
>> are specific configs from
>>
>>          https://www.kernel.org/doc/html/latest/gpu/amdgpu.html
>>
>> required to avoid/workaround the init error?  or known bug?
> 
> The driver is not able to find the vbios image which is required for
> the driver to properly enumerate the hardware.  I would guess it's a
> platform issue.  Is there a newer sbios image available for your
> platform?  You might try that or check if there are any options in the
> sbios regarding the behavior of the integrated graphics when an
> external GPU is present.  I suspect one of the following is the
> problem:
> 1. The sbios should disable the integrated graphics when a dGPU is
> present, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 2. The sbios should be providing a vbios image for the integrated
> graphics, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 3. The platform uses some alternative method to provide access to the
> vbios image for the integrated graphics that Linux does not yet
> handle.
> 
To add to the list - check if ACPI support is broken or skipped.

Thanks,
Lijo

> I would start with an sbios update if possible.
> 
> Alex
> 
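
A quick first pass at the ACPI suggestion could look like the sketch below. It rests on two assumptions that should be verified on the machine in question: that ACPI being skipped would show up as `acpi=off` on the kernel command line, and that the ACPI table amdgpu can take a vbios image from is named `VFCT` and, when present, appears under `/sys/firmware/acpi/tables`. Both function names are hypothetical.

```python
from pathlib import Path


def acpi_disabled_on_cmdline(cmdline):
    """True if the kernel command line contains the 'acpi=off' parameter."""
    return "acpi=off" in cmdline.split()


def acpi_table_present(name, tables_dir="/sys/firmware/acpi/tables"):
    """True if a firmware ACPI table with this name is exposed in sysfs."""
    return (Path(tables_dir) / name).exists()
```

Typical use would be `acpi_disabled_on_cmdline(Path("/proc/cmdline").read_text())` and `acpi_table_present("VFCT")`. A missing table would only be a hint that the sbios is not handing over the image this way, not proof of a bug.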


* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 15:15     ` Lazar, Lijo
@ 2021-10-25 16:32       ` PGNet Dev
  2021-10-25 17:48         ` PGNet Dev
  2021-10-29 22:13       ` PGNet Dev
  1 sibling, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-25 16:32 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx, dri-devel

hi

>> The driver is not able to find the vbios image which is required for
>> the driver to properly enumerate the hardware.  I would guess it's a
>> platform issue.  Is there a newer sbios image available for your
>> platform?
...
>> I would start with an sbios update if possible.

not that i'm aware.

this board is an ASRockRack X470D4U server board

	https://www.asrockrack.com/general/productdetail.asp?Model=X470D4U#Specifications

latest BIOS

	https://www.asrockrack.com/general/productdetail.asp?Model=X470D4U#Download

is,

	4.20	8/12/2021	BIOS

which is installed,

	dmidecode | grep "BIOS Information" -A5
	BIOS Information
         	Vendor: American Megatrends International, LLC.
         	Version: P4.20
         	Release Date: 04/14/2021
         	Address: 0xF0000
         	Runtime Size: 64 kB
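
A minimal sketch, assuming the `dmidecode` text layout shown above, for pulling the installed BIOS version and release date into a form that is easy to compare against the vendor's download page. The function name `parse_bios_info` is hypothetical, and the code only parses text that has already been captured; it does not run `dmidecode` itself.

```python
import re

def parse_bios_info(text):
    """Return the 'Key: value' fields of the 'BIOS Information' block as a dict."""
    fields = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BIOS Information":
            in_block = True
            continue
        if in_block:
            match = re.match(r"([^:]+):\s*(.*)", stripped)
            if not match:
                break  # a blank or colon-less line ends the simple key/value part
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields

# Sample copied from the dmidecode output quoted in this message.
sample = """BIOS Information
\tVendor: American Megatrends International, LLC.
\tVersion: P4.20
\tRelease Date: 04/14/2021
\tAddress: 0xF0000
\tRuntime Size: 64 kB
"""
info = parse_bios_info(sample)
print(info["Version"], info["Release Date"])
```

The two dates quoted in this message differ (8/12/2021 on the download page, 04/14/2021 from dmidecode), so comparing the version string is probably the more reliable check.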

>> check if there are any options in the
>> sbios regarding the behavior of the integrated graphics when an
>> external GPU is present.  I suspect one of the following is the
>> problem:
>> 1. The sbios should disable the integrated graphics when a dGPU is
>> present, but due to a bug in the sbios or a particular sbios setting
>> it has failed to.
>> 2. The sbios should be providing a vbios image for the integrated
>> graphics, but due to a bug in the sbios or a particular sbios setting
>> it has failed to.
>> 3. The platform uses some alternative method to provide access to the
>> vbios image for the integrated graphics that Linux does not yet
>> handle.

Checking on the specific options for Dual/Concurrent GPU support is ... challenging ... so far.
I haven't found a clear statement/doc on how it's intended to behave, what options are available, or details on the individual options.

Per a chat late last week with ASRockRack, my understanding is that dual GPU support is *supposed* to work.

atm, I'm not clear on how specifically to test/answer for any of your suspected issues :-/
Reading online to see what to check, etc.

> To add to the list - check if ACPI support is broken or skipped.

It doesn't appear to me to be; here's dmesg output,

dmesg | grep -i acpi
	...
	[    0.000000] BIOS-e820: [mem 0x000000000a200000-0x000000000a20efff] ACPI NVS
	[    0.000000] BIOS-e820: [mem 0x00000000bae2e000-0x00000000bae70fff] ACPI data
	[    0.000000] BIOS-e820: [mem 0x00000000bae71000-0x00000000bd0defff] ACPI NVS
	[    0.000000] reserve setup_data: [mem 0x000000000a200000-0x000000000a20efff] ACPI NVS
	[    0.000000] reserve setup_data: [mem 0x00000000bae2e000-0x00000000bae70fff] ACPI data
	[    0.000000] reserve setup_data: [mem 0x00000000bae71000-0x00000000bd0defff] ACPI NVS
	[    0.000000] efi: ACPI=0xbd0c8000 ACPI 2.0=0xbd0c8014 SMBIOS=0xbdd6d000 SMBIOS 3.0=0xbdd6c000 MEMATTR=0xb5ee6698 ESRT=0xb5804518 MOKvar=0xb568b000 RNG=0xbdd9eb18
	[    0.004625] ACPI: Early table checksum verification disabled
	[    0.004627] ACPI: RSDP 0x00000000BD0C8014 000024 (v02 ALASKA)
	[    0.004630] ACPI: XSDT 0x00000000BD0C7728 0000E4 (v01 ALASKA A M I    01072009 AMI  01000013)
	[    0.004634] ACPI: FACP 0x00000000BAE60000 000114 (v06 ALASKA A M I    01072009 AMI  00010013)
	[    0.004637] ACPI: DSDT 0x00000000BAE59000 006308 (v02 ALASKA A M I    01072009 INTL 20120913)
	[    0.004639] ACPI: FACS 0x00000000BC0C2000 000040
	[    0.004640] ACPI: IVRS 0x00000000BAE70000 0000D0 (v02 AMD    AmdTable 00000001 AMD  00000001)
	[    0.004642] ACPI: SPMI 0x00000000BAE6F000 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
	[    0.004644] ACPI: SPMI 0x00000000BAE6E000 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
	[    0.004645] ACPI: SSDT 0x00000000BAE66000 007229 (v02 AMD    MYRTLE   00000002 MSFT 04000000)
	[    0.004647] ACPI: SSDT 0x00000000BAE62000 003BD7 (v01 AMD    AMD AOD  00000001 INTL 20120913)
	[    0.004648] ACPI: SSDT 0x00000000BAE61000 0000C8 (v02 ALASKA CPUSSDT  01072009 AMI  01072009)
	[    0.004650] ACPI: FIDT 0x00000000BAE58000 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
	[    0.004651] ACPI: MCFG 0x00000000BAE57000 00003C (v01 ALASKA A M I    01072009 MSFT 00010013)
	[    0.004653] ACPI: AAFT 0x00000000BAE56000 000068 (v01 ALASKA OEMAAFT  01072009 MSFT 00000097)
	[    0.004654] ACPI: HPET 0x00000000BAE55000 000038 (v01 ALASKA A M I    01072009 AMI  00000005)
	[    0.004656] ACPI: SPCR 0x00000000BAE54000 000050 (v02 A M I  APTIO V  01072009 AMI. 00050011)
	[    0.004657] ACPI: SSDT 0x00000000BAE51000 002B44 (v02 AMD    AmdTable 00000001 AMD  00000001)
	[    0.004659] ACPI: CRAT 0x00000000BAE50000 000B68 (v01 AMD    AmdTable 00000001 AMD  00000001)
	[    0.004660] ACPI: CDIT 0x00000000BAE4F000 000029 (v01 AMD    AmdTable 00000001 AMD  00000001)
	[    0.004662] ACPI: SSDT 0x00000000BAE4E000 000D53 (v01 AMD    MYRTLEG2 00000001 INTL 20120913)
	[    0.004663] ACPI: SSDT 0x00000000BAE4D000 00022A (v01 AMD    MYRTLEGP 00000001 INTL 20120913)
	[    0.004665] ACPI: SSDT 0x00000000BAE49000 00381A (v01 AMD    MYRTLE   00000001 INTL 20120913)
	[    0.004666] ACPI: SSDT 0x00000000BAE48000 0000BF (v01 AMD    AmdTable 00001000 INTL 20120913)
	[    0.004668] ACPI: WSMT 0x00000000BAE47000 000028 (v01 ALASKA A M I    01072009 AMI  00010013)
	[    0.004669] ACPI: APIC 0x00000000BAE46000 00015E (v03 ALASKA A M I    01072009 AMI  00010013)
	[    0.004671] ACPI: SSDT 0x00000000BAE45000 00051B (v01 AMD    MYRTLERN 00000001 INTL 20120913)
	[    0.004672] ACPI: SSDT 0x00000000BAE43000 0010AF (v01 AMD    MYRTLE   00000001 INTL 20120913)
	[    0.004674] ACPI: FPDT 0x00000000BAE42000 000044 (v01 ALASKA A M I    01072009 AMI  01000013)
	[    0.004675] ACPI: Reserving FACP table memory at [mem 0xbae60000-0xbae60113]
	[    0.004676] ACPI: Reserving DSDT table memory at [mem 0xbae59000-0xbae5f307]
	[    0.004676] ACPI: Reserving FACS table memory at [mem 0xbc0c2000-0xbc0c203f]
	[    0.004677] ACPI: Reserving IVRS table memory at [mem 0xbae70000-0xbae700cf]
	[    0.004677] ACPI: Reserving SPMI table memory at [mem 0xbae6f000-0xbae6f040]
	[    0.004678] ACPI: Reserving SPMI table memory at [mem 0xbae6e000-0xbae6e040]
	[    0.004679] ACPI: Reserving SSDT table memory at [mem 0xbae66000-0xbae6d228]
	[    0.004679] ACPI: Reserving SSDT table memory at [mem 0xbae62000-0xbae65bd6]
	[    0.004680] ACPI: Reserving SSDT table memory at [mem 0xbae61000-0xbae610c7]
	[    0.004680] ACPI: Reserving FIDT table memory at [mem 0xbae58000-0xbae5809b]
	[    0.004681] ACPI: Reserving MCFG table memory at [mem 0xbae57000-0xbae5703b]
	[    0.004681] ACPI: Reserving AAFT table memory at [mem 0xbae56000-0xbae56067]
	[    0.004682] ACPI: Reserving HPET table memory at [mem 0xbae55000-0xbae55037]
	[    0.004682] ACPI: Reserving SPCR table memory at [mem 0xbae54000-0xbae5404f]
	[    0.004683] ACPI: Reserving SSDT table memory at [mem 0xbae51000-0xbae53b43]
	[    0.004683] ACPI: Reserving CRAT table memory at [mem 0xbae50000-0xbae50b67]
	[    0.004684] ACPI: Reserving CDIT table memory at [mem 0xbae4f000-0xbae4f028]
	[    0.004684] ACPI: Reserving SSDT table memory at [mem 0xbae4e000-0xbae4ed52]
	[    0.004685] ACPI: Reserving SSDT table memory at [mem 0xbae4d000-0xbae4d229]
	[    0.004686] ACPI: Reserving SSDT table memory at [mem 0xbae49000-0xbae4c819]
	[    0.004686] ACPI: Reserving SSDT table memory at [mem 0xbae48000-0xbae480be]
	[    0.004687] ACPI: Reserving WSMT table memory at [mem 0xbae47000-0xbae47027]
	[    0.004687] ACPI: Reserving APIC table memory at [mem 0xbae46000-0xbae4615d]
	[    0.004688] ACPI: Reserving SSDT table memory at [mem 0xbae45000-0xbae4551a]
	[    0.004688] ACPI: Reserving SSDT table memory at [mem 0xbae43000-0xbae440ae]
	[    0.004689] ACPI: Reserving FPDT table memory at [mem 0xbae42000-0xbae42043]
	[    0.070746] ACPI: PM-Timer IO Port: 0x808
	[    0.070752] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
	[    0.070770] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
	[    0.070771] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
	[    0.070773] ACPI: Using ACPI (MADT) for SMP configuration information
	[    0.070774] ACPI: HPET id: 0x10228201 base: 0xfed00000
	[    0.070778] ACPI: SPCR: console: uart,io,0x3f8,115200
	[    0.296452] ACPI: Core revision 20210604
	[    0.418616] ACPI: PM: Registering ACPI NVS region [mem 0x0a200000-0x0a20efff] (61440 bytes)
	[    0.418619] ACPI: PM: Registering ACPI NVS region [mem 0xbae71000-0xbd0defff] (36102144 bytes)
	[    0.419845] ACPI: bus type PCI registered
	[    0.419847] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
	[    0.505757] ACPI: Added _OSI(Module Device)
	[    0.505757] ACPI: Added _OSI(Processor Device)
	[    0.505757] ACPI: Added _OSI(3.0 _SCP Extensions)
	[    0.505757] ACPI: Added _OSI(Processor Aggregator Device)
	[    0.505757] ACPI: Added _OSI(Linux-Dell-Video)
	[    0.505757] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
	[    0.505757] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
	[    0.505757] ACPI: Added _OSI(Linux)
	[    0.511109] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP0.VGA], AE_ALREADY_EXISTS (20210604/dswload2-326)
	[    0.511115] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210604/psobject-220)
	[    0.511117] ACPI: Skipping parse of AML opcode: OpcodeName unavailable (0x5B82)
	[    0.511119] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP0.HDAU], AE_ALREADY_EXISTS (20210604/dswload2-326)
	[    0.511121] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210604/psobject-220)
	[    0.511123] ACPI: Skipping parse of AML opcode: OpcodeName unavailable (0x5B82)
	[    0.511321] ACPI: 11 ACPI AML tables successfully acquired and loaded
	[    0.515248] ACPI: Interpreter enabled
	[    0.515257] ACPI: PM: (supports S0 S4 S5)
	[    0.515258] ACPI: Using IOAPIC for interrupt routing
	[    0.515427] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
	[    0.515619] ACPI: Enabled 4 GPEs in block 00 to 1F
	[    0.520321] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
	[    0.520325] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
	[    0.520401] acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug LTR DPC]
	[    0.520471] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
	[    0.520479] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-7f] only partially covers this bridge
	[    0.530700] ACPI: PCI: Interrupt link LNKA configured for IRQ 0
	[    0.530727] ACPI: PCI: Interrupt link LNKB configured for IRQ 0
	[    0.530750] ACPI: PCI: Interrupt link LNKC configured for IRQ 0
	[    0.530779] ACPI: PCI: Interrupt link LNKD configured for IRQ 0
	[    0.530805] ACPI: PCI: Interrupt link LNKE configured for IRQ 0
	[    0.530826] ACPI: PCI: Interrupt link LNKF configured for IRQ 0
	[    0.530847] ACPI: PCI: Interrupt link LNKG configured for IRQ 0
	[    0.530868] ACPI: PCI: Interrupt link LNKH configured for IRQ 0
	[    0.531910] ACPI: bus type USB registered
	[    0.531910] PCI: Using ACPI for IRQ routing
	[    0.543389] pnp: PnP ACPI init
	[    0.544422] pnp: PnP ACPI: found 7 devices
	[    0.549677] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
	[    0.597052] ACPI: button: Power Button [PWRB]
	[    0.597085] ACPI: button: Power Button [PWRF]
	[    0.597121] ACPI: \_PR_.C000: Found 3 idle states
	[    0.597212] ACPI: \_PR_.C002: Found 3 idle states
	[    0.597676] ACPI: \_PR_.C004: Found 3 idle states
	[    0.597735] ACPI: \_PR_.C006: Found 3 idle states
	[    0.597787] ACPI: \_PR_.C008: Found 3 idle states
	[    0.597841] ACPI: \_PR_.C00A: Found 3 idle states
	[    0.597891] ACPI: \_PR_.C001: Found 3 idle states
	[    0.597936] ACPI: \_PR_.C003: Found 3 idle states
	[    0.598059] ACPI: \_PR_.C005: Found 3 idle states
	[    0.598201] ACPI: \_PR_.C007: Found 3 idle states
	[    0.598327] ACPI: \_PR_.C009: Found 3 idle states
	[    0.598411] ACPI: \_PR_.C00B: Found 3 idle states
	...
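
The listing above does contain "ACPI BIOS Error (bug)" entries for `\_SB.PCI0.GPP0.VGA` and `\_SB.PCI0.GPP0.HDAU`. A small sketch, assuming dmesg text in the format shown, that isolates such lines so they are not lost among the routine ACPI messages. The helper name `acpi_error_lines` is illustrative only, and whether these errors are related to the amdgpu failure is not established in this thread.

```python
import re

# Matches both "ACPI Error" and "ACPI BIOS Error" as they appear in the log above.
ACPI_ERROR = re.compile(r"ACPI (?:BIOS )?Error")

def acpi_error_lines(dmesg_text):
    """Return the dmesg lines that report an ACPI error."""
    return [line.strip() for line in dmesg_text.splitlines() if ACPI_ERROR.search(line)]

# Sample lines copied from the dmesg output in this message.
sample = r"""
[    0.505757] ACPI: Added _OSI(Linux)
[    0.511109] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP0.VGA], AE_ALREADY_EXISTS (20210604/dswload2-326)
[    0.511115] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20210604/psobject-220)
[    0.511321] ACPI: 11 ACPI AML tables successfully acquired and loaded
"""
for line in acpi_error_lines(sample):
    print(line)
```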



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 16:32       ` PGNet Dev
@ 2021-10-25 17:48         ` PGNet Dev
  2021-10-26 17:02           ` PGNet Dev
  0 siblings, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-25 17:48 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx, dri-devel

> sbios settings

any of these raise a suspicion?

screenshot from the ASRockRack X470D4U's BIOS setup:

   https://imgur.com/a/rdhGQNy


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 17:48         ` PGNet Dev
@ 2021-10-26 17:02           ` PGNet Dev
  0 siblings, 0 replies; 13+ messages in thread
From: PGNet Dev @ 2021-10-26 17:02 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx, dri-devel

>> sbios settings

given suggestion this may be a BIOS issue, I've posted this issue as a question @,

   https://forum.asrock.com/forum_posts.asp?TID=19749&title=x470d4u-p4-20-ryzen5600g-fatal-error-gpu-boot

and pinged ASRockRack tech support via their online tech supp form.

If anyone _here_ knows an appropriate contact @ ASRockRack to link into this discussion, that'd be useful/appreciated!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-25 15:15     ` Lazar, Lijo
  2021-10-25 16:32       ` PGNet Dev
@ 2021-10-29 22:13       ` PGNet Dev
  2021-10-29 22:34         ` PGNet Dev
  1 sibling, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-29 22:13 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx

>> I would start with an sbios update if possible.

I swapped out the ASRockRack X470D4U mobo for a new, next-gen X570D4U.

Keeping the same 2X16GB UDIMMs, and trying 2 different Ryzen 5600G CPUs, I now see the following ...

With an NVIDIA PCIe card as primary adapter, it posts & functions, as before; no issues or problems.

Selecting the on-die AMDGPU, via the board's HDMI connector, now also posts & boots; No more OOPS.


Booting, now on

	uname -rm
		5.14.14-200.fc34.x86_64 x86_64

dmesg @ boot is:

	dmesg | grep -i amdgpu
		[    1.623977] [drm] amdgpu kernel modesetting enabled.
		[    1.627731] amdgpu: Virtual CRAT table created for CPU
		[    1.627738] amdgpu: Topology: Add CPU node
		[    1.627782] fb0: switching to amdgpudrmfb from EFI VGA
		[    1.627910] amdgpu 0000:30:00.0: vgaarb: deactivate vga console
		[    1.627972] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
		[    1.634655] amdgpu 0000:30:00.0: amdgpu: Fetched VBIOS from ROM BAR
		[    1.634656] amdgpu: ATOM BIOS: 113-CEZANNE-018
		[    1.635463] amdgpu 0000:30:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
		[    1.635465] amdgpu 0000:30:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
		[    1.635466] amdgpu 0000:30:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
		[    1.635504] [drm] amdgpu: 512M of VRAM memory ready
		[    1.635505] [drm] amdgpu: 3072M of GTT memory ready.
		[    1.639127] amdgpu 0000:30:00.0: amdgpu: PSP runtime database doesn't exist
		[    1.667936] amdgpu 0000:30:00.0: amdgpu: Will use PSP to load VCN firmware
		[    2.469604] amdgpu 0000:30:00.0: amdgpu: RAS: optional ras ta ucode is not available
		[    2.477996] amdgpu 0000:30:00.0: amdgpu: RAP: optional rap ta ucode is not available
		[    2.477999] amdgpu 0000:30:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
		[    2.478948] amdgpu 0000:30:00.0: amdgpu: SMU is initialized successfully!
		[    2.530805] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
		[    2.758719] amdgpu: HMM registered 512MB device memory
		[    2.758741] amdgpu: SRAT table not found
		[    2.758741] amdgpu: Virtual CRAT table created for GPU
		[    2.758942] amdgpu: Topology: Add dGPU node [0x1638:0x1002]
		[    2.758944] kfd kfd: amdgpu: added device 1002:1638
		[    2.758958] amdgpu 0000:30:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 18, active_cu_number 27
		[    2.949242] fbcon: amdgpu (fb0) is primary device
		[    3.052240] amdgpu 0000:30:00.0: [drm] fb0: amdgpu frame buffer device
		[    3.061026] amdgpu 0000:30:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
		[    3.061030] amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
		[    3.061031] amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
		[    3.061032] amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
		[    3.061032] amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
		[    3.061033] amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
		[    3.061034] amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
		[    3.061034] amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
		[    3.061035] amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
		[    3.061036] amdgpu 0000:30:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
		[    3.061037] amdgpu 0000:30:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
		[    3.061038] amdgpu 0000:30:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
		[    3.061039] amdgpu 0000:30:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
		[    3.061039] amdgpu 0000:30:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
		[    3.061040] amdgpu 0000:30:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
		[    3.209226] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:30:00.0 on minor 0
		[   13.749477] snd_hda_intel 0000:30:00.1: bound 0000:30:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
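
Since the earlier diagnosis was that the driver could not find a vbios image, the "Fetched VBIOS from ROM BAR" line above is the one to look for. A rough sketch, assuming the message wording shown in this log, that reports where amdgpu says it fetched the VBIOS from, or `None` when no such line exists. The name `vbios_source` is hypothetical.

```python
import re

# Wording taken from the dmesg line quoted above; other kernels may phrase it differently.
VBIOS_LINE = re.compile(r"amdgpu: Fetched VBIOS from (.+)$")

def vbios_source(dmesg_text):
    """Return the reported VBIOS source (e.g. 'ROM BAR'), or None if no such line exists."""
    for line in dmesg_text.splitlines():
        match = VBIOS_LINE.search(line)
        if match:
            return match.group(1).strip()
    return None

working = "[    1.634655] amdgpu 0000:30:00.0: amdgpu: Fetched VBIOS from ROM BAR"
failing = "[    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22"
print(vbios_source(working))
print(vbios_source(failing))
```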

However, now, the output color registration is wrong.

After grub selection, boot shell background is pink/magenta everywhere.  Screenshot here:  https://imgur.com/q2JJ4n6

If I continue from shell to launch a desktop environment (XFCE or KDE), it reaches runlevel 5 with no problems or errors -- EXCEPT the color registration is still wrong.

Switching back to the NVIDIA cures the issue - back to normal black background @ runlevel 3 shell, and correct colors @ rl5.

The mv from X470D4U -> X570D4U apparently 'fixed' the problem with NO video output from the on-die GPU. BIOS, or other board issues, I'm not clear.

Is this color issue *still* likely a BIOS issue? or the amdgpu driver?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-29 22:13       ` PGNet Dev
@ 2021-10-29 22:34         ` PGNet Dev
  2021-10-30  1:07           ` PGNet Dev
  0 siblings, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-29 22:34 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx

in case it's useful

	grep -i amd /var/log/Xorg.0.log | grep -v Modeline
		[   324.709] (II) Applying OutputClass "AMDgpu" to /dev/dri/card0
		[   324.709]    loading driver: amdgpu
		[   324.818] (==) Matched amdgpu as autoconfigured driver 0
		[   324.818] (II) LoadModule: "amdgpu"
		[   324.825] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
		[   324.877] (II) Module amdgpu: vendor="X.Org Foundation"
		[   324.992] (II) AMDGPU: Driver for AMD Radeon:
		        All GPUs supported by the amdgpu kernel driver
		[   325.108] (II) Loading sub module "ramdac"
		[   325.108] (II) LoadModule: "ramdac"
		[   325.108] (II) Module "ramdac" already built-in
		[   325.110] (II) AMDGPU(0): Creating default Display subsection in Screen section
		[   325.110] (==) AMDGPU(0): Depth 24, (--) framebuffer bpp 32
		[   325.110] (II) AMDGPU(0): Pixel depth = 24 bits stored in 4 bytes (32 bpp pixmaps)
		[   325.110] (==) AMDGPU(0): Default visual is TrueColor
		[   325.110] (==) AMDGPU(0): RGB weight 888
		[   325.110] (II) AMDGPU(0): Using 8 bits per RGB (8 bit DAC)
		[   325.110] (--) AMDGPU(0): Chipset: "Unknown AMD Radeon GPU" (ChipID = 0x1638)
		[   327.957] (II) AMDGPU(0): glamor X acceleration enabled on AMD RENOIR (DRM 3.42.0, 5.14.14-200.fc34.x86_64, LLVM 12.0.1)
		[   327.957] (II) AMDGPU(0): glamor detected, initialising EGL layer.
		[   327.957] (==) AMDGPU(0): TearFree property default: auto
		[   327.957] (==) AMDGPU(0): VariableRefresh: disabled
		[   327.957] (II) AMDGPU(0): KMS Pageflipping: enabled
		[   327.957] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section
		[   327.958] (II) AMDGPU(0): Output HDMI-A-1 has no monitor section
		[   327.958] (II) AMDGPU(0): Output DisplayPort-0 has no monitor section
		[   327.963] (II) AMDGPU(0): EDID for output HDMI-A-0
		[   327.963] (II) AMDGPU(0): EDID for output HDMI-A-1
		[   327.963] (II) AMDGPU(0): Manufacturer: VSC  Model: cc32  Serial#: 16843025
		[   327.963] (II) AMDGPU(0): Year: 2018  Week: 47
		[   327.963] (II) AMDGPU(0): EDID Version: 1.3
		[   327.963] (II) AMDGPU(0): Digital Display Input
		[   327.963] (II) AMDGPU(0): Max Image Size [cm]: horiz.: 60  vert.: 34
		[   327.963] (II) AMDGPU(0): Gamma: 2.20
		[   327.963] (II) AMDGPU(0): DPMS capabilities: Off
		[   327.963] (II) AMDGPU(0): Supported color encodings: RGB 4:4:4 YCrCb 4:4:4
		[   327.963] (II) AMDGPU(0): Default color space is primary color space
		[   327.963] (II) AMDGPU(0): First detailed timing is preferred mode
		[   327.963] (II) AMDGPU(0): redX: 0.661 redY: 0.332   greenX: 0.304 greenY: 0.613
		[   327.963] (II) AMDGPU(0): blueX: 0.149 blueY: 0.060   whiteX: 0.313 whiteY: 0.329
		[   327.963] (II) AMDGPU(0): Supported established timings:
		[   327.963] (II) AMDGPU(0): 720x400@70Hz
		[   327.963] (II) AMDGPU(0): 640x480@60Hz
		[   327.963] (II) AMDGPU(0): 640x480@67Hz
		[   327.963] (II) AMDGPU(0): 640x480@72Hz
		[   327.963] (II) AMDGPU(0): 640x480@75Hz
		[   327.963] (II) AMDGPU(0): 800x600@56Hz
		[   327.963] (II) AMDGPU(0): 800x600@60Hz
		[   327.963] (II) AMDGPU(0): 800x600@72Hz
		[   327.963] (II) AMDGPU(0): 800x600@75Hz
		[   327.964] (II) AMDGPU(0): 832x624@75Hz
		[   327.964] (II) AMDGPU(0): 1024x768@60Hz
		[   327.964] (II) AMDGPU(0): 1024x768@70Hz
		[   327.964] (II) AMDGPU(0): 1024x768@75Hz
		[   327.964] (II) AMDGPU(0): 1280x1024@75Hz
		[   327.964] (II) AMDGPU(0): 1152x864@75Hz
		[   327.964] (II) AMDGPU(0): Manufacturer's mask: 0
		[   327.964] (II) AMDGPU(0): Supported standard timings:
		[   327.964] (II) AMDGPU(0): #0: hsize: 2048  vsize 1152  refresh: 60  vid: 49377
		[   327.964] (II) AMDGPU(0): #1: hsize: 1920  vsize 1200  refresh: 60  vid: 209
		[   327.964] (II) AMDGPU(0): #2: hsize: 1920  vsize 1080  refresh: 60  vid: 49361
		[   327.964] (II) AMDGPU(0): #3: hsize: 1680  vsize 1050  refresh: 60  vid: 179
		[   327.964] (II) AMDGPU(0): #4: hsize: 1600  vsize 900  refresh: 60  vid: 49321
		[   327.964] (II) AMDGPU(0): #5: hsize: 1280  vsize 1024  refresh: 60  vid: 32897
		[   327.964] (II) AMDGPU(0): #6: hsize: 1280  vsize 800  refresh: 60  vid: 129
		[   327.964] (II) AMDGPU(0): #7: hsize: 1280  vsize 720  refresh: 60  vid: 49281
		[   327.964] (II) AMDGPU(0): Supported detailed timing:
		[   327.964] (II) AMDGPU(0): clock: 241.5 MHz   Image Size:  597 x 336 mm
		[   327.964] (II) AMDGPU(0): h_active: 2560  h_sync: 2608  h_sync_end 2640 h_blank_end 2720 h_border: 0
		[   327.964] (II) AMDGPU(0): v_active: 1440  v_sync: 1443  v_sync_end 1448 v_blanking: 1481 v_border: 0
		[   327.964] (II) AMDGPU(0): Serial No: UP2184700251
		[   327.964] (II) AMDGPU(0): Ranges: V min: 24 V max: 120 Hz, H min: 15 H max: 130 kHz, PixClock max 305 MHz
		[   327.964] (II) AMDGPU(0): Monitor name: VP2771
		[   327.964] (II) AMDGPU(0): Supported detailed timing:
		[   327.964] (II) AMDGPU(0): clock: 148.5 MHz   Image Size:  597 x 336 mm
		[   327.964] (II) AMDGPU(0): h_active: 1920  h_sync: 2008  h_sync_end 2052 h_blank_end 2200 h_border: 0
		[   327.964] (II) AMDGPU(0): v_active: 1080  v_sync: 1084  v_sync_end 1089 v_blanking: 1125 v_border: 0
		[   327.964] (II) AMDGPU(0): Supported detailed timing:
		[   327.964] (II) AMDGPU(0): clock: 74.2 MHz   Image Size:  597 x 336 mm
		[   327.964] (II) AMDGPU(0): h_active: 1920  h_sync: 2008  h_sync_end 2052 h_blank_end 2200 h_border: 0
		[   327.964] (II) AMDGPU(0): v_active: 540  v_sync: 542  v_sync_end 547 v_blanking: 562 v_border: 0
		[   327.964] (II) AMDGPU(0): Supported detailed timing:
		[   327.964] (II) AMDGPU(0): clock: 74.2 MHz   Image Size:  597 x 336 mm
		[   327.964] (II) AMDGPU(0): h_active: 1280  h_sync: 1390  h_sync_end 1430 h_blank_end 1650 h_border: 0
		[   327.964] (II) AMDGPU(0): v_active: 720  v_sync: 725  v_sync_end 730 v_blanking: 750 v_border: 0
		[   327.964] (II) AMDGPU(0): Supported detailed timing:
		[   327.964] (II) AMDGPU(0): clock: 127.8 MHz   Image Size:  597 x 336 mm
		[   327.964] (II) AMDGPU(0): h_active: 1280  h_sync: 1328  h_sync_end 1360 h_blank_end 1440 h_border: 0
		[   327.964] (II) AMDGPU(0): v_active: 1440  v_sync: 1443  v_sync_end 1453 v_blanking: 1481 v_border: 0
		[   327.964] (II) AMDGPU(0): Number of EDID sections to follow: 1
		[   327.964] (II) AMDGPU(0): EDID (in hex):
		[   327.964] (II) AMDGPU(0):    00ffffffffffff005a6332cc11010101
		[   327.964] (II) AMDGPU(0):    2f1c0103803c22782e4c55a9554d9d26
		[   327.964] (II) AMDGPU(0):    0f5054bfef80e1c0d100d1c0b300a9c0
		[   327.964] (II) AMDGPU(0):    8180810081c0565e00a0a0a029503020
		[   327.964] (II) AMDGPU(0):    350055502100001a000000ff00555032
		[   327.964] (II) AMDGPU(0):    3138343730303235310a000000fd0018
		[   327.964] (II) AMDGPU(0):    780f821e000a202020202020000000fc
		[   327.964] (II) AMDGPU(0):    005650323737310a2020202020200190
		[   327.964] (II) AMDGPU(0):    020334f15b5f1005040302070609080f
		[   327.964] (II) AMDGPU(0):    0e1f2021221413121116151a191e1d01
		[   327.964] (II) AMDGPU(0):    23097f07830100006b030c001300003c
		[   327.964] (II) AMDGPU(0):    20002001023a801871382d40582c4500
		[   327.964] (II) AMDGPU(0):    55502100001e011d8018711c1620582c
		[   327.964] (II) AMDGPU(0):    250055502100009e011d007251d01e20
		[   327.964] (II) AMDGPU(0):    6e28550055502100001ee73100a050a0
		[   327.964] (II) AMDGPU(0):    295030203a0055502100001a000000b7
		[   327.964] (--) AMDGPU(0): HDMI max TMDS frequency 300000KHz
		[   327.964] (II) AMDGPU(0): Printing probed modes for output HDMI-A-1
		[   327.964] (II) AMDGPU(0): EDID for output DisplayPort-0
		[   327.964] (II) AMDGPU(0): Output HDMI-A-0 disconnected
		[   327.964] (II) AMDGPU(0): Output HDMI-A-1 connected
		[   327.964] (II) AMDGPU(0): Output DisplayPort-0 disconnected
		[   327.964] (II) AMDGPU(0): Using exact sizes for initial modes
		[   327.964] (II) AMDGPU(0): Output HDMI-A-1 using initial mode 2560x1440 +0+0
		[   327.964] (II) AMDGPU(0): mem size init: gart size :bf6ca000 vram size: s:1d906000 visible:1d906000
		[   327.964] (==) AMDGPU(0): DPI set to (96, 96)
		[   327.964] (==) AMDGPU(0): Using gamma correction (1.0, 1.0, 1.0)
		[   327.964] (II) Loading sub module "ramdac"
		[   327.964] (II) LoadModule: "ramdac"
		[   327.964] (II) Module "ramdac" already built-in
		[   328.774] (II) AMDGPU(0): [DRI2] Setup complete
		[   328.774] (II) AMDGPU(0): [DRI2]   DRI driver: radeonsi
		[   328.774] (II) AMDGPU(0): [DRI2]   VDPAU driver: radeonsi
		[   329.625] (II) AMDGPU(0): Front buffer pitch: 10240 bytes
		[   329.642] (II) AMDGPU(0): SYNC extension fences enabled
		[   329.642] (II) AMDGPU(0): Present extension enabled
		[   329.642] (==) AMDGPU(0): DRI3 enabled
		[   329.642] (==) AMDGPU(0): Backing store enabled
		[   329.642] (II) AMDGPU(0): Direct rendering enabled
		[   329.944] (II) AMDGPU(0): Use GLAMOR acceleration.
		[   329.944] (II) AMDGPU(0): Acceleration enabled
		[   329.944] (==) AMDGPU(0): DPMS enabled
		[   329.944] (==) AMDGPU(0): Silken mouse enabled
		[   329.965] (II) AMDGPU(0): Set up textured video (glamor)
		[   330.109] (II) AMDGPU(0): Setting screen physical size to 677 x 381
		[   337.993] (II) AMDGPU(0): EDID vendor "VSC", prod id 52274
		[   337.993] (II) AMDGPU(0): Using EDID range info for horizontal sync
		[   337.993] (II) AMDGPU(0): Using EDID range info for vertical refresh
		[   337.993] (--) AMDGPU(0): HDMI max TMDS frequency 300000KHz
		[   337.994] (II) AMDGPU(0): EDID vendor "VSC", prod id 52274
		[   337.994] (II) AMDGPU(0): Using hsync ranges from config file
		[   337.994] (II) AMDGPU(0): Using vrefresh ranges from config file
		[   337.995] (--) AMDGPU(0): HDMI max TMDS frequency 300000KHz

	rpm -q --whatprovides  /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
		xorg-x11-drv-amdgpu-21.0.0-1.fc34.x86_64
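
As a cross-check on the Xorg output above, a rough sketch that decodes a few fixed-offset fields straight from the EDID hex dump. The function name `decode_edid_basics` is made up for illustration. Reading bits 4-3 of byte 24 as color encodings follows what the log prints; EDID 1.4 defines them that way for digital inputs, so for this 1.3 block treat that part as an assumption.

```python
def decode_edid_basics(edid_hex):
    """Decode a few fixed-offset fields from the first 25 bytes of an EDID base block."""
    data = bytes.fromhex(edid_hex)
    if data[:8] != bytes.fromhex("00ffffffffffff00"):
        raise ValueError("missing EDID header")
    # Bytes 8-9: three 5-bit letters packed big-endian, where 1 means 'A'.
    vendor_word = (data[8] << 8) | data[9]
    vendor = "".join(
        chr(((vendor_word >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )
    product = data[10] | (data[11] << 8)  # little-endian product code
    digital = bool(data[20] & 0x80)       # bit 7 of the video input byte
    # Byte 24, bits 4-3: color encodings (assumed interpretation, see lead-in).
    encodings = {
        0b00: "RGB 4:4:4",
        0b01: "RGB 4:4:4 + YCrCb 4:4:4",
        0b10: "RGB 4:4:4 + YCrCb 4:2:2",
        0b11: "RGB 4:4:4 + YCrCb 4:4:4 + YCrCb 4:2:2",
    }[(data[24] >> 3) & 0b11]
    return {
        "vendor": vendor,
        "product": product,
        "week": data[16],
        "year": 1990 + data[17],
        "version": f"{data[18]}.{data[19]}",
        "digital": digital,
        "encodings": encodings if digital else None,
    }

# First two rows of the hex dump from the Xorg log above.
edid_head = "00ffffffffffff005a6332cc11010101" "2f1c0103803c22782e4c55a9554d9d26"
print(decode_edid_basics(edid_head))
```

For the dump above this gives vendor VSC, product 52274, week 47 of 2018 and "RGB 4:4:4 + YCrCb 4:4:4", matching the Xorg lines. It only says what the monitor advertises, not which encoding the driver actually selected.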

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
  2021-10-29 22:34         ` PGNet Dev
@ 2021-10-30  1:07           ` PGNet Dev
  0 siblings, 0 replies; 13+ messages in thread
From: PGNet Dev @ 2021-10-30  1:07 UTC (permalink / raw)
  To: lijo.lazar, alexdeucher; +Cc: amd-gfx

I got this comment from ASRockRack support re: the 'purple' screen:

"
Did you also get that background color in the BIOS menu? If not, it appears that this is the color may have something to do with video driver and it seems to be common with open source operating system. I came across these two forums with similar experience, there are some solution mentioned that might help you fix the driver issue.

https://forums.linuxmint.com/viewtopic.php?t=202548
https://askubuntu.com/questions/1219150/ubuntu-19-10-stuck-at-purple-screen-during-boot-using-kvm
"

Doesn't seem like either of those two are a specific fit.

But the point is that ASRockRack is suggesting it's the driver/config.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* amdgpu on Ryzen 5600G -- 'purple' background [WAS: Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13]
  2021-10-25 14:15   ` Alex Deucher
  2021-10-25 15:15     ` Lazar, Lijo
@ 2021-10-30 15:24     ` PGNet Dev
  2021-11-02  9:37       ` PGNet Dev
  1 sibling, 1 reply; 13+ messages in thread
From: PGNet Dev @ 2021-10-30 15:24 UTC (permalink / raw)
  To: amd-gfx, dri-devel

Now I'm just guessing.

TBH, I've no idea what's causing this reproducible 'purple' background with `amdgpu` on Ryzen 5XXXG.

All I can definitively say is that it's real, reproducible, and seen elsewhere in the wild for radeon/amd.
It's possibly related to the driver defaulting to YCbCr color on HDMI, rather than RGB.

I'll add more info as requested when someone with better knowledge of what's needed chimes in.

For now, here's the last bits of info I've found.

This issue

	https://community.amd.com/t5/drivers-software/purple-ish-desktop-screen-after-clean-installing-the-newest-19-5/td-p/99933/page/15

suggests that, in 2019, Microsoft fixed a driver for this purple-hue issue on Ryzen

	https://support.microsoft.com/en-ca/help/4505903/windows-10-update-kb4505903

Seems like this had to do with the driver selecting YCbCr for HDMI rather than RGB ...

Here,

	Setting the amdgpu HDMI Pixel Format on Linux
	 https://www.wezm.net/v2/posts/2020/linux-amdgpu-pixel-format/

mentions

	"...
	I looked for a way to change the pixel format output from the HDMI port of my RX560 graphics card. Turns out this is super easy on Windows, but the amdgpu driver on Linux does not support changing it.
	..."

and refers to an EDID hack/fix

	https://www.wezm.net/v2/posts/2020/linux-amdgpu-pixel-format/#the-fix

More digging led to

	AMDGPU fails to properly parse EDID information from display, causing weird resolution setting issues
	 https://gitlab.freedesktop.org/drm/amd/-/issues/1589

with a familiar 'purple' display,

	https://gitlab.freedesktop.org/drm/amd/uploads/2e2b485aed26d77a9066ca9ea516d49d/image.png

and points to an amd issue "Created 3 years ago",

	no color format choice in amdgpu
	 https://gitlab.freedesktop.org/drm/amd/-/issues/476#note_852860

and finally, a patch

	[PATCH] drm/amdgpu/dc: Pixel encoding DRM property and module parameter
	 https://www.spinics.net/lists/amd-gfx/msg53281.html

which suggests adding

	pixel_encoding=rgb


checking

	hwinfo --gfxcard | egrep "Model|SysFS ID"
		SysFS ID: /devices/pci0000:00/0000:00:08.1/0000:30:00.0
		Model: "ATI VGA compatible controller"
		SysFS ID: /devices/pci0000:00/0000:00:01.1/0000:10:00.0
		Model: "nVidia GP108 [GeForce GT 1030]"


	ls -ald /sys/class/drm/card* | grep 30:
		lrwxrwxrwx 1 root root 0 Oct 30 09:56 /sys/class/drm/card0 -> ../../devices/pci0000:00/0000:00:08.1/0000:30:00.0/drm/card0/
		lrwxrwxrwx 1 root root 0 Oct 30 09:56 /sys/class/drm/card0-DP-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:30:00.0/drm/card0/card0-DP-1/
		lrwxrwxrwx 1 root root 0 Oct 30 09:56 /sys/class/drm/card0-HDMI-A-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:30:00.0/drm/card0/card0-HDMI-A-1/
		lrwxrwxrwx 1 root root 0 Oct 30 09:56 /sys/class/drm/card0-HDMI-A-2 -> ../../devices/pci0000:00/0000:00:08.1/0000:30:00.0/drm/card0/card0-HDMI-A-2/

( why are there *2* HDMI for card0, when only 1 phy output? )
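
To see which of those connectors actually has a display attached, the per-connector status and enabled attributes in sysfs can be read. A minimal sketch, assuming the standard /sys/class/drm layout shown above; the function name drm_connectors and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# drm_connectors [ROOT]
# Hypothetical helper: print "<connector> <status> <enabled>" for every
# connector directory under ROOT (default: /sys/class/drm).
drm_connectors() {
    root=${1:-/sys/class/drm}
    for dir in "$root"/card*-*; do
        # Skip anything without a readable status attribute
        # (this also covers the case where the glob matched nothing).
        [ -r "$dir/status" ] || continue
        status=$(cat "$dir/status")
        enabled=$(cat "$dir/enabled" 2>/dev/null || echo "?")
        echo "$(basename "$dir") $status $enabled"
    done
}

drm_connectors
```

A connector reported as disconnected here is known to the driver but has nothing plugged in, so video= options aimed at it should have no visible effect.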

next, added to kernel cmdline

	video=HDMI-A-1:2560x1440@60:pixel_encoding=rgb video=HDMI-A-2:2560x1440@60:pixel_encoding=rgb

and, for good measure,

	cat /etc/modprobe.d/amdgpu.conf
		

re-gen'd initrd, and rebooted.

STILL getting the purple screen :-/

dmesg, after boot completion,

	dmesg | grep encod
	...
	[    1.650090] amdgpu: unknown parameter 'pixel_encoding' ignored
	...
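
The 'unknown parameter' line means the loaded amdgpu module does not define pixel_encoding, which would be the case if the patch above is not part of this kernel build. A minimal sketch for checking that before rebooting, assuming modinfo is installed and prints one "name:description (type)" line per parameter with -p; the helper name has_param is hypothetical.

```shell
#!/bin/sh
# has_param NAME
# Hypothetical helper: succeed if NAME is listed as a parameter in
# `modinfo -p` style input ("name:description (type)") read from stdin.
has_param() {
    grep -q "^$1:"
}

if command -v modinfo >/dev/null 2>&1; then
    # If the module cannot be found at all, this also reports "no parameter".
    if modinfo -p amdgpu 2>/dev/null | has_param pixel_encoding; then
        echo "amdgpu defines pixel_encoding"
    else
        echo "amdgpu has no pixel_encoding parameter"
    fi
fi
```

The same check works for any other option before it goes into /etc/modprobe.d or onto the kernel cmdline.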

Paying close(r) attention, screen output after grub-select starts out with black-as-usual background, but switches to purple immediately after:

	dmesg

		...
>>		[    1.268709] systemd[1]: Starting dracut initqueue hook...
		...


where,

	dmesg | egrep -i "atpx|vga|drm|amdgpu|initqueue"
		[    0.329804] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP0.VGA], AE_ALREADY_EXISTS (20210604/dswload2-326)
		[    0.351328] pci 0000:10:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
		[    0.351328] pci 0000:30:00.0: vgaarb: setting as boot VGA device
		[    0.351328] pci 0000:30:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
		[    0.351328] pci 0000:10:00.0: vgaarb: bridge control possible
		[    0.351328] pci 0000:30:00.0: vgaarb: bridge control possible
		[    0.351328] vgaarb: loaded
		[    0.410851] fb0: EFI VGA frame buffer device
		[    1.264753] ACPI: video: Video Device [VGA] (multi-head: yes  rom: no  post: no)
		[    1.265256] ACPI: video: Video Device [VGA1] (multi-head: yes  rom: no  post: no)
		[    1.268709] systemd[1]: Starting dracut initqueue hook...
		[    1.650090] amdgpu: unknown parameter 'pixel_encoding' ignored
		[    1.650094] amdgpu: unknown parameter 'modeset' ignored
		[    1.650416] [drm] amdgpu kernel modesetting enabled.
		[    1.650432] vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
		[    1.650720] ATPX version 1, functions 0x00000001
		[    1.650751] ATPX Hybrid Graphics
		[    1.656232] amdgpu: Virtual CRAT table created for CPU
		[    1.656240] amdgpu: Topology: Add CPU node
		[    1.656299] fb0: switching to amdgpudrmfb from EFI VGA
		[    1.656413] amdgpu 0000:30:00.0: vgaarb: deactivate vga console
		[    1.656503] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1002:0x1636 0xC9).
		[    1.656516] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
		[    1.656548] [drm] register mmio base: 0xFCB00000
		[    1.656548] [drm] register mmio size: 524288
		[    1.656550] [drm] PCIE atomic ops is not supported
		[    1.657578] [drm] add ip block number 0 <soc15_common>
		[    1.657579] [drm] add ip block number 1 <gmc_v9_0>
		[    1.657581] [drm] add ip block number 2 <vega10_ih>
		[    1.657581] [drm] add ip block number 3 <psp>
		[    1.657582] [drm] add ip block number 4 <smu>
		[    1.657583] [drm] add ip block number 5 <gfx_v9_0>
		[    1.657584] [drm] add ip block number 6 <sdma_v4_0>
		[    1.657585] [drm] add ip block number 7 <dm>
		[    1.657586] [drm] add ip block number 8 <vcn_v2_0>
		[    1.657587] [drm] add ip block number 9 <jpeg_v2_0>
		[    1.663332] [drm] BIOS signature incorrect 0 0
		[    1.663360] amdgpu 0000:30:00.0: amdgpu: Fetched VBIOS from ROM BAR
		[    1.663363] amdgpu: ATOM BIOS: 113-CEZANNE-018
		[    1.664115] [drm] VCN decode is enabled in VM mode
		[    1.664117] [drm] VCN encode is enabled in VM mode
		[    1.664118] [drm] JPEG decode is enabled in VM mode
		[    1.664147] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
		[    1.664153] amdgpu 0000:30:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
		[    1.664155] amdgpu 0000:30:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
		[    1.664156] amdgpu 0000:30:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
		[    1.664161] [drm] Detected VRAM RAM=512M, BAR=512M
		[    1.664162] [drm] RAM width 128bits DDR4
		[    1.664200] [drm] amdgpu: 512M of VRAM memory ready
		[    1.664201] [drm] amdgpu: 3072M of GTT memory ready.
		[    1.664206] [drm] GART: num cpu pages 262144, num gpu pages 262144
		[    1.664285] [drm] PCIE GART of 1024M enabled.
		[    1.664286] [drm] PTB located at 0x000000F400900000
		[    1.668403] amdgpu 0000:30:00.0: amdgpu: PSP runtime database doesn't exist
		[    1.684148] [drm] Loading DMUB firmware via PSP: version=0x01010019
		[    1.699355] [drm] Found VCN firmware Version ENC: 1.14 DEC: 5 VEP: 0 Revision: 20
		[    1.699368] amdgpu 0000:30:00.0: amdgpu: Will use PSP to load VCN firmware
		[    2.119364] nvidia 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
		[    2.371596] [drm] [nvidia-drm] [GPU ID 0x00001000] Loading driver
		[    2.425174] [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
		[    2.506778] amdgpu 0000:30:00.0: amdgpu: RAS: optional ras ta ucode is not available
		[    2.515189] amdgpu 0000:30:00.0: amdgpu: RAP: optional rap ta ucode is not available
		[    2.515191] amdgpu 0000:30:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
		[    2.515540] amdgpu 0000:30:00.0: amdgpu: SMU is initialized successfully!
		[    2.516855] [drm] kiq ring mec 2 pipe 1 q 0
		[    2.517561] [drm] Display Core initialized with v3.2.141!
		[    2.517981] [drm] DMUB hardware initialized: version=0x01010019
		[    2.577117] [drm] VCN decode and encode initialized successfully(under DPG Mode).
		[    2.577132] [drm] JPEG decode initialized successfully.
		[    2.578072] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
		[    2.707339] amdgpu: HMM registered 512MB device memory
		[    2.707361] amdgpu: SRAT table not found
		[    2.707361] amdgpu: Virtual CRAT table created for GPU
		[    2.708098] amdgpu: Topology: Add dGPU node [0x1638:0x1002]
		[    2.708103] kfd kfd: amdgpu: added device 1002:1638
		[    2.708170] amdgpu 0000:30:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 18, active_cu_number 27
		[    2.709057] [drm] fb mappable at 0x90CD2000
		[    2.709059] [drm] vram apper at 0x90000000
		[    2.709059] [drm] size 14745600
		[    2.709059] [drm] fb depth is 24
		[    2.709060] [drm]    pitch is 10240
		[    2.896150] fbcon: amdgpu (fb0) is primary device
		[    3.000331] amdgpu 0000:30:00.0: [drm] fb0: amdgpu frame buffer device
		[    3.009001] amdgpu 0000:30:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
		[    3.009004] amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
		[    3.009006] amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
		[    3.009006] amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
		[    3.009007] amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
		[    3.009008] amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
		[    3.009009] amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
		[    3.009010] amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
		[    3.009010] amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
		[    3.009011] amdgpu 0000:30:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
		[    3.009012] amdgpu 0000:30:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
		[    3.009013] amdgpu 0000:30:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
		[    3.009014] amdgpu 0000:30:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
		[    3.009015] amdgpu 0000:30:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
		[    3.009016] amdgpu 0000:30:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
		[    3.155840] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:00.0 on minor 1
		[    3.156216] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:30:00.0 on minor 0
		[    4.598742] systemd[1]: Finished dracut initqueue hook.
		[    5.043799] systemd[1]: dracut-initqueue.service: Deactivated successfully.
		[    5.043873] systemd[1]: Stopped dracut initqueue hook.
		[   10.007951] systemd[1]: Starting Load Kernel Module drm...
		[   10.170619] systemd[1]: modprobe@drm.service: Deactivated successfully.
		[   10.170718] systemd[1]: Finished Load Kernel Module drm.
		[   13.750415] snd_hda_intel 0000:10:00.1: Handle vga_switcheroo audio client
		[   13.750632] snd_hda_intel 0000:30:00.1: Handle vga_switcheroo audio client
		[   13.879276] snd_hda_intel 0000:30:00.1: bound 0000:30:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])


Is that kernel/modconfig config incorrect?
Not relevant to the problem?
Something else(where) needed?


* Re: amdgpu on Ryzen 5600G -- 'purple' background [WAS: Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13}
  2021-10-30 15:24     ` amdgpu on Ryzen 5600G -- 'purple' background [WAS: Re: amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13} PGNet Dev
@ 2021-11-02  9:37       ` PGNet Dev
  0 siblings, 0 replies; 13+ messages in thread
From: PGNet Dev @ 2021-11-02  9:37 UTC (permalink / raw)
  To: amd-gfx, dri-devel

On 10/30/21 11:24, PGNet Dev wrote:
> Is that kernel/modconfig config incorrect?
> Not relevant to the problem?
> Something else(where) needed?

fwiw,

AMD Global Customer Care's response to question about this 'purple' issue:

"...
Please be informed that Ryzen 5600G APU is supported only on Windows 10, 11, RHEL and Ubuntu OS. Also, this might vary depending on the manufacturer in your case AsRock.

https://www.amd.com/en/products/apu/amd-ryzen-5-5600g

I suggest you try installing Ubuntu 20.04.3 and update to latest version and check the status.
"
