> Yes, but it could be the same underlying reason.

There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line.

I compiled kernel 5.9.3 and started a reboot test to see whether the problem occurs again. However, I found that with kernel 5.9.3 the amdgpu kernel module is not loaded or installed, so I don't think it makes sense to investigate further in this state. I may have done something wrong when compiling kernel 5.9.3: I reused the .config file from 5.4.0-47 to configure it, and I do not know why amdgpu was not installed.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens.

For comparison, I have attached the logs from both 5.4.0-47 and 5.9.3.

Best regards,
Edgar

-----Original Message-----
From: jroedel@suse.de
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS]
Cc: iommu@lists.linux-foundation.org
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error is at [ 52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with "pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine where this happens.

Regards,

	Joerg

>
> As far as inconsistencies in the PCI setup are concerned, the only thing that I know of right now is that we haven't entered a PCI subsystem vendor and device ID yet. It is still "Advanced Micro Devices". We will change that soon to "General Electric" or "Emerson".
>
> Best regards,
> Edgar
>
> -----Original Message-----
> From: jroedel@suse.de
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Cc: iommu@lists.linux-foundation.org
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
>
> Hi Edgar,
>
> On Fri, Oct 30, 2020 at 02:26:23PM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> > With one board we have a boot-problem that is reproducible at every ~50 boot.
> > The system is accessible via ssh and works fine except for the
> > Graphics. The graphics is off. We don't see a screen. Please see
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I tried to reset amdgpu also by command “sudo cat /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
>
> Can you reproduce the problem with an upstream kernel too?
>
> These messages in dmesg indicate some problem in the platform setup:
>
> AMD-Vi: Completion-Wait loop timed out
>
> Might there be some inconsistencies in the PCI setup between the bridges and the endpoints or something?
>
> Regards,
>
> Joerg
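
A note on the missing amdgpu module under 5.9.3 reported at the top of this thread: the thread does not establish the cause. One thing that could be checked, as a minimal sketch, is whether the reused .config still enables CONFIG_DRM_AMDGPU after being carried over to the 5.9.3 tree. The function name amdgpu_config_state and the example path are hypothetical and not from the thread; the commands in the trailing comments only gather the information already asked for above.

```shell
#!/bin/sh
# Sketch: report how the amdgpu driver is set in a kernel .config file.
# Function name and output strings are illustrative assumptions.
amdgpu_config_state() {
    config_file="$1"
    if grep -q '^CONFIG_DRM_AMDGPU=y' "$config_file"; then
        echo "built-in"     # driver compiled into the kernel image
    elif grep -q '^CONFIG_DRM_AMDGPU=m' "$config_file"; then
        echo "module"       # driver built as a loadable module
    else
        echo "disabled"     # symbol absent or "is not set"
    fi
}

# Example call (the path is an assumption about where the tree lives):
#   amdgpu_config_state /usr/src/linux-5.9.3/.config
#
# On the running system, these show whether the module is loaded, which
# kernel command line is in effect, and the PCI details requested above:
#   lsmod | grep amdgpu
#   cat /proc/cmdline
#   lspci -vvv > lspci-vvv.txt
#   lspci -t > lspci-t.txt
```

If the result is "module" but lsmod shows nothing, the build configuration is probably not the problem, and the module installation step or the dmesg output from the 5.9.3 boot would be the next place to look.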