Re: [PATCH 3/3] drm/amdgpu: move iommu_resume before ip init/resume

From: James Zhu <jamesz@amd.com>
To: Felix Kuehling <felix.kuehling@amd.com>,
	James Zhu <James.Zhu@amd.com>,
	amd-gfx@lists.freedesktop.org
Cc: kolAflash@kolahilft.de, me@jeromec.com, alexdeucher@gmail.com,
	ted437@gmail.com, Huang Rui <ray.huang@amd.com>
Subject: Re: [PATCH 3/3] drm/amdgpu: move iommu_resume before ip init/resume
Date: Thu, 9 Sep 2021 12:19:48 -0400	[thread overview]
Message-ID: <75d218ac-467f-f608-3f60-c05946acb32c@amd.com> (raw)
In-Reply-To: <8d26e2ce-5161-60fa-9c0e-3a87431f57e7@amd.com>

On 2021-09-07 6:20 p.m., Felix Kuehling wrote:
> Am 2021-09-07 um 4:30 p.m. schrieb James Zhu:
>>
>> On 2021-09-07 1:53 p.m., Felix Kuehling wrote:
>>> Am 2021-09-07 um 1:51 p.m. schrieb Felix Kuehling:
>>>> Am 2021-09-07 um 1:22 p.m. schrieb James Zhu:
>>>>> On 2021-09-07 12:48 p.m., Felix Kuehling wrote:
>>>>>> Am 2021-09-07 um 12:07 p.m. schrieb James Zhu:
>>>>>>> Separate iommu_resume from kfd_resume, and move it before
>>>>>>> other amdgpu ip init/resume.
>>>>>>>
>>>>>>> Fixed Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211277
>>>>>> I think the change is OK. But I don't understand how the IOMMUv2
>>>>>> initialization sequence could affect a crash in DM. The display should
>>>>>> not depend on IOMMUv2 at all. What am I missing?
>>>>> [JZ] It is a weird issue. disable VCN IP block or disable gpu_off
>>>>> feature, or set pci=noats, all
>>>>>
>>>>> can fix DM crash. Also the issue occurred quite random, some time
>>>>> after few suspend/resume cycle,
>>>>>
>>>>> some times after few hundreds S/R cycles. the maximum that I saw is
>>>>> 2422 S/R cycles.
>>>>>
>>>>> But every time DM crash, I can see one or two iommu errors ahead:
>>>>>
>>>>> *AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=****
>>>>> flags=0x0070]*
>>>> This error is not from IOMMUv2 doing GVA to GPA translations. It's from
>>>> IOMMUv1 doing GPA to SPA translation. This error points to an invalid
>>>> physical (GVA) address being used by the GPU to access random system
>>>>
>>>> Oops: s/GVA/GPA
>>>> memory it shouldn't be accessing (because there is no valid DMA mapping).
>>>>
>>>> On AMD systems, IOMMUv1 tends to be in pass-through mode when IOMMUv2 is
>>>> enabled. It's possible that the earlier initialization of IOMMUv2 hides
>>>> the problem by putting the IOMMU into passthrough mode. I don't think
>>>> this patch series is a valid solution.
>> [JZ] Good to know, thanks! So amd_iommu_init_device is for v2 only.
>>
>> And it is supposed to be safe to do amd_iommu_init_device after amdgpu
>> IP init/resume without any interference.
>>
> Yes, it's supposed to. But with your results below, this is getting very
> confusing. It's as if the IOMMUv2 initialization has some unintended
> side effects if it happens at the wrong moment during resume. If you
> want to debug this further, you'll probably need to work with the server
> team that's working on the IOMMU driver. I'm not sure it's worth the
> trouble.

[JZ] Can you point to me who is the right person from service team? I 
wish they

can give a review on the patches and issues? Thanks!

Also I got advice from Ray, and used ignore_crat=1 during modprobe to 
get iommu v2 fallthrough,

and it  also fixed the issue.

> The series is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
>
>>>> You can probably fix the problem with this kernel boot parameter: iommu=pt
>> [JZ] Still not working after apply *iommu=pt*
>>
>> BOOT_IMAGE=/boot/vmlinuz-5.8.0-41-generic
>> root=UUID=030a18fe-22f0-49be-818f-192093d543b5 quiet splash
>> modprobe.blacklist=amdgpu *iommu=pt* 3
>> [    0.612117] iommu: Default domain type: *Passthrough* (set via
>> kernel command line)
>> [  354.067871] amdgpu 0000:04:00.0: AMD-Vi: Event logged
>> [*IO_PAGE_FAULT* domain=0x0000 address=0x32de00040 flags=0x0070]
>> [  354.067884] amdgpu 0000:04:00.0: AMD-Vi: Event logged
>> [IO_PAGE_FAULT domain=0x0000 address=0x32de40000 flags=0x0070]
>>
>>>> And you can probably reproduce it even with this patch series if instead
>>>> you add: iommu=nopt amd_iommu=force_isolation
>> [JZ] could not set both *iommu=nopt *and*  amd*_*iommu=force_isolation
>> *together*. *(does it mean something?)*
>> *
>>
>> BOOT_IMAGE=/boot/vmlinuz-5.13.0-custom+
>> root=UUID=030a18fe-22f0-49be-818f-192093d543b5 quiet splash
>> modprobe.blacklist=amdgpu*iommu=nopt amd_iommu=force_isolation* 3
>> [    0.294242] iommu: Default domain type: Translated (set via kernel
>> command line)
>> [    0.350675] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4
>> counters/bank).
>> [  106.403927] amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume
>> failed (-6).
>> [  106.403931] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -6
>> [  106.403941] amdgpu 0000:04:00.0: PM: failed to resume async: error -6
>>
> This is weird. Is this happening during resume or driver init?
[JZ] this happened during init, not resume.
>
>
>> *iommu=nopt**: *Passed at least 200 S/R cycles
>>
>> BOOT_IMAGE=/boot/vmlinuz-5.13.0-custom+
>> root=UUID=030a18fe-22f0-49be-818f-192093d543b5 quiet splash
>> modprobe.blacklist=amdgpu   *iommu=nopt* 3
>> [    0.294242] iommu: Default domain type: Translated (set via kernel
>> command line)
>>
> Interesting. That's the opposite of what I would have expected.
>
>
>> *amd_iommu=force_isolation*: failed at 1st resume
>>
>> BOOT_IMAGE=/boot/vmlinuz-5.13.0-custom+
>> root=UUID=030a18fe-22f0-49be-818f-192093d543b5 quiet splash
>> modprobe.blacklist=amdgpu *amd_iommu=force_isolation*   3
>> [    0.294242] iommu: Default domain type: Translated
>>
>> [   49.513262] PM: suspend entry (deep)
>> [   49.514404] Filesystems sync: 0.001 seconds
>> [   49.514668] Freezing user space processes ...
>> [   69.523111] Freezing of tasks failed after 20.008 seconds (2 tasks
>> refusing to freeze, wq_busy=0):
>> [   69.523163] task:gnome-shell     state:D stack:    0 pid: 2196
>> ppid:  2108 flags:0x00000004
>>
> I've never seen this before.
>
> I think if you want to debug this further, you'll need to work with the
> IOMMU driver team.
>
> Regards,
>    Felix
>
>
>> [   69.523172] Call Trace:
>> [   69.523182]  __schedule+0x2ee/0x900
>> [   69.523193]  ? __mod_memcg_lruvec_state+0x22/0xe0
>> [   69.523204]  schedule+0x4f/0xc0
>> [   69.523214]  drm_sched_entity_flush+0x17c/0x230 [gpu_sched]
>> [   69.523225]  ? wait_woken+0x80/0x80
>> [   69.523233]  amdgpu_ctx_mgr_entity_flush+0x97/0xf0 [amdgpu]
>> [   69.523517]  amdgpu_flush+0x2b/0x50 [amdgpu]
>> [   69.523773]  filp_close+0x37/0x70
>> [   69.523780]  do_close_on_exec+0xda/0x110
>> [   69.523787]  begin_new_exec+0x59d/0xa40
>> [   69.523793]  load_elf_binary+0x144/0x1720
>> [   69.523801]  ? __kernel_read+0x1a0/0x2d0
>> [   69.523807]  ? __kernel_read+0x1a0/0x2d0
>> [   69.523812]  ? aa_get_task_label+0x49/0xd0
>> [   69.523820]  bprm_execve+0x288/0x680
>> [   69.523826]  do_execveat_common.isra.0+0x189/0x1c0
>> [   69.523831]  __x64_sys_execve+0x37/0x50
>> [   69.523836]  do_syscall_64+0x40/0xb0
>> [   69.523843]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [   69.523851] RIP: 0033:0x7ff1244132fb
>> [   69.523856] RSP: 002b:00007fff91a9f2b8 EFLAGS: 00000206 ORIG_RAX:
>> 000000000000003b
>> [   69.523862] RAX: ffffffffffffffda RBX: 00007ff11ee2e180 RCX:
>> 00007ff1244132fb
>> [   69.523866] RDX: 0000561199f5bc00 RSI: 000056119a1b0890 RDI:
>> 0000561199f2021a
>> [   69.523868] RBP: 000000000000001a R08: 00007fff91aa2a58 R09:
>> 000000179a034e00
>> [   69.523871] R10: 000056119a1b0890 R11: 0000000000000206 R12:
>> 00007fff91aa2a60
>> [   69.523874] R13: 000056119a1b0890 R14: 0000561199f2021a R15:
>> 0000000000000001
>> [   69.523882] task:gst-plugin-scan state:D stack:    0 pid: 2213
>> ppid:  2199 flags:0x00004004
>> [   69.523888] Call Trace:
>> [   69.523891]  __schedule+0x2ee/0x900
>> [   69.523897]  schedule+0x4f/0xc0
>> [   69.523902]  drm_sched_entity_flush+0x17c/0x230 [gpu_sched]
>> [   69.523912]  ? wait_woken+0x80/0x80
>> [   69.523918]  drm_sched_entity_destroy+0x18/0x30 [gpu_sched]
>> [   69.523928]  amdgpu_vm_fini+0x256/0x3d0 [amdgpu]
>> [   69.524210]  amdgpu_driver_postclose_kms+0x179/0x240 [amdgpu]
>> [   69.524444]  drm_file_free.part.0+0x1e5/0x250 [drm]
>> [   69.524481]  ? dma_fence_release+0x140/0x140
>> [   69.524489]  drm_close_helper.isra.0+0x65/0x70 [drm]
>> [   69.524524]  drm_release+0x6e/0xf0 [drm]
>> [   69.524559]  __fput+0x9f/0x250
>> [   69.524564]  ____fput+0xe/0x10
>> [   69.524569]  task_work_run+0x70/0xb0
>> [   69.524575]  exit_to_user_mode_prepare+0x1c8/0x1d0
>> [   69.524581]  syscall_exit_to_user_mode+0x27/0x50
>> [   69.524586]  ? __x64_sys_close+0x12/0x40
>> [   69.524589]  do_syscall_64+0x4d/0xb0
>> [   69.524594]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [   69.524599] RIP: 0033:0x7f2c12adb9ab
>> [   69.524602] RSP: 002b:00007fff981aaaa0 EFLAGS: 00000293 ORIG_RAX:
>> 0000000000000003
>> [   69.524606] RAX: 0000000000000000 RBX: 0000556b6f83f060 RCX:
>> 00007f2c12adb9ab
>> [   69.524608] RDX: 0000000000000014 RSI: 0000556b6f841400 RDI:
>> 0000000000000006
>> [   69.524611] RBP: 0000556b6f83f100 R08: 0000000000000000 R09:
>> 000000000000000e
>> [   69.524613] R10: 00007fff981db090 R11: 0000000000000293 R12:
>> 0000556b6f841400
>> [   69.524616] R13: 00007f2c12763e30 R14: 0000556b6f817330 R15:
>> 00007f2c127420b4
>>
>>>> Regards,
>>>>    Felix
>>>>
>>>>
>>>>> Since we can't stop HW/FW/SW right the way after IO page fault
>>>>> detected, so I can't tell which part try to access
>>>>> system memory through IOMMU.
>>>>>
>>>>> But after moving IOMMU device init before other amdgpu IP init/resume,
>>>>> the DM crash /IOMMU page fault issues are gone.
>>>>>
>>>>> Those patches can't directly explain why the issue fixed, but this new
>>>>> sequence makes more sense to me.
>>>>>
>>>>> Can I have you RB on those patches?
>>>>>
>>>>> Thanks!
>>>>> James
>>>>>
>>>>>> Regards,
>>>>>>    Felix
>>>>>>
>>>>>>
>>>>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++++++++
>>>>>>>   1 file changed, 12 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> index 653bd8f..e3f0308 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> @@ -2393,6 +2393,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>   	if (r)
>>>>>>>   		goto init_failed;
>>>>>>>   
>>>>>>> +	r = amdgpu_amdkfd_resume_iommu(adev);
>>>>>>> +	if (r)
>>>>>>> +		goto init_failed;
>>>>>>> +
>>>>>>>   	r = amdgpu_device_ip_hw_init_phase1(adev);
>>>>>>>   	if (r)
>>>>>>>   		goto init_failed;
>>>>>>> @@ -3147,6 +3151,10 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
>>>>>>>   {
>>>>>>>   	int r;
>>>>>>>   
>>>>>>> +	r = amdgpu_amdkfd_resume_iommu(adev);
>>>>>>> +	if (r)
>>>>>>> +		return r;
>>>>>>> +
>>>>>>>   	r = amdgpu_device_ip_resume_phase1(adev);
>>>>>>>   	if (r)
>>>>>>>   		return r;
>>>>>>> @@ -4602,6 +4610,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>>>>>>>   				dev_warn(tmp_adev->dev, "asic atom init failed!");
>>>>>>>   			} else {
>>>>>>>   				dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
>>>>>>> +				r = amdgpu_amdkfd_resume_iommu(tmp_adev);
>>>>>>> +				if (r)
>>>>>>> +					goto out;
>>>>>>> +
>>>>>>>   				r = amdgpu_device_ip_resume_phase1(tmp_adev);
>>>>>>>   				if (r)
>>>>>>>   					goto out;