Hi,

I've just noticed something similar when starting weston. I still see it with this patch applied, but not on Linus's tree.

I'll confirm for sure tomorrow and send the stack trace if I can save it.

Cheers,
Mike

On Tue, 3 Aug 2021 at 02:56, Chen, Guchun wrote:

> [Public]
>
> Hi Alex,
>
> I submitted the patch before your message, I will take care of this next
> time.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Alex Deucher
> Sent: Monday, August 2, 2021 9:35 PM
> To: Chen, Guchun
> Cc: Christian König; amd-gfx@lists.freedesktop.org; Gao, Likun;
> Koenig, Christian; Zhang, Hawking <Hawking.Zhang@amd.com>;
> Deucher, Alexander
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in
> s3 test (v2)
>
> On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun wrote:
> >
> > [Public]
> >
> > Thank you, Christian.
> >
> > Regarding fence_drv.initialized, it looks a bit redundant, anyway let
> > me look into this more.
>
> Does this patch fix this bug?
> https://gitlab.freedesktop.org/drm/amd/-/issues/1668
>
> If so, please add:
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668
> to the commit message.
>
> Alex
>
> >
> > Regards,
> > Guchun
> >
> > -----Original Message-----
> > From: Christian König
> > Sent: Monday, August 2, 2021 2:56 PM
> > To: Chen, Guchun; amd-gfx@lists.freedesktop.org; Gao, Likun;
> > Koenig, Christian; Zhang, Hawking; Deucher, Alexander
> > Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver
> > fini in s3 test (v2)
> >
> > On 02.08.21 at 07:16, Guchun Chen wrote:
> > > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to
> > > stop scheduler in s3 test, otherwise, fence related failure will
> > > arrive after resume. To fix this and for a better clean up, move
> > > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
> > > driver shutdown, and should never be called in hw_fini.
> > >
> > > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
> > > to keep sw_init and sw_fini paired.
> > >
> > > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > > Suggested-by: Christian König
> > > Signed-off-by: Guchun Chen
> >
> > It's a bit ambiguous now what fence_drv.initialized means, but I think
> > we can live with that for now.
> >
> > Patch is Reviewed-by: Christian König.
> >
> > Regards,
> > Christian.
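
The ordering problem described in the quoted commit message can be sketched in plain C. This is a toy model only, not amdgpu code: struct toy_ring, the toy_* functions and the two boolean flags are hypothetical names made up for illustration, and the sketch assumes that suspend runs only the hw_fini step and resume only the hw_init step, as the commit message implies for the s3 test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ring; NOT the real struct amdgpu_ring. */
struct toy_ring {
        bool sched_ready;  /* models the state that drm_sched_fini() tears down */
        bool hw_enabled;   /* models hardware-side fence state */
};

/* Software state is set up once at driver load. */
static void toy_sw_init(struct toy_ring *r) { r->sched_ready = true; }

/* Hardware state is set up at load and again on every resume. */
static void toy_hw_init(struct toy_ring *r) { r->hw_enabled = true; }

/* Pre-patch ordering: hw_fini also stops the scheduler. */
static void toy_hw_fini_old(struct toy_ring *r)
{
        r->sched_ready = false;
        r->hw_enabled = false;
}

/* Post-patch ordering: hw_fini only touches hardware state. */
static void toy_hw_fini_new(struct toy_ring *r) { r->hw_enabled = false; }

/* Post-patch ordering: the scheduler is stopped at driver shutdown. */
static void toy_sw_fini(struct toy_ring *r) { r->sched_ready = false; }

/*
 * Run load -> suspend -> resume with the given hw_fini variant and
 * report whether the ring could still accept work after resume.
 */
static bool toy_usable_after_s3(void (*hw_fini)(struct toy_ring *))
{
        struct toy_ring r = { false, false };
        bool usable;

        toy_sw_init(&r);
        toy_hw_init(&r);
        hw_fini(&r);       /* suspend: only the hw half is torn down */
        toy_hw_init(&r);   /* resume: only the hw half is set up again */
        usable = r.sched_ready && r.hw_enabled;

        toy_sw_fini(&r);   /* final driver shutdown */
        assert(!r.sched_ready);
        return usable;
}
```

In this model toy_usable_after_s3(toy_hw_fini_old) returns false, because nothing on the resume path brings the scheduler back, while toy_usable_after_s3(toy_hw_fini_new) returns true.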
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++++++-----
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> > >  3 files changed, 11 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index b1d2dc39e8be..9e53ff851496 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >
> > >  fence_driver_init:
> > >         /* Fence driver */
> > > -       r = amdgpu_fence_driver_init(adev);
> > > +       r = amdgpu_fence_driver_sw_init(adev);
> > >         if (r) {
> > > -               dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > > +               dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
> > >                 amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
> > >                 goto failed;
> > >         }
> > > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> > >         }
> > >         amdgpu_fence_driver_hw_init(adev);
> > >
> > > -
> > >         r = amdgpu_device_ip_late_init(adev);
> > >         if (r)
> > >                 return r;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index 49c5c7331c53..7495911516c2 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > >  }
> > >
> > >  /**
> > > - * amdgpu_fence_driver_init - init the fence driver
> > > + * amdgpu_fence_driver_sw_init - init the fence driver
> > >   * for all possible rings.
> > >   *
> > >   * @adev: amdgpu device pointer
> > > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > >   * amdgpu_fence_driver_start_ring().
> > >   * Returns 0 for success.
> > >   */
> > > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
> > >  {
> > >         return 0;
> > >  }
> > >
> > >  /**
> > > - * amdgpu_fence_driver_fini - tear down the fence driver
> > > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
> > >   * for all possible rings.
> > >   *
> > >   * @adev: amdgpu device pointer
> > > @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
> > >
> > >                 if (!ring || !ring->fence_drv.initialized)
> > >                         continue;
> > > -               if (!ring->no_scheduler)
> > > -                       drm_sched_fini(&ring->sched);
> > > +
> > >                 /* You can't wait for HW to signal if it's gone */
> > >                 if (!drm_dev_is_unplugged(&adev->ddev))
> > >                         r = amdgpu_fence_wait_empty(ring);
> > > @@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
> > >                 if (!ring || !ring->fence_drv.initialized)
> > >                         continue;
> > >
> > > +               if (!ring->no_scheduler)
> > > +                       drm_sched_fini(&ring->sched);
> > > +
> > >                 for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
> > >                         dma_fence_put(ring->fence_drv.fences[j]);
> > >                 kfree(ring->fence_drv.fences);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > index 27adffa7658d..9c11ced4312c 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > @@ -106,7 +106,6 @@ struct amdgpu_fence_driver {
> > >         struct dma_fence **fences;
> > >  };
> > >
> > > -int amdgpu_fence_driver_init(struct amdgpu_device *adev);
> > >  void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
> > >
> > >  int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > > @@ -115,9 +114,10 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > >  int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
> > >                                     struct amdgpu_irq_src *irq_src,
> > >                                     unsigned irq_type);
> > > +void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
> > >  void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
> > > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
> > >  void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
> > > -void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
> > >  int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **fence,
> > >                        unsigned flags);
> > >  int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
>