All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Liu, Monk" <Monk.Liu-5C7GfCeVMHo@public.gmane.org>
To: "Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: RE: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer
Date: Thu, 1 Mar 2018 08:51:41 +0000	[thread overview]
Message-ID: <BLUPR12MB04491577FB4B6D6A5BCC3B8784C60@BLUPR12MB0449.namprd12.prod.outlook.com> (raw)
In-Reply-To: <6dedfc7f-cf69-676f-463d-be52cda1b1bb-5C7GfCeVMHo@public.gmane.org>

> amdgpu_evict_flags() and amdgpu_bo_move() needs to know if the SDMA ring is available during suspend/resume.

As long as we are in gpu reset, we don't return error when ring not ready, cuz eventually it either success or failed and rescheduled by scheduler since it is kernel job
What's your concern ?

/Monk


-----Original Message-----
From: Koenig, Christian 
Sent: 2018年3月1日 16:41
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer

> If we are in gpu rest stage, we can let SDMA job live with the case of ring not ready,  can you point out where is the problem ?
amdgpu_evict_flags() and amdgpu_bo_move() needs to know if the SDMA ring is available during suspend/resume.

E.g. suspend works as this:
1. Evict all normal BOs from VRAM using the SDMA.
2. Disable SDMA, GART, PSP etc... e.g. disable the hardware.
3. Unpin all BOs in VRAM used by hardware engines, e.g. the GART table, firmware etc...
4. Evict all remaining BOs from VRAM using CPU copies.

Resume works the other way around:
1. Move all BOs for GART table and firmware back into VRAM using CPU copies.
2. Pin those BOs in VRAM and enable the hardware.
3. Use the SDMA to move back all other BOs into VRAM.

To figure out if we should use the CPU copy or the SDMA the ring->ready flag is used. So removing this check is really a no-go because it would break suspend/resume.

The only other alternative I can see is to add a separate flag in amdgpu_mman if buffer_funcs should be used or not. See all the callers of amdgpu_ttm_set_active_vram_size() as well.

Something like renaming amdgpu_ttm_set_active_vram_size() into
amdgpu_ttm_set_buffer_funcs_read() and ignoring any change if gpu_reset is set.

BTW: Changing the active VRAM size in GPU reset is quite a bug you stumbled over here, so this really needs fixing anyway.

Regards,
Christian.

Am 01.03.2018 um 09:23 schrieb Liu, Monk:
> Accel_working is only for GFX ring, I don't think test it is appropriate for SDMA jobs ....
>
> If we are in gpu rest stage, we can let SDMA job live with the case of ring not ready,  can you point out where is the problem ?
>
> In fact we did stress TDR test with this patch and it can really fix 
> couple over kill issues
>
>
> /Monk
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: 2018年3月1日 16:16
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready 
> for fill_buffer
>
> Ah, crap yeah that won't work since we don't free the ring.
>
> Key point is we need to distinct between the ring doesn't work temporary because we are in a GPU reset and it doesn't work at all because we are missing firmware or stuff like that.
>
> And no, checking the gpu_reset flag is totally racy and can't be done either. How about checking accel_working instead?
>
> Christian.
>
> Am 01.03.2018 um 07:01 schrieb Liu, Monk:
>>> Please change the test to use ring->ring_obj instead, this way we still bail out if somebody tries to submit commands before the ring is even allocated.
>> I don't understand how could fill_buffer() get run under the case that ring->ring_obj is not even allocated ... where is such case ?
>>
>>
>> /Monk
>>
>> -----Original Message-----
>> From: Koenig, Christian
>> Sent: 2018年2月28日 20:46
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready 
>> for fill_buffer
>>
>> Good point, but in this case we need some other handling.
>>
>> Please change the test to use ring->ring_obj instead, this way we still bail out if somebody tries to submit commands before the ring is even allocated.
>>
>> And you also need to fix a couple of other places in amdgpu_ttm.c.
>>
>> Regards,
>> Christian.
>>
>> Am 28.02.2018 um 13:34 schrieb Liu, Monk:
>>> Because when SDMA was hang by like process A, and meanwhile another 
>>> process B is already running into the code of fill_buffer() So just let process B continue, don't block it otherwise process B would fail by software reason .
>>>
>>> Let it run and finally process B's job would fail and GPU recover 
>>> will repeat it again (since it is a kernel job)
>>>
>>> Without this solution other process will be greatly harmed by one 
>>> black sheep that triggering GPU recover
>>>
>>> /Monk
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>>> Sent: 2018年2月28日 20:24
>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not 
>>> ready for fill_buffer
>>>
>>> Am 28.02.2018 um 08:21 schrieb Monk Liu:
>>>> because this time SDMA may under GPU RESET so its ring->ready may 
>>>> not true, keep going and GPU scheduler will reschedule this job if 
>>>> it failed.
>>>>
>>>> give a warning on copy_buffer when go through direct_submit while
>>>> ring->ready is false
>>> NAK, that test has already saved us quite a bunch of trouble with the fb layer.
>>>
>>> Why exactly are you running into issues with that?
>>>
>>> Christian.
>>>
>>>> Change-Id: Ife6cd55e0e843d99900e5bed5418499e88633685
>>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>>>> ---
>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +-----
>>>>      1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index e38e6db..7b75ac9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -1656,6 +1656,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>>>>      	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
>>>>      	WARN_ON(job->ibs[0].length_dw > num_dw);
>>>>      	if (direct_submit) {
>>>> +		WARN_ON(!ring->ready);
>>>>      		r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
>>>>      				       NULL, fence);
>>>>      		job->fence = dma_fence_get(*fence); @@ -1692,11 +1693,6 @@ 
>>>> int amdgpu_fill_buffer(struct amdgpu_bo *bo,
>>>>      	struct amdgpu_job *job;
>>>>      	int r;
>>>>      
>>>> -	if (!ring->ready) {
>>>> -		DRM_ERROR("Trying to clear memory with ring turned off.\n");
>>>> -		return -EINVAL;
>>>> -	}
>>>> -
>>>>      	if (bo->tbo.mem.mem_type == TTM_PL_TT) {
>>>>      		r = amdgpu_ttm_alloc_gart(&bo->tbo);
>>>>      		if (r)
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  parent reply	other threads:[~2018-03-01  8:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-28  7:21 [PATCH 1/4] drm/amdgpu: stop all rings before doing gpu recover Monk Liu
     [not found] ` <1519802463-9090-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2018-02-28  7:21   ` [PATCH 2/4] drm/amdgpu: cleanups for vram lost handling Monk Liu
     [not found]     ` <1519802463-9090-2-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2018-02-28 12:22       ` Christian König
2018-02-28 13:29       ` Andrey Grodzovsky
     [not found]         ` <13337cd9-78e9-df36-f2ab-749cf182177b-5C7GfCeVMHo@public.gmane.org>
2018-02-28 13:36           ` Liu, Monk
     [not found]             ` <BLUPR12MB0449D9513F4798028569ED0884C70-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-02-28 13:41               ` Andrey Grodzovsky
2018-02-28 14:25       ` Alex Deucher
2018-02-28  7:21   ` [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer Monk Liu
     [not found]     ` <1519802463-9090-3-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2018-02-28 12:23       ` Christian König
     [not found]         ` <01879d0d-edea-680e-c9f2-1005d94f1dfd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-02-28 12:34           ` Liu, Monk
     [not found]             ` <BLUPR12MB0449BEDBC1FCE72C0021C88184C70-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-02-28 12:40               ` Liu, Monk
2018-02-28 12:45               ` Christian König
     [not found]                 ` <3a8945f9-848e-dd19-373d-5dddc69f76cb-5C7GfCeVMHo@public.gmane.org>
2018-03-01  6:01                   ` Liu, Monk
     [not found]                     ` <BLUPR12MB04498FBA36C747652758D56E84C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  8:16                       ` Christian König
     [not found]                         ` <7af9b6fb-28e8-4e25-5d4a-5b566a00cbea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-03-01  8:23                           ` Liu, Monk
     [not found]                             ` <BLUPR12MB04490BF8390F6270B265969984C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  8:41                               ` Christian König
     [not found]                                 ` <6dedfc7f-cf69-676f-463d-be52cda1b1bb-5C7GfCeVMHo@public.gmane.org>
2018-03-01  8:51                                   ` Liu, Monk [this message]
     [not found]                                     ` <BLUPR12MB04491577FB4B6D6A5BCC3B8784C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  9:25                                       ` Liu, Monk
     [not found]                                         ` <BLUPR12MB044933DC340A46EADC1F7A2784C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  9:30                                           ` Christian König
2018-03-01  9:37                                       ` Liu, Monk
     [not found]                                         ` <BLUPR12MB044905DB87968358166BDEF484C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  9:39                                           ` Christian König
     [not found]                                             ` <a6812ba2-1c34-c6a4-d65a-09f924ea0940-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-03-01  9:41                                               ` Liu, Monk
     [not found]                                                 ` <BLUPR12MB044923819BB9561C57EACA8584C60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-03-01  9:51                                                   ` Christian König
     [not found]                                                     ` <26eb2e4c-33dd-2160-5b83-a7ff9e3a2558-5C7GfCeVMHo@public.gmane.org>
2018-03-01 10:11                                                       ` Liu, Monk
2018-02-28  7:21   ` [PATCH 4/4] drm/amdgpu: try again kiq access if not in IRQ Monk Liu
     [not found]     ` <1519802463-9090-4-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2018-02-28  7:41       ` Liu, Monk
2018-02-28 12:20   ` [PATCH 1/4] drm/amdgpu: stop all rings before doing gpu recover Christian König
     [not found]     ` <9e575c74-c6ce-76ef-a09c-1dec5a4807a3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-02-28 13:30       ` Andrey Grodzovsky
     [not found]         ` <e4756f8b-d8f9-9849-aad4-a23193e367f6-5C7GfCeVMHo@public.gmane.org>
2018-02-28 13:31           ` Liu, Monk
     [not found]             ` <BLUPR12MB04491C15749E9DCAB3DEF1A684C70-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-02-28 13:42               ` Andrey Grodzovsky
2018-02-28 16:40               ` Andrey Grodzovsky
     [not found]                 ` <a54b5a4f-a370-87ec-7bac-33e6036107f9-5C7GfCeVMHo@public.gmane.org>
2018-02-28 17:42                   ` Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BLUPR12MB04491577FB4B6D6A5BCC3B8784C60@BLUPR12MB0449.namprd12.prod.outlook.com \
    --to=monk.liu-5c7gfcevmho@public.gmane.org \
    --cc=Christian.Koenig-5C7GfCeVMHo@public.gmane.org \
    --cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.