amd-gfx.lists.freedesktop.org archive mirror
From: "Christian König" <christian.koenig@amd.com>
To: Felix Kuehling <felix.kuehling@amd.com>,
	"Li, Dennis" <Dennis.Li@amd.com>,
	Alex Deucher <alexdeucher@gmail.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset
Date: Fri, 3 Jul 2020 09:50:31 +0200
Message-ID: <8e41e787-4c70-fd84-b1ef-5e33165a5547@amd.com>
In-Reply-To: <ba40ec07-b7b3-4b12-283b-d001a3adbc74@amd.com>

On 03.07.20 at 08:05, Felix Kuehling wrote:
> On 2020-07-01 at 10:34 a.m., Li, Dennis wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi, Christian and Alex
>>        The amdkfd ioctls have the same issue, not only the amdgpu ioctls.
> Most KFD ioctls don't access HW directly. The only place that interacts
> with HW in KFD is the device queue manager (DQM) and beneath it the
> packet manager. In DQM we already have protections to avoid HW access
> while a reset is in progress.
>
> For other HW access, KFD goes through helper functions in amdgpu.
>
> Memory management ioctls indirectly access HW for page table updates.
> However, that requires validating the page table BOs first. Are VRAM BOs
> considered "valid" during a GPU reset? When using SDMA for page table
> updates, the DRM GPU scheduler is also involved. Is that suspended
> during a GPU reset?

That should all work concurrently with a reset. The scheduler is stopped
during a reset, but we can still push new jobs to the queues.
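
The "stopped, but still accepting jobs" behaviour can be illustrated with
a small user-space model. This is only a sketch: `struct toy_sched` and
the `toy_sched_*` functions are hypothetical names made up for this
example and are not the DRM GPU scheduler API. It assumes a single
fixed-size FIFO and no concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_DEPTH 8

/* Hypothetical stand-in for a scheduler with a single job queue. */
struct toy_sched {
	int jobs[TOY_QUEUE_DEPTH];	/* pending job ids, FIFO order */
	size_t head, count;
	bool stopped;			/* true while a reset is in progress */
	int executed;			/* jobs handed to the "hardware" so far */
};

void toy_sched_stop(struct toy_sched *s)  { s->stopped = true; }
void toy_sched_start(struct toy_sched *s) { s->stopped = false; }

/* Queue a job. This is allowed whether or not the scheduler is stopped. */
bool toy_sched_push(struct toy_sched *s, int job)
{
	assert(s != NULL);
	if (s->count == TOY_QUEUE_DEPTH)
		return false;	/* queue full */
	s->jobs[(s->head + s->count) % TOY_QUEUE_DEPTH] = job;
	s->count++;
	return true;
}

/* Hand queued jobs to the hardware. Does nothing while stopped. */
int toy_sched_run(struct toy_sched *s)
{
	int ran = 0;

	assert(s != NULL);
	if (s->stopped)
		return 0;
	while (s->count > 0) {
		s->head = (s->head + 1) % TOY_QUEUE_DEPTH;
		s->count--;
		s->executed++;
		ran++;
	}
	return ran;
}
```

In this model, jobs pushed between `toy_sched_stop()` and
`toy_sched_start()` simply stay queued and are executed by the first
`toy_sched_run()` after the restart.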

Things like TLB flushes are also harmless, since after a reset we can
safely assume that the TLB is completely empty.

> The only other KFD ioctl that looks like it might access HW during a GPU
> reset is kfd_ioctl_get_clock_counters by calling
> amdgpu_amdkfd_get_gpu_clock_counter.

Yeah, that is indeed a problem which needs handling.
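
As a rough illustration of what "handling" could look like, here is a
user-space sketch that refuses to read a counter while a reset flag is
set, mirroring the `in_gpu_reset` check and the -EPERM return value from
the patch quoted below. `struct toy_dev` and `toy_get_clock_counter()`
are hypothetical names for this example, not the amdgpu or amdkfd code.
Whether to return an error like this or to wait for the reset by taking
a lock is exactly the open question in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: one reset flag and one "HW register". */
struct toy_dev {
	atomic_bool in_reset;	/* set for the duration of a reset */
	uint64_t clock_counter;	/* stand-in for the HW clock counter */
};

/*
 * Read the clock counter unless a reset is in progress.
 * Returns 0 on success or -EPERM while the device is in reset.
 *
 * Note: checking a flag and then touching the hardware leaves a window
 * in which a reset can start in between. Closing that window needs a
 * lock held across the access, which this sketch does not model.
 */
int toy_get_clock_counter(struct toy_dev *dev, uint64_t *out)
{
	assert(dev != NULL && out != NULL);

	if (atomic_load(&dev->in_reset))
		return -EPERM;

	*out = dev->clock_counter;
	return 0;
}
```

A caller that gets -EPERM here would have to retry later; the output
value is left untouched in that case.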

Christian.

>
> Regards,
>    Felix
>
>
>
>> Best Regards
>> Dennis Li
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
>> Sent: Wednesday, July 1, 2020 4:20 PM
>> To: Alex Deucher <alexdeucher@gmail.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset
>>
>> I don't think this is a good idea. We should probably wait for the GPU reset to finish instead, by taking the appropriate lock.
>>
>> Christian.
>>
>> On 01.07.20 at 07:33, Alex Deucher wrote:
>>> ping?
>>>
>>> On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>>> When the GPU is in reset, accessing the hw is unreliable and could
>>>> interfere with the reset.  Return an error in those cases.
>>>>
>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index 341d072edd95..fd51d6554ee2 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>                   if (info->read_mmr_reg.count > 128)
>>>>                           return -EINVAL;
>>>>
>>>> +               if (adev->in_gpu_reset)
>>>> +                       return -EPERM;
>>>> +
>>>>                   regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
>>>>                   if (!regs)
>>>>                           return -ENOMEM;
>>>> @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>                   if (!adev->pm.dpm_enabled)
>>>>                           return -ENOENT;
>>>>
>>>> +               if (adev->in_gpu_reset)
>>>> +                       return -EPERM;
>>>> +
>>>>                   switch (info->sensor_info.type) {
>>>>                   case AMDGPU_INFO_SENSOR_GFX_SCLK:
>>>>                           /* get sclk in Mhz */
>>>> --
>>>> 2.25.4
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Thread overview: 6+ messages
2020-06-26 14:04 [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset Alex Deucher
2020-07-01  5:33 ` Alex Deucher
2020-07-01  8:20   ` Christian König
2020-07-01 14:34     ` Li, Dennis
2020-07-03  6:05       ` Felix Kuehling
2020-07-03  7:50         ` Christian König [this message]
