amd-gfx.lists.freedesktop.org archive mirror
From: Felix Kuehling <felix.kuehling@amd.com>
To: "Li, Dennis" <Dennis.Li@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	Alex Deucher <alexdeucher@gmail.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset
Date: Fri, 3 Jul 2020 02:05:44 -0400
Message-ID: <ba40ec07-b7b3-4b12-283b-d001a3adbc74@amd.com>
In-Reply-To: <DM5PR12MB253318A2BF34A961F75176C0ED6C0@DM5PR12MB2533.namprd12.prod.outlook.com>


On 2020-07-01 at 10:34 a.m., Li, Dennis wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Christian and Alex
>       Not only amdgpu ioctls, but amdkfd ioctls also have the same issue. 

Most KFD ioctls don't access HW directly. The only place in KFD that
interacts with HW is the device queue manager (DQM) and, beneath it, the
packet manager. In DQM we already have protections to avoid HW access
while a reset is in progress.
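
As a rough illustration of that kind of protection, here is a minimal
userspace sketch of a "refuse HW access while in reset" guard. Every name
in it (fake_dev, fake_dqm_write_reg, the -EIO return value) is a
hypothetical stand-in and not taken from the KFD or amdgpu sources; it only
models the pattern, not the driver's actual locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; NOT the real struct amdgpu_device. */
struct fake_dev {
    atomic_bool in_reset;  /* set by the reset path, cleared when it is done */
    uint32_t    hw_reg;    /* pretend hardware register */
};

static void fake_reset_begin(struct fake_dev *d)
{
    atomic_store(&d->in_reset, true);
}

static void fake_reset_end(struct fake_dev *d)
{
    atomic_store(&d->in_reset, false);
}

/*
 * Queue-manager style entry point: skip the HW access and report an error
 * while a reset is in progress. The error code is chosen for illustration.
 */
static int fake_dqm_write_reg(struct fake_dev *d, uint32_t val)
{
    assert(d != NULL);

    if (atomic_load(&d->in_reset))
        return -EIO;

    d->hw_reg = val;  /* the "hardware access" */
    return 0;
}
```

Note that a plain flag check like this is inherently racy: a reset can
start between the check and the access, so on its own it only narrows the
window rather than closing it.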

For other HW access, KFD goes through helper functions in amdgpu.

Memory management ioctls indirectly access HW for page table updates.
However, that requires validating the page table BOs first. Are VRAM BOs
considered "valid" during a GPU reset? When using SDMA for page table
updates, the DRM GPU scheduler is also involved. Is that suspended
during a GPU reset?

The only other KFD ioctl that looks like it might access HW during a GPU
reset is kfd_ioctl_get_clock_counters, which calls
amdgpu_amdkfd_get_gpu_clock_counter.

Regards,
  Felix



>
> Best Regards
> Dennis Li
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
> Sent: Wednesday, July 1, 2020 4:20 PM
> To: Alex Deucher <alexdeucher@gmail.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset
>
> I don't think this is a good idea; we should probably wait for the GPU reset to finish instead, by taking the appropriate lock.
>
> Christian.
>
> On 01.07.20 at 07:33, Alex Deucher wrote:
>> ping?
>>
>> On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>> When the GPU is in reset, accessing the hw is unreliable and could 
>>> interfere with the reset.  Return an error in those cases.
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> index 341d072edd95..fd51d6554ee2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>                  if (info->read_mmr_reg.count > 128)
>>>                          return -EINVAL;
>>>
>>> +               if (adev->in_gpu_reset)
>>> +                       return -EPERM;
>>> +
>>>                  regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
>>>                  if (!regs)
>>>                          return -ENOMEM;
>>> @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>                  if (!adev->pm.dpm_enabled)
>>>                          return -ENOENT;
>>>
>>> +               if (adev->in_gpu_reset)
>>> +                       return -EPERM;
>>> +
>>>                  switch (info->sensor_info.type) {
>>>                  case AMDGPU_INFO_SENSOR_GFX_SCLK:
>>>                          /* get sclk in Mhz */
>>> --
>>> 2.25.4
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Thread overview: 6+ messages
2020-06-26 14:04 [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset Alex Deucher
2020-07-01  5:33 ` Alex Deucher
2020-07-01  8:20   ` Christian König
2020-07-01 14:34     ` Li, Dennis
2020-07-03  6:05       ` Felix Kuehling [this message]
2020-07-03  7:50         ` Christian König
