* [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset @ 2020-06-26 14:04 Alex Deucher 2020-07-01 5:33 ` Alex Deucher 0 siblings, 1 reply; 6+ messages in thread From: Alex Deucher @ 2020-06-26 14:04 UTC (permalink / raw) To: amd-gfx; +Cc: Alex Deucher When the GPU is in reset, accessing the hw is unreliable and could interfere with the reset. Return an error in those cases. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 341d072edd95..fd51d6554ee2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file if (info->read_mmr_reg.count > 128) return -EINVAL; + if (adev->in_gpu_reset) + return -EPERM; + regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); if (!regs) return -ENOMEM; @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file if (!adev->pm.dpm_enabled) return -ENOENT; + if (adev->in_gpu_reset) + return -EPERM; + switch (info->sensor_info.type) { case AMDGPU_INFO_SENSOR_GFX_SCLK: /* get sclk in Mhz */ -- 2.25.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset 2020-06-26 14:04 [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset Alex Deucher @ 2020-07-01 5:33 ` Alex Deucher 2020-07-01 8:20 ` Christian König 0 siblings, 1 reply; 6+ messages in thread From: Alex Deucher @ 2020-07-01 5:33 UTC (permalink / raw) To: amd-gfx list; +Cc: Alex Deucher ping? On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote: > > When the GPU is in reset, accessing the hw is unreliable and could > interfere with the reset. Return an error in those cases. > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index 341d072edd95..fd51d6554ee2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file > if (info->read_mmr_reg.count > 128) > return -EINVAL; > > + if (adev->in_gpu_reset) > + return -EPERM; > + > regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); > if (!regs) > return -ENOMEM; > @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file > if (!adev->pm.dpm_enabled) > return -ENOENT; > > + if (adev->in_gpu_reset) > + return -EPERM; > + > switch (info->sensor_info.type) { > case AMDGPU_INFO_SENSOR_GFX_SCLK: > /* get sclk in Mhz */ > -- > 2.25.4 > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset 2020-07-01 5:33 ` Alex Deucher @ 2020-07-01 8:20 ` Christian König 2020-07-01 14:34 ` Li, Dennis 0 siblings, 1 reply; 6+ messages in thread From: Christian König @ 2020-07-01 8:20 UTC (permalink / raw) To: Alex Deucher, amd-gfx list; +Cc: Alex Deucher I don't think this is a good idea, we should probably rather wait for the GPU reset to finish by taking the appropriate lock. Christian. Am 01.07.20 um 07:33 schrieb Alex Deucher: > ping? > > On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote: >> When the GPU is in reset, accessing the hw is unreliable and could >> interfere with the reset. Return an error in those cases. >> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> index 341d072edd95..fd51d6554ee2 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >> if (info->read_mmr_reg.count > 128) >> return -EINVAL; >> >> + if (adev->in_gpu_reset) >> + return -EPERM; >> + >> regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); >> if (!regs) >> return -ENOMEM; >> @@ -854,6 +857,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >> if (!adev->pm.dpm_enabled) >> return -ENOENT; >> >> + if (adev->in_gpu_reset) >> + return -EPERM; >> + >> switch (info->sensor_info.type) { >> case AMDGPU_INFO_SENSOR_GFX_SCLK: >> /* get sclk in Mhz */ >> -- >> 2.25.4 >> > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset 2020-07-01 8:20 ` Christian König @ 2020-07-01 14:34 ` Li, Dennis 2020-07-03 6:05 ` Felix Kuehling 0 siblings, 1 reply; 6+ messages in thread From: Li, Dennis @ 2020-07-01 14:34 UTC (permalink / raw) To: Koenig, Christian, Alex Deucher, amd-gfx list; +Cc: Deucher, Alexander [AMD Official Use Only - Internal Distribution Only] Hi, Christian and Alex Not only amdgpu ioctls, but amdkfd ioctls also have the same issue. Best Regards Dennis Li -----Original Message----- From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König Sent: Wednesday, July 1, 2020 4:20 PM To: Alex Deucher <alexdeucher@gmail.com>; amd-gfx list <amd-gfx@lists.freedesktop.org> Cc: Deucher, Alexander <Alexander.Deucher@amd.com> Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset I don't think this is a good idea, we should probably rather wait for the GPU reset to finish by taking the appropriate lock. Christian. Am 01.07.20 um 07:33 schrieb Alex Deucher: > ping? > > On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote: >> When the GPU is in reset, accessing the hw is unreliable and could >> interfere with the reset. Return an error in those cases. >> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> index 341d072edd95..fd51d6554ee2 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >> if (info->read_mmr_reg.count > 128) >> return -EINVAL; >> >> + if (adev->in_gpu_reset) >> + return -EPERM; >> + >> regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); >> if (!regs) >> return -ENOMEM; @@ -854,6 +857,9 @@ static >> int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >> if (!adev->pm.dpm_enabled) >> return -ENOENT; >> >> + if (adev->in_gpu_reset) >> + return -EPERM; >> + >> switch (info->sensor_info.type) { >> case AMDGPU_INFO_SENSOR_GFX_SCLK: >> /* get sclk in Mhz */ >> -- >> 2.25.4 >> > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist > s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CDe > nnis.Li%40amd.com%7Cefeeda4b6d194660fbc508d81d9791a3%7C3dd8961fe4884e6 > 08e11a82d994e183d%7C0%7C0%7C637291884123360340&sdata=GNPWQNndUJKx7 > 70fDTuRGBnJzfmRUQjD4B1HBie3xUQ%3D&reserved=0 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CDennis.Li%40amd.com%7Cefeeda4b6d194660fbc508d81d9791a3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637291884123360340&sdata=GNPWQNndUJKx770fDTuRGBnJzfmRUQjD4B1HBie3xUQ%3D&reserved=0 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset 2020-07-01 14:34 ` Li, Dennis @ 2020-07-03 6:05 ` Felix Kuehling 2020-07-03 7:50 ` Christian König 0 siblings, 1 reply; 6+ messages in thread From: Felix Kuehling @ 2020-07-03 6:05 UTC (permalink / raw) To: Li, Dennis, Koenig, Christian, Alex Deucher, amd-gfx list Cc: Deucher, Alexander Am 2020-07-01 um 10:34 a.m. schrieb Li, Dennis: > [AMD Official Use Only - Internal Distribution Only] > > Hi, Christian and Alex > Not only amdgpu ioctls, but amdkfd ioctls also have the same issue. Most KFD ioctls don't access HW directly. The only place that interacts with HW in KFD is the device queues manager (DQM) and beneath it the packet manager. In DQM we already have protections to avoid HW access while a reset is in progress. For other HW access, KFD goes through helper functions in amdgpu. Memory management ioctls indirectly access HW for page table updates. However, that requires validating the page table BOs first. Are VRAM BOs considered "valid" during a GPU reset? When using SDMA for page table updates, the DRM GPU scheduler is also involved. Is that suspended during a GPU reset? The only other KFD ioctl that looks like it might access HW during a GPU reset is kfd_ioctl_get_clock_counters by calling amdgpu_amdkfd_get_gpu_clock_counter. Regards, Felix > > Best Regards > Dennis Li > -----Original Message----- > From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König > Sent: Wednesday, July 1, 2020 4:20 PM > To: Alex Deucher <alexdeucher@gmail.com>; amd-gfx list <amd-gfx@lists.freedesktop.org> > Cc: Deucher, Alexander <Alexander.Deucher@amd.com> > Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset > > I don't think this is a good idea, we should probably rather wait for the GPU reset to finish by taking the appropriate lock. > > Christian. > > Am 01.07.20 um 07:33 schrieb Alex Deucher: >> ping? >> >> On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote: >>> When the GPU is in reset, accessing the hw is unreliable and could >>> interfere with the reset. Return an error in those cases. >>> >>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> index 341d072edd95..fd51d6554ee2 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >>> if (info->read_mmr_reg.count > 128) >>> return -EINVAL; >>> >>> + if (adev->in_gpu_reset) >>> + return -EPERM; >>> + >>> regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); >>> if (!regs) >>> return -ENOMEM; @@ -854,6 +857,9 @@ static >>> int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >>> if (!adev->pm.dpm_enabled) >>> return -ENOENT; >>> >>> + if (adev->in_gpu_reset) >>> + return -EPERM; >>> + >>> switch (info->sensor_info.type) { >>> case AMDGPU_INFO_SENSOR_GFX_SCLK: >>> /* get sclk in Mhz */ >>> -- >>> 2.25.4 >>> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://list >> s.freedesktop.org/mailman/listinfo/amd-gfx >> nnis.Li%40amd.com%7Cefeeda4b6d194660fbc508d81d9791a3%7C3dd8961fe4884e6 >> 08e11a82d994e183d%7C0%7C0%7C637291884123360340&sdata=GNPWQNndUJKx7 >> 70fDTuRGBnJzfmRUQjD4B1HBie3xUQ%3D&reserved=0 > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset 2020-07-03 6:05 ` Felix Kuehling @ 2020-07-03 7:50 ` Christian König 0 siblings, 0 replies; 6+ messages in thread From: Christian König @ 2020-07-03 7:50 UTC (permalink / raw) To: Felix Kuehling, Li, Dennis, Alex Deucher, amd-gfx list; +Cc: Deucher, Alexander Am 03.07.20 um 08:05 schrieb Felix Kuehling: > Am 2020-07-01 um 10:34 a.m. schrieb Li, Dennis: >> [AMD Official Use Only - Internal Distribution Only] >> >> Hi, Christian and Alex >> Not only amdgpu ioctls, but amdkfd ioctls also have the same issue. > Most KFD ioctls don't access HW directly. The only place that interacts > with HW in KFD is the device queues manager (DQM) and beneath it the > packet manager. In DQM we already have protections to avoid HW access > while a reset is in progress. > > For other HW access, KFD goes through helper functions in amdgpu. > > Memory management ioctls indirectly access HW for page table updates. > However, that requires validating the page table BOs first. Are VRAM BOs > considered "valid" during a GPU reset? When using SDMA for page table > updates, the DRM GPU scheduler is also involved. Is that suspended > during a GPU reset? That stuff should work concurrently. The scheduler is stopped during a reset, but we can still push new jobs to the queues. Stuff like TLB flushes are also harmless since after a reset we can safely assume that the TLB is completely empty. > The only other KFD ioctl that looks like it might access HW during a GPU > reset is kfd_ioctl_get_clock_counters by calling > amdgpu_amdkfd_get_gpu_clock_counter. Yeah, that is indeed a problem which needs handling. Christian. > > Regards, > Felix > > > >> Best Regards >> Dennis Li >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König >> Sent: Wednesday, July 1, 2020 4:20 PM >> To: Alex Deucher <alexdeucher@gmail.com>; amd-gfx list <amd-gfx@lists.freedesktop.org> >> Cc: Deucher, Alexander <Alexander.Deucher@amd.com> >> Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset >> >> I don't think this is a good idea, we should probably rather wait for the GPU reset to finish by taking the appropriate lock. >> >> Christian. >> >> Am 01.07.20 um 07:33 schrieb Alex Deucher: >>> ping? >>> >>> On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@gmail.com> wrote: >>>> When the GPU is in reset, accessing the hw is unreliable and could >>>> interfere with the reset. Return an error in those cases. >>>> >>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> index 341d072edd95..fd51d6554ee2 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c >>>> @@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >>>> if (info->read_mmr_reg.count > 128) >>>> return -EINVAL; >>>> >>>> + if (adev->in_gpu_reset) >>>> + return -EPERM; >>>> + >>>> regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); >>>> if (!regs) >>>> return -ENOMEM; @@ -854,6 +857,9 @@ static >>>> int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file >>>> if (!adev->pm.dpm_enabled) >>>> return -ENOENT; >>>> >>>> + if (adev->in_gpu_reset) >>>> + return -EPERM; >>>> + >>>> switch (info->sensor_info.type) { >>>> case AMDGPU_INFO_SENSOR_GFX_SCLK: >>>> /* get sclk in Mhz */ >>>> -- >>>> 2.25.4 >>>> >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@lists.freedesktop.org >>> https://list >>> s.freedesktop.org/mailman/listinfo/amd-gfx >>> nnis.Li%40amd.com%7Cefeeda4b6d194660fbc508d81d9791a3%7C3dd8961fe4884e6 >>> 08e11a82d994e183d%7C0%7C0%7C637291884123360340&sdata=GNPWQNndUJKx7 >>> 70fDTuRGBnJzfmRUQjD4B1HBie3xUQ%3D&reserved=0 >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-07-03 7:50 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-26 14:04 [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset Alex Deucher 2020-07-01 5:33 ` Alex Deucher 2020-07-01 8:20 ` Christian König 2020-07-01 14:34 ` Li, Dennis 2020-07-03 6:05 ` Felix Kuehling 2020-07-03 7:50 ` Christian König
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).