* [PATCH 0/5] Patch serials to implement guilty ctx/entity for SRIOV TDR
@ 2017-05-01  7:22 Monk Liu
From: Monk Liu @ 2017-05-01  7:22 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Monk Liu

Sometimes user space submits a bad command stream to the kernel. With the
current scheme, the gpu-scheduler always resubmits all unsignaled jobs to the
hw ring after a GPU reset, so the bad submission triggers a GPU hang again,
indefinitely.

This patch series implements a scheme called guilty context. It avoids
resubmitting malicious jobs and invalidates the context behind them, so that
regular applications can continue to run and other VFs lose less GPU time.

The rule for declaring guilt is simple: if a job hangs more times than the
threshold allows, we consider it guilty, invalidate the context behind it,
and pop all jobs from its entities on each scheduler. The next IOCTL on this
CTX handle gets an -ENODEV error, so the UMD knows that the driver released
the context because of its malicious command submission.
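
For readers who want the rule above in concrete form, here is a minimal
user-space sketch in C. It is not the code from this series: every name in it
(sketch_ctx, sketch_job, sketch_job_timedout, sketch_recover,
sketch_ctx_ioctl, HANG_LIMIT) is hypothetical, the threshold value is an
assumption, and the real driver tracks entities and schedulers that this
model leaves out. It only shows the three steps described: count hangs per
job, mark the context guilty once the count exceeds the threshold, then skip
that context's jobs on recovery and fail its later IOCTLs with -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: a job may hang this many times before its
 * context is declared guilty. The value is an assumption. */
#define HANG_LIMIT 2

/* Hypothetical stand-in for a submission context. */
struct sketch_ctx {
	bool guilty;
};

/* Hypothetical stand-in for a scheduler job. */
struct sketch_job {
	struct sketch_ctx *ctx;
	int hang_count;   /* how often this job was found hanging */
	bool signaled;    /* fence already signaled, nothing to resubmit */
	bool dropped;     /* popped from its entity, never resubmitted */
};

/* Called for the job found hanging when the timeout fires. */
static void sketch_job_timedout(struct sketch_job *job)
{
	assert(job != NULL && job->ctx != NULL);

	if (++job->hang_count > HANG_LIMIT)
		job->ctx->guilty = true;
}

/*
 * Called after the GPU reset. Resubmits every unsignaled job except
 * those that belong to a guilty context, which are dropped instead.
 * Returns the number of jobs that would be resubmitted.
 */
static int sketch_recover(struct sketch_job *jobs, size_t n)
{
	int resubmitted = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_job *job = &jobs[i];

		if (job->signaled || job->dropped)
			continue;
		if (job->ctx->guilty) {
			job->dropped = true;
			continue;
		}
		resubmitted++;
	}
	return resubmitted;
}

/* Entry check for any later IOCTL that uses the context. */
static int sketch_ctx_ioctl(const struct sketch_ctx *ctx)
{
	return ctx->guilty ? -ENODEV : 0;
}
```

In this model the count lives on the job and the verdict lives on the
context, so one offending job takes every pending job of its context with it
while jobs from other contexts are still resubmitted.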

Monk Liu (5):
  drm/amdgpu:keep ctx alive till all job finished
  drm/amdgpu:some modifications in amdgpu_ctx
  drm/amdgpu:Impl guilty ctx feature for sriov TDR
  drm/amdgpu:change sriov_gpu_reset interface
  drm/amdgpu:sriov TDR only recover hang ring

 drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 26 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 39 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 43 ++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 30 +++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |  2 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 87 ++++++++++++++++++++++++---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  3 +
 13 files changed, 209 insertions(+), 47 deletions(-)

-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Thread overview: 28+ messages
2017-05-01  7:22 [PATCH 0/5] Patch serials to implement guilty ctx/entity for SRIOV TDR Monk Liu
     [not found] ` <1493623371-32614-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-05-01  7:22   ` [PATCH 1/5] drm/amdgpu:keep ctx alive till all job finished Monk Liu
     [not found]     ` <1493623371-32614-2-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-05-01 14:47       ` Christian König
     [not found]         ` <a4605d10-b1f7-7fee-63c9-829d612c63aa-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03  3:30           ` Liu, Monk
     [not found]             ` <DM5PR12MB16102746DB02DBE8ED69DA9C84160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03  3:57               ` Liu, Monk
     [not found]                 ` <DM5PR12MB16107F8A55F3EF0B1C834FC384160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03  4:54                   ` Zhou, David(ChunMing)
     [not found]                     ` <MWHPR1201MB020601F998809FC8F0527723B4160-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-05-03  6:02                       ` Liu, Monk
     [not found]                         ` <DM5PR12MB161082763FA0163FF22E1C1F84160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03  7:23                           ` zhoucm1
     [not found]                             ` <59098580.6090204-5C7GfCeVMHo@public.gmane.org>
2017-05-03  9:11                               ` Christian König
     [not found]                                 ` <ba31391d-1f42-705b-5c94-bfd7bd1a194f-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03  9:14                                   ` Liu, Monk
     [not found]                                     ` <DM5PR12MB1610875E9D1BC9E967BE119A84160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03  9:23                                       ` Christian König
2017-05-03  8:58               ` Christian König
     [not found]                 ` <059fe927-90c8-0cf3-336c-56818d9277f0-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03  9:08                   ` Liu, Monk
     [not found]                     ` <DM5PR12MB1610E867F75FA922A874D74884160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03  9:18                       ` Christian König
     [not found]                         ` <eb637720-5c9a-636b-237e-228b499ff3bb-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03  9:29                           ` zhoucm1
2017-05-03  9:36                           ` Liu, Monk
     [not found]                             ` <DM5PR12MB161020C82674A01805B8C8D384160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03 12:49                               ` Christian König
     [not found]                                 ` <cefbc7ee-36a7-3aba-7b4a-102a5a0f2e22-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03 13:31                                   ` Liu, Monk
     [not found]                                     ` <DM5PR12MB1610C0502DE515B570B2F4C984160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03 13:34                                       ` Christian König
     [not found]                                         ` <200bd9aa-1374-69be-c155-689013ba49c5-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03 13:42                                           ` Liu, Monk
     [not found]                                             ` <DM5PR12MB1610435D144871B4CC2D88AF84160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03 14:04                                               ` Christian König
     [not found]                                                 ` <44d1cc5a-a064-322a-15a6-08015378311c-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-03 15:17                                                   ` Liu, Monk
     [not found]                                                     ` <DM5PR12MB1610D501E6D8E8C1B25D4BC384160-2J9CzHegvk++jCVTvoAFKAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-05-03 17:40                                                       ` Christian König
2017-05-01  7:22   ` [PATCH 2/5] drm/amdgpu:some modifications in amdgpu_ctx Monk Liu
2017-05-01  7:22   ` [PATCH 3/5] drm/amdgpu:Impl guilty ctx feature for sriov TDR Monk Liu
2017-05-01  7:22   ` [PATCH 4/5] drm/amdgpu:change sriov_gpu_reset interface Monk Liu
2017-05-01  7:22   ` [PATCH 5/5] drm/amdgpu:sriov TDR only recover hang ring Monk Liu
2017-05-01 14:53   ` [PATCH 0/5] Patch serials to implement guilty ctx/entity for SRIOV TDR Christian König
