force app kill patch

* force app kill patch
@ 2018-04-18  7:11 Liu, Monk
       [not found] ` <BLUPR12MB044915BDA633308DF65967BD84B60-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Liu, Monk @ 2018-04-18  7:11 UTC (permalink / raw)
  To: Koenig, Christian, Deng, Emily; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1.1: Type: text/plain, Size: 1183 bytes --]

Hi Christian & Emily

I think the v4 fix for "fix force app kill hang" is still not good enough:

First:
See that in "sched_entity_fini", we only call dma_fence_put(entity->last_scheduled" under the condition of "If (entity->fini_status)", so
This way there is memory leak for the case of "entity->fini_stats ==0"

Second:
If we move dma_fence_put(entity->last_scheduled) out of the condition of "if (entity->fini_status)", the memory leak issue can be fixed
But there will be kernel NULL pointer access, I think the time you call dma_fence_put(entity->last_scheduled") may actually executed *not*
On the last scheduled fence of this entity, because it is run without "thread_park/unpark" pair which to make sure scheduler not dealing this entity

So with certain race issue, here is the scenario:

1.        scheduler is doing the dma_fence_put() on the 1st fence,

2.        scheduler set entity->last_scheduled to 1st fence

3.        now sched_entity_fini() run, and it call dma_fence_put() on entity->last_scheduled

4.        now this 1st fence is actually put double time and the real last fence won't get put by expected

any idea?

/Monk

[-- Attachment #1.2: Type: text/html, Size: 7943 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread