All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: fix potential VM faults
@ 2019-09-19  8:41 Christian König
       [not found] ` <20190919084136.82658-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Christian König @ 2019-09-19  8:41 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

When we allocate new page tables under memory
pressure we should not evict old ones.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 70d45d48907a..8e44ecaada35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 		.interruptible = (bp->type != ttm_bo_type_kernel),
 		.no_wait_gpu = bp->no_wait_gpu,
 		.resv = bp->resv,
-		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
+		.flags = bp->type != ttm_bo_type_kernel ?
+			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
 	};
 	struct amdgpu_bo *bo;
 	unsigned long page_align, size = bp->size;
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdgpu: fix potential VM faults
       [not found] ` <20190919084136.82658-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-19 12:45   ` Deucher, Alexander
  2019-09-19 14:29   ` Kuehling, Felix
  2019-09-25 13:51   ` Liu, Monk
  2 siblings, 0 replies; 6+ messages in thread
From: Deucher, Alexander @ 2019-09-19 12:45 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


[-- Attachment #1.1: Type: text/plain, Size: 1733 bytes --]

Acked-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>
________________________________
From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of Christian König <ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Thursday, September 19, 2019 4:41 AM
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org <amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: [PATCH] drm/amdgpu: fix potential VM faults

When we allocate new page tables under memory
pressure we should not evict old ones.

Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 70d45d48907a..8e44ecaada35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
                 .interruptible = (bp->type != ttm_bo_type_kernel),
                 .no_wait_gpu = bp->no_wait_gpu,
                 .resv = bp->resv,
-               .flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
+               .flags = bp->type != ttm_bo_type_kernel ?
+                       TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
         };
         struct amdgpu_bo *bo;
         unsigned long page_align, size = bp->size;
--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #1.2: Type: text/html, Size: 3316 bytes --]

[-- Attachment #2: Type: text/plain, Size: 153 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdgpu: fix potential VM faults
       [not found] ` <20190919084136.82658-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-19 12:45   ` Deucher, Alexander
@ 2019-09-19 14:29   ` Kuehling, Felix
       [not found]     ` <fd47f3c6-a336-5a02-6ff2-3c24ad81dcb9-5C7GfCeVMHo@public.gmane.org>
  2019-09-25 13:51   ` Liu, Monk
  2 siblings, 1 reply; 6+ messages in thread
From: Kuehling, Felix @ 2019-09-19 14:29 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I'm not disagreeing with the change. Just trying to understand how this 
could have caused a VM fault. If the page tables are reserved or fenced 
while you allocate a new one, they would not be evicted. If they are not 
reserved or fenced, there should be no expectation that they stay resident.

Is this related to recoverable page fault handling? Do we need some more 
generic way to handle eviction of page tables and update the parent page 
directory (invalidate the corresponding PDE)?

Regards,
   Felix

On 2019-09-19 4:41, Christian König wrote:
> When we allocate new page tables under memory
> pressure we should not evict old ones.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 70d45d48907a..8e44ecaada35 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   		.interruptible = (bp->type != ttm_bo_type_kernel),
>   		.no_wait_gpu = bp->no_wait_gpu,
>   		.resv = bp->resv,
> -		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
> +		.flags = bp->type != ttm_bo_type_kernel ?
> +			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
>   	};
>   	struct amdgpu_bo *bo;
>   	unsigned long page_align, size = bp->size;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdgpu: fix potential VM faults
       [not found]     ` <fd47f3c6-a336-5a02-6ff2-3c24ad81dcb9-5C7GfCeVMHo@public.gmane.org>
@ 2019-09-19 16:51       ` Christian König
  0 siblings, 0 replies; 6+ messages in thread
From: Christian König @ 2019-09-19 16:51 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

> If the page tables are reserved or fenced while you allocate a new one, they would not be evicted.
And exactly that's not correct. The TTM_OPT_FLAG_ALLOW_RES_EVICT flag 
allows evicting of reserved objects.

This is useful for allocating per VM BOs, but is of course completely 
fatal in all other cases.

Regards,
Christian.

Am 19.09.19 um 16:29 schrieb Kuehling, Felix:
> I'm not disagreeing with the change. Just trying to understand how this
> could have caused a VM fault. If the page tables are reserved or fenced
> while you allocate a new one, they would not be evicted. If they are not
> reserved or fenced, there should be no expectation that they stay resident.
>
> Is this related to recoverable page fault handling? Do we need some more
> generic way to handle eviction of page tables and update the parent page
> directory (invalidate the corresponding PDE)?
>
> Regards,
>     Felix
>
> On 2019-09-19 4:41, Christian König wrote:
>> When we allocate new page tables under memory
>> pressure we should not evict old ones.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 70d45d48907a..8e44ecaada35 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>>    		.interruptible = (bp->type != ttm_bo_type_kernel),
>>    		.no_wait_gpu = bp->no_wait_gpu,
>>    		.resv = bp->resv,
>> -		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
>> +		.flags = bp->type != ttm_bo_type_kernel ?
>> +			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
>>    	};
>>    	struct amdgpu_bo *bo;
>>    	unsigned long page_align, size = bp->size;

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] drm/amdgpu: fix potential VM faults
       [not found] ` <20190919084136.82658-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2019-09-19 12:45   ` Deucher, Alexander
  2019-09-19 14:29   ` Kuehling, Felix
@ 2019-09-25 13:51   ` Liu, Monk
       [not found]     ` <MN2PR12MB393312A21B75620C726F18DF84870-rweVpJHSKTq/67K4VYF1uAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2 siblings, 1 reply; 6+ messages in thread
From: Liu, Monk @ 2019-09-25 13:51 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Christian

Theoretically the vm pt/pd should be allowed to be evicted like other BOs ..

If you encountered page fault and could be avoided by this patch, that means there is bug in the VM/ttm system , and your patch simply

w/a the root cause.

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian K?nig
Sent: Thursday, September 19, 2019 4:42 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: fix potential VM faults

When we allocate new page tables under memory pressure we should not evict old ones.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 70d45d48907a..8e44ecaada35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 		.interruptible = (bp->type != ttm_bo_type_kernel),
 		.no_wait_gpu = bp->no_wait_gpu,
 		.resv = bp->resv,
-		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
+		.flags = bp->type != ttm_bo_type_kernel ?
+			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
 	};
 	struct amdgpu_bo *bo;
 	unsigned long page_align, size = bp->size;
--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdgpu: fix potential VM faults
       [not found]     ` <MN2PR12MB393312A21B75620C726F18DF84870-rweVpJHSKTq/67K4VYF1uAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-09-25 14:04       ` Christian König
  0 siblings, 0 replies; 6+ messages in thread
From: Christian König @ 2019-09-25 14:04 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Monk,

this patch doesn't prevents PD/PT eviction.

The intention of the code here was that per VM BOs can evict other per 
VM BOs during allocation.

The problem my patch fixes is that this unfortunately also meant that 
allocation PDs/PTs could evict other PDs/PTs from the same process.

Regards,
Christian.

Am 25.09.19 um 15:51 schrieb Liu, Monk:
> Hi Christian
>
> Theoretically the vm pt/pd should be allowed to be evicted like other BOs ..
>
> If you encountered page fault and could be avoided by this patch, that means there is bug in the VM/ttm system , and your patch simply
>
> w/a the root cause.
>
> _____________________________________
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian K?nig
> Sent: Thursday, September 19, 2019 4:42 PM
> To: amd-gfx@lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: fix potential VM faults
>
> When we allocate new page tables under memory pressure we should not evict old ones.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 70d45d48907a..8e44ecaada35 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   		.interruptible = (bp->type != ttm_bo_type_kernel),
>   		.no_wait_gpu = bp->no_wait_gpu,
>   		.resv = bp->resv,
> -		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
> +		.flags = bp->type != ttm_bo_type_kernel ?
> +			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
>   	};
>   	struct amdgpu_bo *bo;
>   	unsigned long page_align, size = bp->size;
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-09-25 14:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-19  8:41 [PATCH] drm/amdgpu: fix potential VM faults Christian König
     [not found] ` <20190919084136.82658-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-09-19 12:45   ` Deucher, Alexander
2019-09-19 14:29   ` Kuehling, Felix
     [not found]     ` <fd47f3c6-a336-5a02-6ff2-3c24ad81dcb9-5C7GfCeVMHo@public.gmane.org>
2019-09-19 16:51       ` Christian König
2019-09-25 13:51   ` Liu, Monk
     [not found]     ` <MN2PR12MB393312A21B75620C726F18DF84870-rweVpJHSKTq/67K4VYF1uAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-09-25 14:04       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.