* [PATCH] drm/amdkfd: dqm fence memory corruption
From: Qu Huang @ 2021-01-27 12:33 UTC
  To: Felix.Kuehling
  Cc: alexander.deucher, christian.koenig, airlied, daniel, amd-gfx,
	dri-devel, linux-kernel, Qu Huang

The amdgpu driver uses a 4-byte data type for the DQM fence memory
and passes the GPU address of that memory to the microcode in a
query status PM4 packet. However, both the query status PM4 packet
definition and the microcode treat the fence as 8 bytes wide. The
fence memory is only 4 bytes, but the microcode writes 8 bytes,
so the write overruns the allocation and corrupts memory.

Signed-off-by: Qu Huang <jinsdb@126.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
 	pr_debug("Allocating fence memory\n");
 
 	/* allocate fence memory on the gart */
-	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
 					&dqm->fence_mem);
 
 	if (retval)
-- 
1.8.3.1


* Re: [PATCH] drm/amdkfd: dqm fence memory corruption
From: Felix Kuehling @ 2021-01-27 21:50 UTC
  To: Qu Huang
  Cc: alexander.deucher, christian.koenig, airlied, daniel, amd-gfx,
	dri-devel, linux-kernel

On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
> The amdgpu driver uses a 4-byte data type for the DQM fence memory
> and passes the GPU address of that memory to the microcode in a
> query status PM4 packet. However, both the query status PM4 packet
> definition and the microcode treat the fence as 8 bytes wide. The
> fence memory is only 4 bytes, but the microcode writes 8 bytes,
> so the write overruns the allocation and corrupts memory.

Thank you for pointing out that discrepancy. That's a good catch!

I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
We should probably also fix up the query_status and
amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
values everywhere, to be consistent.

Regards,
  Felix


>
> Signed-off-by: Qu Huang <jinsdb@126.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e686ce2..8b38d0c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>  	pr_debug("Allocating fence memory\n");
>  
>  	/* allocate fence memory on the gart */
> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>  					&dqm->fence_mem);
>  
>  	if (retval)

* Re: [PATCH] drm/amdkfd: dqm fence memory corruption
From: Qu Huang @ 2021-03-26  9:38 UTC
  To: Felix Kuehling
  Cc: alexander.deucher, christian.koenig, airlied, daniel, amd-gfx,
	dri-devel, linux-kernel

On 2021/1/28 5:50, Felix Kuehling wrote:
> On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
>> The amdgpu driver uses a 4-byte data type for the DQM fence memory
>> and passes the GPU address of that memory to the microcode in a
>> query status PM4 packet. However, both the query status PM4 packet
>> definition and the microcode treat the fence as 8 bytes wide. The
>> fence memory is only 4 bytes, but the microcode writes 8 bytes,
>> so the write overruns the allocation and corrupts memory.
> 
> Thank you for pointing out that discrepancy. That's a good catch!
> 
> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
> We should probably also fix up the query_status and
> amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
> values everywhere, to be consistent.
> 
> Regards,
>    Felix
Hi Felix,

Thanks for your advice. Please check v2 at
https://lore.kernel.org/patchwork/patch/1372584/

Thanks,
Qu.
> 
> 
>>
>> Signed-off-by: Qu Huang <jinsdb@126.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index e686ce2..8b38d0c 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>>   	pr_debug("Allocating fence memory\n");
>>   
>>   	/* allocate fence memory on the gart */
>> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
>> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>>   					&dqm->fence_mem);
>>   
>>   	if (retval)


* Re: [PATCH] drm/amdkfd: dqm fence memory corruption
From: Felix Kuehling @ 2021-03-26 19:23 UTC
  To: Qu Huang
  Cc: alexander.deucher, christian.koenig, airlied, daniel, amd-gfx,
	dri-devel, linux-kernel

On 2021-03-26 at 5:38 a.m., Qu Huang wrote:
> On 2021/1/28 5:50, Felix Kuehling wrote:
>> On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
>>> The amdgpu driver uses a 4-byte data type for the DQM fence memory
>>> and passes the GPU address of that memory to the microcode in a
>>> query status PM4 packet. However, both the query status PM4 packet
>>> definition and the microcode treat the fence as 8 bytes wide. The
>>> fence memory is only 4 bytes, but the microcode writes 8 bytes,
>>> so the write overruns the allocation and corrupts memory.
>>
>> Thank you for pointing out that discrepancy. That's a good catch!
>>
>> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
>> We should probably also fix up the query_status and
>> amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
>> values everywhere, to be consistent.
>>
>> Regards,
>>    Felix
> Hi Felix, Thanks for your advice, please check v2 at
> https://lore.kernel.org/patchwork/patch/1372584/

Thank you for the reminder. I somehow missed your v2 patch on the
mailing list. I have reviewed and applied it to amd-staging-drm-next now.

Regards,
  Felix


> Thanks,
> Qu.
>>
>>
>>>
>>> Signed-off-by: Qu Huang <jinsdb@126.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> index e686ce2..8b38d0c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>>>  	pr_debug("Allocating fence memory\n");
>>>  
>>>  	/* allocate fence memory on the gart */
>>> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
>>> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>>>  					&dqm->fence_mem);
>>>  
>>>  	if (retval)
>
