* [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
@ 2018-09-10 8:57 Huang Rui
[not found] ` <1536569876-27262-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Huang Rui @ 2018-09-10 8:57 UTC (permalink / raw)
To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Tom StDenis, Huang Rui, Christian König
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="y", Size: 728 bytes --]
It avoids to be refered again after freed.
Signed-off-by: Huang Rui <ray.huang-5C7GfCeVMHo@public.gmane.org>
Cc: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Cc: Tom StDenis <Tom.StDenis-5C7GfCeVMHo@public.gmane.org>
---
drivers/gpu/drm/ttm/ttm_bo.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 138c989..d3ef5f8 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
{
kfree(bo);
+ bo = NULL;
}
static inline int ttm_mem_type_from_place(const struct ttm_place *place,
--
2.7.4
[-- Attachment #2: Type: text/plain, Size: 154 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] drm/amdgpu: set tbo pointer as null after amdgpu bo is freed
[not found] ` <1536569876-27262-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-10 8:57 ` Huang Rui
2018-09-10 9:00 ` [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed Christian König
1 sibling, 0 replies; 13+ messages in thread
From: Huang Rui @ 2018-09-10 8:57 UTC (permalink / raw)
To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Tom StDenis, Huang Rui, Christian König
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="y", Size: 844 bytes --]
The tbo pointer will still have value even the amdgpu bo is freed.
It doesn't make sense that it still points a freed memory. It could be refered
mistakenly, so set it as null.
Signed-off-by: Huang Rui <ray.huang-5C7GfCeVMHo@public.gmane.org>
Cc: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Cc: Tom StDenis <Tom.StDenis-5C7GfCeVMHo@public.gmane.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index de990bd..ae84c08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -108,6 +108,7 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
}
kfree(bo->metadata);
kfree(bo);
+ tbo = NULL;
}
/**
--
2.7.4
[-- Attachment #2: Type: text/plain, Size: 154 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <1536569876-27262-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
2018-09-10 8:57 ` [PATCH 2/2] drm/amdgpu: set tbo pointer as null after amdgpu bo is freed Huang Rui
@ 2018-09-10 9:00 ` Christian König
[not found] ` <d40314d2-f1c4-b353-bda6-50aed5dda8db-5C7GfCeVMHo@public.gmane.org>
1 sibling, 1 reply; 13+ messages in thread
From: Christian König @ 2018-09-10 9:00 UTC (permalink / raw)
To: Huang Rui, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Tom StDenis
Hi Ray,
well those patches doesn't make sense, the pointer is only local to the
function.
Regards,
Christian.
Am 10.09.2018 um 10:57 schrieb Huang Rui:
> It avoids to be refered again after freed.
>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Tom StDenis <Tom.StDenis@amd.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 138c989..d3ef5f8 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
> {
> kfree(bo);
> + bo = NULL;
> }
>
> static inline int ttm_mem_type_from_place(const struct ttm_place *place,
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <d40314d2-f1c4-b353-bda6-50aed5dda8db-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-10 9:23 ` Huang Rui
2018-09-10 9:25 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Huang Rui @ 2018-09-10 9:23 UTC (permalink / raw)
To: Christian König
Cc: Tom StDenis, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
> Hi Ray,
>
> well those patches doesn't make sense, the pointer is only local to
> the function.
You're right.
I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
use-after-free should be in below codes:
man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
Is there a case, when orignal bo is destroyed in the bulk pos, but it
doesn't update pos->first pointer, then we still use it during the bulk
moving?
Thanks,
Ray
>
> Regards,
> Christian.
>
> Am 10.09.2018 um 10:57 schrieb Huang Rui:
> >It avoids to be refered again after freed.
> >
> >Signed-off-by: Huang Rui <ray.huang@amd.com>
> >Cc: Christian König <christian.koenig@amd.com>
> >Cc: Tom StDenis <Tom.StDenis@amd.com>
> >---
> > drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >index 138c989..d3ef5f8 100644
> >--- a/drivers/gpu/drm/ttm/ttm_bo.c
> >+++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >@@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
> > static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
> > {
> > kfree(bo);
> >+ bo = NULL;
> > }
> > static inline int ttm_mem_type_from_place(const struct ttm_place *place,
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
2018-09-10 9:23 ` Huang Rui
@ 2018-09-10 9:25 ` Christian König
2018-09-10 11:58 ` Huang Rui
2018-09-10 12:05 ` Huang Rui
0 siblings, 2 replies; 13+ messages in thread
From: Christian König @ 2018-09-10 9:25 UTC (permalink / raw)
To: Huang Rui
Cc: Tom StDenis, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Am 10.09.2018 um 11:23 schrieb Huang Rui:
> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>> Hi Ray,
>>
>> well those patches doesn't make sense, the pointer is only local to
>> the function.
> You're right.
> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
> use-after-free should be in below codes:
>
> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>
> Is there a case, when orignal bo is destroyed in the bulk pos, but it
> doesn't update pos->first pointer, then we still use it during the bulk
> moving?
Only when a per VM BO is freed or the VM destroyed.
The first case should now be handled by "drm/amdgpu: set bulk_moveable
to false when a per VM is released" and when we use a destroyed VM we
would see other problems as well.
BTW: Just pushed this commit to the repository, should show up any second.
Christian.
>
> Thanks,
> Ray
>
>> Regards,
>> Christian.
>>
>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>> It avoids to be refered again after freed.
>>>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 138c989..d3ef5f8 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>> {
>>> kfree(bo);
>>> + bo = NULL;
>>> }
>>> static inline int ttm_mem_type_from_place(const struct ttm_place *place,
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
2018-09-10 9:25 ` Christian König
@ 2018-09-10 11:58 ` Huang Rui
2018-09-10 12:05 ` Huang Rui
1 sibling, 0 replies; 13+ messages in thread
From: Huang Rui @ 2018-09-10 11:58 UTC (permalink / raw)
To: Koenig, Christian; +Cc: StDenis, Tom, amd-gfx, dri-devel
On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
> Am 10.09.2018 um 11:23 schrieb Huang Rui:
> > On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
> >> Hi Ray,
> >>
> >> well those patches doesn't make sense, the pointer is only local to
> >> the function.
> > You're right.
> > I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
> > use-after-free should be in below codes:
> >
> > man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> > ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> >
> > Is there a case, when orignal bo is destroyed in the bulk pos, but it
> > doesn't update pos->first pointer, then we still use it during the bulk
> > moving?
>
> Only when a per VM BO is freed or the VM destroyed.
>
> The first case should now be handled by "drm/amdgpu: set bulk_moveable
> to false when a per VM is released" and when we use a destroyed VM we
> would see other problems as well.
When a VM instance is teared down, all BOs which belong to that VM should
be removed from LRU list. How can we use a destroyed VM when we submmit a
command? You know, we do the bulk moving at the last step of submission.
>
> BTW: Just pushed this commit to the repository, should show up any second.
I see, thanks.
Thanks,
Ray
>
> Christian.
>
> >
> > Thanks,
> > Ray
> >
> >> Regards,
> >> Christian.
> >>
> >> Am 10.09.2018 um 10:57 schrieb Huang Rui:
> >>> It avoids to be refered again after freed.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> Cc: Christian König <christian.koenig@amd.com>
> >>> Cc: Tom StDenis <Tom.StDenis@amd.com>
> >>> ---
> >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >>> index 138c989..d3ef5f8 100644
> >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
> >>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
> >>> {
> >>> kfree(bo);
> >>> + bo = NULL;
> >>> }
> >>> static inline int ttm_mem_type_from_place(const struct ttm_place *place,
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
2018-09-10 9:25 ` Christian König
2018-09-10 11:58 ` Huang Rui
@ 2018-09-10 12:05 ` Huang Rui
2018-09-10 12:49 ` Christian König
1 sibling, 1 reply; 13+ messages in thread
From: Huang Rui @ 2018-09-10 12:05 UTC (permalink / raw)
To: Koenig, Christian; +Cc: StDenis, Tom, amd-gfx, dri-devel
On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
> Am 10.09.2018 um 11:23 schrieb Huang Rui:
> > On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
> >> Hi Ray,
> >>
> >> well those patches doesn't make sense, the pointer is only local to
> >> the function.
> > You're right.
> > I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
> > use-after-free should be in below codes:
> >
> > man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> > ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> >
> > Is there a case, when orignal bo is destroyed in the bulk pos, but it
> > doesn't update pos->first pointer, then we still use it during the bulk
> > moving?
>
> Only when a per VM BO is freed or the VM destroyed.
>
> The first case should now be handled by "drm/amdgpu: set bulk_moveable
> to false when a per VM is released" and when we use a destroyed VM we
> would see other problems as well.
>
If a VM instance is teared down, all BOs which belong that VM should be
removed from LRU. But how can we submit cmd based on a destroyed VM? You
know, we do the bulk move at last step of submission.
Thanks,
Ray
> BTW: Just pushed this commit to the repository, should show up any second.
>
> Christian.
>
> >
> > Thanks,
> > Ray
> >
> >> Regards,
> >> Christian.
> >>
> >> Am 10.09.2018 um 10:57 schrieb Huang Rui:
> >>> It avoids to be refered again after freed.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> Cc: Christian König <christian.koenig@amd.com>
> >>> Cc: Tom StDenis <Tom.StDenis@amd.com>
> >>> ---
> >>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >>> index 138c989..d3ef5f8 100644
> >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
> >>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
> >>> {
> >>> kfree(bo);
> >>> + bo = NULL;
> >>> }
> >>> static inline int ttm_mem_type_from_place(const struct ttm_place *place,
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
2018-09-10 12:05 ` Huang Rui
@ 2018-09-10 12:49 ` Christian König
[not found] ` <746c5afd-9264-7dfa-4e3c-425ea82453ff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-09-10 12:49 UTC (permalink / raw)
To: Huang Rui, Koenig, Christian
Cc: StDenis, Tom, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Am 10.09.2018 um 14:05 schrieb Huang Rui:
> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>> Hi Ray,
>>>>
>>>> well those patches doesn't make sense, the pointer is only local to
>>>> the function.
>>> You're right.
>>> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
>>> use-after-free should be in below codes:
>>>
>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>
>>> Is there a case, when orignal bo is destroyed in the bulk pos, but it
>>> doesn't update pos->first pointer, then we still use it during the bulk
>>> moving?
>> Only when a per VM BO is freed or the VM destroyed.
>>
>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>> to false when a per VM is released" and when we use a destroyed VM we
>> would see other problems as well.
>>
> If a VM instance is teared down, all BOs which belong that VM should be
> removed from LRU. But how can we submit cmd based on a destroyed VM? You
> know, we do the bulk move at last step of submission.
Well exactly that's the point this can't happen :)
Otherwise we would crash because of using freed up memory much earlier
in the command submission.
The best idea I have to track this down further is to add some
trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
why and when we are actually using a destroyed BO.
Christian.
>
>
> Thanks,
> Ray
>
>> BTW: Just pushed this commit to the repository, should show up any second.
>>
>> Christian.
>>
>>> Thanks,
>>> Ray
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>> It avoids to be refered again after freed.
>>>>>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> index 138c989..d3ef5f8 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>> {
>>>>> kfree(bo);
>>>>> + bo = NULL;
>>>>> }
>>>>> static inline int ttm_mem_type_from_place(const struct ttm_place *place,
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <746c5afd-9264-7dfa-4e3c-425ea82453ff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-10 12:59 ` Tom St Denis
[not found] ` <382aab77-b9e2-1d18-ecd8-cd763227e5d0-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Tom St Denis @ 2018-09-10 12:59 UTC (permalink / raw)
To: christian.koenig-5C7GfCeVMHo, Huang Rui
Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Hi Christian,
Are you adding new traces or turning on existing ones? Would you like
me to try them out in my setup?
Tom
On 2018-09-10 8:49 a.m., Christian König wrote:
> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>> Hi Ray,
>>>>>
>>>>> well those patches doesn't make sense, the pointer is only local to
>>>>> the function.
>>>> You're right.
>>>> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
>>>> use-after-free should be in below codes:
>>>>
>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>
>>>> Is there a case, when orignal bo is destroyed in the bulk pos, but it
>>>> doesn't update pos->first pointer, then we still use it during the bulk
>>>> moving?
>>> Only when a per VM BO is freed or the VM destroyed.
>>>
>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>> to false when a per VM is released" and when we use a destroyed VM we
>>> would see other problems as well.
>>>
>> If a VM instance is teared down, all BOs which belong that VM should be
>> removed from LRU. But how can we submit cmd based on a destroyed VM? You
>> know, we do the bulk move at last step of submission.
>
> Well exactly that's the point this can't happen :)
>
> Otherwise we would crash because of using freed up memory much earlier
> in the command submission.
>
> The best idea I have to track this down further is to add some
> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
> why and when we are actually using a destroyed BO.
>
> Christian.
>
>>
>>
>> Thanks,
>> Ray
>>
>>> BTW: Just pushed this commit to the repository, should show up any
>>> second.
>>>
>>> Christian.
>>>
>>>> Thanks,
>>>> Ray
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>> It avoids to be refered again after freed.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> index 138c989..d3ef5f8 100644
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>> {
>>>>>> kfree(bo);
>>>>>> + bo = NULL;
>>>>>> }
>>>>>> static inline int ttm_mem_type_from_place(const struct
>>>>>> ttm_place *place,
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <382aab77-b9e2-1d18-ecd8-cd763227e5d0-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-10 13:04 ` Christian König
[not found] ` <c26b96d6-a9b2-8609-4779-bf1f25402fe7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-09-10 13:04 UTC (permalink / raw)
To: Tom St Denis, christian.koenig-5C7GfCeVMHo, Huang Rui
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Hi Tom,
I'm talking about adding new printks to figure out what the heck is
going wrong here.
Thanks,
Christian.
Am 10.09.2018 um 14:59 schrieb Tom St Denis:
> Hi Christian,
>
> Are you adding new traces or turning on existing ones? Would you like
> me to try them out in my setup?
>
> Tom
>
> On 2018-09-10 8:49 a.m., Christian König wrote:
>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>> Hi Ray,
>>>>>>
>>>>>> well those patches doesn't make sense, the pointer is only local to
>>>>>> the function.
>>>>> You're right.
>>>>> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
>>>>> use-after-free should be in below codes:
>>>>>
>>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>
>>>>> Is there a case, when orignal bo is destroyed in the bulk pos, but it
>>>>> doesn't update pos->first pointer, then we still use it during the
>>>>> bulk
>>>>> moving?
>>>> Only when a per VM BO is freed or the VM destroyed.
>>>>
>>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>>> to false when a per VM is released" and when we use a destroyed VM we
>>>> would see other problems as well.
>>>>
>>> If a VM instance is teared down, all BOs which belong that VM should be
>>> removed from LRU. But how can we submit cmd based on a destroyed VM?
>>> You
>>> know, we do the bulk move at last step of submission.
>>
>> Well exactly that's the point this can't happen :)
>>
>> Otherwise we would crash because of using freed up memory much
>> earlier in the command submission.
>>
>> The best idea I have to track this down further is to add some
>> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
>> why and when we are actually using a destroyed BO.
>>
>> Christian.
>>
>>>
>>>
>>> Thanks,
>>> Ray
>>>
>>>> BTW: Just pushed this commit to the repository, should show up any
>>>> second.
>>>>
>>>> Christian.
>>>>
>>>>> Thanks,
>>>>> Ray
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>> It avoids to be refered again after freed.
>>>>>>>
>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>>> {
>>>>>>> kfree(bo);
>>>>>>> + bo = NULL;
>>>>>>> }
>>>>>>> static inline int ttm_mem_type_from_place(const struct
>>>>>>> ttm_place *place,
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <c26b96d6-a9b2-8609-4779-bf1f25402fe7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-10 13:05 ` Tom St Denis
[not found] ` <1e1de90f-dc8c-4395-2375-3351f06d80b4-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Tom St Denis @ 2018-09-10 13:05 UTC (permalink / raw)
To: christian.koenig-5C7GfCeVMHo, Huang Rui
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On 2018-09-10 9:04 a.m., Christian König wrote:
> Hi Tom,
>
> I'm talking about adding new printks to figure out what the heck is
> going wrong here.
>
> Thanks,
> Christian.
Hi Christian,
Sure, if you want to send me a simple patch that adds more printk I'll
gladly give it a try (doubly so since my workstation depends on our
staging tree to work properly...).
Tom
>
> Am 10.09.2018 um 14:59 schrieb Tom St Denis:
>> Hi Christian,
>>
>> Are you adding new traces or turning on existing ones? Would you like
>> me to try them out in my setup?
>>
>> Tom
>>
>> On 2018-09-10 8:49 a.m., Christian König wrote:
>>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>>> Hi Ray,
>>>>>>>
>>>>>>> well those patches doesn't make sense, the pointer is only local to
>>>>>>> the function.
>>>>>> You're right.
>>>>>> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b, the
>>>>>> use-after-free should be in below codes:
>>>>>>
>>>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>>
>>>>>> Is there a case, when orignal bo is destroyed in the bulk pos, but it
>>>>>> doesn't update pos->first pointer, then we still use it during the
>>>>>> bulk
>>>>>> moving?
>>>>> Only when a per VM BO is freed or the VM destroyed.
>>>>>
>>>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>>>> to false when a per VM is released" and when we use a destroyed VM we
>>>>> would see other problems as well.
>>>>>
>>>> If a VM instance is teared down, all BOs which belong that VM should be
>>>> removed from LRU. But how can we submit cmd based on a destroyed VM?
>>>> You
>>>> know, we do the bulk move at last step of submission.
>>>
>>> Well exactly that's the point this can't happen :)
>>>
>>> Otherwise we would crash because of using freed up memory much
>>> earlier in the command submission.
>>>
>>> The best idea I have to track this down further is to add some
>>> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
>>> why and when we are actually using a destroyed BO.
>>>
>>> Christian.
>>>
>>>>
>>>>
>>>> Thanks,
>>>> Ray
>>>>
>>>>> BTW: Just pushed this commit to the repository, should show up any
>>>>> second.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Thanks,
>>>>>> Ray
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>>> It avoids to be refered again after freed.
>>>>>>>>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>>> static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>>>> {
>>>>>>>> kfree(bo);
>>>>>>>> + bo = NULL;
>>>>>>>> }
>>>>>>>> static inline int ttm_mem_type_from_place(const struct
>>>>>>>> ttm_place *place,
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <1e1de90f-dc8c-4395-2375-3351f06d80b4-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-10 13:10 ` Christian König
[not found] ` <6d66a7c9-5904-2b23-4a65-f4995753009d-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-09-10 13:10 UTC (permalink / raw)
To: Tom St Denis, Huang Rui
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Am 10.09.2018 um 15:05 schrieb Tom St Denis:
> On 2018-09-10 9:04 a.m., Christian König wrote:
>> Hi Tom,
>>
>> I'm talking about adding new printks to figure out what the heck is
>> going wrong here.
>>
>> Thanks,
>> Christian.
>
> Hi Christian,
>
> Sure, if you want to send me a simple patch that adds more printk I'll
> gladly give it a try (doubly so since my workstation depends on our
> staging tree to work properly...).
Just add a printk to ttm_bo_bulk_move_helper to print pos->first and
pos->last.
And another one to amdgpu_bo_destroy to printk the value of tbo.
Christian.
>
> Tom
>
>>
>> Am 10.09.2018 um 14:59 schrieb Tom St Denis:
>>> Hi Christian,
>>>
>>> Are you adding new traces or turning on existing ones? Would you
>>> like me to try them out in my setup?
>>>
>>> Tom
>>>
>>> On 2018-09-10 8:49 a.m., Christian König wrote:
>>>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>>>> Hi Ray,
>>>>>>>>
>>>>>>>> well those patches doesn't make sense, the pointer is only
>>>>>>>> local to
>>>>>>>> the function.
>>>>>>> You're right.
>>>>>>> I narrowed it with gdb dump from ttm_bo_bulk_move_lru_tail+0x2b,
>>>>>>> the
>>>>>>> use-after-free should be in below codes:
>>>>>>>
>>>>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>>>
>>>>>>> Is there a case, when orignal bo is destroyed in the bulk pos,
>>>>>>> but it
>>>>>>> doesn't update pos->first pointer, then we still use it during
>>>>>>> the bulk
>>>>>>> moving?
>>>>>> Only when a per VM BO is freed or the VM destroyed.
>>>>>>
>>>>>> The first case should now be handled by "drm/amdgpu: set
>>>>>> bulk_moveable
>>>>>> to false when a per VM is released" and when we use a destroyed
>>>>>> VM we
>>>>>> would see other problems as well.
>>>>>>
>>>>> If a VM instance is teared down, all BOs which belong that VM
>>>>> should be
>>>>> removed from LRU. But how can we submit cmd based on a destroyed
>>>>> VM? You
>>>>> know, we do the bulk move at last step of submission.
>>>>
>>>> Well exactly that's the point this can't happen :)
>>>>
>>>> Otherwise we would crash because of using freed up memory much
>>>> earlier in the command submission.
>>>>
>>>> The best idea I have to track this down further is to add some
>>>> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and
>>>> see why and when we are actually using a destroyed BO.
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ray
>>>>>
>>>>>> BTW: Just pushed this commit to the repository, should show up
>>>>>> any second.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Ray
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>>>> It avoids to be refered again after freed.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>>>>> Cc: Tom StDenis <Tom.StDenis@amd.com>
>>>>>>>>> ---
>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>>>> static void ttm_bo_default_destroy(struct ttm_buffer_object
>>>>>>>>> *bo)
>>>>>>>>> {
>>>>>>>>> kfree(bo);
>>>>>>>>> + bo = NULL;
>>>>>>>>> }
>>>>>>>>> static inline int ttm_mem_type_from_place(const struct
>>>>>>>>> ttm_place *place,
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed
[not found] ` <6d66a7c9-5904-2b23-4a65-f4995753009d-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11 7:34 ` Huang Rui
0 siblings, 0 replies; 13+ messages in thread
From: Huang Rui @ 2018-09-11 7:34 UTC (permalink / raw)
To: Koenig, Christian
Cc: StDenis, Tom, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Mon, Sep 10, 2018 at 09:10:00PM +0800, Koenig, Christian wrote:
> Am 10.09.2018 um 15:05 schrieb Tom St Denis:
> > On 2018-09-10 9:04 a.m., Christian König wrote:
> >> Hi Tom,
> >>
> >> I'm talking about adding new printks to figure out what the heck is
> >> going wrong here.
> >>
> >> Thanks,
> >> Christian.
> >
> > Hi Christian,
> >
> > Sure, if you want to send me a simple patch that adds more printk I'll
> > gladly give it a try (doubly so since my workstation depends on our
> > staging tree to work properly...).
>
> Just add a printk to ttm_bo_bulk_move_helper to print pos->first and
> pos->last.
>
> And another one to amdgpu_bo_destroy to printk the value of tbo.
>
Hi Tom,
Could you help to add below traces to check when the bo is freed.
8<---
From 919cabfbf4d202876a510cd51caa9c86cf7c8fd5 Mon Sep 17 00:00:00 2001
From: Huang Rui <ray.huang@amd.com>
Date: Tue, 11 Sep 2018 15:24:27 +0800
Subject: [PATCH] drm/amdgpu: add traces for lru bulk move
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 47 ++++++++++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
3 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index de990bd..ce28326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -89,6 +89,8 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
+ trace_amdgpu_bo_destroy(bo);
+
if (bo->pin_count > 0)
amdgpu_bo_subtract_pin_size(bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 2e87414..5d93431 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -383,6 +383,53 @@ TRACE_EVENT(amdgpu_vm_flush,
__entry->vm_hub,__entry->pd_addr)
);
+TRACE_EVENT(amdgpu_vm_lru_bulk_move,
+ TP_PROTO(struct amdgpu_vm *vm,
+ struct amdgpu_bo *bo),
+ TP_ARGS(vm, bo),
+ TP_STRUCT__entry(
+ __field(struct amdgpu_bo *, bo)
+ __field(u32, mem_type)
+ __field(struct ttm_buffer_object *, tt_first)
+ __field(struct ttm_buffer_object *, tt_last)
+ __field(struct ttm_buffer_object *, vram_first)
+ __field(struct ttm_buffer_object *, vram_last)
+ __field(struct ttm_buffer_object *, swap_first)
+ __field(struct ttm_buffer_object *, swap_last)
+ ),
+
+ TP_fast_assign(
+ __entry->bo = bo;
+ __entry->mem_type = bo->tbo.mem.mem_type;
+ __entry->tt_first = vm->lru_bulk_move.tt[bo->tbo.priority].first;
+ __entry->tt_last = vm->lru_bulk_move.tt[bo->tbo.priority].last;
+ __entry->vram_first = vm->lru_bulk_move.vram[bo->tbo.priority].first;
+ __entry->vram_last = vm->lru_bulk_move.vram[bo->tbo.priority].last;
+ __entry->swap_first = vm->lru_bulk_move.swap[bo->tbo.priority].first;
+ __entry->swap_last = vm->lru_bulk_move.swap[bo->tbo.priority].last;
+ ),
+ TP_printk("bo=%p, mem_type=%d, tt_first=%p, tt_last=%p, vram_first=%p, vram_last=%p, swap_first=%p, swap_last=%p",
+ __entry->bo, __entry->mem_type,
+ __entry->tt_first, __entry->tt_last,
+ __entry->vram_first, __entry->vram_last,
+ __entry->swap_first, __entry->swap_last)
+);
+
+TRACE_EVENT(amdgpu_bo_destroy,
+ TP_PROTO(struct amdgpu_bo *bo),
+ TP_ARGS(bo),
+ TP_STRUCT__entry(
+ __field(struct amdgpu_bo *, bo)
+ __field(struct ttm_buffer_object *, tbo)
+ ),
+
+ TP_fast_assign(
+ __entry->bo = bo;
+ __entry->tbo = &bo->tbo;
+ ),
+ TP_printk("bo=%p, tbo=%p", __entry->bo, __entry->tbo)
+);
+
DECLARE_EVENT_CLASS(amdgpu_pasid,
TP_PROTO(unsigned pasid),
TP_ARGS(pasid),
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ab95a9c..351bc58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -391,6 +391,7 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,
continue;
ttm_bo_move_to_lru_tail(&bo->tbo, &vm->lru_bulk_move);
+ trace_amdgpu_vm_lru_bulk_move(vm, bo);
if (bo->shadow)
ttm_bo_move_to_lru_tail(&bo->shadow->tbo,
&vm->lru_bulk_move);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-09-11 7:34 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-10 8:57 [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed Huang Rui
[not found] ` <1536569876-27262-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
2018-09-10 8:57 ` [PATCH 2/2] drm/amdgpu: set tbo pointer as null after amdgpu bo is freed Huang Rui
2018-09-10 9:00 ` [PATCH 1/2] drm/ttm: set ttm_buffer_object pointer as null after it's freed Christian König
[not found] ` <d40314d2-f1c4-b353-bda6-50aed5dda8db-5C7GfCeVMHo@public.gmane.org>
2018-09-10 9:23 ` Huang Rui
2018-09-10 9:25 ` Christian König
2018-09-10 11:58 ` Huang Rui
2018-09-10 12:05 ` Huang Rui
2018-09-10 12:49 ` Christian König
[not found] ` <746c5afd-9264-7dfa-4e3c-425ea82453ff-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-10 12:59 ` Tom St Denis
[not found] ` <382aab77-b9e2-1d18-ecd8-cd763227e5d0-5C7GfCeVMHo@public.gmane.org>
2018-09-10 13:04 ` Christian König
[not found] ` <c26b96d6-a9b2-8609-4779-bf1f25402fe7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-10 13:05 ` Tom St Denis
[not found] ` <1e1de90f-dc8c-4395-2375-3351f06d80b4-5C7GfCeVMHo@public.gmane.org>
2018-09-10 13:10 ` Christian König
[not found] ` <6d66a7c9-5904-2b23-4a65-f4995753009d-5C7GfCeVMHo@public.gmane.org>
2018-09-11 7:34 ` Huang Rui
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.