* [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
@ 2022-07-14 21:58 Felix Kuehling
2022-07-15 6:46 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2022-07-14 21:58 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig
Backport of Christian's patch 81b0d0e4f811 to amd-staging-drm-next. This
branch may be nearly obsolete, but this patch may still be worth
applying as it can serve as a template for backports to some release
branches. It fixes intermittent kernel oopses when memory is severely
overcommitted.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
drivers/gpu/drm/ttm/ttm_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index be24bb6cefd0..165a6cbb45d5 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -157,6 +157,9 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
list_for_each_entry(bo, &man->lru[j], lru) {
uint32_t num_pages = PFN_UP(bo->base.size);
+ if (!bo->resource)
+ continue;
+
ret = ttm_bo_swapout(bo, ctx, gfp_flags);
/* ttm_bo_swapout has dropped the lru_lock */
if (!ret)
--
2.32.0
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
2022-07-14 21:58 [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout Felix Kuehling
@ 2022-07-15 6:46 ` Christian König
0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2022-07-15 6:46 UTC (permalink / raw)
To: Felix Kuehling, amd-gfx
On 14.07.22 at 23:58, Felix Kuehling wrote:
> Backport of Christian's patch 81b0d0e4f811 to amd-staging-drm-next. This
> branch may be nearly obsolete, but this patch may still be worth
> applying as it can serve as a template for backports to some release
> branches. It fixes intermittent kernel oopses when memory is severely
> overcommitted.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
I was hoping that Alex's rebase would land before anybody noticed this
problem.
Anyway, the patch is Reviewed-by: Christian König <christian.koenig@amd.com>.
Regards,
Christian.
> ---
> drivers/gpu/drm/ttm/ttm_device.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index be24bb6cefd0..165a6cbb45d5 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -157,6 +157,9 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> list_for_each_entry(bo, &man->lru[j], lru) {
> uint32_t num_pages = PFN_UP(bo->base.size);
>
> + if (!bo->resource)
> + continue;
> +
> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> /* ttm_bo_swapout has dropped the lru_lock */
> if (!ret)
* [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
@ 2022-06-03 10:46 Christian König
2022-06-03 19:37 ` Felix Kuehling
2022-06-06 13:15 ` Felix Kuehling
0 siblings, 2 replies; 7+ messages in thread
From: Christian König @ 2022-06-03 10:46 UTC (permalink / raw)
To: mike, dri-devel; +Cc: Christian König
Resources about to be destructed are not tied to BOs any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index a0562ab386f5..e7147e304637 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
ttm_resource_manager_for_each_res(man, &cursor, res) {
struct ttm_buffer_object *bo = res->bo;
- uint32_t num_pages = PFN_UP(bo->base.size);
+ uint32_t num_pages;
+ if (!bo)
+ continue;
+
+ num_pages = PFN_UP(bo->base.size);
ret = ttm_bo_swapout(bo, ctx, gfp_flags);
/* ttm_bo_swapout has dropped the lru_lock */
if (!ret)
--
2.25.1
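The hunk above moves the PFN_UP(bo->base.size) computation below the new NULL check, because res->bo can be NULL for a resource that is about to be destroyed. A minimal userspace sketch of that "check first, then dereference" ordering follows. It assumes simplified stand-in types: struct res, struct bo, SKETCH_PFN_UP and count_swappable_pages are hypothetical names for illustration, not TTM definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TTM structures; not the kernel definitions. */
struct bo {
	size_t size;            /* size in bytes */
};

struct res {
	struct bo *bo;          /* NULL once the resource is being destroyed */
};

#define SKETCH_PAGE_SIZE 4096u
/* Round a byte count up to whole pages, similar in spirit to PFN_UP(). */
#define SKETCH_PFN_UP(x) (((x) + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE)

/* Sum the pages of every resource that still has a BO attached. */
static uint32_t count_swappable_pages(const struct res *list, size_t n)
{
	uint32_t total = 0;

	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct bo *bo = list[i].bo;
		uint32_t num_pages;

		if (!bo)        /* check first ... */
			continue;

		/* ... and only then dereference bo */
		num_pages = (uint32_t)SKETCH_PFN_UP(bo->size);
		total += num_pages;
	}
	return total;
}
```

Initializing num_pages in its declaration, as the pre-patch code did, would dereference bo before the check could run, which is why the patch splits the declaration from the assignment.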
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
2022-06-03 10:46 Christian König
@ 2022-06-03 19:37 ` Felix Kuehling
2022-06-03 22:44 ` Felix Kuehling
2022-06-06 13:15 ` Felix Kuehling
1 sibling, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2022-06-03 19:37 UTC (permalink / raw)
To: Christian König, mike, dri-devel; +Cc: Christian König
On 2022-06-03 06:46, Christian König wrote:
> Resources about to be destructed are not tied to BOs any more.
I've been seeing a backtrace in that area with a patch series I'm
working on, but haven't had enough time to track it down yet. I'll check
whether this patch fixes it.
Regards,
Felix
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index a0562ab386f5..e7147e304637 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>
> ttm_resource_manager_for_each_res(man, &cursor, res) {
> struct ttm_buffer_object *bo = res->bo;
> - uint32_t num_pages = PFN_UP(bo->base.size);
> + uint32_t num_pages;
>
> + if (!bo)
> + continue;
> +
> + num_pages = PFN_UP(bo->base.size);
> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> /* ttm_bo_swapout has dropped the lru_lock */
> if (!ret)
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
2022-06-03 19:37 ` Felix Kuehling
@ 2022-06-03 22:44 ` Felix Kuehling
2022-06-06 10:20 ` Christian König
0 siblings, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2022-06-03 22:44 UTC (permalink / raw)
To: Christian König, mike, dri-devel, amd-gfx; +Cc: Christian König
[+amd-gfx]
On 2022-06-03 15:37, Felix Kuehling wrote:
>
> On 2022-06-03 06:46, Christian König wrote:
>> Resources about to be destructed are not tied to BOs any more.
> I've been seeing a backtrace in that area with a patch series I'm
> working on, but didn't have enough time to track it down yet. I'll try
> if this patch fixes it.
The patch doesn't apply on amd-staging-drm-next. I made the following
change instead, which fixes my problem (and I do see the pr_err being
triggered):
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
list_for_each_entry(bo, &man->lru[j], lru) {
uint32_t num_pages = PFN_UP(bo->base.size);
+ if (!bo->resource) {
+ pr_err("### bo->resource is NULL\n");
+ continue;
+ }
ret = ttm_bo_swapout(bo, ctx, gfp_flags);
/* ttm_bo_swapout has dropped the lru_lock */
if (!ret)
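
On this branch the loop walks buffer objects directly, so the check is on bo->resource rather than on res->bo. A rough userspace sketch of that BO-centric variant, assuming simplified stand-in types (struct sketch_bo, struct sketch_resource and swappable_pages_by_bo are hypothetical, not TTM API); the skipped counter stands in for the temporary pr_err used for debugging:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real TTM structures carry far more state. */
struct sketch_resource {
	int placeholder;
};

struct sketch_bo {
	size_t size;                      /* size in bytes */
	struct sketch_resource *resource; /* NULL: no resource attached */
};

#define SKETCH_PAGE_SIZE 4096u

/*
 * Walk the BOs and sum the pages of those that still have a resource.
 * BOs without a resource are skipped and counted in *skipped.
 */
static uint32_t swappable_pages_by_bo(const struct sketch_bo *bos, size_t n,
				      size_t *skipped)
{
	uint32_t total = 0;

	assert(skipped != NULL);
	*skipped = 0;

	for (size_t i = 0; i < n; i++) {
		/* bo itself is valid here, so reading its size is safe */
		uint32_t num_pages = (uint32_t)((bos[i].size + SKETCH_PAGE_SIZE - 1)
						/ SKETCH_PAGE_SIZE);

		if (!bos[i].resource) {
			(*skipped)++;
			continue;
		}
		total += num_pages;
	}
	return total;
}
```

Unlike the resource-centric loop, the page count can stay in the declaration here, because the BO pointer is never NULL in this walk; only its resource may be missing.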
>
> Regards,
> Felix
>
>
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index a0562ab386f5..e7147e304637 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>> ttm_resource_manager_for_each_res(man, &cursor, res) {
>> struct ttm_buffer_object *bo = res->bo;
>> - uint32_t num_pages = PFN_UP(bo->base.size);
>> + uint32_t num_pages;
>> + if (!bo)
>> + continue;
>> +
>> + num_pages = PFN_UP(bo->base.size);
>> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>> /* ttm_bo_swapout has dropped the lru_lock */
>> if (!ret)
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
2022-06-03 22:44 ` Felix Kuehling
@ 2022-06-06 10:20 ` Christian König
0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2022-06-06 10:20 UTC (permalink / raw)
To: Felix Kuehling, mike, dri-devel, amd-gfx; +Cc: Christian König
On 04.06.22 at 00:44, Felix Kuehling wrote:
> [+amd-gfx]
>
>
> On 2022-06-03 15:37, Felix Kuehling wrote:
>>
>> On 2022-06-03 06:46, Christian König wrote:
>>> Resources about to be destructed are not tied to BOs any more.
>> I've been seeing a backtrace in that area with a patch series I'm
>> working on, but didn't have enough time to track it down yet. I'll
>> try if this patch fixes it.
>
> The patch doesn't apply on amd-staging-drm-next. I made the following
> change instead, which fixes my problem (and I do see the pr_err being
> triggered):
>
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -157,6 +157,10 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> list_for_each_entry(bo, &man->lru[j], lru) {
> uint32_t num_pages = PFN_UP(bo->base.size);
>
> + if (!bo->resource) {
> + pr_err("### bo->resource is NULL\n");
> + continue;
> + }
Yeah, that should be functionally identical.
Can I get an rb for that? I'm going to provide backports to older kernels as
well then.
Regards,
Christian.
> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> /* ttm_bo_swapout has dropped the lru_lock */
> if (!ret)
>
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>>> index a0562ab386f5..e7147e304637 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>>> ttm_resource_manager_for_each_res(man, &cursor, res) {
>>> struct ttm_buffer_object *bo = res->bo;
>>> - uint32_t num_pages = PFN_UP(bo->base.size);
>>> + uint32_t num_pages;
>>> + if (!bo)
>>> + continue;
>>> +
>>> + num_pages = PFN_UP(bo->base.size);
>>> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>>> /* ttm_bo_swapout has dropped the lru_lock */
>>> if (!ret)
* Re: [PATCH] drm/ttm: fix missing NULL check in ttm_device_swapout
2022-06-03 10:46 Christian König
2022-06-03 19:37 ` Felix Kuehling
@ 2022-06-06 13:15 ` Felix Kuehling
1 sibling, 0 replies; 7+ messages in thread
From: Felix Kuehling @ 2022-06-06 13:15 UTC (permalink / raw)
To: Christian König, mike, dri-devel, amd-gfx list; +Cc: Christian König
On 2022-06-03 at 06:46, Christian König wrote:
> Resources about to be destructed are not tied to BOs any more.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
> drivers/gpu/drm/ttm/ttm_device.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index a0562ab386f5..e7147e304637 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -156,8 +156,12 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>
> ttm_resource_manager_for_each_res(man, &cursor, res) {
> struct ttm_buffer_object *bo = res->bo;
> - uint32_t num_pages = PFN_UP(bo->base.size);
> + uint32_t num_pages;
>
> + if (!bo)
> + continue;
> +
> + num_pages = PFN_UP(bo->base.size);
> ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> /* ttm_bo_swapout has dropped the lru_lock */
> if (!ret)