* Oops in qxl_bo_move_notify()
@ 2021-07-07 16:36 Roberto Sassu
From: Roberto Sassu @ 2021-07-07 16:36 UTC
To: dri-devel
Hi
I'm getting this oops (on commit a180bd1d7e16):
[ 17.711520] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 17.739451] RIP: 0010:qxl_bo_move_notify+0x35/0x80 [qxl]
[ 17.827345] RSP: 0018:ffffc90000457c08 EFLAGS: 00010286
[ 17.827350] RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
[ 17.827353] RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffffffff85596feb
[ 17.827356] RBP: ffff88800e311c00 R08: 0000000000000000 R09: 0000000000000000
[ 17.827358] R10: ffffffff8697b243 R11: fffffbfff0d2f648 R12: 0000000000000000
[ 17.827361] R13: ffff88800e311e48 R14: ffff88800e311e98 R15: ffff88800e311e90
[ 17.827364] FS: 0000000000000000(0000) GS:ffff88805d800000(0000) knlGS:0000000000000000
[ 17.861699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.861703] CR2: 0000000000000010 CR3: 000000002642c000 CR4: 0000000000350ee0
[ 17.861707] Call Trace:
[ 17.861712] ttm_bo_cleanup_memtype_use+0x4d/0xb0 [ttm]
[ 17.861730] ttm_bo_release+0x42d/0x7c0 [ttm]
[ 17.861746] ? ttm_bo_cleanup_refs+0x127/0x420 [ttm]
[ 17.888300] ttm_bo_delayed_delete+0x289/0x390 [ttm]
[ 17.888317] ? ttm_bo_cleanup_refs+0x420/0x420 [ttm]
[ 17.888332] ? lock_release+0x9c/0x5c0
[ 17.901033] ? rcu_read_lock_held_common+0x1a/0x50
[ 17.905183] ttm_device_delayed_workqueue+0x18/0x50 [ttm]
[ 17.909371] process_one_work+0x537/0x9f0
[ 17.913345] ? pwq_dec_nr_in_flight+0x160/0x160
[ 17.917297] ? lock_acquired+0xa4/0x580
[ 17.921168] ? worker_thread+0x169/0x600
[ 17.925034] worker_thread+0x7a/0x600
[ 17.928657] ? process_one_work+0x9f0/0x9f0
[ 17.932360] kthread+0x200/0x230
[ 17.935930] ? set_kthread_struct+0x80/0x80
[ 17.939593] ret_from_fork+0x22/0x30
[ 17.951737] CR2: 0000000000000010
[ 17.955496] ---[ end trace e30cc21c24e81ee5 ]---
I had a look at the code, and it seems that this is caused by
trying to use bo->resource which is NULL.
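As an illustration of why the reported fault address is 0x10 rather than 0: reading a member through a NULL structure pointer faults at that member's offset. The sketch below is a userspace model only. `struct model_resource` and its field order are assumptions made for the example, not copied from the kernel tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the resource object that bo->resource points to.
 * The field order is an assumption for illustration only. */
struct model_resource {
	unsigned long start;
	unsigned long num_pages;
	uint32_t mem_type;
};

/* Address that would be touched when reading ->mem_type through a NULL
 * pointer: simply the offset of the member inside the structure. */
size_t model_fault_offset(void)
{
	return offsetof(struct model_resource, mem_type);
}
```

On a 64-bit build where unsigned long is 8 bytes, model_fault_offset() returns 16 (0x10), which would match the CR2 value in the trace if the real layout resembles this model.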
bo->resource is freed by ttm_bo_cleanup_refs() ->
ttm_bo_cleanup_memtype_use() -> ttm_resource_free().
And then a notification is issued by ttm_bo_cleanup_refs() ->
ttm_bo_put() -> ttm_bo_release() ->
ttm_bo_cleanup_memtype_use(), this time with bo->resource
equal to NULL.
I was thinking about a proper way to fix this. Checking that
bo->resource is not NULL in qxl_bo_move_notify() would
solve the issue. But maybe there is a better way, such as
preventing ttm_bo_cleanup_memtype_use() from being called
twice. Which way would be preferable?
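The first option (a NULL check in the notifier) can be sketched in a small userspace model. Everything here is hypothetical: the `model_*` names, the placement constants, and the simplified cleanup step are stand-ins chosen for the example and do not reproduce the real TTM or qxl code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placement identifiers, for illustration only. */
enum { MODEL_PL_SYSTEM = 0, MODEL_PL_PRIV = 3 };

struct model_resource {
	unsigned long start;
	unsigned long num_pages;
	uint32_t mem_type;
};

struct model_bo {
	struct model_resource *resource; /* NULL once the resource is freed */
	int surface_id;                  /* non-zero while a surface exists */
};

/* Notifier with the proposed guard. Returns true if it evicted a surface. */
bool model_move_notify(struct model_bo *bo)
{
	/* Guard: a second cleanup pass may arrive after the resource is gone. */
	if (!bo->resource)
		return false;

	if (bo->resource->mem_type == MODEL_PL_PRIV && bo->surface_id) {
		bo->surface_id = 0; /* stands in for evicting the surface */
		return true;
	}
	return false;
}

/* Models one cleanup pass: notify the driver, then drop the resource.
 * In this model, "freeing" only clears the pointer. */
bool model_cleanup_memtype_use(struct model_bo *bo)
{
	bool evicted = model_move_notify(bo);

	bo->resource = NULL;
	return evicted;
}
```

Calling model_cleanup_memtype_use() twice on the same object mirrors the sequence described above: the first call sees a valid resource, the second sees NULL and returns early instead of dereferencing it. The alternative of avoiding the second call altogether would live in the caller rather than in the notifier.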
Thanks
Roberto
HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Li Jian, Shi Yanli
* Re: Oops in qxl_bo_move_notify()
@ 2021-07-08 10:14 ` Daniel Vetter
From: Daniel Vetter @ 2021-07-08 10:14 UTC
To: Roberto Sassu, Christian König, Dave Airlie; +Cc: dri-devel
On Wed, Jul 07, 2021 at 04:36:49PM +0000, Roberto Sassu wrote:
> Hi
>
> I'm getting this oops (on commit a180bd1d7e16):
>
> [ 17.711520] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [ 17.739451] RIP: 0010:qxl_bo_move_notify+0x35/0x80 [qxl]
> [ 17.827345] RSP: 0018:ffffc90000457c08 EFLAGS: 00010286
> [ 17.827350] RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> [ 17.827353] RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffffffff85596feb
> [ 17.827356] RBP: ffff88800e311c00 R08: 0000000000000000 R09: 0000000000000000
> [ 17.827358] R10: ffffffff8697b243 R11: fffffbfff0d2f648 R12: 0000000000000000
> [ 17.827361] R13: ffff88800e311e48 R14: ffff88800e311e98 R15: ffff88800e311e90
> [ 17.827364] FS: 0000000000000000(0000) GS:ffff88805d800000(0000) knlGS:0000000000000000
> [ 17.861699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 17.861703] CR2: 0000000000000010 CR3: 000000002642c000 CR4: 0000000000350ee0
> [ 17.861707] Call Trace:
> [ 17.861712] ttm_bo_cleanup_memtype_use+0x4d/0xb0 [ttm]
> [ 17.861730] ttm_bo_release+0x42d/0x7c0 [ttm]
> [ 17.861746] ? ttm_bo_cleanup_refs+0x127/0x420 [ttm]
> [ 17.888300] ttm_bo_delayed_delete+0x289/0x390 [ttm]
> [ 17.888317] ? ttm_bo_cleanup_refs+0x420/0x420 [ttm]
> [ 17.888332] ? lock_release+0x9c/0x5c0
> [ 17.901033] ? rcu_read_lock_held_common+0x1a/0x50
> [ 17.905183] ttm_device_delayed_workqueue+0x18/0x50 [ttm]
> [ 17.909371] process_one_work+0x537/0x9f0
> [ 17.913345] ? pwq_dec_nr_in_flight+0x160/0x160
> [ 17.917297] ? lock_acquired+0xa4/0x580
> [ 17.921168] ? worker_thread+0x169/0x600
> [ 17.925034] worker_thread+0x7a/0x600
> [ 17.928657] ? process_one_work+0x9f0/0x9f0
> [ 17.932360] kthread+0x200/0x230
> [ 17.935930] ? set_kthread_struct+0x80/0x80
> [ 17.939593] ret_from_fork+0x22/0x30
> [ 17.951737] CR2: 0000000000000010
> [ 17.955496] ---[ end trace e30cc21c24e81ee5 ]---
>
> I had a look at the code, and it seems that this is caused by
> trying to use bo->resource which is NULL.
>
> bo->resource is freed by ttm_bo_cleanup_refs() ->
> ttm_bo_cleanup_memtype_use() -> ttm_resource_free().
>
> And then a notification is issued by ttm_bo_cleanup_refs() ->
> ttm_bo_put() -> ttm_bo_release() ->
> ttm_bo_cleanup_memtype_use(), this time with bo->resource
> equal to NULL.
>
> I was thinking about a proper way to fix this. Checking that
> bo->resource is not NULL in qxl_bo_move_notify() would
> solve the issue. But maybe there is a better way, such as
> preventing ttm_bo_cleanup_memtype_use() from being called
> twice. Which way would be preferable?
Adding Christian and Dave, who've touched all this recently iirc.
-Daniel
>
> Thanks
>
> Roberto
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> Managing Director: Li Peng, Li Jian, Shi Yanli
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: Oops in qxl_bo_move_notify()
@ 2021-07-08 11:30 ` Christian König
From: Christian König @ 2021-07-08 11:30 UTC
To: Daniel Vetter, Roberto Sassu, Dave Airlie; +Cc: dri-devel
Yeah, that's an already known issue.
When the allocation fails, bo->resource might now be NULL, and we need to
add checks for that corner case as well.
Christian.
On 08.07.21 at 12:14, Daniel Vetter wrote:
> On Wed, Jul 07, 2021 at 04:36:49PM +0000, Roberto Sassu wrote:
>> Hi
>>
>> I'm getting this oops (on commit a180bd1d7e16):
>>
>> [ 17.711520] BUG: kernel NULL pointer dereference, address: 0000000000000010
>> [ 17.739451] RIP: 0010:qxl_bo_move_notify+0x35/0x80 [qxl]
>> [ 17.827345] RSP: 0018:ffffc90000457c08 EFLAGS: 00010286
>> [ 17.827350] RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
>> [ 17.827353] RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffffffff85596feb
>> [ 17.827356] RBP: ffff88800e311c00 R08: 0000000000000000 R09: 0000000000000000
>> [ 17.827358] R10: ffffffff8697b243 R11: fffffbfff0d2f648 R12: 0000000000000000
>> [ 17.827361] R13: ffff88800e311e48 R14: ffff88800e311e98 R15: ffff88800e311e90
>> [ 17.827364] FS: 0000000000000000(0000) GS:ffff88805d800000(0000) knlGS:0000000000000000
>> [ 17.861699] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 17.861703] CR2: 0000000000000010 CR3: 000000002642c000 CR4: 0000000000350ee0
>> [ 17.861707] Call Trace:
>> [ 17.861712] ttm_bo_cleanup_memtype_use+0x4d/0xb0 [ttm]
>> [ 17.861730] ttm_bo_release+0x42d/0x7c0 [ttm]
>> [ 17.861746] ? ttm_bo_cleanup_refs+0x127/0x420 [ttm]
>> [ 17.888300] ttm_bo_delayed_delete+0x289/0x390 [ttm]
>> [ 17.888317] ? ttm_bo_cleanup_refs+0x420/0x420 [ttm]
>> [ 17.888332] ? lock_release+0x9c/0x5c0
>> [ 17.901033] ? rcu_read_lock_held_common+0x1a/0x50
>> [ 17.905183] ttm_device_delayed_workqueue+0x18/0x50 [ttm]
>> [ 17.909371] process_one_work+0x537/0x9f0
>> [ 17.913345] ? pwq_dec_nr_in_flight+0x160/0x160
>> [ 17.917297] ? lock_acquired+0xa4/0x580
>> [ 17.921168] ? worker_thread+0x169/0x600
>> [ 17.925034] worker_thread+0x7a/0x600
>> [ 17.928657] ? process_one_work+0x9f0/0x9f0
>> [ 17.932360] kthread+0x200/0x230
>> [ 17.935930] ? set_kthread_struct+0x80/0x80
>> [ 17.939593] ret_from_fork+0x22/0x30
>> [ 17.951737] CR2: 0000000000000010
>> [ 17.955496] ---[ end trace e30cc21c24e81ee5 ]---
>>
>> I had a look at the code, and it seems that this is caused by
>> trying to use bo->resource which is NULL.
>>
>> bo->resource is freed by ttm_bo_cleanup_refs() ->
>> ttm_bo_cleanup_memtype_use() -> ttm_resource_free().
>>
>> And then a notification is issued by ttm_bo_cleanup_refs() ->
>> ttm_bo_put() -> ttm_bo_release() ->
>> ttm_bo_cleanup_memtype_use(), this time with bo->resource
>> equal to NULL.
>>
>> I was thinking about a proper way to fix this. Checking that
>> bo->resource is not NULL in qxl_bo_move_notify() would
>> solve the issue. But maybe there is a better way, such as
>> preventing ttm_bo_cleanup_memtype_use() from being called
>> twice. Which way would be preferable?
> Adding Christian and Dave, who've touched all this recently iirc.
> -Daniel
>
>> Thanks
>>
>> Roberto
>>
>> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
>> Managing Director: Li Peng, Li Jian, Shi Yanli