linux-mm.kvack.org archive mirror
* HMM related use-after-free with amdgpu
@ 2019-07-15 16:51 Michel Dänzer
  2019-07-15 17:25 ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Michel Dänzer @ 2019-07-15 16:51 UTC
  To: amd-gfx; +Cc: Jérôme Glisse, linux-mm, Jason Gunthorpe

[-- Attachment #1: Type: text/plain, Size: 534 bytes --]


With a KASAN enabled kernel built from amd-staging-drm-next, the
attached use-after-free is pretty reliably detected during a piglit gpu run.

Any ideas?


P.S. With my standard kernels without KASAN (currently 5.2.y + drm-next
changes for 5.3), I'm having trouble lately completing a piglit run,
running into various issues which look like memory corruption, so they
might be related.

-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer

[-- Attachment #2: kern.log --]
[-- Type: text/x-log, Size: 8929 bytes --]

Jul 15 18:09:29 kaveri kernel: [  560.388751][T12568] ==================================================================
Jul 15 18:09:29 kaveri kernel: [  560.389063][T12568] BUG: KASAN: use-after-free in __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389068][T12568] Read of size 8 at addr ffff88835e1c7cb0 by task amd_pinned_memo/12568
Jul 15 18:09:29 kaveri kernel: [  560.389071][T12568] 
Jul 15 18:09:29 kaveri kernel: [  560.389077][T12568] CPU: 9 PID: 12568 Comm: amd_pinned_memo Tainted: G           OE     5.2.0-rc1-00811-g2ad5a7d31bdf #125
Jul 15 18:09:29 kaveri kernel: [  560.389080][T12568] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
Jul 15 18:09:29 kaveri kernel: [  560.389084][T12568] Call Trace:
Jul 15 18:09:29 kaveri kernel: [  560.389091][T12568]  dump_stack+0x7c/0xc0
Jul 15 18:09:29 kaveri kernel: [  560.389097][T12568]  ? __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389101][T12568]  print_address_description+0x65/0x22e
Jul 15 18:09:29 kaveri kernel: [  560.389106][T12568]  ? __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389110][T12568]  ? __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389115][T12568]  __kasan_report.cold.3+0x1a/0x3d
Jul 15 18:09:29 kaveri kernel: [  560.389122][T12568]  ? __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389128][T12568]  kasan_report+0xe/0x20
Jul 15 18:09:29 kaveri kernel: [  560.389132][T12568]  __mmu_notifier_release+0x286/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389142][T12568]  exit_mmap+0x93/0x400
Jul 15 18:09:29 kaveri kernel: [  560.389146][T12568]  ? quarantine_put+0xb7/0x150
Jul 15 18:09:29 kaveri kernel: [  560.389151][T12568]  ? do_munmap+0x10/0x10
Jul 15 18:09:29 kaveri kernel: [  560.389156][T12568]  ? lockdep_hardirqs_on+0x37f/0x560
Jul 15 18:09:29 kaveri kernel: [  560.389165][T12568]  ? __khugepaged_exit+0x2af/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389169][T12568]  ? __khugepaged_exit+0x2af/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389174][T12568]  ? rcu_read_lock_sched_held+0xd8/0x110
Jul 15 18:09:29 kaveri kernel: [  560.389179][T12568]  ? kmem_cache_free+0x279/0x2c0
Jul 15 18:09:29 kaveri kernel: [  560.389185][T12568]  ? __khugepaged_exit+0x2be/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389192][T12568]  mmput+0xb2/0x390
Jul 15 18:09:29 kaveri kernel: [  560.389199][T12568]  do_exit+0x880/0x2a70
Jul 15 18:09:29 kaveri kernel: [  560.389207][T12568]  ? find_held_lock+0x33/0x1c0
Jul 15 18:09:29 kaveri kernel: [  560.389213][T12568]  ? mm_update_next_owner+0x5d0/0x5d0
Jul 15 18:09:29 kaveri kernel: [  560.389218][T12568]  ? __do_page_fault+0x41d/0xa20
Jul 15 18:09:29 kaveri kernel: [  560.389226][T12568]  ? lock_downgrade+0x620/0x620
Jul 15 18:09:29 kaveri kernel: [  560.389232][T12568]  ? handle_mm_fault+0x4ab/0x6a0
Jul 15 18:09:29 kaveri kernel: [  560.389242][T12568]  do_group_exit+0xf0/0x2e0
Jul 15 18:09:29 kaveri kernel: [  560.389249][T12568]  __x64_sys_exit_group+0x3a/0x50
Jul 15 18:09:29 kaveri kernel: [  560.389255][T12568]  do_syscall_64+0x9c/0x430
Jul 15 18:09:29 kaveri kernel: [  560.389261][T12568]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Jul 15 18:09:29 kaveri kernel: [  560.389266][T12568] RIP: 0033:0x7fc23d8ed9d6
Jul 15 18:09:29 kaveri kernel: [  560.389271][T12568] Code: 00 4c 8b 0d bc 44 0f 00 eb 19 66 2e 0f 1f 84 00 00 00 00 00 89 d7 89 f0 0f 05 48 3d 00 f0 ff ff 77 22 f4 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff ff 76 e2 f7 d8 64 41 89 01 eb da 66 2e 0f 1f 84 00
Jul 15 18:09:29 kaveri kernel: [  560.389275][T12568] RSP: 002b:00007fff8c3bcfa8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jul 15 18:09:29 kaveri kernel: [  560.389280][T12568] RAX: ffffffffffffffda RBX: 00007fc23d9de760 RCX: 00007fc23d8ed9d6
Jul 15 18:09:29 kaveri kernel: [  560.389283][T12568] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
Jul 15 18:09:29 kaveri kernel: [  560.389287][T12568] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff48
Jul 15 18:09:29 kaveri kernel: [  560.389290][T12568] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc23d9de760
Jul 15 18:09:29 kaveri kernel: [  560.389293][T12568] R13: 00000000000004f0 R14: 00007fc23d9e7428 R15: 0000000000000000
Jul 15 18:09:29 kaveri kernel: [  560.389306][T12568] 
Jul 15 18:09:29 kaveri kernel: [  560.389309][T12568] Allocated by task 12568:
Jul 15 18:09:29 kaveri kernel: [  560.389314][T12568]  save_stack+0x19/0x80
Jul 15 18:09:29 kaveri kernel: [  560.389318][T12568]  __kasan_kmalloc.constprop.8+0xc1/0xd0
Jul 15 18:09:29 kaveri kernel: [  560.389323][T12568]  hmm_get_or_create+0x8f/0x3f0
Jul 15 18:09:29 kaveri kernel: [  560.389327][T12568]  hmm_mirror_register+0x58/0x240
Jul 15 18:09:29 kaveri kernel: [  560.389425][T12568]  amdgpu_mn_get+0x37b/0x6c0 [amdgpu]
Jul 15 18:09:29 kaveri kernel: [  560.389554][T12568]  amdgpu_mn_register+0xf6/0x710 [amdgpu]
Jul 15 18:09:29 kaveri kernel: [  560.389656][T12568]  amdgpu_gem_userptr_ioctl+0x6a3/0x8b0 [amdgpu]
Jul 15 18:09:29 kaveri kernel: [  560.389678][T12568]  drm_ioctl_kernel+0x1c9/0x260 [drm]
Jul 15 18:09:29 kaveri kernel: [  560.389701][T12568]  drm_ioctl+0x436/0x930 [drm]
Jul 15 18:09:29 kaveri kernel: [  560.389830][T12568]  amdgpu_drm_ioctl+0xd0/0x1b0 [amdgpu]
Jul 15 18:09:29 kaveri kernel: [  560.389836][T12568]  do_vfs_ioctl+0x193/0xfd0
Jul 15 18:09:29 kaveri kernel: [  560.389839][T12568]  ksys_ioctl+0x60/0x90
Jul 15 18:09:29 kaveri kernel: [  560.389843][T12568]  __x64_sys_ioctl+0x6f/0xb0
Jul 15 18:09:29 kaveri kernel: [  560.389847][T12568]  do_syscall_64+0x9c/0x430
Jul 15 18:09:29 kaveri kernel: [  560.389851][T12568]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Jul 15 18:09:29 kaveri kernel: [  560.389853][T12568] 
Jul 15 18:09:29 kaveri kernel: [  560.389857][T12568] Freed by task 12568:
Jul 15 18:09:29 kaveri kernel: [  560.389860][T12568]  save_stack+0x19/0x80
Jul 15 18:09:29 kaveri kernel: [  560.389864][T12568]  __kasan_slab_free+0x125/0x170
Jul 15 18:09:29 kaveri kernel: [  560.389867][T12568]  kfree+0xe2/0x290
Jul 15 18:09:29 kaveri kernel: [  560.389871][T12568]  __mmu_notifier_release+0xef/0x3e0
Jul 15 18:09:29 kaveri kernel: [  560.389875][T12568]  exit_mmap+0x93/0x400
Jul 15 18:09:29 kaveri kernel: [  560.389879][T12568]  mmput+0xb2/0x390
Jul 15 18:09:29 kaveri kernel: [  560.389883][T12568]  do_exit+0x880/0x2a70
Jul 15 18:09:29 kaveri kernel: [  560.389886][T12568]  do_group_exit+0xf0/0x2e0
Jul 15 18:09:29 kaveri kernel: [  560.389890][T12568]  __x64_sys_exit_group+0x3a/0x50
Jul 15 18:09:29 kaveri kernel: [  560.389893][T12568]  do_syscall_64+0x9c/0x430
Jul 15 18:09:29 kaveri kernel: [  560.389897][T12568]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Jul 15 18:09:29 kaveri kernel: [  560.389900][T12568] 
Jul 15 18:09:29 kaveri kernel: [  560.389903][T12568] The buggy address belongs to the object at ffff88835e1c7c00
Jul 15 18:09:29 kaveri kernel: [  560.389903][T12568]  which belongs to the cache kmalloc-512 of size 512
Jul 15 18:09:29 kaveri kernel: [  560.389908][T12568] The buggy address is located 176 bytes inside of
Jul 15 18:09:29 kaveri kernel: [  560.389908][T12568]  512-byte region [ffff88835e1c7c00, ffff88835e1c7e00)
Jul 15 18:09:29 kaveri kernel: [  560.389911][T12568] The buggy address belongs to the page:
Jul 15 18:09:29 kaveri kernel: [  560.389915][T12568] page:ffffea000d787100 refcount:1 mapcount:0 mapping:ffff88837d80ec00 index:0x0 compound_mapcount: 0
Jul 15 18:09:29 kaveri kernel: [  560.389921][T12568] flags: 0x17fffc000010200(slab|head)
Jul 15 18:09:29 kaveri kernel: [  560.389929][T12568] raw: 017fffc000010200 0000000000000000 0000000100000001 ffff88837d80ec00
Jul 15 18:09:29 kaveri kernel: [  560.389933][T12568] raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
Jul 15 18:09:29 kaveri kernel: [  560.389936][T12568] page dumped because: kasan: bad access detected
Jul 15 18:09:29 kaveri kernel: [  560.389939][T12568] 
Jul 15 18:09:29 kaveri kernel: [  560.389942][T12568] Memory state around the buggy address:
Jul 15 18:09:29 kaveri kernel: [  560.389946][T12568]  ffff88835e1c7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Jul 15 18:09:29 kaveri kernel: [  560.389949][T12568]  ffff88835e1c7c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 15 18:09:29 kaveri kernel: [  560.389953][T12568] >ffff88835e1c7c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 15 18:09:29 kaveri kernel: [  560.389956][T12568]                                      ^
Jul 15 18:09:29 kaveri kernel: [  560.389960][T12568]  ffff88835e1c7d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 15 18:09:29 kaveri kernel: [  560.389963][T12568]  ffff88835e1c7d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 15 18:09:29 kaveri kernel: [  560.389966][T12568] ==================================================================

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: HMM related use-after-free with amdgpu
  2019-07-15 16:51 HMM related use-after-free with amdgpu Michel Dänzer
@ 2019-07-15 17:25 ` Jason Gunthorpe
  2019-07-16 16:31   ` Michel Dänzer
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2019-07-15 17:25 UTC
  To: Michel Dänzer; +Cc: amd-gfx, Jérôme Glisse, linux-mm

On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
> 
> With a KASAN enabled kernel built from amd-staging-drm-next, the
> attached use-after-free is pretty reliably detected during a piglit gpu run.

Does this branch you are testing have the hmm.git merged? I think from
the name it does not?

Use-after-frees of this nature were something that was fixed in
hmm.git.

I don't see an obvious way you can hit something like this with the
new code arrangement.

> P.S. With my standard kernels without KASAN (currently 5.2.y + drm-next
> changes for 5.3), I'm having trouble lately completing a piglit run,
> running into various issues which look like memory corruption, so they
> might be related.

I'm skeptical that the AMDGPU implementation of the locking around the
hmm_range & mirror is working; at the least, it doesn't follow the
prescribed pattern.
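The thread does not spell out the prescribed pattern. As an assumption, it is the collision-retry scheme described in the HMM documentation: fault the range without holding the driver lock, then take the driver lock (which the invalidation callback also takes) and retry if an invalidation ran in between. Below is a minimal userspace sketch of that idea, using a pthread mutex as the driver lock and a sequence counter for change detection (the HMM code of that time may have tracked this differently). `struct mirror`, `mirror_invalidate()` and `mirror_update()` are hypothetical names, not amdgpu or HMM APIs.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical driver-side mirror state; not the real amdgpu/HMM types. */
struct mirror {
    pthread_mutex_t lock;    /* driver lock, also taken on invalidation */
    unsigned long inval_seq; /* bumped by every invalidation */
    int cpu_value;           /* stands in for the CPU page tables */
    int gpu_value;           /* stands in for the device mapping */
};

/* Invalidation callback: the CPU mapping changed. */
static void mirror_invalidate(struct mirror *m, int new_value)
{
    pthread_mutex_lock(&m->lock);
    m->inval_seq++;
    m->cpu_value = new_value;
    pthread_mutex_unlock(&m->lock);
}

/* Test hook: inject one invalidation between "fault" and "commit". */
static int inject_race = 1;

/* Returns how many times the update had to be retried. */
static int mirror_update(struct mirror *m)
{
    int retries = 0;

    assert(m != NULL);
    for (;;) {
        /* Snapshot the sequence and "fault" the range. In real code the
         * fault runs without the driver lock held; it is read under the
         * lock here only to keep the sketch free of data races. */
        pthread_mutex_lock(&m->lock);
        unsigned long seq = m->inval_seq;
        int snapshot = m->cpu_value;
        pthread_mutex_unlock(&m->lock);

        if (inject_race) {
            inject_race = 0;
            mirror_invalidate(m, 42);
        }

        /* Commit only if no invalidation ran since the snapshot. */
        pthread_mutex_lock(&m->lock);
        if (m->inval_seq != seq) {
            pthread_mutex_unlock(&m->lock);
            retries++;
            continue;
        }
        m->gpu_value = snapshot;
        pthread_mutex_unlock(&m->lock);
        return retries;
    }
}
```

Starting from a mirror whose `cpu_value` is 7, `mirror_update()` sees the injected invalidation, retries once, and installs the post-invalidation value 42 rather than the stale 7.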

> Jul 15 18:09:29 kaveri kernel: [  560.388751][T12568] ==================================================================
> Jul 15 18:09:29 kaveri kernel: [  560.389063][T12568] BUG: KASAN: use-after-free in __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389068][T12568] Read of size 8 at addr ffff88835e1c7cb0 by task amd_pinned_memo/12568
> Jul 15 18:09:29 kaveri kernel: [  560.389071][T12568] 
> Jul 15 18:09:29 kaveri kernel: [  560.389077][T12568] CPU: 9 PID: 12568 Comm: amd_pinned_memo Tainted: G           OE     5.2.0-rc1-00811-g2ad5a7d31bdf #125
> Jul 15 18:09:29 kaveri kernel: [  560.389080][T12568] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
> Jul 15 18:09:29 kaveri kernel: [  560.389084][T12568] Call Trace:
> Jul 15 18:09:29 kaveri kernel: [  560.389091][T12568]  dump_stack+0x7c/0xc0
> Jul 15 18:09:29 kaveri kernel: [  560.389097][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389101][T12568]  print_address_description+0x65/0x22e
> Jul 15 18:09:29 kaveri kernel: [  560.389106][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389110][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389115][T12568]  __kasan_report.cold.3+0x1a/0x3d
> Jul 15 18:09:29 kaveri kernel: [  560.389122][T12568]  ? __mmu_notifier_release+0x286/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389128][T12568]  kasan_report+0xe/0x20
> Jul 15 18:09:29 kaveri kernel: [  560.389132][T12568]  __mmu_notifier_release+0x286/0x3e0

So we are iterating over the mn list and touching freed memory.

> Jul 15 18:09:29 kaveri kernel: [  560.389309][T12568] Allocated by task 12568:
> Jul 15 18:09:29 kaveri kernel: [  560.389314][T12568]  save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [  560.389318][T12568]  __kasan_kmalloc.constprop.8+0xc1/0xd0
> Jul 15 18:09:29 kaveri kernel: [  560.389323][T12568]  hmm_get_or_create+0x8f/0x3f0

The memory is probably a struct hmm.

> Jul 15 18:09:29 kaveri kernel: [  560.389857][T12568] Freed by task 12568:
> Jul 15 18:09:29 kaveri kernel: [  560.389860][T12568]  save_stack+0x19/0x80
> Jul 15 18:09:29 kaveri kernel: [  560.389864][T12568]  __kasan_slab_free+0x125/0x170
> Jul 15 18:09:29 kaveri kernel: [  560.389867][T12568]  kfree+0xe2/0x290
> Jul 15 18:09:29 kaveri kernel: [  560.389871][T12568]  __mmu_notifier_release+0xef/0x3e0
> Jul 15 18:09:29 kaveri kernel: [  560.389875][T12568]  exit_mmap+0x93/0x400

And the free was also done in notifier_release (presumably the
backtrace is corrupt and this is really in the old hmm_release ->
hmm_put -> hmm_free -> kfree call chain).

That was not OK: __mmu_notifier_release doesn't use a 'safe' hlist
iterator, so the release callback must never trigger kfree of a struct
mmu_notifier.
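A minimal userspace sketch of why a non-'safe' walk cannot tolerate a callback that frees the current entry. It assumes a plain singly linked list in place of the kernel's hlist and struct mmu_notifier; `struct notifier`, `release_and_free()`, `release_all_safe()` and `release_all_unsafe()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct mmu_notifier on the mm's list. */
struct notifier {
    struct notifier *next;
    void (*release)(struct notifier *n);
};

static int released; /* counts callback invocations */

/* A release callback that frees its own notifier, as the old
 * hmm_release -> hmm_put -> hmm_free -> kfree chain effectively did. */
static void release_and_free(struct notifier *n)
{
    released++;
    free(n);
}

/* Unsafe walk, shown for contrast and never called here: once
 * n->release(n) has freed n, the loop step reads n->next from freed
 * memory, the same kind of access KASAN flagged. */
void release_all_unsafe(struct notifier *head)
{
    for (struct notifier *n = head; n != NULL; n = n->next)
        n->release(n);
}

/* Safe walk: load the next pointer BEFORE invoking the callback, so
 * the callback may free the current entry. */
static void release_all_safe(struct notifier *head)
{
    struct notifier *n, *tmp;

    for (n = head; n != NULL; n = tmp) {
        tmp = n->next;
        n->release(n);
    }
}

/* Helper: prepend a freshly allocated notifier to the list. */
static struct notifier *push(struct notifier *head)
{
    struct notifier *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->next = head;
    n->release = release_and_free;
    return n;
}
```

The safe variant corresponds in spirit to the kernel's `hlist_for_each_entry_safe()`, which keeps an extra cursor for the next entry. Since __mmu_notifier_release uses the plain form, the burden falls on the callback not to free.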

The new hmm.git code does not call kfree from release; it schedules the
free through SRCU, which by definition won't run until
__mmu_notifier_release returns.
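A userspace analogue of deferring the free until the walk has finished. It assumes a simple queue that is drained after the loop, standing in for SRCU callbacks (which cannot run while __mmu_notifier_release is still inside its SRCU read-side section); `struct notifier`, `defer_free()`, `run_deferred()` and `release_all()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct mmu_notifier on the mm's list. */
struct notifier {
    struct notifier *next;      /* link in the list being walked */
    struct notifier *free_next; /* link in the deferred-free queue */
    void (*release)(struct notifier *n);
};

static struct notifier *deferred; /* entries waiting to be freed */
static int freed;                 /* counts entries actually freed */

/* Queue n for freeing later instead of freeing it now. */
static void defer_free(struct notifier *n)
{
    n->free_next = deferred;
    deferred = n;
}

/* Release callback that only queues the free: n stays valid, so the
 * walker can still read n->next after this returns. */
static void release_deferred(struct notifier *n)
{
    defer_free(n);
}

/* Stand-in for the end of the grace period: free everything queued. */
static void run_deferred(void)
{
    while (deferred != NULL) {
        struct notifier *n = deferred;

        deferred = n->free_next;
        free(n);
        freed++;
    }
}

/* A plain (non-safe) walk is fine here, because nothing is freed until
 * run_deferred() runs after the loop has finished. */
static void release_all(struct notifier *head)
{
    for (struct notifier *n = head; n != NULL; n = n->next)
        n->release(n);
    run_deferred();
}

/* Helper: prepend a freshly allocated notifier to the list. */
static struct notifier *push(struct notifier *head)
{
    struct notifier *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->next = head;
    n->free_next = NULL;
    n->release = release_deferred;
    return n;
}
```

The design point is that the callback never shortens the lifetime of the object the walker is standing on; the free happens only once no walker can still hold a pointer to it.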

So this should be fixed.

Jason

* Re: HMM related use-after-free with amdgpu
  2019-07-15 17:25 ` Jason Gunthorpe
@ 2019-07-16 16:31   ` Michel Dänzer
  2019-07-16 16:35     ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Michel Dänzer @ 2019-07-16 16:31 UTC
  To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, amd-gfx

On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
>>
>> With a KASAN enabled kernel built from amd-staging-drm-next, the
>> attached use-after-free is pretty reliably detected during a piglit gpu run.
> 
> Does this branch you are testing have the hmm.git merged? I think from
> the name it does not?

Indeed, no.


> Use-after-frees of this nature were something that was fixed in
> hmm.git.
> 
> I don't see an obvious way you can hit something like this with the
> new code arrangement.

I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
drm-next for 5.3 kernel. While the result didn't hit the problem, all
GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
simply avoided by not actually hitting the HMM related functionality.

It's possible that I made a mistake in merging the changes, or that I
missed some other required changes. But it's also possible that the HMM
changes broke the corresponding user-pointer functionality in amdgpu.


[0] Specifically, the following (ranges of) commits:

9ffbe8ac05dbb4ab4a4836a55a47fc6be945a38f (-> lockdep_assert_held_write)
e1bfa87399e372446454ecbaeba2800f0a385733..5da04cc86d1215fd9fe0e5c88ead6e8428a75e56
fec88ab0af9706b2201e5daf377c5031c62d11f7^..fec88ab0af9706b2201e5daf377c5031c62d11f7

-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


* Re: HMM related use-after-free with amdgpu
  2019-07-16 16:31   ` Michel Dänzer
@ 2019-07-16 16:35     ` Jason Gunthorpe
  2019-07-16 17:04       ` Michel Dänzer
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2019-07-16 16:35 UTC
  To: Michel Dänzer; +Cc: linux-mm, Jérôme Glisse, amd-gfx

On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
> > On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
> >>
> >> With a KASAN enabled kernel built from amd-staging-drm-next, the
> >> attached use-after-free is pretty reliably detected during a piglit gpu run.
> > 
> > Does this branch you are testing have the hmm.git merged? I think from
> > the name it does not?
> 
> Indeed, no.
> 
> 
> > Use-after-frees of this nature were something that was fixed in
> > hmm.git.
> > 
> > I don't see an obvious way you can hit something like this with the
> > new code arrangement.
> 
> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
> drm-next for 5.3 kernel. While the result didn't hit the problem, all
> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
> simply avoided by not actually hitting the HMM related functionality.
> 
> It's possible that I made a mistake in merging the changes, or that I
> missed some other required changes. But it's also possible that the HMM
> changes broke the corresponding user-pointer functionality in amdgpu.

Not sure, this was all Tested by the AMD team so it should work, I
hope.

It should all be sorted out in rc1; try again then?

Jason

* Re: HMM related use-after-free with amdgpu
  2019-07-16 16:35     ` Jason Gunthorpe
@ 2019-07-16 17:04       ` Michel Dänzer
  2019-07-16 17:20         ` Jason Gunthorpe
  2019-07-16 22:10         ` Kuehling, Felix
  0 siblings, 2 replies; 9+ messages in thread
From: Michel Dänzer @ 2019-07-16 17:04 UTC
  To: Jason Gunthorpe; +Cc: linux-mm, Jérôme Glisse, amd-gfx

On 2019-07-16 6:35 p.m., Jason Gunthorpe wrote:
> On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
>> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
>>> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
>>>>
>>>> With a KASAN enabled kernel built from amd-staging-drm-next, the
>>>> attached use-after-free is pretty reliably detected during a piglit gpu run.
>>>
>>> Does this branch you are testing have the hmm.git merged? I think from
>>> the name it does not?
>>
>> Indeed, no.
>>
>>
>>> Use-after-frees of this nature were something that was fixed in
>>> hmm.git.
>>>
>>> I don't see an obvious way you can hit something like this with the
>>> new code arrangement.
>>
>> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
>> drm-next for 5.3 kernel. While the result didn't hit the problem, all
>> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
>> simply avoided by not actually hitting the HMM related functionality.
>>
>> It's possible that I made a mistake in merging the changes, or that I
>> missed some other required changes. But it's also possible that the HMM
>> changes broke the corresponding user-pointer functionality in amdgpu.
> 
> Not sure, this was all Tested by the AMD team so it should work, I
> hope.

It can't, due to the issue pointed out by Linus in the "drm pull for
5.3-rc1" thread: DRM_AMDGPU_USERPTR still depends on ARCH_HAS_HMM, which
no longer exists, so it can't be enabled.

With that fixed up manually, the kernel successfully finished a piglit
run with that functionality enabled as well.


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


* Re: HMM related use-after-free with amdgpu
  2019-07-16 17:04       ` Michel Dänzer
@ 2019-07-16 17:20         ` Jason Gunthorpe
  2019-07-16 22:10         ` Kuehling, Felix
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2019-07-16 17:20 UTC
  To: Michel Dänzer; +Cc: linux-mm, Jérôme Glisse, amd-gfx

On Tue, Jul 16, 2019 at 07:04:52PM +0200, Michel Dänzer wrote:
> On 2019-07-16 6:35 p.m., Jason Gunthorpe wrote:
> > On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
> >> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
> >>> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
> >>>>
> >>>> With a KASAN enabled kernel built from amd-staging-drm-next, the
> >>>> attached use-after-free is pretty reliably detected during a piglit gpu run.
> >>>
> >>> Does this branch you are testing have the hmm.git merged? I think from
> >>> the name it does not?
> >>
> >> Indeed, no.
> >>
> >>
> >>> Use-after-frees of this nature were something that was fixed in
> >>> hmm.git.
> >>>
> >>> I don't see an obvious way you can hit something like this with the
> >>> new code arrangement.
> >>
> >> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
> >> drm-next for 5.3 kernel. While the result didn't hit the problem, all
> >> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
> >> simply avoided by not actually hitting the HMM related functionality.
> >>
> >> It's possible that I made a mistake in merging the changes, or that I
> >> missed some other required changes. But it's also possible that the HMM
> >> changes broke the corresponding user-pointer functionality in amdgpu.
> > 
> > Not sure, this was all Tested by the AMD team so it should work, I
> > hope.
> 
> It can't, due to the issue pointed out by Linus in the "drm pull for
> 5.3-rc1" thread: DRM_AMDGPU_USERPTR still depends on ARCH_HAS_HMM, which
> no longer exists, so it can't be enabled.

Somehow that merge resolution got missed, but I think the AMD folks
must have included it when they did their merge & test.

Jason

* Re: HMM related use-after-free with amdgpu
  2019-07-16 17:04       ` Michel Dänzer
  2019-07-16 17:20         ` Jason Gunthorpe
@ 2019-07-16 22:10         ` Kuehling, Felix
  2019-07-17  7:47           ` Michel Dänzer
  2019-07-17 11:34           ` Jason Gunthorpe
  1 sibling, 2 replies; 9+ messages in thread
From: Kuehling, Felix @ 2019-07-16 22:10 UTC
  To: Michel Dänzer, Jason Gunthorpe
  Cc: linux-mm, Jérôme Glisse, amd-gfx

On 2019-07-16 1:04 p.m., Michel Dänzer wrote:
> On 2019-07-16 6:35 p.m., Jason Gunthorpe wrote:
>> On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
>>> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
>>>> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
>>>>> With a KASAN enabled kernel built from amd-staging-drm-next, the
>>>>> attached use-after-free is pretty reliably detected during a piglit gpu run.
>>>> Does this branch you are testing have the hmm.git merged? I think from
>>>> the name it does not?
>>> Indeed, no.
>>>
>>>
>>>> Use-after-frees of this nature were something that was fixed in
>>>> hmm.git.
>>>>
>>>> I don't see an obvious way you can hit something like this with the
>>>> new code arrangement.
>>> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
>>> drm-next for 5.3 kernel. While the result didn't hit the problem, all
>>> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
>>> simply avoided by not actually hitting the HMM related functionality.
>>>
>>> It's possible that I made a mistake in merging the changes, or that I
>>> missed some other required changes. But it's also possible that the HMM
>>> changes broke the corresponding user-pointer functionality in amdgpu.
>> Not sure, this was all Tested by the AMD team so it should work, I
>> hope.
> It can't, due to the issue pointed out by Linus in the "drm pull for
> 5.3-rc1" thread: DRM_AMDGPU_USERPTR still depends on ARCH_HAS_HMM, which
> no longer exists, so it can't be enabled.

As far as I can tell, Linus fixed this up in his merge commit
be8454afc50f43016ca8b6130d9673bdd0bd56ec. Jason, is hmm.git going to get
rebased or merged to pick up the amdgpu changes for HMM from master?

Regards,
   Felix


>
> With that fixed up manually, the kernel successfully finished a piglit
> run with that functionality enabled as well.
>
>

* Re: HMM related use-after-free with amdgpu
  2019-07-16 22:10         ` Kuehling, Felix
@ 2019-07-17  7:47           ` Michel Dänzer
  2019-07-17 11:34           ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Michel Dänzer @ 2019-07-17  7:47 UTC
  To: Kuehling, Felix, Jason Gunthorpe
  Cc: linux-mm, Jérôme Glisse, amd-gfx

On 2019-07-17 12:10 a.m., Kuehling, Felix wrote:
> On 2019-07-16 1:04 p.m., Michel Dänzer wrote:
>> On 2019-07-16 6:35 p.m., Jason Gunthorpe wrote:
>>> On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
>>>> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
>>>>> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
>>>>>> With a KASAN enabled kernel built from amd-staging-drm-next, the
>>>>>> attached use-after-free is pretty reliably detected during a piglit gpu run.
>>>>> Does this branch you are testing have the hmm.git merged? I think from
>>>>> the name it does not?
>>>> Indeed, no.
>>>>
>>>>
>>>>> Use-after-frees of this nature were something that was fixed in
>>>>> hmm.git.
>>>>>
>>>>> I don't see an obvious way you can hit something like this with the
>>>>> new code arrangement.
>>>> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
>>>> drm-next for 5.3 kernel. While the result didn't hit the problem, all
>>>> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
>>>> simply avoided by not actually hitting the HMM related functionality.
>>>>
>>>> It's possible that I made a mistake in merging the changes, or that I
>>>> missed some other required changes. But it's also possible that the HMM
>>>> changes broke the corresponding user-pointer functionality in amdgpu.
>>> Not sure, this was all Tested by the AMD team so it should work, I
>>> hope.
>> It can't, due to the issue pointed out by Linus in the "drm pull for
>> 5.3-rc1" thread: DRM_AMDGPU_USERPTR still depends on ARCH_HAS_HMM, which
>> no longer exists, so it can't be enabled.
> 
> As far as I can tell, Linus fixed this up in his merge commit 
> be8454afc50f43016ca8b6130d9673bdd0bd56ec.

Ah! That's the piece I was missing, since I had merged the drm-next
changes before Linus did. Thanks, Felix.

Note that AFAICT it was basically luck that Linus noticed this and fixed
it up. It would be better not to push our luck like this. :)


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer


* Re: HMM related use-after-free with amdgpu
  2019-07-16 22:10         ` Kuehling, Felix
  2019-07-17  7:47           ` Michel Dänzer
@ 2019-07-17 11:34           ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2019-07-17 11:34 UTC
  To: Kuehling, Felix
  Cc: Michel Dänzer, linux-mm, Jérôme Glisse, amd-gfx

On Tue, Jul 16, 2019 at 10:10:46PM +0000, Kuehling, Felix wrote:
> On 2019-07-16 1:04 p.m., Michel Dänzer wrote:
> > On 2019-07-16 6:35 p.m., Jason Gunthorpe wrote:
> >> On Tue, Jul 16, 2019 at 06:31:09PM +0200, Michel Dänzer wrote:
> >>> On 2019-07-15 7:25 p.m., Jason Gunthorpe wrote:
> >>>> On Mon, Jul 15, 2019 at 06:51:06PM +0200, Michel Dänzer wrote:
> >>>>> With a KASAN enabled kernel built from amd-staging-drm-next, the
> >>>>> attached use-after-free is pretty reliably detected during a piglit gpu run.
> >>>> Does this branch you are testing have the hmm.git merged? I think from
> >>>> the name it does not?
> >>> Indeed, no.
> >>>
> >>>
> >>>> Use-after-frees of this nature were something that was fixed in
> >>>> hmm.git.
> >>>>
> >>>> I don't see an obvious way you can hit something like this with the
> >>>> new code arrangement.
> >>> I tried merging the hmm-devmem-cleanup.4 changes[0] into my 5.2.y +
> >>> drm-next for 5.3 kernel. While the result didn't hit the problem, all
> >>> GL_AMD_pinned_memory piglit tests failed, so I suspect the problem was
> >>> simply avoided by not actually hitting the HMM related functionality.
> >>>
> >>> It's possible that I made a mistake in merging the changes, or that I
> >>> missed some other required changes. But it's also possible that the HMM
> >>> changes broke the corresponding user-pointer functionality in amdgpu.
> >> Not sure, this was all Tested by the AMD team so it should work, I
> >> hope.
> > It can't, due to the issue pointed out by Linus in the "drm pull for
> > 5.3-rc1" thread: DRM_AMDGPU_USERPTR still depends on ARCH_HAS_HMM, which
> > no longer exists, so it can't be enabled.
> 
> As far as I can tell, Linus fixed this up in his merge commit
> be8454afc50f43016ca8b6130d9673bdd0bd56ec. Jason, is hmm.git going to get
> rebased or merged to pick up the amdgpu changes for HMM from master?

It will be reset to -rc1 when it comes out, then we start all over
again.

Jason

Thread overview: 9+ messages
2019-07-15 16:51 HMM related use-after-free with amdgpu Michel Dänzer
2019-07-15 17:25 ` Jason Gunthorpe
2019-07-16 16:31   ` Michel Dänzer
2019-07-16 16:35     ` Jason Gunthorpe
2019-07-16 17:04       ` Michel Dänzer
2019-07-16 17:20         ` Jason Gunthorpe
2019-07-16 22:10         ` Kuehling, Felix
2019-07-17  7:47           ` Michel Dänzer
2019-07-17 11:34           ` Jason Gunthorpe
