* DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
@ 2018-07-11 16:11 Michel Dänzer
       [not found] ` <cacdbfb1-1760-518c-6a52-94fbd11748c5-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Dänzer @ 2018-07-11 16:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]


I've been occasionally getting the debugging warnings seen in the
attached kernel log excerpt. Only for piglit amd_pinned_memory and for
libdrm amdgpu_test, so I suspect it's pointing at a userptr related
issue. Christian, any ideas?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: kern.log --]
[-- Type: text/x-log; name="kern.log", Size: 6447 bytes --]

Jul  9 16:33:10 kaveri kernel: [ 1048.706008] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
Jul  9 16:33:10 kaveri kernel: [ 1048.706023] WARNING: CPU: 13 PID: 19903 at kernel/locking/rwsem.c:217 up_read_non_owner+0xd5/0x100
Jul  9 16:33:10 kaveri kernel: [ 1048.706029] Modules linked in: lz4(E) lz4_compress(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_conservative(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) edac_mce_amd(E) amdkfd(OE) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_realtek(E) ghash_clmulni_intel(E) amdgpu(OE) wmi_bmof(E) radeon(OE) snd_hda_codec_generic(E) pcbc(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) chash(OE) snd_hda_codec(E) gpu_sched(OE) ttm(OE) snd_hda_core(E) snd_hwdep(E) drm_kms_helper(OE) efi_pstore(E) aesni_intel(E) aes_x86_64(E) drm(OE) crypto_simd(E) snd_pcm(E) cryptd(E) r8169(E) i2c_algo_bit(E) glue_helper(E) pcspkr(E) efivars(E) k10temp(E) snd_timer(E) fb_sys_fops(E) mii(E) sg(E) ccp(E) syscopyarea(E) sp5100_tco(E) snd(E) sysfillrect(E) sysimgblt(E) soundcore(E) rng_core(E) i2c_piix4(E)
Jul  9 16:33:10 kaveri kernel: [ 1048.706126]  wmi(E) button(E) acpi_cpufreq(E) tcp_bbr(E) sch_fq(E) nct6775(E) hwmon_vid(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) dm_mod(E) raid10(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) evdev(E) hid_generic(E) usbhid(E) hid(E) ahci(E) libahci(E) xhci_pci(E) libata(E) xhci_hcd(E) crc32c_intel(E) scsi_mod(E) usbcore(E) gpio_amdpt(E) gpio_generic(E)
Jul  9 16:33:10 kaveri kernel: [ 1048.706191] CPU: 13 PID: 19903 Comm: amd_pinned_memo Tainted: G        W  OE     4.18.0-rc1+ #110
Jul  9 16:33:10 kaveri kernel: [ 1048.706195] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
Jul  9 16:33:10 kaveri kernel: [ 1048.706202] RIP: 0010:up_read_non_owner+0xd5/0x100
Jul  9 16:33:10 kaveri kernel: [ 1048.706205] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2b 8b 05 32 e5 d8 03 85 c0 75 a8 48 c7 c6 e0 43 47 a2 48 c7 c7 40 44 47 a2 e8 3b 99 ee ff <0f> 0b eb 91 e8 52 49 43 00 e9 57 ff ff ff e8 48 49 43 00 eb ce e8 
Jul  9 16:33:10 kaveri kernel: [ 1048.706289] RSP: 0000:ffff8803ddc37ab8 EFLAGS: 00010286
Jul  9 16:33:10 kaveri kernel: [ 1048.706295] RAX: 0000000000000000 RBX: ffff8803ce344580 RCX: ffffffffa0c4d5e0
Jul  9 16:33:10 kaveri kernel: [ 1048.706299] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8803ee35ea90
Jul  9 16:33:10 kaveri kernel: [ 1048.706303] RBP: ffff8803ddc37b10 R08: ffffed007dc6bd53 R09: ffffed007dc6bd53
Jul  9 16:33:10 kaveri kernel: [ 1048.706308] R10: 0000000000000001 R11: ffffed007dc6bd52 R12: 0000559fbdfcb000
Jul  9 16:33:10 kaveri kernel: [ 1048.706311] R13: 0000559fbdfcc000 R14: 0000000000000001 R15: dffffc0000000000
Jul  9 16:33:10 kaveri kernel: [ 1048.706316] FS:  00007fc20c5207c0(0000) GS:ffff8803ee340000(0000) knlGS:0000000000000000
Jul  9 16:33:10 kaveri kernel: [ 1048.706320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  9 16:33:10 kaveri kernel: [ 1048.706324] CR2: 0000559fbdfcb000 CR3: 00000003e8fae000 CR4: 00000000003406e0
Jul  9 16:33:10 kaveri kernel: [ 1048.706327] Call Trace:
Jul  9 16:33:10 kaveri kernel: [ 1048.706337]  __mmu_notifier_invalidate_range_end+0x14f/0x210
Jul  9 16:33:10 kaveri kernel: [ 1048.706346]  wp_page_copy+0xc1d/0x1790
Jul  9 16:33:10 kaveri kernel: [ 1048.706355]  ? __do_fault+0x310/0x310
Jul  9 16:33:10 kaveri kernel: [ 1048.706361]  ? __lock_acquire+0x605/0x3670
Jul  9 16:33:10 kaveri kernel: [ 1048.706370]  do_wp_page+0x422/0x1b10
Jul  9 16:33:10 kaveri kernel: [ 1048.706376]  ? lock_acquire+0x10b/0x330
Jul  9 16:33:10 kaveri kernel: [ 1048.706381]  ? finish_mkwrite_fault+0x560/0x560
Jul  9 16:33:10 kaveri kernel: [ 1048.706391]  __handle_mm_fault+0x1b22/0x3130
Jul  9 16:33:10 kaveri kernel: [ 1048.706398]  ? finish_task_switch+0x11f/0x670
Jul  9 16:33:10 kaveri kernel: [ 1048.706403]  ? __pmd_alloc+0x430/0x430
Jul  9 16:33:10 kaveri kernel: [ 1048.706409]  ? find_held_lock+0x32/0x1c0
Jul  9 16:33:10 kaveri kernel: [ 1048.706421]  ? mark_held_locks+0xa8/0xf0
Jul  9 16:33:10 kaveri kernel: [ 1048.706426]  ? handle_mm_fault+0x17e/0x7a0
Jul  9 16:33:10 kaveri kernel: [ 1048.706433]  handle_mm_fault+0x257/0x7a0
Jul  9 16:33:10 kaveri kernel: [ 1048.706442]  __do_page_fault+0x47f/0xa80
Jul  9 16:33:10 kaveri kernel: [ 1048.706450]  ? retint_user+0x18/0x18
Jul  9 16:33:10 kaveri kernel: [ 1048.706455]  ? mm_fault_error+0x2d0/0x2d0
Jul  9 16:33:10 kaveri kernel: [ 1048.706461]  ? page_fault+0x8/0x30
Jul  9 16:33:10 kaveri kernel: [ 1048.706466]  ? trace_hardirqs_off_thunk+0x1a/0x1c
Jul  9 16:33:10 kaveri kernel: [ 1048.706473]  ? page_fault+0x8/0x30
Jul  9 16:33:10 kaveri kernel: [ 1048.706479]  page_fault+0x1e/0x30
Jul  9 16:33:10 kaveri kernel: [ 1048.706484] RIP: 0033:0x7fc20fae3357
Jul  9 16:33:10 kaveri kernel: [ 1048.706487] Code: 47 20 c5 fe 7f 44 17 c0 c5 fe 7f 47 40 c5 fe 7f 44 17 a0 c5 fe 7f 47 60 c5 fe 7f 44 17 80 48 01 fa 48 83 e2 80 48 39 d1 74 ba <c5> fd 7f 01 c5 fd 7f 41 20 c5 fd 7f 41 40 c5 fd 7f 41 60 48 81 c1 
Jul  9 16:33:10 kaveri kernel: [ 1048.706570] RSP: 002b:00007ffe17619218 EFLAGS: 00010206
Jul  9 16:33:10 kaveri kernel: [ 1048.706575] RAX: 0000559fbdfc8010 RBX: 0000000000029000 RCX: 0000559fbdfcb000
Jul  9 16:33:10 kaveri kernel: [ 1048.706579] RDX: 0000559fbdfeb800 RSI: 0000000000000000 RDI: 0000559fbdfc8010
Jul  9 16:33:10 kaveri kernel: [ 1048.706583] RBP: 00000000000237f8 R08: 0000559fbdfc8010 R09: 0000559fbdfc0870
Jul  9 16:33:10 kaveri kernel: [ 1048.706586] R10: 0000000000000000 R11: 0000000000000004 R12: 00007fc20fb449d8
Jul  9 16:33:10 kaveri kernel: [ 1048.706590] R13: 0000559fbdfc8000 R14: 00007fc20fb3fc40 R15: 00007ffe176192c0
Jul  9 16:33:10 kaveri kernel: [ 1048.706601] irq event stamp: 109699
Jul  9 16:33:10 kaveri kernel: [ 1048.706608] hardirqs last  enabled at (109699): [<ffffffffa0eab8ed>] mem_cgroup_commit_charge+0xbd/0xed0
Jul  9 16:33:10 kaveri kernel: [ 1048.706613] hardirqs last disabled at (109698): [<ffffffffa0eab8cc>] mem_cgroup_commit_charge+0x9c/0xed0
Jul  9 16:33:10 kaveri kernel: [ 1048.706619] softirqs last  enabled at (107488): [<ffffffffa2200620>] __do_softirq+0x620/0x919
Jul  9 16:33:10 kaveri kernel: [ 1048.706625] softirqs last disabled at (107471): [<ffffffffa093443e>] irq_exit+0x19e/0x1d0

[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found] ` <cacdbfb1-1760-518c-6a52-94fbd11748c5-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-07-12  0:43   ` Felix Kuehling
       [not found]     ` <49364825-3647-e1bb-5ef7-bbcc25dbcfa0-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2018-07-12  0:43 UTC (permalink / raw)
  To: Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Kent just caught a similar backtrace in one of our KFD pre-submission
tests (see below)

Neither KFD nor AMDGPU is implicated in the backtrace. Is this a
regression in the kernel itself? amd-kfd-staging is currently based on
4.18-rc1.

Regards,
  Felix

[   19.435544] ------------[ cut here ]------------
[   19.435551] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
[   19.435558] WARNING: CPU: 2 PID: 3194 at /home/jenkins/jenkins-root/workspace/compute-psdb/kernel/kernel/locking/rwsem.c:217 up_read_non_owner+0x58/0x60
[   19.435572] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables nf_nat nf_conntrack br_netfilter fuse acpi_pad x86_pkg_temp_thermal video amdkfd amd_iommu_v2 amdgpu chash gpu_sched ttm
[   19.435598] CPU: 2 PID: 3194 Comm: correlator_test Not tainted 4.18.0-rc1-kfd-compute-psdb-22716 #1
[   19.435604] Hardware name: MSI MS-7977/Z170A GAMING M5 (MS-7977), BIOS 1.C0 10/19/2016
[   19.435611] RIP: 0010:up_read_non_owner+0x58/0x60
[   19.435615] Code: b0 00 5b c3 e8 c9 39 54 00 85 c0 74 df 8b 05 b7 72 a1 02 85 c0 75 d5 48 c7 c6 f8 a0 32 b8 48 c7 c7 ab e9 30 b8 e8 28 e7 f9 ff <0f> 0b eb be 0f 1f 40 00 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb
[   19.435661] RSP: 0018:ffffb1f0c2483c28 EFLAGS: 00010286
[   19.435666] RAX: 0000000000000000 RBX: ffff99bd19633c80 RCX: 0000000000000006
[   19.435671] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff99bd2ed158f0
[   19.435676] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   19.435682] R10: ffffb1f0c2483bc8 R11: ffffffffb70e5b0a R12: ffff99bd181c4800
[   19.435687] R13: 0000000001a59000 R14: 0000000001a58000 R15: 0000000000000000
[   19.435693] FS:  00007fae045bb700(0000) GS:ffff99bd2ed00000(0000) knlGS:0000000000000000
[   19.435699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.435704] CR2: 00007fadff7fe250 CR3: 000000045745e004 CR4: 00000000003606e0
[   19.435710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   19.435715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   19.435720] Call Trace:
[   19.435726]  __mmu_notifier_invalidate_range_end+0x9b/0xe0
[   19.435732]  unmap_region+0xae/0x120
[   19.435738]  ? __vma_rb_erase+0x11e/0x240
[   19.435744]  do_munmap+0x262/0x400
[   19.435749]  mmap_region+0xb1/0x5d0
[   19.435755]  ? selinux_file_mprotect+0x140/0x140
[   19.435760]  do_mmap+0x489/0x660
[   19.435765]  ? vm_mmap_pgoff+0x9f/0x110
[   19.435770]  vm_mmap_pgoff+0xcf/0x110
[   19.435776]  ksys_mmap_pgoff+0x1b4/0x260
[   19.435781]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   19.435787]  do_syscall_64+0x56/0x1a0
[   19.435792]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   19.435797] RIP: 0033:0x7fae03bca6ba
[   19.435800] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d 89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00
[   19.435847] RSP: 002b:00007ffd3f8dc058 EFLAGS: 00000206 ORIG_RAX: 0000000000000009
[   19.435853] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fae03bca6ba
[   19.435859] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000001a58000
[   19.435864] RBP: 0000000000000006 R08: 0000000000000006 R09: 0000000104a2a000
[   19.435870] R10: 0000000000000011 R11: 0000000000000206 R12: 0000000001a58000
[   19.435875] R13: 0000000000001000 R14: 0000000000000011 R15: 0000000104a2a000
[   19.435883] irq event stamp: 416603
[   19.435887] hardirqs last  enabled at (416603): [<ffffffffb7002b42>] do_syscall_64+0x12/0x1a0
[   19.435894] hardirqs last disabled at (416602): [<ffffffffb7c00082>] entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[   19.435902] softirqs last  enabled at (415488): [<ffffffffb7e00393>] __do_softirq+0x393/0x4a6
[   19.435910] softirqs last disabled at (415471): [<ffffffffb7076261>] irq_exit+0xc1/0xd0
[   19.435916] ---[ end trace 3e22281c2c3bcb4c ]---


On 2018-07-11 12:11 PM, Michel Dänzer wrote:
> I've been occasionally getting the debugging warnings seen in the
> attached kernel log excerpt. Only for piglit amd_pinned_memory and for
> libdrm amdgpu_test, so I suspect it's pointing at a userptr related
> issue. Christian, any ideas?

* Re: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]     ` <49364825-3647-e1bb-5ef7-bbcc25dbcfa0-5C7GfCeVMHo@public.gmane.org>
@ 2018-07-12  7:16       ` Michel Dänzer
       [not found]         ` <6122cef7-ca1a-7266-1928-125db40a6735-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Dänzer @ 2018-07-12  7:16 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-07-12 02:43 AM, Felix Kuehling wrote:
> Kent just caught a similar backtrace in one of our KFD pre-submission
> tests (see below)

Yeah, looks the same.


> Neither KFD nor AMDGPU is implicated in the backtrace. Is this a
> regression in the kernel itself? amd-kfd-staging is currently based on
> 4.18-rc1.

FWIW, I saw this with 4.17 based kernels already, and I didn't have
CONFIG_DEBUG_RWSEMS enabled with older kernels, so I'm not sure it's a
(recent) regression.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]         ` <6122cef7-ca1a-7266-1928-125db40a6735-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-07-12 16:32           ` Felix Kuehling
       [not found]             ` <f3c2ec87-dbc7-0146-be1b-5572faabd30e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2018-07-12 16:32 UTC (permalink / raw)
  To: Michel Dänzer, Pan, Xinhui; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[+Pan Xinhui]

On 2018-07-12 03:16 AM, Michel Dänzer wrote:
> On 2018-07-12 02:43 AM, Felix Kuehling wrote:
>> Kent just caught a similar backtrace in one of our KFD pre-submission
>> tests (see below)
> Yeah, looks the same.
>
>
>> Neither KFD nor AMDGPU is implicated in the backtrace. Is this a
>> regression in the kernel itself? amd-kfd-staging is currently based on
>> 4.18-rc1.
> FWIW, I saw this with 4.17 based kernels already, and I didn't have
> CONFIG_DEBUG_RWSEMS enabled with older kernels, so I'm not sure it's a
> (recent) regression.

I've now also seen it on Oded's branch (4.17-rc5). It is reproduced
reliably by a new test that Pan Xinhui just added to our kfdtest
(KFDMemoryTest.MMapLarge).

Regards,
  Felix



* RE: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]             ` <f3c2ec87-dbc7-0146-be1b-5572faabd30e-5C7GfCeVMHo@public.gmane.org>
@ 2018-07-13 10:17               ` Pan, Xinhui
       [not found]                 ` <DM3PR12MB079529281552909CEE8D696987580-4hRkV8tDpBjVFNfCxKK//QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Pan, Xinhui @ 2018-07-13 10:17 UTC (permalink / raw)
  To: Kuehling, Felix, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Reproduced on my side. The backtrace appears only once per boot; it does not show up again until the next reboot.

The backtrace is triggered by KFDIPCTest.BasicTest (other tests can trigger it too).
This test defines some buffers that are mapped to the GPU, and the later fork() causes the warning.

The code comments say libhsakmt should do cleanup work after fork, because these buffers are invalid in the child process.
But the warning appears during fork() itself, so how can libhsakmt do such cleanup work?


* RE: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]                 ` <DM3PR12MB079529281552909CEE8D696987580-4hRkV8tDpBjVFNfCxKK//QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-07-13 11:13                   ` Pan, Xinhui
       [not found]                     ` <DM3PR12MB0795C18614354BF2A95D2D8D87580-4hRkV8tDpBjVFNfCxKK//QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2018-07-13 13:54                   ` Pan, Xinhui
  1 sibling, 1 reply; 8+ messages in thread
From: Pan, Xinhui @ 2018-07-13 11:13 UTC (permalink / raw)
  To: Kuehling, Felix, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I think this is a kernel bug. See patch
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/locking?h=v4.18-rc4&id=03eeafdd9ab06a770d42c2b264d50dff7e2f4eee
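
For readers who want a feel for the check in the subject line without opening the link: the warning text compares sem->owner against the constant (1UL << 0) when a read lock is released by a non-owner. Below is a toy userspace model of how such a check can fire, assuming one acquire path tags the owner word and another does not. This is only an illustration; all names (toy_rwsem, toy_down_read, toy_down_read_untagged, toy_up_read_non_owner) are hypothetical, it is not kernel code, and the actual cause and fix are whatever the linked commit describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy counterpart of the constant seen in the warning text. */
#define TOY_READER_OWNED ((void *)(1UL << 0))

struct toy_rwsem {
    int readers;   /* number of read holders */
    void *owner;   /* debug tag, not a real task pointer */
};

/* Acquire path that tags the semaphore as reader-owned. */
void toy_down_read(struct toy_rwsem *sem)
{
    sem->readers++;
    sem->owner = TOY_READER_OWNED;
}

/* Hypothetical acquire path that forgets to set the tag. */
void toy_down_read_untagged(struct toy_rwsem *sem)
{
    sem->readers++;
}

/* Release path; returns true where the debug check would warn. */
bool toy_up_read_non_owner(struct toy_rwsem *sem)
{
    bool warn = sem->owner != TOY_READER_OWNED;

    sem->readers--;
    return warn;
}
```

In this model a semaphore taken with toy_down_read() releases quietly, while one taken with toy_down_read_untagged() makes the release check report a mismatch even though the locking itself is balanced.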



* RE: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]                 ` <DM3PR12MB079529281552909CEE8D696987580-4hRkV8tDpBjVFNfCxKK//QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2018-07-13 11:13                   ` Pan, Xinhui
@ 2018-07-13 13:54                   ` Pan, Xinhui
  1 sibling, 0 replies; 8+ messages in thread
From: Pan, Xinhui @ 2018-07-13 13:54 UTC (permalink / raw)
  To: Kuehling, Felix, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


[-- Attachment #1.1: Type: text/plain, Size: 1930 bytes --]

Already fixed upstream.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/locking?h=v4.18-rc4&id=03eeafdd9ab06a770d42c2b264d50dff7e2f4eee

* Re: DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
       [not found]                     ` <DM3PR12MB0795C18614354BF2A95D2D8D87580-4hRkV8tDpBjVFNfCxKK//QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-07-13 13:55                       ` Michel Dänzer
  0 siblings, 0 replies; 8+ messages in thread
From: Michel Dänzer @ 2018-07-13 13:55 UTC (permalink / raw)
  To: Pan, Xinhui, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-07-13 01:13 PM, Pan, Xinhui wrote:
> I think this is a kernel bug. See patch
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/locking?h=v4.18-rc4&id=03eeafdd9ab06a770d42c2b264d50dff7e2f4eee

Nice find, thanks!


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer