On 6/30/17 5:57 PM, Jerome Glisse wrote:
...

Hi Jerome,

I am working on a sporadic data corruption seen in highly contended use cases. So far, I've been able to re-create a sporadic hang that happens when multiple threads compete to migrate the same page to and from device memory. The reproducer uses only the dummy driver from hmm-next. Please find it attached.

This is how it hangs on my 12-core Intel i7-5930K SMT system:

&&& 2 migrate threads, 2 read threads: STARTING
(EE:84) hmm_buffer_mirror_read error -1
&&& 2 migrate threads, 2 read threads: PASSED
&&& 2 migrate threads, 3 read threads: STARTING
&&& 2 migrate threads, 3 read threads: PASSED
&&& 2 migrate threads, 4 read threads: STARTING
&&& 2 migrate threads, 4 read threads: PASSED
&&& 3 migrate threads, 2 read threads: STARTING

The kernel log (also attached) shows multiple threads blocked in hmm_vma_fault() and migrate_vma():

[ 139.054907] sanity_rmem004 D13528 3997 3818 0x00000000
[ 139.054912] Call Trace:
[ 139.054914] __schedule+0x20b/0x6c0
[ 139.054916] schedule+0x36/0x80
[ 139.054920] io_schedule+0x16/0x40
[ 139.054923] __lock_page+0xf2/0x130
[ 139.054929] migrate_vma+0x48a/0xee0
[ 139.054933] dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[ 139.054945] dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[ 139.054954] do_vfs_ioctl+0x96/0x5a0
[ 139.054957] SyS_ioctl+0x79/0x90
[ 139.054960] entry_SYSCALL_64_fastpath+0x13/0x94
...
[ 139.055067] sanity_rmem004 D13136 3999 3818 0x00000000
[ 139.055072] Call Trace:
[ 139.055074] __schedule+0x20b/0x6c0
[ 139.055076] schedule+0x36/0x80
[ 139.055079] io_schedule+0x16/0x40
[ 139.055083] wait_on_page_bit+0xee/0x120
[ 139.055089] __migration_entry_wait+0xe8/0x190
[ 139.055091] migration_entry_wait+0x5f/0x70
[ 139.055094] do_swap_page+0x4c7/0x4e0
[ 139.055096] __handle_mm_fault+0x347/0x9d0
[ 139.055099] handle_mm_fault+0x88/0x150
[ 139.055103] hmm_vma_walk_clear+0x8f/0xd0
[ 139.055105] hmm_vma_walk_pmd+0x1ba/0x250
[ 139.055109] __walk_page_range+0x1e8/0x420
[ 139.055112] walk_page_range+0x73/0xf0
[ 139.055114] hmm_vma_fault+0x180/0x260
[ 139.055121] dummy_fault+0xda/0x1f0 [hmm_dmirror]
[ 139.055138] dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[ 139.055142] do_vfs_ioctl+0x96/0x5a0
[ 139.055145] SyS_ioctl+0x79/0x90
[ 139.055148] entry_SYSCALL_64_fastpath+0x13/0x94

Please compile and run the attached program this way:

$ ./build.sh
$ sudo ./kload.sh
$ sudo ./run.sh

Thanks!

Evgeny Baskakov
NVIDIA