madvise(MADV_REMOVE) deadlocks on shmem THP

* madvise(MADV_REMOVE) deadlocks on shmem THP
@ 2021-01-14  3:33 Sergey Senozhatsky
  2021-01-14  4:31 ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Sergey Senozhatsky @ 2021-01-14  3:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Kirill A. Shutemov, Suleiman Souhlal,
	Matthew Wilcox, linux-mm, linux-kernel

Hi,

We are running into lockups during the memory pressure tests on our
boards, which essentially NMI panic them. In short the test case is

- THP shmem
    echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled

- And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
  and madvise(MADV_REMOVE) when it wants to remove the page range

The problem boils down to the reverse locking chain:
	kswapd does

		lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)

	madvise() process does

		down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)

CPU0                                                       CPU1

kswapd                                                     vfs_fallocate()
 shrink_node()                                              shmem_fallocate()
  shrink_active_list()                                       unmap_mapping_range()
   page_referenced() << lock page:PG_locked >>                unmap_mapping_pages()  << down_write(mapping->i_mmap_rwsem) >>
    rmap_walk_file()                                           zap_page_range_single()
     down_read(mapping->i_mmap_rwsem) << W-locked on CPU1>>     unmap_page_range()
      rwsem_down_read_failed()                                   __split_huge_pmd()
       __rwsem_down_read_failed_common()                          __lock_page()  << PG_locked on CPU0 >>
        schedule()                                                 wait_on_page_bit_common()
                                                                    io_schedule()

	-ss

^ permalink raw reply	[flat|nested] 3+ messages in thread