linux-mm.kvack.org archive mirror
* madvise(MADV_REMOVE) deadlocks on shmem THP
@ 2021-01-14  3:33 Sergey Senozhatsky
  2021-01-14  4:31 ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Sergey Senozhatsky @ 2021-01-14  3:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Kirill A. Shutemov, Suleiman Souhlal,
	Matthew Wilcox, linux-mm, linux-kernel

Hi,

We are running into lockups during memory pressure tests on our
boards, which essentially end in NMI panics. In short, the test case is:

- THP shmem
    echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled

- And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
  and madvise(MADV_REMOVE) when it wants to remove the page range

The problem boils down to a reversed lock ordering:
	kswapd does

		lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)

	madvise() process does

		down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)



CPU0                                                       CPU1

kswapd                                                     vfs_fallocate()
 shrink_node()                                              shmem_fallocate()
  shrink_active_list()                                       unmap_mapping_range()
   page_referenced() << lock page:PG_locked >>                unmap_mapping_pages()  << down_write(mapping->i_mmap_rwsem) >>
    rmap_walk_file()                                           zap_page_range_single()
     down_read(mapping->i_mmap_rwsem) << W-locked on CPU1>>     unmap_page_range()
      rwsem_down_read_failed()                                   __split_huge_pmd()
       __rwsem_down_read_failed_common()                          __lock_page()  << PG_locked on CPU0 >>
        schedule()                                                 wait_on_page_bit_common()
                                                                    io_schedule()

	-ss



* Re: madvise(MADV_REMOVE) deadlocks on shmem THP
  2021-01-14  3:33 madvise(MADV_REMOVE) deadlocks on shmem THP Sergey Senozhatsky
@ 2021-01-14  4:31 ` Hugh Dickins
  2021-01-14  5:38   ` Sergey Senozhatsky
  0 siblings, 1 reply; 3+ messages in thread
From: Hugh Dickins @ 2021-01-14  4:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Hugh Dickins, Kirill A. Shutemov,
	Suleiman Souhlal, Matthew Wilcox, Andrea Arcangeli, linux-mm,
	linux-kernel

On Thu, 14 Jan 2021, Sergey Senozhatsky wrote:

> Hi,
> 
> We are running into lockups during memory pressure tests on our
> boards, which essentially end in NMI panics. In short, the test case is:
> 
> - THP shmem
>     echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
> 
> - And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
>   and madvise(MADV_REMOVE) when it wants to remove the page range
> 
> The problem boils down to a reversed lock ordering:
> 	kswapd does
> 
> 		lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)
> 
> 	madvise() process does
> 
> 		down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)
> 
> 
> 
> CPU0                                                       CPU1
> 
> kswapd                                                     vfs_fallocate()
>  shrink_node()                                              shmem_fallocate()
>   shrink_active_list()                                       unmap_mapping_range()
>    page_referenced() << lock page:PG_locked >>                unmap_mapping_pages()  << down_write(mapping->i_mmap_rwsem) >>
>     rmap_walk_file()                                           zap_page_range_single()
>      down_read(mapping->i_mmap_rwsem) << W-locked on CPU1>>     unmap_page_range()
>       rwsem_down_read_failed()                                   __split_huge_pmd()
>        __rwsem_down_read_failed_common()                          __lock_page()  << PG_locked on CPU0 >>
>         schedule()                                                 wait_on_page_bit_common()
>                                                                     io_schedule()

Very interesting, Sergey: many thanks for this report.

There is no doubt that kswapd is right in its lock ordering:
__split_huge_pmd() is in the wrong to be attempting lock_page().

Which used not to be done, but was added in 5.8's c444eb564fb1 ("mm:
thp: make the THP mapcount atomic against __split_huge_pmd_locked()").

Which explains why this deadlock was not seen years ago: that
surprised me at first, since the case you show to reproduce it is a
natural one, and I'd have expected the deadlock to show up in more
common ways before now.

And your report is remarkably timely too: I have two other reasons
for looking at that change at the moment (I'm currently catching up
with recent discussion of page_count versus mapcount when deciding
COW page reuse).

I won't say more tonight, but should have more to add tomorrow.

Hugh



* Re: madvise(MADV_REMOVE) deadlocks on shmem THP
  2021-01-14  4:31 ` Hugh Dickins
@ 2021-01-14  5:38   ` Sergey Senozhatsky
  0 siblings, 0 replies; 3+ messages in thread
From: Sergey Senozhatsky @ 2021-01-14  5:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sergey Senozhatsky, Andrew Morton, Kirill A. Shutemov,
	Suleiman Souhlal, Matthew Wilcox, Andrea Arcangeli, linux-mm,
	linux-kernel

On (21/01/13 20:31), Hugh Dickins wrote:
> > We are running into lockups during memory pressure tests on our
> > boards, which essentially end in NMI panics. In short, the test case is:
> > 
> > - THP shmem
> >     echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
> > 
> > - And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
> >   and madvise(MADV_REMOVE) when it wants to remove the page range
> > 
> > The problem boils down to a reversed lock ordering:
> > 	kswapd does
> > 
> > 		lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)
> > 
> > 	madvise() process does
> > 
> > 		down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)
> > 
> > 
> > 
> > CPU0                                                       CPU1
> > 
> > kswapd                                                     vfs_fallocate()
> >  shrink_node()                                              shmem_fallocate()
> >   shrink_active_list()                                       unmap_mapping_range()
> >    page_referenced() << lock page:PG_locked >>                unmap_mapping_pages()  << down_write(mapping->i_mmap_rwsem) >>
> >     rmap_walk_file()                                           zap_page_range_single()
> >      down_read(mapping->i_mmap_rwsem) << W-locked on CPU1>>     unmap_page_range()
> >       rwsem_down_read_failed()                                   __split_huge_pmd()
> >        __rwsem_down_read_failed_common()                          __lock_page()  << PG_locked on CPU0 >>
> >         schedule()                                                 wait_on_page_bit_common()
> >                                                                     io_schedule()
> 
> Very interesting, Sergey: many thanks for this report.

Thanks for the quick feedback.

> There is no doubt that kswapd is right in its lock ordering:
> __split_huge_pmd() is in the wrong to be attempting lock_page().
> 
> Which used not to be done, but was added in 5.8's c444eb564fb1 ("mm:
> thp: make the THP mapcount atomic against __split_huge_pmd_locked()").

Hugh, I forgot to mention: we are facing these issues on 4.19.
Let me check whether (maybe) we have cherry-picked c444eb564fb1.

	-ss


