From: Rik van Riel <riel@surriel.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Chris Mason <clm@meta.com>, David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, Andrew Morton <akpm@linux-foundation.org>
Subject: [BUG] hugetlbfs_no_page vs MADV_DONTNEED race leading to SIGBUS
Date: Tue, 25 Oct 2022 16:37:51 -0400
Message-ID: <215d225585ff3c5ea90c64e6c9bdff04ab548156.camel@surriel.com>

Hi Mike,

After getting promising results initially, we discovered there
is yet another bug left with hugetlbfs MADV_DONTNEED.

This one involves a page fault on a hugetlbfs address, while
another thread in the same process is in the middle of MADV_DONTNEED
on that same memory address.

The code in __unmap_hugepage_range() will clear the page table
entry, and then at some point later the tlb gather code
(finished by tlb_finish_mmu()) will actually free the huge page
back into the hugetlbfs free page pool.

Meanwhile, hugetlb_no_page() will call alloc_huge_page(), and
that will fail because the code calling __unmap_hugepage_range()
has not actually returned the page to the free list yet.

The result is that the process gets killed with SIGBUS.
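
The ordering described above can be modeled in userspace. This is
only a toy sketch, not kernel code: the struct and the model_*
functions are hypothetical names, the pool holds a single huge
page, and reservations are ignored. It shows the window in which
the page table entry is already clear but the page has not yet
reached the free list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical. */
struct toy_pool {
	int free_pages;     /* pages on the free list, allocatable now */
	int gathered_pages; /* pages unmapped but still held in the tlb gather */
};

/* Models __unmap_hugepage_range(): clear the PTE, only queue the page. */
static void model_unmap(struct toy_pool *pool, bool *pte_present)
{
	assert(*pte_present);
	*pte_present = false;
	pool->gathered_pages++;
}

/* Models tlb_finish_mmu(): queued pages finally reach the free list. */
static void model_finish_gather(struct toy_pool *pool)
{
	pool->free_pages += pool->gathered_pages;
	pool->gathered_pages = 0;
}

/*
 * Models the fault path. Returns true if the fault is satisfied,
 * false where the real fault would end in SIGBUS.
 */
static bool model_fault(struct toy_pool *pool, bool *pte_present)
{
	if (*pte_present)
		return true;
	if (pool->free_pages == 0)
		return false;	/* allocation failure */
	pool->free_pages--;
	*pte_present = true;
	return true;
}
```

Calling model_unmap(), then model_fault(), then
model_finish_gather() reproduces the failure in the model; moving
model_finish_gather() ahead of model_fault() makes the fault
succeed.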

I have thought of a few different solutions to this problem, but
none of them look good:
- Make MADV_DONTNEED take a write lock on mmap_sem, to exclude
  page faults. This could make MADV_DONTNEED on VMAs with 4kB
  pages unacceptably slow.
- Some sort of atomic counter kept by __unmap_hugepage_range()
  to record that huge pages may have been placed in the tlb
  gather, to be freed later by tlb_finish_mmu().  This would
  involve changes to the MMU gather code, outside of hugetlbfs.
- Some sort of generation counter that tracks tlb_gather_mmu
  cycles in progress, with the alloc_huge_page failure path
  waiting for all mmu gather operations that started before
  it to finish, before retrying the allocation. This requires
  changes to the generic code, outside of hugetlbfs.
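
To make the second option in the list above more concrete, here
is a rough userspace sketch using C11 atomics. It is not a kernel
patch: the struct, the enum and the pool_* helpers are all
hypothetical, and it assumes the fault path would retry instead
of raising SIGBUS whenever pages were still in flight when it
started.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; a userspace sketch only. */
enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_SIGBUS };

struct counted_pool {
	atomic_int free_pages;      /* pages available to the allocator */
	atomic_int pages_in_gather; /* unmapped pages not yet freed */
};

/* Unmap side: account for the page as it enters the tlb gather. */
static void pool_page_gathered(struct counted_pool *pool)
{
	atomic_fetch_add(&pool->pages_in_gather, 1);
}

/*
 * Gather-finish side: make the page allocatable first, then drop
 * the in-flight count, so a reader that sees the count at zero
 * also sees the freed page.
 */
static void pool_page_freed(struct counted_pool *pool)
{
	atomic_fetch_add(&pool->free_pages, 1);
	atomic_fetch_sub(&pool->pages_in_gather, 1);
}

/* Fault side: snapshot the in-flight count before trying to allocate. */
static enum fault_result pool_fault_alloc(struct counted_pool *pool)
{
	int in_flight = atomic_load(&pool->pages_in_gather);
	int avail = atomic_load(&pool->free_pages);

	while (avail > 0) {
		/* On failure, avail is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&pool->free_pages,
						 &avail, avail - 1))
			return FAULT_OK;
	}
	/* Nothing free: only a hard failure if nothing was in flight. */
	return in_flight > 0 ? FAULT_RETRY : FAULT_SIGBUS;
}
```

The in-flight count is read before the allocation attempt on
purpose: if it was already zero, every earlier free is visible to
the attempt, so a failure there means the pool really is empty.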

What are the reasonable alternatives here?

Should we see if anybody can come up with a simple solution
to the problem, or would it be better to just disable
MADV_DONTNEED on hugetlbfs for now?

-- 
All Rights Reversed.
