From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ltp@lists.linux.it
Cc: Li Wang <liwang@redhat.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Michal Hocko <mhocko@kernel.org>, Cyril Hrubis <chrubis@suse.cz>,
	xishi.qiuxishi@alibaba-inc.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH] hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
Date: Wed, 7 Aug 2019 17:05:33 -0700
Message-ID: <20190808000533.7701-1-mike.kravetz@oracle.com>

Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS
in the kernel-v5.2.3 testing. This is caused by a race between hugetlb
page migration and page fault.

If a hugetlb page can not be allocated to satisfy a page fault, the task
is sent SIGBUS. This is normal hugetlbfs behavior. A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page. This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate. Without the mutex,
one would fail and SIGBUS even though the other fault would be successful.

There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration.
It will then unmap the original page from all page tables. It does
this unmap by first clearing the pte and then writing a migration
entry. The page table lock is held for the duration of this clear and
write operation. However, the beginnings of the hugetlb page fault
code optimistically checks the pte without taking the page table lock.
If clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault. Note that the page
which will eventually satisfy this fault was already allocated by the
migration code. However, the allocation within the fault path could
fail which would result in the task incorrectly being sent SIGBUS.

Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables. However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock. This would
require significant rework of the migration code. Instead, the issue
is addressed in the hugetlb fault code. After failing to allocate a
huge page, take the page table lock and check for huge_pte_none before
returning an error. This is the same check that must be made further
in the code even if page allocation is successful.

Reported-by: Li Wang <liwang@redhat.com>
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Li Wang <liwang@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ede7e7f5d1ab..6d7296dd11b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3856,6 +3856,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,

 		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
+			/*
+			 * Returning error will result in faulting task being
+			 * sent SIGBUS. The hugetlb fault mutex prevents two
+			 * tasks from racing to fault in the same page which
+			 * could result in false unable to allocate errors.
+			 * Page migration does not take the fault mutex, but
+			 * does a clear then write of pte's under page table
+			 * lock. Page fault code could race with migration,
+			 * notice the clear pte and try to allocate a page
+			 * here. Before returning error, get ptl and make
+			 * sure there really is no pte entry.
+			 */
+			ptl = huge_pte_lock(h, mm, ptep);
+			if (!huge_pte_none(huge_ptep_get(ptep))) {
+				ret = 0;
+				spin_unlock(ptl);
+				goto out;
+			}
+			spin_unlock(ptl);
 			ret = vmf_error(PTR_ERR(page));
 			goto out;
 		}
-- 
2.20.1
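For readers who want to see the recheck pattern in isolation, here is a
minimal userspace sketch of it. It is not kernel code. Every name in it
(struct model_pt, model_migration_unmap, model_fault_after_alloc_failure,
the MODEL_* constants) is hypothetical and exists only for illustration.
A C11 atomic_flag spinlock stands in for the page table lock and an
atomic int stands in for one huge pte slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of pte states; these are not kernel definitions. */
enum model_pte { MODEL_PTE_NONE, MODEL_PTE_PRESENT, MODEL_PTE_MIGRATION };

/* Hypothetical fault outcomes for the model. */
enum model_fault { MODEL_FAULT_RETRY = 0, MODEL_FAULT_SIGBUS = 1 };

struct model_pt {
	atomic_flag ptl;	/* stands in for the page table spinlock */
	_Atomic int pte;	/* stands in for one huge pte slot */
};

static void model_lock(struct model_pt *pt)
{
	while (atomic_flag_test_and_set_explicit(&pt->ptl,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void model_unlock(struct model_pt *pt)
{
	atomic_flag_clear_explicit(&pt->ptl, memory_order_release);
}

/* Migration unmap: clear the pte, then write a migration entry, under ptl. */
static void model_migration_unmap(struct model_pt *pt)
{
	assert(pt != NULL);
	model_lock(pt);
	/* A lockless reader can observe this cleared state. */
	atomic_store(&pt->pte, MODEL_PTE_NONE);
	atomic_store(&pt->pte, MODEL_PTE_MIGRATION);
	model_unlock(pt);
}

/*
 * Fault path after the huge page allocation has already failed.
 * Recheck the pte under ptl before reporting SIGBUS.
 */
static enum model_fault model_fault_after_alloc_failure(struct model_pt *pt)
{
	enum model_fault ret = MODEL_FAULT_SIGBUS;

	assert(pt != NULL);
	model_lock(pt);
	if (atomic_load(&pt->pte) != MODEL_PTE_NONE)
		ret = MODEL_FAULT_RETRY;	/* an entry exists: no error */
	model_unlock(pt);
	return ret;
}
```

In this model, a caller that sees MODEL_FAULT_RETRY corresponds to the
patched path returning 0, so the access is retried and finds the entry
that migration wrote. Taking the lock for the recheck is what closes the
window: the cleared state only exists while migration holds ptl, so once
the fault path owns ptl it sees either a truly empty slot or the
migration entry.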