All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Li Wang <liwang@redhat.com>, Linux-MM <linux-mm@kvack.org>,
	LTP List <ltp@lists.linux.it>,
	"xishi.qiuxishi@alibaba-inc.com" <xishi.qiuxishi@alibaba-inc.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	Cyril Hrubis <chrubis@suse.cz>
Subject: Re: [MM Bug?] mmap() triggers SIGBUS while doing the​ ​numa_move_pages() for offlined hugepage in background
Date: Mon, 5 Aug 2019 00:40:42 +0000	[thread overview]
Message-ID: <20190805004042.GA16862@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <54a5c9f5-eade-0d8f-24f9-bff6f19d4905@oracle.com>

On Fri, Aug 02, 2019 at 10:42:33AM -0700, Mike Kravetz wrote:
> On 8/1/19 9:15 PM, Naoya Horiguchi wrote:
> > On Thu, Aug 01, 2019 at 05:19:41PM -0700, Mike Kravetz wrote:
> >> There appears to be a race with hugetlb_fault and try_to_unmap_one of
> >> the migration path.
> >>
> >> Can you try this patch in your environment?  I am not sure if it will
> >> be the final fix, but just wanted to see if it addresses issue for you.
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index ede7e7f5d1ab..f3156c5432e3 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -3856,6 +3856,20 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >>  
> >>  		page = alloc_huge_page(vma, haddr, 0);
> >>  		if (IS_ERR(page)) {
> >> +			/*
> >> +			 * We could race with page migration (try_to_unmap_one)
> >> +			 * which is modifying page table with lock.  However,
> >> +			 * we are not holding lock here.  Before returning
> >> +			 * error that will SIGBUS caller, get ptl and make
> >> +			 * sure there really is no entry.
> >> +			 */
> >> +			ptl = huge_pte_lock(h, mm, ptep);
> >> +			if (!huge_pte_none(huge_ptep_get(ptep))) {
> >> +				ret = 0;
> >> +				spin_unlock(ptl);
> >> +				goto out;
> >> +			}
> >> +			spin_unlock(ptl);
> > 
> > Thanks you for investigation, Mike.
> > I tried this change and found no SIGBUS, so it works well.
> > 
> > I'm still not clear about how !huge_pte_none() becomes true here,
> > because we enter hugetlb_no_page() only when huge_pte_none() is non-null
> > and (racy) try_to_unmap_one() from page migration should convert the
> > huge_pte into a migration entry, not null.
> 
> Thanks for taking a look Naoya.
> 
> In try_to_unmap_one(), there is this code block:
> 
> 		/* Nuke the page table entry. */
> 		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> 		if (should_defer_flush(mm, flags)) {
> 			/*
> 			 * We clear the PTE but do not flush so potentially
> 			 * a remote CPU could still be writing to the page.
> 			 * If the entry was previously clean then the
> 			 * architecture must guarantee that a clear->dirty
> 			 * transition on a cached TLB entry is written through
> 			 * and traps if the PTE is unmapped.
> 			 */
> 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
> 
> 			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> 		} else {
> 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
> 		}
> 
> That happens before setting the migration entry.  Therefore, for a period
> of time the pte is NULL (huge_pte_none() returns true).
> 
> try_to_unmap_one holds the page table lock, but hugetlb_fault does not take
> the lock to 'optimistically' check huge_pte_none().  When huge_pte_none
> returns true, it calls hugetlb_no_page which is where we try to allocate
> a page and fails.
> 
> Does that make sense, or am I missing something?

Make sense to me, thanks.

> 
> The patch checks for this specific condition: someone changing the pte
> from NULL to non-NULL while holding the lock.  I am not sure if this is
> the best way to fix.  But, it may be the easiest.

Yes, I think so.

- Naoya

WARNING: multiple messages have this Message-ID (diff)
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: ltp@lists.linux.it
Subject: [LTP]  [MM Bug?] mmap() triggers SIGBUS while doing the​ ​numa_move_pages() for offlined hugepage in background
Date: Mon, 5 Aug 2019 00:40:42 +0000	[thread overview]
Message-ID: <20190805004042.GA16862@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <54a5c9f5-eade-0d8f-24f9-bff6f19d4905@oracle.com>

On Fri, Aug 02, 2019 at 10:42:33AM -0700, Mike Kravetz wrote:
> On 8/1/19 9:15 PM, Naoya Horiguchi wrote:
> > On Thu, Aug 01, 2019 at 05:19:41PM -0700, Mike Kravetz wrote:
> >> There appears to be a race with hugetlb_fault and try_to_unmap_one of
> >> the migration path.
> >>
> >> Can you try this patch in your environment?  I am not sure if it will
> >> be the final fix, but just wanted to see if it addresses issue for you.
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index ede7e7f5d1ab..f3156c5432e3 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -3856,6 +3856,20 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >>  
> >>  		page = alloc_huge_page(vma, haddr, 0);
> >>  		if (IS_ERR(page)) {
> >> +			/*
> >> +			 * We could race with page migration (try_to_unmap_one)
> >> +			 * which is modifying page table with lock.  However,
> >> +			 * we are not holding lock here.  Before returning
> >> +			 * error that will SIGBUS caller, get ptl and make
> >> +			 * sure there really is no entry.
> >> +			 */
> >> +			ptl = huge_pte_lock(h, mm, ptep);
> >> +			if (!huge_pte_none(huge_ptep_get(ptep))) {
> >> +				ret = 0;
> >> +				spin_unlock(ptl);
> >> +				goto out;
> >> +			}
> >> +			spin_unlock(ptl);
> > 
> > Thanks you for investigation, Mike.
> > I tried this change and found no SIGBUS, so it works well.
> > 
> > I'm still not clear about how !huge_pte_none() becomes true here,
> > because we enter hugetlb_no_page() only when huge_pte_none() is non-null
> > and (racy) try_to_unmap_one() from page migration should convert the
> > huge_pte into a migration entry, not null.
> 
> Thanks for taking a look Naoya.
> 
> In try_to_unmap_one(), there is this code block:
> 
> 		/* Nuke the page table entry. */
> 		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> 		if (should_defer_flush(mm, flags)) {
> 			/*
> 			 * We clear the PTE but do not flush so potentially
> 			 * a remote CPU could still be writing to the page.
> 			 * If the entry was previously clean then the
> 			 * architecture must guarantee that a clear->dirty
> 			 * transition on a cached TLB entry is written through
> 			 * and traps if the PTE is unmapped.
> 			 */
> 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
> 
> 			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> 		} else {
> 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
> 		}
> 
> That happens before setting the migration entry.  Therefore, for a period
> of time the pte is NULL (huge_pte_none() returns true).
> 
> try_to_unmap_one holds the page table lock, but hugetlb_fault does not take
> the lock to 'optimistically' check huge_pte_none().  When huge_pte_none
> returns true, it calls hugetlb_no_page which is where we try to allocate
> a page and fails.
> 
> Does that make sense, or am I missing something?

Make sense to me, thanks.

> 
> The patch checks for this specific condition: someone changing the pte
> from NULL to non-NULL while holding the lock.  I am not sure if this is
> the best way to fix.  But, it may be the easiest.

Yes, I think so.

- Naoya

  reply	other threads:[~2019-08-05  0:42 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-29  5:17 [MM Bug?] mmap() triggers SIGBUS while doing the​ ​numa_move_pages() for offlined hugepage in background Li Wang
2019-07-29  5:17 ` [LTP] " Li Wang
2019-07-29 19:00 ` Mike Kravetz
2019-07-29 19:00   ` [LTP] " Mike Kravetz
2019-07-30  6:29   ` Li Wang
2019-07-30  6:29     ` [LTP] " Li Wang
2019-07-31  0:44     ` Mike Kravetz
2019-07-31  0:44       ` [LTP] " Mike Kravetz
2019-08-02  0:19       ` Mike Kravetz
2019-08-02  0:19         ` [LTP] " Mike Kravetz
2019-08-02  4:15         ` Naoya Horiguchi
2019-08-02  4:15           ` [LTP] " Naoya Horiguchi
2019-08-02 17:42           ` Mike Kravetz
2019-08-02 17:42             ` [LTP] " Mike Kravetz
2019-08-05  0:40             ` Naoya Horiguchi [this message]
2019-08-05  0:40               ` Naoya Horiguchi
2019-08-05  8:57             ` Michal Hocko
2019-08-05  8:57               ` [LTP] " Michal Hocko
2019-08-05 17:36               ` Mike Kravetz
2019-08-05 17:36                 ` [LTP] " Mike Kravetz
2019-08-07  0:07                 ` Mike Kravetz
2019-08-07  0:07                   ` [LTP] " Mike Kravetz
2019-08-07  7:39                   ` Michal Hocko
2019-08-07  7:39                     ` [LTP] " Michal Hocko
2019-08-07 15:10                     ` Mike Kravetz
2019-08-07 15:10                       ` [LTP] " Mike Kravetz
2019-08-02  9:59         ` Li Wang
2019-08-02  9:59           ` [LTP] " Li Wang
2019-07-30  6:38   ` Li Wang
2019-07-30  6:38     ` [LTP] " Li Wang
2019-08-02  3:48 ` Naoya Horiguchi
2019-08-02  3:48   ` [LTP] " Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805004042.GA16862@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=chrubis@suse.cz \
    --cc=linux-mm@kvack.org \
    --cc=liwang@redhat.com \
    --cc=ltp@lists.linux.it \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=xishi.qiuxishi@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.