All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aaron Tomlin <atomlin@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
	Jerome Glisse <jglisse@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/3] mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
Date: Mon, 15 Oct 2018 16:38:08 +0100	[thread overview]
Message-ID: <20181015153808.zcpswxnchlp67sdh@atomlin.usersys.com> (raw)
In-Reply-To: <20181013002430.698-3-aarcange@redhat.com>

On Fri 2018-10-12 20:24 -0400, Andrea Arcangeli wrote:
> change_huge_pmd() after arming the numa/protnone pmd doesn't flush the
> TLB right away. do_huge_pmd_numa_page() flushes the TLB before calling
> migrate_misplaced_transhuge_page(). By the time
> do_huge_pmd_numa_page() runs some CPU could still access the page
> through the TLB.
> 
> change_huge_pmd() before arming the numa/protnone transhuge pmd calls
> mmu_notifier_invalidate_range_start(). So there's no need of
> mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end()
> sequence in migrate_misplaced_transhuge_page() too, because by the
> time migrate_misplaced_transhuge_page() runs, the pmd mapping has
> already been invalidated in the secondary MMUs. It has to or if a
> secondary MMU can still write to the page, the migrate_page_copy()
> would lose data.
> 
> However an explicit mmu_notifier_invalidate_range() is needed before
> migrate_misplaced_transhuge_page() starts copying the data of the
> transhuge page or the below can happen for MMU notifier users sharing
> the primary MMU pagetables and only implementing ->invalidate_range:
> 
> CPU0		CPU1		GPU sharing linux pagetables using
>                                 only ->invalidate_range
> -----------	------------	---------
> 				GPU secondary MMU writes to the page
> 				mapped by the transhuge pmd
> change_pmd_range()
> mmu..._range_start()
> ->invalidate_range_start() noop
> change_huge_pmd()
> set_pmd_at(numa/protnone)
> pmd_unlock()
> 		do_huge_pmd_numa_page()
> 		CPU TLB flush globally (1)
> 		CPU cannot write to page
> 		migrate_misplaced_transhuge_page()
> 				GPU writes to the page...
> 		migrate_page_copy()
> 				...GPU stops writing to the page
> CPU TLB flush (2)
> mmu..._range_end() (3)
> ->invalidate_range_stop() noop
> ->invalidate_range()
> 				GPU secondary MMU is invalidated
> 				and cannot write to the page anymore
> 				(too late)
> 
> Just like we need a CPU TLB flush (1) because the TLB flush (2)
> arrives too late, we also need a mmu_notifier_invalidate_range()
> before calling migrate_misplaced_transhuge_page(), because the
> ->invalidate_range() in (3) also arrives too late.
> 
> This requirement is the result of the lazy optimization in
> change_huge_pmd() that releases the pmd_lock without first flushing
> the TLB and without first calling mmu_notifier_invalidate_range().
> 
> Even converting the removed mmu_notifier_invalidate_range_only_end()
> into a mmu_notifier_invalidate_range_end() would not have been enough
> to fix this, because it run after migrate_page_copy().
> 
> After the hugepage data copy is done
> migrate_misplaced_transhuge_page() can proceed and call set_pmd_at
> without having to flush the TLB nor any secondary MMUs because the
> secondary MMU invalidate, just like the CPU TLB flush, has to happen
> before the migrate_page_copy() is called or it would be a bug in the
> first place (and it was for drivers using ->invalidate_range()).
> 
> KVM is unaffected because it doesn't implement ->invalidate_range().
> 
> The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and
> uses the generic migrate_pages which transitions the pte from
> numa/protnone to a migration entry in try_to_unmap_one() and flushes
> TLBs and all mmu notifiers there before copying the page.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Aaron Tomlin <atomlin@redhat.com>

-- 
Aaron Tomlin

  parent reply	other threads:[~2018-10-15 15:38 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-13  0:24 [PATCH 0/3] migrate_misplaced_transhuge_page race conditions Andrea Arcangeli
2018-10-13  0:24 ` [PATCH 1/3] mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition Andrea Arcangeli
2018-10-15 11:33   ` Mel Gorman
2018-10-15 15:30   ` Kirill A. Shutemov
2018-10-13  0:24 ` [PATCH 2/3] mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() Andrea Arcangeli
2018-10-15 11:36   ` Mel Gorman
2018-10-15 15:33   ` Kirill A. Shutemov
2018-10-15 15:38   ` Aaron Tomlin [this message]
2018-10-13  0:24 ` [PATCH 3/3] mm: thp: relocate flush_cache_range() " Andrea Arcangeli
2018-10-14  9:58   ` kbuild test robot
2018-10-14 19:58     ` Andrea Arcangeli
2018-10-15 15:35       ` Kirill A. Shutemov
2018-10-15 20:23   ` [PATCH 1/1] " Andrea Arcangeli
2018-10-15 22:11     ` Andrew Morton
2018-10-15 22:52     ` Andrew Morton
2018-10-15 23:03       ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181015153808.zcpswxnchlp67sdh@atomlin.usersys.com \
    --to=atomlin@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.