linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org, linux-rdma@vger.kernel.org,
	Matan Barak <matanb@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	Dimitri Sivanich <sivanich@sgi.com>
Subject: Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
Date: Mon, 3 Sep 2018 07:56:54 +0200	[thread overview]
Message-ID: <20180903055654.GA14951@dhcp22.suse.cz> (raw)
In-Reply-To: <20180830183944.GE3529@redhat.com>

On Thu 30-08-18 14:39:44, Jerome Glisse wrote:
> On Thu, Aug 30, 2018 at 11:05:16AM -0700, Mike Kravetz wrote:
> > On 08/30/2018 09:57 AM, Jerome Glisse wrote:
> > > On Thu, Aug 30, 2018 at 06:19:52PM +0200, Michal Hocko wrote:
> > >> On Thu 30-08-18 10:08:25, Jerome Glisse wrote:
> > >>> On Thu, Aug 30, 2018 at 12:56:16PM +0200, Michal Hocko wrote:
> > >>>> On Wed 29-08-18 17:11:07, Jerome Glisse wrote:
> > >>>>> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
> > >>>>>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
> > >>>>>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> > >>>>>> [...]
> > >>>>>>>> What would be the best mmu notifier interface to use where there are no
> > >>>>>>>> start/end calls?
> > >>>>>>>> Or, is the best solution to add the start/end calls as is done in later
> > >>>>>>>> versions of the code?  If that is the suggestion, has there been any change
> > >>>>>>>> in invalidate start/end semantics that we should take into account?
> > >>>>>>>
> > >>>>>>> start/end would be the one to add, 4.4 seems broken in respect to THP
> > >>>>>>> and mmu notification. Another solution is to fix user of mmu notifier,
> > >>>>>>> they were only a handful back then. For instance properly adjust the
> > >>>>>>> address to match first address covered by pmd or pud and passing down
> > >>>>>>> correct page size to mmu_notifier_invalidate_page() would allow to fix
> > >>>>>>> this easily.
> > >>>>>>>
> > >>>>>>> This is ok because user of try_to_unmap_one() replace the pte/pmd/pud
> > >>>>>>> with an invalid one (either poison, migration or swap) inside the
> > >>>>>>> function. So anyone racing would synchronize on those special entry
> > >>>>>>> hence why it is fine to delay mmu_notifier_invalidate_page() to after
> > >>>>>>> dropping the page table lock.
> > >>>>>>>
> > >>>>>>> Adding start/end might the solution with less code churn as you would
> > >>>>>>> only need to change try_to_unmap_one().
> > >>>>>>
> > >>>>>> What about dependencies? 369ea8242c0fb sounds like it needs work for all
> > >>>>>> notifiers need to be updated as well.
> > >>>>>
> > >>>>> This commit remove mmu_notifier_invalidate_page() hence why everything
> > >>>>> need to be updated. But in 4.4 you can get away with just adding start/
> > >>>>> end and keep around mmu_notifier_invalidate_page() to minimize disruption.
> > >>>>
> > >>>> OK, this is really interesting. I was really worried to change the
> > >>>> semantic of the mmu notifiers in stable kernels because this is really
> > >>>> a hard to review change and high risk for anybody running those old
> > >>>> kernels. If we can keep the mmu_notifier_invalidate_page and wrap them
> > >>>> into the range scope API then this sounds like the best way forward.
> > >>>>
> > >>>> So just to make sure we are at the same page. Does this sounds goo for
> > >>>> stable 4.4. backport? Mike's hugetlb pmd shared fixup can be applied on
> > >>>> top. What do you think?
> > >>>
> > >>> You need to invalidate outside page table lock so before the call to
> > >>> page_check_address(). For instance like below patch, which also only
> > >>> do the range invalidation for huge page which would avoid too much of
> > >>> a behavior change for user of mmu notifier.
> > >>
> > >> Right. I would rather not make this PageHuge special though. So the
> > >> fixed version should be.
> > > 
> > > Why not testing for huge ? Only huge is broken and thus only that
> > > need the extra range invalidation. Doing the double invalidation
> > > for single page is bit overkill.
> > 
> > I am a bit confused, and hope this does not add to any confusion by others.
> > 
> > IIUC, the patch below does not attempt to 'fix' anything.  It is simply
> > there to add the start/end notifiers to the v4.4 version of this routine
> > so that a subsequent patch can use them (with modified ranges) to handle
> > unmapping a shared pmd huge page.  That is the mainline fix which started
> > this thread.
> > 
> > Since we are only/mostly interested in fixing the shared pmd issue in
> > 4.4, how about just adding the start/end notifiers to the very specific
> > case where pmd sharing is possible?
> > 
> > I can see the value in trying to back port dependent patches such as this
> > so that stable releases look more like mainline.  However, I am not sure of
> > the value in this case as this patch was part of a larger set changing
> > notifier semantics.
> 
> For all intents and purposes this is not a backport of the original
> patch so maybe we should just drop the commit reference and just
> explains that it is there to fix mmu notifier in respect to huge page
> migration.
> 
> The original patches fix more than this case because newer featurers
> like THP migration, THP swapping, ... added more cases where things
> would have been wrong. But in 4.4 frame there is only huge tlb fs
> migration.

And THP migration is still a problem with 4.4 AFAICS. All other cases
simply split the huge page but THP migration keeps it in one piece and
as such it is theoretically broken as you have explained. So I would
stick with what I posted with some more clarifications in the changelog
if you think it is appropriate (suggestions welcome).
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-09-03  5:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-23 20:59 [PATCH v6 0/2] huge_pmd_unshare migration and flushing Mike Kravetz
2018-08-23 20:59 ` [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages Mike Kravetz
2018-08-24  2:59   ` Naoya Horiguchi
2018-08-24  8:41   ` Michal Hocko
2018-08-24 18:08     ` Mike Kravetz
2018-08-27  7:46       ` Michal Hocko
2018-08-27 13:46         ` Jerome Glisse
2018-08-27 19:09           ` Michal Hocko
2018-08-29 17:24           ` Mike Kravetz
2018-08-29 18:14             ` Jerome Glisse
2018-08-29 18:39               ` Michal Hocko
2018-08-29 21:11                 ` Jerome Glisse
2018-08-30  0:40                   ` Mike Kravetz
2018-08-30 10:56                   ` Michal Hocko
2018-08-30 14:08                     ` Jerome Glisse
2018-08-30 16:19                       ` Michal Hocko
2018-08-30 16:57                         ` Jerome Glisse
2018-08-30 18:05                           ` Mike Kravetz
2018-08-30 18:39                             ` Jerome Glisse
2018-09-03  5:56                               ` Michal Hocko [this message]
2018-09-04 14:00                                 ` Jerome Glisse
2018-09-04 17:55                                   ` Mike Kravetz
2018-09-05  6:57                                   ` Michal Hocko
2018-08-27 16:42         ` Mike Kravetz
2018-08-27 19:11       ` Michal Hocko
2018-08-24  9:25   ` Michal Hocko
2018-08-23 20:59 ` [PATCH v6 2/2] hugetlb: take PMD sharing into account when flushing tlb/caches Mike Kravetz
2018-08-24  3:07   ` Naoya Horiguchi
2018-08-24 11:35 ` [PATCH v6 0/2] huge_pmd_unshare migration and flushing Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180903055654.GA14951@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=jglisse@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=leonro@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matanb@mellanox.com \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=sivanich@sgi.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).