linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org, linux-rdma@vger.kernel.org,
	Matan Barak <matanb@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	Dimitri Sivanich <sivanich@sgi.com>
Subject: Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
Date: Tue, 4 Sep 2018 10:00:36 -0400	[thread overview]
Message-ID: <20180904140035.GA3526@redhat.com> (raw)
In-Reply-To: <20180903055654.GA14951@dhcp22.suse.cz>

On Mon, Sep 03, 2018 at 07:56:54AM +0200, Michal Hocko wrote:
> On Thu 30-08-18 14:39:44, Jerome Glisse wrote:
> > On Thu, Aug 30, 2018 at 11:05:16AM -0700, Mike Kravetz wrote:
> > > On 08/30/2018 09:57 AM, Jerome Glisse wrote:
> > > > On Thu, Aug 30, 2018 at 06:19:52PM +0200, Michal Hocko wrote:
> > > >> On Thu 30-08-18 10:08:25, Jerome Glisse wrote:
> > > >>> On Thu, Aug 30, 2018 at 12:56:16PM +0200, Michal Hocko wrote:
> > > >>>> On Wed 29-08-18 17:11:07, Jerome Glisse wrote:
> > > >>>>> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
> > > >>>>>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
> > > >>>>>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> > > >>>>>> [...]
> > > >>>>>>>> What would be the best mmu notifier interface to use where there are no
> > > >>>>>>>> start/end calls?
> > > >>>>>>>> Or, is the best solution to add the start/end calls as is done in later
> > > >>>>>>>> versions of the code?  If that is the suggestion, has there been any change
> > > >>>>>>>> in invalidate start/end semantics that we should take into account?
> > > >>>>>>>
> > > >>>>>>> start/end would be the one to add, 4.4 seems broken in respect to THP
> > > >>>>>>> and mmu notification. Another solution is to fix user of mmu notifier,
> > > >>>>>>> they were only a handful back then. For instance properly adjust the
> > > >>>>>>> address to match first address covered by pmd or pud and passing down
> > > >>>>>>> correct page size to mmu_notifier_invalidate_page() would allow to fix
> > > >>>>>>> this easily.
> > > >>>>>>>
> > > >>>>>>> This is ok because user of try_to_unmap_one() replace the pte/pmd/pud
> > > >>>>>>> with an invalid one (either poison, migration or swap) inside the
> > > >>>>>>> function. So anyone racing would synchronize on those special entry
> > > >>>>>>> hence why it is fine to delay mmu_notifier_invalidate_page() to after
> > > >>>>>>> dropping the page table lock.
> > > >>>>>>>
> > > >>>>>>> Adding start/end might the solution with less code churn as you would
> > > >>>>>>> only need to change try_to_unmap_one().
> > > >>>>>>
> > > >>>>>> What about dependencies? 369ea8242c0fb sounds like it needs work for all
> > > >>>>>> notifiers need to be updated as well.
> > > >>>>>
> > > >>>>> This commit remove mmu_notifier_invalidate_page() hence why everything
> > > >>>>> need to be updated. But in 4.4 you can get away with just adding start/
> > > >>>>> end and keep around mmu_notifier_invalidate_page() to minimize disruption.
> > > >>>>
> > > >>>> OK, this is really interesting. I was really worried to change the
> > > >>>> semantic of the mmu notifiers in stable kernels because this is really
> > > >>>> a hard to review change and high risk for anybody running those old
> > > >>>> kernels. If we can keep the mmu_notifier_invalidate_page and wrap them
> > > >>>> into the range scope API then this sounds like the best way forward.
> > > >>>>
> > > >>>> So just to make sure we are at the same page. Does this sounds goo for
> > > >>>> stable 4.4. backport? Mike's hugetlb pmd shared fixup can be applied on
> > > >>>> top. What do you think?
> > > >>>
> > > >>> You need to invalidate outside page table lock so before the call to
> > > >>> page_check_address(). For instance like below patch, which also only
> > > >>> do the range invalidation for huge page which would avoid too much of
> > > >>> a behavior change for user of mmu notifier.
> > > >>
> > > >> Right. I would rather not make this PageHuge special though. So the
> > > >> fixed version should be.
> > > > 
> > > > Why not testing for huge ? Only huge is broken and thus only that
> > > > need the extra range invalidation. Doing the double invalidation
> > > > for single page is bit overkill.
> > > 
> > > I am a bit confused, and hope this does not add to any confusion by others.
> > > 
> > > IIUC, the patch below does not attempt to 'fix' anything.  It is simply
> > > there to add the start/end notifiers to the v4.4 version of this routine
> > > so that a subsequent patch can use them (with modified ranges) to handle
> > > unmapping a shared pmd huge page.  That is the mainline fix which started
> > > this thread.
> > > 
> > > Since we are only/mostly interested in fixing the shared pmd issue in
> > > 4.4, how about just adding the start/end notifiers to the very specific
> > > case where pmd sharing is possible?
> > > 
> > > I can see the value in trying to back port dependent patches such as this
> > > so that stable releases look more like mainline.  However, I am not sure of
> > > the value in this case as this patch was part of a larger set changing
> > > notifier semantics.
> > 
> > For all intents and purposes this is not a backport of the original
> > patch so maybe we should just drop the commit reference and just
> > explains that it is there to fix mmu notifier in respect to huge page
> > migration.
> > 
> > The original patches fix more than this case because newer featurers
> > like THP migration, THP swapping, ... added more cases where things
> > would have been wrong. But in 4.4 frame there is only huge tlb fs
> > migration.
> 
> And THP migration is still a problem with 4.4 AFAICS. All other cases
> simply split the huge page but THP migration keeps it in one piece and
> as such it is theoretically broken as you have explained. So I would
> stick with what I posted with some more clarifications in the changelog
> if you think it is appropriate (suggestions welcome).

Reading code there is no THP migration in 4.4 only huge tlb migration.
Look at handle_mm_fault which do not know how to handle swap pmd, only
the huge tlb fs fault handler knows how to handle those. Hence why i
was checking for huge tlb exactly as page_check_address() to only range
invalidate for huge tlb fs migration.

But i am fine with doing the range invalidation with all.

Cheers,
Jérôme

  reply	other threads:[~2018-09-04 14:00 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-23 20:59 [PATCH v6 0/2] huge_pmd_unshare migration and flushing Mike Kravetz
2018-08-23 20:59 ` [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages Mike Kravetz
2018-08-24  2:59   ` Naoya Horiguchi
2018-08-24  8:41   ` Michal Hocko
2018-08-24 18:08     ` Mike Kravetz
2018-08-27  7:46       ` Michal Hocko
2018-08-27 13:46         ` Jerome Glisse
2018-08-27 19:09           ` Michal Hocko
2018-08-29 17:24           ` Mike Kravetz
2018-08-29 18:14             ` Jerome Glisse
2018-08-29 18:39               ` Michal Hocko
2018-08-29 21:11                 ` Jerome Glisse
2018-08-30  0:40                   ` Mike Kravetz
2018-08-30 10:56                   ` Michal Hocko
2018-08-30 14:08                     ` Jerome Glisse
2018-08-30 16:19                       ` Michal Hocko
2018-08-30 16:57                         ` Jerome Glisse
2018-08-30 18:05                           ` Mike Kravetz
2018-08-30 18:39                             ` Jerome Glisse
2018-09-03  5:56                               ` Michal Hocko
2018-09-04 14:00                                 ` Jerome Glisse [this message]
2018-09-04 17:55                                   ` Mike Kravetz
2018-09-05  6:57                                   ` Michal Hocko
2018-08-27 16:42         ` Mike Kravetz
2018-08-27 19:11       ` Michal Hocko
2018-08-24  9:25   ` Michal Hocko
2018-08-23 20:59 ` [PATCH v6 2/2] hugetlb: take PMD sharing into account when flushing tlb/caches Mike Kravetz
2018-08-24  3:07   ` Naoya Horiguchi
2018-08-24 11:35 ` [PATCH v6 0/2] huge_pmd_unshare migration and flushing Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180904140035.GA3526@redhat.com \
    --to=jglisse@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=leonro@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matanb@mellanox.com \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=sivanich@sgi.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).