Linux-mm Archive on
 help / color / Atom feed
From: "Li Xinhai" <>
To: mhocko <>
Cc: "Mike Kravetz" <>,
	 "" <>,
	 akpm <>,
	 yang.shi <>,
	 n-horiguchi <>
Subject: Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
Date: Fri, 17 Jan 2020 20:05:23 +0800
Message-ID: <> (raw)
In-Reply-To: <>

On 2020-01-17 at 15:57 Michal Hocko wrote:
>On Fri 17-01-20 10:38:21, Li Xinhai wrote:
>> On 2020-01-17 at 03:22 Mike Kravetz wrote:
>> >On 1/15/20 11:59 PM, Michal Hocko wrote:
>> >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
>> >>> What should we do?
>> >>> ==================
>> >>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>> >>>    seen as conflicting with man page has existed since v3.12 and I am
>> >>>    not aware of any complaints.
>> >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>> >>>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>> >>>    after a failure of migrate_pages().  We could simply traverse the list
>> >>>    of pages that were not migrated looking for any non-hugetlb page.
>> >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>> >>>    and modify code accordingly.
>> >>>
>> >>> My suggestion would be for 1 or 2.  Thoughts?
>> >>
>> >> And why do we exactly need to do anything at all? There is an
>> >> inconsistency that has been there for years without anybody noticing.
>> >> NUMA API is a mess on its own and unfixable at this stage, there will
>> >> always be some corner cases. If there is no real workload hitting this
>> >> incosistency and suffering, I would rather not touch this at all.
>> >> Unless the change would clean up the code or make it more maintainable.
>> >
>> >That is a very valid point.  Sometimes we as developers get focused on the
>> >actual code changes and fail to ask the question "does this really need to
>> >be changed?" or "what value do the code changes provide?".
>> >
>> >Li Xinhai came up with two optimizations in how the mbind code deals with
>> >hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
>> >Unless I am mistaken, nobody has actually complained or noticed this behavior.
>> >I believe Li Xinhai noticed this inefficient code via code inspection.  Of
>> >course, based on what we know today one could write a test program to show
>> >the inefficient behavior.  However, no real users have noticed this during
>> >the past 6 years.
>> >
>> >The proposed code changes are fairly simple.  However, I would not say that
>> >they clean up the code or make it more maintainable.  They essentially add
>> >or modify two checks to bail out early for hugetlb vma's if the flag which
>> >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
>> >If one is trying to follow the entire mbind code path for hugetlb pages,
>> >these patches will make that easier follow/understand.  That is simply
>> >because one can ignore downstream code/functionality.
>> >
>> >Based on Michal's criteria above, I now believe the code changes should not
>> >be made.  Yes, they are fairly simple.  However, even simple changes have
>> >the potential to break something (build breakage with v1 of patch).  We should
>> >leave this code as is unless issues are reported by users.
>> Acctually I am the user of this API, and when using STRICT alone to know if
>> pages are misplaced and take action later, it seems don't work consistantly
>> on hugepage. Initially, I assumed that I didn't use it correctly, that flag must
>> use with MOVE*? After check the code, found that flag didn't been handled,
>> and finally noticed that there is a NOTE about it in MAN page.
>This is the first time you are mentioning an actual use case. This
>should have been expressed from the very begining. Including an
>explanation of what the use case is and how it is affected by the
>current behavior.
OK, that is better practice, thanks.

>> I'd like the STRICT been handled, but at present since this is not going to
>> be implemented, I have to handle differently on hugetlb mapping.
>> At meantime, I thought that this patch can be useful and posted it, because
>> for scenarios where hugetlb mapping are handled with other mappings, less
>> cost for the mbind call. (although it didn't help my current case)
>> I think if we could provid better performance, why not do that only because
>> that code has exists for 6 years?
>Do you have any numbers to prove performance improvements? I believe
>arguments against the patch have been already presented. 

I didn't test to know difference with this patch since it doesn't help for
my case. If walking the page table and checking the present PTE is
basically no cost according to existing experience, then it is not needed.

Now, for my case, I am doing workaround by call mbind then check the
numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
a simple change, and I am glad to add this function if no objection.

>Michal Hocko
>SUSE Labs

  reply index

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
2020-01-14 14:09   ` Li Xinhai
2020-01-14 18:27     ` Yang Shi
2020-01-15  1:07     ` Mike Kravetz
2020-01-15  1:24       ` Yang Shi
2020-01-15  4:28         ` Mike Kravetz
2020-01-15  5:23           ` Yang Shi
2020-01-15  7:36             ` Li Xinhai
2020-01-15 17:16               ` Yang Shi
2020-01-15 21:07       ` Mike Kravetz
2020-01-15 21:30         ` Yang Shi
2020-01-15 21:45           ` Mike Kravetz
2020-01-15 21:59             ` Yang Shi
2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
2020-01-16 15:32                 ` Li Xinhai
2020-01-16  7:59         ` Michal Hocko
2020-01-16 19:22           ` Mike Kravetz
2020-01-17  2:32             ` Yang Shi
2020-01-17  2:38             ` Li Xinhai
2020-01-17  7:57               ` Michal Hocko
2020-01-17 12:05                 ` Li Xinhai [this message]
2020-01-17 15:20                   ` Michal Hocko
2020-01-17 15:46                     ` Li Xinhai
2020-01-20 12:45                       ` Michal Hocko
2020-01-21 14:15                         ` Li Xinhai
2020-01-21 14:53                           ` Michal Hocko
2020-01-22 13:55                             ` Li Xinhai
2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
2020-01-15  1:25 ` Andrew Morton

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on

Archives are clonable:
	git clone --mirror linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ \
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone