Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable
@ 2020-01-14  9:16 Li Xinhai
  2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Li Xinhai @ 2020-01-14  9:16 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Michal Hocko, Mike Kravetz

Checking hstate at early phase when isolating page, instead of during
unmap and move phase, to avoid useless isolation.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/mempolicy.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 5228c62..986e51d 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma)
 	if (vma_is_dax(vma))
 		return false;
 
-#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-	if (vma->vm_flags & VM_HUGETLB)
+	if (is_vm_hugetlb_page(vma) &&
+		!hugepage_migration_supported(hstate_vma(vma)))
 		return false;
-#endif
 
 	/*
 	 * Migration allocates pages in the highest zone. If we cannot
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
@ 2020-01-14  9:16 ` Li Xinhai
  2020-01-14 14:09   ` Li Xinhai
  2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
  2020-01-15  1:25 ` Andrew Morton
  2 siblings, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-14  9:16 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Michal Hocko, Mike Kravetz

Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
page:

Notes
MPOL_MF_STRICT is ignored on huge page mappings.

If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
indicate, from test_walk, that walking this vma should be skipped even if
there are misplaced pages.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/mempolicy.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 067cf7d..c117b5f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 		return 1;
 	}
 
+	/* MPOL_MF_STRICT is ignored for huge page, skip checking
+	*  misplaced pages
+	*/
+	if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT &&
+		is_vm_hugetlb_page(vma))
+		return 1;
+
 	/* queue pages from current vma */
 	if (flags & MPOL_MF_VALID)
 		return 0;
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
@ 2020-01-14 14:09   ` Li Xinhai
  2020-01-14 18:27     ` Yang Shi
  2020-01-15  1:07     ` Mike Kravetz
  0 siblings, 2 replies; 30+ messages in thread
From: Li Xinhai @ 2020-01-14 14:09 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, mhocko, Mike Kravetz, yang.shi, n-horiguchi

Add cc to
Yang Shi <yang.shi@linux.alibaba.com>
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
, who has been worked on this part

On 2020-01-14 at 17:16 Li Xinhai wrote:
>Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>page:
>
>Notes
>MPOL_MF_STRICT is ignored on huge page mappings.
>
>If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>indicate, from test_walk, that walking this vma should be skipped even if
>there are misplaced pages.
>
>Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Mike Kravetz <mike.kravetz@oracle.com>
>---
> mm/mempolicy.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>index 067cf7d..c117b5f 100644
>--- a/mm/mempolicy.c
>+++ b/mm/mempolicy.c
>@@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
> return 1;
> }
>
>+	/* MPOL_MF_STRICT is ignored for huge page, skip checking
>+	*  misplaced pages
>+	*/
>+	if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT &&
>+	is_vm_hugetlb_page(vma))
>+	return 1;
>+
> /* queue pages from current vma */
> if (flags & MPOL_MF_VALID)
> return 0;
>--
>1.8.3.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-14 14:09   ` Li Xinhai
@ 2020-01-14 18:27     ` Yang Shi
  2020-01-15  1:07     ` Mike Kravetz
  1 sibling, 0 replies; 30+ messages in thread
From: Yang Shi @ 2020-01-14 18:27 UTC (permalink / raw)
  To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, Mike Kravetz, n-horiguchi



On 1/14/20 6:09 AM, Li Xinhai wrote:
> Add cc to
> Yang Shi <yang.shi@linux.alibaba.com>
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> , who has been worked on this part
>
> On 2020-01-14 at 17:16 Li Xinhai wrote:
>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>> page:
>>
>> Notes
>> MPOL_MF_STRICT is ignored on huge page mappings.
>>
>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>> indicate, from test_walk, that walking this vma should be skipped even if
>> there are misplaced pages.
>>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>> mm/mempolicy.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 067cf7d..c117b5f 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -656,6 +656,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>> return 1;
>> }
>>
>> +	/* MPOL_MF_STRICT is ignored for huge page, skip checking
>> +	*  misplaced pages
>> +	*/
>> +	if ((flags & MPOL_MF_VALID) == MPOL_MF_STRICT &&
>> +	is_vm_hugetlb_page(vma))
>> +	return 1;

It makes some sense to me.  We do save some effort by not 
acquiring/releasing ptl and walking page tables.

Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

>> /* queue pages from current vma */
>> if (flags & MPOL_MF_VALID)
>> return 0;
>> --
>> 1.8.3.1
> >



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable
  2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
  2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
@ 2020-01-14 19:12 ` Mike Kravetz
  2020-01-15  1:25 ` Andrew Morton
  2 siblings, 0 replies; 30+ messages in thread
From: Mike Kravetz @ 2020-01-14 19:12 UTC (permalink / raw)
  To: Li Xinhai, linux-mm; +Cc: akpm, Michal Hocko

On 1/14/20 1:16 AM, Li Xinhai wrote:
> Checking hstate at early phase when isolating page, instead of during
> unmap and move phase, to avoid useless isolation.
> 
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/mempolicy.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 5228c62..986e51d 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma)
>  	if (vma_is_dax(vma))
>  		return false;
>  
> -#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> -	if (vma->vm_flags & VM_HUGETLB)
> +	if (is_vm_hugetlb_page(vma) &&
> +		!hugepage_migration_supported(hstate_vma(vma)))
>  		return false;
> -#endif
>  
>  	/*
>  	 * Migration allocates pages in the highest zone. If we cannot
> 

Thanks!  It is indeed better to perform the check early.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-14 14:09   ` Li Xinhai
  2020-01-14 18:27     ` Yang Shi
@ 2020-01-15  1:07     ` Mike Kravetz
  2020-01-15  1:24       ` Yang Shi
  2020-01-15 21:07       ` Mike Kravetz
  1 sibling, 2 replies; 30+ messages in thread
From: Mike Kravetz @ 2020-01-15  1:07 UTC (permalink / raw)
  To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, yang.shi, n-horiguchi

On 1/14/20 6:09 AM, Li Xinhai wrote:
> Add cc to
> Yang Shi <yang.shi@linux.alibaba.com>
> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> , who has been worked on this part
> 
> On 2020-01-14 at 17:16 Li Xinhai wrote:
>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>> page:
>>
>> Notes
>> MPOL_MF_STRICT is ignored on huge page mappings.
>>
>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>> indicate, from test_walk, that walking this vma should be skipped even if
>> there are misplaced pages.
>>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>

I do not necessarily disagree with the change.  However, this has made me
question a couple things:
1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
   - Is that leftover from the the days when huge page migration was not
     supported?
   - Is it just because huge page migration is more likely to fail than
     base page migration.
2) Does the mbind code function properly when unable to migrate a huge page
   MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
   EIO.

I will look into these questions.  However, if someone already knows the
answer that would be appreciated.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  1:07     ` Mike Kravetz
@ 2020-01-15  1:24       ` Yang Shi
  2020-01-15  4:28         ` Mike Kravetz
  2020-01-15 21:07       ` Mike Kravetz
  1 sibling, 1 reply; 30+ messages in thread
From: Yang Shi @ 2020-01-15  1:24 UTC (permalink / raw)
  To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi



On 1/14/20 5:07 PM, Mike Kravetz wrote:
> On 1/14/20 6:09 AM, Li Xinhai wrote:
>> Add cc to
>> Yang Shi <yang.shi@linux.alibaba.com>
>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> , who has been worked on this part
>>
>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>> page:
>>>
>>> Notes
>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>
>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>> indicate, from test_walk, that walking this vma should be skipped even if
>>> there are misplaced pages.
>>>
>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> I do not necessarily disagree with the change.  However, this has made me
> question a couple things:
> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>     - Is that leftover from the the days when huge page migration was not
>       supported?
>     - Is it just because huge page migration is more likely to fail than
>       base page migration.
> 2) Does the mbind code function properly when unable to migrate a huge page
>     MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
>     EIO.

I don't know the answer about question #1 I didn't dig into the history. 
The queue_pages_hugetlb() returns 0 unconditionally, I think this is 
what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.

It would return -EIO for base pages or THP as what the manpage describes.

>
> I will look into these questions.  However, if someone already knows the
> answer that would be appreciated.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable
  2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
  2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
  2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
@ 2020-01-15  1:25 ` Andrew Morton
  2 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2020-01-15  1:25 UTC (permalink / raw)
  To: Li Xinhai; +Cc: linux-mm, Michal Hocko, Mike Kravetz

On Tue, 14 Jan 2020 09:16:17 +0000 Li Xinhai <lixinhai.lxh@gmail.com> wrote:

> Checking hstate at early phase when isolating page, instead of during
> unmap and move phase, to avoid useless isolation.
> 
> ...
>
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -185,10 +185,9 @@ static inline bool vma_migratable(struct vm_area_struct *vma)
>  	if (vma_is_dax(vma))
>  		return false;
>  
> -#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> -	if (vma->vm_flags & VM_HUGETLB)
> +	if (is_vm_hugetlb_page(vma) &&
> +		!hugepage_migration_supported(hstate_vma(vma)))
>  		return false;
> -#endif
>  
>  	/*
>  	 * Migration allocates pages in the highest zone. If we cannot

x86_64 allmodconfig:

In file included from ./include/linux/hugetlb.h:26:0,
                 from arch/x86/mm/fault.c:15:
./include/linux/mempolicy.h: In function vma_migratable:
./include/linux/mempolicy.h:190:4: error: implicit declaration of function hugepage_migration_supported; did you mean thp_migration_supported? [-Werror=implicit-function-declaration]
   !hugepage_migration_supported(hstate_vma(vma)))
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

the obvious fix didn't work.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  1:24       ` Yang Shi
@ 2020-01-15  4:28         ` Mike Kravetz
  2020-01-15  5:23           ` Yang Shi
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2020-01-15  4:28 UTC (permalink / raw)
  To: Yang Shi, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi

On 1/14/20 5:24 PM, Yang Shi wrote:
> 
> 
> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>> Add cc to
>>> Yang Shi <yang.shi@linux.alibaba.com>
>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> , who has been worked on this part
>>>
>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>> page:
>>>>
>>>> Notes
>>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>>
>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>> there are misplaced pages.
>>>>
>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> I do not necessarily disagree with the change.  However, this has made me
>> question a couple things:
>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>     - Is that leftover from the the days when huge page migration was not
>>       supported?
>>     - Is it just because huge page migration is more likely to fail than
>>       base page migration.
>> 2) Does the mbind code function properly when unable to migrate a huge page
>>     MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
>>     EIO.
> 
> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
> 
> It would return -EIO for base pages or THP as what the manpage describes.
> 

I was thinking about a migration failure after isolation.  This block of
code in do_mbind() after queue_pages_range() and mbind_range().

        if (!err) {
                int nr_failed = 0;

                if (!list_empty(&pagelist)) {
                        WARN_ON_ONCE(flags & MPOL_MF_LAZY);
                        nr_failed = migrate_pages(&pagelist, new_page, NULL,
                                start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
                        if (nr_failed)
                                putback_movable_pages(&pagelist);
                }

                if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
                        err = -EIO;

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  4:28         ` Mike Kravetz
@ 2020-01-15  5:23           ` Yang Shi
  2020-01-15  7:36             ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: Yang Shi @ 2020-01-15  5:23 UTC (permalink / raw)
  To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi



On 1/14/20 8:28 PM, Mike Kravetz wrote:
> On 1/14/20 5:24 PM, Yang Shi wrote:
>>
>> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>>> Add cc to
>>>> Yang Shi <yang.shi@linux.alibaba.com>
>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>> , who has been worked on this part
>>>>
>>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>>> page:
>>>>>
>>>>> Notes
>>>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>>>
>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>>> there are misplaced pages.
>>>>>
>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> I do not necessarily disagree with the change.  However, this has made me
>>> question a couple things:
>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>>      - Is that leftover from the the days when huge page migration was not
>>>        supported?
>>>      - Is it just because huge page migration is more likely to fail than
>>>        base page migration.
>>> 2) Does the mbind code function properly when unable to migrate a huge page
>>>      MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
>>>      EIO.
>> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
>>
>> It would return -EIO for base pages or THP as what the manpage describes.
>>
> I was thinking about a migration failure after isolation.  This block of
> code in do_mbind() after queue_pages_range() and mbind_range().
>
>          if (!err) {
>                  int nr_failed = 0;
>
>                  if (!list_empty(&pagelist)) {
>                          WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>                          nr_failed = migrate_pages(&pagelist, new_page, NULL,
>                                  start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>                          if (nr_failed)
>                                  putback_movable_pages(&pagelist);
>                  }
>
>                  if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
>                          err = -EIO;

Hmm.. I agree this part in man page does look ambiguous. We may assume 
"MPOL_MF_STRICT is ignored on huge page mappings." implies if 
MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should 
return -EIO if some pages could not be moved as what the man page describes.

I don't know what the intention was at the first place. We may have to 
dig into the history.

>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  5:23           ` Yang Shi
@ 2020-01-15  7:36             ` Li Xinhai
  2020-01-15 17:16               ` Yang Shi
  0 siblings, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-15  7:36 UTC (permalink / raw)
  To: yang.shi, Mike Kravetz, linux-mm; +Cc: akpm, mhocko, n-horiguchi

On 2020-01-15 at 13:23 Yang Shi wrote:
>
>
>On 1/14/20 8:28 PM, Mike Kravetz wrote:
>> On 1/14/20 5:24 PM, Yang Shi wrote:
>>>
>>> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>>>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>>>> Add cc to
>>>>> Yang Shi <yang.shi@linux.alibaba.com>
>>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>> , who has been worked on this part
>>>>>
>>>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>>>> page:
>>>>>>
>>>>>> Notes
>>>>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>>>>
>>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>>>> there are misplaced pages.
>>>>>>
>>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> I do not necessarily disagree with the change.  However, this has made me
>>>> question a couple things:
>>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>>>      - Is that leftover from the the days when huge page migration was not
>>>>        supported?
>>>>      - Is it just because huge page migration is more likely to fail than
>>>>        base page migration.
>>>> 2) Does the mbind code function properly when unable to migrate a huge page
>>>>      MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
>>>>      EIO. 
for question (2),
look into queue_pages_hugetlb(), the misplaced page would not
cause -EIO reported, for both STRICT set alone and STRICT set with MOVE*;
that means STRICT been effectively ignored during isolation phase.

In unmap and move phase, -EIO is reported if failed to move page and
STRICT is set.

>>> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
>>>
>>> It would return -EIO for base pages or THP as what the manpage describes.
>>>
>> I was thinking about a migration failure after isolation.  This block of
>> code in do_mbind() after queue_pages_range() and mbind_range().
>>
>>          if (!err) {
>>                  int nr_failed = 0;
>>
>>                  if (!list_empty(&pagelist)) {
>>                          WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>                          nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>                                  start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>                          if (nr_failed)
>>                                  putback_movable_pages(&pagelist);
>>                  }
>>
>>                  if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
>>                          err = -EIO;
>
>Hmm.. I agree this part in man page does look ambiguous. We may assume
>"MPOL_MF_STRICT is ignored on huge page mappings." implies if
>MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should
>return -EIO if some pages could not be moved as what the man page describes.
> 
It looks to me that current code has no feasible way to ignore STRICT
flag for hugetlb page when failure happen in  unmap&move phase,
because mbind is about to handle multiple vma(i.e., hugetlb vma mixed with
other vma) in one call.

>I don't know what the intention was at the first place. We may have to
>dig into the history.
>
>>
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  7:36             ` Li Xinhai
@ 2020-01-15 17:16               ` Yang Shi
  0 siblings, 0 replies; 30+ messages in thread
From: Yang Shi @ 2020-01-15 17:16 UTC (permalink / raw)
  To: Li Xinhai, Mike Kravetz, linux-mm; +Cc: akpm, mhocko, n-horiguchi



On 1/14/20 11:36 PM, Li Xinhai wrote:
> On 2020-01-15 at 13:23 Yang Shi wrote:
>>
>> On 1/14/20 8:28 PM, Mike Kravetz wrote:
>>> On 1/14/20 5:24 PM, Yang Shi wrote:
>>>> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>>>>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>>>>> Add cc to
>>>>>> Yang Shi <yang.shi@linux.alibaba.com>
>>>>>> Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>>> , who has been worked on this part
>>>>>>
>>>>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>>>>> page:
>>>>>>>
>>>>>>> Notes
>>>>>>> MPOL_MF_STRICT is ignored on huge page mappings.
>>>>>>>
>>>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>>>>> there are misplaced pages.
>>>>>>>
>>>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>>> I do not necessarily disagree with the change.  However, this has made me
>>>>> question a couple things:
>>>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>>>>        - Is that leftover from the the days when huge page migration was not
>>>>>          supported?
>>>>>        - Is it just because huge page migration is more likely to fail than
>>>>>          base page migration.
>>>>> 2) Does the mbind code function properly when unable to migrate a huge page
>>>>>        MPOL_MF_STRICT is set?  A quick look at the code looks like it returns
>>>>>        EIO.
> for question (2),
> look into queue_pages_hugetlb(), the misplaced page would not
> cause -EIO reported, for both STRICT set alone and STRICT set with MOVE*;
> that means STRICT been effectively ignored during isolation phase.
>
> In unmap and move phase, -EIO is reported if failed to move page and
> STRICT is set.
>
>>>> I don't know the answer about question #1 I didn't dig into the history. The queue_pages_hugetlb() returns 0 unconditionally, I think this is what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
>>>>
>>>> It would return -EIO for base pages or THP as what the manpage describes.
>>>>
>>> I was thinking about a migration failure after isolation.  This block of
>>> code in do_mbind() after queue_pages_range() and mbind_range().
>>>
>>>            if (!err) {
>>>                    int nr_failed = 0;
>>>
>>>                    if (!list_empty(&pagelist)) {
>>>                            WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>>                            nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>>                                    start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>>                            if (nr_failed)
>>>                                    putback_movable_pages(&pagelist);
>>>                    }
>>>
>>>                    if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
>>>                            err = -EIO;
>> Hmm.. I agree this part in man page does look ambiguous. We may assume
>> "MPOL_MF_STRICT is ignored on huge page mappings." implies if
>> MPOL_MF_STRICT is specified alone? If MOVE flag is specified it should
>> return -EIO if some pages could not be moved as what the man page describes.
>>
> It looks to me that current code has no feasible way to ignore STRICT
> flag for hugetlb page when failure happen in  unmap&move phase,
> because mbind is about to handle multiple vma(i.e., hugetlb vma mixed with
> other vma) in one call.

Yes, if you have both MPOL_MF_STRICT and MPOL_MF_MOVE_{ALL} specified, 
so I speculate the condition is MPOL_MF_STRICT is specified alone, but 
the man page doesn't elaborate this. This is also what the code does. 
But, I'm not sure if my understanding is correct or not.

>
>> I don't know what the intention was at the first place. We may have to
>> dig into the history.
>>
> >



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15  1:07     ` Mike Kravetz
  2020-01-15  1:24       ` Yang Shi
@ 2020-01-15 21:07       ` Mike Kravetz
  2020-01-15 21:30         ` Yang Shi
  2020-01-16  7:59         ` Michal Hocko
  1 sibling, 2 replies; 30+ messages in thread
From: Mike Kravetz @ 2020-01-15 21:07 UTC (permalink / raw)
  To: Li Xinhai, linux-mm; +Cc: akpm, mhocko, yang.shi, n-horiguchi

Naoya, would appreciate any comments you have as this seems to be related
to your changes to add huge page migration support.

On 1/14/20 5:07 PM, Mike Kravetz wrote:
> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>    - Is that leftover from the the days when huge page migration was not
>      supported?
>    - Is it just because huge page migration is more likely to fail than
>      base page migration.


According to man page, mbind was added to v2.6.7 kernel.  git history does
not go back that far, so I first looked at v2.6.12.

Definitions from include/linux/mempolicy.h
------------------------------------------
/* Policies */
#define MPOL_DEFAULT    0
#define MPOL_PREFERRED  1
#define MPOL_BIND       2
#define MPOL_INTERLEAVE 3

/* Flags for mbind */
#define MPOL_MF_STRICT  (1<<0)  /* Verify existing pages in the mapping */

Note the MPOL_MF_STRICT would simply verify current page placement.  At
this time, hugetlb pages were not part of this verification.

From v2.6.12 routine check_range()
...
	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
		if (!vma->vm_next && vma->vm_end < end)
			return ERR_PTR(-EFAULT);
		if (prev && prev->vm_end < vma->vm_start)
			return ERR_PTR(-EFAULT);
		if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
			err = verify_pages(vma->vm_mm,
					   vma->vm_start, vma->vm_end, nodes);
			if (err) {
				first = ERR_PTR(err);
				break;
			}
		}
		prev = vma;
	}
...

man page history does not go back this far.  The first mbind man page has
some things that are interesting:
1) It DOES NOT have the note saying MPOL_MF_STRICT is ignored on huge
   page mappings (even though the code does this).
2) Support for huge page policy was added with 2.6.16
3) For MPOL_MF_STRICT, in 2.6.16 or later the kernel will also try to
   move pages to the requested node with this flag.

Next look at v2.6.16 kernel
==========================
This kernel introduces the MPOL_MF_MOVE* flags
#define MPOL_MF_MOVE    (1<<1)  /* Move pages owned by this process to conform
				   to mapping */
#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */

Once again, the up front check_range() routine will skip hugetlb vma's.
Note that check_range() will also isolate pages.  So since hugetlb vma's
are skipped, no hugetlb pages will be isolated.  Since no hugetlb pages
are isolated, none will be on the list to be migrate.  Therefore, hugetlb
pages are effectively ignored for the new MPOL_MF_MOVE* flags.  This makes
sense as huge page migration support was not added until the v3.12 kernel.

Note that at when MPOL_MF_MOVE* flags were added to the mbind man page,
the statement "MPOL_MF_STRICT is ignored on huge page mappings right now."
was added.  This was later changed to "MPOL_MF_STRICT is ignored on huge
page mappings."

Next look at v3.12 kernel
=========================
The v3.12 code looks similiar to today's code.  In the verify/isolation
phase, v3.12 routine queue_pages_hugetlb_pmd_range() looks very similiar
to queue_pages_hugetlb().  Neither will cause errors at this point in the
process.  And, hugetlb pages with a mapcount == 1 will potentially be
isolated and added to the list of pages to be migrated.  In addition, if
the subsequent call to migrate_pages() fails to migrate ANY page, we
return -EIO if MPOL_MF_STRICT is set.  This is true even if the only page(s)
that failed to migrate were hugetlb pages.

It should also be noted that no mbind man page updates were made WRT
hugetlb pages after migration support was added.

Summary
=======
It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
mappings." is left over from the original mbind implementation.  When
the huge page migration support was added, I can not be sure if ignoring
MPOL_MF_STRICT for huge pages during the verify/isolation phase was
intentional.  It seems like it was as the return value from
isolate_huge_page() is ignored.

What should we do?
==================
1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
   seen as conflicting with man page has existed since v3.12 and I am
   not aware of any complaints.
2) In addition to optimizations by Li Xinhai, modify code to truly ignore
   MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
   after a failure of migrate_pages().  We could simply traverse the list
   of pages that were not migrated looking for any non-hugetlb page.
3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
   and modify code accordingly.

My suggestion would be for 1 or 2.  Thoughts?

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15 21:07       ` Mike Kravetz
@ 2020-01-15 21:30         ` Yang Shi
  2020-01-15 21:45           ` Mike Kravetz
  2020-01-16  7:59         ` Michal Hocko
  1 sibling, 1 reply; 30+ messages in thread
From: Yang Shi @ 2020-01-15 21:30 UTC (permalink / raw)
  To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi

[-- Attachment #1: Type: text/plain, Size: 5502 bytes --]



On 1/15/20 1:07 PM, Mike Kravetz wrote:
> Naoya, would appreciate any comments you have as this seems to be related
> to your changes to add huge page migration support.
>
> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>     - Is that leftover from the the days when huge page migration was not
>>       supported?
>>     - Is it just because huge page migration is more likely to fail than
>>       base page migration.
>
> According to man page, mbind was added to v2.6.7 kernel.  git history does
> not go back that far, so I first looked at v2.6.12.
>
> Definitions from include/linux/mempolicy.h
> ------------------------------------------
> /* Policies */
> #define MPOL_DEFAULT    0
> #define MPOL_PREFERRED  1
> #define MPOL_BIND       2
> #define MPOL_INTERLEAVE 3
>
> /* Flags for mbind */
> #define MPOL_MF_STRICT  (1<<0)  /* Verify existing pages in the mapping */
>
> Note the MPOL_MF_STRICT would simply verify current page placement.  At
> this time, hugetlb pages were not part of this verification.
>
>  From v2.6.12 routine check_range()
> ...
> 	for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
> 		if (!vma->vm_next && vma->vm_end < end)
> 			return ERR_PTR(-EFAULT);
> 		if (prev && prev->vm_end < vma->vm_start)
> 			return ERR_PTR(-EFAULT);
> 		if ((flags & MPOL_MF_STRICT) && !is_vm_hugetlb_page(vma)) {
> 			err = verify_pages(vma->vm_mm,
> 					   vma->vm_start, vma->vm_end, nodes);
> 			if (err) {
> 				first = ERR_PTR(err);
> 				break;
> 			}
> 		}
> 		prev = vma;
> 	}
> ...
>
> man page history does not go back this far.  The first mbind man page has
> some things that are interesting:
> 1) It DOES NOT have the note saying MPOL_MF_STRICT is ignored on huge
>     page mappings (even though the code does this).
> 2) Support for huge page policy was added with 2.6.16
> 3) For MPOL_MF_STRICT, in 2.6.16 or later the kernel will also try to
>     move pages to the requested node with this flag.
>
> Next look at v2.6.16 kernel
> ==========================
> This kernel introduces the MPOL_MF_MOVE* flags
> #define MPOL_MF_MOVE    (1<<1)  /* Move pages owned by this process to conform
> 				   to mapping */
> #define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
>
> Once again, the up front check_range() routine will skip hugetlb vma's.
> Note that check_range() will also isolate pages.  So since hugetlb vma's
> are skipped, no hugetlb pages will be isolated.  Since no hugetlb pages
> are isolated, none will be on the list to be migrate.  Therefore, hugetlb
> pages are effectively ignored for the new MPOL_MF_MOVE* flags.  This makes
> sense as huge page migration support was not added until the v3.12 kernel.
>
> Note that at when MPOL_MF_MOVE* flags were added to the mbind man page,
> the statement "MPOL_MF_STRICT is ignored on huge page mappings right now."
> was added.  This was later changed to "MPOL_MF_STRICT is ignored on huge
> page mappings."
>
> Next look at v3.12 kernel
> =========================
> The v3.12 code looks similiar to today's code.  In the verify/isolation
> phase, v3.12 routine queue_pages_hugetlb_pmd_range() looks very similiar
> to queue_pages_hugetlb().  Neither will cause errors at this point in the
> process.  And, hugetlb pages with a mapcount == 1 will potentially be
> isolated and added to the list of pages to be migrated.  In addition, if
> the subsequent call to migrate_pages() fails to migrate ANY page, we
> return -EIO if MPOL_MF_STRICT is set.  This is true even if the only page(s)
> that failed to migrate were hugetlb pages.
>
> It should also be noted that no mbind man page updates were made WRT
> hugetlb pages after migration support was added.
>
> Summary
> =======
> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
> mappings." is left over from the original mbind implementation.  When
> the huge page migration support was added, I can not be sure if ignoring
> MPOL_MF_STRICT for huge pages during the verify/isolation phase was
> intentional.  It seems like it was as the return value from
> isolate_huge_page() is ignored.

Thanks a lot for digging into the history.

>
> What should we do?
> ==================
> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>     seen as conflicting with man page has existed since v3.12 and I am
>     not aware of any complaints.
> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>     MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>     after a failure of migrate_pages().  We could simply traverse the list
>     of pages that were not migrated looking for any non-hugetlb page.

I don't think we can do this easily since migrate_pages() would put the 
migration failed hugetlb pages back to hugepage_activelist so there 
should not any hugetlb page reside on the pagelist regardless of failure 
if I read the code correctly.

Other than that traversing page list to look for a certain type page 
doesn't sound scalable to me.

> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>     and modify code accordingly.
>
> My suggestion would be for 1 or 2.  Thoughts?

By rethinking the history (thanks again for digging into it), it sounds 
#3 should be more reasonable. It sounds like the behavior was changed 
since hugetlb migration was added but the man page was not updated to 
reflect the change.



[-- Attachment #2: Type: text/html, Size: 6455 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15 21:30         ` Yang Shi
@ 2020-01-15 21:45           ` Mike Kravetz
  2020-01-15 21:59             ` Yang Shi
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2020-01-15 21:45 UTC (permalink / raw)
  To: Yang Shi, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi

On 1/15/20 1:30 PM, Yang Shi wrote:
> On 1/15/20 1:07 PM, Mike Kravetz wrote:
>> What should we do?
>> ==================
>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>>     seen as conflicting with man page has existed since v3.12 and I am
>>     not aware of any complaints.
>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>>     MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>>     after a failure of migrate_pages().  We could simply traverse the list
>>     of pages that were not migrated looking for any non-hugetlb page.
> 
> I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly.
> 

You are correct.  I made an assumption without actually looking at the code. :(

> Other than that traversing page list to look for a certain type page doesn't sound scalable to me.
> 
>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>>     and modify code accordingly.
>>
>> My suggestion would be for 1 or 2.  Thoughts?
> 
> By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change.
> 

Let's hope Naoya comments.  My only concern with #3 is that we will be changing
behavior.  I do not think many people (if any) depend on existing behavior.
However, you can never be sure.

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15 21:45           ` Mike Kravetz
@ 2020-01-15 21:59             ` Yang Shi
  2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 30+ messages in thread
From: Yang Shi @ 2020-01-15 21:59 UTC (permalink / raw)
  To: Mike Kravetz, Li Xinhai, linux-mm; +Cc: akpm, mhocko, n-horiguchi



On 1/15/20 1:45 PM, Mike Kravetz wrote:
> On 1/15/20 1:30 PM, Yang Shi wrote:
>> On 1/15/20 1:07 PM, Mike Kravetz wrote:
>>> What should we do?
>>> ==================
>>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>>>      seen as conflicting with man page has existed since v3.12 and I am
>>>      not aware of any complaints.
>>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>>>      MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>>>      after a failure of migrate_pages().  We could simply traverse the list
>>>      of pages that were not migrated looking for any non-hugetlb page.
>> I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly.
>>
> You are correct.  I made an assumption without actually looking at the code. :(
>
>> Other than that traversing page list to look for a certain type page doesn't sound scalable to me.
>>
>>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>>>      and modify code accordingly.
>>>
>>> My suggestion would be for 1 or 2.  Thoughts?
>> By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change.
>>
> Let's hope Naoya comments.  My only concern with #3 is that we will be changing
> behavior.  I do not think many people (if any) depend on existing behavior.
> However, you can never be sure.

Yes, this would change the bahavior, but I don't see why we have to 
treat hugetlb specially nowadays with migration supported.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15 21:07       ` Mike Kravetz
  2020-01-15 21:30         ` Yang Shi
@ 2020-01-16  7:59         ` Michal Hocko
  2020-01-16 19:22           ` Mike Kravetz
  1 sibling, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-01-16  7:59 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Li Xinhai, linux-mm, akpm, yang.shi, n-horiguchi

On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
[...]
> Summary
> =======
> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
> mappings." is left over from the original mbind implementation.  When
> the huge page migration support was added, I can not be sure if ignoring
> MPOL_MF_STRICT for huge pages during the verify/isolation phase was
> intentional.  It seems like it was as the return value from
> isolate_huge_page() is ignored.

THanks for the tedious work of studying the mess^Whistory.
> 
> What should we do?
> ==================
> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>    seen as conflicting with man page has existed since v3.12 and I am
>    not aware of any complaints.
> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>    after a failure of migrate_pages().  We could simply traverse the list
>    of pages that were not migrated looking for any non-hugetlb page.
> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>    and modify code accordingly.
> 
> My suggestion would be for 1 or 2.  Thoughts?

And why do we exactly need to do anything at all? There is an
inconsistency that has been there for years without anybody noticing.
NUMA API is a mess on its own and unfixable at this stage, there will
always be some corner cases. If there is no real workload hitting this
incosistency and suffering, I would rather not touch this at all.
Unless the change would clean up the code or make it more maintainable.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-15 21:59             ` Yang Shi
@ 2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
  2020-01-16 15:32                 ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-01-16  8:07 UTC (permalink / raw)
  To: Yang Shi, Mike Kravetz; +Cc: Li Xinhai, linux-mm, akpm, mhocko, n-horiguchi

Hi everyone,

Thank you all for finding and digging the issue.

> Summary
> =======
> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
> mappings." is left over from the original mbind implementation.  When
> the huge page migration support was added, I can not be sure if ignoring
> MPOL_MF_STRICT for huge pages during the verify/isolation phase was
> intentional.  It seems like it was as the return value from
> isolate_huge_page() is ignored.

This summary is totally correct.  I've simply missed considering MPOL_MF_STRICT
flag when implementing hugetlb migration.  As you pointed out, the discrepacy
between the manpage and the code is also due to the lack of updates on the
"MPOL_MF_STRICT is ignored on huge page mappings." statement.

On Wed, Jan 15, 2020 at 01:59:14PM -0800, Yang Shi wrote:
> 
> On 1/15/20 1:45 PM, Mike Kravetz wrote:
> > On 1/15/20 1:30 PM, Yang Shi wrote:
> > > On 1/15/20 1:07 PM, Mike Kravetz wrote:
> > > > What should we do?
> > > > ==================
> > > > 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
> > > >      seen as conflicting with man page has existed since v3.12 and I am
> > > >      not aware of any complaints.
> > > > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
> > > >      MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
> > > >      after a failure of migrate_pages().  We could simply traverse the list
> > > >      of pages that were not migrated looking for any non-hugetlb page.
> > > I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly.

Although this behavior seems to me not prevent from finding non-hugetlb
pages in migration list, this is a difference in migration behavior between
normal pages and hugepages that might be better to be optimized.
Maybe hugepages failed to migrate should remain in migration list after
migrate_pages() returns and the should be put back via putback_movable_pages().

> > > 
> > You are correct.  I made an assumption without actually looking at the code. :(
> > 
> > > Other than that traversing page list to look for a certain type page doesn't sound scalable to me.
> > > 
> > > > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
> > > >      and modify code accordingly.
> > > > 
> > > > My suggestion would be for 1 or 2.  Thoughts?
> > > By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change.
> > > 
> > Let's hope Naoya comments.  My only concern with #3 is that we will be changing
> > behavior.  I do not think many people (if any) depend on existing behavior.
> > However, you can never be sure.
> 
> Yes, this would change the bahavior, but I don't see why we have to treat
> hugetlb specially nowadays with migration supported.

(Option #1 is good for short term solution, but eventually) I agree with option #3.
We have no reason to handle hugetlb differently about MPOL_MF_STRICT flag.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
@ 2020-01-16 15:32                 ` Li Xinhai
  0 siblings, 0 replies; 30+ messages in thread
From: Li Xinhai @ 2020-01-16 15:32 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也),
	yang.shi, Mike Kravetz
  Cc: linux-mm, akpm, mhocko, n-horiguchi

On 2020-01-16 at 16:07 HORIGUCHI NAOYA(堀口 直也) wrote:
>Hi everyone,
>
>Thank you all for finding and digging the issue.
>
>> Summary
>> =======
>> It 'looks' like the statement "MPOL_MF_STRICT is ignored on huge page
>> mappings." is left over from the original mbind implementation.  When
>> the huge page migration support was added, I can not be sure if ignoring
>> MPOL_MF_STRICT for huge pages during the verify/isolation phase was
>> intentional.  It seems like it was as the return value from
>> isolate_huge_page() is ignored.
>
>This summary is totally correct.  I've simply missed considering MPOL_MF_STRICT
>flag when implementing hugetlb migration.  As you pointed out, the discrepacy
>between the manpage and the code is also due to the lack of updates on the
>"MPOL_MF_STRICT is ignored on huge page mappings." statement.
>
>On Wed, Jan 15, 2020 at 01:59:14PM -0800, Yang Shi wrote:
>>
>> On 1/15/20 1:45 PM, Mike Kravetz wrote:
>> > On 1/15/20 1:30 PM, Yang Shi wrote:
>> > > On 1/15/20 1:07 PM, Mike Kravetz wrote:
>> > > > What should we do?
>> > > > ==================
>> > > > 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>> > > >      seen as conflicting with man page has existed since v3.12 and I am
>> > > >      not aware of any complaints.
>> > > > 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>> > > >      MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>> > > >      after a failure of migrate_pages().  We could simply traverse the list
>> > > >      of pages that were not migrated looking for any non-hugetlb page.
>> > > I don't think we can do this easily since migrate_pages() would put the migration failed hugetlb pages back to hugepage_activelist so there should not any hugetlb page reside on the pagelist regardless of failure if I read the code correctly.
>
>Although this behavior seems to me not prevent from finding non-hugetlb
>pages in migration list, this is a difference in migration behavior between
>normal pages and hugepages that might be better to be optimized.
>Maybe hugepages failed to migrate should remain in migration list after
>migrate_pages() returns and the should be put back via putback_movable_pages().
>
>> > >
>> > You are correct.  I made an assumption without actually looking at the code. :(
>> >
>> > > Other than that traversing page list to look for a certain type page doesn't sound scalable to me.
>> > >
>> > > > 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>> > > >      and modify code accordingly.
>> > > >
>> > > > My suggestion would be for 1 or 2.  Thoughts?
>> > > By rethinking the history (thanks again for digging into it), it sounds #3 should be more reasonable. It sounds like the behavior was changed since hugetlb migration was added but the man page was not updated to reflect the change.
>> > >
>> > Let's hope Naoya comments.  My only concern with #3 is that we will be changing
>> > behavior.  I do not think many people (if any) depend on existing behavior.
>> > However, you can never be sure.
>>
>> Yes, this would change the bahavior, but I don't see why we have to treat
>> hugetlb specially nowadays with migration supported.
>
>(Option #1 is good for short term solution, but eventually) I agree with option #3.
>We have no reason to handle hugetlb differently about MPOL_MF_STRICT flag. 

Thanks. Same thoughts for option #3, but it seems better not change current
behavior. 
Add more about current behavior of code:
- In unmap&move phase, there is no different behavior of handling hugepage
and non-hugepage, that is when STRICT is set, report -EIO if any page
can't move, when STRICT is not set, don't report when failed to move;
- In isolation phase, STRICT is effective for non-hugepge, that means set
STRICT alone will cause -EIO if found misplaced pages, and set STRICT with
MOVE* will cause -EIO if failed to isolate pages; for hugepage, STRICT is
ignored, it don't detect misplaced pages nor report -EIO if isolation failed.

This patch don't change any part of current behavior, only avoids walking
page table, where currently do nothing if STRICT is set alone.

>
>Thanks,
>Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-16  7:59         ` Michal Hocko
@ 2020-01-16 19:22           ` Mike Kravetz
  2020-01-17  2:32             ` Yang Shi
  2020-01-17  2:38             ` Li Xinhai
  0 siblings, 2 replies; 30+ messages in thread
From: Mike Kravetz @ 2020-01-16 19:22 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Li Xinhai, linux-mm, akpm, yang.shi, n-horiguchi

On 1/15/20 11:59 PM, Michal Hocko wrote:
> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
>> What should we do?
>> ==================
>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>>    seen as conflicting with man page has existed since v3.12 and I am
>>    not aware of any complaints.
>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>>    after a failure of migrate_pages().  We could simply traverse the list
>>    of pages that were not migrated looking for any non-hugetlb page.
>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>>    and modify code accordingly.
>>
>> My suggestion would be for 1 or 2.  Thoughts?
> 
> And why do we exactly need to do anything at all? There is an
> inconsistency that has been there for years without anybody noticing.
> NUMA API is a mess on its own and unfixable at this stage, there will
> always be some corner cases. If there is no real workload hitting this
> incosistency and suffering, I would rather not touch this at all.
> Unless the change would clean up the code or make it more maintainable.

That is a very valid point.  Sometimes we as developers get focused on the
actual code changes and fail to ask the question "does this really need to
be changed?" or "what value do the code changes provide?".

Li Xinhai came up with two optimizations in how the mbind code deals with
hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
Unless I am mistaken, nobody has actually complained or noticed this behavior.
I believe Li Xinhai noticed this inefficient code via code inspection.  Of
course, based on what we know today one could write a test program to show
the inefficient behavior.  However, no real users have noticed this during
the past 6 years.

The proposed code changes are fairly simple.  However, I would not say that
they clean up the code or make it more maintainable.  They essentially add
or modify two checks to bail out early for hugetlb vma's if the flag which
is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
If one is trying to follow the entire mbind code path for hugetlb pages,
these patches will make that easier follow/understand.  That is simply
because one can ignore downstream code/functionality.

Based on Michal's criteria above, I now believe the code changes should not
be made.  Yes, they are fairly simple.  However, even simple changes have
the potential to break something (build breakage with v1 of patch).  We should
leave this code as is unless issues are reported by users.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-16 19:22           ` Mike Kravetz
@ 2020-01-17  2:32             ` Yang Shi
  2020-01-17  2:38             ` Li Xinhai
  1 sibling, 0 replies; 30+ messages in thread
From: Yang Shi @ 2020-01-17  2:32 UTC (permalink / raw)
  To: Mike Kravetz, Michal Hocko; +Cc: Li Xinhai, linux-mm, akpm, n-horiguchi



On 1/16/20 11:22 AM, Mike Kravetz wrote:
> On 1/15/20 11:59 PM, Michal Hocko wrote:
>> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
>>> What should we do?
>>> ==================
>>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>>>     seen as conflicting with man page has existed since v3.12 and I am
>>>     not aware of any complaints.
>>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>>>     MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>>>     after a failure of migrate_pages().  We could simply traverse the list
>>>     of pages that were not migrated looking for any non-hugetlb page.
>>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>>>     and modify code accordingly.
>>>
>>> My suggestion would be for 1 or 2.  Thoughts?
>> And why do we exactly need to do anything at all? There is an
>> inconsistency that has been there for years without anybody noticing.
>> NUMA API is a mess on its own and unfixable at this stage, there will
>> always be some corner cases. If there is no real workload hitting this
>> incosistency and suffering, I would rather not touch this at all.
>> Unless the change would clean up the code or make it more maintainable.
> That is a very valid point.  Sometimes we as developers get focused on the
> actual code changes and fail to ask the question "does this really need to
> be changed?" or "what value do the code changes provide?".
>
> Li Xinhai came up with two optimizations in how the mbind code deals with
> hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
> Unless I am mistaken, nobody has actually complained or noticed this behavior.
> I believe Li Xinhai noticed this inefficient code via code inspection.  Of
> course, based on what we know today one could write a test program to show
> the inefficient behavior.  However, no real users have noticed this during
> the past 6 years.
>
> The proposed code changes are fairly simple.  However, I would not say that
> they clean up the code or make it more maintainable.  They essentially add
> or modify two checks to bail out early for hugetlb vma's if the flag which
> is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
> If one is trying to follow the entire mbind code path for hugetlb pages,
> these patches will make that easier follow/understand.  That is simply
> because one can ignore downstream code/functionality.
>
> Based on Michal's criteria above, I now believe the code changes should not
> be made.  Yes, they are fairly simple.  However, even simple changes have
> the potential to break something (build breakage with v1 of patch).  We should
> leave this code as is unless issues are reported by users.

I tend to agree with you. And, according to what Horiguchi explained, 
the intention was not ignoring hugetlb mappings when hugetlb migration 
was added at the first place.  And, I'm supposed all of us agree hugetlb 
pages should be not treated specially although it is not a good timing 
or there is not strong motivation to fix it right now (we may correct 
the behavior in the future). The patch may convey the wrong information. 
And, the code path is definitely not a hot path, so I'm fine to drop it.

And, I'm wondering if we need add some comments in the code to explain 
the edge case just in case someone else repeat all the tedious history 
digging.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-16 19:22           ` Mike Kravetz
  2020-01-17  2:32             ` Yang Shi
@ 2020-01-17  2:38             ` Li Xinhai
  2020-01-17  7:57               ` Michal Hocko
  1 sibling, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-17  2:38 UTC (permalink / raw)
  To: Mike Kravetz, mhocko; +Cc: linux-mm, akpm, yang.shi, n-horiguchi

On 2020-01-17 at 03:22 Mike Kravetz wrote:
>On 1/15/20 11:59 PM, Michal Hocko wrote:
>> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
>>> What should we do?
>>> ==================
>>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>>>    seen as conflicting with man page has existed since v3.12 and I am
>>>    not aware of any complaints.
>>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>>>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>>>    after a failure of migrate_pages().  We could simply traverse the list
>>>    of pages that were not migrated looking for any non-hugetlb page.
>>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>>>    and modify code accordingly.
>>>
>>> My suggestion would be for 1 or 2.  Thoughts?
>>
>> And why do we exactly need to do anything at all? There is an
>> inconsistency that has been there for years without anybody noticing.
>> NUMA API is a mess on its own and unfixable at this stage, there will
>> always be some corner cases. If there is no real workload hitting this
>> incosistency and suffering, I would rather not touch this at all.
>> Unless the change would clean up the code or make it more maintainable.
>
>That is a very valid point.  Sometimes we as developers get focused on the
>actual code changes and fail to ask the question "does this really need to
>be changed?" or "what value do the code changes provide?".
>
>Li Xinhai came up with two optimizations in how the mbind code deals with
>hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
>Unless I am mistaken, nobody has actually complained or noticed this behavior.
>I believe Li Xinhai noticed this inefficient code via code inspection.  Of
>course, based on what we know today one could write a test program to show
>the inefficient behavior.  However, no real users have noticed this during
>the past 6 years.
>
>The proposed code changes are fairly simple.  However, I would not say that
>they clean up the code or make it more maintainable.  They essentially add
>or modify two checks to bail out early for hugetlb vma's if the flag which
>is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
>If one is trying to follow the entire mbind code path for hugetlb pages,
>these patches will make that easier follow/understand.  That is simply
>because one can ignore downstream code/functionality.
>
>Based on Michal's criteria above, I now believe the code changes should not
>be made.  Yes, they are fairly simple.  However, even simple changes have
>the potential to break something (build breakage with v1 of patch).  We should
>leave this code as is unless issues are reported by users. 

Acctually I am the user of this API, and when using STRICT alone to know if
pages are misplaced and take action later, it seems don't work consistantly
on hugepage. Initially, I assumed that I didn't use it correctly, that flag must
use with MOVE*? After check the code, found that flag didn't been handled,
and finally noticed that there is a NOTE about it in MAN page.

I'd like the STRICT been handled, but at present since this is not going to
be implemented, I have to handle differently on hugetlb mapping.

At meantime, I thought that this patch can be useful and posted it, because
for scenarios where hugetlb mapping are handled with other mappings, less
cost for the mbind call. (although it didn't help my current case)

I think if we could provid better performance, why not do that only because
that code has exists for 6 years?

>--
>Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-17  2:38             ` Li Xinhai
@ 2020-01-17  7:57               ` Michal Hocko
  2020-01-17 12:05                 ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-01-17  7:57 UTC (permalink / raw)
  To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On Fri 17-01-20 10:38:21, Li Xinhai wrote:
> On 2020-01-17 at 03:22 Mike Kravetz wrote:
> >On 1/15/20 11:59 PM, Michal Hocko wrote:
> >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
> >>> What should we do?
> >>> ==================
> >>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
> >>>    seen as conflicting with man page has existed since v3.12 and I am
> >>>    not aware of any complaints.
> >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
> >>>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
> >>>    after a failure of migrate_pages().  We could simply traverse the list
> >>>    of pages that were not migrated looking for any non-hugetlb page.
> >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
> >>>    and modify code accordingly.
> >>>
> >>> My suggestion would be for 1 or 2.  Thoughts?
> >>
> >> And why do we exactly need to do anything at all? There is an
> >> inconsistency that has been there for years without anybody noticing.
> >> NUMA API is a mess on its own and unfixable at this stage, there will
> >> always be some corner cases. If there is no real workload hitting this
> >> incosistency and suffering, I would rather not touch this at all.
> >> Unless the change would clean up the code or make it more maintainable.
> >
> >That is a very valid point.  Sometimes we as developers get focused on the
> >actual code changes and fail to ask the question "does this really need to
> >be changed?" or "what value do the code changes provide?".
> >
> >Li Xinhai came up with two optimizations in how the mbind code deals with
> >hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
> >Unless I am mistaken, nobody has actually complained or noticed this behavior.
> >I believe Li Xinhai noticed this inefficient code via code inspection.  Of
> >course, based on what we know today one could write a test program to show
> >the inefficient behavior.  However, no real users have noticed this during
> >the past 6 years.
> >
> >The proposed code changes are fairly simple.  However, I would not say that
> >they clean up the code or make it more maintainable.  They essentially add
> >or modify two checks to bail out early for hugetlb vma's if the flag which
> >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
> >If one is trying to follow the entire mbind code path for hugetlb pages,
> >these patches will make that easier follow/understand.  That is simply
> >because one can ignore downstream code/functionality.
> >
> >Based on Michal's criteria above, I now believe the code changes should not
> >be made.  Yes, they are fairly simple.  However, even simple changes have
> >the potential to break something (build breakage with v1 of patch).  We should
> >leave this code as is unless issues are reported by users. 
> 
> Acctually I am the user of this API, and when using STRICT alone to know if
> pages are misplaced and take action later, it seems don't work consistantly
> on hugepage. Initially, I assumed that I didn't use it correctly, that flag must
> use with MOVE*? After check the code, found that flag didn't been handled,
> and finally noticed that there is a NOTE about it in MAN page.

This is the first time you are mentioning an actual use case. This
should have been expressed from the very begining. Including an
explanation of what the use case is and how it is affected by the
current behavior.

> I'd like the STRICT been handled, but at present since this is not going to
> be implemented, I have to handle differently on hugetlb mapping.
> 
> At meantime, I thought that this patch can be useful and posted it, because
> for scenarios where hugetlb mapping are handled with other mappings, less
> cost for the mbind call. (although it didn't help my current case)
> 
> I think if we could provid better performance, why not do that only because
> that code has exists for 6 years?

Do you have any numbers to prove performance improvements? I believe
arguments against the patch have been already presented.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-17  7:57               ` Michal Hocko
@ 2020-01-17 12:05                 ` Li Xinhai
  2020-01-17 15:20                   ` Michal Hocko
  0 siblings, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-17 12:05 UTC (permalink / raw)
  To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On 2020-01-17 at 15:57 Michal Hocko wrote:
>On Fri 17-01-20 10:38:21, Li Xinhai wrote:
>> On 2020-01-17 at 03:22 Mike Kravetz wrote:
>> >On 1/15/20 11:59 PM, Michal Hocko wrote:
>> >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
>> >>> What should we do?
>> >>> ==================
>> >>> 1) Nothing more than optimizations by Li Xinhai.  Behavior that could be
>> >>>    seen as conflicting with man page has existed since v3.12 and I am
>> >>>    not aware of any complaints.
>> >>> 2) In addition to optimizations by Li Xinhai, modify code to truly ignore
>> >>>    MPOL_MF_STRICT for huge page mappings.  This would be fairly easy to do
>> >>>    after a failure of migrate_pages().  We could simply traverse the list
>> >>>    of pages that were not migrated looking for any non-hugetlb page.
>> >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
>> >>>    and modify code accordingly.
>> >>>
>> >>> My suggestion would be for 1 or 2.  Thoughts?
>> >>
>> >> And why do we exactly need to do anything at all? There is an
>> >> inconsistency that has been there for years without anybody noticing.
>> >> NUMA API is a mess on its own and unfixable at this stage, there will
>> >> always be some corner cases. If there is no real workload hitting this
>> >> incosistency and suffering, I would rather not touch this at all.
>> >> Unless the change would clean up the code or make it more maintainable.
>> >
>> >That is a very valid point.  Sometimes we as developers get focused on the
>> >actual code changes and fail to ask the question "does this really need to
>> >be changed?" or "what value do the code changes provide?".
>> >
>> >Li Xinhai came up with two optimizations in how the mbind code deals with
>> >hugetlb pages.  This 'sub-optimal' code has existed for more than 6 years.
>> >Unless I am mistaken, nobody has actually complained or noticed this behavior.
>> >I believe Li Xinhai noticed this inefficient code via code inspection.  Of
>> >course, based on what we know today one could write a test program to show
>> >the inefficient behavior.  However, no real users have noticed this during
>> >the past 6 years.
>> >
>> >The proposed code changes are fairly simple.  However, I would not say that
>> >they clean up the code or make it more maintainable.  They essentially add
>> >or modify two checks to bail out early for hugetlb vma's if the flag which
>> >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
>> >If one is trying to follow the entire mbind code path for hugetlb pages,
>> >these patches will make that easier follow/understand.  That is simply
>> >because one can ignore downstream code/functionality.
>> >
>> >Based on Michal's criteria above, I now believe the code changes should not
>> >be made.  Yes, they are fairly simple.  However, even simple changes have
>> >the potential to break something (build breakage with v1 of patch).  We should
>> >leave this code as is unless issues are reported by users.
>>
>> Acctually I am the user of this API, and when using STRICT alone to know if
>> pages are misplaced and take action later, it seems don't work consistantly
>> on hugepage. Initially, I assumed that I didn't use it correctly, that flag must
>> use with MOVE*? After check the code, found that flag didn't been handled,
>> and finally noticed that there is a NOTE about it in MAN page.
>
>This is the first time you are mentioning an actual use case. This
>should have been expressed from the very begining. Including an
>explanation of what the use case is and how it is affected by the
>current behavior.
> 
OK, that is better practice, thanks.

>> I'd like the STRICT been handled, but at present since this is not going to
>> be implemented, I have to handle differently on hugetlb mapping.
>>
>> At meantime, I thought that this patch can be useful and posted it, because
>> for scenarios where hugetlb mapping are handled with other mappings, less
>> cost for the mbind call. (although it didn't help my current case)
>>
>> I think if we could provid better performance, why not do that only because
>> that code has exists for 6 years?
>
>Do you have any numbers to prove performance improvements? I believe
>arguments against the patch have been already presented. 

I didn't test to know difference with this patch since it doesn't help for
my case. If walking the page table and checking the present PTE is
basically no cost according to existing experience, then it is not needed.

Now, for my case, I am doing workaround by call mbind then check the
numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
a simple change, and I am glad to add this function if no objection.

>--
>Michal Hocko
>SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-17 12:05                 ` Li Xinhai
@ 2020-01-17 15:20                   ` Michal Hocko
  2020-01-17 15:46                     ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-01-17 15:20 UTC (permalink / raw)
  To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On Fri 17-01-20 20:05:23, Li Xinhai wrote:
[...]
> Now, for my case, I am doing workaround

workaround for what?

> by call mbind then check the
> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
> a simple change, and I am glad to add this function if no objection.

Any reason you cannot use move_pages syscall?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-17 15:20                   ` Michal Hocko
@ 2020-01-17 15:46                     ` Li Xinhai
  2020-01-20 12:45                       ` Michal Hocko
  0 siblings, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-17 15:46 UTC (permalink / raw)
  To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On 2020-01-17 at 23:20 Michal Hocko wrote:
>On Fri 17-01-20 20:05:23, Li Xinhai wrote:
>[...]
>> Now, for my case, I am doing workaround
>
>workaround for what?
> 
I mean now call mbind without MOVE* or STRICT, only apply policy,
because my purpose is to take action later on misplaced pages.
If STRICT give correct feedback, I don't need to check numa_maps.
For above description, I don't use MOVE*, as want to take action later.

>> by call mbind then check the
>> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
>> a simple change, and I am glad to add this function if no objection.
>
>Any reason you cannot use move_pages syscall? 

I use it, that is the 'take action later' mentioned above, I'd like to
know if there are misplace pages before call it if mbind can help
to give answer.

>--
>Michal Hocko
>SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-17 15:46                     ` Li Xinhai
@ 2020-01-20 12:45                       ` Michal Hocko
  2020-01-21 14:15                         ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-01-20 12:45 UTC (permalink / raw)
  To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On Fri 17-01-20 23:46:03, Li Xinhai wrote:
[...]
> >> by call mbind then check the
> >> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
> >> a simple change, and I am glad to add this function if no objection.
> >
> >Any reason you cannot use move_pages syscall? 
> 
> I use it, that is the 'take action later' mentioned above, I'd like to
> know if there are misplace pages before call it if mbind can help
> to give answer.

I am sorry but I do not follow? Why do you need to do two steps? Why
don't you simply call move_pages right away?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-20 12:45                       ` Michal Hocko
@ 2020-01-21 14:15                         ` Li Xinhai
  2020-01-21 14:53                           ` Michal Hocko
  0 siblings, 1 reply; 30+ messages in thread
From: Li Xinhai @ 2020-01-21 14:15 UTC (permalink / raw)
  To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On 2020-01-20 at 20:45 Michal Hocko wrote:
>On Fri 17-01-20 23:46:03, Li Xinhai wrote:
>[...]
>> >> by call mbind then check the
>> >> numa_maps, instead of just call mbind. So, I'd like option #3 is there. That is
>> >> a simple change, and I am glad to add this function if no objection.
>> >
>> >Any reason you cannot use move_pages syscall?
>>
>> I use it, that is the 'take action later' mentioned above, I'd like to
>> know if there are misplace pages before call it if mbind can help
>> to give answer.
>
>I am sorry but I do not follow? Why do you need to do two steps? Why
>don't you simply call move_pages right away? 
I am trying to decrease the number of call to move_pages(). For
call move_pages(), we need prepare parameters in array for 'pages',
'nodes' (i.e., it do not support moving pages as specified through start addr
and length), which could be cost.

One way for handling my case is 
- mbind: set policy without detecting if any pages misplaced nor moving 
  pages.
(continue below part when timing is right)
- move_pages:  call on pages which are within range of above mbind.

The other way is:
- mbind: set policy and detects if any pages are misplaced.
(continue below part when timing is right)
- move_pages:  call on pages which are within range has misplaced pages.

Knowing about misplaced pages from mbind will give me flexibility for
choosing different solutions.

>
>--
>Michal Hocko
>SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-21 14:15                         ` Li Xinhai
@ 2020-01-21 14:53                           ` Michal Hocko
  2020-01-22 13:55                             ` Li Xinhai
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-01-21 14:53 UTC (permalink / raw)
  To: Li Xinhai; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On Tue 21-01-20 22:15:43, Li Xinhai wrote:
[...]
> Knowing about misplaced pages from mbind will give me flexibility for
> choosing different solutions.

Would /proc/pid/numa_maps help to give you rough idea which mappings
of interest have misplaced pages and move_pages for those? It is still
allocating arrays but is that really a big deal?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
  2020-01-21 14:53                           ` Michal Hocko
@ 2020-01-22 13:55                             ` Li Xinhai
  0 siblings, 0 replies; 30+ messages in thread
From: Li Xinhai @ 2020-01-22 13:55 UTC (permalink / raw)
  To: mhocko; +Cc: Mike Kravetz, linux-mm, akpm, yang.shi, n-horiguchi

On 2020-01-21 at 22:53 Michal Hocko wrote:
>On Tue 21-01-20 22:15:43, Li Xinhai wrote:
>[...]
>> Knowing about misplaced pages from mbind will give me flexibility for
>> choosing different solutions.
>
>Would /proc/pid/numa_maps help to give you rough idea which mappings
>of interest have misplaced pages and move_pages for those? It is still
>allocating arrays but is that really a big deal? 

numa_maps can be checked about misplaced pages, and I mentioned it in
previous mail for this purpose.
For my case, using move_pages() is necessary for later action. The only difference
is about mbind() could give feedback of misplaced page on impacted range,
numa_mpas requres me iterate over maps (that will collect states on those
maps although don't been used) to reach impacted range and check
it. 
I may assume there is no much difference on overall performance, it is more
about coding effort in user sapce application.

I've being try to apply patch for mbind(i.e., apply STRICT on hugetlb mapping),
and may sharing it after verification and let's see if that is usable, thanks.

>--
>Michal Hocko
>SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, back to index

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14  9:16 [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Li Xinhai
2020-01-14  9:16 ` [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Li Xinhai
2020-01-14 14:09   ` Li Xinhai
2020-01-14 18:27     ` Yang Shi
2020-01-15  1:07     ` Mike Kravetz
2020-01-15  1:24       ` Yang Shi
2020-01-15  4:28         ` Mike Kravetz
2020-01-15  5:23           ` Yang Shi
2020-01-15  7:36             ` Li Xinhai
2020-01-15 17:16               ` Yang Shi
2020-01-15 21:07       ` Mike Kravetz
2020-01-15 21:30         ` Yang Shi
2020-01-15 21:45           ` Mike Kravetz
2020-01-15 21:59             ` Yang Shi
2020-01-16  8:07               ` HORIGUCHI NAOYA(堀口 直也)
2020-01-16 15:32                 ` Li Xinhai
2020-01-16  7:59         ` Michal Hocko
2020-01-16 19:22           ` Mike Kravetz
2020-01-17  2:32             ` Yang Shi
2020-01-17  2:38             ` Li Xinhai
2020-01-17  7:57               ` Michal Hocko
2020-01-17 12:05                 ` Li Xinhai
2020-01-17 15:20                   ` Michal Hocko
2020-01-17 15:46                     ` Li Xinhai
2020-01-20 12:45                       ` Michal Hocko
2020-01-21 14:15                         ` Li Xinhai
2020-01-21 14:53                           ` Michal Hocko
2020-01-22 13:55                             ` Li Xinhai
2020-01-14 19:12 ` [PATCH 1/2] mm/mempolicy: Checking hstate for hugetlbfs page in vma_migratable Mike Kravetz
2020-01-15  1:25 ` Andrew Morton

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git