linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
@ 2019-03-19 18:35 Yang Shi
  2019-03-20  0:49 ` David Rientjes
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Yang Shi @ 2019-03-19 18:35 UTC
  To: chrubis, vbabka, kirill, osalvador, akpm
  Cc: yang.shi, stable, linux-mm, linux-kernel

When MPOL_MF_STRICT was specified and an existing page was already
on a node that does not follow the policy, mbind() should return -EIO.
But commit 6f4576e3687b ("mempolicy: apply page table walker on
queue_pages_range()") broke the rule.

Commit c8633798497c ("mm: mempolicy: mbind and migrate_pages
support thp migration") also did not return the correct value for
THP mbind().

If MPOL_MF_STRICT is set, ignore vma_migratable() so that the walk
reaches queue_pages_pte_range() or queue_pages_pmd(), which check
whether an existing page is already on a node that does not follow the
policy.  Since a non-migratable vma may now be walked, also return -EIO
if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and a misplaced page
in such a vma cannot be migrated.

Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c

Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/mempolicy.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index abe7a67..401c817 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -447,6 +447,13 @@ static inline bool queue_pages_required(struct page *page,
 	return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT);
 }
 
+/*
+ * The queue_pages_pmd() may have three kind of return value.
+ * 1 - pages are placed on he right node or queued successfully.
+ * 0 - THP get split.
+ * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
+ *        page was already on a node that does not follow the policy.
+ */
 static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -456,7 +463,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	unsigned long flags;
 
 	if (unlikely(is_pmd_migration_entry(*pmd))) {
-		ret = 1;
+		ret = -EIO;
 		goto unlock;
 	}
 	page = pmd_page(*pmd);
@@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	ret = 1;
 	flags = qp->flags;
 	/* go to thp migration */
-	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		if (!vma_migratable(walk->vma)) {
+			ret = -EIO;
+			goto unlock;
+		}
+
 		migrate_page_add(page, qp->pagelist, flags);
+	} else
+		ret = -EIO;
 unlock:
 	spin_unlock(ptl);
 out:
@@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
 		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
-		if (ret)
+		if (ret > 0)
 			return 0;
+		else if (ret < 0)
+			return ret;
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		if (!queue_pages_required(page, qp))
 			continue;
-		migrate_page_add(page, qp->pagelist, flags);
+		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+			if (!vma_migratable(vma))
+				break;
+			migrate_page_add(page, qp->pagelist, flags);
+		} else
+			break;
 	}
 	pte_unmap_unlock(pte - 1, ptl);
 	cond_resched();
-	return 0;
+	return addr != end ? -EIO : 0;
 }
 
 static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
@@ -595,7 +616,12 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	unsigned long endvma = vma->vm_end;
 	unsigned long flags = qp->flags;
 
-	if (!vma_migratable(vma))
+	/*
+	 * Need check MPOL_MF_STRICT to return -EIO if possible
+	 * regardless of vma_migratable
+	 */ 
+	if (!vma_migratable(vma) &&
+	    !(flags & MPOL_MF_STRICT))
 		return 1;
 
 	if (endvma > end)
@@ -622,7 +648,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	}
 
 	/* queue pages from current vma */
-	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+	if (flags & MPOL_MF_VALID)
 		return 0;
 	return 1;
 }
-- 
1.8.3.1
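
The effect of the last hunk on queue_pages_test_walk() can be sketched
with a small userspace model. It is not kernel code: the M_* constants
are hypothetical stand-ins for MPOL_MF_STRICT, MPOL_MF_MOVE and
MPOL_MF_MOVE_ALL, and the MPOL_MF_LAZY and MPOL_MF_DISCONTIG_OK
branches of the real function are assumed away. Each function answers
whether the page-table walker descends into a VMA, i.e. whether
queue_pages_test_walk() would return 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPOL_MF_STRICT, MPOL_MF_MOVE, MPOL_MF_MOVE_ALL. */
#define M_STRICT   (1u << 0)
#define M_MOVE     (1u << 1)
#define M_MOVE_ALL (1u << 2)
#define M_VALID    (M_STRICT | M_MOVE | M_MOVE_ALL)

/* Pre-patch decision: only MOVE flags on a migratable VMA are walked. */
static bool walks_vma_before(unsigned int flags, bool vma_migratable)
{
	assert((flags & ~M_VALID) == 0);
	if (!vma_migratable)
		return false;
	/* MPOL_MF_STRICT alone never reaches the PTE/PMD checks. */
	return (flags & (M_MOVE | M_MOVE_ALL)) != 0;
}

/* Post-patch decision: MPOL_MF_STRICT forces the walk. */
static bool walks_vma_after(unsigned int flags, bool vma_migratable)
{
	assert((flags & ~M_VALID) == 0);
	if (!vma_migratable && !(flags & M_STRICT))
		return false;
	return (flags & M_VALID) != 0;	/* "if (flags & MPOL_MF_VALID)" */
}
```

With MPOL_MF_STRICT alone, the pre-patch logic never walks the page
tables, so in this model a misplaced page cannot be detected and mbind()
returns 0 instead of -EIO, which is the regression the commit message
attributes to 6f4576e3687b.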



* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-19 18:35 [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Yang Shi
@ 2019-03-20  0:49 ` David Rientjes
  2019-03-20  1:06   ` Yang Shi
  2019-03-20  5:53 ` Souptick Joarder
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2019-03-20  0:49 UTC
  To: Yang Shi
  Cc: chrubis, vbabka, kirill, osalvador, akpm, stable, linux-mm, linux-kernel

On Wed, 20 Mar 2019, Yang Shi wrote:

> When MPOL_MF_STRICT was specified and an existing page was already
> on a node that does not follow the policy, mbind() should return -EIO.
> But commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") broke the rule.
> 
> And, commit c8633798497c ("mm: mempolicy: mbind and migrate_pages
> support thp migration") didn't return the correct value for THP mbind()
> too.
> 
> If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches
> queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing
> page was already on a node that does not follow the policy.  And,
> non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
> MPOL_MF_MOVE_ALL was specified.
> 
> Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
> 
> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
> Reported-by: Cyril Hrubis <chrubis@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: stable@vger.kernel.org
> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: David Rientjes <rientjes@google.com>

Thanks.  I think this needs stable for 4.0+, can you confirm?


* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-20  0:49 ` David Rientjes
@ 2019-03-20  1:06   ` Yang Shi
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-03-20  1:06 UTC
  To: David Rientjes
  Cc: chrubis, vbabka, kirill, osalvador, akpm, stable, linux-mm, linux-kernel



On 3/19/19 5:49 PM, David Rientjes wrote:
> On Wed, 20 Mar 2019, Yang Shi wrote:
>
>> When MPOL_MF_STRICT was specified and an existing page was already
>> on a node that does not follow the policy, mbind() should return -EIO.
>> But commit 6f4576e3687b ("mempolicy: apply page table walker on
>> queue_pages_range()") broke the rule.
>>
>> And, commit c8633798497c ("mm: mempolicy: mbind and migrate_pages
>> support thp migration") didn't return the correct value for THP mbind()
>> too.
>>
>> If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches
>> queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing
>> page was already on a node that does not follow the policy.  And,
>> non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
>> MPOL_MF_MOVE_ALL was specified.
>>
>> Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
>>
>> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
>> Reported-by: Cyril Hrubis <chrubis@suse.cz>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: stable@vger.kernel.org
>> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: David Rientjes <rientjes@google.com>
>
> Thanks.  I think this needs stable for 4.0+, can you confirm?

Thanks. Yes, this needs stable for 4.0+.

Yang




* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-19 18:35 [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Yang Shi
  2019-03-20  0:49 ` David Rientjes
@ 2019-03-20  5:53 ` Souptick Joarder
  2019-03-20 22:16   ` Andrew Morton
  2019-03-20  8:16 ` Oscar Salvador
  2019-03-20 15:44 ` Rafael Aquini
  3 siblings, 1 reply; 10+ messages in thread
From: Souptick Joarder @ 2019-03-20  5:53 UTC
  To: Yang Shi
  Cc: chrubis, Vlastimil Babka, kirill, osalvador, Andrew Morton,
	stable, Linux-MM, linux-kernel

On Wed, Mar 20, 2019 at 12:06 AM Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
> When MPOL_MF_STRICT was specified and an existing page was already
> on a node that does not follow the policy, mbind() should return -EIO.
> But commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") broke the rule.
>
> And, commit c8633798497c ("mm: mempolicy: mbind and migrate_pages
> support thp migration") didn't return the correct value for THP mbind()
> too.
>
> If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches
> queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing
> page was already on a node that does not follow the policy.  And,
> non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
> MPOL_MF_MOVE_ALL was specified.
>
> Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
>
> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
> Reported-by: Cyril Hrubis <chrubis@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: stable@vger.kernel.org
> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/mempolicy.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index abe7a67..401c817 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -447,6 +447,13 @@ static inline bool queue_pages_required(struct page *page,
>         return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT);
>  }
>
> +/*
> + * The queue_pages_pmd() may have three kind of return value.
> + * 1 - pages are placed on he right node or queued successfully.

Minor typo -> s/he/the ?

> + * 0 - THP get split.
> + * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
> + *        page was already on a node that does not follow the policy.
> + */
>  static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>                                 unsigned long end, struct mm_walk *walk)
>  {
> @@ -456,7 +463,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>         unsigned long flags;
>
>         if (unlikely(is_pmd_migration_entry(*pmd))) {
> -               ret = 1;
> +               ret = -EIO;
>                 goto unlock;
>         }
>         page = pmd_page(*pmd);
> @@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>         ret = 1;
>         flags = qp->flags;
>         /* go to thp migration */
> -       if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> +       if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +               if (!vma_migratable(walk->vma)) {
> +                       ret = -EIO;
> +                       goto unlock;
> +               }
> +
>                 migrate_page_add(page, qp->pagelist, flags);
> +       } else
> +               ret = -EIO;
>  unlock:
>         spin_unlock(ptl);
>  out:
> @@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>         ptl = pmd_trans_huge_lock(pmd, vma);
>         if (ptl) {
>                 ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
> -               if (ret)
> +               if (ret > 0)
>                         return 0;
> +               else if (ret < 0)
> +                       return ret;
>         }
>
>         if (pmd_trans_unstable(pmd))
> @@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>                         continue;
>                 if (!queue_pages_required(page, qp))
>                         continue;
> -               migrate_page_add(page, qp->pagelist, flags);
> +               if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +                       if (!vma_migratable(vma))
> +                               break;
> +                       migrate_page_add(page, qp->pagelist, flags);
> +               } else
> +                       break;
>         }
>         pte_unmap_unlock(pte - 1, ptl);
>         cond_resched();
> -       return 0;
> +       return addr != end ? -EIO : 0;
>  }
>
>  static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
> @@ -595,7 +616,12 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>         unsigned long endvma = vma->vm_end;
>         unsigned long flags = qp->flags;
>
> -       if (!vma_migratable(vma))
> +       /*
> +        * Need check MPOL_MF_STRICT to return -EIO if possible
> +        * regardless of vma_migratable
> +        */
> +       if (!vma_migratable(vma) &&
> +           !(flags & MPOL_MF_STRICT))
>                 return 1;
>
>         if (endvma > end)
> @@ -622,7 +648,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>         }
>
>         /* queue pages from current vma */
> -       if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> +       if (flags & MPOL_MF_VALID)
>                 return 0;
>         return 1;
>  }
> --
> 1.8.3.1
>


* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-19 18:35 [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Yang Shi
  2019-03-20  0:49 ` David Rientjes
  2019-03-20  5:53 ` Souptick Joarder
@ 2019-03-20  8:16 ` Oscar Salvador
  2019-03-20 18:31   ` Yang Shi
  2019-03-20 15:44 ` Rafael Aquini
  3 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2019-03-20  8:16 UTC
  To: Yang Shi; +Cc: chrubis, vbabka, kirill, akpm, stable, linux-mm, linux-kernel

On Wed, Mar 20, 2019 at 02:35:56AM +0800, Yang Shi wrote:
> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
> Reported-by: Cyril Hrubis <chrubis@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: stable@vger.kernel.org
> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Hi Yang, thanks for the patch.

Some observations below.

>  	}
>  	page = pmd_page(*pmd);
> @@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>  	ret = 1;
>  	flags = qp->flags;
>  	/* go to thp migration */
> -	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> +	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +		if (!vma_migratable(walk->vma)) {
> +			ret = -EIO;
> +			goto unlock;
> +		}
> +
>  		migrate_page_add(page, qp->pagelist, flags);
> +	} else
> +		ret = -EIO;

	if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
	    !vma_migratable(walk->vma)) {
		ret = -EIO;
		goto unlock;
	}

	migrate_page_add(page, qp->pagelist, flags);
unlock:
	spin_unlock(ptl);
out:
	return ret;

seems cleaner to me?
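
The flattened form above should behave exactly like the nested one in
the patch. A small userspace model can check every flag combination; it
is a sketch only, with hypothetical names, locking dropped, and
migrate_page_add() reduced to setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPOL_MF_MOVE and MPOL_MF_MOVE_ALL. */
#define M_MOVE     (1u << 1)
#define M_MOVE_ALL (1u << 2)

/* Tail of queue_pages_pmd() as posted (nested form); locking omitted. */
static int tail_nested(unsigned int flags, bool migratable, bool *queued)
{
	int ret = 1;

	assert(queued != NULL);
	*queued = false;
	if (flags & (M_MOVE | M_MOVE_ALL)) {
		if (!migratable) {
			ret = -EIO;
			goto unlock;
		}
		*queued = true;		/* migrate_page_add() */
	} else
		ret = -EIO;
unlock:
	return ret;
}

/* Same tail in the flattened form suggested in review. */
static int tail_flat(unsigned int flags, bool migratable, bool *queued)
{
	int ret = 1;

	assert(queued != NULL);
	*queued = false;
	if (!(flags & (M_MOVE | M_MOVE_ALL)) || !migratable) {
		ret = -EIO;
		goto unlock;
	}
	*queued = true;			/* migrate_page_add() */
unlock:
	return ret;
}

/* Compare both forms over all 3-bit flag values and both VMA states. */
static bool tails_equivalent(void)
{
	for (unsigned int flags = 0; flags < 8; flags++) {
		for (int m = 0; m <= 1; m++) {
			bool q_nested, q_flat;
			int r_nested = tail_nested(flags, m, &q_nested);
			int r_flat = tail_flat(flags, m, &q_flat);

			if (r_nested != r_flat || q_nested != q_flat)
				return false;
		}
	}
	return true;
}
```

Both forms return -EIO without queuing unless a MOVE flag is set and the
VMA is migratable, so the suggestion is a readability change only.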


>  unlock:
>  	spin_unlock(ptl);
>  out:
> @@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
>  		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
> -		if (ret)
> +		if (ret > 0)
>  			return 0;
> +		else if (ret < 0)
> +			return ret;

I would go with the following, but that's a matter of taste I guess.

if (ret < 0)
	return ret;
else
	return 0;

>  	}
>  
>  	if (pmd_trans_unstable(pmd))
> @@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  			continue;
>  		if (!queue_pages_required(page, qp))
>  			continue;
> -		migrate_page_add(page, qp->pagelist, flags);
> +		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +			if (!vma_migratable(vma))
> +				break;
> +			migrate_page_add(page, qp->pagelist, flags);
> +		} else
> +			break;

I might be missing something, but AFAICS neither vma nor flags changes
while we are in queue_pages_pte_range(), so couldn't we move the check
just above the loop?
That way, 1) we only perform the check once, and 2) if we enter the loop,
we know that we are going to do some work. Something like:

index af171ccb56a2..7c0e44389826 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -487,6 +487,9 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
        if (pmd_trans_unstable(pmd))
                return 0;
 
+       if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(vma))
+               return -EIO;
+
        pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
        for (; addr != end; pte++, addr += PAGE_SIZE) {
                if (!pte_present(*pte))


>  	}
>  	pte_unmap_unlock(pte - 1, ptl);
>  	cond_resched();
> -	return 0;
> +	return addr != end ? -EIO : 0;

If we can do the above, we can leave the return value as it was.

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-19 18:35 [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Yang Shi
                   ` (2 preceding siblings ...)
  2019-03-20  8:16 ` Oscar Salvador
@ 2019-03-20 15:44 ` Rafael Aquini
  3 siblings, 0 replies; 10+ messages in thread
From: Rafael Aquini @ 2019-03-20 15:44 UTC
  To: Yang Shi
  Cc: chrubis, vbabka, kirill, osalvador, akpm, stable, linux-mm, linux-kernel

On Wed, Mar 20, 2019 at 02:35:56AM +0800, Yang Shi wrote:
> When MPOL_MF_STRICT was specified and an existing page was already
> on a node that does not follow the policy, mbind() should return -EIO.
> But commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") broke the rule.
> 
> And, commit c8633798497c ("mm: mempolicy: mbind and migrate_pages
> support thp migration") didn't return the correct value for THP mbind()
> too.
> 
> If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it reaches
> queue_pages_to_pte_range() or queue_pages_pmd() to check if an existing
> page was already on a node that does not follow the policy.  And,
> non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
> MPOL_MF_MOVE_ALL was specified.
> 
> Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c
> 
> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
> Reported-by: Cyril Hrubis <chrubis@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: stable@vger.kernel.org
> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/mempolicy.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index abe7a67..401c817 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -447,6 +447,13 @@ static inline bool queue_pages_required(struct page *page,
>  	return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT);
>  }
>  
> +/*
> + * The queue_pages_pmd() may have three kind of return value.
> + * 1 - pages are placed on he right node or queued successfully.
> + * 0 - THP get split.
> + * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
> + *        page was already on a node that does not follow the policy.
> + */
>  static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>  				unsigned long end, struct mm_walk *walk)
>  {
> @@ -456,7 +463,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>  	unsigned long flags;
>  
>  	if (unlikely(is_pmd_migration_entry(*pmd))) {
> -		ret = 1;
> +		ret = -EIO;
>  		goto unlock;
>  	}
>  	page = pmd_page(*pmd);
> @@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>  	ret = 1;
>  	flags = qp->flags;
>  	/* go to thp migration */
> -	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> +	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +		if (!vma_migratable(walk->vma)) {
> +			ret = -EIO;
> +			goto unlock;
> +		}
> +
>  		migrate_page_add(page, qp->pagelist, flags);
> +	} else
> +		ret = -EIO;
>  unlock:
>  	spin_unlock(ptl);
>  out:
> @@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
>  		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
> -		if (ret)
> +		if (ret > 0)
>  			return 0;
> +		else if (ret < 0)
> +			return ret;
>  	}
>  
>  	if (pmd_trans_unstable(pmd))
> @@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  			continue;
>  		if (!queue_pages_required(page, qp))
>  			continue;
> -		migrate_page_add(page, qp->pagelist, flags);
> +		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> +			if (!vma_migratable(vma))
> +				break;
> +			migrate_page_add(page, qp->pagelist, flags);
> +		} else
> +			break;
>  	}
>  	pte_unmap_unlock(pte - 1, ptl);
>  	cond_resched();
> -	return 0;
> +	return addr != end ? -EIO : 0;
>  }
>  
>  static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
> @@ -595,7 +616,12 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>  	unsigned long endvma = vma->vm_end;
>  	unsigned long flags = qp->flags;
>  
> -	if (!vma_migratable(vma))
> +	/*
> +	 * Need check MPOL_MF_STRICT to return -EIO if possible
> +	 * regardless of vma_migratable
> +	 */ 
> +	if (!vma_migratable(vma) &&
> +	    !(flags & MPOL_MF_STRICT))
>  		return 1;
>  
>  	if (endvma > end)
> @@ -622,7 +648,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>  	}
>  
>  	/* queue pages from current vma */
> -	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> +	if (flags & MPOL_MF_VALID)
>  		return 0;
>  	return 1;
>  }
> -- 
> 1.8.3.1
> 
Acked-by: Rafael Aquini <aquini@redhat.com>


* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-20  8:16 ` Oscar Salvador
@ 2019-03-20 18:31   ` Yang Shi
  2019-03-20 18:48     ` Oscar Salvador
  0 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2019-03-20 18:31 UTC
  To: Oscar Salvador
  Cc: chrubis, vbabka, kirill, akpm, stable, linux-mm, linux-kernel



On 3/20/19 1:16 AM, Oscar Salvador wrote:
> On Wed, Mar 20, 2019 at 02:35:56AM +0800, Yang Shi wrote:
>> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
>> Reported-by: Cyril Hrubis <chrubis@suse.cz>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: stable@vger.kernel.org
>> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Hi Yang, thanks for the patch.
>
> Some observations below.
>
>>   	}
>>   	page = pmd_page(*pmd);
>> @@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
>>   	ret = 1;
>>   	flags = qp->flags;
>>   	/* go to thp migration */
>> -	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
>> +	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
>> +		if (!vma_migratable(walk->vma)) {
>> +			ret = -EIO;
>> +			goto unlock;
>> +		}
>> +
>>   		migrate_page_add(page, qp->pagelist, flags);
>> +	} else
>> +		ret = -EIO;
> 	if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
>         	        !vma_migratable(walk->vma)) {
>                 	ret = -EIO;
>                  goto unlock;
>          }
>
> 	migrate_page_add(page, qp->pagelist, flags);
> unlock:
>          spin_unlock(ptl);
> out:
>          return ret;
>
> seems more clean to me?

Yes, it does.

>
>
>>   unlock:
>>   	spin_unlock(ptl);
>>   out:
>> @@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>   	ptl = pmd_trans_huge_lock(pmd, vma);
>>   	if (ptl) {
>>   		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
>> -		if (ret)
>> +		if (ret > 0)
>>   			return 0;
>> +		else if (ret < 0)
>> +			return ret;
> I would go with the following, but that's a matter of taste I guess.
>
> if (ret < 0)
> 	return ret;
> else
> 	return 0;

No, this is not correct. queue_pages_pmd() may return 0, which means the
THP was split. If it returns 0, the code should fall through to the PTE
walk instead of returning.
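
To make the three-way handling concrete, here is a minimal userspace
model of what queue_pages_pte_range() does with each return value of
queue_pages_pmd(), next to the two-way variant suggested in review. The
enum and function names are hypothetical; only the split case (0)
differs between the two.

```c
#include <assert.h>
#include <errno.h>

enum next_step {
	STEP_DONE,	/* PMD fully handled: queue_pages_pte_range() returns 0 */
	STEP_WALK_PTES,	/* THP was split: fall through to the PTE loop */
	STEP_FAIL,	/* propagate -EIO to the page walker */
};

/* Handling in the patch: three distinct outcomes. */
static enum next_step dispatch_patch(int ret)
{
	assert(ret == 1 || ret == 0 || ret == -EIO);
	if (ret > 0)
		return STEP_DONE;
	else if (ret < 0)
		return STEP_FAIL;
	return STEP_WALK_PTES;
}

/* Two-way variant from review: treats 0 like 1 and skips the PTEs. */
static enum next_step dispatch_two_way(int ret)
{
	assert(ret == 1 || ret == 0 || ret == -EIO);
	if (ret < 0)
		return STEP_FAIL;
	return STEP_DONE;
}
```

With the two-way variant, the PTEs of a freshly split huge zero page
would never be checked.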

>
>>   	}
>>   
>>   	if (pmd_trans_unstable(pmd))
>> @@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>   			continue;
>>   		if (!queue_pages_required(page, qp))
>>   			continue;
>> -		migrate_page_add(page, qp->pagelist, flags);
>> +		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
>> +			if (!vma_migratable(vma))
>> +				break;
>> +			migrate_page_add(page, qp->pagelist, flags);
>> +		} else
>> +			break;
> I might be missing something, but AFAICS neither vma nor flags is going to change
> while we are in queue_pages_pte_range(), so, could not we move the check just
> above the loop?
> In that way, 1) we only perform the check once and 2) if we enter the loop
> we know that we are going to do some work, so, something like:
>
> index af171ccb56a2..7c0e44389826 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -487,6 +487,9 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
>          if (pmd_trans_unstable(pmd))
>                  return 0;
>   
> +       if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(vma))
> +               return -EIO;

That doesn't sound correct to me. We need to check whether there is an
existing page on a node that is not allowed by the policy. This is what
queue_pages_required() does.
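
The difference can be sketched with a userspace model of the PTE loop.
In this sketch (hypothetical names, not kernel code), misplaced[i]
stands in for queue_pages_required() on the i-th PTE, and can_queue
stands in for (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
vma_migratable(vma). With MPOL_MF_STRICT alone and every page already on
an allowed node, the per-page loop succeeds, while the hoisted check
fails before looking at any page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* The patch: decide per page, fail only on a page that cannot be queued. */
static int per_page_check(const bool *misplaced, size_t n, bool can_queue,
			  size_t *queued)
{
	size_t i;

	assert(queued != NULL && (misplaced != NULL || n == 0));
	*queued = 0;
	for (i = 0; i < n; i++) {
		if (!misplaced[i])
			continue;	/* page already on an allowed node */
		if (!can_queue)
			break;		/* misplaced and cannot be migrated */
		(*queued)++;		/* migrate_page_add() */
	}
	/* Mirrors "return addr != end ? -EIO : 0;" */
	return i != n ? -EIO : 0;
}

/* The hoisted variant: rejects the range before inspecting any page. */
static int hoisted_check(const bool *misplaced, size_t n, bool can_queue,
			 size_t *queued)
{
	size_t i;

	assert(queued != NULL && (misplaced != NULL || n == 0));
	*queued = 0;
	if (!can_queue)
		return -EIO;
	for (i = 0; i < n; i++)
		if (misplaced[i])
			(*queued)++;	/* migrate_page_add() */
	return 0;
}
```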

Thanks,
Yang

> +
>          pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
>          for (; addr != end; pte++, addr += PAGE_SIZE) {
>                  if (!pte_present(*pte))
>
>
>>   	}
>>   	pte_unmap_unlock(pte - 1, ptl);
>>   	cond_resched();
>> -	return 0;
>> +	return addr != end ? -EIO : 0;
> If we can do the above, we can leave the return value as it was.
>



* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-20 18:31   ` Yang Shi
@ 2019-03-20 18:48     ` Oscar Salvador
  0 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2019-03-20 18:48 UTC
  To: Yang Shi; +Cc: chrubis, vbabka, kirill, akpm, stable, linux-mm, linux-kernel

On Wed, 2019-03-20 at 11:31 -0700, Yang Shi wrote:
> No, this is not correct. queue_pages_pmd() may return 0, which means
> THP gets split. If it returns 0 the code should just fall through
> instead of returning.

Right, I overlooked that.

> It sounds not correct to me. We need check if there is existing page
> on the node which is not allowed by the policy. This is what
> queue_pages_required() does.

Bleh, I guess it was too early in the morning.
That is the whole point of it actually, so that was quite wrong.

Sorry for trying to mislead you ;-)

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-20  5:53 ` Souptick Joarder
@ 2019-03-20 22:16   ` Andrew Morton
  2019-03-20 23:06     ` Yang Shi
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2019-03-20 22:16 UTC
  To: Souptick Joarder
  Cc: Yang Shi, chrubis, Vlastimil Babka, kirill, osalvador, stable,
	Linux-MM, linux-kernel

On Wed, 20 Mar 2019 11:23:03 +0530 Souptick Joarder <jrdr.linux@gmail.com> wrote:

> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -447,6 +447,13 @@ static inline bool queue_pages_required(struct page *page,
> >         return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT);
> >  }
> >
> > +/*
> > + * The queue_pages_pmd() may have three kind of return value.
> > + * 1 - pages are placed on he right node or queued successfully.
> 
> Minor typo -> s/he/the ?

Yes, that comment needs some help.  This?

--- a/mm/mempolicy.c~mm-mempolicy-make-mbind-return-eio-when-mpol_mf_strict-is-specified-fix
+++ a/mm/mempolicy.c
@@ -429,9 +429,9 @@ static inline bool queue_pages_required(
 }
 
 /*
- * The queue_pages_pmd() may have three kind of return value.
- * 1 - pages are placed on he right node or queued successfully.
- * 0 - THP get split.
+ * queue_pages_pmd() has three possible return values:
+ * 1 - pages are placed on the right node or queued successfully.
+ * 0 - THP was split.
  * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
  *        page was already on a node that does not follow the policy.
  */
_
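
The corrected comment can be read as a decision table. The sketch below
models it in userspace; struct model_pmd, model_queue_pages_pmd() and
the M_* constants are hypothetical, and the huge-zero-page split and the
queue_pages_required() test are assumed from the surrounding upstream
function, which the diff context does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPOL_MF_STRICT, MPOL_MF_MOVE, MPOL_MF_MOVE_ALL. */
#define M_STRICT   (1u << 0)
#define M_MOVE     (1u << 1)
#define M_MOVE_ALL (1u << 2)

/* What queue_pages_pmd() finds under the PMD lock, reduced to booleans. */
struct model_pmd {
	bool migration_entry;	/* is_pmd_migration_entry() */
	bool huge_zero_page;	/* huge zero page: split instead of queued */
	bool misplaced;		/* queue_pages_required() */
};

/* Returns 1, 0 or -EIO following the comment and the patch as posted. */
static int model_queue_pages_pmd(const struct model_pmd *p, unsigned int flags,
				 bool vma_migratable, bool *queued)
{
	assert(p != NULL && queued != NULL);
	*queued = false;
	if (p->migration_entry)
		return -EIO;		/* this version returns -EIO here */
	if (p->huge_zero_page)
		return 0;		/* THP was split; caller walks the PTEs */
	if (!p->misplaced)
		return 1;		/* already on an allowed node */
	if (!(flags & (M_MOVE | M_MOVE_ALL)) || !vma_migratable)
		return -EIO;		/* strict violation or cannot migrate */
	*queued = true;			/* migrate_page_add() */
	return 1;
}
```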



* Re: [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
  2019-03-20 22:16   ` Andrew Morton
@ 2019-03-20 23:06     ` Yang Shi
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-03-20 23:06 UTC
  To: Andrew Morton, Souptick Joarder
  Cc: chrubis, Vlastimil Babka, kirill, osalvador, stable, Linux-MM,
	linux-kernel



On 3/20/19 3:16 PM, Andrew Morton wrote:
> On Wed, 20 Mar 2019 11:23:03 +0530 Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -447,6 +447,13 @@ static inline bool queue_pages_required(struct page *page,
>>>          return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT);
>>>   }
>>>
>>> +/*
>>> + * The queue_pages_pmd() may have three kind of return value.
>>> + * 1 - pages are placed on he right node or queued successfully.
>> Minor typo -> s/he/the ?
> Yes, that comment needs some help.  This?
>
> --- a/mm/mempolicy.c~mm-mempolicy-make-mbind-return-eio-when-mpol_mf_strict-is-specified-fix
> +++ a/mm/mempolicy.c
> @@ -429,9 +429,9 @@ static inline bool queue_pages_required(
>   }
>   
>   /*
> - * The queue_pages_pmd() may have three kind of return value.
> - * 1 - pages are placed on he right node or queued successfully.
> - * 0 - THP get split.
> + * queue_pages_pmd() has three possible return values:
> + * 1 - pages are placed on the right node or queued successfully.
> + * 0 - THP was split.
>    * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
>    *        page was already on a node that does not follow the policy.
>    */

It looks good to me. Thanks, Andrew.

Yang

> _



end of thread, other threads:[~2019-03-20 23:06 UTC | newest]

Thread overview: 10+ messages
2019-03-19 18:35 [PATCH] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Yang Shi
2019-03-20  0:49 ` David Rientjes
2019-03-20  1:06   ` Yang Shi
2019-03-20  5:53 ` Souptick Joarder
2019-03-20 22:16   ` Andrew Morton
2019-03-20 23:06     ` Yang Shi
2019-03-20  8:16 ` Oscar Salvador
2019-03-20 18:31   ` Yang Shi
2019-03-20 18:48     ` Oscar Salvador
2019-03-20 15:44 ` Rafael Aquini
