linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] mm: Fix checking unmapped holes for mbind
@ 2019-11-08 13:01 Li Xinhai
  2019-11-08 13:01 ` [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk Li Xinhai
  2019-11-08 13:01 ` [PATCH v4 2/2] mm: Fix checking unmapped holes for mbind Li Xinhai
  0 siblings, 2 replies; 5+ messages in thread
From: Li Xinhai @ 2019-11-08 13:01 UTC (permalink / raw)
  To: linux-mm


This patchset fix checking unmapped holes for mbind().

First patch makes sure the vma been correctly tracked in .test_walk(), 
so each time when .test_walk() is called, the neighborhood of two vma 
is correct.
Current problem is that the !vma_migratable() check could 
cause return immediately without update tracking to vma.

Second patch fix the inconsistent report of EFAULT when mbind() is 
called for MPOL_DEFAULT and non MPOL_DEFAULT cases, so application do 
not need to have workaround code to handle this special behavior.
Currently there are two problems, one is that the .test_walk() can not 
know there is hole at tail side of range, because .test_walk() only 
call for vma not for hole. The other one is that mbind_range() checks 
for hole at head side of range but do not consider the 
MPOL_MF_DISCONTIG_OK flag as done in .test_walk().

Changes v3->v4:
  - Split to two patch;
  - Illustrates the application visible behaviors in changelog;
  - Hole at tail side of range is checked in .test_walk(), not after 
    walk page. Only the case for whole range in hole is checked after 
    walk page, because our .test_walk() function would not be called at 
    all in this case.

Changes v2->v3:
  - Add more details in change log;
  - Check holes in .test_walk() and after call walk_page_range();

Li Xinhai (2):
  mm: Check range first in queue_pages_test_walk
  mm: Fix checking unmapped holes for mbind

 mm/mempolicy.c | 47 +++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 16 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk
  2019-11-08 13:01 [PATCH v4 0/2] mm: Fix checking unmapped holes for mbind Li Xinhai
@ 2019-11-08 13:01 ` Li Xinhai
  2019-11-14  9:35   ` Naoya Horiguchi
  2019-11-08 13:01 ` [PATCH v4 2/2] mm: Fix checking unmapped holes for mbind Li Xinhai
  1 sibling, 1 reply; 5+ messages in thread
From: Li Xinhai @ 2019-11-08 13:01 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Naoya Horiguchi, Michal Hocko, Vlastimil Babka,
	Hugh Dickins, linux-man

Checking unmapped hole and updating the previous vma must be handled first,
otherwise the unmapped hole could be calculated from a wrong previous vma.

Several commits were relevant to this error:
commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()"),
This commit was correct, the VM_PFNMAP check was after updateing previous
vma

commit 48684a65b4e3
(mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)),
This commit added VM_PFNMAP check before updateing previous vma. Then,
there were two VM_PFNMAP check did same thing twice.

commit acda0c334028
(mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_page
s_range()),
This commit tried to fix the duplicated VM_PFNMAP check, but it wrongly
removed the one which was after updateing vma.

Fixes: acda0c334028 (mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_page
s_range())
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-man <linux-man@vger.kernel.org>
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
---
 mm/mempolicy.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967b..807f06f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -618,6 +618,16 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	unsigned long endvma = vma->vm_end;
 	unsigned long flags = qp->flags;
 
+	/* range check first */
+	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
+		if (!vma->vm_next && vma->vm_end < end)
+			return -EFAULT;
+		if (qp->prev && qp->prev->vm_end < vma->vm_start)
+			return -EFAULT;
+	}
+
+	qp->prev = vma;
+
 	/*
 	 * Need check MPOL_MF_STRICT to return -EIO if possible
 	 * regardless of vma_migratable
@@ -631,15 +641,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	if (vma->vm_start > start)
 		start = vma->vm_start;
 
-	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
-		if (!vma->vm_next && vma->vm_end < end)
-			return -EFAULT;
-		if (qp->prev && qp->prev->vm_end < vma->vm_start)
-			return -EFAULT;
-	}
-
-	qp->prev = vma;
-
 	if (flags & MPOL_MF_LAZY) {
 		/* Similar to task_numa_work, skip inaccessible VMAs */
 		if (!is_vm_hugetlb_page(vma) &&
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v4 2/2] mm: Fix checking unmapped holes for mbind
  2019-11-08 13:01 [PATCH v4 0/2] mm: Fix checking unmapped holes for mbind Li Xinhai
  2019-11-08 13:01 ` [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk Li Xinhai
@ 2019-11-08 13:01 ` Li Xinhai
       [not found]   ` <20191114093018.GA2144@hori.linux.bs1.fc.nec.co.jp>
  1 sibling, 1 reply; 5+ messages in thread
From: Li Xinhai @ 2019-11-08 13:01 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Naoya Horiguchi, Michal Hocko, Vlastimil Babka,
	Hugh Dickins, linux-man

mbind() is required to report EFAULT if range, specified by addr and len,
contains unmapped holes. In current implementation, below rules are applied
for this checking:
1 Unmapped holes at any part of the specified range should be reported as
  EFAULT if mbind() for none MPOL_DEFAULT cases;
2 Unmapped holes at any part of the specified range should be ignored (do
  not reprot EFAULT) if mbind() for MPOL_DEFAULT case;
3 The whole range in an unmapped hole should be reported as EFAULT;
Note that rule 2 does not fullfill the mbind() API definition, but since
that behavior has existed for long days (the internal flag
MPOL_MF_DISCONTIG_OK is for this purpose), this patch does not plan to
change it.

In current code, application observed inconsistent behavior on rule 1 and
rule 2 respectively. That inconsistency is fixed as below details.

Cases of rule 1:
1) Hole at head side of range. Current code reprot EFAULT, no change by
this patch.
[  vma  ][ hole ][  vma  ]
            [  range  ]
2) Hole at middle of range. Current code report EFAULT, no change by
this patch.
[  vma  ][ hole ][ vma ]
   [     range      ]
3) Hole at tail side of range. Current code do not report EFAULT, this
patch fix it.
[  vma  ][ hole ][ vma ]
   [  range  ]

Cases of rule 2:
1) Hole at head side of range. Current code reprot EFAULT, this patch
fix it.
[  vma  ][ hole ][  vma  ]
            [  range  ]
2) Hole at middle of range. Current code do not report EFAULT, no change
by this patch.
this patch.
[  vma  ][ hole ][ vma]
   [     range      ]
3) Hole at tail side of range. Current code do not report EFAULT, no
change by this patch.
[  vma  ][ hole ][ vma]
   [  range  ]

This patch has no changes to rule 3.

The unmapped hole checking can also be handled by using .pte_hole(),
instead of .test_walk(). But .pte_hole() is called for holes inside and
outside vma, which causes more cost, so this patch keeps the original
design with .test_walk().

Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-man <linux-man@vger.kernel.org>
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
---
 mm/mempolicy.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 807f06f..c697b29 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -410,7 +410,9 @@ struct queue_pages {
 	struct list_head *pagelist;
 	unsigned long flags;
 	nodemask_t *nmask;
-	struct vm_area_struct *prev;
+	unsigned long start;
+	unsigned long end;
+	struct vm_area_struct *first;
 };
 
 /*
@@ -619,14 +621,20 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	unsigned long flags = qp->flags;
 
 	/* range check first */
-	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
-		if (!vma->vm_next && vma->vm_end < end)
-			return -EFAULT;
-		if (qp->prev && qp->prev->vm_end < vma->vm_start)
+	VM_BUG_ON((vma->vm_start > start) || (vma->vm_end < end));
+
+	if (!qp->first) {
+		qp->first = vma;
+		if (!(flags & MPOL_MF_DISCONTIG_OK) &&
+			(qp->start < vma->vm_start))
+			/* hole at head side of range */
 			return -EFAULT;
 	}
-
-	qp->prev = vma;
+	if (!(flags & MPOL_MF_DISCONTIG_OK) &&
+		((vma->vm_end < qp->end) && 
+		(!vma->vm_next || vma->vm_end < vma->vm_next->vm_start)))
+		/* hole at middle or tail of range */
+		return -EFAULT;
 
 	/*
 	 * Need check MPOL_MF_STRICT to return -EIO if possible
@@ -638,8 +646,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 
 	if (endvma > end)
 		endvma = end;
-	if (vma->vm_start > start)
-		start = vma->vm_start;
 
 	if (flags & MPOL_MF_LAZY) {
 		/* Similar to task_numa_work, skip inaccessible VMAs */
@@ -680,14 +686,23 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 		nodemask_t *nodes, unsigned long flags,
 		struct list_head *pagelist)
 {
+	int err;
 	struct queue_pages qp = {
 		.pagelist = pagelist,
 		.flags = flags,
 		.nmask = nodes,
-		.prev = NULL,
+		.start = start,
+		.end = end,
+		.first = NULL,
 	};
 
-	return walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
+	err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
+
+	if (!qp.first)
+		/* whole range in hole */
+		err = -EFAULT;
+
+	return err;
 }
 
 /*
@@ -739,8 +754,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 	unsigned long vmend;
 
 	vma = find_vma(mm, start);
-	if (!vma || vma->vm_start > start)
-		return -EFAULT;
+	VM_BUG_ON(!vma);
 
 	prev = vma->vm_prev;
 	if (start > vma->vm_start)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk
  2019-11-08 13:01 ` [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk Li Xinhai
@ 2019-11-14  9:35   ` Naoya Horiguchi
  0 siblings, 0 replies; 5+ messages in thread
From: Naoya Horiguchi @ 2019-11-14  9:35 UTC (permalink / raw)
  To: Li Xinhai
  Cc: linux-mm, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Hugh Dickins, linux-man

On Fri, Nov 08, 2019 at 09:01:43PM +0800, Li Xinhai wrote:
> Checking unmapped hole and updating the previous vma must be handled first,
> otherwise the unmapped hole could be calculated from a wrong previous vma.
> 
> Several commits were relevant to this error:
> commit 6f4576e3687b
> ("mempolicy: apply page table walker on queue_pages_range()"),
> This commit was correct, the VM_PFNMAP check was after updateing previous
> vma
> 
> commit 48684a65b4e3
> (mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)),
> This commit added VM_PFNMAP check before updateing previous vma. Then,
> there were two VM_PFNMAP check did same thing twice.
> 
> commit acda0c334028
> (mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_page
> s_range()),
> This commit tried to fix the duplicated VM_PFNMAP check, but it wrongly
> removed the one which was after updateing vma.
> 
> Fixes: acda0c334028 (mm/mempolicy.c: get rid of duplicated check for vma(VM_PFNMAP) in queue_page
> s_range())
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-man <linux-man@vger.kernel.org>
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 2/2] mm: Fix checking unmapped holes for mbind
       [not found]   ` <20191114093018.GA2144@hori.linux.bs1.fc.nec.co.jp>
@ 2019-11-15 15:33     ` lixinhai.lxh
  0 siblings, 0 replies; 5+ messages in thread
From: lixinhai.lxh @ 2019-11-15 15:33 UTC (permalink / raw)
  To: n-horiguchi
  Cc: linux-mm, akpm, mhocko, Vlastimil Babka, Hugh Dickins, linux-man

On 2019-11-14 at 17:30 Naoya Horiguchi wrote:
>On Fri, Nov 08, 2019 at 09:01:44PM +0800, Li Xinhai wrote:
>> mbind() is required to report EFAULT if range, specified by addr and len,
>> contains unmapped holes. In current implementation, below rules are applied
>> for this checking:
>> 1 Unmapped holes at any part of the specified range should be reported as
>>   EFAULT if mbind() for none MPOL_DEFAULT cases;
>> 2 Unmapped holes at any part of the specified range should be ignored (do
>>   not reprot EFAULT) if mbind() for MPOL_DEFAULT case;
>> 3 The whole range in an unmapped hole should be reported as EFAULT;
>> Note that rule 2 does not fullfill the mbind() API definition, but since
>> that behavior has existed for long days (the internal flag
>> MPOL_MF_DISCONTIG_OK is for this purpose), this patch does not plan to
>> change it.
>>
>> In current code, application observed inconsistent behavior on rule 1 and
>> rule 2 respectively. That inconsistency is fixed as below details.
>>
>> Cases of rule 1:
>> 1) Hole at head side of range. Current code reprot EFAULT, no change by
>> this patch.
>> [  vma  ][ hole ][  vma  ]
>>             [  range  ]
>> 2) Hole at middle of range. Current code report EFAULT, no change by
>> this patch.
>> [  vma  ][ hole ][ vma ]
>>    [     range      ]
>> 3) Hole at tail side of range. Current code do not report EFAULT, this
>> patch fix it.
>> [  vma  ][ hole ][ vma ]
>>    [  range  ]
>>
>> Cases of rule 2:
>> 1) Hole at head side of range. Current code reprot EFAULT, this patch
>> fix it.
>> [  vma  ][ hole ][  vma  ]
>>             [  range  ]
>> 2) Hole at middle of range. Current code do not report EFAULT, no change
>> by this patch.
>> this patch.
>> [  vma  ][ hole ][ vma]
>>    [     range      ]
>> 3) Hole at tail side of range. Current code do not report EFAULT, no
>> change by this patch.
>> [  vma  ][ hole ][ vma]
>>    [  range  ]
>>
>> This patch has no changes to rule 3.
>>
>> The unmapped hole checking can also be handled by using .pte_hole(),
>> instead of .test_walk(). But .pte_hole() is called for holes inside and
>> outside vma, which causes more cost, so this patch keeps the original
>> design with .test_walk().
>>
>> Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-man <linux-man@vger.kernel.org>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> ---
>>  mm/mempolicy.c | 40 +++++++++++++++++++++++++++-------------
>>  1 file changed, 27 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 807f06f..c697b29 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -410,7 +410,9 @@ struct queue_pages {
>>  struct list_head *pagelist;
>>  unsigned long flags;
>>  nodemask_t *nmask;
>> -	struct vm_area_struct *prev;
>> +	unsigned long start;
>> +	unsigned long end;
>> +	struct vm_area_struct *first;
>>  };
>> 
>>  /*
>> @@ -619,14 +621,20 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>>  unsigned long flags = qp->flags;
>> 
>>  /* range check first */
>> -	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
>> -	if (!vma->vm_next && vma->vm_end < end)
>> -	return -EFAULT;
>> -	if (qp->prev && qp->prev->vm_end < vma->vm_start)
>> +	VM_BUG_ON((vma->vm_start > start) || (vma->vm_end < end));
>> +
>> +	if (!qp->first) {
>> +	qp->first = vma;
>> +	if (!(flags & MPOL_MF_DISCONTIG_OK) &&
>> +	(qp->start < vma->vm_start))
>> +	/* hole at head side of range */
>>  return -EFAULT;
>>  }
>> -
>> -	qp->prev = vma;
>> +	if (!(flags & MPOL_MF_DISCONTIG_OK) &&
>> +	((vma->vm_end < qp->end) &&
>
>You here have a trailing whitespace.
>
>Otherwise, looks good to me.
>
>Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> 

Thanks for review, I'v sent out new patch with same title
- Xinhai



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-15 15:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-08 13:01 [PATCH v4 0/2] mm: Fix checking unmapped holes for mbind Li Xinhai
2019-11-08 13:01 ` [PATCH v4 1/2] mm: Check range first in queue_pages_test_walk Li Xinhai
2019-11-14  9:35   ` Naoya Horiguchi
2019-11-08 13:01 ` [PATCH v4 2/2] mm: Fix checking unmapped holes for mbind Li Xinhai
     [not found]   ` <20191114093018.GA2144@hori.linux.bs1.fc.nec.co.jp>
2019-11-15 15:33     ` lixinhai.lxh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).