linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little
@ 2020-01-17 23:22 Wei Yang
  2020-01-17 23:22 ` [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd() Wei Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

move_page_tables() moves page table entries either a whole PMD at a time
or PTE by PTE.

A move by PMD is only possible when both the old and the new range are
PMD aligned, but the current code calculates the old range and the new
range separately, which leads to redundant checks and calculations.

This cleanup consolidates the range check in one place to reduce the
extra range handling.

Wei Yang (5):
  mm/mremap: format the check in move_normal_pmd() same as
    move_huge_pmd()
  mm/mremap: it is sure to have enough space when extent meets
    requirement
  mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  mm/mremap: calculate extent in one place
  mm/mremap: start addresses are properly aligned

 include/linux/huge_mm.h |  2 +-
 mm/huge_memory.c        |  8 +-------
 mm/mremap.c             | 24 +++++++-----------------
 3 files changed, 9 insertions(+), 25 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd()
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
@ 2020-01-17 23:22 ` Wei Yang
  2020-01-17 23:22 ` [PATCH 2/5] mm/mremap: it is sure to have enough space when extent meets requirement Wei Yang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

No functional change; this just improves readability and prepares for
the following cleanups.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/mremap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bcc7aa62f2d9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -200,8 +200,9 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;
 
-	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
-	    || old_end - old_addr < PMD_SIZE)
+	if ((old_addr & ~PMD_MASK) ||
+	    (new_addr & ~PMD_MASK) ||
+	    old_end - old_addr < PMD_SIZE)
 		return false;
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/5] mm/mremap: it is sure to have enough space when extent meets requirement
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
  2020-01-17 23:22 ` [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd() Wei Yang
@ 2020-01-17 23:22 ` Wei Yang
  2020-01-17 23:22 ` [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables() Wei Yang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

old_end is passed to these two functions only to check whether there is
enough space for the move, but that check has already been done before
they are invoked.

These two functions are only invoked when the extent meets the
requirement, and the caller clamps the extent first:

    if (extent > old_end - old_addr)
        extent = old_end - old_addr;

This guarantees that (old_end - old_addr) can never fail the size check
in these two functions.
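The reasoning above can be sketched as a standalone model (an
illustrative PMD_SIZE, not the kernel code):

```c
#include <assert.h>

#define PMD_SIZE 0x200000UL  /* illustrative: 2 MiB PMD, as on x86-64 */

/* Model of the caller's clamp in move_page_tables(): after this,
 * extent <= old_end - old_addr always holds. */
static unsigned long clamp_extent(unsigned long extent,
				  unsigned long old_addr, unsigned long old_end)
{
	if (extent > old_end - old_addr)
		extent = old_end - old_addr;
	return extent;
}

/* The PMD-move path only runs when extent == PMD_SIZE, so the old
 * "old_end - old_addr < PMD_SIZE" test inside move_normal_pmd() can
 * never trigger: extent <= old_end - old_addr means the remaining
 * range is at least PMD_SIZE long whenever extent is. */
static int old_range_check_redundant(unsigned long extent,
				     unsigned long old_addr,
				     unsigned long old_end)
{
	extent = clamp_extent(extent, old_addr, old_end);
	if (extent != PMD_SIZE)
		return 1;                        /* PMD path not taken */
	return !(old_end - old_addr < PMD_SIZE); /* check cannot fail */
}
```

This is only a model of the control flow, not the kernel functions
themselves; the helper names are made up for illustration.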

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 include/linux/huge_mm.h |  2 +-
 mm/huge_memory.c        |  7 ++-----
 mm/mremap.c             | 11 ++++-------
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0b84e13e88e2..2a5281ca46c8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -42,7 +42,7 @@ extern int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, unsigned long end,
 			unsigned char *vec);
 extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
-			 unsigned long new_addr, unsigned long old_end,
+			 unsigned long new_addr,
 			 pmd_t *old_pmd, pmd_t *new_pmd);
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, pgprot_t newprot,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5b2876942639..8f1bbbf01f5b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1871,17 +1871,14 @@ static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 }
 
 bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
-		  unsigned long new_addr, unsigned long old_end,
-		  pmd_t *old_pmd, pmd_t *new_pmd)
+		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
 {
 	spinlock_t *old_ptl, *new_ptl;
 	pmd_t pmd;
 	struct mm_struct *mm = vma->vm_mm;
 	bool force_flush = false;
 
-	if ((old_addr & ~HPAGE_PMD_MASK) ||
-	    (new_addr & ~HPAGE_PMD_MASK) ||
-	    old_end - old_addr < HPAGE_PMD_SIZE)
+	if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK))
 		return false;
 
 	/*
diff --git a/mm/mremap.c b/mm/mremap.c
index bcc7aa62f2d9..c2af8ba4ba43 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -193,16 +193,13 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 
 #ifdef CONFIG_HAVE_MOVE_PMD
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
-		  unsigned long new_addr, unsigned long old_end,
-		  pmd_t *old_pmd, pmd_t *new_pmd)
+		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
 {
 	spinlock_t *old_ptl, *new_ptl;
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;
 
-	if ((old_addr & ~PMD_MASK) ||
-	    (new_addr & ~PMD_MASK) ||
-	    old_end - old_addr < PMD_SIZE)
+	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK))
 		return false;
 
 	/*
@@ -274,7 +271,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 				if (need_rmap_locks)
 					take_rmap_locks(vma);
 				moved = move_huge_pmd(vma, old_addr, new_addr,
-						    old_end, old_pmd, new_pmd);
+						      old_pmd, new_pmd);
 				if (need_rmap_locks)
 					drop_rmap_locks(vma);
 				if (moved)
@@ -294,7 +291,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			if (need_rmap_locks)
 				take_rmap_locks(vma);
 			moved = move_normal_pmd(vma, old_addr, new_addr,
-					old_end, old_pmd, new_pmd);
+						old_pmd, new_pmd);
 			if (need_rmap_locks)
 				drop_rmap_locks(vma);
 			if (moved)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
  2020-01-17 23:22 ` [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd() Wei Yang
  2020-01-17 23:22 ` [PATCH 2/5] mm/mremap: it is sure to have enough space when extent meets requirement Wei Yang
@ 2020-01-17 23:22 ` Wei Yang
  2020-01-26 14:47   ` Dmitry Osipenko
  2020-01-17 23:22 ` [PATCH 4/5] mm/mremap: calculate extent in one place Wei Yang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

Use the generic helper instead of open-coding the calculation.
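The generic helper and the open-coded pair of statements compute the
same boundary; a minimal model with illustrative constants (the
boundary clamp is written the same way as the generic macro, so an
overflowed boundary of 0 still resolves to end):

```c
#include <assert.h>

#define PMD_SIZE 0x200000UL          /* illustrative 2 MiB PMD */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Generic pmd_addr_end() idea: next PMD boundary, clamped to end. */
static unsigned long generic_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* The open-coded version this patch replaces, expressed as the
 * address it steps to (addr + extent). */
static unsigned long open_coded_next(unsigned long addr, unsigned long end)
{
	unsigned long next = (addr + PMD_SIZE) & PMD_MASK;
	unsigned long extent = next - addr;

	if (extent > end - addr)
		extent = end - addr;
	return addr + extent;
}
```

Note this equivalence only holds for the generic definition; an
architecture may provide its own pmd_addr_end() with different
behavior, which is what the regression report later in this thread
turns out to be about.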

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/mremap.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index c2af8ba4ba43..a258914f3ee1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
-		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		/* even if next overflowed, extent below will be ok */
+		next = pmd_addr_end(old_addr, old_end);
 		extent = next - old_addr;
-		if (extent > old_end - old_addr)
-			extent = old_end - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
@@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
 			break;
-		next = (new_addr + PMD_SIZE) & PMD_MASK;
+		next = pmd_addr_end(new_addr, new_addr + len);
 		if (extent > next - new_addr)
 			extent = next - new_addr;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/5] mm/mremap: calculate extent in one place
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
                   ` (2 preceding siblings ...)
  2020-01-17 23:22 ` [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables() Wei Yang
@ 2020-01-17 23:22 ` Wei Yang
  2020-01-17 23:22 ` [PATCH 5/5] mm/mremap: start addresses are properly aligned Wei Yang
  2020-01-19  0:07 ` [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Andrew Morton
  5 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

Page tables are moved at PMD granularity, which requires both the
source and the destination range to be PMD aligned.

The current code still works because move_huge_pmd() and
move_normal_pmd() check old_addr and new_addr again and fall back to
move_ptes() when either of them is not aligned.

Instead of calculating the extent separately, calculate it in one
place, so we know up front when it is not worth trying a PMD move.
This makes the logic a little clearer.
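The consolidation can be modelled as follows; both versions yield the
same extent. This is a simplified sketch (illustrative PMD_SIZE,
address overflow ignored), and in the actual pre-patch code the
new-side clamp happened later, on the PTE path:

```c
#include <assert.h>

#define PMD_SIZE 0x200000UL          /* illustrative 2 MiB PMD */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Next PMD boundary after addr, clamped to end (overflow ignored). */
static unsigned long pmd_next(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;
	return boundary < end ? boundary : end;
}

/* Before: extent from the old side, later clamped by the new side. */
static unsigned long extent_two_places(unsigned long old_addr, unsigned long old_end,
				       unsigned long new_addr, unsigned long len)
{
	unsigned long extent = pmd_next(old_addr, old_end) - old_addr;
	unsigned long new_next = pmd_next(new_addr, new_addr + len);

	if (extent > new_next - new_addr)
		extent = new_next - new_addr;
	return extent;
}

/* After: the same value computed once as a minimum of both sides. */
static unsigned long extent_one_place(unsigned long old_addr, unsigned long old_end,
				      unsigned long new_addr, unsigned long len)
{
	unsigned long o = pmd_next(old_addr, old_end) - old_addr;
	unsigned long n = pmd_next(new_addr, new_addr + len) - new_addr;

	return o < n ? o : n;
}
```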

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/mremap.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index a258914f3ee1..37335a10779d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -240,7 +240,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long new_addr, unsigned long len,
 		bool need_rmap_locks)
 {
-	unsigned long extent, next, old_end;
+	unsigned long extent, old_next, new_next, old_end;
 	struct mmu_notifier_range range;
 	pmd_t *old_pmd, *new_pmd;
 
@@ -253,8 +253,9 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
-		next = pmd_addr_end(old_addr, old_end);
-		extent = next - old_addr;
+		old_next = pmd_addr_end(old_addr, old_end);
+		new_next = pmd_addr_end(new_addr, new_addr + len);
+		extent = min((old_next - old_addr), (new_next - new_addr));
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
@@ -298,9 +299,6 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
 			break;
-		next = pmd_addr_end(new_addr, new_addr + len);
-		if (extent > next - new_addr)
-			extent = next - new_addr;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
 			  new_pmd, new_addr, need_rmap_locks);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/5] mm/mremap: start addresses are properly aligned
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
                   ` (3 preceding siblings ...)
  2020-01-17 23:22 ` [PATCH 4/5] mm/mremap: calculate extent in one place Wei Yang
@ 2020-01-17 23:22 ` Wei Yang
  2020-01-19  0:07 ` [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Andrew Morton
  5 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-17 23:22 UTC (permalink / raw)
  To: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	richardw.yang, thellstrom
  Cc: linux-kernel, linux-mm

After the previous cleanup, extent is the minimal step for both the
source and the destination. This means that when extent equals
HPAGE_PMD_SIZE or PMD_SIZE, old_addr and new_addr are properly aligned
as well.

Since these two functions are only invoked from move_page_tables(), it
is now safe to remove the check.
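The alignment argument can be checked with a small model (illustrative
PMD_SIZE, simplified boundary helper): an unaligned address is strictly
less than PMD_SIZE away from its next boundary, so a full-PMD extent
implies both addresses are aligned.

```c
#include <assert.h>

#define PMD_SIZE 0x200000UL          /* illustrative 2 MiB PMD */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Distance from addr to the next PMD boundary, clamped to end
 * (overflow ignored in this model). */
static unsigned long dist_to_boundary(unsigned long addr, unsigned long end)
{
	unsigned long next = (addr + PMD_SIZE) & PMD_MASK;

	if (next > end)
		next = end;
	return next - addr;
}

/* extent as computed after the previous patch: the minimum of both
 * sides' distances to their next boundary. */
static unsigned long extent(unsigned long old_addr, unsigned long old_end,
			    unsigned long new_addr, unsigned long new_end)
{
	unsigned long o = dist_to_boundary(old_addr, old_end);
	unsigned long n = dist_to_boundary(new_addr, new_end);

	return o < n ? o : n;
}

/* Returns 1 in every case: either the PMD path is not taken, or both
 * addresses are PMD aligned, so the removed checks could never fire. */
static int both_aligned_when_full_extent(unsigned long old_addr, unsigned long old_end,
					 unsigned long new_addr, unsigned long new_end)
{
	if (extent(old_addr, old_end, new_addr, new_end) != PMD_SIZE)
		return 1; /* PMD path not taken */
	return !(old_addr & ~PMD_MASK) && !(new_addr & ~PMD_MASK);
}
```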

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/huge_memory.c | 3 ---
 mm/mremap.c      | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8f1bbbf01f5b..cc98d0f07d0a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1878,9 +1878,6 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	bool force_flush = false;
 
-	if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK))
-		return false;
-
 	/*
 	 * The destination pmd shouldn't be established, free_pgtables()
 	 * should have release it.
diff --git a/mm/mremap.c b/mm/mremap.c
index 37335a10779d..5d7597fb3023 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -199,9 +199,6 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;
 
-	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK))
-		return false;
-
 	/*
 	 * The destination pmd shouldn't be established, free_pgtables()
 	 * should have release it.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little
  2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
                   ` (4 preceding siblings ...)
  2020-01-17 23:22 ` [PATCH 5/5] mm/mremap: start addresses are properly aligned Wei Yang
@ 2020-01-19  0:07 ` Andrew Morton
  2020-01-19  2:11   ` Wei Yang
  5 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2020-01-19  0:07 UTC (permalink / raw)
  To: Wei Yang
  Cc: dan.j.williams, aneesh.kumar, kirill, yang.shi, thellstrom,
	linux-kernel, linux-mm

On Sat, 18 Jan 2020 07:22:49 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:

> move_page_tables() tries to move page table by PMD or PTE.
> 
> The root reason is if it tries to move PMD, both old and new range should
> be PMD aligned. But current code calculate old range and new range
> separately.  This leads to some redundant check and calculation.
> 
> This cleanup tries to consolidate the range check in one place to reduce
> some extra range handling.

Thanks, I grabbed these, aimed at 5.7-rc1.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little
  2020-01-19  0:07 ` [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Andrew Morton
@ 2020-01-19  2:11   ` Wei Yang
  0 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-19  2:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, linux-kernel, linux-mm

On Sat, Jan 18, 2020 at 04:07:02PM -0800, Andrew Morton wrote:
>On Sat, 18 Jan 2020 07:22:49 +0800 Wei Yang <richardw.yang@linux.intel.com> wrote:
>
>> move_page_tables() tries to move page table by PMD or PTE.
>> 
>> The root reason is if it tries to move PMD, both old and new range should
>> be PMD aligned. But current code calculate old range and new range
>> separately.  This leads to some redundant check and calculation.
>> 
>> This cleanup tries to consolidate the range check in one place to reduce
>> some extra range handling.
>
>Thanks, I grabbed these, aimed at 5.7-rc1.

Thanks :)

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-17 23:22 ` [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables() Wei Yang
@ 2020-01-26 14:47   ` Dmitry Osipenko
  2020-01-27  2:59     ` Andrew Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-26 14:47 UTC (permalink / raw)
  To: Wei Yang, akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter
  Cc: linux-kernel, linux-mm, linux-tegra, linux-arm-kernel,
	Russell King - ARM Linux

18.01.2020 02:22, Wei Yang wrote:
> Use the general helper instead of do it by hand.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  mm/mremap.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c2af8ba4ba43..a258914f3ee1 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  
>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>  		cond_resched();
> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> -		/* even if next overflowed, extent below will be ok */
> +		next = pmd_addr_end(old_addr, old_end);
>  		extent = next - old_addr;
> -		if (extent > old_end - old_addr)
> -			extent = old_end - old_addr;
>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>  		if (!old_pmd)
>  			continue;
> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  
>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>  			break;
> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> +		next = pmd_addr_end(new_addr, new_addr + len);
>  		if (extent > next - new_addr)
>  			extent = next - new_addr;
>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> 

Hello Wei,

Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
Tegra (ARM32):

  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190

and eventually kernel hangs.

Git's bisection points to this patch and reverting it helps. Please fix,
thanks in advance.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-26 14:47   ` Dmitry Osipenko
@ 2020-01-27  2:59     ` Andrew Morton
  2020-01-29 17:18       ` Dmitry Osipenko
  2020-01-28  0:43     ` Wei Yang
  2020-01-29  9:47     ` Russell King - ARM Linux admin
  2 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2020-01-27  2:59 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel, Russell King - ARM Linux

On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:

> 18.01.2020 02:22, Wei Yang пишет:
> > Use the general helper instead of do it by hand.
> > 
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > ---
> >  mm/mremap.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		/* even if next overflowed, extent below will be ok */
> > +		next = pmd_addr_end(old_addr, old_end);
> >  		extent = next - old_addr;
> > -		if (extent > old_end - old_addr)
> > -			extent = old_end - old_addr;
> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >  		if (!old_pmd)
> >  			continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >  			break;
> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >  		if (extent > next - new_addr)
> >  			extent = next - new_addr;
> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> > 
> 
> Hello Wei,
> 
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
> 
>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> 
> and eventually kernel hangs.
> 
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
patches.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-26 14:47   ` Dmitry Osipenko
  2020-01-27  2:59     ` Andrew Morton
@ 2020-01-28  0:43     ` Wei Yang
  2020-01-28 15:59       ` Dmitry Osipenko
  2020-01-29  9:47     ` Russell King - ARM Linux admin
  2 siblings, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-28  0:43 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel, Russell King - ARM Linux

On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>18.01.2020 02:22, Wei Yang wrote:
>> Use the general helper instead of do it by hand.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  mm/mremap.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index c2af8ba4ba43..a258914f3ee1 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>  
>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>  		cond_resched();
>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> -		/* even if next overflowed, extent below will be ok */
>> +		next = pmd_addr_end(old_addr, old_end);
>>  		extent = next - old_addr;
>> -		if (extent > old_end - old_addr)
>> -			extent = old_end - old_addr;
>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>  		if (!old_pmd)
>>  			continue;
>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>  
>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>  			break;
>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>  		if (extent > next - new_addr)
>>  			extent = next - new_addr;
>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> 
>
>Hello Wei,
>
>Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>Tegra (ARM32):
>
>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>

Thanks.

Would you mind letting me know which test case you are running? Or is
the special thing here that this is a 32-bit platform?

>and eventually kernel hangs.
>
>Git's bisection points to this patch and reverting it helps. Please fix,
>thanks in advance.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-28  0:43     ` Wei Yang
@ 2020-01-28 15:59       ` Dmitry Osipenko
  2020-01-28 23:29         ` Wei Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-28 15:59 UTC (permalink / raw)
  To: Wei Yang, Russell King - ARM Linux
  Cc: akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi, thellstrom,
	Thierry Reding, Jon Hunter, linux-kernel, linux-mm, linux-tegra,
	linux-arm-kernel

28.01.2020 03:43, Wei Yang wrote:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>>> Use the general helper instead of do it by hand.
>>>
>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> ---
>>>  mm/mremap.c | 7 ++-----
>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>  		cond_resched();
>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> -		/* even if next overflowed, extent below will be ok */
>>> +		next = pmd_addr_end(old_addr, old_end);
>>>  		extent = next - old_addr;
>>> -		if (extent > old_end - old_addr)
>>> -			extent = old_end - old_addr;
>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>  		if (!old_pmd)
>>>  			continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>  			break;
>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>  		if (extent > next - new_addr)
>>>  			extent = next - new_addr;
>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
> 
> Thanks.
> 
> Would you mind letting me know which case you are testing?

Nothing special, systemd starts to fall apart during boot.

> Or the special thing is 32-bit platform?
I have limited knowledge of mm/, so I can't provide a detailed explanation.

Please take a look at this:

[1]
https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210

[2]
https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
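A minimal model of why the generic helper misbehaves here, assuming
(from the first link above, not verified here) that ARM's 2-level
configuration defines pmd_addr_end(addr, end) as simply (end). With
that definition, one loop iteration's extent covers the whole
remaining range rather than stopping at each PMD boundary:

```c
#include <assert.h>

#define PMD_SIZE 0x200000UL          /* illustrative 2 MiB PMD */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Generic definition: step to the next PMD boundary, clamped to end. */
static unsigned long generic_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* ARM 2-level style, as assumed from the link: the whole range at
 * once, so the per-PMD stepping that move_page_tables() relies on
 * disappears. */
static unsigned long arm2l_pmd_addr_end(unsigned long addr, unsigned long end)
{
	(void)addr;
	return end;
}
```

On a range spanning several PMDs the two definitions diverge, which
would let a single move_ptes() call run past one page table.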

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-28 15:59       ` Dmitry Osipenko
@ 2020-01-28 23:29         ` Wei Yang
  2020-01-28 23:35           ` Dmitry Osipenko
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-28 23:29 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, Russell King - ARM Linux, akpm, dan.j.williams,
	aneesh.kumar, kirill, yang.shi, thellstrom, Thierry Reding,
	Jon Hunter, linux-kernel, linux-mm, linux-tegra,
	linux-arm-kernel

On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>28.01.2020 03:43, Wei Yang wrote:
>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>> 18.01.2020 02:22, Wei Yang wrote:
>>>> Use the general helper instead of do it by hand.
>>>>
>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>> ---
>>>>  mm/mremap.c | 7 ++-----
>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>> --- a/mm/mremap.c
>>>> +++ b/mm/mremap.c
>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>  
>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>  		cond_resched();
>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>> -		/* even if next overflowed, extent below will be ok */
>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>  		extent = next - old_addr;
>>>> -		if (extent > old_end - old_addr)
>>>> -			extent = old_end - old_addr;
>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>  		if (!old_pmd)
>>>>  			continue;
>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>  
>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>  			break;
>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>  		if (extent > next - new_addr)
>>>>  			extent = next - new_addr;
>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>
>>>
>>> Hello Wei,
>>>
>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>> Tegra (ARM32):
>>>
>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>
>> 
>> Thanks.
>> 
>> Would you mind letting me know which case you are testing?
>
>Nothing special, systemd starts to fall apart during boot.
>
>> Or the special thing is 32-bit platform?
>I have a limited knowledge about mm/, so can't provide detailed explanation.
>
>Please take a look at this:
>
>[1]
>https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>
>[2]
>https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>
>[3]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b

Thanks, I see the difference here.

If this is the case, we can't use pmd_addr_end() to simplify the calculation.
This changes the behavior.

I will prepare another patch set to fix this. Would you mind helping me
verify it on your platform?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-28 23:29         ` Wei Yang
@ 2020-01-28 23:35           ` Dmitry Osipenko
  2020-01-29  0:28             ` Wei Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-28 23:35 UTC (permalink / raw)
  To: Wei Yang
  Cc: Russell King - ARM Linux, akpm, dan.j.williams, aneesh.kumar,
	kirill, yang.shi, thellstrom, Thierry Reding, Jon Hunter,
	linux-kernel, linux-mm, linux-tegra, linux-arm-kernel

29.01.2020 02:29, Wei Yang wrote:
> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>> 28.01.2020 03:43, Wei Yang wrote:
>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>> 18.01.2020 02:22, Wei Yang wrote:
>>>>> Use the general helper instead of do it by hand.
>>>>>
>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>> ---
>>>>>  mm/mremap.c | 7 ++-----
>>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>> --- a/mm/mremap.c
>>>>> +++ b/mm/mremap.c
>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>  
>>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>>  		cond_resched();
>>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>> -		/* even if next overflowed, extent below will be ok */
>>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>>  		extent = next - old_addr;
>>>>> -		if (extent > old_end - old_addr)
>>>>> -			extent = old_end - old_addr;
>>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>>  		if (!old_pmd)
>>>>>  			continue;
>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>  
>>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>>  			break;
>>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>>  		if (extent > next - new_addr)
>>>>>  			extent = next - new_addr;
>>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>
>>>>
>>>> Hello Wei,
>>>>
>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>> Tegra (ARM32):
>>>>
>>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>
>>>
>>> Thanks.
>>>
>>> Would you mind letting me know which case you are testing?
>>
>> Nothing special, systemd starts to fall apart during boot.
>>
>>> Or the special thing is 32-bit platform?
>> I have a limited knowledge about mm/, so can't provide detailed explanation.
>>
>> Please take a look at this:
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>
>> [2]
>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>
>> [3]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
> 
> Thanks, I see the difference here.
> 
> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
> This changes the behavior.
> 
> I'll prepare another patch set to fix this. Would you mind helping me
> verify on your platform?

Sure, please feel free to CC me on that patch.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-28 23:35           ` Dmitry Osipenko
@ 2020-01-29  0:28             ` Wei Yang
  2020-01-29 18:56               ` Dmitry Osipenko
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-29  0:28 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, Russell King - ARM Linux, akpm, dan.j.williams,
	aneesh.kumar, kirill, yang.shi, thellstrom, Thierry Reding,
	Jon Hunter, linux-kernel, linux-mm, linux-tegra,
	linux-arm-kernel

On Wed, Jan 29, 2020 at 02:35:25AM +0300, Dmitry Osipenko wrote:
>29.01.2020 02:29, Wei Yang wrote:
>> On Tue, Jan 28, 2020 at 06:59:48PM +0300, Dmitry Osipenko wrote:
>>> 28.01.2020 03:43, Wei Yang wrote:
>>>> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>>>>> 18.01.2020 02:22, Wei Yang wrote:
>>>>>> Use the general helper instead of doing it by hand.
>>>>>>
>>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>>> ---
>>>>>>  mm/mremap.c | 7 ++-----
>>>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index c2af8ba4ba43..a258914f3ee1 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>  
>>>>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>>>>  		cond_resched();
>>>>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>>>>> -		/* even if next overflowed, extent below will be ok */
>>>>>> +		next = pmd_addr_end(old_addr, old_end);
>>>>>>  		extent = next - old_addr;
>>>>>> -		if (extent > old_end - old_addr)
>>>>>> -			extent = old_end - old_addr;
>>>>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>>>>  		if (!old_pmd)
>>>>>>  			continue;
>>>>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>>>>  
>>>>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>>>>  			break;
>>>>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>>>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>>>>  		if (extent > next - new_addr)
>>>>>>  			extent = next - new_addr;
>>>>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>>>>
>>>>>
>>>>> Hello Wei,
>>>>>
>>>>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>>>>> Tegra (ARM32):
>>>>>
>>>>>  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Would you mind letting me know which case you are testing?
>>>
>>> Nothing special, systemd starts to fall apart during boot.
>>>
>>>> Or is the special thing the 32-bit platform?
>>> I have limited knowledge of mm/, so I can't provide a detailed explanation.
>>>
>>> Please take a look at this:
>>>
>>> [1]
>>> https://elixir.bootlin.com/linux/v5.5/source/arch/arm/include/asm/pgtable-2level.h#L210
>>>
>>> [2]
>>> https://elixir.bootlin.com/linux/v5.5/source/include/asm-generic/pgtable.h#L549
>>>
>>> [3]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0ba10b512eb2e2a3888b6e6cc0e089f5e7a191b
>> 
>> Thanks, I see the difference here.
>> 
>> If this is the case, we can't use pmd_addr_end() to simplify the calculation.
>> This changes the behavior.
>> 
>> I'll prepare another patch set to fix this. Would you mind helping me
>> verify on your platform?
>
>Sure, please feel free to CC me on that patch.

Thanks, you are in the cc list of v2.

Hope this one works fine on ARM.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-26 14:47   ` Dmitry Osipenko
  2020-01-27  2:59     ` Andrew Morton
  2020-01-28  0:43     ` Wei Yang
@ 2020-01-29  9:47     ` Russell King - ARM Linux admin
  2020-01-29 14:21       ` Dmitry Osipenko
  2020-01-29 21:57       ` Wei Yang
  2 siblings, 2 replies; 26+ messages in thread
From: Russell King - ARM Linux admin @ 2020-01-29  9:47 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel

On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> 18.01.2020 02:22, Wei Yang wrote:
> > Use the general helper instead of doing it by hand.
> > 
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > ---
> >  mm/mremap.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c2af8ba4ba43..a258914f3ee1 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		/* even if next overflowed, extent below will be ok */
> > +		next = pmd_addr_end(old_addr, old_end);
> >  		extent = next - old_addr;
> > -		if (extent > old_end - old_addr)
> > -			extent = old_end - old_addr;
> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >  		if (!old_pmd)
> >  			continue;
> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  
> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >  			break;
> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >  		if (extent > next - new_addr)
> >  			extent = next - new_addr;
> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> > 
> 
> Hello Wei,
> 
> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> Tegra (ARM32):
> 
>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> 
> and eventually kernel hangs.
> 
> Git's bisection points to this patch and reverting it helps. Please fix,
> thanks in advance.

The above is definitely wrong - pXX_addr_end() are designed to be used
with an address index within the pXX table and the address index
of either the last entry in the same pXX table or the beginning of the
_next_ pXX table.  Arbitrary end address indices are not allowed.

When page tables are "rolled up" because levels don't exist, it is common
practice for these macros to just return their end address index.
Hence, if they are used with arbitrary end address indices, then the
iteration will fail.

The only way to do this is:

	next = pmd_addr_end(old_addr,
			pud_addr_end(old_addr,
				p4d_addr_end(old_addr,
					pgd_addr_end(old_addr, old_end))));

which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
the correct end argument.  However, that's more complex and verbose,
and likely less efficient than the current code.

I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
and trying to "clean it up" will just result in less efficient or
broken code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29  9:47     ` Russell King - ARM Linux admin
@ 2020-01-29 14:21       ` Dmitry Osipenko
  2020-01-29 21:57       ` Wei Yang
  1 sibling, 0 replies; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-29 14:21 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Wei Yang, akpm, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel

29.01.2020 12:47, Russell King - ARM Linux admin wrote:
> On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>>> Use the general helper instead of doing it by hand.
>>>
>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> ---
>>>  mm/mremap.c | 7 ++-----
>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index c2af8ba4ba43..a258914f3ee1 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>>>  		cond_resched();
>>> -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>>> -		/* even if next overflowed, extent below will be ok */
>>> +		next = pmd_addr_end(old_addr, old_end);
>>>  		extent = next - old_addr;
>>> -		if (extent > old_end - old_addr)
>>> -			extent = old_end - old_addr;
>>>  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>>>  		if (!old_pmd)
>>>  			continue;
>>> @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>>>  
>>>  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>>>  			break;
>>> -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>>> +		next = pmd_addr_end(new_addr, new_addr + len);
>>>  		if (extent > next - new_addr)
>>>  			extent = next - new_addr;
>>>  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>>>
>>
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
> 
> The above is definitely wrong - pXX_addr_end() are designed to be used
> with an address index within the pXX table and the address index
> of either the last entry in the same pXX table or the beginning of the
> _next_ pXX table.  Arbitrary end address indices are not allowed.
> 
> When page tables are "rolled up" because levels don't exist, it is common
> practice for these macros to just return their end address index.
> Hence, if they are used with arbitrary end address indices, then the
> iteration will fail.
> 
> The only way to do this is:
> 
> 	next = pmd_addr_end(old_addr,
> 			pud_addr_end(old_addr,
> 				p4d_addr_end(old_addr,
> 					pgd_addr_end(old_addr, old_end))));
> 
> which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> the correct end argument.  However, that's more complex and verbose,
> and likely less efficient than the current code.
> 
> I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> and trying to "clean it up" will just result in less efficient or
> broken code.
> 

Hello Russell,

Thank you very much for the extra clarification!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-27  2:59     ` Andrew Morton
@ 2020-01-29 17:18       ` Dmitry Osipenko
  2020-01-29 23:59         ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-29 17:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wei Yang, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel, Russell King - ARM Linux

27.01.2020 05:59, Andrew Morton wrote:
> On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:
...
>> Hello Wei,
>>
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>>
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>>
>> and eventually kernel hangs.
>>
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
> 
> Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> patches.
> 

Hello Andrew,

FYI, I'm still seeing the offending patches in today's next-20200129.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29  0:28             ` Wei Yang
@ 2020-01-29 18:56               ` Dmitry Osipenko
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Osipenko @ 2020-01-29 18:56 UTC (permalink / raw)
  To: Wei Yang
  Cc: Russell King - ARM Linux, akpm, dan.j.williams, aneesh.kumar,
	kirill, yang.shi, thellstrom, Thierry Reding, Jon Hunter,
	linux-kernel, linux-mm, linux-tegra, linux-arm-kernel

29.01.2020 03:28, Wei Yang wrote:
...
>>> I'll prepare another patch set to fix this. Would you mind helping me
>>> verify on your platform?
>>
>> Sure, please feel free to CC me on that patch.
> 
> Thanks, you are in the cc list of v2.
> 
> Hope this one works fine on ARM.

Okay, I'll reply to the v2 after some more extensive testing (tomorrow).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29  9:47     ` Russell King - ARM Linux admin
  2020-01-29 14:21       ` Dmitry Osipenko
@ 2020-01-29 21:57       ` Wei Yang
  2020-01-29 23:24         ` Russell King - ARM Linux admin
  1 sibling, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-29 21:57 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Dmitry Osipenko, Wei Yang, akpm, dan.j.williams, aneesh.kumar,
	kirill, yang.shi, thellstrom, Thierry Reding, Jon Hunter,
	linux-kernel, linux-mm, linux-tegra, linux-arm-kernel

On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> 18.01.2020 02:22, Wei Yang wrote:
>> > Use the general helper instead of doing it by hand.
>> > 
>> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> > ---
>> >  mm/mremap.c | 7 ++-----
>> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/mm/mremap.c b/mm/mremap.c
>> > index c2af8ba4ba43..a258914f3ee1 100644
>> > --- a/mm/mremap.c
>> > +++ b/mm/mremap.c
>> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >  
>> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >  		cond_resched();
>> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> > -		/* even if next overflowed, extent below will be ok */
>> > +		next = pmd_addr_end(old_addr, old_end);
>> >  		extent = next - old_addr;
>> > -		if (extent > old_end - old_addr)
>> > -			extent = old_end - old_addr;
>> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >  		if (!old_pmd)
>> >  			continue;
>> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >  
>> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >  			break;
>> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >  		if (extent > next - new_addr)
>> >  			extent = next - new_addr;
>> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> > 
>> 
>> Hello Wei,
>> 
>> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> Tegra (ARM32):
>> 
>>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> 
>> and eventually kernel hangs.
>> 
>> Git's bisection points to this patch and reverting it helps. Please fix,
>> thanks in advance.
>
>The above is definitely wrong - pXX_addr_end() are designed to be used
>with an address index within the pXX table and the address index
>of either the last entry in the same pXX table or the beginning of the
>_next_ pXX table.  Arbitrary end address indices are not allowed.
>

#define pmd_addr_end(addr, end)						\
({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
})

If my understanding is correct, the definition here aligns the addr to the
next PMD boundary or to end.

I don't see the possibility of crossing into another PMD. Do I miss something?

>When page tables are "rolled up" because levels don't exist, it is common
>practice for these macros to just return their end address index.
>Hence, if they are used with arbitrary end address indices, then the
>iteration will fail.
>
>The only way to do this is:
>
>	next = pmd_addr_end(old_addr,
>			pud_addr_end(old_addr,
>				p4d_addr_end(old_addr,
>					pgd_addr_end(old_addr, old_end))));
>
>which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>the correct end argument.  However, that's more complex and verbose,
>and likely less efficient than the current code.
>
>I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>and trying to "clean it up" will just result in less efficient or
>broken code.
>
>-- 
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29 21:57       ` Wei Yang
@ 2020-01-29 23:24         ` Russell King - ARM Linux admin
  2020-01-30  1:30           ` Wei Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux admin @ 2020-01-29 23:24 UTC (permalink / raw)
  To: Wei Yang
  Cc: Dmitry Osipenko, akpm, dan.j.williams, aneesh.kumar, kirill,
	yang.shi, thellstrom, Thierry Reding, Jon Hunter, linux-kernel,
	linux-mm, linux-tegra, linux-arm-kernel

On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> 18.01.2020 02:22, Wei Yang wrote:
> >> > Use the general helper instead of doing it by hand.
> >> > 
> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> > ---
> >> >  mm/mremap.c | 7 ++-----
> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >> > 
> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> > --- a/mm/mremap.c
> >> > +++ b/mm/mremap.c
> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >  
> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> >  		cond_resched();
> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> > -		/* even if next overflowed, extent below will be ok */
> >> > +		next = pmd_addr_end(old_addr, old_end);
> >> >  		extent = next - old_addr;
> >> > -		if (extent > old_end - old_addr)
> >> > -			extent = old_end - old_addr;
> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> >  		if (!old_pmd)
> >> >  			continue;
> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >  
> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> >  			break;
> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >> >  		if (extent > next - new_addr)
> >> >  			extent = next - new_addr;
> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> > 
> >> 
> >> Hello Wei,
> >> 
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >> 
> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >> 
> >> and eventually kernel hangs.
> >> 
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> >
> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >with an address index within the pXX table and the address index
> >of either the last entry in the same pXX table or the beginning of the
> >_next_ pXX table.  Arbitrary end address indices are not allowed.
> >
> 
> #define pmd_addr_end(addr, end)						\
> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
> })
> 
> If my understanding is correct, the definition here aligns the addr to the
> next PMD boundary or to end.
> 
> I don't see the possibility of crossing into another PMD. Do I miss something?

Look at the definitions of p*_addr_end() that are used when page tables
are rolled up.

> >When page tables are "rolled up" because levels don't exist, it is common
> >practice for these macros to just return their end address index.
> >Hence, if they are used with arbitrary end address indices, then the
> >iteration will fail.
> >
> >The only way to do this is:
> >
> >	next = pmd_addr_end(old_addr,
> >			pud_addr_end(old_addr,
> >				p4d_addr_end(old_addr,
> >					pgd_addr_end(old_addr, old_end))));
> >
> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
> >the correct end argument.  However, that's more complex and verbose,
> >and likely less efficient than the current code.
> >
> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
> >and trying to "clean it up" will just result in less efficient or
> >broken code.
> >
> >-- 
> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> >According to speedtest.net: 11.9Mbps down 500kbps up
> 
> -- 
> Wei Yang
> Help you, Help me
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29 17:18       ` Dmitry Osipenko
@ 2020-01-29 23:59         ` Andrew Morton
  2020-01-30  0:28           ` Stephen Rothwell
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2020-01-29 23:59 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Wei Yang, dan.j.williams, aneesh.kumar, kirill, yang.shi,
	thellstrom, Thierry Reding, Jon Hunter, linux-kernel, linux-mm,
	linux-tegra, linux-arm-kernel, Russell King - ARM Linux,
	Stephen Rothwell

On Wed, 29 Jan 2020 20:18:56 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:

> 27.01.2020 05:59, Andrew Morton wrote:
> > On Sun, 26 Jan 2020 17:47:57 +0300 Dmitry Osipenko <digetx@gmail.com> wrote:
> ...
> >> Hello Wei,
> >>
> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> Tegra (ARM32):
> >>
> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >>
> >> and eventually kernel hangs.
> >>
> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> thanks in advance.
> > 
> > Thanks.  I had these tagged for 5.7-rc1 anyway, so I'll drop all five
> > patches.
> > 
> 
> Hello Andrew,
> 
> FYI, I'm still seeing the offending patches in the today's next-20200129.

hm, me too.  Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
use pmd_addr_end to calculate next in move_page_tables()") is still in
the -next lineup?  It was dropped from -mm on Jan 26?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29 23:59         ` Andrew Morton
@ 2020-01-30  0:28           ` Stephen Rothwell
  0 siblings, 0 replies; 26+ messages in thread
From: Stephen Rothwell @ 2020-01-30  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dmitry Osipenko, Wei Yang, dan.j.williams, aneesh.kumar, kirill,
	yang.shi, thellstrom, Thierry Reding, Jon Hunter, linux-kernel,
	linux-mm, linux-tegra, linux-arm-kernel,
	Russell King - ARM Linux

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

Hi Andrew,

On Wed, 29 Jan 2020 15:59:07 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> hm, me too.  Stephen, it's unexpected that 9ff4452912d63f ("mm/mremap:
> use pmd_addr_end to calculate next in move_page_tables()") is still in
> the -next lineup?  It was dropped from -mm on Jan 26?

The mmotm 2020-01-28-20-05 arrived just too late for yesterday's
linux-next (it will be in today's linux-next).  The mmotm before that
was 2020-01-23-21-12.  I attempt to fetch mmotm (along with all the
remaining unmerged trees) about every 30 minutes (sometimes more often)
during the day.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-29 23:24         ` Russell King - ARM Linux admin
@ 2020-01-30  1:30           ` Wei Yang
  2020-01-30 14:15             ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Yang @ 2020-01-30  1:30 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Wei Yang, Dmitry Osipenko, akpm, dan.j.williams, aneesh.kumar,
	kirill, yang.shi, thellstrom, Thierry Reding, Jon Hunter,
	linux-kernel, linux-mm, linux-tegra, linux-arm-kernel

On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> 18.01.2020 02:22, Wei Yang wrote:
>> >> > Use the general helper instead of doing it by hand.
>> >> > 
>> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> > ---
>> >> >  mm/mremap.c | 7 ++-----
>> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> >> > 
>> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> > --- a/mm/mremap.c
>> >> > +++ b/mm/mremap.c
>> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >  
>> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> >  		cond_resched();
>> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> > -		/* even if next overflowed, extent below will be ok */
>> >> > +		next = pmd_addr_end(old_addr, old_end);
>> >> >  		extent = next - old_addr;
>> >> > -		if (extent > old_end - old_addr)
>> >> > -			extent = old_end - old_addr;
>> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> >  		if (!old_pmd)
>> >> >  			continue;
>> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >  
>> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> >  			break;
>> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >> >  		if (extent > next - new_addr)
>> >> >  			extent = next - new_addr;
>> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> > 
>> >> 
>> >> Hello Wei,
>> >> 
>> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> Tegra (ARM32):
>> >> 
>> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >> 
>> >> and eventually kernel hangs.
>> >> 
>> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> thanks in advance.
>> >
>> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >with an address index within the pXX table and the address index
>> >of either the last entry in the same pXX table or the beginning of the
>> >_next_ pXX table.  Arbitrary end address indices are not allowed.
>> >
>> 
>> #define pmd_addr_end(addr, end)						\
>> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
>> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
>> })
>> 
>> If my understanding is correct, the definition here aligns the addr to the
>> next PMD boundary or to end.
>> 
>> I don't see the possibility of crossing into another PMD. Do I miss something?
>
>Look at the definitions of p*_addr_end() that are used when page tables
>are rolled up.
>

Sorry, I don't get your point.

What's the meaning of "roll up" here?

Would you mind giving me an example? I see pmd_addr_end() is not used in many
places in the core kernel. Glancing at those usages, all of them call it as
pmd_addr_end(addr, end), with seemingly no special handling of the end address.

Or do you mean the case when pmd_addr_end() is defined to return "end" directly?

>> >When page tables are "rolled up" because levels don't exist, it is common
>> >practice for these macros to just return their end address index.
>> >Hence, if they are used with arbitrary end address indices, then the
>> >iteration will fail.
>> >
>> >The only way to do this is:
>> >
>> >	next = pmd_addr_end(old_addr,
>> >			pud_addr_end(old_addr,
>> >				p4d_addr_end(old_addr,
>> >					pgd_addr_end(old_addr, old_end))));
>> >
>> >which gives pmd_addr_end() (and each of the intermediate pXX_addr_end())
>> >the correct end argument.  However, that's more complex and verbose,
>> >and likely less efficient than the current code.
>> >
>> >I'd suggest that there's nothing to "fix" in the v5.5 code wrt this,
>> >and trying to "clean it up" will just result in less efficient or
>> >broken code.
>> >
>> >-- 
>> >RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>> >FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>> >According to speedtest.net: 11.9Mbps down 500kbps up
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>> 
>
>-- 
>RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
>According to speedtest.net: 11.9Mbps down 500kbps up

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-30  1:30           ` Wei Yang
@ 2020-01-30 14:15             ` Russell King - ARM Linux admin
  2020-01-30 21:57               ` Wei Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux admin @ 2020-01-30 14:15 UTC (permalink / raw)
  To: Wei Yang
  Cc: Dmitry Osipenko, akpm, dan.j.williams, aneesh.kumar, kirill,
	yang.shi, thellstrom, Thierry Reding, Jon Hunter, linux-kernel,
	linux-mm, linux-tegra, linux-arm-kernel

On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
> >> >> 18.01.2020 02:22, Wei Yang wrote:
> >> >> > Use the general helper instead of doing it by hand.
> >> >> > 
> >> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >> > ---
> >> >> >  mm/mremap.c | 7 ++-----
> >> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >> >> > 
> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
> >> >> > --- a/mm/mremap.c
> >> >> > +++ b/mm/mremap.c
> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >  
> >> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >> >> >  		cond_resched();
> >> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
> >> >> > -		/* even if next overflowed, extent below will be ok */
> >> >> > +		next = pmd_addr_end(old_addr, old_end);
> >> >> >  		extent = next - old_addr;
> >> >> > -		if (extent > old_end - old_addr)
> >> >> > -			extent = old_end - old_addr;
> >> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
> >> >> >  		if (!old_pmd)
> >> >> >  			continue;
> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >> >> >  
> >> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
> >> >> >  			break;
> >> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
> >> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
> >> >> >  		if (extent > next - new_addr)
> >> >> >  			extent = next - new_addr;
> >> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
> >> >> > 
> >> >> 
> >> >> Hello Wei,
> >> >> 
> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
> >> >> Tegra (ARM32):
> >> >> 
> >> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
> >> >> 
> >> >> and eventually kernel hangs.
> >> >> 
> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
> >> >> thanks in advance.
> >> >
> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
> >> >with an address index within the pXX table and the address index
> >> >of either the last entry in the same pXX table or the beginning of the
> >> >_next_ pXX table.  Arbitrary end address indices are not allowed.
> >> >
> >> 
> >> #define pmd_addr_end(addr, end)						\
> >> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
> >> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
> >> })
> >> 
> >> If my understanding is correct, the definition here aligns the addr to
> >> the next PMD boundary, or to end.
> >> 
> >> I don't see how it could cross into another PMD. Am I missing something?
> >
> >Look at the definition of p*_addr_end() that are used when page tables
> >are rolled up.
> >
> 
> Sorry, I don't get your point.
> 
> What's the meaning of "roll up" here?
> 
> Would you mind giving me an example? I see pmd_addr_end() is used in only a
> few places in the core kernel. Glancing at those usages, all of them call it
> as pmd_addr_end(addr, end); there seems to be no special handling of the end
> address.
> 
> Or you mean the case when pmd_addr_end() is defined to return "end" directly? 

Not all hardware has five levels of page tables.  When hardware does not
have five levels, it is common to "roll up" some of the page tables into
others.

There are generic ways to implement this, which include using:

include/asm-generic/pgtable-nop4d.h
include/asm-generic/pgtable-nopud.h
include/asm-generic/pgtable-nopmd.h

and then there are architecture-specific ways to implement this.  32-bit
ARM takes its implementation for PMD not from the generic version, which
post-dates 32-bit ARM, but from how page table roll-up was implemented
back at the time when the current ARM scheme was devised.  The generic
scheme is unsuitable for 32-bit ARM since we do more than just roll up
page tables, but this is irrelevant for this discussion.

All three of the generic implementations, and 32-bit ARM, define the
pXX_addr_end() macros thusly:

include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)

since, as I stated, pXX_addr_end() expects its "end" argument to be
the address index of the next entry in the immediately upper page
table level, or the address index of the last entry we wish to
process, whichever is smaller.

If it's larger than the address index of the next entry in the
immediately upper page table level, then the effect of all these
macros will be to walk off the end of the current level of page
table.

To see how they _should_ be used, see the loops in free_pgd_range()
and the free_pXX_range() functions called from there and below.

In all cases when the pXX_addr_end() macro was introduced, what I state
above holds true - and I believe still holds true today, until this
patch that has reportedly caused issues.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables()
  2020-01-30 14:15             ` Russell King - ARM Linux admin
@ 2020-01-30 21:57               ` Wei Yang
  0 siblings, 0 replies; 26+ messages in thread
From: Wei Yang @ 2020-01-30 21:57 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Wei Yang, Dmitry Osipenko, akpm, dan.j.williams, aneesh.kumar,
	kirill, yang.shi, thellstrom, Thierry Reding, Jon Hunter,
	linux-kernel, linux-mm, linux-tegra, linux-arm-kernel

On Thu, Jan 30, 2020 at 02:15:05PM +0000, Russell King - ARM Linux admin wrote:
>On Thu, Jan 30, 2020 at 09:30:00AM +0800, Wei Yang wrote:
>> On Wed, Jan 29, 2020 at 11:24:41PM +0000, Russell King - ARM Linux admin wrote:
>> >On Thu, Jan 30, 2020 at 05:57:45AM +0800, Wei Yang wrote:
>> >> On Wed, Jan 29, 2020 at 09:47:38AM +0000, Russell King - ARM Linux admin wrote:
>> >> >On Sun, Jan 26, 2020 at 05:47:57PM +0300, Dmitry Osipenko wrote:
>> >> >> 18.01.2020 02:22, Wei Yang wrote:
>> >> >> > Use the general helper instead of doing it by hand.
>> >> >> > 
>> >> >> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >> > ---
>> >> >> >  mm/mremap.c | 7 ++-----
>> >> >> >  1 file changed, 2 insertions(+), 5 deletions(-)
>> >> >> > 
>> >> >> > diff --git a/mm/mremap.c b/mm/mremap.c
>> >> >> > index c2af8ba4ba43..a258914f3ee1 100644
>> >> >> > --- a/mm/mremap.c
>> >> >> > +++ b/mm/mremap.c
>> >> >> > @@ -253,11 +253,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >  
>> >> >> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
>> >> >> >  		cond_resched();
>> >> >> > -		next = (old_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > -		/* even if next overflowed, extent below will be ok */
>> >> >> > +		next = pmd_addr_end(old_addr, old_end);
>> >> >> >  		extent = next - old_addr;
>> >> >> > -		if (extent > old_end - old_addr)
>> >> >> > -			extent = old_end - old_addr;
>> >> >> >  		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
>> >> >> >  		if (!old_pmd)
>> >> >> >  			continue;
>> >> >> > @@ -301,7 +298,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>> >> >> >  
>> >> >> >  		if (pte_alloc(new_vma->vm_mm, new_pmd))
>> >> >> >  			break;
>> >> >> > -		next = (new_addr + PMD_SIZE) & PMD_MASK;
>> >> >> > +		next = pmd_addr_end(new_addr, new_addr + len);
>> >> >> >  		if (extent > next - new_addr)
>> >> >> >  			extent = next - new_addr;
>> >> >> >  		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
>> >> >> > 
>> >> >> 
>> >> >> Hello Wei,
>> >> >> 
>> >> >> Starting with next-20200122, I'm seeing the following in KMSG on NVIDIA
>> >> >> Tegra (ARM32):
>> >> >> 
>> >> >>   BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:190
>> >> >> 
>> >> >> and eventually kernel hangs.
>> >> >> 
>> >> >> Git's bisection points to this patch and reverting it helps. Please fix,
>> >> >> thanks in advance.
>> >> >
>> >> >The above is definitely wrong - pXX_addr_end() are designed to be used
>> >> >with an address index within the pXX table and the address index
>> >> >of either the last entry in the same pXX table or the beginning of the
>> >> >_next_ pXX table.  Arbitrary end address indices are not allowed.
>> >> >
>> >> 
>> >> #define pmd_addr_end(addr, end)						\
>> >> ({	unsigned long __boundary = ((addr) + PMD_SIZE) & PMD_MASK;	\
>> >> 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
>> >> })
>> >> 
>> >> If my understanding is correct, the definition here aligns the addr to
>> >> the next PMD boundary, or to end.
>> >> 
>> >> I don't see how it could cross into another PMD. Am I missing something?
>> >
>> >Look at the definition of p*_addr_end() that are used when page tables
>> >are rolled up.
>> >
>> 
>> Sorry, I don't get your point.
>> 
>> What's the meaning of "roll up" here?
>> 
>> Would you mind giving me an example? I see pmd_addr_end() is used in only a
>> few places in the core kernel. Glancing at those usages, all of them call it
>> as pmd_addr_end(addr, end); there seems to be no special handling of the end
>> address.
>> 
>> Or you mean the case when pmd_addr_end() is defined to return "end" directly? 
>
>Not all hardware has five levels of page tables.  When hardware does not
>have five levels, it is common to "roll up" some of the page tables into
>others.
>
>There are generic ways to implement this, which include using:
>
>include/asm-generic/pgtable-nop4d.h
>include/asm-generic/pgtable-nopud.h
>include/asm-generic/pgtable-nopmd.h
>
>and then there are architecture-specific ways to implement this.  32-bit
>ARM takes its implementation for PMD not from the generic version, which
>post-dates 32-bit ARM, but from how page table roll-up was implemented
>back at the time when the current ARM scheme was devised.  The generic
>scheme is unsuitable for 32-bit ARM since we do more than just roll up
>page tables, but this is irrelevant for this discussion.
>
>All three of the generic implementations, and 32-bit ARM, define the
>pXX_addr_end() macros thusly:
>
>include/asm-generic/pgtable-nop4d.h:#define p4d_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopmd.h:#define pmd_addr_end(addr, end) (end)
>include/asm-generic/pgtable-nopud.h:#define pud_addr_end(addr, end) (end)
>arch/arm/include/asm/pgtable-2level.h:#define pmd_addr_end(addr,end) (end)
>
>since, as I stated, pXX_addr_end() expects its "end" argument to be
>the address index of the next entry in the immediately upper page
>table level, or the address index of the last entry we wish to
>process, whichever is smaller.
>
>If it's larger than the address index of the next entry in the
>immediately upper page table level, then the effect of all these
>macros will be to walk off the end of the current level of page
>table.
>
>To see how they _should_ be used, see the loops in free_pgd_range()
>and the free_pXX_range() functions called from there and below.
>
>In all cases when the pXX_addr_end() macro was introduced, what I state
>above holds true - and I believe still holds true today, until this
>patch that has reportedly caused issues.
>

Thanks for your patience in explaining this to me.

I see your point now; this was my misunderstanding of the code.


-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-01-30 21:57 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 23:22 [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Wei Yang
2020-01-17 23:22 ` [PATCH 1/5] mm/mremap: format the check in move_normal_pmd() same as move_huge_pmd() Wei Yang
2020-01-17 23:22 ` [PATCH 2/5] mm/mremap: it is sure to have enough space when extent meets requirement Wei Yang
2020-01-17 23:22 ` [PATCH 3/5] mm/mremap: use pmd_addr_end to calculate next in move_page_tables() Wei Yang
2020-01-26 14:47   ` Dmitry Osipenko
2020-01-27  2:59     ` Andrew Morton
2020-01-29 17:18       ` Dmitry Osipenko
2020-01-29 23:59         ` Andrew Morton
2020-01-30  0:28           ` Stephen Rothwell
2020-01-28  0:43     ` Wei Yang
2020-01-28 15:59       ` Dmitry Osipenko
2020-01-28 23:29         ` Wei Yang
2020-01-28 23:35           ` Dmitry Osipenko
2020-01-29  0:28             ` Wei Yang
2020-01-29 18:56               ` Dmitry Osipenko
2020-01-29  9:47     ` Russell King - ARM Linux admin
2020-01-29 14:21       ` Dmitry Osipenko
2020-01-29 21:57       ` Wei Yang
2020-01-29 23:24         ` Russell King - ARM Linux admin
2020-01-30  1:30           ` Wei Yang
2020-01-30 14:15             ` Russell King - ARM Linux admin
2020-01-30 21:57               ` Wei Yang
2020-01-17 23:22 ` [PATCH 4/5] mm/mremap: calculate extent in one place Wei Yang
2020-01-17 23:22 ` [PATCH 5/5] mm/mremap: start addresses are properly aligned Wei Yang
2020-01-19  0:07 ` [PATCH 0/5] mm/mremap.c: cleanup move_page_tables() a little Andrew Morton
2020-01-19  2:11   ` Wei Yang
