linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mm: pgds getting out of sync after memory hot remove
@ 2017-05-19 18:01 Jérôme Glisse
  2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse
  2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov
  0 siblings, 2 replies; 6+ messages in thread
From: Jérôme Glisse @ 2017-05-19 18:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jérôme Glisse, Kirill A. Shutemov, Andrew Morton,
	Ingo Molnar, Michal Hocko, Mel Gorman

After memory hot remove it seems we do not synchronize pgds for kernel
virtual memory range (on vmemmap_free()). This seems bogus to me as it
means we are left with stall entry for process with mm != mm_init

Yet i am puzzle by the fact that i am only now hitting this issue. It
never was an issue with 4.12 or before ie HMM never triggered following
BUG_ON inside sync_global_pgds():

if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
   BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));


It seems that Kirill 5 level page table changes play a role in this
behavior change. I could not bisect because HMM is painfull to rebase
for each bisection step so that is just my best guess.


Am i missing something here ? Am i wrong in assuming that should sync
pgd on vmemmap_free() ? If so anyone have a good guess on why i am now
seeing the above BUG_ON ?

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>

Jérôme Glisse (1):
  x86/mm: synchronize pgd in vmemmap_free()

 arch/x86/mm/init_64.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

-- 
2.4.11

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] x86/mm: synchronize pgd in vmemmap_free()
  2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse
@ 2017-05-19 18:01 ` Jérôme Glisse
  2017-05-20  0:34   ` John Hubbard
  2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov
  1 sibling, 1 reply; 6+ messages in thread
From: Jérôme Glisse @ 2017-05-19 18:01 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jérôme Glisse, Kirill A. Shutemov, Andrew Morton,
	Ingo Molnar, Michal Hocko, Mel Gorman

When we free kernel virtual map we should synchronize p4d/pud for
all the pgds to avoid any stall entry in non canonical pgd.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
---
 arch/x86/mm/init_64.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ff95fe8..df753f8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -108,8 +108,6 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 		BUILD_BUG_ON(pgd_none(*pgd_ref));
 		p4d_ref = p4d_offset(pgd_ref, address);
 
-		if (p4d_none(*p4d_ref))
-			continue;
 
 		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
@@ -123,12 +121,16 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
 			spin_lock(pgt_lock);
 
-			if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
-				BUG_ON(p4d_page_vaddr(*p4d)
-				       != p4d_page_vaddr(*p4d_ref));
-
-			if (p4d_none(*p4d))
+			if (p4d_none(*p4d_ref)) {
 				set_p4d(p4d, *p4d_ref);
+			} else {
+				if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
+					BUG_ON(p4d_page_vaddr(*p4d)
+					       != p4d_page_vaddr(*p4d_ref));
+
+				if (p4d_none(*p4d))
+					set_p4d(p4d, *p4d_ref);
+			}
 
 			spin_unlock(pgt_lock);
 		}
@@ -1024,6 +1026,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct)
 void __ref vmemmap_free(unsigned long start, unsigned long end)
 {
 	remove_pagetable(start, end, false);
+	sync_global_pgds(start, end - 1);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mm: synchronize pgd in vmemmap_free()
  2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse
@ 2017-05-20  0:34   ` John Hubbard
  0 siblings, 0 replies; 6+ messages in thread
From: John Hubbard @ 2017-05-20  0:34 UTC (permalink / raw)
  To: Jérôme Glisse, linux-kernel, linux-mm
  Cc: Kirill A. Shutemov, Andrew Morton, Ingo Molnar, Michal Hocko, Mel Gorman

Hi Jerome,

On 05/19/2017 11:01 AM, Jérôme Glisse wrote:
> When we free kernel virtual map we should synchronize p4d/pud for
> all the pgds to avoid any stall entry in non canonical pgd.

"any stale entry in the non-canonical pgd", is what I think you meant to type there.

Also, it would be nice to clarify that commit description a bit: I'm not sure what is meant here by 
a "non-canonical pgd".

Also, it seems like the reshuffling of the internals of sync_global_pgds() deserves at least some 
mention here. More below.

> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mel Gorman <mgorman@suse.de>
> ---
>   arch/x86/mm/init_64.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index ff95fe8..df753f8 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -108,8 +108,6 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>   		BUILD_BUG_ON(pgd_none(*pgd_ref));
>   		p4d_ref = p4d_offset(pgd_ref, address);
>   
> -		if (p4d_none(*p4d_ref))
> -			continue;
>   
>   		spin_lock(&pgd_lock);
>   		list_for_each_entry(page, &pgd_list, lru) {
> @@ -123,12 +121,16 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>   			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>   			spin_lock(pgt_lock);
>   
> -			if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> -				BUG_ON(p4d_page_vaddr(*p4d)
> -				       != p4d_page_vaddr(*p4d_ref));
> -
> -			if (p4d_none(*p4d))
> +			if (p4d_none(*p4d_ref)) {
>   				set_p4d(p4d, *p4d_ref);

Is the intention really to set p4d to a zeroed *p4d_ref, or is that a mistake?

> +			} else {
> +				if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))

I think the code needs to be somewhat restructured, but as it stands, the above !p4d_none(*p4d_ref) 
will always be true, because first part of the if/else checked for the opposite case: 
p4d_none(*p4d_ref).  This is a side effect of moving that block of code.

> +					BUG_ON(p4d_page_vaddr(*p4d)
> +					       != p4d_page_vaddr(*p4d_ref));
> +
> +				if (p4d_none(*p4d))
> +					set_p4d(p4d, *p4d_ref);
> +			}
>   
>   			spin_unlock(pgt_lock);
>   		}
> @@ -1024,6 +1026,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct)
>   void __ref vmemmap_free(unsigned long start, unsigned long end)
>   {
>   	remove_pagetable(start, end, false);
> +	sync_global_pgds(start, end - 1);

This does fix the HMM crash that I was seeing in hmm-next.

thanks,
John Hubbard
NVIDIA

>   }
>   
>   #ifdef CONFIG_MEMORY_HOTREMOVE
> -- 
> 2.4.11
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove
  2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse
  2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse
@ 2017-05-22 13:12 ` Kirill A. Shutemov
  2017-05-22 14:11   ` Jerome Glisse
  1 sibling, 1 reply; 6+ messages in thread
From: Kirill A. Shutemov @ 2017-05-22 13:12 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton,
	Ingo Molnar, Michal Hocko, Mel Gorman

On Fri, May 19, 2017 at 02:01:26PM -0400, Jérôme Glisse wrote:
> After memory hot remove it seems we do not synchronize pgds for kernel
> virtual memory range (on vmemmap_free()). This seems bogus to me as it
> means we are left with stall entry for process with mm != mm_init
> 
> Yet i am puzzle by the fact that i am only now hitting this issue. It
> never was an issue with 4.12 or before ie HMM never triggered following
> BUG_ON inside sync_global_pgds():
> 
> if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>    BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));
> 
> 
> It seems that Kirill 5 level page table changes play a role in this
> behavior change. I could not bisect because HMM is painfull to rebase
> for each bisection step so that is just my best guess.
> 
> 
> Am i missing something here ? Am i wrong in assuming that should sync
> pgd on vmemmap_free() ? If so anyone have a good guess on why i am now
> seeing the above BUG_ON ?

What would we gain by syncing pgd on free? Stale pgds are fine as long as
they are not referenced (use-after-free case). Syncing is addtional work.

See af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()")
and 5372e155a28f ("x86/mm: Drop unused argument 'removed' from sync_global_pgds()").

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove
  2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov
@ 2017-05-22 14:11   ` Jerome Glisse
  2017-05-22 14:29     ` Kirill A. Shutemov
  0 siblings, 1 reply; 6+ messages in thread
From: Jerome Glisse @ 2017-05-22 14:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton,
	Ingo Molnar, Michal Hocko, Mel Gorman

On Mon, May 22, 2017 at 04:12:15PM +0300, Kirill A. Shutemov wrote:
> On Fri, May 19, 2017 at 02:01:26PM -0400, Jérôme Glisse wrote:
> > After memory hot remove it seems we do not synchronize pgds for kernel
> > virtual memory range (on vmemmap_free()). This seems bogus to me as it
> > means we are left with stall entry for process with mm != mm_init
> > 
> > Yet i am puzzle by the fact that i am only now hitting this issue. It
> > never was an issue with 4.12 or before ie HMM never triggered following
> > BUG_ON inside sync_global_pgds():
> > 
> > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> >    BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));
> > 
> > 
> > It seems that Kirill 5 level page table changes play a role in this
> > behavior change. I could not bisect because HMM is painfull to rebase
> > for each bisection step so that is just my best guess.
> > 
> > 
> > Am i missing something here ? Am i wrong in assuming that should sync
> > pgd on vmemmap_free() ? If so anyone have a good guess on why i am now
> > seeing the above BUG_ON ?
> 
> What would we gain by syncing pgd on free? Stale pgds are fine as long as
> they are not referenced (use-after-free case). Syncing is addtional work.

Well then how do i avoid the BUG_ON above ? Because the init_mm pgd is
clear but none of the stall entry in any other mm. So if i unplug memory
and replug memory at exact same address it tries to allocate new p4d/pud
for struct page area and then when sync_global_pgds() is call it goes
over the list of pgd and BUG_ON() :

if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
    BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));


So to me either above check need to go and we should overwritte pgd no
matter what or we should restore previous behavior. I don't mind either
one.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/mm: pgds getting out of sync after memory hot remove
  2017-05-22 14:11   ` Jerome Glisse
@ 2017-05-22 14:29     ` Kirill A. Shutemov
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill A. Shutemov @ 2017-05-22 14:29 UTC (permalink / raw)
  Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Andrew Morton,
	Michal Hocko, Mel Gorman

On Mon, May 22, 2017 at 10:11:51AM -0400, Jerome Glisse wrote:
> On Mon, May 22, 2017 at 04:12:15PM +0300, Kirill A. Shutemov wrote:
> > On Fri, May 19, 2017 at 02:01:26PM -0400, Jérôme Glisse wrote:
> > > After memory hot remove it seems we do not synchronize pgds for kernel
> > > virtual memory range (on vmemmap_free()). This seems bogus to me as it
> > > means we are left with stall entry for process with mm != mm_init
> > > 
> > > Yet i am puzzle by the fact that i am only now hitting this issue. It
> > > never was an issue with 4.12 or before ie HMM never triggered following
> > > BUG_ON inside sync_global_pgds():
> > > 
> > > if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> > >    BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));
> > > 
> > > 
> > > It seems that Kirill 5 level page table changes play a role in this
> > > behavior change. I could not bisect because HMM is painfull to rebase
> > > for each bisection step so that is just my best guess.
> > > 
> > > 
> > > Am i missing something here ? Am i wrong in assuming that should sync
> > > pgd on vmemmap_free() ? If so anyone have a good guess on why i am now
> > > seeing the above BUG_ON ?
> > 
> > What would we gain by syncing pgd on free? Stale pgds are fine as long as
> > they are not referenced (use-after-free case). Syncing is addtional work.
> 
> Well then how do i avoid the BUG_ON above ? Because the init_mm pgd is
> clear but none of the stall entry in any other mm. So if i unplug memory
> and replug memory at exact same address it tries to allocate new p4d/pud
> for struct page area and then when sync_global_pgds() is call it goes
> over the list of pgd and BUG_ON() :
> 
> if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>     BUG_ON(p4d_page_vaddr(*p4d) != p4d_page_vaddr(*p4d_ref));
> 
> 
> So to me either above check need to go and we should overwritte pgd no
> matter what or we should restore previous behavior. I don't mind either
> one.

I would prefer to drop the BUG_ON.

Ingo?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-05-22 14:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-19 18:01 [PATCH] x86/mm: pgds getting out of sync after memory hot remove Jérôme Glisse
2017-05-19 18:01 ` [PATCH] x86/mm: synchronize pgd in vmemmap_free() Jérôme Glisse
2017-05-20  0:34   ` John Hubbard
2017-05-22 13:12 ` [PATCH] x86/mm: pgds getting out of sync after memory hot remove Kirill A. Shutemov
2017-05-22 14:11   ` Jerome Glisse
2017-05-22 14:29     ` Kirill A. Shutemov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).