linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
@ 2017-05-31 15:03 Jérôme Glisse
  2017-06-01  9:57 ` Kirill A. Shutemov
  2017-06-01 13:59 ` Andy Lutomirski
  0 siblings, 2 replies; 7+ messages in thread
From: Jérôme Glisse @ 2017-05-31 15:03 UTC (permalink / raw)
  To: linux-mm; +Cc: Jérôme Glisse, Ingo Molnar, Kirill A . Shutemov

Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
we no longer cleanup stall pgd entries and thus the BUG_ON() inside
sync_global_pgds() is wrong.

This patch remove the BUG_ON() and unconditionaly update stall pgd
entries.

Signed-off-by: JA(C)rA'me Glisse <jglisse@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/init_64.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ff95fe8..36b9020 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
 			spin_lock(pgt_lock);
 
-			if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
-				BUG_ON(p4d_page_vaddr(*p4d)
-				       != p4d_page_vaddr(*p4d_ref));
-
-			if (p4d_none(*p4d))
-				set_p4d(p4d, *p4d_ref);
+			set_p4d(p4d, *p4d_ref);
 
 			spin_unlock(pgt_lock);
 		}
-- 
2.9.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-05-31 15:03 [PATCH] x86/mm: do not BUG_ON() on stall pgd entries Jérôme Glisse
@ 2017-06-01  9:57 ` Kirill A. Shutemov
  2017-06-01 13:59 ` Andy Lutomirski
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2017-06-01  9:57 UTC (permalink / raw)
  To: Jérôme Glisse; +Cc: linux-mm, Ingo Molnar

On Wed, May 31, 2017 at 11:03:49AM -0400, Jerome Glisse wrote:
> Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
> we no longer cleanup stall pgd entries and thus the BUG_ON() inside
> sync_global_pgds() is wrong.
> 
> This patch remove the BUG_ON() and unconditionaly update stall pgd
> entries.
> 
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-05-31 15:03 [PATCH] x86/mm: do not BUG_ON() on stall pgd entries Jérôme Glisse
  2017-06-01  9:57 ` Kirill A. Shutemov
@ 2017-06-01 13:59 ` Andy Lutomirski
  2017-06-01 14:33   ` Jerome Glisse
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2017-06-01 13:59 UTC (permalink / raw)
  To: Jérôme Glisse; +Cc: linux-mm, Ingo Molnar, Kirill A . Shutemov

On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@redhat.com> wrote:
> Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
> we no longer cleanup stall pgd entries and thus the BUG_ON() inside
> sync_global_pgds() is wrong.
>
> This patch remove the BUG_ON() and unconditionaly update stall pgd
> entries.
>
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/mm/init_64.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index ff95fe8..36b9020 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>                         spin_lock(pgt_lock);
>
> -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> -                               BUG_ON(p4d_page_vaddr(*p4d)
> -                                      != p4d_page_vaddr(*p4d_ref));
> -
> -                       if (p4d_none(*p4d))
> -                               set_p4d(p4d, *p4d_ref);
> +                       set_p4d(p4d, *p4d_ref);

If we have a mismatch in the vmalloc range, vmalloc_fault is going to
screw up and we'll end up using incorrect page tables.

What's causing the mismatch?  If you're hitting this BUG in practice,
I suspect we have a bug elsewhere.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-06-01 13:59 ` Andy Lutomirski
@ 2017-06-01 14:33   ` Jerome Glisse
  2017-06-01 14:38     ` Andy Lutomirski
  0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2017-06-01 14:33 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-mm, Ingo Molnar, Kirill A . Shutemov

On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
> On Wed, May 31, 2017 at 8:03 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside
> > sync_global_pgds() is wrong.
> >
> > This patch remove the BUG_ON() and unconditionaly update stall pgd
> > entries.
> >
> > Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/mm/init_64.c | 7 +------
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index ff95fe8..36b9020 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
> >                         spin_lock(pgt_lock);
> >
> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> > -                               BUG_ON(p4d_page_vaddr(*p4d)
> > -                                      != p4d_page_vaddr(*p4d_ref));
> > -
> > -                       if (p4d_none(*p4d))
> > -                               set_p4d(p4d, *p4d_ref);
> > +                       set_p4d(p4d, *p4d_ref);
> 
> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
> screw up and we'll end up using incorrect page tables.
> 
> What's causing the mismatch?  If you're hitting this BUG in practice,
> I suspect we have a bug elsewhere.

No bug elsewhere, simply hotplug memory then hotremove same memory you
just hotplugged then hotplug it again and you will trigger this as on
the first hotplug we allocate p4d/pud for the struct pages area, then on
hot remove we free that memory and clear the p4d/pud in the mm_init pgd
but not in any of the other pgds. So at that point the next hotplug
will trigger the BUG because of stall entries from the first hotplug.

Maybe we can add a flag to differentiate between hotplug and vmalloc
(thought looking at virtual address range is a dead give away of vmalloc
versus hotplug) and only avoid BUG_ON and force overwritte for hotplug.

Cheers,
Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-06-01 14:33   ` Jerome Glisse
@ 2017-06-01 14:38     ` Andy Lutomirski
  2017-06-01 15:11       ` Jerome Glisse
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2017-06-01 14:38 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Andy Lutomirski, linux-mm, Ingo Molnar, Kirill A . Shutemov

On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
>> On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@redhat.com> wrote:
>> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
>> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside
>> > sync_global_pgds() is wrong.
>> >
>> > This patch remove the BUG_ON() and unconditionaly update stall pgd
>> > entries.
>> >
>> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>> > Cc: Ingo Molnar <mingo@kernel.org>
>> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > ---
>> >  arch/x86/mm/init_64.c | 7 +------
>> >  1 file changed, 1 insertion(+), 6 deletions(-)
>> >
>> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> > index ff95fe8..36b9020 100644
>> > --- a/arch/x86/mm/init_64.c
>> > +++ b/arch/x86/mm/init_64.c
>> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>> >                         spin_lock(pgt_lock);
>> >
>> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>> > -                               BUG_ON(p4d_page_vaddr(*p4d)
>> > -                                      != p4d_page_vaddr(*p4d_ref));
>> > -
>> > -                       if (p4d_none(*p4d))
>> > -                               set_p4d(p4d, *p4d_ref);
>> > +                       set_p4d(p4d, *p4d_ref);
>>
>> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
>> screw up and we'll end up using incorrect page tables.
>>
>> What's causing the mismatch?  If you're hitting this BUG in practice,
>> I suspect we have a bug elsewhere.
>
> No bug elsewhere, simply hotplug memory then hotremove same memory you
> just hotplugged then hotplug it again and you will trigger this as on
> the first hotplug we allocate p4d/pud for the struct pages area, then on
> hot remove we free that memory and clear the p4d/pud in the mm_init pgd
> but not in any of the other pgds.

That sounds like a bug to me.  Either we should remove the stale
entries and fix all the attendant races, or we should unconditionally
allocate second-highest-level kernel page tables in unremovable memory
and never free them.  I prefer the latter even though it's slightly
slower.

> So at that point the next hotplug
> will trigger the BUG because of stall entries from the first hotplug.

By the time we have a pgd with an entry pointing off into the woods,
we've already lost.  Removing the BUG just hides the problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-06-01 14:38     ` Andy Lutomirski
@ 2017-06-01 15:11       ` Jerome Glisse
  2017-06-01 18:38         ` Andy Lutomirski
  0 siblings, 1 reply; 7+ messages in thread
From: Jerome Glisse @ 2017-06-01 15:11 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-mm, Ingo Molnar, Kirill A . Shutemov

On Thu, Jun 01, 2017 at 07:38:25AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> > On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
> >> On Wed, May 31, 2017 at 8:03 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> >> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
> >> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside
> >> > sync_global_pgds() is wrong.
> >> >
> >> > This patch remove the BUG_ON() and unconditionaly update stall pgd
> >> > entries.
> >> >
> >> > Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> >> > Cc: Ingo Molnar <mingo@kernel.org>
> >> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> > ---
> >> >  arch/x86/mm/init_64.c | 7 +------
> >> >  1 file changed, 1 insertion(+), 6 deletions(-)
> >> >
> >> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> > index ff95fe8..36b9020 100644
> >> > --- a/arch/x86/mm/init_64.c
> >> > +++ b/arch/x86/mm/init_64.c
> >> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
> >> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
> >> >                         spin_lock(pgt_lock);
> >> >
> >> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
> >> > -                               BUG_ON(p4d_page_vaddr(*p4d)
> >> > -                                      != p4d_page_vaddr(*p4d_ref));
> >> > -
> >> > -                       if (p4d_none(*p4d))
> >> > -                               set_p4d(p4d, *p4d_ref);
> >> > +                       set_p4d(p4d, *p4d_ref);
> >>
> >> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
> >> screw up and we'll end up using incorrect page tables.
> >>
> >> What's causing the mismatch?  If you're hitting this BUG in practice,
> >> I suspect we have a bug elsewhere.
> >
> > No bug elsewhere, simply hotplug memory then hotremove same memory you
> > just hotplugged then hotplug it again and you will trigger this as on
> > the first hotplug we allocate p4d/pud for the struct pages area, then on
> > hot remove we free that memory and clear the p4d/pud in the mm_init pgd
> > but not in any of the other pgds.
> 
> That sounds like a bug to me.  Either we should remove the stale
> entries and fix all the attendant races, or we should unconditionally
> allocate second-highest-level kernel page tables in unremovable memory
> and never free them.  I prefer the latter even though it's slightly
> slower.
> 
> > So at that point the next hotplug
> > will trigger the BUG because of stall entries from the first hotplug.
> 
> By the time we have a pgd with an entry pointing off into the woods,
> we've already lost.  Removing the BUG just hides the problem.

Then why did you sign of on af2cf278ef4f9289f88504c3e03cb12f76027575
this is what introduced this change in behavior.

If i understand you correctly you want to avoid deallocating p4d/pud
directory page when hotremove happen ? But this happen in common non
arch specific code vmemmap_populate_basepages() thought we can make
x86 vmemmap_populate() code arch different thought.

So i am not sure how to proceed here. My first attempt was to undo
af2cf278ef4f9289f88504c3e03cb12f76027575 so that we keep all pgds
synchronize. No if we want special case p4d/pud allocation that's
a different approach all together.

Cheers,
Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mm: do not BUG_ON() on stall pgd entries
  2017-06-01 15:11       ` Jerome Glisse
@ 2017-06-01 18:38         ` Andy Lutomirski
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Lutomirski @ 2017-06-01 18:38 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Andy Lutomirski, linux-mm, Ingo Molnar, Kirill A . Shutemov

On Thu, Jun 1, 2017 at 8:11 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 07:38:25AM -0700, Andy Lutomirski wrote:
>> On Thu, Jun 1, 2017 at 7:33 AM, Jerome Glisse <jglisse@redhat.com> wrote:
>> > On Thu, Jun 01, 2017 at 06:59:04AM -0700, Andy Lutomirski wrote:
>> >> On Wed, May 31, 2017 at 8:03 AM, Jérôme Glisse <jglisse@redhat.com> wrote:
>> >> > Since af2cf278ef4f ("Don't remove PGD entries in remove_pagetable()")
>> >> > we no longer cleanup stall pgd entries and thus the BUG_ON() inside
>> >> > sync_global_pgds() is wrong.
>> >> >
>> >> > This patch remove the BUG_ON() and unconditionaly update stall pgd
>> >> > entries.
>> >> >
>> >> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>> >> > Cc: Ingo Molnar <mingo@kernel.org>
>> >> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >> > ---
>> >> >  arch/x86/mm/init_64.c | 7 +------
>> >> >  1 file changed, 1 insertion(+), 6 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> >> > index ff95fe8..36b9020 100644
>> >> > --- a/arch/x86/mm/init_64.c
>> >> > +++ b/arch/x86/mm/init_64.c
>> >> > @@ -123,12 +123,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
>> >> >                         pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
>> >> >                         spin_lock(pgt_lock);
>> >> >
>> >> > -                       if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
>> >> > -                               BUG_ON(p4d_page_vaddr(*p4d)
>> >> > -                                      != p4d_page_vaddr(*p4d_ref));
>> >> > -
>> >> > -                       if (p4d_none(*p4d))
>> >> > -                               set_p4d(p4d, *p4d_ref);
>> >> > +                       set_p4d(p4d, *p4d_ref);
>> >>
>> >> If we have a mismatch in the vmalloc range, vmalloc_fault is going to
>> >> screw up and we'll end up using incorrect page tables.
>> >>
>> >> What's causing the mismatch?  If you're hitting this BUG in practice,
>> >> I suspect we have a bug elsewhere.
>> >
>> > No bug elsewhere, simply hotplug memory then hotremove same memory you
>> > just hotplugged then hotplug it again and you will trigger this as on
>> > the first hotplug we allocate p4d/pud for the struct pages area, then on
>> > hot remove we free that memory and clear the p4d/pud in the mm_init pgd
>> > but not in any of the other pgds.
>>
>> That sounds like a bug to me.  Either we should remove the stale
>> entries and fix all the attendant races, or we should unconditionally
>> allocate second-highest-level kernel page tables in unremovable memory
>> and never free them.  I prefer the latter even though it's slightly
>> slower.
>>
>> > So at that point the next hotplug
>> > will trigger the BUG because of stall entries from the first hotplug.
>>
>> By the time we have a pgd with an entry pointing off into the woods,
>> we've already lost.  Removing the BUG just hides the problem.
>
> Then why did you sign of on af2cf278ef4f9289f88504c3e03cb12f76027575
> this is what introduced this change in behavior.
>
> If i understand you correctly you want to avoid deallocating p4d/pud
> directory page when hotremove happen ? But this happen in common non
> arch specific code vmemmap_populate_basepages() thought we can make
> x86 vmemmap_populate() code arch different thought.
>
> So i am not sure how to proceed here. My first attempt was to undo
> af2cf278ef4f9289f88504c3e03cb12f76027575 so that we keep all pgds
> synchronize. No if we want special case p4d/pud allocation that's
> a different approach all together.
>

The intent of that patch was to leave the pud table allocated and to
leave the pgd entry in place.  The code appears to do that.

I'm not terribly familiar with memory hotplug.  Where's the
problematic code?  I suspect there's a fairly simple bug somewhere
that needs fixing.  kernel_physical_mapping_init() looks right,
though.

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-01 18:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 15:03 [PATCH] x86/mm: do not BUG_ON() on stall pgd entries Jérôme Glisse
2017-06-01  9:57 ` Kirill A. Shutemov
2017-06-01 13:59 ` Andy Lutomirski
2017-06-01 14:33   ` Jerome Glisse
2017-06-01 14:38     ` Andy Lutomirski
2017-06-01 15:11       ` Jerome Glisse
2017-06-01 18:38         ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).