All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
@ 2013-02-26 23:57 ` Boris Ostrovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-26 23:57 UTC (permalink / raw)
  To: hpa; +Cc: mingo, konrad.wilk, tglx, xen-devel, linux-kernel


----- hpa@zytor.com wrote:

> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> > When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> > kernel_map_pages() are not made visible (via TLB flush) immediately
> if lazy
> > MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> lead to
> > fatal page faults, for example, when zap_pte_range() needs to
> allocate pages
> > in __tlb_remove_page() -> tlb_next_batch().
> > 
> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > ---
> >  arch/x86/mm/pageattr.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index ca1f1c2..7b3216e 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> numpages, int enable)
> >  	 * but that can deadlock->flush only current cpu:
> >  	 */
> >  	__flush_tlb_all();
> > +
> > +	arch_flush_lazy_mmu_mode();
> >  }
> >  
> >  #ifdef CONFIG_HIBERNATION
> > 
> 
> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> correct?

I considered copying stable but then I decided that this is a debugging feature
--- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
thinking was that stable kernels usually don't do this.


-boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
@ 2013-02-26 23:57 ` Boris Ostrovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-26 23:57 UTC (permalink / raw)
  To: hpa; +Cc: tglx, mingo, xen-devel, linux-kernel, konrad.wilk


----- hpa@zytor.com wrote:

> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> > When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> > kernel_map_pages() are not made visible (via TLB flush) immediately
> if lazy
> > MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> lead to
> > fatal page faults, for example, when zap_pte_range() needs to
> allocate pages
> > in __tlb_remove_page() -> tlb_next_batch().
> > 
> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > ---
> >  arch/x86/mm/pageattr.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index ca1f1c2..7b3216e 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> numpages, int enable)
> >  	 * but that can deadlock->flush only current cpu:
> >  	 */
> >  	__flush_tlb_all();
> > +
> > +	arch_flush_lazy_mmu_mode();
> >  }
> >  
> >  #ifdef CONFIG_HIBERNATION
> > 
> 
> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> correct?

I considered copying stable but then I decided that this is a debugging feature
--- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
thinking was that stable kernels usually don't do this.


-boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-26 23:57 ` Boris Ostrovsky
  (?)
  (?)
@ 2013-02-27 22:40 ` H. Peter Anvin
  2013-02-27 23:00   ` Greg KH
  2013-02-27 23:00   ` Greg KH
  -1 siblings, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-27 22:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

Greg, policy opinion?

	-hpa

On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> 
> ----- hpa@zytor.com wrote:
> 
>> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
>>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
>>> kernel_map_pages() are not made visible (via TLB flush) immediately
>> if lazy
>>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
>> lead to
>>> fatal page faults, for example, when zap_pte_range() needs to
>> allocate pages
>>> in __tlb_remove_page() -> tlb_next_batch().
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> ---
>>>  arch/x86/mm/pageattr.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>>> index ca1f1c2..7b3216e 100644
>>> --- a/arch/x86/mm/pageattr.c
>>> +++ b/arch/x86/mm/pageattr.c
>>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
>> numpages, int enable)
>>>  	 * but that can deadlock->flush only current cpu:
>>>  	 */
>>>  	__flush_tlb_all();
>>> +
>>> +	arch_flush_lazy_mmu_mode();
>>>  }
>>>  
>>>  #ifdef CONFIG_HIBERNATION
>>>
>>
>> This sounds like a critical fix, i.e. a -stable candidate.  Am I
>> correct?
> 
> I considered copying stable but then I decided that this is a debugging feature
> --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> thinking was that stable kernels usually don't do this.
> 
> 
> -boris
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-26 23:57 ` Boris Ostrovsky
  (?)
@ 2013-02-27 22:40 ` H. Peter Anvin
  -1 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-27 22:40 UTC (permalink / raw)
  To: Greg KH
  Cc: konrad.wilk, linux-kernel, xen-devel, mingo, Boris Ostrovsky, tglx

Greg, policy opinion?

	-hpa

On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> 
> ----- hpa@zytor.com wrote:
> 
>> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
>>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
>>> kernel_map_pages() are not made visible (via TLB flush) immediately
>> if lazy
>>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
>> lead to
>>> fatal page faults, for example, when zap_pte_range() needs to
>> allocate pages
>>> in __tlb_remove_page() -> tlb_next_batch().
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> ---
>>>  arch/x86/mm/pageattr.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>>> index ca1f1c2..7b3216e 100644
>>> --- a/arch/x86/mm/pageattr.c
>>> +++ b/arch/x86/mm/pageattr.c
>>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
>> numpages, int enable)
>>>  	 * but that can deadlock->flush only current cpu:
>>>  	 */
>>>  	__flush_tlb_all();
>>> +
>>> +	arch_flush_lazy_mmu_mode();
>>>  }
>>>  
>>>  #ifdef CONFIG_HIBERNATION
>>>
>>
>> This sounds like a critical fix, i.e. a -stable candidate.  Am I
>> correct?
> 
> I considered copying stable but then I decided that this is a debugging feature
> --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> thinking was that stable kernels usually don't do this.
> 
> 
> -boris
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 22:40 ` H. Peter Anvin
@ 2013-02-27 23:00   ` Greg KH
  2013-02-27 23:07     ` H. Peter Anvin
  2013-02-27 23:07     ` H. Peter Anvin
  2013-02-27 23:00   ` Greg KH
  1 sibling, 2 replies; 28+ messages in thread
From: Greg KH @ 2013-02-27 23:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

On Wed, Feb 27, 2013 at 02:40:01PM -0800, H. Peter Anvin wrote:
> Greg, policy opinion?
> 
> 	-hpa
> 
> On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> > 
> > ----- hpa@zytor.com wrote:
> > 
> >> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> >>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> >>> kernel_map_pages() are not made visible (via TLB flush) immediately
> >> if lazy
> >>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> >> lead to
> >>> fatal page faults, for example, when zap_pte_range() needs to
> >> allocate pages
> >>> in __tlb_remove_page() -> tlb_next_batch().
> >>>
> >>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >>> ---
> >>>  arch/x86/mm/pageattr.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> >>> index ca1f1c2..7b3216e 100644
> >>> --- a/arch/x86/mm/pageattr.c
> >>> +++ b/arch/x86/mm/pageattr.c
> >>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> >> numpages, int enable)
> >>>  	 * but that can deadlock->flush only current cpu:
> >>>  	 */
> >>>  	__flush_tlb_all();
> >>> +
> >>> +	arch_flush_lazy_mmu_mode();
> >>>  }
> >>>  
> >>>  #ifdef CONFIG_HIBERNATION
> >>>
> >>
> >> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> >> correct?
> > 
> > I considered copying stable but then I decided that this is a debugging feature
> > --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> > thinking was that stable kernels usually don't do this.

"Stable" kernels are used all over the place, like in distros, which
might enable this.

I have no objection to taking this patch in a stable release, as it does
fix a real problem.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 22:40 ` H. Peter Anvin
  2013-02-27 23:00   ` Greg KH
@ 2013-02-27 23:00   ` Greg KH
  1 sibling, 0 replies; 28+ messages in thread
From: Greg KH @ 2013-02-27 23:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: konrad.wilk, linux-kernel, xen-devel, mingo, Boris Ostrovsky, tglx

On Wed, Feb 27, 2013 at 02:40:01PM -0800, H. Peter Anvin wrote:
> Greg, policy opinion?
> 
> 	-hpa
> 
> On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> > 
> > ----- hpa@zytor.com wrote:
> > 
> >> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> >>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> >>> kernel_map_pages() are not made visible (via TLB flush) immediately
> >> if lazy
> >>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> >> lead to
> >>> fatal page faults, for example, when zap_pte_range() needs to
> >> allocate pages
> >>> in __tlb_remove_page() -> tlb_next_batch().
> >>>
> >>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >>> ---
> >>>  arch/x86/mm/pageattr.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> >>> index ca1f1c2..7b3216e 100644
> >>> --- a/arch/x86/mm/pageattr.c
> >>> +++ b/arch/x86/mm/pageattr.c
> >>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> >> numpages, int enable)
> >>>  	 * but that can deadlock->flush only current cpu:
> >>>  	 */
> >>>  	__flush_tlb_all();
> >>> +
> >>> +	arch_flush_lazy_mmu_mode();
> >>>  }
> >>>  
> >>>  #ifdef CONFIG_HIBERNATION
> >>>
> >>
> >> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> >> correct?
> > 
> > I considered copying stable but then I decided that this is a debugging feature
> > --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> > thinking was that stable kernels usually don't do this.

"Stable" kernels are used all over the place, like in distros, which
might enable this.

I have no objection to taking this patch in a stable release, as it does
fix a real problem.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:00   ` Greg KH
  2013-02-27 23:07     ` H. Peter Anvin
@ 2013-02-27 23:07     ` H. Peter Anvin
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
  2013-02-28 14:29       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-27 23:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

On 02/27/2013 03:00 PM, Greg KH wrote:
> 
> "Stable" kernels are used all over the place, like in distros, which
> might enable this.
> 
> I have no objection to taking this patch in a stable release, as it does
> fix a real problem.
> 

OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

	-hpa



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:00   ` Greg KH
@ 2013-02-27 23:07     ` H. Peter Anvin
  2013-02-27 23:07     ` H. Peter Anvin
  1 sibling, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-27 23:07 UTC (permalink / raw)
  To: Greg KH
  Cc: konrad.wilk, linux-kernel, xen-devel, mingo, Boris Ostrovsky, tglx

On 02/27/2013 03:00 PM, Greg KH wrote:
> 
> "Stable" kernels are used all over the place, like in distros, which
> might enable this.
> 
> I have no objection to taking this patch in a stable release, as it does
> fix a real problem.
> 

OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

	-hpa

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:07     ` H. Peter Anvin
@ 2013-02-28 14:29       ` Konrad Rzeszutek Wilk
  2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 14:29       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 28+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-28 14:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Greg KH, Boris Ostrovsky, mingo, tglx, xen-devel, linux-kernel,
	samu.kallio, kraman, jwboyer

On Wed, Feb 27, 2013 at 03:07:35PM -0800, H. Peter Anvin wrote:
> On 02/27/2013 03:00 PM, Greg KH wrote:
> > 
> > "Stable" kernels are used all over the place, like in distros, which
> > might enable this.
> > 
> > I have no objection to taking this patch in a stable release, as it does
> > fix a real problem.
> > 
> 
> OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

Thank you.

Could you also consider this one (I CC-ed Ingo on it but never got any
response):


>From a6ed4a88eff4f6329bb4acae3372cccc8a8367d5 Mon Sep 17 00:00:00 2001
From: Samu Kallio <samu.kallio@aberdeencloud.com>
Date: Sun, 17 Feb 2013 02:35:52 +0000
Subject: [PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fb674fd..4f7d793 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.8.0.2

> 
> 	-hpa
> 
> 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:07     ` H. Peter Anvin
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
@ 2013-02-28 14:29       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 28+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-28 14:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: jwboyer, Greg KH, linux-kernel, xen-devel, mingo,
	Boris Ostrovsky, samu.kallio, kraman, tglx

On Wed, Feb 27, 2013 at 03:07:35PM -0800, H. Peter Anvin wrote:
> On 02/27/2013 03:00 PM, Greg KH wrote:
> > 
> > "Stable" kernels are used all over the place, like in distros, which
> > might enable this.
> > 
> > I have no objection to taking this patch in a stable release, as it does
> > fix a real problem.
> > 
> 
> OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

Thank you.

Could you also consider this one (I CC-ed Ingo on it but never got any
response):


>From a6ed4a88eff4f6329bb4acae3372cccc8a8367d5 Mon Sep 17 00:00:00 2001
From: Samu Kallio <samu.kallio@aberdeencloud.com>
Date: Sun, 17 Feb 2013 02:35:52 +0000
Subject: [PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fb674fd..4f7d793 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.8.0.2

> 
> 	-hpa
> 
> 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
  2013-02-28 15:38         ` Borislav Petkov
@ 2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 15:53           ` H. Peter Anvin
  2013-02-28 15:53           ` H. Peter Anvin
  1 sibling, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 15:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: H. Peter Anvin, Greg KH, Boris Ostrovsky, mingo, tglx, xen-devel,
	linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fb674fd..4f7d793 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>  	if (pgd_none(*pgd_ref))
>  		return -1;
>  
> -	if (pgd_none(*pgd))
> +	if (pgd_none(*pgd)) {
>  		set_pgd(pgd, *pgd_ref);
> -	else
> +		arch_flush_lazy_mmu_mode();

Do I understand it correctly that this would cost us a
"preempt_disable(); preempt_enable()" needlessly on baremetal when
running with CONFIG_PARAVIRT enabled?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
@ 2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 15:38         ` Borislav Petkov
  1 sibling, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 15:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: jwboyer, Greg KH, linux-kernel, xen-devel, mingo, H. Peter Anvin,
	Boris Ostrovsky, samu.kallio, kraman, tglx

On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fb674fd..4f7d793 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>  	if (pgd_none(*pgd_ref))
>  		return -1;
>  
> -	if (pgd_none(*pgd))
> +	if (pgd_none(*pgd)) {
>  		set_pgd(pgd, *pgd_ref);
> -	else
> +		arch_flush_lazy_mmu_mode();

Do I understand it correctly that this would cost us a
"preempt_disable(); preempt_enable()" needlessly on baremetal when
running with CONFIG_PARAVIRT enabled?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 15:53           ` H. Peter Anvin
@ 2013-02-28 15:53           ` H. Peter Anvin
  2013-02-28 16:10             ` Borislav Petkov
  2013-02-28 16:10             ` Borislav Petkov
  1 sibling, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-28 15:53 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Greg KH, Boris Ostrovsky,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 07:38 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index fb674fd..4f7d793 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>>  	if (pgd_none(*pgd_ref))
>>  		return -1;
>>  
>> -	if (pgd_none(*pgd))
>> +	if (pgd_none(*pgd)) {
>>  		set_pgd(pgd, *pgd_ref);
>> -	else
>> +		arch_flush_lazy_mmu_mode();
> 
> Do I understand it correctly that this would cost us a
> "preempt_disable(); preempt_enable()" needlessly on baremetal when
> running with CONFIG_PARAVIRT enabled?
> 

OK, this is mind-boggingly crazy and these things that just drives me
batty about PV.

arch_flush_lazy_mmu_mode() involves a function call and preemption off
*before it even tests for being a noop*.  On bare metal this function
doesn't do anything, and cannot do anything.

At the very least we should have an early filter for the **COMMON!**
case that we are not on a PV platform.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:38         ` Borislav Petkov
@ 2013-02-28 15:53           ` H. Peter Anvin
  2013-02-28 15:53           ` H. Peter Anvin
  1 sibling, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-28 15:53 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Greg KH, Boris Ostrovsky,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 07:38 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index fb674fd..4f7d793 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>>  	if (pgd_none(*pgd_ref))
>>  		return -1;
>>  
>> -	if (pgd_none(*pgd))
>> +	if (pgd_none(*pgd)) {
>>  		set_pgd(pgd, *pgd_ref);
>> -	else
>> +		arch_flush_lazy_mmu_mode();
> 
> Do I understand it correctly that this would cost us a
> "preempt_disable(); preempt_enable()" needlessly on baremetal when
> running with CONFIG_PARAVIRT enabled?
> 

OK, this is mind-boggingly crazy and these things that just drives me
batty about PV.

arch_flush_lazy_mmu_mode() involves a function call and preemption off
*before it even tests for being a noop*.  On bare metal this function
doesn't do anything, and cannot do anything.

At the very least we should have an early filter for the **COMMON!**
case that we are not on a PV platform.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:53           ` H. Peter Anvin
@ 2013-02-28 16:10             ` Borislav Petkov
  2013-02-28 16:20               ` Boris Ostrovsky
  2013-02-28 16:20               ` Boris Ostrovsky
  2013-02-28 16:10             ` Borislav Petkov
  1 sibling, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Konrad Rzeszutek Wilk, Greg KH, Boris Ostrovsky, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> At the very least we should have an early filter for the **COMMON!**
> case that we are not on a PV platform.

... or, patch it out with the alternatives on baremetal, as Steven
suggested.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:53           ` H. Peter Anvin
  2013-02-28 16:10             ` Borislav Petkov
@ 2013-02-28 16:10             ` Borislav Petkov
  1 sibling, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Konrad Rzeszutek Wilk, Greg KH, jwboyer, linux-kernel, xen-devel,
	mingo, Boris Ostrovsky, samu.kallio, kraman, tglx

On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> At the very least we should have an early filter for the **COMMON!**
> case that we are not on a PV platform.

... or, patch it out with the alternatives on baremetal, as Steven
suggested.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:10             ` Borislav Petkov
@ 2013-02-28 16:20               ` Boris Ostrovsky
  2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:20               ` Boris Ostrovsky
  1 sibling, 2 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:20 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>> At the very least we should have an early filter for the **COMMON!**
>> case that we are not on a PV platform.
> ... or, patch it out with the alternatives on baremetal, as Steven
> suggested.
>

I think making a check for paravirt_enabled() is safe enough. I'll send 
a patch shortly.

-boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:10             ` Borislav Petkov
  2013-02-28 16:20               ` Boris Ostrovsky
@ 2013-02-28 16:20               ` Boris Ostrovsky
  1 sibling, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:20 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>> At the very least we should have an early filter for the **COMMON!**
>> case that we are not on a PV platform.
> ... or, patch it out with the alternatives on baremetal, as Steven
> suggested.
>

I think making a check for paravirt_enabled() is safe enough. I'll send 
a patch shortly.

-boris

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:20               ` Boris Ostrovsky
  2013-02-28 16:22                 ` Borislav Petkov
@ 2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:24                   ` H. Peter Anvin
                                     ` (3 more replies)
  1 sibling, 4 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>At the very least we should have an early filter for the **COMMON!**
> >>case that we are not on a PV platform.
> >... or, patch it out with the alternatives on baremetal, as Steven
> >suggested.
> >
> I think making a check for paravirt_enabled() is safe enough. I'll
> send a patch shortly.

Why not make it absolutely for free?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:20               ` Boris Ostrovsky
@ 2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:22                 ` Borislav Petkov
  1 sibling, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Konrad Rzeszutek Wilk, Greg KH, jwboyer, linux-kernel, xen-devel,
	mingo, H. Peter Anvin, tglx, samu.kallio, kraman

On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>At the very least we should have an early filter for the **COMMON!**
> >>case that we are not on a PV platform.
> >... or, patch it out with the alternatives on baremetal, as Steven
> >suggested.
> >
> I think making a check for paravirt_enabled() is safe enough. I'll
> send a patch shortly.

Why not make it absolutely for free?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
@ 2013-02-28 16:24                   ` H. Peter Anvin
  2013-02-28 16:24                   ` H. Peter Anvin
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-28 16:24 UTC (permalink / raw)
  To: Borislav Petkov, Boris Ostrovsky, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 08:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.
>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> 
> Why not make it absolutely for free?

That makes more sense, I think... although of course, there is a cost in
complexity.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:24                   ` H. Peter Anvin
@ 2013-02-28 16:24                   ` H. Peter Anvin
  2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:27                   ` Boris Ostrovsky
  3 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2013-02-28 16:24 UTC (permalink / raw)
  To: Borislav Petkov, Boris Ostrovsky, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 08:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.
>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> 
> Why not make it absolutely for free?

That makes more sense, I think... although of course, there is a cost in
complexity.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
                                     ` (2 preceding siblings ...)
  2013-02-28 16:27                   ` Boris Ostrovsky
@ 2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:32                     ` Borislav Petkov
                                       ` (3 more replies)
  3 siblings, 4 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:27 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.

What was the suggestion exactly? I don't remember seeing that message.

-boris

>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> Why not make it absolutely for free?
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:24                   ` H. Peter Anvin
  2013-02-28 16:24                   ` H. Peter Anvin
@ 2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:27                   ` Boris Ostrovsky
  3 siblings, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:27 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.

What was the suggestion exactly? I don't remember seeing that message.

-boris

>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> Why not make it absolutely for free?
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:32                     ` Borislav Petkov
@ 2013-02-28 16:32                     ` Borislav Petkov
  2013-02-28 18:14                     ` Steven Rostedt
  2013-02-28 18:14                     ` Steven Rostedt
  3 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> What was the suggestion exactly? I don't remember seeing that message.

Use the paravirt patching facility to replace arch_flush_lazy_mmu_mode()
with a nop when running on baremetal. I.e., the whole functionality
around apply_paravirt() et al.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
@ 2013-02-28 16:32                     ` Borislav Petkov
  2013-02-28 16:32                     ` Borislav Petkov
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Konrad Rzeszutek Wilk, Greg KH, jwboyer, linux-kernel, xen-devel,
	mingo, H. Peter Anvin, tglx, samu.kallio, kraman

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> What was the suggestion exactly? I don't remember seeing that message.

Use the paravirt patching facility to replace arch_flush_lazy_mmu_mode()
with a nop when running on baremetal. I.e., the whole functionality
around apply_paravirt() et al.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
                                       ` (2 preceding siblings ...)
  2013-02-28 18:14                     ` Steven Rostedt
@ 2013-02-28 18:14                     ` Steven Rostedt
  3 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-28 18:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> >>On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >>>On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>>>At the very least we should have an early filter for the **COMMON!**
> >>>>case that we are not on a PV platform.
> >>>... or, patch it out with the alternatives on baremetal, as Steven
> >>>suggested.
> 
> What was the suggestion exactly? I don't remember seeing that message.

Yeah, Borislav talked to me privately on IRC about this and I pointed
him to the apply_paravirt() function in arch/x86/kernel/alternative.c
where things that apply only to paravirt get patched out on baremetal.

It may add complexity, but there's a method for doing it and I rather
not burden baremetal for pravirt nonsense. I know adding a simple call
to preempt_count() can show a noticable impact to function tracing. It
requires referencing the gs segment register and doing some offset
games (as it's stored as a per cpu pointer) to find the stack.

I was actually a bit amazed that it had as big of an impact as it did. I
can understand why Christoph Lameter tried hard not to add a
preempt_disable() in his code for just a tiny location.

-- Steve

> 
> -boris
> 
> >>>
> >>I think making a check for paravirt_enabled() is safe enough. I'll
> >>send a patch shortly.
> >Why not make it absolutely for free?
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:32                     ` Borislav Petkov
  2013-02-28 16:32                     ` Borislav Petkov
@ 2013-02-28 18:14                     ` Steven Rostedt
  2013-02-28 18:14                     ` Steven Rostedt
  3 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2013-02-28 18:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Konrad Rzeszutek Wilk, Greg KH, jwboyer, linux-kernel, xen-devel,
	mingo, Borislav Petkov, H. Peter Anvin, tglx, samu.kallio,
	kraman

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> >>On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >>>On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>>>At the very least we should have an early filter for the **COMMON!**
> >>>>case that we are not on a PV platform.
> >>>... or, patch it out with the alternatives on baremetal, as Steven
> >>>suggested.
> 
> What was the suggestion exactly? I don't remember seeing that message.

Yeah, Borislav talked to me privately on IRC about this and I pointed
him to the apply_paravirt() function in arch/x86/kernel/alternative.c
where things that apply only to paravirt get patched out on baremetal.

It may add complexity, but there's a method for doing it and I rather
not burden baremetal for pravirt nonsense. I know adding a simple call
to preempt_count() can show a noticable impact to function tracing. It
requires referencing the gs segment register and doing some offset
games (as it's stored as a per cpu pointer) to find the stack.

I was actually a bit amazed that it had as big of an impact as it did. I
can understand why Christoph Lameter tried hard not to add a
preempt_disable() in his code for just a tiny location.

-- Steve

> 
> -boris
> 
> >>>
> >>I think making a check for paravirt_enabled() is safe enough. I'll
> >>send a patch shortly.
> >Why not make it absolutely for free?
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-02-28 18:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-26 23:57 [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set Boris Ostrovsky
2013-02-26 23:57 ` Boris Ostrovsky
2013-02-27 22:40 ` H. Peter Anvin
2013-02-27 22:40 ` H. Peter Anvin
2013-02-27 23:00   ` Greg KH
2013-02-27 23:07     ` H. Peter Anvin
2013-02-27 23:07     ` H. Peter Anvin
2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
2013-02-28 15:38         ` Borislav Petkov
2013-02-28 15:38         ` Borislav Petkov
2013-02-28 15:53           ` H. Peter Anvin
2013-02-28 15:53           ` H. Peter Anvin
2013-02-28 16:10             ` Borislav Petkov
2013-02-28 16:20               ` Boris Ostrovsky
2013-02-28 16:22                 ` Borislav Petkov
2013-02-28 16:22                 ` Borislav Petkov
2013-02-28 16:24                   ` H. Peter Anvin
2013-02-28 16:24                   ` H. Peter Anvin
2013-02-28 16:27                   ` Boris Ostrovsky
2013-02-28 16:27                   ` Boris Ostrovsky
2013-02-28 16:32                     ` Borislav Petkov
2013-02-28 16:32                     ` Borislav Petkov
2013-02-28 18:14                     ` Steven Rostedt
2013-02-28 18:14                     ` Steven Rostedt
2013-02-28 16:20               ` Boris Ostrovsky
2013-02-28 16:10             ` Borislav Petkov
2013-02-28 14:29       ` Konrad Rzeszutek Wilk
2013-02-27 23:00   ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.