linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
@ 2013-02-26 23:57 Boris Ostrovsky
  2013-02-27 22:40 ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Ostrovsky @ 2013-02-26 23:57 UTC (permalink / raw)
  To: hpa; +Cc: mingo, konrad.wilk, tglx, xen-devel, linux-kernel


----- hpa@zytor.com wrote:

> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> > When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> > kernel_map_pages() are not made visible (via TLB flush) immediately
> if lazy
> > MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> lead to
> > fatal page faults, for example, when zap_pte_range() needs to
> allocate pages
> > in __tlb_remove_page() -> tlb_next_batch().
> > 
> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > ---
> >  arch/x86/mm/pageattr.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index ca1f1c2..7b3216e 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> numpages, int enable)
> >  	 * but that can deadlock->flush only current cpu:
> >  	 */
> >  	__flush_tlb_all();
> > +
> > +	arch_flush_lazy_mmu_mode();
> >  }
> >  
> >  #ifdef CONFIG_HIBERNATION
> > 
> 
> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> correct?

I considered copying stable but then I decided that this is a debugging feature
--- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
thinking was that stable kernels usually don't do this.


-boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-26 23:57 [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set Boris Ostrovsky
@ 2013-02-27 22:40 ` H. Peter Anvin
  2013-02-27 23:00   ` Greg KH
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2013-02-27 22:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

Greg, policy opinion?

	-hpa

On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> 
> ----- hpa@zytor.com wrote:
> 
>> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
>>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
>>> kernel_map_pages() are not made visible (via TLB flush) immediately
>> if lazy
>>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
>> lead to
>>> fatal page faults, for example, when zap_pte_range() needs to
>> allocate pages
>>> in __tlb_remove_page() -> tlb_next_batch().
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> ---
>>>  arch/x86/mm/pageattr.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>>> index ca1f1c2..7b3216e 100644
>>> --- a/arch/x86/mm/pageattr.c
>>> +++ b/arch/x86/mm/pageattr.c
>>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
>> numpages, int enable)
>>>  	 * but that can deadlock->flush only current cpu:
>>>  	 */
>>>  	__flush_tlb_all();
>>> +
>>> +	arch_flush_lazy_mmu_mode();
>>>  }
>>>  
>>>  #ifdef CONFIG_HIBERNATION
>>>
>>
>> This sounds like a critical fix, i.e. a -stable candidate.  Am I
>> correct?
> 
> I considered copying stable but then I decided that this is a debugging feature
> --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> thinking was that stable kernels usually don't do this.
> 
> 
> -boris
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 22:40 ` H. Peter Anvin
@ 2013-02-27 23:00   ` Greg KH
  2013-02-27 23:07     ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2013-02-27 23:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

On Wed, Feb 27, 2013 at 02:40:01PM -0800, H. Peter Anvin wrote:
> Greg, policy opinion?
> 
> 	-hpa
> 
> On 02/26/2013 03:57 PM, Boris Ostrovsky wrote:
> > 
> > ----- hpa@zytor.com wrote:
> > 
> >> On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> >>> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> >>> kernel_map_pages() are not made visible (via TLB flush) immediately
> >> if lazy
> >>> MMU is on. In environments that support lazy MMU (e.g. Xen) this may
> >> lead to
> >>> fatal page faults, for example, when zap_pte_range() needs to
> >> allocate pages
> >>> in __tlb_remove_page() -> tlb_next_batch().
> >>>
> >>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >>> ---
> >>>  arch/x86/mm/pageattr.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> >>> index ca1f1c2..7b3216e 100644
> >>> --- a/arch/x86/mm/pageattr.c
> >>> +++ b/arch/x86/mm/pageattr.c
> >>> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int
> >> numpages, int enable)
> >>>  	 * but that can deadlock->flush only current cpu:
> >>>  	 */
> >>>  	__flush_tlb_all();
> >>> +
> >>> +	arch_flush_lazy_mmu_mode();
> >>>  }
> >>>  
> >>>  #ifdef CONFIG_HIBERNATION
> >>>
> >>
> >> This sounds like a critical fix, i.e. a -stable candidate.  Am I
> >> correct?
> > 
> > I considered copying stable but then I decided that this is a debugging feature
> > --- kernel_map_pages() is only defined if CONFIG_DEBUG_PAGEALLOC is set and my
> > thinking was that stable kernels usually don't do this.

"Stable" kernels are used all over the place, like in distros, which
might enable this.

I have no objection to taking this patch in a stable release, as it does
fix a real problem.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:00   ` Greg KH
@ 2013-02-27 23:07     ` H. Peter Anvin
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2013-02-27 23:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Boris Ostrovsky, mingo, konrad.wilk, tglx, xen-devel, linux-kernel

On 02/27/2013 03:00 PM, Greg KH wrote:
> 
> "Stable" kernels are used all over the place, like in distros, which
> might enable this.
> 
> I have no objection to taking this patch in a stable release, as it does
> fix a real problem.
> 

OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

	-hpa



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-27 23:07     ` H. Peter Anvin
@ 2013-02-28 14:29       ` Konrad Rzeszutek Wilk
  2013-02-28 15:38         ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-28 14:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Greg KH, Boris Ostrovsky, mingo, tglx, xen-devel, linux-kernel,
	samu.kallio, kraman, jwboyer

On Wed, Feb 27, 2013 at 03:07:35PM -0800, H. Peter Anvin wrote:
> On 02/27/2013 03:00 PM, Greg KH wrote:
> > 
> > "Stable" kernels are used all over the place, like in distros, which
> > might enable this.
> > 
> > I have no objection to taking this patch in a stable release, as it does
> > fix a real problem.
> > 
> 
> OK.  I will queue it up in the next fixes (tip:x86/urgent) batch to Linus.

Thank you.

Could you also consider this one (I CC-ed Ingo on it but never got any
response):


>From a6ed4a88eff4f6329bb4acae3372cccc8a8367d5 Mon Sep 17 00:00:00 2001
From: Samu Kallio <samu.kallio@aberdeencloud.com>
Date: Sun, 17 Feb 2013 02:35:52 +0000
Subject: [PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fb674fd..4f7d793 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.8.0.2

> 
> 	-hpa
> 
> 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
@ 2013-02-28 15:38         ` Borislav Petkov
  2013-02-28 15:53           ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2013-02-28 15:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: H. Peter Anvin, Greg KH, Boris Ostrovsky, mingo, tglx, xen-devel,
	linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fb674fd..4f7d793 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>  	if (pgd_none(*pgd_ref))
>  		return -1;
>  
> -	if (pgd_none(*pgd))
> +	if (pgd_none(*pgd)) {
>  		set_pgd(pgd, *pgd_ref);
> -	else
> +		arch_flush_lazy_mmu_mode();

Do I understand it correctly that this would cost us a
"preempt_disable(); preempt_enable()" needlessly on baremetal when
running with CONFIG_PARAVIRT enabled?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:38         ` Borislav Petkov
@ 2013-02-28 15:53           ` H. Peter Anvin
  2013-02-28 16:10             ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2013-02-28 15:53 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Greg KH, Boris Ostrovsky,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 07:38 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 09:29:10AM -0500, Konrad Rzeszutek Wilk wrote:
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index fb674fd..4f7d793 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>>  	if (pgd_none(*pgd_ref))
>>  		return -1;
>>  
>> -	if (pgd_none(*pgd))
>> +	if (pgd_none(*pgd)) {
>>  		set_pgd(pgd, *pgd_ref);
>> -	else
>> +		arch_flush_lazy_mmu_mode();
> 
> Do I understand it correctly that this would cost us a
> "preempt_disable(); preempt_enable()" needlessly on baremetal when
> running with CONFIG_PARAVIRT enabled?
> 

OK, this is mind-boggingly crazy and these things that just drives me
batty about PV.

arch_flush_lazy_mmu_mode() involves a function call and preemption off
*before it even tests for being a noop*.  On bare metal this function
doesn't do anything, and cannot do anything.

At the very least we should have an early filter for the **COMMON!**
case that we are not on a PV platform.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 15:53           ` H. Peter Anvin
@ 2013-02-28 16:10             ` Borislav Petkov
  2013-02-28 16:20               ` Boris Ostrovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Konrad Rzeszutek Wilk, Greg KH, Boris Ostrovsky, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> At the very least we should have an early filter for the **COMMON!**
> case that we are not on a PV platform.

... or, patch it out with the alternatives on baremetal, as Steven
suggested.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:10             ` Borislav Petkov
@ 2013-02-28 16:20               ` Boris Ostrovsky
  2013-02-28 16:22                 ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:20 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>> At the very least we should have an early filter for the **COMMON!**
>> case that we are not on a PV platform.
> ... or, patch it out with the alternatives on baremetal, as Steven
> suggested.
>

I think making a check for paravirt_enabled() is safe enough. I'll send 
a patch shortly.

-boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:20               ` Boris Ostrovsky
@ 2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:24                   ` H. Peter Anvin
  2013-02-28 16:27                   ` Boris Ostrovsky
  0 siblings, 2 replies; 16+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>At the very least we should have an early filter for the **COMMON!**
> >>case that we are not on a PV platform.
> >... or, patch it out with the alternatives on baremetal, as Steven
> >suggested.
> >
> I think making a check for paravirt_enabled() is safe enough. I'll
> send a patch shortly.

Why not make it absolutely for free?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
@ 2013-02-28 16:24                   ` H. Peter Anvin
  2013-02-28 16:27                   ` Boris Ostrovsky
  1 sibling, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2013-02-28 16:24 UTC (permalink / raw)
  To: Borislav Petkov, Boris Ostrovsky, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 08:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.
>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> 
> Why not make it absolutely for free?

That makes more sense, I think... although of course, there is a cost in
complexity.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:22                 ` Borislav Petkov
  2013-02-28 16:24                   ` H. Peter Anvin
@ 2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:32                     ` Borislav Petkov
  2013-02-28 18:14                     ` Steven Rostedt
  1 sibling, 2 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2013-02-28 16:27 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
>> On 02/28/2013 11:10 AM, Borislav Petkov wrote:
>>> On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
>>>> At the very least we should have an early filter for the **COMMON!**
>>>> case that we are not on a PV platform.
>>> ... or, patch it out with the alternatives on baremetal, as Steven
>>> suggested.

What was the suggestion exactly? I don't remember seeing that message.

-boris

>>>
>> I think making a check for paravirt_enabled() is safe enough. I'll
>> send a patch shortly.
> Why not make it absolutely for free?
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
@ 2013-02-28 16:32                     ` Borislav Petkov
  2013-02-28 18:14                     ` Steven Rostedt
  1 sibling, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2013-02-28 16:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH, mingo, tglx,
	xen-devel, linux-kernel, samu.kallio, kraman, jwboyer

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> What was the suggestion exactly? I don't remember seeing that message.

Use the paravirt patching facility to replace arch_flush_lazy_mmu_mode()
with a nop when running on baremetal. I.e., the whole functionality
around apply_paravirt() et al.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-28 16:27                   ` Boris Ostrovsky
  2013-02-28 16:32                     ` Borislav Petkov
@ 2013-02-28 18:14                     ` Steven Rostedt
  1 sibling, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2013-02-28 18:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Borislav Petkov, H. Peter Anvin, Konrad Rzeszutek Wilk, Greg KH,
	mingo, tglx, xen-devel, linux-kernel, samu.kallio, kraman,
	jwboyer

On Thu, Feb 28, 2013 at 11:27:23AM -0500, Boris Ostrovsky wrote:
> On 02/28/2013 11:22 AM, Borislav Petkov wrote:
> >On Thu, Feb 28, 2013 at 11:20:20AM -0500, Boris Ostrovsky wrote:
> >>On 02/28/2013 11:10 AM, Borislav Petkov wrote:
> >>>On Thu, Feb 28, 2013 at 07:53:44AM -0800, H. Peter Anvin wrote:
> >>>>At the very least we should have an early filter for the **COMMON!**
> >>>>case that we are not on a PV platform.
> >>>... or, patch it out with the alternatives on baremetal, as Steven
> >>>suggested.
> 
> What was the suggestion exactly? I don't remember seeing that message.

Yeah, Borislav talked to me privately on IRC about this and I pointed
him to the apply_paravirt() function in arch/x86/kernel/alternative.c
where things that apply only to paravirt get patched out on baremetal.

It may add complexity, but there's a method for doing it and I rather
not burden baremetal for pravirt nonsense. I know adding a simple call
to preempt_count() can show a noticable impact to function tracing. It
requires referencing the gs segment register and doing some offset
games (as it's stored as a per cpu pointer) to find the stack.

I was actually a bit amazed that it had as big of an impact as it did. I
can understand why Christoph Lameter tried hard not to add a
preempt_disable() in his code for just a tiny location.

-- Steve

> 
> -boris
> 
> >>>
> >>I think making a check for paravirt_enabled() is safe enough. I'll
> >>send a patch shortly.
> >Why not make it absolutely for free?
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
  2013-02-26 22:56 Boris Ostrovsky
@ 2013-02-26 23:38 ` H. Peter Anvin
  0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2013-02-26 23:38 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: tglx, mingo, konrad.wilk, xen-devel, linux-kernel

On 02/26/2013 02:56 PM, Boris Ostrovsky wrote:
> When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
> kernel_map_pages() are not made visible (via TLB flush) immediately if lazy
> MMU is on. In environments that support lazy MMU (e.g. Xen) this may lead to
> fatal page faults, for example, when zap_pte_range() needs to allocate pages
> in __tlb_remove_page() -> tlb_next_batch().
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  arch/x86/mm/pageattr.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index ca1f1c2..7b3216e 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
>  	 * but that can deadlock->flush only current cpu:
>  	 */
>  	__flush_tlb_all();
> +
> +	arch_flush_lazy_mmu_mode();
>  }
>  
>  #ifdef CONFIG_HIBERNATION
> 

This sounds like a critical fix, i.e. a -stable candidate.  Am I correct?

	-hpa


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set
@ 2013-02-26 22:56 Boris Ostrovsky
  2013-02-26 23:38 ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Boris Ostrovsky @ 2013-02-26 22:56 UTC (permalink / raw)
  To: tglx, mingo, hpa; +Cc: konrad.wilk, xen-devel, linux-kernel, Boris Ostrovsky

When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
kernel_map_pages() are not made visible (via TLB flush) immediately if lazy
MMU is on. In environments that support lazy MMU (e.g. Xen) this may lead to
fatal page faults, for example, when zap_pte_range() needs to allocate pages
in __tlb_remove_page() -> tlb_next_batch().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/mm/pageattr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index ca1f1c2..7b3216e 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1369,6 +1369,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
 	 * but that can deadlock->flush only current cpu:
 	 */
 	__flush_tlb_all();
+
+	arch_flush_lazy_mmu_mode();
 }
 
 #ifdef CONFIG_HIBERNATION
-- 
1.8.1.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-02-28 18:14 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-26 23:57 [PATCH] mm/x86: Flush lazy MMU when DEBUG_PAGEALLOC is set Boris Ostrovsky
2013-02-27 22:40 ` H. Peter Anvin
2013-02-27 23:00   ` Greg KH
2013-02-27 23:07     ` H. Peter Anvin
2013-02-28 14:29       ` Is: x86: mm: Fix vmalloc_fault oops during lazy MMU updates Was: " Konrad Rzeszutek Wilk
2013-02-28 15:38         ` Borislav Petkov
2013-02-28 15:53           ` H. Peter Anvin
2013-02-28 16:10             ` Borislav Petkov
2013-02-28 16:20               ` Boris Ostrovsky
2013-02-28 16:22                 ` Borislav Petkov
2013-02-28 16:24                   ` H. Peter Anvin
2013-02-28 16:27                   ` Boris Ostrovsky
2013-02-28 16:32                     ` Borislav Petkov
2013-02-28 18:14                     ` Steven Rostedt
  -- strict thread matches above, loose matches on Subject: below --
2013-02-26 22:56 Boris Ostrovsky
2013-02-26 23:38 ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).