linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
@ 2013-03-20 13:53 Boris Ostrovsky
  2013-03-21  0:08 ` Josh Boyer
  0 siblings, 1 reply; 10+ messages in thread
From: Boris Ostrovsky @ 2013-03-20 13:53 UTC (permalink / raw)
  To: jwboyer
  Cc: mingo, konrad.wilk, tglx, rostedt, kraman, gregkh, stable, bp,
	samu.kallio, xen-devel, linux-kernel, hpa


----- jwboyer@redhat.com wrote:

> On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > >>>>>>I'll try to get someone to test this tomorrow.
> > >>>>Btw, you'd need to apply that other patch too
> > >>>>
> > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > >>>>
> > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on x86_64.
> > >>>Yeah, we already have that applied.  It stops crashes in xen
> > >>>environments so we pulled it in as a bugfix.  Thanks though!
> > >>>
> > >>Who are "we"?
> > >Sorry, Fedora.  That patch has a link to a bug in it.  We applied the
> > >patch for that bug.  I'll apply Boris' patch on top and get the same
> > >people to test it.
> > 
> > Josh, have you had a chance to test this?
> 
> I've tested it on bare metal for a while now.  No problems noticed at
> all.  I've not heard back from Krishna who was testing it in the Xen
> environment.  Krishna?


Any updates?

Thanks.
-boris


* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-20 13:53 [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Boris Ostrovsky
@ 2013-03-21  0:08 ` Josh Boyer
  2013-03-22 20:09   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Boyer @ 2013-03-21  0:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: mingo, konrad.wilk, tglx, rostedt, kraman, gregkh, stable, bp,
	samu.kallio, xen-devel, linux-kernel, hpa

On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
> 
> ----- jwboyer@redhat.com wrote:
> 
> > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > > >>>>>>I'll try to get someone to test this tomorrow.
> > > >>>>Btw, you'd need to apply that other patch too
> > > >>>>
> > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > > >>>>
> > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on x86_64.
> > > >>>Yeah, we already have that applied.  It stops crashes in xen
> > > >>>environments so we pulled it in as a bugfix.  Thanks though!
> > > >>>
> > > >>Who are "we"?
> > > >Sorry, Fedora.  That patch has a link to a bug in it.  We applied the
> > > >patch for that bug.  I'll apply Boris' patch on top and get the same
> > > >people to test it.
> > > 
> > > Josh, have you had a chance to test this?
> > 
> > I've tested it on bare metal for a while now.  No problems noticed at
> > all.  I've not heard back from Krishna who was testing it in the Xen
> > environment.  Krishna?
> 
> 
> Any updates?

No.  I've still not heard from Krishna.

At this point I've tested it on bare metal quite a bit, and Konrad has
tested it on both bare metal and Xen.  That should already cover the
case Krishna was going to test anyway.  I suggest we move forward and
take the patch.

josh


* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-21  0:08 ` Josh Boyer
@ 2013-03-22 20:09   ` Konrad Rzeszutek Wilk
  2013-03-22 20:25     ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-22 20:09 UTC (permalink / raw)
  To: Josh Boyer, hpa
  Cc: Boris Ostrovsky, mingo, tglx, rostedt, kraman, gregkh, stable,
	bp, samu.kallio, xen-devel, linux-kernel, hpa

On Wed, Mar 20, 2013 at 08:08:45PM -0400, Josh Boyer wrote:
> On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
> > 
> > ----- jwboyer@redhat.com wrote:
> > 
> > > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
> > > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
> > > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
> > > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
> > > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
> > > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
> > > > >>>>>>I'll try to get someone to test this tomorrow.
> > > > >>>>Btw, you'd need to apply that other patch too
> > > > >>>>
> > > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
> > > > >>>>
> > > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on x86_64.
> > > > >>>Yeah, we already have that applied.  It stops crashes in xen
> > > > >>>environments so we pulled it in as a bugfix.  Thanks though!
> > > > >>>
> > > > >>Who are "we"?
> > > > >Sorry, Fedora.  That patch has a link to a bug in it.  We applied the
> > > > >patch for that bug.  I'll apply Boris' patch on top and get the same
> > > > >people to test it.
> > > > 
> > > > Josh, have you had a chance to test this?
> > > 
> > > I've tested it on bare metal for a while now.  No problems noticed at
> > > all.  I've not heard back from Krishna who was testing it in the Xen
> > > environment.  Krishna?
> > 
> > 
> > Any updates?
> 
> No.  I've still not heard from Krishna.
> 
> At this point I've tested it on bare metal quite a bit, and Konrad has
> tested it on both bare metal and Xen.  That should already cover the
> case Krishna was going to test anyway.  I suggest we move forward and
> take the patch.

Peter?

Would you like me or Boris to clean up the two patches with the
appropriate Acks and send them to you?
> 
> josh


* Re: [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-22 20:09   ` Konrad Rzeszutek Wilk
@ 2013-03-22 20:25     ` H. Peter Anvin
  2013-03-23 13:36       ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2013-03-22 20:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Josh Boyer
  Cc: Boris Ostrovsky, mingo, tglx, rostedt, kraman, gregkh, stable,
	bp, samu.kallio, xen-devel, linux-kernel

Sure.

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Wed, Mar 20, 2013 at 08:08:45PM -0400, Josh Boyer wrote:
>> On Wed, Mar 20, 2013 at 06:53:55AM -0700, Boris Ostrovsky wrote:
>> > 
>> > ----- jwboyer@redhat.com wrote:
>> > 
>> > > On Wed, Mar 13, 2013 at 09:25:44AM -0400, Boris Ostrovsky wrote:
>> > > > On 03/01/2013 07:14 AM, Josh Boyer wrote:
>> > > > >On Thu, Feb 28, 2013 at 04:52:20PM -0800, H. Peter Anvin wrote:
>> > > > >>On 02/28/2013 04:42 PM, Josh Boyer wrote:
>> > > > >>>On Fri, Mar 01, 2013 at 01:36:29AM +0100, Borislav Petkov wrote:
>> > > > >>>>On Thu, Feb 28, 2013 at 04:15:45PM -0800, H. Peter Anvin wrote:
>> > > > >>>>>>I'll try to get someone to test this tomorrow.
>> > > > >>>>Btw, you'd need to apply that other patch too
>> > > > >>>>
>> > > > >>>>http://marc.info/?l=xen-devel&m=136206183814547&w=2
>> > > > >>>>
>> > > > >>>>so that arch_flush_lazy_mmu_mode() has at least one caller on x86_64.
>> > > > >>>Yeah, we already have that applied.  It stops crashes in xen
>> > > > >>>environments so we pulled it in as a bugfix.  Thanks though!
>> > > > >>>
>> > > > >>Who are "we"?
>> > > > >Sorry, Fedora.  That patch has a link to a bug in it.  We applied the
>> > > > >patch for that bug.  I'll apply Boris' patch on top and get the same
>> > > > >people to test it.
>> > > > 
>> > > > Josh, have you had a chance to test this?
>> > > 
>> > > I've tested it on bare metal for a while now.  No problems noticed at
>> > > all.  I've not heard back from Krishna who was testing it in the Xen
>> > > environment.  Krishna?
>> > 
>> > 
>> > Any updates?
>> 
>> No.  I've still not heard from Krishna.
>> 
>> At this point I've tested it on bare metal quite a bit, and Konrad has
>> tested it on both bare metal and Xen.  That should already cover the
>> case Krishna was going to test anyway.  I suggest we move forward and
>> take the patch.
>
>Peter?
>
>Would you like me or Boris to clean up the two patches with the
>appropriate Acks and send them to you?
>> 
>> josh

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


* [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.
  2013-03-22 20:25     ` H. Peter Anvin
@ 2013-03-23 13:36       ` Konrad Rzeszutek Wilk
  2013-03-23 13:36         ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
  2013-04-11  0:29         ` [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates tip-bot for Samu Kallio
  0 siblings, 2 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-23 13:36 UTC (permalink / raw)
  To: xen-devel, linux-kernel, hpa
  Cc: mingo, tglx, rostedt, kraman, gregkh, bp, samu.kallio, stable,
	Konrad Rzeszutek Wilk

From: Samu Kallio <samu.kallio@aberdeencloud.com>

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The oops usually looks like this:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.
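
The failure mode described above can be sketched in user space. This is
a toy model under stated assumptions, not kernel code: every toy_* name
is hypothetical, the "page table" is a single slot, and the batch of
deferred updates holds at most one entry. It only illustrates why a
write queued in lazy mode is invisible to a check that runs before the
queue is drained.

```c
#include <stdbool.h>

/* Toy state (all names hypothetical): one live entry and one queued write. */
static unsigned long toy_pgd;      /* the entry the consistency check reads */
static unsigned long toy_pending;  /* value queued while lazy mode is on */
static bool toy_has_pending;
static bool toy_lazy;              /* true between enter and leave */

/* Apply the queued write, if any, to the live entry. */
static void toy_flush(void)
{
	if (toy_has_pending) {
		toy_pgd = toy_pending;
		toy_has_pending = false;
	}
}

static void toy_enter_lazy(void) { toy_lazy = true; }
static void toy_leave_lazy(void) { toy_flush(); toy_lazy = false; }

/* Stand-in for set_pgd on a batching backend: deferred while lazy. */
static void toy_set_pgd(unsigned long val)
{
	if (toy_lazy) {
		toy_pending = val;
		toy_has_pending = true;
	} else {
		toy_pgd = val;
	}
}

/*
 * Stand-in for the sync-then-check sequence: populate an empty entry
 * from the reference value, then verify the live entry matches it.
 * Returns false where the real code would hit a mismatch.
 */
static bool toy_sync(unsigned long ref, bool flush_after_set)
{
	if (toy_pgd == 0) {
		toy_set_pgd(ref);
		if (flush_after_set)
			toy_flush();
	}
	return toy_pgd == ref;
}

/* Return the model to its initial, non-lazy, empty state. */
static void toy_reset(void)
{
	toy_pgd = 0;
	toy_pending = 0;
	toy_has_pending = false;
	toy_lazy = false;
}
```

Calling toy_enter_lazy() and then toy_sync(ref, false) reports a
mismatch because the write is still queued; toy_sync(ref, true) passes
because the flush runs before the check.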

CC: stable@vger.kernel.org
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b97525..0e88336 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables
-- 
1.8.0.2



* [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-23 13:36       ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
@ 2013-03-23 13:36         ` Konrad Rzeszutek Wilk
  2013-04-03 13:26           ` Boris Ostrovsky
  2013-04-11  0:30           ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
  2013-04-11  0:29         ` [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates tip-bot for Samu Kallio
  1 sibling, 2 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-23 13:36 UTC (permalink / raw)
  To: xen-devel, linux-kernel, hpa
  Cc: mingo, tglx, rostedt, kraman, gregkh, bp, samu.kallio,
	Boris Ostrovsky, Konrad Rzeszutek Wilk

From: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have a performance impact.

Since lazy MMU is not used on bare metal, we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such an
environment.
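
The dispatch this relies on can be modelled in user space as follows.
This is a rough sketch, not the kernel implementation: the toy_* names
are hypothetical, preemption is not modelled, and the flush remains an
ordinary indirect call here, whereas the point of routing it through a
paravirt op is that a no-op entry can be patched out of the call site.

```c
/* Toy model of a lazy-mode ops table (all names hypothetical). */
enum toy_lazy_mode { TOY_LAZY_NONE, TOY_LAZY_MMU };

struct toy_lazy_ops {
	void (*enter)(void);
	void (*leave)(void);
	void (*flush)(void);
};

static enum toy_lazy_mode toy_mode = TOY_LAZY_NONE;
static int toy_flush_count;	/* number of flushes that did real work */

static void toy_nop(void) { }
static void toy_enter(void) { toy_mode = TOY_LAZY_MMU; }
static void toy_leave(void) { toy_mode = TOY_LAZY_NONE; }

/* Bare-metal defaults: every lazy-mode op does nothing. */
static struct toy_lazy_ops toy_ops = { toy_nop, toy_nop, toy_nop };

/* Flush by leaving and re-entering lazy mode, only if currently in it. */
static void toy_flush(void)
{
	if (toy_mode == TOY_LAZY_MMU) {
		toy_ops.leave();
		toy_ops.enter();
		toy_flush_count++;
	}
}

/* What a hypervisor backend would do at init: install real ops. */
static void toy_install_hypervisor_ops(void)
{
	toy_ops.enter = toy_enter;
	toy_ops.leave = toy_leave;
	toy_ops.flush = toy_flush;
}

/* The generic entry point only dispatches through the table. */
static void toy_arch_flush_lazy_mmu_mode(void)
{
	toy_ops.flush();
}
```

With the defaults, toy_arch_flush_lazy_mmu_mode() reaches only
toy_nop(); after toy_install_hypervisor_ops() it reaches toy_flush(),
which does work only while lazy MMU mode is active.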

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/paravirt.h       |  5 ++++-
 arch/x86/include/asm/paravirt_types.h |  2 ++
 arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
 arch/x86/lguest/boot.c                |  1 +
 arch/x86/xen/mmu.c                    |  1 +
 5 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7361e47 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
 	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
 }
 
-void arch_flush_lazy_mmu_mode(void);
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
+}
 
 static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 				phys_addr_t phys, pgprot_t flags)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b3b0ec1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,6 +91,7 @@ struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
 	void (*enter)(void);
 	void (*leave)(void);
+	void (*flush)(void);
 };
 
 struct pv_time_ops {
@@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);
 
 void paravirt_enter_lazy_mmu(void);
 void paravirt_leave_lazy_mmu(void);
+void paravirt_flush_lazy_mmu(void);
 
 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..8bfb335 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
 	leave_lazy(PARAVIRT_LAZY_MMU);
 }
 
+void paravirt_flush_lazy_mmu(void)
+{
+	preempt_disable();
+
+	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
+		arch_leave_lazy_mmu_mode();
+		arch_enter_lazy_mmu_mode();
+	}
+
+	preempt_enable();
+}
+
 void paravirt_start_context_switch(struct task_struct *prev)
 {
 	BUG_ON(preemptible());
@@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return this_cpu_read(paravirt_lazy_mode);
 }
 
-void arch_flush_lazy_mmu_mode(void)
-{
-	preempt_disable();
-
-	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
-		arch_enter_lazy_mmu_mode();
-	}
-
-	preempt_enable();
-}
-
 struct pv_info pv_info = {
 	.name = "bare hardware",
 	.paravirt_enabled = 0,
@@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 	.lazy_mode = {
 		.enter = paravirt_nop,
 		.leave = paravirt_nop,
+		.flush = paravirt_nop,
 	},
 
 	.set_fixmap = native_set_fixmap,
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89c..7114c63 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1334,6 +1334,7 @@ __init void lguest_init(void)
 	pv_mmu_ops.read_cr3 = lguest_read_cr3;
 	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
 	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
+	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
 	pv_mmu_ops.pte_update = lguest_pte_update;
 	pv_mmu_ops.pte_update_defer = lguest_pte_update;
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e8e3493..f4f4105 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2197,6 +2197,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.lazy_mode = {
 		.enter = paravirt_enter_lazy_mmu,
 		.leave = xen_leave_lazy_mmu,
+		.flush = paravirt_flush_lazy_mmu,
 	},
 
 	.set_fixmap = xen_set_fixmap,
-- 
1.8.0.2



* Re: [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-23 13:36         ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
@ 2013-04-03 13:26           ` Boris Ostrovsky
  2013-04-11  0:30           ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
  1 sibling, 0 replies; 10+ messages in thread
From: Boris Ostrovsky @ 2013-04-03 13:26 UTC (permalink / raw)
  To: hpa
  Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, mingo, tglx,
	rostedt, kraman, gregkh, bp, samu.kallio

On 03/23/2013 09:36 AM, Konrad Rzeszutek Wilk wrote:
> From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> Invoking arch_flush_lazy_mmu_mode() results in calls to
> preempt_enable()/disable() which may have a performance impact.
>
> Since lazy MMU is not used on bare metal, we can patch away
> arch_flush_lazy_mmu_mode() so that it is never called in such an
> environment.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Tested-by: Josh Boyer <jwboyer@redhat.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Acked-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Peter, what's the status of these two patches? They are not going into
3.9, right?

Thanks.
-boris


> ---
>   arch/x86/include/asm/paravirt.h       |  5 ++++-
>   arch/x86/include/asm/paravirt_types.h |  2 ++
>   arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
>   arch/x86/lguest/boot.c                |  1 +
>   arch/x86/xen/mmu.c                    |  1 +
>   5 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 5edd174..7361e47 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
>   	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
>   }
>   
> -void arch_flush_lazy_mmu_mode(void);
> +static inline void arch_flush_lazy_mmu_mode(void)
> +{
> +	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
> +}
>   
>   static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
>   				phys_addr_t phys, pgprot_t flags)
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 142236e..b3b0ec1 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -91,6 +91,7 @@ struct pv_lazy_ops {
>   	/* Set deferred update mode, used for batching operations. */
>   	void (*enter)(void);
>   	void (*leave)(void);
> +	void (*flush)(void);
>   };
>   
>   struct pv_time_ops {
> @@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);
>   
>   void paravirt_enter_lazy_mmu(void);
>   void paravirt_leave_lazy_mmu(void);
> +void paravirt_flush_lazy_mmu(void);
>   
>   void _paravirt_nop(void);
>   u32 _paravirt_ident_32(u32);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..8bfb335 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
>   	leave_lazy(PARAVIRT_LAZY_MMU);
>   }
>   
> +void paravirt_flush_lazy_mmu(void)
> +{
> +	preempt_disable();
> +
> +	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
> +		arch_leave_lazy_mmu_mode();
> +		arch_enter_lazy_mmu_mode();
> +	}
> +
> +	preempt_enable();
> +}
> +
>   void paravirt_start_context_switch(struct task_struct *prev)
>   {
>   	BUG_ON(preemptible());
> @@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
>   	return this_cpu_read(paravirt_lazy_mode);
>   }
>   
> -void arch_flush_lazy_mmu_mode(void)
> -{
> -	preempt_disable();
> -
> -	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
> -		arch_leave_lazy_mmu_mode();
> -		arch_enter_lazy_mmu_mode();
> -	}
> -
> -	preempt_enable();
> -}
> -
>   struct pv_info pv_info = {
>   	.name = "bare hardware",
>   	.paravirt_enabled = 0,
> @@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
>   	.lazy_mode = {
>   		.enter = paravirt_nop,
>   		.leave = paravirt_nop,
> +		.flush = paravirt_nop,
>   	},
>   
>   	.set_fixmap = native_set_fixmap,
> diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
> index 1cbd89c..7114c63 100644
> --- a/arch/x86/lguest/boot.c
> +++ b/arch/x86/lguest/boot.c
> @@ -1334,6 +1334,7 @@ __init void lguest_init(void)
>   	pv_mmu_ops.read_cr3 = lguest_read_cr3;
>   	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
>   	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
> +	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
>   	pv_mmu_ops.pte_update = lguest_pte_update;
>   	pv_mmu_ops.pte_update_defer = lguest_pte_update;
>   
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index e8e3493..f4f4105 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2197,6 +2197,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>   	.lazy_mode = {
>   		.enter = paravirt_enter_lazy_mmu,
>   		.leave = xen_leave_lazy_mmu,
> +		.flush = paravirt_flush_lazy_mmu,
>   	},
>   
>   	.set_fixmap = xen_set_fixmap,



* [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
  2013-03-23 13:36       ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
  2013-03-23 13:36         ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
@ 2013-04-11  0:29         ` tip-bot for Samu Kallio
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Samu Kallio @ 2013-04-11  0:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, konrad.wilk, samu.kallio, stable,
	jwboyer, tglx, hpa, kraman

Commit-ID:  1160c2779b826c6f5c08e5cc542de58fd1f667d5
Gitweb:     http://git.kernel.org/tip/1160c2779b826c6f5c08e5cc542de58fd1f667d5
Author:     Samu Kallio <samu.kallio@aberdeencloud.com>
AuthorDate: Sat, 23 Mar 2013 09:36:35 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 10 Apr 2013 11:25:07 -0700

x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The oops usually looks like this:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

Cc: <stable@vger.kernel.org>
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2b97525..0e88336 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (pgd_none(*pgd_ref))
 		return -1;
 
-	if (pgd_none(*pgd))
+	if (pgd_none(*pgd)) {
 		set_pgd(pgd, *pgd_ref);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else {
 		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+	}
 
 	/*
 	 * Below here mismatches are bugs because these lower tables


* [tip:x86/urgent] x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-03-23 13:36         ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
  2013-04-03 13:26           ` Boris Ostrovsky
@ 2013-04-11  0:30           ` tip-bot for Boris Ostrovsky
  2013-04-11 15:17             ` Boris Ostrovsky
  1 sibling, 1 reply; 10+ messages in thread
From: tip-bot for Boris Ostrovsky @ 2013-04-11  0:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, konrad.wilk, boris.ostrovsky, jwboyer,
	tglx, hpa, bp

Commit-ID:  511ba86e1d386f671084b5d0e6f110bb30b8eeb2
Gitweb:     http://git.kernel.org/tip/511ba86e1d386f671084b5d0e6f110bb30b8eeb2
Author:     Boris Ostrovsky <boris.ostrovsky@oracle.com>
AuthorDate: Sat, 23 Mar 2013 09:36:36 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 10 Apr 2013 11:25:10 -0700

x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have a performance impact.

Since lazy MMU is not used on bare metal, we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such an
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
---
 arch/x86/include/asm/paravirt.h       |  5 ++++-
 arch/x86/include/asm/paravirt_types.h |  2 ++
 arch/x86/kernel/paravirt.c            | 25 +++++++++++++------------
 arch/x86/lguest/boot.c                |  1 +
 arch/x86/xen/mmu.c                    |  1 +
 5 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7361e47 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void)
 	PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave);
 }
 
-void arch_flush_lazy_mmu_mode(void);
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush);
+}
 
 static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 				phys_addr_t phys, pgprot_t flags)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b3b0ec1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,6 +91,7 @@ struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
 	void (*enter)(void);
 	void (*leave)(void);
+	void (*flush)(void);
 };
 
 struct pv_time_ops {
@@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next);
 
 void paravirt_enter_lazy_mmu(void);
 void paravirt_leave_lazy_mmu(void);
+void paravirt_flush_lazy_mmu(void);
 
 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..8bfb335 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void)
 	leave_lazy(PARAVIRT_LAZY_MMU);
 }
 
+void paravirt_flush_lazy_mmu(void)
+{
+	preempt_disable();
+
+	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
+		arch_leave_lazy_mmu_mode();
+		arch_enter_lazy_mmu_mode();
+	}
+
+	preempt_enable();
+}
+
 void paravirt_start_context_switch(struct task_struct *prev)
 {
 	BUG_ON(preemptible());
@@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 	return this_cpu_read(paravirt_lazy_mode);
 }
 
-void arch_flush_lazy_mmu_mode(void)
-{
-	preempt_disable();
-
-	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
-		arch_enter_lazy_mmu_mode();
-	}
-
-	preempt_enable();
-}
-
 struct pv_info pv_info = {
 	.name = "bare hardware",
 	.paravirt_enabled = 0,
@@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = {
 	.lazy_mode = {
 		.enter = paravirt_nop,
 		.leave = paravirt_nop,
+		.flush = paravirt_nop,
 	},
 
 	.set_fixmap = native_set_fixmap,
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 1cbd89c..7114c63 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1334,6 +1334,7 @@ __init void lguest_init(void)
 	pv_mmu_ops.read_cr3 = lguest_read_cr3;
 	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
 	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode;
+	pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu;
 	pv_mmu_ops.pte_update = lguest_pte_update;
 	pv_mmu_ops.pte_update_defer = lguest_pte_update;
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6afbb2c..2f5d687 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2196,6 +2196,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.lazy_mode = {
 		.enter = paravirt_enter_lazy_mmu,
 		.leave = xen_leave_lazy_mmu,
+		.flush = paravirt_flush_lazy_mmu,
 	},
 
 	.set_fixmap = xen_set_fixmap,


* Re: [tip:x86/urgent] x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
  2013-04-11  0:30           ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
@ 2013-04-11 15:17             ` Boris Ostrovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Ostrovsky @ 2013-04-11 15:17 UTC (permalink / raw)
  To: hpa; +Cc: mingo, linux-kernel, konrad.wilk, boris.ostrovsky, jwboyer, tglx, bp

On 04/10/2013 08:30 PM, tip-bot for Boris Ostrovsky wrote:
> Commit-ID:  511ba86e1d386f671084b5d0e6f110bb30b8eeb2
> Gitweb:     http://git.kernel.org/tip/511ba86e1d386f671084b5d0e6f110bb30b8eeb2
> Author:     Boris Ostrovsky <boris.ostrovsky@oracle.com>
> AuthorDate: Sat, 23 Mar 2013 09:36:36 -0400
> Committer:  H. Peter Anvin <hpa@linux.intel.com>
> CommitDate: Wed, 10 Apr 2013 11:25:10 -0700
>
> x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
>
> Invoking arch_flush_lazy_mmu_mode() results in calls to
> preempt_enable()/disable() which may have a performance impact.
>
> Since lazy MMU is not used on bare metal, we can patch away
> arch_flush_lazy_mmu_mode() so that it is never called in such an
> environment.
>
> [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
>    updates" may cause a minor performance regression on
>    bare metal.  This patch resolves that performance regression.  It is
>    somewhat unclear to me if this is a good -stable candidate. ]

I think this

    https://lkml.org/lkml/2013/2/26/420

was also part of the lazy MMU set of patches but is missing from the
latest batch of commits.

-boris



Thread overview: 10+ messages
2013-03-20 13:53 [PATCH] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Boris Ostrovsky
2013-03-21  0:08 ` Josh Boyer
2013-03-22 20:09   ` Konrad Rzeszutek Wilk
2013-03-22 20:25     ` H. Peter Anvin
2013-03-23 13:36       ` [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates Konrad Rzeszutek Wilk
2013-03-23 13:36         ` [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal Konrad Rzeszutek Wilk
2013-04-03 13:26           ` Boris Ostrovsky
2013-04-11  0:30           ` [tip:x86/urgent] x86, mm: " tip-bot for Boris Ostrovsky
2013-04-11 15:17             ` Boris Ostrovsky
2013-04-11  0:29         ` [tip:x86/urgent] x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates tip-bot for Samu Kallio
