* [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
@ 2010-09-13 14:55 Stephane Eranian
  2010-09-13 15:08 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 14:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, paulus, davem, fweisbec, perfmon2-devel, eranian,
	eranian, robert.richter, markus.t.metzger

The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
requesting physically contiguous memory. There is no such restriction on
DS, PEBS and BTS buffers. Using kzalloc() can therefore fail when no
physically contiguous memory is available. BTS requests 64KB, so it
can run into this problem. PEBS currently requests only one page.
Both PEBS and BTS are static buffers allocated for each CPU when the
first user appears. When the last user exits, the buffers are released.

All buffers are only accessed on the CPU they are attached to.
kzalloc() does not take NUMA into account, so all allocations
take place on the NUMA node where perf_event_open() is called.

This patch switches allocation to vmalloc_node() to use non-contiguous
physical memory and to allocate on the NUMA node corresponding to each
CPU. DS and PEBS are switched as well, although they do not cause problems
today, so that they are at least allocated on the correct NUMA node. In the
future, the PEBS buffer size may increase, and DS may also grow bigger than
a page. This patch eliminates the memory allocation imbalance.

vmalloc_node() returns page-aligned addresses, which conform to the
restriction on the PEBS buffer documented by Intel in Vol3a section 16.9.4.2.

Signed-off-by: Stephane Eranian <eranian@google.com>
--

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 4977f9c..94293cd 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -94,9 +94,9 @@ static void release_ds_buffers(void)
 
 		per_cpu(cpu_hw_events, cpu).ds = NULL;
 
-		kfree((void *)(unsigned long)ds->pebs_buffer_base);
-		kfree((void *)(unsigned long)ds->bts_buffer_base);
-		kfree(ds);
+		vfree((void *)(unsigned long)ds->pebs_buffer_base);
+		vfree((void *)(unsigned long)ds->bts_buffer_base);
+		vfree(ds);
 	}
 
 	put_online_cpus();
@@ -115,18 +115,32 @@ static int reserve_ds_buffers(void)
 		struct debug_store *ds;
 		void *buffer;
 		int max, thresh;
-
+		int node = cpu_to_node(cpu);
+
+		/*
+		 * Neither DS, BTS, nor PEBS need contiguous physical
+		 * pages.  See Intel Vol3a Section 16.9.4.2.
+		 *
+		 * Furthermore, they are all mostly accessed on
+		 * their respective CPU.
+		 * Therefore, we can use vmalloc_node()
+		 */
 		err = -ENOMEM;
-		ds = kzalloc(sizeof(*ds), GFP_KERNEL);
+		ds = vmalloc_node(sizeof(*ds), node);
 		if (unlikely(!ds))
 			break;
+
+		memset(ds, 0, sizeof(*ds));
+
 		per_cpu(cpu_hw_events, cpu).ds = ds;
 
 		if (x86_pmu.bts) {
-			buffer = kzalloc(BTS_BUFFER_SIZE, GFP_KERNEL);
+			buffer = vmalloc_node(BTS_BUFFER_SIZE, node);
 			if (unlikely(!buffer))
 				break;
 
+			memset(buffer, 0, BTS_BUFFER_SIZE);
+
 			max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
 			thresh = max / 16;
 
@@ -139,10 +153,12 @@ static int reserve_ds_buffers(void)
 		}
 
 		if (x86_pmu.pebs) {
-			buffer = kzalloc(PEBS_BUFFER_SIZE, GFP_KERNEL);
+			buffer = vmalloc_node(PEBS_BUFFER_SIZE, node);
 			if (unlikely(!buffer))
 				break;
 
+			memset(buffer, 0, PEBS_BUFFER_SIZE);
+
 			max = PEBS_BUFFER_SIZE / x86_pmu.pebs_record_size;
 
 			ds->pebs_buffer_base = (u64)(unsigned long)buffer;


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 14:55 [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation Stephane Eranian
@ 2010-09-13 15:08 ` Peter Zijlstra
  2010-09-13 15:21   ` Stephane Eranian
  2010-09-13 15:09 ` Frederic Weisbecker
  2010-09-13 19:35 ` Andi Kleen
  2 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 15:08 UTC (permalink / raw)
  To: eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel,
	eranian, robert.richter, markus.t.metzger

On Mon, 2010-09-13 at 16:55 +0200, Stephane Eranian wrote:
> The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
> requesting contiguous physical memory. There is no such restriction on
> DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
> no contiguous physical memory is available. BTS is requesting 64KB,
> thus it can cause issues. PEBS is currently only requesting one page.
> Both PEBS and BTS are static buffers allocated for each CPU at the
> first user. When the last user exists, the buffers are released.
> 
> All buffers are only accessed on the CPU they are attached to.
> kzalloc() does not take into account NUMA, thus all allocations
> are taking place on the NUMA node where the perf_event_open() is
> made.

I guess that should have been an alloc_pages_node() indeed.

> This patch switches allocation to vmalloc_node() to use non-contiguous
> physical memory and to allocate on the NUMA node corresponding to each
> CPU. We switched DS and PEBS although they do not cause problems today,
> to, at least, make the allocation on the correct NUMA node. In the future,
> the PEBS buffer size may increase. DS may also grow bigger than a page.
> This patch eliminates the memory allocation imbalance.

I'm not really a fan of vmalloc, have you actually observed allocation
failures for these 64k (order-4) allocations?
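
For reference, the order arithmetic behind that remark can be sketched as
follows, assuming 4 KiB pages as on x86. SKETCH_PAGE_SIZE and
sketch_get_order() are hypothetical names used only for this illustration
(the kernel has its own get_order() helper); this is a user-space sketch,
not kernel code:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on x86 */

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= size, i.e. the
 * power-of-two number of physically contiguous pages needed for `size`.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Under these assumptions sketch_get_order(64 * 1024) evaluates to 4, i.e. 16
physically contiguous pages, while a one-page PEBS buffer is order 0.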




* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 14:55 [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation Stephane Eranian
  2010-09-13 15:08 ` Peter Zijlstra
@ 2010-09-13 15:09 ` Frederic Weisbecker
  2010-09-13 15:13   ` Stephane Eranian
  2010-09-13 19:35 ` Andi Kleen
  2 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2010-09-13 15:09 UTC (permalink / raw)
  To: Stephane Eranian, Mathieu Desnoyers
  Cc: linux-kernel, peterz, mingo, paulus, davem, perfmon2-devel,
	eranian, robert.richter, markus.t.metzger

On Mon, Sep 13, 2010 at 04:55:01PM +0200, Stephane Eranian wrote:
> The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
> requesting contiguous physical memory. There is no such restriction on
> DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
> no contiguous physical memory is available. BTS is requesting 64KB,
> thus it can cause issues. PEBS is currently only requesting one page.
> Both PEBS and BTS are static buffers allocated for each CPU at the
> first user. When the last user exists, the buffers are released.
> 
> All buffers are only accessed on the CPU they are attached to.
> kzalloc() does not take into account NUMA, thus all allocations
> are taking place on the NUMA node where the perf_event_open() is
> made.
> 
> This patch switches allocation to vmalloc_node() to use non-contiguous
> physical memory and to allocate on the NUMA node corresponding to each
> CPU. We switched DS and PEBS although they do not cause problems today,
> to, at least, make the allocation on the correct NUMA node. In the future,
> the PEBS buffer size may increase. DS may also grow bigger than a page.
> This patch eliminates the memory allocation imbalance.
> 
> vmalloc_node() returns page-aligned addresses which do conform with the
> restriction on PEBS buffer as documented by Intel in Vol3a section 16.9.4.2.
> 
> Signed-off-by: Stephane Eranian <eranian@google.com>
> --


For now I think you can not do this. vmalloc'ed memory can't be safely
accessed from NMIs in x86 because that might fault. And faults from NMIs
are not supported. They cause very bad things: return from fault calls
iret which reenables NMI, so NMI can nest but in the meantime there is
only one NMI stack, so that gets quickly messed up.



* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:09 ` Frederic Weisbecker
@ 2010-09-13 15:13   ` Stephane Eranian
  2010-09-13 15:16     ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 15:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Mathieu Desnoyers, linux-kernel, peterz, mingo, paulus, davem,
	perfmon2-devel, eranian, robert.richter, markus.t.metzger

On Mon, Sep 13, 2010 at 5:09 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Mon, Sep 13, 2010 at 04:55:01PM +0200, Stephane Eranian wrote:
>> The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
>> requesting contiguous physical memory. There is no such restriction on
>> DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
>> no contiguous physical memory is available. BTS is requesting 64KB,
>> thus it can cause issues. PEBS is currently only requesting one page.
>> Both PEBS and BTS are static buffers allocated for each CPU at the
>> first user. When the last user exists, the buffers are released.
>>
>> All buffers are only accessed on the CPU they are attached to.
>> kzalloc() does not take into account NUMA, thus all allocations
>> are taking place on the NUMA node where the perf_event_open() is
>> made.
>>
>> This patch switches allocation to vmalloc_node() to use non-contiguous
>> physical memory and to allocate on the NUMA node corresponding to each
>> CPU. We switched DS and PEBS although they do not cause problems today,
>> to, at least, make the allocation on the correct NUMA node. In the future,
>> the PEBS buffer size may increase. DS may also grow bigger than a page.
>> This patch eliminates the memory allocation imbalance.
>>
>> vmalloc_node() returns page-aligned addresses which do conform with the
>> restriction on PEBS buffer as documented by Intel in Vol3a section 16.9.4.2.
>>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> --
>
>
> For now I think you can not do this. vmalloc'ed memory can't be safely
> accessed from NMIs in x86 because that might fault. And faults from NMIs
> are not supported. They cause very bad things: return from fault calls
> iret which reenables NMI, so NMI can nest but in the meantime there is
> only one NMI stack, so that gets quickly messed up.
>
What kind of faults are you talking about here? TLB faults?

But I don't want contiguous memory. This puts unnecessary pressure on
the memory subsystem. I have seen failures on my system because it
could not find 64KB of contiguous physical memory, even though there was
clearly more than 64KB of physical memory available. And I want NUMA-local
allocations as well.


>


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:13   ` Stephane Eranian
@ 2010-09-13 15:16     ` Peter Zijlstra
  2010-09-13 15:20       ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 15:16 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 17:13 +0200, Stephane Eranian wrote:
> > For now I think you can not do this. vmalloc'ed memory can't be safely
> > accessed from NMIs in x86 because that might fault. And faults from NMIs
> > are not supported. They cause very bad things: return from fault calls
> > iret which reenables NMI, so NMI can nest but in the meantime there is
> > only one NMI stack, so that gets quickly messed up.
> >
> What kind of faults are you talking about here? TLB faults?

Page faults. vmalloc pte setup is lazy.


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:16     ` Peter Zijlstra
@ 2010-09-13 15:20       ` Stephane Eranian
  2010-09-13 15:24         ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 15:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 5:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 17:13 +0200, Stephane Eranian wrote:
>> > For now I think you can not do this. vmalloc'ed memory can't be safely
>> > accessed from NMIs in x86 because that might fault. And faults from NMIs
>> > are not supported. They cause very bad things: return from fault calls
>> > iret which reenables NMI, so NMI can nest but in the meantime there is
>> > only one NMI stack, so that gets quickly messed up.
>> >
>> What kind of faults are you talking about here? TLB faults?
>
> Page faults. vmalloc pte setup is lazy.
>
Is there a way to not do lazy?

I guess we could do alloc_pages_node() if we make sure the
buffer size can be expressed as a page order and not just a page size,
or we are willing to waste memory.

That is the case with the sizes you have chosen today. For DS, we
could round up to one page for now.
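
The trade-off can be made concrete with a small user-space sketch, assuming
4 KiB pages. sketch_order_waste() is a hypothetical helper, not a kernel
function; it reports how many bytes go unused when a request is rounded up
to a power-of-two number of pages:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Bytes left unused when `size` is served by a power-of-two number of
 * pages, which is the granularity a page-order allocator works in.
 */
static size_t sketch_order_waste(size_t size)
{
	size_t block = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (block < size)
		block <<= 1;
	return block - size;
}
```

Under these assumptions a 64KB buffer and a one-page buffer waste nothing,
a hypothetical three-page buffer would waste a full page, and an area far
smaller than a page would waste most of that page.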


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:08 ` Peter Zijlstra
@ 2010-09-13 15:21   ` Stephane Eranian
  0 siblings, 0 replies; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, perfmon2-devel,
	eranian, robert.richter, markus.t.metzger

On Mon, Sep 13, 2010 at 5:08 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 16:55 +0200, Stephane Eranian wrote:
>> The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
>> requesting contiguous physical memory. There is no such restriction on
>> DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
>> no contiguous physical memory is available. BTS is requesting 64KB,
>> thus it can cause issues. PEBS is currently only requesting one page.
>> Both PEBS and BTS are static buffers allocated for each CPU at the
>> first user. When the last user exists, the buffers are released.
>>
>> All buffers are only accessed on the CPU they are attached to.
>> kzalloc() does not take into account NUMA, thus all allocations
>> are taking place on the NUMA node where the perf_event_open() is
>> made.
>
> I guess that should have been a alloc_pages_node() indeed.
>
>> This patch switches allocation to vmalloc_node() to use non-contiguous
>> physical memory and to allocate on the NUMA node corresponding to each
>> CPU. We switched DS and PEBS although they do not cause problems today,
>> to, at least, make the allocation on the correct NUMA node. In the future,
>> the PEBS buffer size may increase. DS may also grow bigger than a page.
>> This patch eliminates the memory allocation imbalance.
>
> I'm not really a fan of vmalloc, have you actually observed allocation
> failures for these 64k (order-4) allocations?
>
I did, with a small amount of memory in containers, for instance after
stressing the system with perf.

>
>


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:20       ` Stephane Eranian
@ 2010-09-13 15:24         ` Peter Zijlstra
  2010-09-13 15:31           ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 15:24 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger, markus.t.metzger

On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:

> That is the case we the sizes you have chosen today. For DS, we
> could round up to one page for now.

Markus chose the BTS size, for PEBS a single page was plenty since we do
single event things (although we could do multiple for attr.precise_ip <
2).

For DS there's:
  kmalloc_node(sizeof(struct debug_store), GFP_KERNEL | __GFP_ZERO, cpu_to_node(cpu));




* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:24         ` Peter Zijlstra
@ 2010-09-13 15:31           ` Stephane Eranian
  2010-09-13 15:41             ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 5:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:
>
>> That is the case we the sizes you have chosen today. For DS, we
>> could round up to one page for now.
>
> Markus chose the BTS size, for PEBS a single page was plenty since we do
> single event things (although we could do multiple for attr.precise_ip <
> 2).
>
> For DS there's:
>  kmalloc_node(sizeof(struct ds), GFP_KERNEL | __GFP_ZERO, cpu_node(cpu));
>
Ok, let me try again with alloc_pages_node() + kmalloc_node().
I think we can stick with kmalloc() for DS because we are far from
consuming a page.

>
>


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:31           ` Stephane Eranian
@ 2010-09-13 15:41             ` Peter Zijlstra
  2010-09-13 15:51               ` Frederic Weisbecker
                                 ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 15:41 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 17:31 +0200, Stephane Eranian wrote:
> On Mon, Sep 13, 2010 at 5:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:
> >
> >> That is the case we the sizes you have chosen today. For DS, we
> >> could round up to one page for now.
> >
> > Markus chose the BTS size, for PEBS a single page was plenty since we do
> > single event things (although we could do multiple for attr.precise_ip <
> > 2).
> >
> > For DS there's:
> >  kmalloc_node(sizeof(struct ds), GFP_KERNEL | __GFP_ZERO, cpu_node(cpu));
> >
> Ok, let try again with alloc_pages_node() + kmalloc_node().
> I think we can stick with kmalloc() for DS because we are far from
> consuming a page.

Thing is, if you're really seeing allocation failures,
alloc_pages_node() isn't going to help. And the problem is, these
allocations aren't movable, so memory compaction and all the other fancy
stuff aren't really going to help much :/

There was some talk about a function that should sync all of vmalloc
space, but iirc it was broken for some configs.


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:41             ` Peter Zijlstra
@ 2010-09-13 15:51               ` Frederic Weisbecker
  2010-09-13 15:55               ` Stephane Eranian
  2010-09-13 17:24               ` Stephane Eranian
  2 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2010-09-13 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Mathieu Desnoyers, linux-kernel, mingo, paulus,
	davem, perfmon2-devel, eranian, robert.richter, markus.t.metzger

On Mon, Sep 13, 2010 at 05:41:16PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-09-13 at 17:31 +0200, Stephane Eranian wrote:
> > On Mon, Sep 13, 2010 at 5:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:
> > >
> > >> That is the case we the sizes you have chosen today. For DS, we
> > >> could round up to one page for now.
> > >
> > > Markus chose the BTS size, for PEBS a single page was plenty since we do
> > > single event things (although we could do multiple for attr.precise_ip <
> > > 2).
> > >
> > > For DS there's:
> > >  kmalloc_node(sizeof(struct ds), GFP_KERNEL | __GFP_ZERO, cpu_node(cpu));
> > >
> > Ok, let try again with alloc_pages_node() + kmalloc_node().
> > I think we can stick with kmalloc() for DS because we are far from
> > consuming a page.
> 
> Thing is, if you're really seeing allocation failures,
> alloc_pages_node() isn't going to help. And the problem is, these
> allocations aren't movable, so memory compaction and all the other fancy
> stuff aren't really going to help much :/
> 
> There was some talk about a function that should sync all of vmalloc
> space, but iirc it was broken for some configs.


Mathieu has posted a set of patches to support traps/faults in NMIs,
this has been followed by a discussion with Linus.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-07/msg05222.html

I think they came up with some patches to solve the problem, pushing all the
fixup code into the NMI area, but the discussion was preempted by holidays
and I suspect Mathieu is focusing on some other scheduler patches right now :)



* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:41             ` Peter Zijlstra
  2010-09-13 15:51               ` Frederic Weisbecker
@ 2010-09-13 15:55               ` Stephane Eranian
  2010-09-13 17:35                 ` Peter Zijlstra
  2010-09-13 17:24               ` Stephane Eranian
  2 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 5:41 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 17:31 +0200, Stephane Eranian wrote:
>> On Mon, Sep 13, 2010 at 5:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:
>> >
>> >> That is the case we the sizes you have chosen today. For DS, we
>> >> could round up to one page for now.
>> >
>> > Markus chose the BTS size, for PEBS a single page was plenty since we do
>> > single event things (although we could do multiple for attr.precise_ip <
>> > 2).
>> >
>> > For DS there's:
>> >  kmalloc_node(sizeof(struct ds), GFP_KERNEL | __GFP_ZERO, cpu_node(cpu));
>> >
>> Ok, let try again with alloc_pages_node() + kmalloc_node().
>> I think we can stick with kmalloc() for DS because we are far from
>> consuming a page.
>
> Thing is, if you're really seeing allocation failures,
> alloc_pages_node() isn't going to help. And the problem is, these
> allocations aren't movable, so memory compaction and all the other fancy
> stuff aren't really going to help much :/
>

Ok, so you're saying there is no allocator that will give non-contiguous
physical memory WITHOUT requiring a page fault to populate the pte.

On the other hand, with vmalloc_node() the pte are populated when
you first touch the memory. That happens as part of memset() right after
the allocation and thus outside of NMI interrupt handler.

Does this sound right?


> There was some talk about a function that should sync all of vmalloc
> space, but iirc it was broken for some configs.
>


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:41             ` Peter Zijlstra
  2010-09-13 15:51               ` Frederic Weisbecker
  2010-09-13 15:55               ` Stephane Eranian
@ 2010-09-13 17:24               ` Stephane Eranian
  2010-09-13 17:36                 ` Peter Zijlstra
  2 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 17:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 5:41 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 17:31 +0200, Stephane Eranian wrote:
>> On Mon, Sep 13, 2010 at 5:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, 2010-09-13 at 17:20 +0200, Stephane Eranian wrote:
>> >
>> >> That is the case we the sizes you have chosen today. For DS, we
>> >> could round up to one page for now.
>> >
>> > Markus chose the BTS size, for PEBS a single page was plenty since we do
>> > single event things (although we could do multiple for attr.precise_ip <
>> > 2).
>> >
>> > For DS there's:
>> >  kmalloc_node(sizeof(struct ds), GFP_KERNEL | __GFP_ZERO, cpu_node(cpu));
>> >
>> Ok, let try again with alloc_pages_node() + kmalloc_node().
>> I think we can stick with kmalloc() for DS because we are far from
>> consuming a page.
>
> Thing is, if you're really seeing allocation failures,
> alloc_pages_node() isn't going to help. And the problem is, these
> allocations aren't movable, so memory compaction and all the other fancy
> stuff aren't really going to help much :/
>
Based on this comment, I assume that the only reason the allocation
of the sampling buffer in perf_buffer_alloc() is immune to this is because
you are allocating each page individually (order 0). Right?


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 15:55               ` Stephane Eranian
@ 2010-09-13 17:35                 ` Peter Zijlstra
  2010-09-13 18:40                   ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 17:35 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 17:55 +0200, Stephane Eranian wrote:
> 
> Ok, so you're saying there is no allocator that will give non-contiguous
> physical memory WITHOUT requiring a page fault to populate the pte.
> 
> On the other hand, with vmalloc_node() the pte are populated when
> you first touch the memory. That happens as part of memset() right after
> the allocation and thus outside of NMI interrupt handler.
> 
> Does this sound right? 

Nope, in particular read: http://lkml.org/lkml/2010/7/14/465

The issue is that the vmalloc space can be mapped in different
processes, and that memset() will only ensure it's mapped in the current
process, but the next one might still need that fault to populate it.
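
A toy user-space model may help picture this. It assumes that a new vmalloc
mapping is entered in a kernel-wide reference table and is copied into a
process's own top-level table only when that process first faults on it.
All names (toy_vmalloc(), toy_access(), TOY_NPROC, TOY_NSLOT) are
hypothetical, and real page tables are considerably more involved:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPROC 3	/* hypothetical number of processes */
#define TOY_NSLOT 8	/* hypothetical number of top-level vmalloc slots */

/* Stands in for the kernel-wide reference page table. */
static bool toy_reference[TOY_NSLOT];

/* Each process has its own copy of the top-level entries. */
static bool toy_proc_table[TOY_NPROC][TOY_NSLOT];

/* Allocation installs the mapping in the reference table only. */
static void toy_vmalloc(int slot)
{
	assert(slot >= 0 && slot < TOY_NSLOT);
	toy_reference[slot] = true;
}

/*
 * Access to `slot` while process `pid` is current. Returns true if the
 * access had to take a fault to copy the entry from the reference table.
 */
static bool toy_access(int pid, int slot)
{
	assert(pid >= 0 && pid < TOY_NPROC);
	assert(slot >= 0 && slot < TOY_NSLOT);
	assert(toy_reference[slot]);	/* must have been allocated first */

	if (toy_proc_table[pid][slot])
		return false;		/* already synced, no fault */

	toy_proc_table[pid][slot] = toy_reference[slot];
	return true;			/* fault handler synced the entry */
}
```

In this model, after toy_vmalloc(2) the first toy_access(0, 2) faults (this
plays the role of the memset()), a second access from process 0 does not,
but toy_access(1, 2) faults again. If that last access happened inside an
NMI handler, it would be exactly the fault that is not supported.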




* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 17:24               ` Stephane Eranian
@ 2010-09-13 17:36                 ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 17:36 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 19:24 +0200, Stephane Eranian wrote:
> Based on this comment, I assume that the only reason the allocation
> of the sampling buffer in perf_buffer_alloc() is immune to this is because
> you are allocating each page individually (order 0). Right? 

Right, and I software stitch the bits together.
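
The stitching idea can be sketched in user space as follows. This is not
the perf sampling-buffer code; stitched_alloc(), stitched_at() and
stitched_free() are hypothetical helpers, and calloc() stands in for an
order-0 page allocation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

struct stitched_buf {
	size_t nr_pages;
	unsigned char **pages;	/* one separately allocated page per entry */
};

static void stitched_free(struct stitched_buf *b)
{
	size_t i;

	if (!b)
		return;
	for (i = 0; i < b->nr_pages; i++)
		free(b->pages[i]);
	free(b->pages);
	free(b);
}

/* Allocate nr_pages single pages; no multi-page block is ever requested. */
static struct stitched_buf *stitched_alloc(size_t nr_pages)
{
	struct stitched_buf *b;
	size_t i;

	assert(nr_pages > 0);
	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->pages = calloc(nr_pages, sizeof(*b->pages));
	if (!b->pages) {
		free(b);
		return NULL;
	}
	b->nr_pages = nr_pages;
	for (i = 0; i < nr_pages; i++) {
		b->pages[i] = calloc(1, SKETCH_PAGE_SIZE);
		if (!b->pages[i]) {
			stitched_free(b);	/* free(NULL) is a no-op */
			return NULL;
		}
	}
	return b;
}

/* Translate a linear offset into (page, offset within page). */
static unsigned char *stitched_at(struct stitched_buf *b, size_t off)
{
	assert(b && off < b->nr_pages * SKETCH_PAGE_SIZE);
	return &b->pages[off / SKETCH_PAGE_SIZE][off % SKETCH_PAGE_SIZE];
}
```

Every access goes through stitched_at(), so the buffer looks linear to
software even though its pages are scattered in memory.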


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 17:35                 ` Peter Zijlstra
@ 2010-09-13 18:40                   ` Stephane Eranian
  2010-09-13 18:42                     ` Peter Zijlstra
  2010-09-13 19:31                     ` Mathieu Desnoyers
  0 siblings, 2 replies; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 7:35 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 17:55 +0200, Stephane Eranian wrote:
>>
>> Ok, so you're saying there is no allocator that will give non-contiguous
>> physical memory WITHOUT requiring a page fault to populate the pte.
>>
>> On the other hand, with vmalloc_node() the pte are populated when
>> you first touch the memory. That happens as part of memset() right after
>> the allocation and thus outside of NMI interrupt handler.
>>
>> Does this sound right?
>
> Nope, in particular read: http://lkml.org/lkml/2010/7/14/465
>
> The issue is that the vmalloc space can be mapped in different
> processes, and that memset() will only ensure its mapped in the current
> process, but the next one might need that fault to populate.
>
Ok, so can we play the same trick you're playing with the sampling
buffer, i.e., you use alloc_pages_node() for one page at a time, and
then you stitch them on demand via SW?


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 18:40                   ` Stephane Eranian
@ 2010-09-13 18:42                     ` Peter Zijlstra
  2010-09-13 18:49                       ` Stephane Eranian
  2010-09-13 19:31                     ` Mathieu Desnoyers
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 18:42 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 20:40 +0200, Stephane Eranian wrote:
> Ok, so can we play the same trick you're playing with the sampling
> buffer, i.e., you use alloc_pages_node() for one page at a time, and
> then you stitch them on demand via SW? 

Not for BTS, it wants a linear range, getting the vmalloc vs NMI thing
sorted would be best I think.


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 18:42                     ` Peter Zijlstra
@ 2010-09-13 18:49                       ` Stephane Eranian
  2010-09-13 18:57                         ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 18:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 8:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 20:40 +0200, Stephane Eranian wrote:
>> Ok, so can we play the same trick you're playing with the sampling
>> buffer, i.e., you use alloc_pages_node() for one page at a time, and
>> then you stitch them on demand via SW?
>
> Not for BTS, it wants a linear range, getting the vmalloc vs NMI thing
> sorted would be best I think.
>
What is annoying about this is that you run into the problem even though
you may be using neither BTS nor PEBS.

What mitigates the problem, I think, is the NMI watchdog. It is the first
user of perf_events. As such, the BTS and PEBS buffers get allocated
during kernel initialization, thereby increasing the chances of finding
contiguous chunks of memory. What would partly help would be to use
kmalloc_node() to at least balance allocations among the various NUMA
nodes. That would do until the vmalloc() vs. NMI issue is sorted out.
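
The imbalance, and what a node-aware allocation changes, can be illustrated
with a toy model. The topology (8 CPUs, 2 nodes, CPUs numbered contiguously
per node) and all sketch_* names are assumptions made for the example, and
the model assumes that a node-unaware allocation lands on the caller's node:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS	8	/* hypothetical machine */
#define SKETCH_NR_NODES	2

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int sketch_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return cpu / (SKETCH_NR_CPUS / SKETCH_NR_NODES);
}

/*
 * Count how many per-CPU buffers end up on each node when one caller,
 * running on caller_cpu, allocates a buffer for every CPU.
 */
static void sketch_place_buffers(int caller_cpu, bool node_aware,
				 int per_node[SKETCH_NR_NODES])
{
	int cpu, node;

	for (node = 0; node < SKETCH_NR_NODES; node++)
		per_node[node] = 0;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		/* node-aware: the target CPU's node; otherwise the caller's */
		node = sketch_cpu_to_node(node_aware ? cpu : caller_cpu);
		per_node[node]++;
	}
}
```

With a caller on CPU 5, the node-unaware variant places all 8 buffers on
node 1, whereas the node-aware variant places 4 on each node.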


* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 18:49                       ` Stephane Eranian
@ 2010-09-13 18:57                         ` Peter Zijlstra
  2010-09-13 19:12                           ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 18:57 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 20:49 +0200, Stephane Eranian wrote:
> On Mon, Sep 13, 2010 at 8:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2010-09-13 at 20:40 +0200, Stephane Eranian wrote:
> >> Ok, so can we play the same trick you're playing with the sampling
> >> buffer, i.e., you use alloc_pages_node() for one page at a time, and
> >> then you stitch them on demand via SW?
> >
> > Not for BTS, it wants a linear range, getting the vmalloc vs NMI thing
> > sorted would be best I think.
> >
> What is annoying in this is that you run into the problem even though
> you may not be using BTS nor PEBS.

Yes, one thing we could do is simply disable BTS when we fail that
alloc, instead of failing everything.

> What mitigates the problem, I think, is the NMI watchdog. It is the first
> user of perf_events. As such, the BTS and PEBS buffers get allocated
> during kernel initialization thereby increasing the chances of finding
> contiguous chunks of memory. What would partly help would be to use
> kmalloc_node() to at least balance allocations amongst the various NUMA
> nodes. That would be until the vmalloc() vs. NMI is sorted out.

Right, that would be a simple change to make.
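
[Illustrative sketch, not part of the original message.] The idea of
disabling BTS when its allocation fails, rather than failing the whole
reservation, can be modelled in user space as below. All names (model_ds,
model_reserve_ds, the allocator hooks) are hypothetical, and treating a
PEBS allocation failure as fatal is an assumption of this sketch, not
something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PEBS_SIZE 4096u		/* one page, as in the patch description */
#define MODEL_BTS_SIZE  (64u * 1024u)	/* 64KB, as in the patch description */

struct model_ds {
	void *pebs;
	void *bts;
	bool bts_enabled;
};

/* Hypothetical allocator hook so that a large-allocation failure can be simulated. */
typedef void *(*model_alloc_fn)(size_t size);

static void *model_alloc_any(size_t size)
{
	return calloc(1, size);
}

/* Models fragmented memory: only single-page requests succeed. */
static void *model_alloc_small_only(size_t size)
{
	return size <= 4096u ? calloc(1, size) : NULL;
}

/* Returns 0 on success, -1 on failure (the kernel would use -ENOMEM). */
static int model_reserve_ds(struct model_ds *ds, model_alloc_fn alloc)
{
	ds->bts = NULL;
	ds->bts_enabled = false;

	ds->pebs = alloc(MODEL_PEBS_SIZE);
	if (!ds->pebs)
		return -1;	/* assumption: no PEBS buffer is still fatal */

	/* A failed BTS allocation only disables BTS; everything else keeps working. */
	ds->bts = alloc(MODEL_BTS_SIZE);
	ds->bts_enabled = ds->bts != NULL;
	return 0;
}

static void model_release_ds(struct model_ds *ds)
{
	free(ds->bts);
	free(ds->pebs);
	ds->bts = NULL;
	ds->pebs = NULL;
	ds->bts_enabled = false;
}
```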

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 18:57                         ` Peter Zijlstra
@ 2010-09-13 19:12                           ` Stephane Eranian
  0 siblings, 0 replies; 29+ messages in thread
From: Stephane Eranian @ 2010-09-13 19:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Mathieu Desnoyers, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, Sep 13, 2010 at 8:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2010-09-13 at 20:49 +0200, Stephane Eranian wrote:
>> On Mon, Sep 13, 2010 at 8:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, 2010-09-13 at 20:40 +0200, Stephane Eranian wrote:
>> >> Ok, so can we play the same trick you're playing with the sampling
>> >> buffer, i.e., you use alloc_pages_node() for one page at a time, and
>> >> then you stitch them on demand via SW?
>> >
>> > Not for BTS, it wants a linear range, getting the vmalloc vs NMI thing
>> > sorted would be best I think.
>> >
>> What is annoying in this is that you run into the problem even though
>> you may not be using BTS nor PEBS.
>
> Yes, one thing we could do is simply disable BTS when we fail that
> alloc, instead of failing everything.
>
>> What mitigates the problem, I think, is the NMI watchdog. It is the first
>> user of perf_events. As such, the BTS and PEBS buffers get allocated
>> during kernel initialization thereby increasing the chances of finding
>> contiguous chunks of memory. What would partly help would be to use
>> kmalloc_node() to at least balance allocations amongst the various NUMA
>> nodes. That would be until the vmalloc() vs. NMI is sorted out.
>
> Right, that would be a simple change to make.
>

Ok, I will resubmit with those changes.
Thanks.

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 18:40                   ` Stephane Eranian
  2010-09-13 18:42                     ` Peter Zijlstra
@ 2010-09-13 19:31                     ` Mathieu Desnoyers
  2010-09-13 19:34                       ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Mathieu Desnoyers @ 2010-09-13 19:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, mingo, paulus,
	davem, perfmon2-devel, eranian, robert.richter, markus.t.metzger

* Stephane Eranian (eranian@google.com) wrote:
> On Mon, Sep 13, 2010 at 7:35 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2010-09-13 at 17:55 +0200, Stephane Eranian wrote:
> >>
> >> Ok, so you're saying there is no allocator that will give non-contiguous
> >> physical memory WITHOUT requiring a page fault to populate the pte.
> >>
> >> On the other hand, with vmalloc_node() the pte are populated when
> >> you first touch the memory. That happens as part of memset() right after
> >> the allocation and thus outside of NMI interrupt handler.
> >>
> >> Does this sound right?
> >
> > Nope, in particular read: http://lkml.org/lkml/2010/7/14/465
> >
> > The issue is that the vmalloc space can be mapped in different
> > processes, and that memset() will only ensure its mapped in the current
> > process, but the next one might need that fault to populate.
> >
> Ok, so can we play the same trick you're playing with the sampling
> buffer, i.e., you use alloc_pages_node() for one page at a time, and
> then you stitch them on demand via SW?

Well, a thought is striking me: it sounds like you are re-doing YAORB (short
for Yet Another Ring Buffer, which I start to expect will become a frequently
used acronym). Have you looked at my "generic ring buffer library" ? It's at:

git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-ringbuffer.git
current branch: tip-current-ringbuffer-0.248

documentation is under Documentation/ringbuffer/.

I think it can save you a lot of trouble. E.g., it does stitch pages together by
software.
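
[Illustrative sketch, not part of the original message.] "Stitching pages
together by software" generally means allocating pages one at a time and
translating a logical offset into a (page, offset-in-page) pair on every
access. The user-space model below shows only that translation; the names
(model_paged_buf and friends) are made up for this example and are not the
API of any ring buffer library. It suits software consumers only: a
hardware writer that expects a linear virtual range cannot follow such an
indirection.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u

/* A logically contiguous buffer backed by separately allocated pages. */
struct model_paged_buf {
	size_t nr_pages;
	unsigned char **pages;
};

static void model_buf_free(struct model_paged_buf *b)
{
	size_t i;

	if (b->pages) {
		for (i = 0; i < b->nr_pages; i++)
			free(b->pages[i]);	/* free(NULL) is a no-op */
		free(b->pages);
	}
	b->pages = NULL;
	b->nr_pages = 0;
}

/* Returns 0 on success, -1 if any single-page allocation fails. */
static int model_buf_init(struct model_paged_buf *b, size_t nr_pages)
{
	size_t i;

	b->nr_pages = 0;
	b->pages = calloc(nr_pages, sizeof(*b->pages));
	if (!b->pages)
		return -1;
	b->nr_pages = nr_pages;

	for (i = 0; i < nr_pages; i++) {
		b->pages[i] = calloc(1, MODEL_PAGE_SIZE);
		if (!b->pages[i]) {
			model_buf_free(b);
			return -1;
		}
	}
	return 0;
}

/* Translate a logical byte offset into a pointer, or NULL if out of range. */
static unsigned char *model_buf_byte(struct model_paged_buf *b, size_t off)
{
	if (off / MODEL_PAGE_SIZE >= b->nr_pages)
		return NULL;
	return &b->pages[off / MODEL_PAGE_SIZE][off % MODEL_PAGE_SIZE];
}
```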

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:31                     ` Mathieu Desnoyers
@ 2010-09-13 19:34                       ` Peter Zijlstra
  2010-09-13 19:35                         ` Peter Zijlstra
  2010-09-13 19:42                         ` Mathieu Desnoyers
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 19:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Stephane Eranian, Frederic Weisbecker, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 15:31 -0400, Mathieu Desnoyers wrote:
> > Ok, so can we play the same trick you're playing with the sampling
> > buffer, i.e., you use alloc_pages_node() for one page at a time, and
> > then you stitch them on demand via SW?
> 
> Well, a thought is striking me: it sounds like you are re-doing YAORB (short
> for Yet Another Ring Buffer, 

He's not.. the hardware needs a large (virtually) contiguous region to
poke data into, we need to read it out from NMI context.

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:34                       ` Peter Zijlstra
@ 2010-09-13 19:35                         ` Peter Zijlstra
  2010-09-13 19:42                         ` Mathieu Desnoyers
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 19:35 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Stephane Eranian, Frederic Weisbecker, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

On Mon, 2010-09-13 at 21:34 +0200, Peter Zijlstra wrote:
> On Mon, 2010-09-13 at 15:31 -0400, Mathieu Desnoyers wrote:
> > > Ok, so can we play the same trick you're playing with the sampling
> > > buffer, i.e., you use alloc_pages_node() for one page at a time, and
> > > then you stitch them on demand via SW?
> > 
> > Well, a thought is striking me: it sounds like you are re-doing YAORB (short
> > for Yet Another Ring Buffer, 
> 
> He's not.. the hardware needs a large (virtually) contiguous region to
> poke data into, we need to read it out from NMI context.

Which just made me realize, we really need vmalloc_sync_all() for this,
the hardware will want to walk the pagetables as well, it cannot lazy
fault the pages in.



* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 14:55 [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation Stephane Eranian
  2010-09-13 15:08 ` Peter Zijlstra
  2010-09-13 15:09 ` Frederic Weisbecker
@ 2010-09-13 19:35 ` Andi Kleen
  2010-09-13 19:49   ` Peter Zijlstra
  2010-09-13 20:34   ` H. Peter Anvin
  2 siblings, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2010-09-13 19:35 UTC (permalink / raw)
  To: eranian
  Cc: linux-kernel, peterz, mingo, paulus, davem, fweisbec,
	perfmon2-devel, eranian, robert.richter, markus.t.metzger

Stephane Eranian <eranian@google.com> writes:

> The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
> requesting contiguous physical memory. There is no such restriction on
> DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
> no contiguous physical memory is available. BTS is requesting 64KB,
> thus it can cause issues. PEBS is currently only requesting one page.
> Both PEBS and BTS are static buffers allocated for each CPU at the
> first user. When the last user exits, the buffers are released.

DS supports page tables, but I have some doubts it really 
supports page faults. vmalloc today does page faults. 

I think the change is a good idea, but it will need vmalloc_sync_all()
everywhere.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:34                       ` Peter Zijlstra
  2010-09-13 19:35                         ` Peter Zijlstra
@ 2010-09-13 19:42                         ` Mathieu Desnoyers
  1 sibling, 0 replies; 29+ messages in thread
From: Mathieu Desnoyers @ 2010-09-13 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Frederic Weisbecker, linux-kernel, mingo,
	paulus, davem, perfmon2-devel, eranian, robert.richter,
	markus.t.metzger

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Mon, 2010-09-13 at 15:31 -0400, Mathieu Desnoyers wrote:
> > > Ok, so can we play the same trick you're playing with the sampling
> > > buffer, i.e., you use alloc_pages_node() for one page at a time, and
> > > then you stitch them on demand via SW?
> > 
> > Well, a thought is striking me: it sounds like you are re-doing YAORB (short
> > for Yet Another Ring Buffer, 
> 
> He's not.. the hardware needs a large (virtually) contiguous region to
> poke data into, we need to read it out from NMI context.

Ah ok, it's different then.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:35 ` Andi Kleen
@ 2010-09-13 19:49   ` Peter Zijlstra
  2010-09-13 20:51     ` Andi Kleen
  2010-09-13 20:34   ` H. Peter Anvin
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2010-09-13 19:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: eranian, linux-kernel, mingo, paulus, davem, fweisbec,
	perfmon2-devel, eranian, robert.richter, markus.t.metzger

On Mon, 2010-09-13 at 21:35 +0200, Andi Kleen wrote:
> Stephane Eranian <eranian@google.com> writes:
> 
> > The DS, BTS, and PEBS memory regions were allocated using kzalloc(), i.e.,
> > requesting contiguous physical memory. There is no such restriction on
> > DS, PEBS and BTS buffers. Using kzalloc() could lead to error in case
> > no contiguous physical memory is available. BTS is requesting 64KB,
> > thus it can cause issues. PEBS is currently only requesting one page.
> > Both PEBS and BTS are static buffers allocated for each CPU at the
> > first user. When the last user exits, the buffers are released.
> 
> DS supports page tables, but I have some doubts it really 
> supports page faults. vmalloc today does page faults. 
> 
> I think the change is a good idea, but it will need vmalloc_sync_all()
> everywhere.

Right, I seem to remember from that last discussion on vmalloc vs NMI
that vmalloc_sync_all() had some issues, or am I totally mis-remembering
that?

But yes, a vmalloc_sync_all() after the vmalloc_node() and this should
indeed work.
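
[Illustrative sketch, not part of the original message.] The lazy-fault
behaviour discussed in this thread can be pictured with a toy model: a new
vmalloc mapping is entered into a reference page table only, each process
picks up the entry through a fault on first access, and an explicit sync
copies it everywhere up front. Everything below (model_mm, model_vmalloc,
model_access, model_vmalloc_sync_all) is a made-up user-space model under
those assumptions, not the kernel implementation.

```c
#include <stdbool.h>

#define MODEL_NR_PROCS 3
#define MODEL_NR_SLOTS 4	/* top-level entries covering the vmalloc area */

struct model_mm {
	bool entry[MODEL_NR_SLOTS];
};

static struct model_mm model_init_mm;			/* reference page table */
static struct model_mm model_procs[MODEL_NR_PROCS];	/* per-process tables */

/* A new mapping is entered into the reference table only. */
static void model_vmalloc(int slot)
{
	model_init_mm.entry[slot] = true;
}

/*
 * Access a slot from a given process. Returns true if the access had to
 * take a fault to copy the entry from the reference table. Neither NMI
 * context nor hardware writing the DS area can tolerate that fault.
 */
static bool model_access(int proc, int slot)
{
	if (model_procs[proc].entry[slot])
		return false;
	model_procs[proc].entry[slot] = model_init_mm.entry[slot];
	return true;
}

/* Eagerly copy every populated reference entry into every process. */
static void model_vmalloc_sync_all(void)
{
	int proc, slot;

	for (proc = 0; proc < MODEL_NR_PROCS; proc++)
		for (slot = 0; slot < MODEL_NR_SLOTS; slot++)
			if (model_init_mm.entry[slot])
				model_procs[proc].entry[slot] = true;
}
```

In this model, touching the buffer right after allocation (the memset()
case) only populates the table of the process that did the touching; the
other processes still fault until the sync has run.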

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:35 ` Andi Kleen
  2010-09-13 19:49   ` Peter Zijlstra
@ 2010-09-13 20:34   ` H. Peter Anvin
  1 sibling, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2010-09-13 20:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: eranian, linux-kernel, peterz, mingo, paulus, davem, fweisbec,
	perfmon2-devel, eranian, robert.richter, markus.t.metzger

On 09/13/2010 12:35 PM, Andi Kleen wrote:
> 
> DS supports page tables, but I have some doubts it really 
> supports page faults. vmalloc today does page faults. 
> 

It specifically does not (SDM III 16.4.9.2).  In fact, it requires that
the pages be mapped Accessed and Dirty so the hardware doesn't have to
stop and set those bits.
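
[Illustrative sketch, not part of the original message.] On x86 the Present,
Accessed and Dirty flags are bits 0, 5 and 6 of a page-table entry. A check
that a mapping already carries all three, so that the hardware never has to
stop and set them, could look like the following. The MODEL_* names and the
helper are hypothetical and written for user space; kernel code would use
its own pte accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry flag bits. */
#define MODEL_PAGE_PRESENT  (UINT64_C(1) << 0)
#define MODEL_PAGE_ACCESSED (UINT64_C(1) << 5)
#define MODEL_PAGE_DIRTY    (UINT64_C(1) << 6)

/* True if the entry is present and already marked Accessed and Dirty. */
static bool model_pte_ready_for_ds(uint64_t pte)
{
	const uint64_t need = MODEL_PAGE_PRESENT | MODEL_PAGE_ACCESSED |
			      MODEL_PAGE_DIRTY;

	return (pte & need) == need;
}
```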

The options thus are vmalloc_sync_all(), which will make Linus unhappy
(since he seems to want to get rid of the thing), or doing fixmap-style
reserved address space which is always consistent because the PDEs are
preallocated and frozen.

	-hpa

* Re: [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 19:49   ` Peter Zijlstra
@ 2010-09-13 20:51     ` Andi Kleen
  2010-09-13 20:57       ` [perfmon2] " Luck, Tony
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2010-09-13 20:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, linux-kernel, mingo, paulus, davem, fweisbec,
	perfmon2-devel, eranian, robert.richter, markus.t.metzger

On Mon, 13 Sep 2010 21:49:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2010-09-13 at 21:35 +0200, Andi Kleen wrote:
> > Stephane Eranian <eranian@google.com> writes:
> > 
> > > The DS, BTS, and PEBS memory regions were allocated using
> > > kzalloc(), i.e., requesting contiguous physical memory. There is
> > > no such restriction on DS, PEBS and BTS buffers. Using kzalloc()
> > > could lead to error in case no contiguous physical memory is
> > > available. BTS is requesting 64KB, thus it can cause issues. PEBS
> > > is currently only requesting one page. Both PEBS and BTS are
> > > static buffers allocated for each CPU at the first user. When the
> > > last user exits, the buffers are released.
> > 
> > DS supports page tables, but I have some doubts it really 
> > supports page faults. vmalloc today does page faults. 
> > 
> > I think the change is a good idea, but it will need
> > vmalloc_sync_all() everywhere.
> 
> Right, I seem to remember from that last discussion on vmalloc vs NMI
> that vmalloc_sync_all() had some issues, or am I totally
> mis-remembering that?

Linus thought it was ugly, but he never explained why and it was
not obvious to me. 

His proposed replacement wouldn't work for this case.

I am not aware of any real technical issues, except that
it needs to be done for both 32bit and 64bit.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* RE: [perfmon2] [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation
  2010-09-13 20:51     ` Andi Kleen
@ 2010-09-13 20:57       ` Luck, Tony
  0 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2010-09-13 20:57 UTC (permalink / raw)
  To: Andi Kleen, Peter Zijlstra
  Cc: perfmon2-devel, eranian, fweisbec, linux-kernel, eranian, paulus,
	Metzger, Markus T, mingo, davem

> I am not aware of any real technical issues, except that
> it needs to be done for both 32bit and 64bit.

Some good documentation on when you need vmalloc_sync_all(), and
why most places don't need it, would help to avoid a surge of
patches spraying calls to it all over the kernel.

-Tony

Thread overview: 29+ messages
2010-09-13 14:55 [PATCH] perf_events: improve DS/BTS/PEBS buffer allocation Stephane Eranian
2010-09-13 15:08 ` Peter Zijlstra
2010-09-13 15:21   ` Stephane Eranian
2010-09-13 15:09 ` Frederic Weisbecker
2010-09-13 15:13   ` Stephane Eranian
2010-09-13 15:16     ` Peter Zijlstra
2010-09-13 15:20       ` Stephane Eranian
2010-09-13 15:24         ` Peter Zijlstra
2010-09-13 15:31           ` Stephane Eranian
2010-09-13 15:41             ` Peter Zijlstra
2010-09-13 15:51               ` Frederic Weisbecker
2010-09-13 15:55               ` Stephane Eranian
2010-09-13 17:35                 ` Peter Zijlstra
2010-09-13 18:40                   ` Stephane Eranian
2010-09-13 18:42                     ` Peter Zijlstra
2010-09-13 18:49                       ` Stephane Eranian
2010-09-13 18:57                         ` Peter Zijlstra
2010-09-13 19:12                           ` Stephane Eranian
2010-09-13 19:31                     ` Mathieu Desnoyers
2010-09-13 19:34                       ` Peter Zijlstra
2010-09-13 19:35                         ` Peter Zijlstra
2010-09-13 19:42                         ` Mathieu Desnoyers
2010-09-13 17:24               ` Stephane Eranian
2010-09-13 17:36                 ` Peter Zijlstra
2010-09-13 19:35 ` Andi Kleen
2010-09-13 19:49   ` Peter Zijlstra
2010-09-13 20:51     ` Andi Kleen
2010-09-13 20:57       ` [perfmon2] " Luck, Tony
2010-09-13 20:34   ` H. Peter Anvin
