linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] x86, perf: avoid spamming kernel log for bts buffer failure
@ 2014-06-30 23:04 David Rientjes
  2014-07-01  9:34 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: David Rientjes @ 2014-06-30 23:04 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, x86, linux-kernel

It's unnecessary to excessively spam the kernel log anytime the BTS buffer 
cannot be allocated, so make this allocation __GFP_NOWARN.

The user probably will want to at least find some artifact that the 
allocation has failed in the past, probably due to fragmentation because 
of its large size, when it's not allocated at bootstrap.  Thus, add a 
WARN_ONCE() so something is left behind for them to understand why perf 
commnads that require PEBS is not working properly.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -311,9 +311,11 @@ static int alloc_bts_buffer(int cpu)
 	if (!x86_pmu.bts)
 		return 0;
 
-	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL, node);
-	if (unlikely(!buffer))
+	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+	if (unlikely(!buffer)) {
+		WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
 		return -ENOMEM;
+	}
 
 	max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
 	thresh = max / 16;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-06-30 23:04 [patch] x86, perf: avoid spamming kernel log for bts buffer failure David Rientjes
@ 2014-07-01  9:34 ` Peter Zijlstra
  2014-07-02 13:16   ` Stephane Eranian
  2014-07-14 23:33 ` David Rientjes
  2014-07-16 19:21 ` [tip:perf/urgent] perf/x86/intel: Avoid spamming kernel log for BTS " tip-bot for David Rientjes
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-07-01  9:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 960 bytes --]

On Mon, Jun 30, 2014 at 04:04:08PM -0700, David Rientjes wrote:
> It's unnecessary to excessively spam the kernel log anytime the BTS buffer 
> cannot be allocated, so make this allocation __GFP_NOWARN.
> 
> The user probably will want to at least find some artifact that the 
> allocation has failed in the past, probably due to fragmentation because 
> of its large size, when it's not allocated at bootstrap.  Thus, add a 
> WARN_ONCE() so something is left behind for them to understand why perf 
> commnads that require PEBS is not working properly.

Can you elaborate a bit under which conditions this triggered? Typically
we should be doing fairly well allocating such buffers with GFP_KERNEL,
that should allow things like compaction to run and create higher order
pages.

And the BTS (branch trace store) isn't _that_ large.

That said, the patch is reasonable; although arguably we should maybe do
the same to alloc_pebs_buffer().

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-01  9:34 ` Peter Zijlstra
@ 2014-07-02 13:16   ` Stephane Eranian
  2014-07-02 13:31     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Stephane Eranian @ 2014-07-02 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Rientjes, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, x86, LKML

On Tue, Jul 1, 2014 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jun 30, 2014 at 04:04:08PM -0700, David Rientjes wrote:
>> It's unnecessary to excessively spam the kernel log anytime the BTS buffer
>> cannot be allocated, so make this allocation __GFP_NOWARN.
>>
>> The user probably will want to at least find some artifact that the
>> allocation has failed in the past, probably due to fragmentation because
>> of its large size, when it's not allocated at bootstrap.  Thus, add a
>> WARN_ONCE() so something is left behind for them to understand why perf
>> commnads that require PEBS is not working properly.
>
> Can you elaborate a bit under which conditions this triggered? Typically
> we should be doing fairly well allocating such buffers with GFP_KERNEL,
> that should allow things like compaction to run and create higher order
> pages.
>
I think this triggers when you have fragmented memory and you have
perf_events active and inactive (i.e., 0 users = no nmi watchdog) frequently.
Each first user invokes the reserve_ds() function to reserve DS, PEBS, BTS.

The reason for BTS rather then PEBS is the size of the allocation.
PEBS allocates one page, i.e., less likely to get a failure than BTS
which allocates 4 pages, I think.

David and I discussed this. He can probably add more background
info, if needed.

> And the BTS (branch trace store) isn't _that_ large.
>
> That said, the patch is reasonable; although arguably we should maybe do
> the same to alloc_pebs_buffer().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-02 13:16   ` Stephane Eranian
@ 2014-07-02 13:31     ` Peter Zijlstra
  2014-07-02 13:38       ` Stephane Eranian
  2014-07-02 23:30       ` David Rientjes
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-07-02 13:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: David Rientjes, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, x86, LKML

[-- Attachment #1: Type: text/plain, Size: 1758 bytes --]

On Wed, Jul 02, 2014 at 03:16:40PM +0200, Stephane Eranian wrote:
> On Tue, Jul 1, 2014 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jun 30, 2014 at 04:04:08PM -0700, David Rientjes wrote:
> >> It's unnecessary to excessively spam the kernel log anytime the BTS buffer
> >> cannot be allocated, so make this allocation __GFP_NOWARN.
> >>
> >> The user probably will want to at least find some artifact that the
> >> allocation has failed in the past, probably due to fragmentation because
> >> of its large size, when it's not allocated at bootstrap.  Thus, add a
> >> WARN_ONCE() so something is left behind for them to understand why perf
> >> commnads that require PEBS is not working properly.
> >
> > Can you elaborate a bit under which conditions this triggered? Typically
> > we should be doing fairly well allocating such buffers with GFP_KERNEL,
> > that should allow things like compaction to run and create higher order
> > pages.
> >
> I think this triggers when you have fragmented memory and you have
> perf_events active and inactive (i.e., 0 users = no nmi watchdog) frequently.
> Each first user invokes the reserve_ds() function to reserve DS, PEBS, BTS.

Right, that'd suck. I suppose we could also change that to allocate the
DS resources on first demand and never free them again.

So only allocate the PEBS buffer when we create the first PEBS event,
and idem for the BTS muck.

> The reason for BTS rather then PEBS is the size of the allocation.
> PEBS allocates one page, i.e., less likely to get a failure than BTS
> which allocates 4 pages, I think.

Sure..

> David and I discussed this. He can probably add more background
> info, if needed.

It would still be good to see why compaction etc is failing.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-02 13:31     ` Peter Zijlstra
@ 2014-07-02 13:38       ` Stephane Eranian
  2014-07-02 14:54         ` Peter Zijlstra
  2014-07-02 23:30       ` David Rientjes
  1 sibling, 1 reply; 10+ messages in thread
From: Stephane Eranian @ 2014-07-02 13:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Rientjes, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, x86, LKML

On Wed, Jul 2, 2014 at 3:31 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jul 02, 2014 at 03:16:40PM +0200, Stephane Eranian wrote:
>> On Tue, Jul 1, 2014 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, Jun 30, 2014 at 04:04:08PM -0700, David Rientjes wrote:
>> >> It's unnecessary to excessively spam the kernel log anytime the BTS buffer
>> >> cannot be allocated, so make this allocation __GFP_NOWARN.
>> >>
>> >> The user probably will want to at least find some artifact that the
>> >> allocation has failed in the past, probably due to fragmentation because
>> >> of its large size, when it's not allocated at bootstrap.  Thus, add a
>> >> WARN_ONCE() so something is left behind for them to understand why perf
>> >> commnads that require PEBS is not working properly.
>> >
>> > Can you elaborate a bit under which conditions this triggered? Typically
>> > we should be doing fairly well allocating such buffers with GFP_KERNEL,
>> > that should allow things like compaction to run and create higher order
>> > pages.
>> >
>> I think this triggers when you have fragmented memory and you have
>> perf_events active and inactive (i.e., 0 users = no nmi watchdog) frequently.
>> Each first user invokes the reserve_ds() function to reserve DS, PEBS, BTS.
>
> Right, that'd suck. I suppose we could also change that to allocate the
> DS resources on first demand and never free them again.
>
Some may argue that if you never use perf_event again, you are wasting
(1 + 1 + 4) pages per CPU. That may not be okay on some systems.

But yes, it would avoid this problem and also take the penalty for the allocs
only once.


> So only allocate the PEBS buffer when we create the first PEBS event,
> and idem for the BTS muck.
>
>> The reason for BTS rather then PEBS is the size of the allocation.
>> PEBS allocates one page, i.e., less likely to get a failure than BTS
>> which allocates 4 pages, I think.
>
> Sure..
>
>> David and I discussed this. He can probably add more background
>> info, if needed.
>
> It would still be good to see why compaction etc is failing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-02 13:38       ` Stephane Eranian
@ 2014-07-02 14:54         ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-07-02 14:54 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: David Rientjes, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, x86, LKML

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On Wed, Jul 02, 2014 at 03:38:50PM +0200, Stephane Eranian wrote:
> On Wed, Jul 2, 2014 at 3:31 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Right, that'd suck. I suppose we could also change that to allocate the
> > DS resources on first demand and never free them again.
> >
> Some may argue that if you never use perf_event again, you are wasting
> (1 + 1 + 4) pages per CPU. That may not be okay on some systems.
> 
> But yes, it would avoid this problem and also take the penalty for the allocs
> only once.

We could of course over engineer this and put a timer on them to free
after 5 minutes to avoid the alloc/free cycle for workloads that
create/destroy events a lot.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-02 13:31     ` Peter Zijlstra
  2014-07-02 13:38       ` Stephane Eranian
@ 2014-07-02 23:30       ` David Rientjes
  1 sibling, 0 replies; 10+ messages in thread
From: David Rientjes @ 2014-07-02 23:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, x86, LKML

On Wed, 2 Jul 2014, Peter Zijlstra wrote:

> > David and I discussed this. He can probably add more background
> > info, if needed.
> 
> It would still be good to see why compaction etc is failing.
> 

"Why compaction is failing" has been the story of my life for the past few 
weeks, unfortunately.  One person ran into this and here is the breakdown 
at the time of the page allocation failure:

Node 0 DMA32: 12478*4kB 7595*8kB 2087*16kB 26*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 144896kB
Node 0 Normal: 55037*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 222196kB
Node 1 Normal: 165860*4kB 18*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 663600kB

So we really don't have any order-4 or higher memory and if compaction and 
reclaim fails, then the allocation failure is printed every time.  This is 
what the patch is trying to address and we can't guarantee order-4 is 
always allocatable even with GFP_KERNEL.  This allocation is just outside 
the PAGE_ALLOC_COSTLY_ORDER == 3 threshold that would have oom killed 
something and retried; the oom killer is deferred for this case because 
there's no guarantee killing a process would have resulted in order-4 
memory being available (and nobody wants killing when they are 902MB above 
the per-zone min watermarks like this poor guy).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-06-30 23:04 [patch] x86, perf: avoid spamming kernel log for bts buffer failure David Rientjes
  2014-07-01  9:34 ` Peter Zijlstra
@ 2014-07-14 23:33 ` David Rientjes
  2014-07-15  8:53   ` Peter Zijlstra
  2014-07-16 19:21 ` [tip:perf/urgent] perf/x86/intel: Avoid spamming kernel log for BTS " tip-bot for David Rientjes
  2 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2014-07-14 23:33 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, x86, linux-kernel

On Mon, 30 Jun 2014, David Rientjes wrote:

> It's unnecessary to excessively spam the kernel log anytime the BTS buffer 
> cannot be allocated, so make this allocation __GFP_NOWARN.
> 
> The user probably will want to at least find some artifact that the 
> allocation has failed in the past, probably due to fragmentation because 
> of its large size, when it's not allocated at bootstrap.  Thus, add a 
> WARN_ONCE() so something is left behind for them to understand why perf 
> commnads that require PEBS is not working properly.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Peter, would you like to ack this?  Trying to get this merged in time for 
the next merge window.

> ---
>  arch/x86/kernel/cpu/perf_event_intel_ds.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -311,9 +311,11 @@ static int alloc_bts_buffer(int cpu)
>  	if (!x86_pmu.bts)
>  		return 0;
>  
> -	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL, node);
> -	if (unlikely(!buffer))
> +	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
> +	if (unlikely(!buffer)) {
> +		WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
>  		return -ENOMEM;
> +	}
>  
>  	max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
>  	thresh = max / 16;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, perf: avoid spamming kernel log for bts buffer failure
  2014-07-14 23:33 ` David Rientjes
@ 2014-07-15  8:53   ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-07-15  8:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

On Mon, Jul 14, 2014 at 04:33:47PM -0700, David Rientjes wrote:
> On Mon, 30 Jun 2014, David Rientjes wrote:
> 
> > It's unnecessary to excessively spam the kernel log anytime the BTS buffer 
> > cannot be allocated, so make this allocation __GFP_NOWARN.
> > 
> > The user probably will want to at least find some artifact that the 
> > allocation has failed in the past, probably due to fragmentation because 
> > of its large size, when it's not allocated at bootstrap.  Thus, add a 
> > WARN_ONCE() so something is left behind for them to understand why perf 
> > commnads that require PEBS is not working properly.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Peter, would you like to ack this?  Trying to get this merged in time for 
> the next merge window.

I've queued it; still don't like it though.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:perf/urgent] perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
  2014-06-30 23:04 [patch] x86, perf: avoid spamming kernel log for bts buffer failure David Rientjes
  2014-07-01  9:34 ` Peter Zijlstra
  2014-07-14 23:33 ` David Rientjes
@ 2014-07-16 19:21 ` tip-bot for David Rientjes
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot for David Rientjes @ 2014-07-16 19:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, acme, tglx, rientjes

Commit-ID:  4485154138f6ffa5b252cb490aba3e8eb30124e4
Gitweb:     http://git.kernel.org/tip/4485154138f6ffa5b252cb490aba3e8eb30124e4
Author:     David Rientjes <rientjes@google.com>
AuthorDate: Mon, 30 Jun 2014 16:04:08 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Jul 2014 13:31:30 +0200

perf/x86/intel: Avoid spamming kernel log for BTS buffer failure

It's unnecessary to excessively spam the kernel log anytime the BTS buffer
cannot be allocated, so make this allocation __GFP_NOWARN.

The user probably will want to at least find some artifact that the
allocation has failed in the past, probably due to fragmentation because
of its large size, when it's not allocated at bootstrap.  Thus, add a
WARN_ONCE() so something is left behind for them to understand why perf
commnads that require PEBS is not working properly.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1406301600460.26302@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 980970c..696ade3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -311,9 +311,11 @@ static int alloc_bts_buffer(int cpu)
 	if (!x86_pmu.bts)
 		return 0;
 
-	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL, node);
-	if (unlikely(!buffer))
+	buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+	if (unlikely(!buffer)) {
+		WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
 		return -ENOMEM;
+	}
 
 	max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
 	thresh = max / 16;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-16 19:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-30 23:04 [patch] x86, perf: avoid spamming kernel log for bts buffer failure David Rientjes
2014-07-01  9:34 ` Peter Zijlstra
2014-07-02 13:16   ` Stephane Eranian
2014-07-02 13:31     ` Peter Zijlstra
2014-07-02 13:38       ` Stephane Eranian
2014-07-02 14:54         ` Peter Zijlstra
2014-07-02 23:30       ` David Rientjes
2014-07-14 23:33 ` David Rientjes
2014-07-15  8:53   ` Peter Zijlstra
2014-07-16 19:21 ` [tip:perf/urgent] perf/x86/intel: Avoid spamming kernel log for BTS " tip-bot for David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).