From: Thomas Gleixner <tglx@linutronix.de>
To: Qian Cai <cai@lca.pw>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	X86 ML <x86@kernel.org>, "Paul E. McKenney" <paulmck@kernel.org>,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Brian Gerst <brgerst@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Will Deacon <will@kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Wei Liu <wei.liu@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	Jason Chen CJ <jason.cj.chen@intel.com>,
	Zhao Yakui <yakui.zhao@intel.com>,
	Alexander Potapenko <glider@google.com>
Subject: Re: [patch V9 10/39] x86/entry: Provide helpers for execute on irqstack
Date: Tue, 09 Jun 2020 00:20:06 +0200
Message-ID: <87pna98ajt.fsf@nanos.tec.linutronix.de>
In-Reply-To: <20200608160144.GA987@lca.pw>

Qian,

can you please ensure that people who got cc'ed because the problem
affects their subsystem are included on your replies even if you are
replying to a different subthread?

I explicitly did:

     Cc:+ Alexander

at the very beginning of my reply:

   https://lore.kernel.org/r/87v9k3jdc6.fsf@nanos.tec.linutronix.de

to make you aware of that.

Yes, email sucks, but it sucks even more when people are careless.

Qian Cai <cai@lca.pw> writes:
> On Fri, Jun 05, 2020 at 07:36:22PM +0200, Peter Zijlstra wrote:
>
> Even after I trimmed the .config [1] to the barely minimal, a subset of LTP
> test still unable to finish on those AMD servers with page_owner=on.

What a surprise...

> [1]:
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> It looks like because this new IRQ entry introduced by this patch,
>
> </IRQ>
> asm_call_on_stack at arch/x86/entry/entry_64.S:710
> handle_edge_irq at kernel/irq/chip.c:832
>
> which will running out of the stack depot limit due to nested loops
> below.
>
> which has this loop,
>
> 	do {
> 		...
> 		handle_irq_event(desc);
> 		...
> 	} while ((desc->istate & IRQS_PENDING) &&
> 		!irqd_irq_disabled(&desc->irq_data));

This loop has absolutely nothing to do with stack entry usage. 

foo()
{
  do {
     bar();
  } while (condition);
}

If you take a stack trace inside bar() it will be the same stack trace
for every single loop iteration. And that stack trace will not be any
different from:

foo()
{
  bar();
}

assuming that the call chain leading to foo() is the same in both cases.

And you can add even more loops in subsequent call chains within
bar(). They do not matter at all.

>  Here has a nested loop,
>
>	    for_each_action_of_desc(desc, action) {
>		    ...
>		    res = action->handler(irq, action->dev_id);
>		    ...
>	    }

And this one is completely irrelevant because the interrupt which we are
looking at is a PCI interrupt which CANNOT be shared. IOW, the number of
loop iterations and the number of handlers invoked is exactly ONE.

I seriously have no idea what you are trying to demonstrate by finding
loops in a SINGLE callchain.

> Here has one more nested loop,
>
> 	while (table->orig_nents) {
> 		...
> 		free_fn(sgl, alloc_size);
> 		...
> 	}
>
> free_fn() will call kmem_cache_free(). Since we have page_owner=on, it
> will call save_stack() to save the each free stack trace.

That stack trace for each invocation of free_fn() in this loop is
exactly the same stack trace. The same stack trace is not eating up any
memory because the hash matches, i.e. the stack trace in the depot is
already known.
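
The deduplication argument can be illustrated with a small user-space
sketch. This is not the kernel's stack depot: toy_depot, toy_hash(),
toy_save() and toy_loop_same_trace() are hypothetical names, the hash is
plain FNV-1a chosen for brevity, and the lookup is a linear scan. It only
shows that saving the same trace in a loop stores exactly one record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_RECORDS 64	/* hypothetical capacity of the toy depot */
#define TOY_MAX_DEPTH   16	/* hypothetical maximum trace depth */

struct toy_record {
	uint32_t hash;
	size_t nr;
	unsigned long entries[TOY_MAX_DEPTH];
};

static struct toy_record toy_depot[TOY_MAX_RECORDS];
static size_t toy_unique;	/* number of distinct traces stored */

/* FNV-1a over the raw entries; an assumption made for this sketch only. */
static uint32_t toy_hash(const unsigned long *entries, size_t nr)
{
	const unsigned char *p = (const unsigned char *)entries;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < nr * sizeof(*entries); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Returns the record index, or -1 if the trace is too deep or the depot is full. */
static int toy_save(const unsigned long *entries, size_t nr)
{
	uint32_t h;
	size_t i;

	if (nr > TOY_MAX_DEPTH)
		return -1;

	h = toy_hash(entries, nr);
	for (i = 0; i < toy_unique; i++) {
		struct toy_record *r = &toy_depot[i];

		/* Known trace: hand back the existing record, store nothing. */
		if (r->hash == h && r->nr == nr &&
		    !memcmp(r->entries, entries, nr * sizeof(*entries)))
			return (int)i;
	}

	if (toy_unique == TOY_MAX_RECORDS)
		return -1;

	toy_depot[toy_unique].hash = h;
	toy_depot[toy_unique].nr = nr;
	memcpy(toy_depot[toy_unique].entries, entries, nr * sizeof(*entries));
	return (int)toy_unique++;
}

/* Save one fixed trace 'iterations' times, as a loop in a single call chain would. */
static size_t toy_loop_same_trace(unsigned int iterations)
{
	const unsigned long trace[] = { 0x1000, 0x2000, 0x3000 };
	unsigned int i;

	for (i = 0; i < iterations; i++)
		toy_save(trace, 3);
	return toy_unique;
}
```

Calling toy_loop_same_trace() with any iteration count leaves the number of
stored records unchanged after the first save. Only a trace which differs
in at least one entry consumes a new record.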

Here is the simplified difference between the old code and the new code:

  Old                             New

  handle_edge_irq                 handle_edge_irq
  do_IRQ                          asm_call_on_stack
  common_interrupt                common_interrupt
                                  asm_common_interrupt

IOW, for every _UNIQUE_ interrupt related call chain, there is exactly
ONE stack entry more than before.

For a loop which generates the exact same stack trace for every
iteration this extra entry is not a problem.

But what matters is that interrupts can hit any random code path. So the
number of possible distinct call chains is pretty much unlimited. And
with a high number of distinct call chains the extra entry starts to
matter.

It's trivial math, isn't it?

TS  = Total Size of depot
as  = average size of all stored unique stack traces
ms  = maximum number of unique stack traces which fit in TS

    ms_old = TS / as

Let's further assume that the vast majority of stack traces are taken
from interrupt context. That means with the new code this results in:

    ms_new = TS / (as + 1)

==> ms_new = ms_old * as / (as + 1)

Depending on the value of 'as' the +1 can shave off a significant
percentage of capacity. IOW, the capacity is simply too small now for
the test scenario you are running. Truly a surprising outcome, right?
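
To put numbers on the formula, here is a minimal sketch which follows the
simplified model above, with TS and 'as' both counted in stack entries.
The function names and the sample values are assumptions for illustration,
not measurements from any real system.

```c
#include <assert.h>

/* ms = TS / as, rounded down, with both values counted in stack entries. */
static unsigned long max_unique_traces(unsigned long total_size,
				       unsigned long avg_size)
{
	assert(avg_size > 0);
	return total_size / avg_size;
}

/*
 * Capacity left once every stored trace carries one entry more:
 * ms_new / ms_old = as / (as + 1), returned in percent.
 */
static double capacity_left_percent(double avg_size)
{
	assert(avg_size > 0.0);
	return 100.0 * avg_size / (avg_size + 1.0);
}
```

With a hypothetical TS of 1000000 entries and as = 10,
max_unique_traces() yields 100000 traces for the old code and 90909 for
the new code (as + 1 = 11). capacity_left_percent(4.0) is 80 and
capacity_left_percent(9.0) is 90, so the shorter the average trace, the
more the extra entry hurts.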

To get facts instead of useless loop theories, can you please apply the
patch below, enable DEBUGFS and provide the output of

       /sys/kernel/debug/stackdepot/info

for a kernel before that change and after? Please read out that file
periodically, at roughly the same points in time after starting your
test scenario.

Note that I doubled the size of the stack depot so that we get real
numbers and not the cutoff by the size limit. IOW, the warning should
not trigger anymore. If it triggers nevertheless then the numbers will
still tell us an interesting story.
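
For comparing the snapshots, a small user-space helper could look like
the sketch below. It assumes the three-line output format produced by
debug_show() in the patch below; struct depot_info, parse_depot_info() and
unique_stacks_diff() are hypothetical names, and reading the file contents
into a string is left to the caller.

```c
#include <limits.h>
#include <stdio.h>

struct depot_info {
	unsigned long unique_stacks;
	int depot_index;
	int depot_offset;
};

/* Returns 0 on success, -1 if the text does not contain the three expected fields. */
static int parse_depot_info(const char *text, struct depot_info *info)
{
	/* Whitespace in the format also matches the newlines between the lines. */
	int n = sscanf(text,
		       "Unique stacks: %lu Depot index: %d Depot offset: %d",
		       &info->unique_stacks, &info->depot_index,
		       &info->depot_offset);

	return n == 3 ? 0 : -1;
}

/*
 * Difference in unique stacks between a snapshot from the old kernel and
 * one from the new kernel taken at the same point in the test run.
 * Returns LONG_MIN if either snapshot cannot be parsed.
 */
static long unique_stacks_diff(const char *old_text, const char *new_text)
{
	struct depot_info o, n;

	if (parse_depot_info(old_text, &o) || parse_depot_info(new_text, &n))
		return LONG_MIN;
	return (long)n.unique_stacks - (long)o.unique_stacks;
}
```

Feeding it two snapshots taken at the same offset into the test run gives
the change in unique stack traces between the two kernels. The depot
index and offset are parsed as well so that the memory consumption can be
compared in the same way.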

Thanks,

        tglx
---
 lib/stackdepot.c |   43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -31,6 +31,7 @@
 #include <linux/stackdepot.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/debugfs.h>
 
 #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
 
@@ -42,7 +43,7 @@
 					STACK_ALLOC_ALIGN)
 #define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
 		STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
-#define STACK_ALLOC_SLABS_CAP 8192
+#define STACK_ALLOC_SLABS_CAP 16384
 #define STACK_ALLOC_MAX_SLABS \
 	(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
 	 (1LL << (STACK_ALLOC_INDEX_BITS)) : STACK_ALLOC_SLABS_CAP)
@@ -70,6 +71,7 @@ static void *stack_slabs[STACK_ALLOC_MAX
 static int depot_index;
 static int next_slab_inited;
 static size_t depot_offset;
+static unsigned long unique_stacks;
 static DEFINE_SPINLOCK(depot_lock);
 
 static bool init_stack_slab(void **prealloc)
@@ -138,6 +140,7 @@ static struct stack_record *depot_alloc_
 	stack->handle.valid = 1;
 	memcpy(stack->entries, entries, size * sizeof(unsigned long));
 	depot_offset += required_size;
+	unique_stacks++;
 
 	return stack;
 }
@@ -340,3 +343,41 @@ unsigned int filter_irq_stacks(unsigned
 	return nr_entries;
 }
 EXPORT_SYMBOL_GPL(filter_irq_stacks);
+
+static int debug_show(struct seq_file *m, void *p)
+{
+	unsigned long unst;
+	int didx, doff;
+
+	spin_lock_irq(&depot_lock);
+	unst = unique_stacks;
+	didx = depot_index;
+	doff = depot_offset;
+	spin_unlock_irq(&depot_lock);
+
+	seq_printf(m, "Unique stacks: %lu\n", unst);
+	seq_printf(m, "Depot index:   %d\n", didx);
+	seq_printf(m, "Depot offset:  %d\n", doff);
+	return 0;
+}
+
+static int debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, debug_show, inode->i_private);
+}
+
+static const struct file_operations dfs_ops = {
+	.open		= debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init debugfs_init(void)
+{
+	struct dentry *root_dir = debugfs_create_dir("stackdepot", NULL);
+
+	debugfs_create_file("info", 0444, root_dir, NULL, &dfs_ops);
+	return 0;
+}
+__initcall(debugfs_init);


