From: Brian Gerst <brgerst@gmail.com>
To: Alexander Graf <graf@amazon.com>
Cc: Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Cooper <andrew.cooper3@citrix.com>,
X86 ML <x86@kernel.org>, "Paul E. McKenney" <paulmck@kernel.org>,
Alexandre Chartre <alexandre.chartre@oracle.com>,
Frederic Weisbecker <frederic@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <sean.j.christopherson@intel.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Petr Mladek <pmladek@suse.com>,
Steven Rostedt <rostedt@goodmis.org>,
Joel Fernandes <joel@joelfernandes.org>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Juergen Gross <jgross@suse.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Will Deacon <will@kernel.org>,
Tom Lendacky <thomas.lendacky@amd.com>,
Wei Liu <wei.liu@kernel.org>,
Michael Kelley <mikelley@microsoft.com>,
Jason Chen CJ <jason.cj.chen@intel.com>,
Zhao Yakui <yakui.zhao@intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Avi Kivity <avi@scylladb.com>,
"Herrenschmidt, Benjamin" <benh@amazon.com>,
robketr@amazon.de, amos@scylladb.com
Subject: Re: [patch V9 21/39] x86/irq: Convey vector as argument and not in ptregs
Date: Tue, 25 Aug 2020 21:03:05 -0400
Message-ID: <CAMzpN2i3AL3cED-XAo-YmaAD5PhjxfwPs9e0JPPNZOkOpu=9HQ@mail.gmail.com>
In-Reply-To: <b1fdf037-9c11-9f47-f285-e9a0843d648a@amazon.com>
On Tue, Aug 25, 2020 at 8:04 PM Alexander Graf <graf@amazon.com> wrote:
>
> Hi Andy,
>
> On 26.08.20 01:41, Andy Lutomirski wrote:
> >
> > On Tue, Aug 25, 2020 at 4:18 PM Alexander Graf <graf@amazon.com> wrote:
> >>
> >> Hi Thomas,
> >>
> >> On 25.08.20 12:28, Thomas Gleixner wrote:
> >>> Alex,
> >>>
> >>> On Mon, Aug 24 2020 at 19:29, Alexander Graf wrote:
> >>>> I'm currently trying to understand a performance regression with
> >>>> ScyllaDB on i3en.3xlarge (KVM based VM on Skylake) which we reliably
> >>>> bisected down to this commit:
> >>>>
> >>>> https://github.com/scylladb/scylla/issues/7036
> >>>>
> >>>> What we're seeing is that syscalls such as membarrier() take forever
> >>>> (0-10 µs would be normal):
> >>> ...
> >>>> That again seems to stem from a vastly slowed down
> >>>> smp_call_function_many_cond():
> >>>>
> >>>> Samples: 218K of event 'cpu-clock', 4000 Hz
> >>>> Overhead Shared Object Symbol
> >>>> 94.51% [kernel] [k] smp_call_function_many_cond
> >>>> 0.76% [kernel] [k] __do_softirq
> >>>> 0.32% [kernel] [k] native_queued_spin_lock_slowpath
> >>>> [...]
> >>>>
> >>>> which is stuck in
> >>>>
> >>>> │ csd_lock_wait():
> >>>> │ smp_cond_load_acquire(&csd->flags, !(VAL &
> >>>> 0.00 │ mov 0x8(%rcx),%edx
> >>>> 0.00 │ and $0x1,%edx
> >>>> │ ↓ je 2b9
> >>>> │ rep_nop():
> >>>> 0.70 │2af: pause
> >>>> │ csd_lock_wait():
> >>>> 92.82 │ mov 0x8(%rcx),%edx
> >>>> 6.48 │ and $0x1,%edx
> >>>> 0.00 │ ↑ jne 2af
> >>>> 0.00 │2b9: ↑ jmp 282
> >>>>
> >>>>
> >>>> Given the patch at hand I was expecting lost IPIs, but I can't quite see
> >>>> anything getting lost.
> >>>
> >>> I have no idea how that patch could be related to IPIs and SMP function
> >>> calls. It changes how regular device interrupts and their spurious
> >>> counterparts are handled, not how IPIs are handled. IPIs are handled
> >>> by direct vectors and do not go through do_IRQ() at all.
> >>>
> >>> Aside from that, the commit only changes how the interrupt vector of a
> >>> regular device interrupt is stored and conveyed. The extra read and
> >>> write on the cache-hot stack is hardly related to anything IPI.
> >>
> >> I am as puzzled as you are, but the bisect is very clear: 79b9c183021e
> >> works fast and 633260fa1 (as well as mainline) shows the weird behavior
> >> above.
> >>
> >> It gets even better. This small (demonstrative only, mangled) patch on
> >> top of 633260fa1 also resolves the performance issue:
> >>
> >> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> >> index c766936..7e91e9a 100644
> >> --- a/arch/x86/kernel/irq.c
> >> +++ b/arch/x86/kernel/irq.c
> >> @@ -239,6 +239,7 @@ __visible void __irq_entry do_IRQ(struct pt_regs *regs, unsigned long vector)
> >> * lower 8 bits.
> >> */
> >> vector &= 0xFF;
> >> + regs->orig_ax = ~vector;
> >>
> >> /* entering_irq() tells RCU that we're not quiescent. Check it. */
> >> RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
> >>
> >>
> >> To me that sounds like some irq exit code somewhere must still be
> >> looking at orig_ax to decide on something - and that something is wrong
> >> now that we removed the negation of the vector. It also seems to have an
> >> impact on remote function calls.
> >>
> >> I'll have another, deeper look tomorrow to see if I can find any such
> >> place, but I wouldn't mind if anyone could point me in the right
> >> direction earlier :).
> >
> > How about this:
> >
> > void irq_complete_move(struct irq_cfg *cfg)
> > {
> > __irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
> > }
> >
> > in arch/x86/kernel/apic/vector.c.
> >
>
> Thanks a lot, I stumbled over the same thing just after I sent the email
> as well and had been trying to see if I can quickly patch it up before I
> fall asleep :).
>
> The code path above is used by the APIC vector move (irqbalance) logic,
> which explains why not everyone was seeing issues.
>
> So with 633260fa1 applied, we never get out of the moving state for our IRQ
> because orig_ax is always -1. That means we send an IPI to the cleanup
> vector on every single device interrupt, completely occupying the poor
> CPU that we moved the IRQ from.
>
> I've confirmed that the patch below fixes the issue and will send a
> proper, more complete patch on top of mainline with fancy description
> and stable tag tomorrow.
>
>
> Alex
>
>
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index e7434cd..a474e6e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -734,7 +734,6 @@ SYM_CODE_START_LOCAL(common_spurious)
> call interrupt_entry
> UNWIND_HINT_REGS indirect=1
> movq ORIG_RAX(%rdi), %rsi /* get vector from stack */
> - movq $-1, ORIG_RAX(%rdi) /* no syscall to restart */
> call smp_spurious_interrupt /* rdi points to pt_regs */
> jmp ret_from_intr
> SYM_CODE_END(common_spurious)
> @@ -746,7 +745,6 @@ SYM_CODE_START_LOCAL(common_interrupt)
> call interrupt_entry
> UNWIND_HINT_REGS indirect=1
> movq ORIG_RAX(%rdi), %rsi /* get vector from stack */
> - movq $-1, ORIG_RAX(%rdi) /* no syscall to restart */
> call do_IRQ /* rdi points to pt_regs */
> /* 0(%rsp): old RSP */
> ret_from_intr:
> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> index 67768e5443..5b6f74e 100644
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -934,7 +934,7 @@ static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
>
> void irq_complete_move(struct irq_cfg *cfg)
> {
> - __irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
> + __irq_complete_move(cfg, get_irq_regs()->orig_ax);
> }

I think you also need to truncate the vector to 8 bits, since it is now
sign-extended when pushed into the orig_ax slot.
--
Brian Gerst