All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>, Brian Gerst <brgerst@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Will Deacon <will@kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Wei Liu <wei.liu@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	Jason Chen CJ <jason.cj.chen@intel.com>,
	Zhao Yakui <yakui.zhao@intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: Re: [patch V6 12/37] x86/entry: Provide idtentry_entry/exit_cond_rcu()
Date: Wed, 20 May 2020 10:38:06 -0700	[thread overview]
Message-ID: <20200520173806.GP2869@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <CALCETrWAVTjsKwih06GeK237w7RLSE2D2+naiunA=VFEJY1meQ@mail.gmail.com>

On Wed, May 20, 2020 at 08:36:06AM -0700, Andy Lutomirski wrote:
> On Tue, May 19, 2020 at 7:23 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Tue, May 19, 2020 at 05:26:58PM -0700, Andy Lutomirski wrote:
> > > On Tue, May 19, 2020 at 2:20 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > Andy Lutomirski <luto@kernel.org> writes:
> > > > > On Tue, May 19, 2020 at 1:20 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >> Thomas Gleixner <tglx@linutronix.de> writes:
> > > > >> It's about this:
> > > > >>
> > > > >> rcu_nmi_enter()
> > > > >> {
> > > > >>         if (!rcu_is_watching()) {
> > > > >>             make it watch;
> > > > >>         } else if (!in_nmi()) {
> > > > >>             do_magic_nohz_dyntick_muck();
> > > > >>         }
> > > > >>
> > > > >> So if we do all irq/system vector entries conditional then the
> > > > >> do_magic() gets never executed. After that I got lost...
> > > > >
> > > > > I'm also baffled by that magic, but I'm also not suggesting doing this
> > > > > to *all* entries -- just the not-super-magic ones that use
> > > > > idtentry_enter().
> > > > >
> > > > > Paul, what is this code actually trying to do?
> > > >
> > > > Citing Paul from IRC:
> > > >
> > > >   "The way things are right now, you can leave out the rcu_irq_enter()
> > > >    if this is not a nohz_full CPU.
> > > >
> > > >    Or if this is a nohz_full CPU, and the tick is already
> > > >    enabled, in that case you could also leave out the rcu_irq_enter().
> > > >
> > > >    Or even if this is a nohz_full CPU and it does not have the tick
> > > >    enabled, if it has been in the kernel less than a few tens of
> > > >    milliseconds, still OK to avoid invoking rcu_irq_enter()
> > > >
> > > >    But my guess is that it would be a lot simpler to just always call
> > > >    it.
> > > >
> > > > Hope that helps.
> > >
> > > Maybe?
> > >
> > > Unless I've missed something, the effect here is that #PF hitting in
> > > an RCU-watching context will skip rcu_irq_enter(), whereas all IRQs
> > > (because you converted them) as well as other faults and traps will
> > > call rcu_irq_enter().
> > >
> > > Once upon a time, we did this horrible thing where, on entry from user
> > > mode, we would turn on interrupts while still in CONTEXT_USER, which
> > > means we could get an IRQ in an extended quiescent state.  This means
> > > that the IRQ code had to end the EQS so that IRQ handlers could use
> > > RCU.  But I killed this a few years ago -- x86 Linux now has a rule
> > > that, if IF=1, we are *not* in an EQS with the sole exception of the
> > > idle code.
> > >
> > > In my dream world, we would never ever get IRQs while in an EQS -- we
> > > would do MWAIT with IF=0 and we would exit the EQS before taking the
> > > interrupt.  But I guess we still need to support HLT, which means we
> > > have this mess.
> > >
> > > But I still think we can plausibly get rid of the conditional.
> >
> > You mean the conditional in rcu_nmi_enter()?  In a NO_HZ_FULL=n system,
> > this becomes:
> 
> So, I meant the conditional in tglx's patch that makes page faults special.

OK.

> > >                                                                 If we
> > > get an IRQ or (egads!) a fault in idle context, we'll have
> > > !__rcu_is_watching(), but, AFAICT, we also have preemption off.
> >
> > Or we could be early in the kernel-entry code or late in the kernel-exit
> > code, but as far as I know, preemption is disabled on those code paths.
> > As are interrupts, right?  And interrupts are disabled on the portions
> > of the CPU-hotplug code where RCU is not watching, if I recall correctly.
> 
> Interrupts are off in the parts of the entry/exit that RCU considers
> to be user mode.  We can get various faults, although these should be
> either NMI-like or events that genuinely or effectively happened in
> user mode.

Fair enough!

> > A nohz_full CPU does not enable the scheduling-clock interrupt upon
> > entry to the kernel.  Normally, this is fine because that CPU will very
> > quickly exit back to nohz_full userspace execution, so that RCU will
> > see the quiescent state, either by sampling it directly or by deducing
> > the CPU's passage through that quiescent state by comparing with state
> > that was captured earlier.  The grace-period kthread notices the lack
> > of a quiescent state and will eventually set ->rcu_urgent_qs to
> > trigger this code.
> >
> > But if the nohz_full CPU stays in the kernel for an extended time,
> > perhaps due to OOM handling or due to processing of some huge I/O that
> > hits in-memory buffers/cache, then RCU needs some way of detecting
> > quiescent states on that CPU.  This requires the scheduling-clock
> > interrupt to be alive and well.
> >
> > Are there other ways to get this done?  But of course!  RCU could
> > for example use smp_call_function_single() or use workqueues to force
> > execution onto that CPU and enable the tick that way.  This gets a
> > little involved in order to avoid deadlock, but if the added check
> > in rcu_nmi_enter() is causing trouble, something can be arranged.
> > Though that something would cause more latency excursions than
> > does the current code.
> >
> > Or did you have something else in mind?
> 
> I'm trying to understand when we actually need to call the function.
> Is it just the scheduling interrupt that's supposed to call
> rcu_irq_enter()?  But the scheduling interrupt is off, so I'm
> confused.

The scheduling-clock interrupt is indeed off, but if execution remains
in the kernel for an extended time period, this becomes a problem.
RCU quiescent states don't happen, or if they do, they are not reported
to RCU.  Grace periods never end, and the system eventually OOMs.

And it is not all that hard to make a CPU stay in the kernel for minutes
at a time on a large system.

So what happens is that if RCU notices that a given CPU has not responded
in a reasonable time period, it sets that CPU's ->rcu_urgent_qs.  This
flag plays various roles in various configurations, but on nohz_full CPUs
it causes that CPU's next rcu_nmi_enter() invocation to turn that CPU's
tick on.  It also sets that CPU's ->rcu_forced_tick flag, which prevents
redundant turning on of the tick and also causes the quiescent-state
detection code to turn off the tick for this CPU.

As you say, the scheduling-clock tick cannot turn itself on, but
there might be other interrupts, exceptions, and so on that could.
And if nothing like that happens (as might well be the case on a
well-isolated CPU), RCU will eventually force one.  But it waits a few
hundred milliseconds in order to take advantage of whatever naturally
occurring interrupt might appear in the meantime.

Does that help?

							Thanx, Paul

  parent reply	other threads:[~2020-05-20 17:38 UTC|newest]

Thread overview: 159+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15 23:45 [patch V6 00/37] x86/entry: Rework leftovers and merge plan Thomas Gleixner
2020-05-15 23:45 ` [patch V6 01/37] tracing/hwlat: Use ktime_get_mono_fast_ns() Thomas Gleixner
2020-05-19 21:26   ` Steven Rostedt
2020-05-19 21:45     ` Thomas Gleixner
2020-05-19 22:18       ` Steven Rostedt
2020-05-20 19:51         ` Thomas Gleixner
2020-05-20 20:14   ` Peter Zijlstra
2020-05-20 22:20     ` Thomas Gleixner
2020-05-15 23:45 ` [patch V6 02/37] tracing/hwlat: Split ftrace_nmi_enter/exit() Thomas Gleixner
2020-05-19 22:23   ` Steven Rostedt
2020-05-15 23:45 ` [patch V6 03/37] nmi, tracing: Provide nmi_enter/exit_notrace() Thomas Gleixner
2020-05-17  5:12   ` Andy Lutomirski
2020-05-19 22:24   ` Steven Rostedt
2020-05-15 23:45 ` [patch V6 04/37] x86: Make hardware latency tracing explicit Thomas Gleixner
2020-05-17  5:36   ` Andy Lutomirski
2020-05-17  8:48     ` Thomas Gleixner
2020-05-18  5:50       ` Andy Lutomirski
2020-05-18  8:03         ` Thomas Gleixner
2020-05-18 20:42           ` Andy Lutomirski
2020-05-18  8:01   ` Peter Zijlstra
2020-05-18  8:05     ` Thomas Gleixner
2020-05-18  8:08       ` Peter Zijlstra
2020-05-20 20:09         ` Thomas Gleixner
2020-05-20 20:14           ` Andy Lutomirski
2020-05-20 22:20             ` Thomas Gleixner
2020-05-15 23:45 ` [patch V6 05/37] genirq: Provide irq_enter/exit_rcu() Thomas Gleixner
2020-05-18 23:06   ` Andy Lutomirski
2020-05-15 23:45 ` [patch V6 06/37] genirq: Provde __irq_enter/exit_raw() Thomas Gleixner
2020-05-18 23:07   ` Andy Lutomirski
2020-05-15 23:45 ` [patch V6 07/37] x86/entry: Provide helpers for execute on irqstack Thomas Gleixner
2020-05-18 23:11   ` Andy Lutomirski
2020-05-18 23:46     ` Andy Lutomirski
2020-05-18 23:53       ` Thomas Gleixner
2020-05-18 23:56         ` Andy Lutomirski
2020-05-20 12:35           ` Thomas Gleixner
2020-05-20 15:09             ` Andy Lutomirski
2020-05-20 15:27               ` Thomas Gleixner
2020-05-20 15:36                 ` Andy Lutomirski
2020-05-18 23:51     ` Thomas Gleixner
2020-05-15 23:45 ` [patch V6 08/37] x86/entry/64: Move do_softirq_own_stack() to C Thomas Gleixner
2020-05-18 23:48   ` Andy Lutomirski
2020-05-15 23:45 ` [patch V6 09/37] x86/entry: Split idtentry_enter/exit() Thomas Gleixner
2020-05-18 23:49   ` Andy Lutomirski
2020-05-19  8:25     ` Thomas Gleixner
2020-05-15 23:45 ` [patch V6 10/37] x86/entry: Switch XEN/PV hypercall entry to IDTENTRY Thomas Gleixner
2020-05-19 17:06   ` Andy Lutomirski
2020-05-19 18:57     ` Thomas Gleixner
2020-05-19 19:44       ` Andy Lutomirski
2020-05-20  8:06         ` Jürgen Groß
2020-05-20 11:31           ` Andrew Cooper
2020-05-20 14:13         ` Thomas Gleixner
2020-05-20 15:16           ` Andy Lutomirski
2020-05-20 17:22             ` Andy Lutomirski
2020-05-20 19:16               ` Thomas Gleixner
2020-05-20 23:21                 ` Andy Lutomirski
2020-05-21 10:45                   ` Thomas Gleixner
2020-05-21  2:23                 ` Boris Ostrovsky
2020-05-21  7:08                   ` Thomas Gleixner
2020-05-15 23:45 ` [patch V6 11/37] x86/entry/64: Simplify idtentry_body Thomas Gleixner
2020-05-19 17:06   ` Andy Lutomirski
2020-05-15 23:45 ` [patch V6 12/37] x86/entry: Provide idtentry_entry/exit_cond_rcu() Thomas Gleixner
2020-05-19 17:08   ` Andy Lutomirski
2020-05-19 19:00     ` Thomas Gleixner
2020-05-19 20:20       ` Thomas Gleixner
2020-05-19 20:24         ` Andy Lutomirski
2020-05-19 21:20           ` Thomas Gleixner
2020-05-20  0:26             ` Andy Lutomirski
2020-05-20  2:23               ` Paul E. McKenney
2020-05-20 15:36                 ` Andy Lutomirski
2020-05-20 16:51                   ` Andy Lutomirski
2020-05-20 18:05                     ` Paul E. McKenney
2020-05-20 19:49                       ` Thomas Gleixner
2020-05-20 22:15                         ` Paul E. McKenney
2020-05-20 23:25                           ` Paul E. McKenney
2020-05-21  8:31                             ` Thomas Gleixner
2020-05-21 13:39                               ` Paul E. McKenney
2020-05-21 18:41                                 ` Thomas Gleixner
2020-05-21 19:04                                   ` Paul E. McKenney
2020-05-20 18:32                     ` Thomas Gleixner
2020-05-20 19:24                     ` Thomas Gleixner
2020-05-20 19:42                       ` Paul E. McKenney
2020-05-20 17:38                   ` Paul E. McKenney [this message]
2020-05-20 17:47                     ` Andy Lutomirski
2020-05-20 18:11                       ` Paul E. McKenney
2020-05-20 14:19               ` Thomas Gleixner
2020-05-27  8:12   ` [tip: x86/entry] " tip-bot2 for Thomas Gleixner
2020-05-15 23:46 ` [patch V6 13/37] x86/entry: Switch page fault exception to IDTENTRY_RAW Thomas Gleixner
2020-05-19 20:12   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 14/37] x86/entry: Remove the transition leftovers Thomas Gleixner
2020-05-19 20:13   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 15/37] x86/entry: Change exit path of xen_failsafe_callback Thomas Gleixner
2020-05-19 20:14   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 16/37] x86/entry/64: Remove error_exit Thomas Gleixner
2020-05-19 20:14   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 17/37] x86/entry/32: Remove common_exception Thomas Gleixner
2020-05-19 20:14   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 18/37] x86/irq: Use generic irq_regs implementation Thomas Gleixner
2020-05-15 23:46 ` [patch V6 19/37] x86/irq: Convey vector as argument and not in ptregs Thomas Gleixner
2020-05-19 20:19   ` Andy Lutomirski
2020-05-21 13:22     ` Thomas Gleixner
2020-05-22 18:48       ` Boris Ostrovsky
2020-05-22 19:26         ` Josh Poimboeuf
2020-05-22 19:54           ` Thomas Gleixner
2020-05-15 23:46 ` [patch V6 20/37] x86/irq/64: Provide handle_irq() Thomas Gleixner
2020-05-19 20:21   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 21/37] x86/entry: Add IRQENTRY_IRQ macro Thomas Gleixner
2020-05-19 20:27   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 22/37] x86/entry: Use idtentry for interrupts Thomas Gleixner
2020-05-19 20:28   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 23/37] x86/entry: Provide IDTENTRY_SYSVEC Thomas Gleixner
2020-05-20  0:29   ` Andy Lutomirski
2020-05-20 15:07     ` Thomas Gleixner
2020-05-15 23:46 ` [patch V6 24/37] x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-20  0:27   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 25/37] x86/entry: Convert SMP system vectors " Thomas Gleixner
2020-05-20  0:28   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 26/37] x86/entry: Convert various system vectors Thomas Gleixner
2020-05-20  0:30   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 27/37] x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC Thomas Gleixner
2020-05-20  0:30   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 28/37] x86/entry: Convert various hypervisor " Thomas Gleixner
2020-05-20  0:31   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 29/37] x86/entry: Convert XEN hypercall vector " Thomas Gleixner
2020-05-20  0:31   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 30/37] x86/entry: Convert reschedule interrupt to IDTENTRY_RAW Thomas Gleixner
2020-05-19 23:57   ` Andy Lutomirski
2020-05-20 15:08     ` Thomas Gleixner
2020-05-15 23:46 ` [patch V6 31/37] x86/entry: Remove the apic/BUILD interrupt leftovers Thomas Gleixner
2020-05-20  0:32   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 32/37] x86/entry/64: Remove IRQ stack switching ASM Thomas Gleixner
2020-05-20  0:33   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 33/37] x86/entry: Make enter_from_user_mode() static Thomas Gleixner
2020-05-20  0:34   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 34/37] x86/entry/32: Remove redundant irq disable code Thomas Gleixner
2020-05-20  0:35   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 35/37] x86/entry/64: Remove TRACE_IRQS_*_DEBUG Thomas Gleixner
2020-05-20  0:46   ` Andy Lutomirski
2020-05-15 23:46 ` [patch V6 36/37] x86/entry: Move paranoid irq tracing out of ASM code Thomas Gleixner
2020-05-20  0:53   ` Andy Lutomirski
2020-05-20 15:16     ` Thomas Gleixner
2020-05-20 17:13       ` Andy Lutomirski
2020-05-20 18:33         ` Thomas Gleixner
2020-05-15 23:46 ` [patch V6 37/37] x86/entry: Remove the TRACE_IRQS cruft Thomas Gleixner
2020-05-18 23:07   ` Andy Lutomirski
2020-05-16 17:18 ` [patch V6 00/37] x86/entry: Rework leftovers and merge plan Paul E. McKenney
2020-05-19 12:28   ` Joel Fernandes
2020-05-18 16:07 ` Peter Zijlstra
2020-05-18 18:53   ` Thomas Gleixner
2020-05-19  8:29     ` Peter Zijlstra
2020-05-18 20:24   ` Thomas Gleixner
2020-05-19  8:38     ` Peter Zijlstra
2020-05-19  9:02       ` Peter Zijlstra
2020-05-23  2:52         ` Lai Jiangshan
2020-05-23 13:08           ` Peter Zijlstra
2020-06-15 16:17             ` Peter Zijlstra
2020-05-19  9:06       ` Thomas Gleixner
2020-05-19 18:37 ` Steven Rostedt
2020-05-19 19:09   ` Thomas Gleixner
2020-05-19 19:13     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200520173806.GP2869@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=alexandre.chartre@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=brgerst@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jason.cj.chen@intel.com \
    --cc=jgross@suse.com \
    --cc=joel@joelfernandes.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yakui.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.