[RFC,v1,2/2] rcu/tree: Remove dynticks_nmi_nesting counter

Message ID 5d648897.1c69fb81.5e60a.fc70@mx.google.com
State New
Series
  • RCU dyntick nesting counter cleanups

Commit Message

Joel Fernandes Aug. 27, 2019, 1:33 a.m. UTC
The dynticks_nmi_nesting counter serves 3 purposes:

      (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect the
          first interrupt nesting level.

      (b) We need to detect half-interrupts until we are sure they're not
          an issue. However, the comparison with DYNTICK_IRQ_NONIDLE is
          changed to a comparison with 0.

      (c) A quiescent-state report may be needed from a nohz_full CPU.
          The nesting counter is used to detect that we are in a
          first-level interrupt.

For (a), we can just use dynticks_nesting == 1 to determine this. Only the
outermost interrupt that interrupted an RCU-idle state can set it to 1.

For (b), this warning condition has not occurred for several kernel
releases, so the warning is kept for now but changed to use
in_interrupt() instead of the nesting counter. The warning can be removed
in a later release.

For (c), the nesting check is not really necessary, since forced_tick
would have been set to true in the outermost interrupt; nested/NMI
interrupts will check forced_tick anyway and bail.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 .../Data-Structures/Data-Structures.rst       | 31 +++------
 Documentation/RCU/stallwarn.txt               |  6 +-
 kernel/rcu/tree.c                             | 64 +++++++------------
 kernel/rcu/tree.h                             |  4 +-
 kernel/rcu/tree_stall.h                       |  4 +-
 5 files changed, 41 insertions(+), 68 deletions(-)

Comments

Joel Fernandes Aug. 27, 2019, 1:40 a.m. UTC | #1
On Mon, Aug 26, 2019 at 9:34 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> The dynticks_nmi_nesting counter serves 4 purposes:
>

And actually, I meant 3 purposes ;-) :-P

thanks,

 - Joel


>       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
>           interrupt nesting level.
>
>       (b) We need to detect half-interrupts till we are sure they're not an
>           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
>
>       (c) When a quiescent state report is needed from a nohz_full CPU.
>           The nesting counter detects we are a first level interrupt.
>
> For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> outermost interrupt that interrupted an RCU-idle state can set it to 1.
>
> For (b), this warning condition has not occurred for several kernel
> releases.  But we still keep the warning but change it to use
> in_interrupt() instead of the nesting counter. In a later year, we can
> remove the warning.
>
> For (c), the nest check is not really necessary since forced_tick would
> have been set to true in the outermost interrupt, so the nested/NMI
> interrupts will check forced_tick anyway, and bail.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  .../Data-Structures/Data-Structures.rst       | 31 +++------
>  Documentation/RCU/stallwarn.txt               |  6 +-
>  kernel/rcu/tree.c                             | 64 +++++++------------
>  kernel/rcu/tree.h                             |  4 +-
>  kernel/rcu/tree_stall.h                       |  4 +-
>  5 files changed, 41 insertions(+), 68 deletions(-)
>
> diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> index 4a48e20a46f2..a5a907f434a1 100644
> --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> @@ -936,10 +936,9 @@ This portion of the rcu_data structure is declared as follows:
>  ::
>
>       1   long dynticks_nesting;
> -     2   long dynticks_nmi_nesting;
> -     3   atomic_t dynticks;
> -     4   bool rcu_need_heavy_qs;
> -     5   bool rcu_urgent_qs;
> +     2   atomic_t dynticks;
> +     3   bool rcu_need_heavy_qs;
> +     4   bool rcu_urgent_qs;
>
>  These fields in the rcu_data structure maintain the per-CPU dyntick-idle
>  state for the corresponding CPU. The fields may be accessed only from
> @@ -948,26 +947,14 @@ the corresponding CPU (and from tracing) unless otherwise stated.
>  The ``->dynticks_nesting`` field counts the nesting depth of process
>  execution, so that in normal circumstances this counter has value zero
>  or one. NMIs, irqs, and tracers are counted by the
> -``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
> +``->dynticks_nesting`` field as well. Because NMIs cannot be masked, changes
>  to this variable have to be undertaken carefully using an algorithm
>  provided by Andy Lutomirski. The initial transition from idle adds one,
>  and nested transitions add two, so that a nesting level of five is
> -represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
> +represented by a ``->dynticks_nesting`` value of nine. This counter
>  can therefore be thought of as counting the number of reasons why this
> -CPU cannot be permitted to enter dyntick-idle mode, aside from
> -process-level transitions.
> -
> -However, it turns out that when running in non-idle kernel context, the
> -Linux kernel is fully capable of entering interrupt handlers that never
> -exit and perhaps also vice versa. Therefore, whenever the
> -``->dynticks_nesting`` field is incremented up from zero, the
> -``->dynticks_nmi_nesting`` field is set to a large positive number, and
> -whenever the ``->dynticks_nesting`` field is decremented down to zero,
> -the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
> -the number of misnested interrupts is not sufficient to overflow the
> -counter, this approach corrects the ``->dynticks_nmi_nesting`` field
> -every time the corresponding CPU enters the idle loop from process
> -context.
> +CPU cannot be permitted to enter dyntick-idle mode. It counts both the
> +process-level and interrupt transitions.
>
>  The ``->dynticks`` field counts the corresponding CPU's transitions to
>  and from either dyntick-idle or user mode, so that this counter has an
> @@ -1000,7 +987,9 @@ code.
>  +-----------------------------------------------------------------------+
>  | Because this would fail in the presence of interrupts whose handlers  |
>  | never return and of handlers that manage to return from a made-up     |
> -| interrupt.                                                            |
> +| interrupt. NOTE: The counters have now been combined however          |
> +| a temporary warning has been left to make sure this condition never   |
> +| occurs.                                                               |
>  +-----------------------------------------------------------------------+
>
>  Additional fields are present for some special-purpose builds, and are
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index f48f4621ccbc..585f73009a56 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -173,8 +173,8 @@ For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
>  it will print a message similar to the following:
>
>         INFO: rcu_sched detected stalls on CPUs/tasks:
> -       2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
> -       16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
> +       2-...: (3 GPs behind) idle=06c/0 softirq=1453/1455 fqs=0
> +       16-...: (0 ticks this GP) idle=81c/0 softirq=764/764 fqs=0
>         (detected by 32, t=2603 jiffies, g=7075, q=625)
>
>  This message indicates that CPU 32 detected that CPUs 2 and 16 were both
> @@ -225,7 +225,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
>  In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
>  for each CPU:
>
> -       0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
> +       0: (64628 ticks this GP) idle=dd5/3fffffffffffffff softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
>
>  The "last_accelerate:" prints the low-order 16 bits (in hex) of the
>  jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 255cd6835526..1465a3e406f8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -81,7 +81,6 @@
>
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>         .dynticks_nesting = 1,
> -       .dynticks_nmi_nesting = 0,
>         .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  struct rcu_state rcu_state = {
> @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>         /* Check for counter underflows */
>         RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
>                          "RCU dynticks_nesting counter underflow!");
> -       RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> -                        "RCU dynticks_nmi_nesting counter underflow/zero!");
>
> -       /* Are we at first interrupt nesting level? */
> -       if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> -               return false;
> -
> -       /* Does CPU appear to be idle from an RCU standpoint? */
> -       return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> +       /* Are we the outermost interrupt that arrived when RCU was idle? */
> +       return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
>  }
>
>  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
>         struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>         /* Entering usermode/idle from interrupt is not handled. These would
> -        * mean usermode upcalls or idle entry happened from interrupts. But,
> -        * reset the counter if we warn.
> +        * mean usermode upcalls or idle exit happened from interrupts. Remove
> +        * the warning by 2020.
>          */
> -       if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> -               WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +       WARN_ON_ONCE(in_interrupt());
>
>         WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>                      rdp->dynticks_nesting == 0);
> @@ -627,9 +619,8 @@ void rcu_user_enter(void)
>
>  /*
>   * If we are returning from the outermost NMI handler that interrupted an
> - * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
> - * to let the RCU grace-period handling know that the CPU is back to
> - * being RCU-idle.
> + * RCU-idle period, update rdp->dynticks to let the RCU grace-period handling
> + * know that the CPU is back to being RCU-idle.
>   *
>   * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
> @@ -639,16 +630,13 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
>         struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>         /*
> -        * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
> +        * Check for ->dynticks_nesting underflow and bad ->dynticks.
>          * (We are exiting an NMI handler, so RCU better be paying attention
>          * to us!)
>          */
>         WARN_ON_ONCE(rdp->dynticks_nesting <= 0);
> -       WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
>         WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
>
> -       WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
> -                  rdp->dynticks_nmi_nesting - 1);
>         /*
>          * If the nesting level is not 1, the CPU wasn't RCU-idle, so
>          * leave it in non-RCU-idle state.
> @@ -750,11 +738,10 @@ static void rcu_eqs_exit(bool user)
>         WRITE_ONCE(rdp->dynticks_nesting, 1);
>
>         /* Exiting usermode/idle from interrupt is not handled. These would
> -        * mean usermode upcalls or idle exit happened from interrupts. But,
> -        * reset the counter if we warn.
> +        * mean usermode upcalls or idle exit happened from interrupts. Remove
> +        * the warning by 2020.
>          */
> -       if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> -               WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +       WARN_ON_ONCE(in_interrupt());
>  }
>
>  /**
> @@ -795,14 +782,13 @@ void rcu_user_exit(void)
>   * rcu_nmi_enter_common - inform RCU of entry to NMI context
>   * @irq: Is this call from rcu_irq_enter?
>   *
> - * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
> - * rdp->dynticks_nmi_nesting to let the RCU grace-period handling know
> - * that the CPU is active.  This implementation permits nested NMIs, as
> - * long as the nesting level does not overflow an int.  (You will probably
> - * run out of stack space first.)
> + * If the CPU was idle from RCU's viewpoint, update rdp->dynticks to let the
> + * RCU grace-period handling know that the CPU is active.  This implementation
> + * permits nested NMIs, as long as the nesting level does not overflow a long.
> + * (You will probably run out of stack space first.)
>   *
> - * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
> - * with CONFIG_RCU_EQS_DEBUG=y.
> + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test with
> + * CONFIG_RCU_EQS_DEBUG=y.
>   */
>  static __always_inline void rcu_nmi_enter_common(bool irq)
>  {
> @@ -811,15 +797,16 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>
>         /* Complain about underflow. */
>         WARN_ON_ONCE(rdp->dynticks_nesting < 0);
> -       WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
>
>         /*
>          * If idle from RCU viewpoint, atomically increment ->dynticks
> -        * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> -        * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> -        * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> +        * to mark non-idle and increment ->dynticks_nesting by one.
> +        * Otherwise, increment ->dynticks_nesting by two.  This means
> +        * if ->dynticks_nesting is equal to one, we are guaranteed
>          * to be in the outermost NMI handler that interrupted an RCU-idle
> -        * period (observation due to Andy Lutomirski).
> +        * period (observation due to Andy Lutomirski). An exception
> +        * is if the interrupt arrived in kernel mode; in this case we would
> +        * be the outermost interrupt but still increment by 2 which is Ok.
>          */
>         if (rcu_dynticks_curr_cpu_in_eqs()) {
>
> @@ -832,8 +819,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>                         rcu_cleanup_after_idle();
>
>                 incby = 1;
> -       } else if (tick_nohz_full_cpu(rdp->cpu) &&
> -                  !rdp->dynticks_nmi_nesting && rdp->rcu_urgent_qs &&
> +       } else if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_urgent_qs &&
>                    !rdp->rcu_forced_tick) {
>                 rdp->rcu_forced_tick = true;
>                 tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> @@ -846,8 +832,6 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
>         WRITE_ONCE(rdp->dynticks_nesting, /* Prevent store tearing. */
>                    rdp->dynticks_nesting + incby);
>
> -       WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
> -                  rdp->dynticks_nmi_nesting + 1);
>         barrier();
>  }
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 055c31781d3a..ad7d3e31c5cf 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -176,8 +176,8 @@ struct rcu_data {
>
>         /* 3) dynticks interface. */
>         int dynticks_snap;              /* Per-GP tracking for dynticks. */
> -       long dynticks_nesting;          /* Track process nesting level. */
> -       long dynticks_nmi_nesting;      /* Track irq/NMI nesting level. */
> +       long dynticks_nesting;          /* Track dyntick (non-IDLE) nesting
> +                                        * level for kernel entry and interrupt. */
>         atomic_t dynticks;              /* Even value for idle, else odd. */
>         bool rcu_need_heavy_qs;         /* GP old, so heavy quiescent state! */
>         bool rcu_urgent_qs;             /* GP old need light quiescent state. */
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 841ab43f3e60..0676460107d0 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -313,7 +313,7 @@ static void print_cpu_stall_info(int cpu)
>         }
>         print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
>         delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
> -       pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
> +       pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n",
>                cpu,
>                "O."[!!cpu_online(cpu)],
>                "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
> @@ -323,7 +323,7 @@ static void print_cpu_stall_info(int cpu)
>                                 "!."[!delta],
>                ticks_value, ticks_title,
>                rcu_dynticks_snap(rdp) & 0xfff,
> -              rdp->dynticks_nesting, rdp->dynticks_nmi_nesting,
> +              rdp->dynticks_nesting,
>                rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
>                READ_ONCE(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
>                fast_no_hz);
> --
> 2.23.0.187.g17f5b7556c-goog
>
Paul E. McKenney Aug. 28, 2019, 8:23 p.m. UTC | #2
On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> The dynticks_nmi_nesting counter serves 4 purposes:
> 
>       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
>           interrupt nesting level.
> 
>       (b) We need to detect half-interrupts till we are sure they're not an
>           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> 
>       (c) When a quiescent state report is needed from a nohz_full CPU.
>           The nesting counter detects we are a first level interrupt.
> 
> For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> outermost interrupt that interrupted an RCU-idle state can set it to 1.
> 
> For (b), this warning condition has not occurred for several kernel
> releases.  But we still keep the warning but change it to use
> in_interrupt() instead of the nesting counter. In a later year, we can
> remove the warning.
> 
> For (c), the nest check is not really necessary since forced_tick would
> have been set to true in the outermost interrupt, so the nested/NMI
> interrupts will check forced_tick anyway, and bail.

Skipping the commit log and documentation for this pass.

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  .../Data-Structures/Data-Structures.rst       | 31 +++------
>  Documentation/RCU/stallwarn.txt               |  6 +-
>  kernel/rcu/tree.c                             | 64 +++++++------------
>  kernel/rcu/tree.h                             |  4 +-
>  kernel/rcu/tree_stall.h                       |  4 +-
>  5 files changed, 41 insertions(+), 68 deletions(-)

[ . . . ]

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 255cd6835526..1465a3e406f8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -81,7 +81,6 @@
>  
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>  	.dynticks_nesting = 1,
> -	.dynticks_nmi_nesting = 0,

This should be in the previous patch, give or take naming.

>  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  struct rcu_state rcu_state = {
> @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  	/* Check for counter underflows */
>  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
>  			 "RCU dynticks_nesting counter underflow!");
> -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
>  
> -	/* Are we at first interrupt nesting level? */
> -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> -		return false;
> -
> -	/* Does CPU appear to be idle from an RCU standpoint? */
> -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
>  }
>  
>  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  
>  	/* Entering usermode/idle from interrupt is not handled. These would
> -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> -	 * reset the counter if we warn.
> +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> +	 * the warning by 2020.
>  	 */
> -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> +	WARN_ON_ONCE(in_interrupt());

And this is a red flag.  Bad things happen should some common code
that disables BH be invoked from the idle loop.  This might not be
happening now, but we need to avoid this sort of constraint.

How about instead merging ->dyntick_nesting into the low-order bits
of ->dyntick_nmi_nesting?

Yes, this assumes that we don't enter process level twice, but it should
be easy to add a WARN_ON() to test for that.  Except that we don't have
to because there is already this near the end of rcu_eqs_exit():

	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);

So the low-order bit of the combined counter could indicate process-level
non-idle, the next three bits could be unused to make interpretation
of hex printouts easier, and then the rest of the bits could be used in
the same way as currently.

This would allow a single read to see the full state, so that 0x1 means
at process level in the kernel, 0x11 is interrupt (or NMI) from process
level, 0x10 is interrupt/NMI from idle/user, and so on.

What am I missing here?  Why wouldn't this work, and without adding yet
another RCU-imposed constraint on some other subsystem?

							Thanx, Paul
Joel Fernandes Aug. 28, 2019, 9:05 p.m. UTC | #3
On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > The dynticks_nmi_nesting counter serves 4 purposes:
> > 
> >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> >           interrupt nesting level.
> > 
> >       (b) We need to detect half-interrupts till we are sure they're not an
> >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > 
> >       (c) When a quiescent state report is needed from a nohz_full CPU.
> >           The nesting counter detects we are a first level interrupt.
> > 
> > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > 
> > For (b), this warning condition has not occurred for several kernel
> > releases.  But we still keep the warning but change it to use
> > in_interrupt() instead of the nesting counter. In a later year, we can
> > remove the warning.
> > 
> > For (c), the nest check is not really necessary since forced_tick would
> > have been set to true in the outermost interrupt, so the nested/NMI
> > interrupts will check forced_tick anyway, and bail.
> 
> Skipping the commit log and documentation for this pass.
[snip] 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 255cd6835526..1465a3e406f8 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -81,7 +81,6 @@
> >  
> >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> >  	.dynticks_nesting = 1,
> > -	.dynticks_nmi_nesting = 0,
> 
> This should be in the previous patch, give or take naming.

Done.

> >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> >  };
> >  struct rcu_state rcu_state = {
> > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> >  	/* Check for counter underflows */
> >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> >  			 "RCU dynticks_nesting counter underflow!");
> > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> > -	/* Are we at first interrupt nesting level? */
> > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > -		return false;
> > -
> > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> >  }
> >  
> >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> >  
> >  	/* Entering usermode/idle from interrupt is not handled. These would
> > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > -	 * reset the counter if we warn.
> > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > +	 * the warning by 2020.
> >  	 */
> > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > +	WARN_ON_ONCE(in_interrupt());
> 
> And this is a red flag.  Bad things happen should some common code
> that disables BH be invoked from the idle loop.  This might not be
> happening now, but we need to avoid this sort of constraint.
> How about instead merging ->dyntick_nesting into the low-order bits
> of ->dyntick_nmi_nesting?
> 
> Yes, this assumes that we don't enter process level twice, but it should
> be easy to add a WARN_ON() to test for that.  Except that we don't have
> to because there is already this near the end of rcu_eqs_exit():
> 
> 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> 
> So the low-order bit of the combined counter could indicate process-level
> non-idle, the next three bits could be unused to make interpretation
> of hex printouts easier, and then the rest of the bits could be used in
> the same way as currently.
> 
> This would allow a single read to see the full state, so that 0x1 means
> at process level in the kernel, 0x11 is interrupt (or NMI) from process
> level, 0x10 is interrupt/NMI from idle/user, and so on.
> 
> What am I missing here?  Why wouldn't this work, and without adding yet
> another RCU-imposed constraint on some other subsystem?

What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
address your concern?

Also, considering this warning condition is most likely never occurring as we
know it, and we are considering deleting it soon enough, is it really worth
reimplementing the whole mechanism with a complex bit-sharing scheme just
because of the BH-disable condition you mentioned, which likely doesn't
happen today? In my implementation, this is just a simple counter. I feel
combining bits in the same counter will just introduce more complexity that
this patch tries to address/avoid.

OTOH, I also don't mind with just deleting the warning altogether if you are
Ok with that.

thanks!

 - Joel
Paul E. McKenney Aug. 28, 2019, 9:19 p.m. UTC | #4
On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > 
> > >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > >           interrupt nesting level.
> > > 
> > >       (b) We need to detect half-interrupts till we are sure they're not an
> > >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > > 
> > >       (c) When a quiescent state report is needed from a nohz_full CPU.
> > >           The nesting counter detects we are a first level interrupt.
> > > 
> > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > 
> > > For (b), this warning condition has not occurred for several kernel
> > > releases.  But we still keep the warning but change it to use
> > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > remove the warning.
> > > 
> > > For (c), the nest check is not really necessary since forced_tick would
> > > have been set to true in the outermost interrupt, so the nested/NMI
> > > interrupts will check forced_tick anyway, and bail.
> > 
> > Skipping the commit log and documentation for this pass.
> [snip] 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 255cd6835526..1465a3e406f8 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -81,7 +81,6 @@
> > >  
> > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > >  	.dynticks_nesting = 1,
> > > -	.dynticks_nmi_nesting = 0,
> > 
> > This should be in the previous patch, give or take naming.
> 
> Done.
> 
> > >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > >  };
> > >  struct rcu_state rcu_state = {
> > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > >  	/* Check for counter underflows */
> > >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > >  			 "RCU dynticks_nesting counter underflow!");
> > > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> > >  
> > > -	/* Are we at first interrupt nesting level? */
> > > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > -		return false;
> > > -
> > > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > >  }
> > >  
> > >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > >  
> > >  	/* Entering usermode/idle from interrupt is not handled. These would
> > > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > > -	 * reset the counter if we warn.
> > > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > +	 * the warning by 2020.
> > >  	 */
> > > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > +	WARN_ON_ONCE(in_interrupt());
> > 
> > And this is a red flag.  Bad things happen should some common code
> > that disables BH be invoked from the idle loop.  This might not be
> > happening now, but we need to avoid this sort of constraint.
> > How about instead merging ->dyntick_nesting into the low-order bits
> > of ->dyntick_nmi_nesting?
> > 
> > Yes, this assumes that we don't enter process level twice, but it should
> > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > to because there is already this near the end of rcu_eqs_exit():
> > 
> > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > 
> > So the low-order bit of the combined counter could indicate process-level
> > non-idle, the next three bits could be unused to make interpretation
> > of hex printouts easier, and then the rest of the bits could be used in
> > the same way as currently.
> > 
> > This would allow a single read to see the full state, so that 0x1 means
> > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > 
> > What am I missing here?  Why wouldn't this work, and without adding yet
> > another RCU-imposed constraint on some other subsystem?
> 
> What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> address your concern?
> 
> Also, considering this warning condition is most likely never occurring as we
> know it, and we are considering deleting it soon enough, is it really worth
> reimplementing the whole mechanism with a complex bit-sharing scheme just
> because of the BH-disable condition you mentioned, which likely doesn't
> happen today? In my implementation, this is just a simple counter. I feel
> combining bits in the same counter will just introduce more complexity that
> this patch tries to address/avoid.
> 
> OTOH, I also don't mind with just deleting the warning altogether if you are
> Ok with that.

The big advantage of combining the counters is that all of the state is
explicit and visible in one place.  Plus it can be accessed atomically.
And it avoids setting a time bomb for some poor guys just trying to get
their idle-loop jobs done some time in the dim distant future.

Besides, this pair of patches already makes a large change from a
conceptual viewpoint.  If we are going to make a large change, let's
get our money's worth out of that change!

							Thanx, Paul
Joel Fernandes Aug. 28, 2019, 9:42 p.m. UTC | #5
On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > > 
> > > >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > > >           interrupt nesting level.
> > > > 
> > > >       (b) We need to detect half-interrupts till we are sure they're not an
> > > >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > > > 
> > > >       (c) When a quiescent state report is needed from a nohz_full CPU.
> > > >           The nesting counter detects we are a first level interrupt.
> > > > 
> > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > > 
> > > > For (b), this warning condition has not occurred for several kernel
> > > > releases.  But we still keep the warning but change it to use
> > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > remove the warning.
> > > > 
> > > > For (c), the nest check is not really necessary since forced_tick would
> > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > interrupts will check forced_tick anyway, and bail.
> > > 
> > > Skipping the commit log and documentation for this pass.
> > [snip] 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 255cd6835526..1465a3e406f8 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -81,7 +81,6 @@
> > > >  
> > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > >  	.dynticks_nesting = 1,
> > > > -	.dynticks_nmi_nesting = 0,
> > > 
> > > This should be in the previous patch, give or take naming.
> > 
> > Done.
> > 
> > > >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > >  };
> > > >  struct rcu_state rcu_state = {
> > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > >  	/* Check for counter underflows */
> > > >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > >  			 "RCU dynticks_nesting counter underflow!");
> > > > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> > > >  
> > > > -	/* Are we at first interrupt nesting level? */
> > > > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > -		return false;
> > > > -
> > > > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > > > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > > > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > >  }
> > > >  
> > > >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > >  
> > > >  	/* Entering usermode/idle from interrupt is not handled. These would
> > > > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > > > -	 * reset the counter if we warn.
> > > > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > > +	 * the warning by 2020.
> > > >  	 */
> > > > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > +	WARN_ON_ONCE(in_interrupt());
> > > 
> > > And this is a red flag.  Bad things happen should some common code
> > > that disables BH be invoked from the idle loop.  This might not be
> > > happening now, but we need to avoid this sort of constraint.
> > > How about instead merging ->dyntick_nesting into the low-order bits
> > > of ->dyntick_nmi_nesting?
> > > 
> > > Yes, this assumes that we don't enter process level twice, but it should
> > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > to because there is already this near the end of rcu_eqs_exit():
> > > 
> > > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > 
> > > So the low-order bit of the combined counter could indicate process-level
> > > non-idle, the next three bits could be unused to make interpretation
> > > of hex printouts easier, and then the rest of the bits could be used in
> > > the same way as currently.
> > > 
> > > This would allow a single read to see the full state, so that 0x1 means
> > > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > > 
> > > What am I missing here?  Why wouldn't this work, and without adding yet
> > > another RCU-imposed constraint on some other subsystem?
> > 
> > What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> > address your concern?
> > 
> > Also, considering this warning condition is most likely never occurring as we
> > know it, and we are considering deleting it soon enough, is it really worth
> > reimplementing the whole mechanism with a complex bit-sharing scheme just
> > because of the BH-disable condition you mentioned, which likely doesn't
> > happen today? In my implementation, this is just a simple counter. I feel
> > combining bits in the same counter will just introduce more complexity that
> > this patch tries to address/avoid.
> > 
> > OTOH, I also don't mind with just deleting the warning altogether if you are
> > Ok with that.
> 
> The big advantage of combining the counters is that all of the state is
> explicit and visible in one place.  Plus it can be accessed atomically.
> And it avoids setting a time bomb for some poor guys just trying to get
> their idle-loop jobs done some time in the dim distant future.

I could try the approach you're suggesting, but I didn't actually see an
issue with the patch in its current state other than the WARN_ON_ONCE,
which I could change to WARN_ON_ONCE(in_irq()) to remove the concern.
AFAICS, we don't detect "half soft-interrupts" in this code in any way.

I do feel the approach you're suggesting can be a follow-up; these 2 patches
just focus on deleting the dynticks_nmi_nesting counter, and we can test
this approach thoroughly for a release or so.

> Besides, this pair of patches already makes a large change from a
> conceptual viewpoint.  If we are going to make a large change, let's
> get our money's worth out of that change!

IMHO, most of the changes are to code comments; the actual code change is
very little and is just the removal of dynticks_nmi_nesting and
simplification. It's not really an introduction of a new mechanism.

thanks,

 - Joel
Paul E. McKenney Aug. 28, 2019, 10:01 p.m. UTC | #6
On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> [snip]
> I could try the approach you're suggesting but I didn't actually see an issue
> with the patch in its current state other than the WARN_ON_ONCE which I could
> change to WARN_ON_ONCE(in_irq()) to remove the concern. AFAICS, we don't
> detect "half soft-interrupts" in this code in anyway.
> 
> I do feel the approach you're suggesting can be a follow up, these 2 patches
> just focus on deleting dynticks_nmi_nesting counter and we can test this
> approach thoroughly for a release or so.
> 
> > Besides, this pair of patches already makes a large change from a
> > conceptual viewpoint.  If we are going to make a large change, let's
> > get our money's worth out of that change!
> 
> IMHO, most of the changes are to code comments, the actual code change is
> very little and is just removal of dynticks_nmi_nesting and simplification;
> its not really an introduction of a new mechanism.

This change is not fixing a bug, so there is no need for an emergency fix,
and thus no point in additional churn.  I understand that it is a bit
annoying to code and test something and have your friendly maintainer say
"sorry, wrong rocks", and the reason that I understand this is that I do
that to myself rather often.

Welcome to the wonderful world of RCU!  ;-)

							Thanx, Paul
Joel Fernandes Aug. 28, 2019, 10:14 p.m. UTC | #7
On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> [snip]
> This change is not fixing a bug, so there is no need for an emergency fix,
> and thus no point in additional churn.  I understand that it is a bit
> annoying to code and test something and have your friendly maintainer say
> "sorry, wrong rocks", and the reason that I understand this is that I do
> that to myself rather often.

My motivation for this change is to avoid future bugs such as the one in
the following patch, where an "== 2" check did not take the forced write
of DYNTICK_IRQ_NONIDLE into account:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f

I still don't see it as pointless churn; the code is also a maintenance
cost in its current form, and the simplification is worth it IMHO from
both a readability and a maintenance standpoint.

I still don't see what's technically wrong with the patch. I could perhaps
add the above "== 2" point in the patch?

We could also discuss this f2f at LPC to see if we can agree on it?

thanks,

 - Joel
Paul E. McKenney Aug. 28, 2019, 11:12 p.m. UTC | #8
On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > > > > > 
> > > > > > >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > > > > > >           interrupt nesting level.
> > > > > > > 
> > > > > > >       (b) We need to detect half-interrupts till we are sure they're not an
> > > > > > >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > 
> > > > > > >       (c) When a quiescent state report is needed from a nohz_full CPU.
> > > > > > >           The nesting counter detects we are a first level interrupt.
> > > > > > > 
> > > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > > > > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > > > > > 
> > > > > > > For (b), this warning condition has not occurred for several kernel
> > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > > > > remove the warning.
> > > > > > > 
> > > > > > > For (c), the nest check is not really necessary since forced_tick would
> > > > > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > 
> > > > > > Skipping the commit log and documentation for this pass.
> > > > > [snip] 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -81,7 +81,6 @@
> > > > > > >  
> > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > > > >  	.dynticks_nesting = 1,
> > > > > > > -	.dynticks_nmi_nesting = 0,
> > > > > > 
> > > > > > This should be in the previous patch, give or take naming.
> > > > > 
> > > > > Done.
> > > > > 
> > > > > > >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > >  };
> > > > > > >  struct rcu_state rcu_state = {
> > > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > >  	/* Check for counter underflows */
> > > > > > >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > > >  			 "RCU dynticks_nesting counter underflow!");
> > > > > > > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > > > > > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> > > > > > >  
> > > > > > > -	/* Are we at first interrupt nesting level? */
> > > > > > > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > -		return false;
> > > > > > > -
> > > > > > > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > > > > > > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > >  }
> > > > > > >  
> > > > > > >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > >  
> > > > > > >  	/* Entering usermode/idle from interrupt is not handled. These would
> > > > > > > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > > > > > > -	 * reset the counter if we warn.
> > > > > > > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > > > > > +	 * the warning by 2020.
> > > > > > >  	 */
> > > > > > > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > > > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > > > +	WARN_ON_ONCE(in_interrupt());
> > > > > > 
> > > > > > And this is a red flag.  Bad things happen should some common code
> > > > > > that disables BH be invoked from the idle loop.  This might not be
> > > > > > happening now, but we need to avoid this sort of constraint.
> > > > > > How about instead merging ->dyntick_nesting into the low-order bits
> > > > > > of ->dyntick_nmi_nesting?
> > > > > > 
> > > > > > Yes, this assumes that we don't enter process level twice, but it should
> > > > > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > > > > to because there is already this near the end of rcu_eqs_exit():
> > > > > > 
> > > > > > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > > > > 
> > > > > > So the low-order bit of the combined counter could indicate process-level
> > > > > > non-idle, the next three bits could be unused to make interpretation
> > > > > > of hex printouts easier, and then the rest of the bits could be used in
> > > > > > the same way as currently.
> > > > > > 
> > > > > > This would allow a single read to see the full state, so that 0x1 means
> > > > > > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > > > > > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > > > > > 
> > > > > > What am I missing here?  Why wouldn't this work, and without adding yet
> > > > > > another RCU-imposed constraint on some other subsystem?
> > > > > 
> > > > > What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> > > > > address your concern?
> > > > > 
> > > > > Also, considering this warning condition is most likely never occurring as we
> > > > > know it, and we are considering deleting it soon enough, is it really worth
> > > > > reimplementing the whole mechanism with a complex bit-sharing scheme just
> > > > > because of the BH-disable condition you mentioned, which likely doesn't
> > > > > happen today? In my implementation, this is just a simple counter. I feel
> > > > > combining bits in the same counter will just introduce more complexity that
> > > > > this patch tries to address/avoid.
> > > > > 
> > > > > OTOH, I also don't mind with just deleting the warning altogether if you are
> > > > > Ok with that.
> > > > 
> > > > The big advantage of combining the counters is that all of the state is
> > > > explicit and visible in one place.  Plus it can be accessed atomically.
> > > > And it avoids setting a time bomb for some poor guys just trying to get
> > > > their idle-loop jobs done some time in the dim distant future.
> > > 
> > > I could try the approach you're suggesting but I didn't actually see an issue
> > > with the patch in its current state other than the WARN_ON_ONCE which I could
> > > change to WARN_ON_ONCE(in_irq()) to remove the concern. AFAICS, we don't
> > > detect "half soft-interrupts" in this code in anyway.
> > > 
> > > I do feel the approach you're suggesting can be a follow up, these 2 patches
> > > just focus on deleting dynticks_nmi_nesting counter and we can test this
> > > approach thoroughly for a release or so.
> > > 
> > > > Besides, this pair of patches already makes a large change from a
> > > > conceptual viewpoint.  If we are going to make a large change, let's
> > > > get our money's worth out of that change!
> > > 
> > > IMHO, most of the changes are to code comments, the actual code change is
> > > very little and is just removal of dynticks_nmi_nesting and simplification;
> > > it's not really an introduction of a new mechanism.
> > 
> > This change is not fixing a bug, so there is no need for an emergency fix,
> > and thus no point in additional churn.  I understand that it is a bit
> > annoying to code and test something and have your friendly maintainer say
> > "sorry, wrong rocks", and the reason that I understand this is that I do
> > that to myself rather often.
> 
> The motivation for me for this change is to avoid future bugs such as with
> the following patch where "== 2" did not take the force write of
> DYNTICK_IRQ_NONIDLE into account:
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f

Yes, the current code does need some simplification.

> I still don't see it as pointless churn, it is also a maintenance cost in its
> current form and the simplification is worth it IMHO both from a readability,
> and maintenance stand point.
> 
> I still don't see what's technically wrong with the patch. I could perhaps
> add the above "== 2" point in the patch?

I don't know of a crash or splat your patch would cause, if that is
your question.  But that is also true of the current code, so the point
is simplification, not bug fixing.  And from what I can see, there is an
opportunity to simplify quite a bit further.  And with something like
RCU, further simplification is worth -serious- consideration.

> We could also discuss f2f at LPC to see if we can agree about it?

That might make a lot of sense.

In the meantime, could you please create an independent series (or
more than one series, as the case may be) of the other patches?

							Thanx, Paul
Joel Fernandes Aug. 29, 2019, 1:51 a.m. UTC | #9
On Wed, Aug 28, 2019 at 04:12:47PM -0700, Paul E. McKenney wrote:
> On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > > > > > > 
> > > > > > > >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > > > > > > >           interrupt nesting level.
> > > > > > > > 
> > > > > > > >       (b) We need to detect half-interrupts till we are sure they're not an
> > > > > > > >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > > 
> > > > > > > >       (c) When a quiescent state report is needed from a nohz_full CPU.
> > > > > > > >           The nesting counter detects we are a first level interrupt.
> > > > > > > > 
> > > > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > > > > > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > > > > > > 
> > > > > > > > For (b), this warning condition has not occurred for several kernel
> > > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > > > > > remove the warning.
> > > > > > > > 
> > > > > > > > For (c), the nest check is not really necessary since forced_tick would
> > > > > > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > > 
> > > > > > > Skipping the commit log and documentation for this pass.
> > > > > > [snip] 
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -81,7 +81,6 @@
> > > > > > > >  
> > > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > > > > >  	.dynticks_nesting = 1,
> > > > > > > > -	.dynticks_nmi_nesting = 0,
> > > > > > > 
> > > > > > > This should be in the previous patch, give or take naming.
> > > > > > 
> > > > > > Done.
> > > > > > 
> > > > > > > >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > > >  };
> > > > > > > >  struct rcu_state rcu_state = {
> > > > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > > >  	/* Check for counter underflows */
> > > > > > > >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > > > >  			 "RCU dynticks_nesting counter underflow!");
> > > > > > > > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > > > > > > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> > > > > > > >  
> > > > > > > > -	/* Are we at first interrupt nesting level? */
> > > > > > > > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > > -		return false;
> > > > > > > > -
> > > > > > > > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > > > > > > > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > > >  
> > > > > > > >  	/* Entering usermode/idle from interrupt is not handled. These would
> > > > > > > > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > > > > > > > -	 * reset the counter if we warn.
> > > > > > > > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > > > > > > +	 * the warning by 2020.
> > > > > > > >  	 */
> > > > > > > > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > > > > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > > > > +	WARN_ON_ONCE(in_interrupt());
> > > > > > > 
> > > > > > > And this is a red flag.  Bad things happen should some common code
> > > > > > > that disables BH be invoked from the idle loop.  This might not be
> > > > > > > happening now, but we need to avoid this sort of constraint.
> > > > > > > How about instead merging ->dyntick_nesting into the low-order bits
> > > > > > > of ->dyntick_nmi_nesting?
> > > > > > > 
> > > > > > > Yes, this assumes that we don't enter process level twice, but it should
> > > > > > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > > > > > to because there is already this near the end of rcu_eqs_exit():
> > > > > > > 
> > > > > > > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > > > > > 
> > > > > > > So the low-order bit of the combined counter could indicate process-level
> > > > > > > non-idle, the next three bits could be unused to make interpretation
> > > > > > > of hex printouts easier, and then the rest of the bits could be used in
> > > > > > > the same way as currently.
> > > > > > > 
> > > > > > > This would allow a single read to see the full state, so that 0x1 means
> > > > > > > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > > > > > > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > > > > > > 
> > > > > > > What am I missing here?  Why wouldn't this work, and without adding yet
> > > > > > > another RCU-imposed constraint on some other subsystem?
> > > > > > 
> > > > > > What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> > > > > > address your concern?
> > > > > > 
> > > > > > Also, considering this warning condition is most likely never occurring as we
> > > > > > know it, and we are considering deleting it soon enough, is it really worth
> > > > > > reimplementing the whole mechanism with a complex bit-sharing scheme just
> > > > > > because of the BH-disable condition you mentioned, which likely doesn't
> > > > > > happen today? In my implementation, this is just a simple counter. I feel
> > > > > > combining bits in the same counter will just introduce more complexity that
> > > > > > this patch tries to address/avoid.
> > > > > > 
> > > > > > OTOH, I also don't mind just deleting the warning altogether if you are
> > > > > > Ok with that.
> > > > > 
> > > > > The big advantage of combining the counters is that all of the state is
> > > > > explicit and visible in one place.  Plus it can be accessed atomically.
> > > > > And it avoids setting a time bomb for some poor guys just trying to get
> > > > > their idle-loop jobs done some time in the dim distant future.
> > > > 
> > > > I could try the approach you're suggesting but I didn't actually see an issue
> > > > with the patch in its current state other than the WARN_ON_ONCE which I could
> > > > change to WARN_ON_ONCE(in_irq()) to remove the concern. AFAICS, we don't
> > > > > detect "half soft-interrupts" in this code in any way.
> > > > 
> > > > I do feel the approach you're suggesting can be a follow up, these 2 patches
> > > > just focus on deleting dynticks_nmi_nesting counter and we can test this
> > > > approach thoroughly for a release or so.
> > > > 
> > > > > Besides, this pair of patches already makes a large change from a
> > > > > conceptual viewpoint.  If we are going to make a large change, let's
> > > > > get our money's worth out of that change!
> > > > 
> > > > IMHO, most of the changes are to code comments, the actual code change is
> > > > very little and is just removal of dynticks_nmi_nesting and simplification;
> > > > > it's not really an introduction of a new mechanism.
> > > 
> > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > and thus no point in additional churn.  I understand that it is a bit
> > > annoying to code and test something and have your friendly maintainer say
> > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > that to myself rather often.
> > 
> > The motivation for me for this change is to avoid future bugs such as with
> > the following patch where "== 2" did not take the force write of
> > DYNTICK_IRQ_NONIDLE into account:
> > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> 
> Yes, the current code does need some simplification.
> 
> > I still don't see it as pointless churn, it is also a maintenance cost in its
> > current form and the simplification is worth it IMHO both from a readability,
> > and maintenance stand point.
> > 
> > I still don't see what's technically wrong with the patch. I could perhaps
> > add the above "== 2" point in the patch?
> 
> I don't know of a crash or splat your patch would cause, if that is
> your question.  But that is also true of the current code, so the point
> is simplification, not bug fixing.  And from what I can see, there is an
> opportunity to simplify quite a bit further.  And with something like
> RCU, further simplification is worth -serious- consideration.
> 
> > We could also discuss f2f at LPC to see if we can agree about it?
> 
> That might make a lot of sense.

Sure. I am up for a further redesign / simplification. I will think more
about your suggestions and can also further discuss at LPC.

And this patch is on LKML archives and is not going anywhere so there's no
rush I guess ;-)

> In the meantime, could you please create an independent series (or
> more than one series, as the case may be) of the other patches?

The only other patch in this series is to repurpose the dyntick_nesting
counter to do the job of the dynticks_nmi_nesting counter. Did you mean that
one? Or did you mean the dyntick tracing patch? I think I should indeed submit
the tracing patch separately.

thanks,

 - Joel
Paul E. McKenney Aug. 29, 2019, 3:43 a.m. UTC | #10
On Wed, Aug 28, 2019 at 09:51:55PM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 04:12:47PM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 28, 2019 at 06:14:44PM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 03:01:08PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 28, 2019 at 05:42:41PM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 28, 2019 at 02:19:04PM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Aug 28, 2019 at 05:05:25PM -0400, Joel Fernandes wrote:
> > > > > > > On Wed, Aug 28, 2019 at 01:23:30PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Mon, Aug 26, 2019 at 09:33:54PM -0400, Joel Fernandes (Google) wrote:
> > > > > > > > > The dynticks_nmi_nesting counter serves 4 purposes:
> > > > > > > > > 
> > > > > > > > >       (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first
> > > > > > > > >           interrupt nesting level.
> > > > > > > > > 
> > > > > > > > >       (b) We need to detect half-interrupts till we are sure they're not an
> > > > > > > > >           issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0.
> > > > > > > > > 
> > > > > > > > >       (c) When a quiescent state report is needed from a nohz_full CPU.
> > > > > > > > >           The nesting counter detects we are a first level interrupt.
> > > > > > > > > 
> > > > > > > > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the
> > > > > > > > > outermost interrupt that interrupted an RCU-idle state can set it to 1.
> > > > > > > > > 
> > > > > > > > > For (b), this warning condition has not occurred for several kernel
> > > > > > > > > releases.  But we still keep the warning but change it to use
> > > > > > > > > in_interrupt() instead of the nesting counter. In a later year, we can
> > > > > > > > > remove the warning.
> > > > > > > > > 
> > > > > > > > > For (c), the nest check is not really necessary since forced_tick would
> > > > > > > > > have been set to true in the outermost interrupt, so the nested/NMI
> > > > > > > > > interrupts will check forced_tick anyway, and bail.
> > > > > > > > 
> > > > > > > > Skipping the commit log and documentation for this pass.
> > > > > > > [snip] 
> > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > index 255cd6835526..1465a3e406f8 100644
> > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > @@ -81,7 +81,6 @@
> > > > > > > > >  
> > > > > > > > >  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
> > > > > > > > >  	.dynticks_nesting = 1,
> > > > > > > > > -	.dynticks_nmi_nesting = 0,
> > > > > > > > 
> > > > > > > > This should be in the previous patch, give or take naming.
> > > > > > > 
> > > > > > > Done.
> > > > > > > 
> > > > > > > > >  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
> > > > > > > > >  };
> > > > > > > > >  struct rcu_state rcu_state = {
> > > > > > > > > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > > > > > > > >  	/* Check for counter underflows */
> > > > > > > > >  	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > > > > > > > >  			 "RCU dynticks_nesting counter underflow!");
> > > > > > > > > -	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> > > > > > > > > -			 "RCU dynticks_nmi_nesting counter underflow/zero!");
> > > > > > > > >  
> > > > > > > > > -	/* Are we at first interrupt nesting level? */
> > > > > > > > > -	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > > > > > > > > -		return false;
> > > > > > > > > -
> > > > > > > > > -	/* Does CPU appear to be idle from an RCU standpoint? */
> > > > > > > > > -	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> > > > > > > > > +	/* Are we the outermost interrupt that arrived when RCU was idle? */
> > > > > > > > > +	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > >  #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
> > > > > > > > > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user)
> > > > > > > > >  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > > > > > > >  
> > > > > > > > >  	/* Entering usermode/idle from interrupt is not handled. These would
> > > > > > > > > -	 * mean usermode upcalls or idle entry happened from interrupts. But,
> > > > > > > > > -	 * reset the counter if we warn.
> > > > > > > > > +	 * mean usermode upcalls or idle exit happened from interrupts. Remove
> > > > > > > > > +	 * the warning by 2020.
> > > > > > > > >  	 */
> > > > > > > > > -	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
> > > > > > > > > -		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
> > > > > > > > > +	WARN_ON_ONCE(in_interrupt());
> > > > > > > > 
> > > > > > > > And this is a red flag.  Bad things happen should some common code
> > > > > > > > that disables BH be invoked from the idle loop.  This might not be
> > > > > > > > happening now, but we need to avoid this sort of constraint.
> > > > > > > > How about instead merging ->dyntick_nesting into the low-order bits
> > > > > > > > of ->dyntick_nmi_nesting?
> > > > > > > > 
> > > > > > > > Yes, this assumes that we don't enter process level twice, but it should
> > > > > > > > be easy to add a WARN_ON() to test for that.  Except that we don't have
> > > > > > > > to because there is already this near the end of rcu_eqs_exit():
> > > > > > > > 
> > > > > > > > 	WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
> > > > > > > > 
> > > > > > > > So the low-order bit of the combined counter could indicate process-level
> > > > > > > > non-idle, the next three bits could be unused to make interpretation
> > > > > > > > of hex printouts easier, and then the rest of the bits could be used in
> > > > > > > > the same way as currently.
> > > > > > > > 
> > > > > > > > This would allow a single read to see the full state, so that 0x1 means
> > > > > > > > at process level in the kernel, 0x11 is interrupt (or NMI) from process
> > > > > > > > level, 0x10 is interrupt/NMI from idle/user, and so on.
> > > > > > > > 
> > > > > > > > What am I missing here?  Why wouldn't this work, and without adding yet
> > > > > > > > another RCU-imposed constraint on some other subsystem?
> > > > > > > 
> > > > > > > What about replacing the warning with a WARN_ON_ONCE(in_irq()), would that
> > > > > > > address your concern?
> > > > > > > 
> > > > > > > Also, considering this warning condition is most likely never occurring as we
> > > > > > > know it, and we are considering deleting it soon enough, is it really worth
> > > > > > > reimplementing the whole mechanism with a complex bit-sharing scheme just
> > > > > > > because of the BH-disable condition you mentioned, which likely doesn't
> > > > > > > happen today? In my implementation, this is just a simple counter. I feel
> > > > > > > combining bits in the same counter will just introduce more complexity that
> > > > > > > this patch tries to address/avoid.
> > > > > > > 
> > > > > > > OTOH, I also don't mind just deleting the warning altogether if you are
> > > > > > > Ok with that.
> > > > > > 
> > > > > > The big advantage of combining the counters is that all of the state is
> > > > > > explicit and visible in one place.  Plus it can be accessed atomically.
> > > > > > And it avoids setting a time bomb for some poor guys just trying to get
> > > > > > their idle-loop jobs done some time in the dim distant future.
> > > > > 
> > > > > I could try the approach you're suggesting but I didn't actually see an issue
> > > > > with the patch in its current state other than the WARN_ON_ONCE which I could
> > > > > change to WARN_ON_ONCE(in_irq()) to remove the concern. AFAICS, we don't
> > > > > detect "half soft-interrupts" in this code in any way.
> > > > > 
> > > > > I do feel the approach you're suggesting can be a follow up, these 2 patches
> > > > > just focus on deleting dynticks_nmi_nesting counter and we can test this
> > > > > approach thoroughly for a release or so.
> > > > > 
> > > > > > Besides, this pair of patches already makes a large change from a
> > > > > > conceptual viewpoint.  If we are going to make a large change, let's
> > > > > > get our money's worth out of that change!
> > > > > 
> > > > > IMHO, most of the changes are to code comments, the actual code change is
> > > > > very little and is just removal of dynticks_nmi_nesting and simplification;
> > > > > it's not really an introduction of a new mechanism.
> > > > 
> > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > and thus no point in additional churn.  I understand that it is a bit
> > > > annoying to code and test something and have your friendly maintainer say
> > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > that to myself rather often.
> > > 
> > > The motivation for me for this change is to avoid future bugs such as with
> > > the following patch where "== 2" did not take the force write of
> > > DYNTICK_IRQ_NONIDLE into account:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > 
> > Yes, the current code does need some simplification.
> > 
> > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > current form and the simplification is worth it IMHO both from a readability,
> > > and maintenance stand point.
> > > 
> > > I still don't see what's technically wrong with the patch. I could perhaps
> > > add the above "== 2" point in the patch?
> > 
> > I don't know of a crash or splat your patch would cause, if that is
> > your question.  But that is also true of the current code, so the point
> > is simplification, not bug fixing.  And from what I can see, there is an
> > opportunity to simplify quite a bit further.  And with something like
> > RCU, further simplification is worth -serious- consideration.
> > 
> > > We could also discuss f2f at LPC to see if we can agree about it?
> > 
> > That might make a lot of sense.
> 
> Sure. I am up for a further redesign / simplification. I will think more
> about your suggestions and can also further discuss at LPC.

One question that might (or might not) help:  Given the compound counter,
where the low-order hex digit indicates whether the corresponding CPU
is running in a non-idle kernel task and the rest of the hex digits
indicate the NMI-style nesting counter shifted up by four bits, what
could rcu_is_cpu_rrupt_from_idle() be reduced to?

> And this patch is on LKML archives and is not going anywhere so there's no
> rush I guess ;-)

True enough!  ;-)

> > In the meantime, could you please create an independent series (or
> > more than one series, as the case may be) of the other patches?
> 
> The only other patch in this series is to repurpose the dyntick_nesting
> counter to do the job of the dynticks_nmi_nesting counter. Did you mean that
> one? Or did you mean the dyntick tracing patch? I think I should indeed submit
> the tracing patch separately.

Heh!

You have sent me a number of patches over the past day or two.  Respin
them as needed and resend.  My feeling was that one of the groups could
be split, but then again there might have been dependencies among the
patches that I didn't see.  You decide.

On the tracing patch...  That patch might be a good idea regardless,
but I bet that the reason that you felt the sudden need for it was due
to the loss of information in your eventual ->dynticks_nesting field.
After all, the value 0x1 might be an interrupt from idle, or it might
just as easily be a task running in the kernel at process level.

The reason the patch might nevertheless be a good idea is that redundant
information can be helpful when debugging.  Especially when debugging
new architecture-specific code, which is when RCU's dyntick-idle warnings
tend to find bugs.

							Thanx, Paul
Joel Fernandes Aug. 29, 2019, 1:59 p.m. UTC | #11
On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
[snip] 
> On the tracing patch...  That patch might be a good idea regardless,
> but I bet that the reason that you felt the sudden need for it was due
> to the loss of information in your eventual ->dynticks_nesting field.
> After all, the value 0x1 might be an interrupt from idle, or it might
> just as easily be a task running in the kernel at process level.

True, however what really triggered me to do it was the existing code which
does not distinguish between entry/exit from USER and IDLE.

> The reason the patch might nevertheless be a good idea is that redundant
> information can be helpful when debugging.  Especially when debugging
> new architecture-specific code, which is when RCU's dyntick-idle warnings
> tend to find bugs.

Sure, and also that it is more readable to ordinary human beings than "++="
and "--=" :-D.

thanks,

 - Joel
Joel Fernandes Aug. 29, 2019, 2:43 p.m. UTC | #12
On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
[snip]
> > > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > > and thus no point in additional churn.  I understand that it is a bit
> > > > > annoying to code and test something and have your friendly maintainer say
> > > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > > that to myself rather often.
> > > > 
> > > > The motivation for me for this change is to avoid future bugs such as with
> > > > the following patch where "== 2" did not take the force write of
> > > > DYNTICK_IRQ_NONIDLE into account:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > > 
> > > Yes, the current code does need some simplification.
> > > 
> > > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > > current form and the simplification is worth it IMHO both from a readability,
> > > > and maintenance stand point.
> > > > 
> > > > I still don't see what's technically wrong with the patch. I could perhaps
> > > > add the above "== 2" point in the patch?
> > > 
> > > I don't know of a crash or splat your patch would cause, if that is
> > > your question.  But that is also true of the current code, so the point
> > > is simplification, not bug fixing.  And from what I can see, there is an
> > > opportunity to simplify quite a bit further.  And with something like
> > > RCU, further simplification is worth -serious- consideration.
> > > 
> > > > We could also discuss f2f at LPC to see if we can agree about it?
> > > 
> > > That might make a lot of sense.
> > 
> > Sure. I am up for a further redesign / simplification. I will think more
> > about your suggestions and can also further discuss at LPC.
> 
> One question that might (or might not) help:  Given the compound counter,
> where the low-order hex digit indicates whether the corresponding CPU
> is running in a non-idle kernel task and the rest of the hex digits
> indicate the NMI-style nesting counter shifted up by four bits, what
> could rcu_is_cpu_rrupt_from_idle() be reduced to?
> 
> > And this patch is on LKML archives and is not going anywhere so there's no
> > rush I guess ;-)
> 
> True enough!  ;-)

Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
using it. And also remove the bottommost bit of dynticks?

Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
CPUs at the moment?

All of this was introduced in:
b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")

thanks,

 - Joel
Joel Fernandes Aug. 29, 2019, 3:13 p.m. UTC | #13
On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
> [snip]
> > > > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > > > and thus no point in additional churn.  I understand that it is a bit
> > > > > > annoying to code and test something and have your friendly maintainer say
> > > > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > > > that to myself rather often.
> > > > > 
> > > > > The motivation for me for this change is to avoid future bugs such as with
> > > > > the following patch where "== 2" did not take the force write of
> > > > > DYNTICK_IRQ_NONIDLE into account:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > > > 
> > > > Yes, the current code does need some simplification.
> > > > 
> > > > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > > > current form and the simplification is worth it IMHO both from a readability,
> > > > > and maintenance stand point.
> > > > > 
> > > > > I still don't see what's technically wrong with the patch. I could perhaps
> > > > > add the above "== 2" point in the patch?
> > > > 
> > > > I don't know of a crash or splat your patch would cause, if that is
> > > > your question.  But that is also true of the current code, so the point
> > > > is simplification, not bug fixing.  And from what I can see, there is an
> > > > opportunity to simplify quite a bit further.  And with something like
> > > > RCU, further simplification is worth -serious- consideration.
> > > > 
> > > > > We could also discuss f2f at LPC to see if we can agree about it?
> > > > 
> > > > That might make a lot of sense.
> > > 
> > > Sure. I am up for a further redesign / simplification. I will think more
> > > about your suggestions and can also further discuss at LPC.
> > 
> > One question that might (or might not) help:  Given the compound counter,
> > where the low-order hex digit indicates whether the corresponding CPU
> > is running in a non-idle kernel task and the rest of the hex digits
> > indicate the NMI-style nesting counter shifted up by four bits, what
> > could rcu_is_cpu_rrupt_from_idle() be reduced to?
> > 
> > > And this patch is on LKML archives and is not going anywhere so there's no
> > > rush I guess ;-)
> > 
> > True enough!  ;-)
> 
> Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> using it. And also remove the bottommost bit of dynticks?
> 
> Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> CPUs at the moment?
> 
> All of this was introduced in:
> b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")


Paul, also what happens in the following scenario:

CPU0                                                 CPU1

A syscall causes rcu_eqs_exit()
rcu_read_lock();
                                                     ---> FQS loop waiting on
						           dyntick_snap
usermode-upcall  entry -->causes rcu_eqs_enter();

usermode-upcall  exit  -->causes rcu_eqs_exit();

                                                     ---> FQS loop sees
						          dyntick snap
							  increment and
							  declares CPU0 is
							  in a QS state
							  before the
							  rcu_read_unlock!

rcu_read_unlock();
---

Does the context tracking not call rcu_user_enter() in this case, or did I
really miss something?

thanks,

 - Joel
Paul E. McKenney Aug. 29, 2019, 4:09 p.m. UTC | #14
On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:

[ . . . ]

> Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> using it. And also remove the bottommost bit of dynticks?
> 
> Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> CPUs are the moment?
> 
> All of this was introduced in:
> b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")

Adding Andy Lutomirski on CC.

Andy, is this going to be used in the near term, or should we just get
rid of it?

							Thanx, Paul
Paul E. McKenney Aug. 29, 2019, 4:13 p.m. UTC | #15
On Thu, Aug 29, 2019 at 11:13:25AM -0400, Joel Fernandes wrote:
> On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
> > [snip]
> > > > > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > > > > and thus no point in additional churn.  I understand that it is a bit
> > > > > > > annoying to code and test something and have your friendly maintainer say
> > > > > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > > > > that to myself rather often.
> > > > > > 
> > > > > > The motivation for me for this change is to avoid future bugs such as with
> > > > > > the following patch where "== 2" did not take the force write of
> > > > > > DYNTICK_IRQ_NONIDLE into account:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > > > > 
> > > > > Yes, the current code does need some simplification.
> > > > > 
> > > > > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > > > > current form and the simplification is worth it IMHO both from a readability,
> > > > > > and maintenance stand point.
> > > > > > 
> > > > > > I still don't see what's technically wrong with the patch. I could perhaps
> > > > > > add the above "== 2" point in the patch?
> > > > > 
> > > > > I don't know of a crash or splat your patch would cause, if that is
> > > > > your question.  But that is also true of the current code, so the point
> > > > > is simplification, not bug fixing.  And from what I can see, there is an
> > > > > opportunity to simplify quite a bit further.  And with something like
> > > > > RCU, further simplification is worth -serious- consideration.
> > > > > 
> > > > > > We could also discuss f2f at LPC to see if we can agree about it?
> > > > > 
> > > > > That might make a lot of sense.
> > > > 
> > > > Sure. I am up for a further redesign / simplification. I will think more
> > > > about your suggestions and can also further discuss at LPC.
> > > 
> > > One question that might (or might not) help:  Given the compound counter,
> > > where the low-order hex digit indicates whether the corresponding CPU
> > > is running in a non-idle kernel task and the rest of the hex digits
> > > indicate the NMI-style nesting counter shifted up by four bits, what
> > > could rcu_is_cpu_rrupt_from_idle() be reduced to?
> > > 
> > > > And this patch is on LKML archives and is not going anywhere so there's no
> > > > rush I guess ;-)
> > > 
> > > True enough!  ;-)
> > 
> > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > using it. And also remove the bottom most bit of dynticks?
> > 
> > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > CPUs are the moment?
> > 
> > All of this was introduced in:
> > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> 
> 
> Paul, also what what happens in the following scenario:
> 
> CPU0                                                 CPU1
> 
> A syscall causes rcu_eqs_exit()
> rcu_read_lock();
>                                                      ---> FQS loop waiting on
> 						           dyntick_snap
> usermode-upcall  entry -->causes rcu_eqs_enter();
> 
> usermode-upcall  exit  -->causes rcu_eqs_exit();
> 
>                                                      ---> FQS loop sees
> 						          dyntick snap
> 							  increment and
> 							  declares CPU0 is
> 							  in a QS state
> 							  before the
> 							  rcu_read_unlock!
> 
> rcu_read_unlock();
> ---
> 
> Does the context tracking not call rcu_user_enter() in this case, or did I
> really miss something?

Holding rcu_read_lock() across usermode execution (in this case,
the usermode upcall) is a bad idea.  Why is CPU 0 doing that?

							Thanx, Paul
Andy Lutomirski Aug. 29, 2019, 4:21 p.m. UTC | #16
On Thu, Aug 29, 2019 at 9:10 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
>
> [ . . . ]
>
> > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > using it. And also remove the bottom most bit of dynticks?
> >
> > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > CPUs are the moment?
> >
> > All of this was introduced in:
> > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
>
> Adding Andy Lutomirski on CC.
>
> Andy, is this going to be used in the near term, or should we just get
> rid of it?
>

Let's get rid of it.  I'm not actually convinced it *can* be used as designed.

For those who forgot the history or weren't cc'd on all of it: I had
this clever idea about how we could reduce TLB flushes.  I implemented
some of it (but not the part that would have used this RCU feature),
and it exploded in nasty and subtle ways.  This caused me to learn
that speculative TLB fills were a problem that I had entirely failed
to account for.  Then PTI happened and thoroughly muddied the water.

So I think we should just drop this :(
Paul E. McKenney Aug. 29, 2019, 4:32 p.m. UTC | #17
On Thu, Aug 29, 2019 at 09:59:07AM -0400, Joel Fernandes wrote:
> On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
> [snip] 
> > On the tracing patch...  That patch might be a good idea regardless,
> > but I bet that the reason that you felt the sudden need for it was due
> > to the loss of information in your eventual ->dynticks_nesting field.
> > After all, the value 0x1 might be an interrupt from idle, or it might
> > just as easily be a task running in the kernel at process level.
> 
True. However, what really triggered me to do it was the existing code,
which does not distinguish between entry/exit from USER and IDLE.
> 
> > The reason the patch might nevertheless be a good idea is that redundant
> > information can be helpful when debugging.  Especially when debugging
> > new architecture-specific code, which is when RCU's dyntick-idle warnings
> > tend to find bugs.
> 
> Sure, and also that it is more readable to ordinary human beings than "++="
> and "--=" :-D.

And those considerations did figure into my deciding that the tracing
change was likely a good thing in any case.  ;-)

							Thanx, Paul
Paul E. McKenney Aug. 29, 2019, 4:54 p.m. UTC | #18
On Thu, Aug 29, 2019 at 09:21:46AM -0700, Andy Lutomirski wrote:
> On Thu, Aug 29, 2019 at 9:10 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> >
> > [ . . . ]
> >
> > > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > > using it. And also remove the bottom most bit of dynticks?
> > >
> > > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > > CPUs are the moment?
> > >
> > > All of this was introduced in:
> > > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> >
> > Adding Andy Lutomirski on CC.
> >
> > Andy, is this going to be used in the near term, or should we just get
> > rid of it?
> 
> Let's get rid of it.  I'm not actually convinced it *can* be used as designed.
> 
> For those who forgot the history or weren't cc'd on all of it: I had
> this clever idea about how we could reduce TLB flushes.  I implemented
> some of it (but not the part that would have used this RCU feature),
> and it exploded in nasty and subtle ways.  This caused me to learn
> that speculative TLB fills were a problem that I had entirely failed
> to account for.  Then PTI happened and thoroughly muddied the water.

Yeah, PTI was quite annoying.  Still is, from what I can see.  :-/

> So I think we should just drop this :(

OK, thank you!  I will put a tag into -rcu marking its removal in case
it should prove useful whenever for whatever.

Joel, would you like to remove this, or would you rather that I did?
It is in code you are working with right now, so if I do it, I need to
wait until yours is finalized.  Which wouldn't be a problem.

						Thanx, Paul
Joel Fernandes Aug. 29, 2019, 5:14 p.m. UTC | #19
On Thu, Aug 29, 2019 at 09:13:01AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 29, 2019 at 11:13:25AM -0400, Joel Fernandes wrote:
> > On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
> > > [snip]
> > > > > > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > > > > > and thus no point in additional churn.  I understand that it is a bit
> > > > > > > > annoying to code and test something and have your friendly maintainer say
> > > > > > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > > > > > that to myself rather often.
> > > > > > > 
> > > > > > > The motivation for me for this change is to avoid future bugs such as with
> > > > > > > the following patch where "== 2" did not take the force write of
> > > > > > > DYNTICK_IRQ_NONIDLE into account:
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > > > > > 
> > > > > > Yes, the current code does need some simplification.
> > > > > > 
> > > > > > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > > > > > current form and the simplification is worth it IMHO both from a readability,
> > > > > > > and maintenance stand point.
> > > > > > > 
> > > > > > > I still don't see what's technically wrong with the patch. I could perhaps
> > > > > > > add the above "== 2" point in the patch?
> > > > > > 
> > > > > > I don't know of a crash or splat your patch would cause, if that is
> > > > > > your question.  But that is also true of the current code, so the point
> > > > > > is simplification, not bug fixing.  And from what I can see, there is an
> > > > > > opportunity to simplify quite a bit further.  And with something like
> > > > > > RCU, further simplification is worth -serious- consideration.
> > > > > > 
> > > > > > > We could also discuss f2f at LPC to see if we can agree about it?
> > > > > > 
> > > > > > That might make a lot of sense.
> > > > > 
> > > > > Sure. I am up for a further redesign / simplification. I will think more
> > > > > about your suggestions and can also further discuss at LPC.
> > > > 
> > > > One question that might (or might not) help:  Given the compound counter,
> > > > where the low-order hex digit indicates whether the corresponding CPU
> > > > is running in a non-idle kernel task and the rest of the hex digits
> > > > indicate the NMI-style nesting counter shifted up by four bits, what
> > > > could rcu_is_cpu_rrupt_from_idle() be reduced to?
> > > > 
> > > > > And this patch is on LKML archives and is not going anywhere so there's no
> > > > > rush I guess ;-)
> > > > 
> > > > True enough!  ;-)
> > > 
> > > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > > using it. And also remove the bottom most bit of dynticks?
> > > 
> > > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > > CPUs are the moment?
> > > 
> > > All of this was introduced in:
> > > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> > 
> > 
> > Paul, also what what happens in the following scenario:
> > 
> > CPU0                                                 CPU1
> > 
> > A syscall causes rcu_eqs_exit()
> > rcu_read_lock();
> >                                                      ---> FQS loop waiting on
> > 						           dyntick_snap
> > usermode-upcall  entry -->causes rcu_eqs_enter();
> > 
> > usermode-upcall  exit  -->causes rcu_eqs_exit();
> > 
> >                                                      ---> FQS loop sees
> > 						          dyntick snap
> > 							  increment and
> > 							  declares CPU0 is
> > 							  in a QS state
> > 							  before the
> > 							  rcu_read_unlock!
> > 
> > rcu_read_unlock();
> > ---
> > 
> > Does the context tracking not call rcu_user_enter() in this case, or did I
> > really miss something?
> 
> Holding rcu_read_lock() across usermode execution (in this case,
> the usermode upcall) is a bad idea.  Why is CPU 0 doing that?

Oh, ok. I was just hypothesizing that since usermode upcalls can happen from
something as heavy as interrupts, we could also have one happen from
some path that held an rcu_read_lock() as well. It was just a theoretical
concern, if it is not an issue, no problem.

The other question I had was, in which cases would dyntick_nesting in current
RCU code be > 1 (after removing the lower bit and any crowbarring) ? In the
scenarios I worked out on paper, I can only see this as 1 or 0. But the
wording of it is 'dynticks_nesting'. Maybe I am missing a nesting scenario?
We can exit RCU-idleness into process context only once (either exiting idle
mode or user mode). Both cases would imply a value of 1.
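(If that holds, point (a) of the patch — detecting the first interrupt nesting level in rcu_is_cpu_rrupt_from_idle() — reduces to a single compare. A hedged sketch, with a plain variable standing in for the real per-CPU field, and assuming the function is called from the scheduler-tick interrupt handler as in mainline:)

```c
#include <assert.h>
#include <stdbool.h>

static long dynticks_nesting;	/* rdp->dynticks_nesting in the kernel */

/*
 * Sketch of the simplified check: as seen from within an interrupt
 * handler, only the outermost interrupt taken from an RCU-idle
 * context raises the merged nesting counter from 0 to 1; an interrupt
 * over a non-idle task would see 2 or more.
 */
static bool is_cpu_rrupt_from_idle(void)
{
	return dynticks_nesting == 1;
}
```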

thanks!

 - Joel
Joel Fernandes Aug. 29, 2019, 7 p.m. UTC | #20
Hi Paul,

On Thu, Aug 29, 2019 at 09:54:07AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 29, 2019 at 09:21:46AM -0700, Andy Lutomirski wrote:
> > On Thu, Aug 29, 2019 at 9:10 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> > >
> > > [ . . . ]
> > >
> > > > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > > > using it. And also remove the bottom most bit of dynticks?
> > > >
> > > > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > > > CPUs are the moment?
> > > >
> > > > All of this was introduced in:
> > > > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> > >
> > > Adding Andy Lutomirski on CC.
> > >
> > > Andy, is this going to be used in the near term, or should we just get
> > > rid of it?
> > 
> > Let's get rid of it.  I'm not actually convinced it *can* be used as designed.
> > 
> > For those who forgot the history or weren't cc'd on all of it: I had
> > this clever idea about how we could reduce TLB flushes.  I implemented
> > some of it (but not the part that would have used this RCU feature),
> > and it exploded in nasty and subtle ways.  This caused me to learn
> > that speculative TLB fills were a problem that I had entirely failed
> > to account for.  Then PTI happened and thoroughly muddied the water.
> 
> Yeah, PTI was quite annoying.  Still is, from what I can see.  :-/
> 
> > So I think we should just drop this :(
> 
> OK, thank you!  I will put a tag into -rcu marking its removal in case
> it should prove useful whenever for whatever.
> 
> Joel, would you like to remove this, or would you rather that I did?
> It is in code you are working with right now, so if I do it, I need to
> wait until yours is finalized.  Which wouldn't be a problem.

I can remove it in my series, made a note to do so.

thanks,

 - Joel
Paul E. McKenney Aug. 30, 2019, 12:47 a.m. UTC | #21
On Thu, Aug 29, 2019 at 01:14:54PM -0400, Joel Fernandes wrote:
> On Thu, Aug 29, 2019 at 09:13:01AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 29, 2019 at 11:13:25AM -0400, Joel Fernandes wrote:
> > > On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 28, 2019 at 08:43:36PM -0700, Paul E. McKenney wrote:
> > > > [snip]
> > > > > > > > > This change is not fixing a bug, so there is no need for an emergency fix,
> > > > > > > > > and thus no point in additional churn.  I understand that it is a bit
> > > > > > > > > annoying to code and test something and have your friendly maintainer say
> > > > > > > > > "sorry, wrong rocks", and the reason that I understand this is that I do
> > > > > > > > > that to myself rather often.
> > > > > > > > 
> > > > > > > > The motivation for me for this change is to avoid future bugs such as with
> > > > > > > > the following patch where "== 2" did not take the force write of
> > > > > > > > DYNTICK_IRQ_NONIDLE into account:
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
> > > > > > > 
> > > > > > > Yes, the current code does need some simplification.
> > > > > > > 
> > > > > > > > I still don't see it as pointless churn, it is also a maintenance cost in its
> > > > > > > > current form and the simplification is worth it IMHO both from a readability,
> > > > > > > > and maintenance stand point.
> > > > > > > > 
> > > > > > > > I still don't see what's technically wrong with the patch. I could perhaps
> > > > > > > > add the above "== 2" point in the patch?
> > > > > > > 
> > > > > > > I don't know of a crash or splat your patch would cause, if that is
> > > > > > > your question.  But that is also true of the current code, so the point
> > > > > > > is simplification, not bug fixing.  And from what I can see, there is an
> > > > > > > opportunity to simplify quite a bit further.  And with something like
> > > > > > > RCU, further simplification is worth -serious- consideration.
> > > > > > > 
> > > > > > > > We could also discuss f2f at LPC to see if we can agree about it?
> > > > > > > 
> > > > > > > That might make a lot of sense.
> > > > > > 
> > > > > > Sure. I am up for a further redesign / simplification. I will think more
> > > > > > about your suggestions and can also further discuss at LPC.
> > > > > 
> > > > > One question that might (or might not) help:  Given the compound counter,
> > > > > where the low-order hex digit indicates whether the corresponding CPU
> > > > > is running in a non-idle kernel task and the rest of the hex digits
> > > > > indicate the NMI-style nesting counter shifted up by four bits, what
> > > > > could rcu_is_cpu_rrupt_from_idle() be reduced to?
> > > > > 
> > > > > > And this patch is on LKML archives and is not going anywhere so there's no
> > > > > > rush I guess ;-)
> > > > > 
> > > > > True enough!  ;-)
> > > > 
> > > > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > > > using it. And also remove the bottom most bit of dynticks?
> > > > 
> > > > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > > > CPUs are the moment?
> > > > 
> > > > All of this was introduced in:
> > > > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> > > 
> > > 
> > > Paul, also what what happens in the following scenario:
> > > 
> > > CPU0                                                 CPU1
> > > 
> > > A syscall causes rcu_eqs_exit()
> > > rcu_read_lock();
> > >                                                      ---> FQS loop waiting on
> > > 						           dyntick_snap
> > > usermode-upcall  entry -->causes rcu_eqs_enter();
> > > 
> > > usermode-upcall  exit  -->causes rcu_eqs_exit();
> > > 
> > >                                                      ---> FQS loop sees
> > > 						          dyntick snap
> > > 							  increment and
> > > 							  declares CPU0 is
> > > 							  in a QS state
> > > 							  before the
> > > 							  rcu_read_unlock!
> > > 
> > > rcu_read_unlock();
> > > ---
> > > 
> > > Does the context tracking not call rcu_user_enter() in this case, or did I
> > > really miss something?
> > 
> > Holding rcu_read_lock() across usermode execution (in this case,
> > the usermode upcall) is a bad idea.  Why is CPU 0 doing that?
> 
> Oh, ok. I was just hypothesizing that since usermode upcalls from
> something as heavy as interrupts, it could also mean we had the same from
> some path that held an rcu_read_lock() as well. It was just a theoretical
> concern, if it is not an issue, no problem.

Are there the usual lockdep checks in the upcall code?  Holding a spinlock
across them would seem to be at least as bad as holding an rcu_read_lock()
across them.

> The other question I had was, in which cases would dyntick_nesting in current
> RCU code be > 1 (after removing the lower bit and any crowbarring) ? In the
> scenarios I worked out on paper, I can only see this as 1 or 0. But the
> wording of it is 'dynticks_nesting'. May be I am missing a nesting scenario?
> We can exit RCU-idleness into process context only once (either exiting idle
> mode or user mode). Both cases would imply a value of 1.

Interrupt -> NMI -> certain types of tracing.  I believe that can get
it to 5.  There might be even more elaborate sequences of events.
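(The encoding behind that count, as the Data-Structures.rst hunk of the patch also documents, is "the initial transition from idle adds one, and nested transitions add two". A small hedged helper makes the arithmetic concrete — illustration only, not a kernel function:)

```c
#include <assert.h>

/*
 * Value of the NMI-style nesting counter at nesting depth n: +1 for
 * the first entry from idle and +2 for each nested entry, so depth n
 * maps to 2n - 1.  Depth five is therefore represented by nine,
 * matching the documentation.
 */
static long nmi_nesting_value(int depth)
{
	return depth > 0 ? 2L * depth - 1 : 0;
}
```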

							Thanx, Paul
Paul E. McKenney Aug. 30, 2019, 12:48 a.m. UTC | #22
On Thu, Aug 29, 2019 at 03:00:46PM -0400, Joel Fernandes wrote:
> Hi Paul,
> 
> On Thu, Aug 29, 2019 at 09:54:07AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 29, 2019 at 09:21:46AM -0700, Andy Lutomirski wrote:
> > > On Thu, Aug 29, 2019 at 9:10 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Aug 29, 2019 at 10:43:55AM -0400, Joel Fernandes wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > Paul, do we also nuke rcu_eqs_special_set()?  Currently I don't see anyone
> > > > > using it. And also remove the bottom most bit of dynticks?
> > > > >
> > > > > Also what happens if a TLB flush broadcast is needed? Do we IPI nohz or idle
> > > > > CPUs are the moment?
> > > > >
> > > > > All of this was introduced in:
> > > > > b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
> > > >
> > > > Adding Andy Lutomirski on CC.
> > > >
> > > > Andy, is this going to be used in the near term, or should we just get
> > > > rid of it?
> > > 
> > > Let's get rid of it.  I'm not actually convinced it *can* be used as designed.
> > > 
> > > For those who forgot the history or weren't cc'd on all of it: I had
> > > this clever idea about how we could reduce TLB flushes.  I implemented
> > > some of it (but not the part that would have used this RCU feature),
> > > and it exploded in nasty and subtle ways.  This caused me to learn
> > > that speculative TLB fills were a problem that I had entirely failed
> > > to account for.  Then PTI happened and thoroughly muddied the water.
> > 
> > Yeah, PTI was quite annoying.  Still is, from what I can see.  :-/
> > 
> > > So I think we should just drop this :(
> > 
> > OK, thank you!  I will put a tag into -rcu marking its removal in case
> > it should prove useful whenever for whatever.
> > 
> > Joel, would you like to remove this, or would you rather that I did?
> > It is in code you are working with right now, so if I do it, I need to
> > wait until yours is finalized.  Which wouldn't be a problem.
> 
> I can remove it in my series, made a note to do so.

Sounds good!

							Thanx, Paul
Joel Fernandes Aug. 30, 2019, 1:20 a.m. UTC | #23
On Thu, Aug 29, 2019 at 05:47:56PM -0700, Paul E. McKenney wrote:
[snip]
> > > > Paul, also what what happens in the following scenario:
> > > > 
> > > > CPU0                                                 CPU1
> > > > 
> > > > A syscall causes rcu_eqs_exit()
> > > > rcu_read_lock();
> > > >                                                      ---> FQS loop waiting on
> > > > 						           dyntick_snap
> > > > usermode-upcall  entry -->causes rcu_eqs_enter();
> > > > 
> > > > usermode-upcall  exit  -->causes rcu_eqs_exit();
> > > > 
> > > >                                                      ---> FQS loop sees
> > > > 						          dyntick snap
> > > > 							  increment and
> > > > 							  declares CPU0 is
> > > > 							  in a QS state
> > > > 							  before the
> > > > 							  rcu_read_unlock!
> > > > 
> > > > rcu_read_unlock();
> > > > ---
> > > > 
> > > > Does the context tracking not call rcu_user_enter() in this case, or did I
> > > > really miss something?
> > > 
> > > Holding rcu_read_lock() across usermode execution (in this case,
> > > the usermode upcall) is a bad idea.  Why is CPU 0 doing that?
> > 
> > Oh, ok. I was just hypothesizing that since usermode upcalls from
> > something as heavy as interrupts, it could also mean we had the same from
> > some path that held an rcu_read_lock() as well. It was just a theoretical
> > concern, if it is not an issue, no problem.
> 
> Are there the usual lockdep checks in the upcall code?  Holding a spinlock
> across them would seem to be at least as bad as holding an rcu_read_lock()
> across them.

Great point, I'll take a look.

> > The other question I had was, in which cases would dyntick_nesting in current
> > RCU code be > 1 (after removing the lower bit and any crowbarring) ? In the
> > scenarios I worked out on paper, I can only see this as 1 or 0. But the
> > wording of it is 'dynticks_nesting'. May be I am missing a nesting scenario?
> > We can exit RCU-idleness into process context only once (either exiting idle
> > mode or user mode). Both cases would imply a value of 1.
> 
> Interrrupt -> NMI -> certain types of tracing.  I believe that can get
> it to 5.  There might be even more elaborate sequences of events.

I am only talking about dynticks_nesting, not dynticks_nmi_nesting. In
current mainline, I see this as only 0 or 1. I am running the below patch
overnight on all RCU configurations to see if it ever takes any other value.

And, please feel free to ignore my emails as you mentioned you are supposed
to be out for the next 2 days! Thanks for the replies though!

---8<-----------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 68ebf0eb64c8..8c8ddb6457d5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -571,6 +571,9 @@ static void rcu_eqs_enter(bool user)
 	WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     rdp->dynticks_nesting == 0);
+
+	WARN_ON_ONCE(rdp->dynticks_nesting != 1);
+
 	if (rdp->dynticks_nesting != 1) {
 		rdp->dynticks_nesting--;
 		return;
@@ -736,6 +739,9 @@ static void rcu_eqs_exit(bool user)
 	lockdep_assert_irqs_disabled();
 	rdp = this_cpu_ptr(&rcu_data);
 	oldval = rdp->dynticks_nesting;
+
+	WARN_ON_ONCE(rdp->dynticks_nesting != 0);
+
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
 	if (oldval) {
 		rdp->dynticks_nesting++;
Paul E. McKenney Aug. 30, 2019, 2:45 a.m. UTC | #24
On Thu, Aug 29, 2019 at 09:20:36PM -0400, Joel Fernandes wrote:
> On Thu, Aug 29, 2019 at 05:47:56PM -0700, Paul E. McKenney wrote:
> [snip]
> > > > > Paul, also what what happens in the following scenario:
> > > > > 
> > > > > CPU0                                                 CPU1
> > > > > 
> > > > > A syscall causes rcu_eqs_exit()
> > > > > rcu_read_lock();
> > > > >                                                      ---> FQS loop waiting on
> > > > > 						           dyntick_snap
> > > > > usermode-upcall  entry -->causes rcu_eqs_enter();
> > > > > 
> > > > > usermode-upcall  exit  -->causes rcu_eqs_exit();
> > > > > 
> > > > >                                                      ---> FQS loop sees
> > > > > 						          dyntick snap
> > > > > 							  increment and
> > > > > 							  declares CPU0 is
> > > > > 							  in a QS state
> > > > > 							  before the
> > > > > 							  rcu_read_unlock!
> > > > > 
> > > > > rcu_read_unlock();
> > > > > ---
> > > > > 
> > > > > Does the context tracking not call rcu_user_enter() in this case, or did I
> > > > > really miss something?
> > > > 
> > > > Holding rcu_read_lock() across usermode execution (in this case,
> > > > the usermode upcall) is a bad idea.  Why is CPU 0 doing that?
> > > 
> > > Oh, ok. I was just hypothesizing that since usermode upcalls from
> > > something as heavy as interrupts, it could also mean we had the same from
> > > some path that held an rcu_read_lock() as well. It was just a theoretical
> > > concern, if it is not an issue, no problem.
> > 
> > Are there the usual lockdep checks in the upcall code?  Holding a spinlock
> > across them would seem to be at least as bad as holding an rcu_read_lock()
> > across them.
> 
> Great point, I'll take a look.
> 
> > > The other question I had was, in which cases would dyntick_nesting in current
> > > RCU code be > 1 (after removing the lower bit and any crowbarring) ? In the
> > > scenarios I worked out on paper, I can only see this as 1 or 0. But the
> > > wording of it is 'dynticks_nesting'. May be I am missing a nesting scenario?
> > > We can exit RCU-idleness into process context only once (either exiting idle
> > > mode or user mode). Both cases would imply a value of 1.
> > 
> > Interrrupt -> NMI -> certain types of tracing.  I believe that can get
> > it to 5.  There might be even more elaborate sequences of events.
> 
> I am only talking about dynticks_nesting, not dynticks_nmi_nesting. In
> current mainline, I see this only 0 or 1. I am running the below patch
> overnight on all RCU configurations to see if it is ever any other value.

Ah!  Then yes, we never enter non-idle/non-user process-level mode
twice without having exited it.  There would have been a splat,
I believe.

> And, please feel free to ignore my emails as you mentioned you are supposed
> to be out next 2 days! Thanks for the replies though!

Actually this day and next.  ;-)

							Thanx, Paul

> ---8<-----------------------
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 68ebf0eb64c8..8c8ddb6457d5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -571,6 +571,9 @@ static void rcu_eqs_enter(bool user)
>  	WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>  		     rdp->dynticks_nesting == 0);
> +
> +	WARN_ON_ONCE(rdp->dynticks_nesting != 1);
> +
>  	if (rdp->dynticks_nesting != 1) {
>  		rdp->dynticks_nesting--;
>  		return;
> @@ -736,6 +739,9 @@ static void rcu_eqs_exit(bool user)
>  	lockdep_assert_irqs_disabled();
>  	rdp = this_cpu_ptr(&rcu_data);
>  	oldval = rdp->dynticks_nesting;
> +
> +	WARN_ON_ONCE(rdp->dynticks_nesting != 0);
> +
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
>  	if (oldval) {
>  		rdp->dynticks_nesting++;
> -- 
> 2.23.0.187.g17f5b7556c-goog
>

Patch
diff mbox series

diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
index 4a48e20a46f2..a5a907f434a1 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@@ -936,10 +936,9 @@  This portion of the rcu_data structure is declared as follows:
 ::
 
      1   long dynticks_nesting;
-     2   long dynticks_nmi_nesting;
-     3   atomic_t dynticks;
-     4   bool rcu_need_heavy_qs;
-     5   bool rcu_urgent_qs;
+     2   atomic_t dynticks;
+     3   bool rcu_need_heavy_qs;
+     4   bool rcu_urgent_qs;
 
 These fields in the rcu_data structure maintain the per-CPU dyntick-idle
 state for the corresponding CPU. The fields may be accessed only from
@@ -948,26 +947,14 @@  the corresponding CPU (and from tracing) unless otherwise stated.
 The ``->dynticks_nesting`` field counts the nesting depth of process
 execution, so that in normal circumstances this counter has value zero
 or one. NMIs, irqs, and tracers are counted by the
-``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
+``->dynticks_nesting`` field as well. Because NMIs cannot be masked, changes
 to this variable have to be undertaken carefully using an algorithm
 provided by Andy Lutomirski. The initial transition from idle adds one,
 and nested transitions add two, so that a nesting level of five is
-represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
+represented by a ``->dynticks_nesting`` value of nine. This counter
 can therefore be thought of as counting the number of reasons why this
-CPU cannot be permitted to enter dyntick-idle mode, aside from
-process-level transitions.
-
-However, it turns out that when running in non-idle kernel context, the
-Linux kernel is fully capable of entering interrupt handlers that never
-exit and perhaps also vice versa. Therefore, whenever the
-``->dynticks_nesting`` field is incremented up from zero, the
-``->dynticks_nmi_nesting`` field is set to a large positive number, and
-whenever the ``->dynticks_nesting`` field is decremented down to zero,
-the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
-the number of misnested interrupts is not sufficient to overflow the
-counter, this approach corrects the ``->dynticks_nmi_nesting`` field
-every time the corresponding CPU enters the idle loop from process
-context.
+CPU cannot be permitted to enter dyntick-idle mode. It counts both the
+process-level and interrupt transitions.
 
 The ``->dynticks`` field counts the corresponding CPU's transitions to
 and from either dyntick-idle or user mode, so that this counter has an
@@ -1000,7 +987,9 @@  code.
 +-----------------------------------------------------------------------+
 | Because this would fail in the presence of interrupts whose handlers  |
 | never return and of handlers that manage to return from a made-up     |
-| interrupt.                                                            |
+| interrupt. NOTE: The counters have now been combined; however,        |
+| a temporary warning has been left to make sure this condition never   |
+| occurs.                                                               |
 +-----------------------------------------------------------------------+
 
 Additional fields are present for some special-purpose builds, and are
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index f48f4621ccbc..585f73009a56 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -173,8 +173,8 @@  For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
 it will print a message similar to the following:
 
 	INFO: rcu_sched detected stalls on CPUs/tasks:
-	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
-	16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
+	2-...: (3 GPs behind) idle=06c/0 softirq=1453/1455 fqs=0
+	16-...: (0 ticks this GP) idle=81c/0 softirq=764/764 fqs=0
 	(detected by 32, t=2603 jiffies, g=7075, q=625)
 
 This message indicates that CPU 32 detected that CPUs 2 and 16 were both
@@ -225,7 +225,7 @@  an estimate of the total number of RCU callbacks queued across all CPUs
 In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
 for each CPU:
 
-	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
+	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
 
 The "last_accelerate:" prints the low-order 16 bits (in hex) of the
 jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 255cd6835526..1465a3e406f8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -81,7 +81,6 @@ 
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.dynticks_nesting = 1,
-	.dynticks_nmi_nesting = 0,
 	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
 };
 struct rcu_state rcu_state = {
@@ -392,15 +391,9 @@  static int rcu_is_cpu_rrupt_from_idle(void)
 	/* Check for counter underflows */
 	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
 			 "RCU dynticks_nesting counter underflow!");
-	RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
-			 "RCU dynticks_nmi_nesting counter underflow/zero!");
 
-	/* Are we at first interrupt nesting level? */
-	if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
-		return false;
-
-	/* Does CPU appear to be idle from an RCU standpoint? */
-	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
+	/* Are we the outermost interrupt that arrived when RCU was idle? */
+	return __this_cpu_read(rcu_data.dynticks_nesting) == 1;
 }
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch ... */
@@ -564,11 +557,10 @@  static void rcu_eqs_enter(bool user)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
 	/* Entering usermode/idle from interrupt is not handled. These would
-	 * mean usermode upcalls or idle entry happened from interrupts. But,
-	 * reset the counter if we warn.
+	 * mean usermode upcalls or idle entry happened from interrupts. Remove
+	 * the warning by 2020.
 	 */
-	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
-		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
+	WARN_ON_ONCE(in_interrupt());
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     rdp->dynticks_nesting == 0);
@@ -627,9 +619,8 @@  void rcu_user_enter(void)
 
 /*
  * If we are returning from the outermost NMI handler that interrupted an
- * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
- * to let the RCU grace-period handling know that the CPU is back to
- * being RCU-idle.
+ * RCU-idle period, update rdp->dynticks to let the RCU grace-period handling
+ * know that the CPU is back to being RCU-idle.
  *
  * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
@@ -639,16 +630,13 @@  static __always_inline void rcu_nmi_exit_common(bool irq)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
 	/*
-	 * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
+	 * Check for ->dynticks_nesting underflow and bad ->dynticks.
 	 * (We are exiting an NMI handler, so RCU better be paying attention
 	 * to us!)
 	 */
 	WARN_ON_ONCE(rdp->dynticks_nesting <= 0);
-	WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0);
 	WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
 
-	WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
-		   rdp->dynticks_nmi_nesting - 1);
 	/*
 	 * If the nesting level is not 1, the CPU wasn't RCU-idle, so
 	 * leave it in non-RCU-idle state.
@@ -750,11 +738,10 @@  static void rcu_eqs_exit(bool user)
 	WRITE_ONCE(rdp->dynticks_nesting, 1);
 
 	/* Exiting usermode/idle from interrupt is not handled. These would
-	 * mean usermode upcalls or idle exit happened from interrupts. But,
-	 * reset the counter if we warn.
+	 * mean usermode upcalls or idle exit happened from interrupts. Remove
+	 * the warning by 2020.
 	 */
-	if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0))
-		WRITE_ONCE(rdp->dynticks_nmi_nesting, 0);
+	WARN_ON_ONCE(in_interrupt());
 }
 
 /**
@@ -795,14 +782,13 @@  void rcu_user_exit(void)
  * rcu_nmi_enter_common - inform RCU of entry to NMI context
  * @irq: Is this call from rcu_irq_enter?
  *
- * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
- * rdp->dynticks_nmi_nesting to let the RCU grace-period handling know
- * that the CPU is active.  This implementation permits nested NMIs, as
- * long as the nesting level does not overflow an int.  (You will probably
- * run out of stack space first.)
+ * If the CPU was idle from RCU's viewpoint, update rdp->dynticks to let the
+ * RCU grace-period handling know that the CPU is active.  This implementation
+ * permits nested NMIs, as long as the nesting level does not overflow a long.
+ * (You will probably run out of stack space first.)
  *
- * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
- * with CONFIG_RCU_EQS_DEBUG=y.
+ * If you add or remove a call to rcu_nmi_enter_common(), be sure to test with
+ * CONFIG_RCU_EQS_DEBUG=y.
  */
 static __always_inline void rcu_nmi_enter_common(bool irq)
 {
@@ -811,15 +797,16 @@  static __always_inline void rcu_nmi_enter_common(bool irq)
 
 	/* Complain about underflow. */
 	WARN_ON_ONCE(rdp->dynticks_nesting < 0);
-	WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
 
 	/*
 	 * If idle from RCU viewpoint, atomically increment ->dynticks
-	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
-	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
-	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
+	 * to mark non-idle and increment ->dynticks_nesting by one.
+	 * Otherwise, increment ->dynticks_nesting by two.  This means
+	 * if ->dynticks_nesting is equal to one, we are guaranteed
 	 * to be in the outermost NMI handler that interrupted an RCU-idle
-	 * period (observation due to Andy Lutomirski).
+	 * period (observation due to Andy Lutomirski). An exception
+	 * is if the interrupt arrived in kernel mode; in this case we would
+	 * be the outermost interrupt but still increment by 2, which is OK.
 	 */
 	if (rcu_dynticks_curr_cpu_in_eqs()) {
 
@@ -832,8 +819,7 @@  static __always_inline void rcu_nmi_enter_common(bool irq)
 			rcu_cleanup_after_idle();
 
 		incby = 1;
-	} else if (tick_nohz_full_cpu(rdp->cpu) &&
-		   !rdp->dynticks_nmi_nesting && rdp->rcu_urgent_qs &&
+	} else if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_urgent_qs &&
 		   !rdp->rcu_forced_tick) {
 		rdp->rcu_forced_tick = true;
 		tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
@@ -846,8 +832,6 @@  static __always_inline void rcu_nmi_enter_common(bool irq)
 	WRITE_ONCE(rdp->dynticks_nesting, /* Prevent store tearing. */
 		   rdp->dynticks_nesting + incby);
 
-	WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
-		   rdp->dynticks_nmi_nesting + 1);
 	barrier();
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 055c31781d3a..ad7d3e31c5cf 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -176,8 +176,8 @@  struct rcu_data {
 
 	/* 3) dynticks interface. */
 	int dynticks_snap;		/* Per-GP tracking for dynticks. */
-	long dynticks_nesting;		/* Track process nesting level. */
-	long dynticks_nmi_nesting;	/* Track irq/NMI nesting level. */
+	long dynticks_nesting;		/* Track dyntick (non-IDLE) nesting
+					 * level for kernel entry and interrupt. */
 	atomic_t dynticks;		/* Even value for idle, else odd. */
 	bool rcu_need_heavy_qs;		/* GP old, so heavy quiescent state! */
 	bool rcu_urgent_qs;		/* GP old need light quiescent state. */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 841ab43f3e60..0676460107d0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -313,7 +313,7 @@  static void print_cpu_stall_info(int cpu)
 	}
 	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
 	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
-	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
+	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n",
 	       cpu,
 	       "O."[!!cpu_online(cpu)],
 	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
@@ -323,7 +323,7 @@  static void print_cpu_stall_info(int cpu)
 				"!."[!delta],
 	       ticks_value, ticks_title,
 	       rcu_dynticks_snap(rdp) & 0xfff,
-	       rdp->dynticks_nesting, rdp->dynticks_nmi_nesting,
+	       rdp->dynticks_nesting,
 	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
 	       READ_ONCE(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
 	       fast_no_hz);
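For readers following the counter arithmetic, here is a toy userspace sketch (not the kernel code; `toy_rdp`, `toy_nmi_enter()` and friends are made-up names) of the Lutomirski-style update that this patch consolidates into the single ``->dynticks_nesting`` field: the outermost interrupt taken from an RCU-idle CPU adds 1, every nested entry adds 2, so a value of exactly 1 uniquely identifies "first interrupt that found the CPU idle" — the property the new rcu_is_cpu_rrupt_from_idle() depends on:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the combined ->dynticks_nesting counter. */
struct toy_rdp {
	long dynticks_nesting;
	bool in_eqs;	/* stands in for rcu_dynticks_curr_cpu_in_eqs() */
};

static void toy_nmi_enter(struct toy_rdp *rdp)
{
	long incby = 2;		/* nested entries add two... */

	if (rdp->in_eqs) {	/* ...the outermost-from-idle entry adds one */
		rdp->in_eqs = false;
		incby = 1;
	}
	rdp->dynticks_nesting += incby;
}

static void toy_nmi_exit(struct toy_rdp *rdp)
{
	if (rdp->dynticks_nesting != 1) {	/* nested: undo the +2 */
		rdp->dynticks_nesting -= 2;
		return;
	}
	rdp->dynticks_nesting = 0;		/* outermost: back to idle */
	rdp->in_eqs = true;
}

/* Mirrors the simplified check in the patched rcu_is_cpu_rrupt_from_idle(). */
static bool toy_rrupt_from_idle(struct toy_rdp *rdp)
{
	return rdp->dynticks_nesting == 1;
}
```

Note how this reproduces the documentation's example: a nesting depth of five interrupts from idle yields 1 + 4*2 = 9, and only the outermost handler ever observes the value 1.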