[RFC,2/2] rcu: Remove ->dynticks_nmi_nesting from struct rcu_dynticks

Message ID 1529484440-20634-2-git-send-email-byungchul.park@lge.com
State New, archived
Series
  • [RFC,1/2] rcu: Do prepare and cleanup idle depending on in_nmi()

Commit Message

Byungchul Park June 20, 2018, 8:47 a.m. UTC
Hello folks,

I am cautious about claiming that ->dynticks_nmi_nesting can be removed,
but I think it is possible, since the only thing we are interested in
with regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether
RCU is idle or not.

I am also not sure whether the assumption I relied on holds for every
architecture, namely that an assignment to a single aligned word is
atomic at the instruction level.

Thoughts?
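
To make the idea concrete, here is a minimal user-space model of the
single-counter scheme this series proposes (a sketch only: the names are
illustrative, not the kernel API, and the per-CPU and atomics machinery
is elided):

```c
#include <assert.h>

/* Model of ->dynticks_nesting: 0 means RCU-idle, > 0 means non-idle. */
static long dynticks_nesting = 1;	/* CPUs start out non-idle. */

static int rcu_is_idle(void)
{
	return dynticks_nesting == 0;
}

/* Any non-idle section -- process, irq, or NMI -- just counts up... */
static void model_enter(void)
{
	dynticks_nesting++;
}

/* ...and back down; "idle" is simply "the count reached zero". */
static void model_exit(void)
{
	assert(dynticks_nesting > 0);	/* Underflow would be a bug. */
	dynticks_nesting--;
}
```

The point of the model is that, as long as every enter has a matching
exit, one counter suffices to answer the only question RCU asks here.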

----->8-----
From 84970b33eb06c3bb1bebbb1754db405c0fc50fbe Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Wed, 20 Jun 2018 16:01:20 +0900
Subject: [RFC 2/2] rcu: Remove ->dynticks_nmi_nesting from struct rcu_dynticks

The only thing we are interested in with regard to ->dynticks_nesting
or ->dynticks_nmi_nesting is whether RCU is idle or not, which can be
tracked using ->dynticks_nesting alone. ->dynticks_nmi_nesting is
therefore unnecessary and only makes the code more complicated.

This patch makes both rcu_eqs_{enter,exit}() and rcu_nmi_{enter,exit}()
count a single variable, ->dynticks_nesting, up and down to track how
many RCU non-idle sections are nested.

As a result, no matter which context made the variable non-zero, the CPU
is non-idle either way, and it can be considered to have gone idle once
the variable reaches zero. The tricky bookkeeping code can thus be
removed.

In addition, this assumes that an assignment to a single aligned word is
atomic at the instruction level, so that ->dynticks_nesting can be
safely updated from NMI context and other contexts concurrently.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/rcu/tree.c        | 76 ++++++++++++++++++++----------------------------
 kernel/rcu/tree.h        |  4 +--
 kernel/rcu/tree_plugin.h |  4 +--
 3 files changed, 35 insertions(+), 49 deletions(-)

Comments

Paul E. McKenney June 20, 2018, 2:58 p.m. UTC | #1
On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> Hello folks,
> 
> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> think it's possible since the only thing we are interested in with
> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> idle or not.

Please keep in mind that NMIs cannot be masked, which means that the
rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
the process, between any consecutive pair of instructions.  The saving
grace is that these two functions restore state, but you cannot mask them.
After all, NMI does stand for non-maskable interrupt.

At first glance, the code below does -not- take this into account.
What am I missing that would make this change safe?  (You would need to
convince both me and Andy Lutomirski, who I have added on CC.)
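
As an aside, the failure mode that makes pairing matter (spelled out by
Paul later in the thread) can be shown with a tiny user-space model; the
names here are illustrative, not the kernel API. One rcu_irq_enter()
with no matching rcu_irq_exit() leaves a single shared nesting counter
permanently non-zero, so the CPU never again looks idle to RCU:

```c
#include <assert.h>

static long nesting;	/* model of a lone ->dynticks_nesting; idle at 0 */

static void irq_enter_model(void) { nesting++; }
static void irq_exit_model(void)  { nesting--; }
static int  looks_idle(void)      { return nesting == 0; }
```

A handler that is entered but never exited (e.g. a usermode upcall)
strands the counter at a non-zero value, with no "crowbar" reset left to
fix it up on the idle transition.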

> And I'm also afraid if the assumption is correct for every archs which I
> based on, that is, an assignment operation on a single aligned word is
> atomic in terms of instruction.

The key point is that both ->dynticks_nesting and ->dynticks_nmi_nesting
are accessed only by the corresponding CPU (other than debug prints).
Load and store tearing should therefore not be a problem.  Are there
other reasons to promote to READ_ONCE() and WRITE_ONCE()?  If there are,
a separate patch doing that promotion would be good.

							Thanx, Paul

> Thoughs?
> 
> ----->8-----
> >From 84970b33eb06c3bb1bebbb1754db405c0fc50fbe Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Wed, 20 Jun 2018 16:01:20 +0900
> Subject: [RFC 2/2] rcu: Remove ->dynticks_nmi_nesting from struct rcu_dynticks
> 
> The only thing we are interested in with regard to ->dynticks_nesting or
> ->dynticks_nmi_nesting is whether rcu is idle or not, which can be
> handled only using ->dynticks_nesting though. ->dynticks_nmi_nesting is
> unnecessary but to make the code more complicated.
> 
> This patch makes both rcu_eqs_{enter,exit}() and rcu_nmi_{enter,exit}()
> count up and down a single variable, ->dynticks_nesting to keep how many
> rcu non-idle sections have been nested.
> 
> As a result, no matter who made the variable be non-zero, it's anyway
> non-idle, and it can be considered as just having been idle once the
> variable is equal to zero. So tricky code can be removed.
> 
> In addition, it was assumed that an assignment operation on a single
> aligned word is atomic so that ->dynticks_nesting can be safely assigned
> within both nmi context and others concurrently.
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/rcu/tree.c        | 76 ++++++++++++++++++++----------------------------
>  kernel/rcu/tree.h        |  4 +--
>  kernel/rcu/tree_plugin.h |  4 +--
>  3 files changed, 35 insertions(+), 49 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 59ae94e..61f203a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -260,7 +260,6 @@ void rcu_bh_qs(void)
> 
>  static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
>  	.dynticks_nesting = 1,
> -	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
>  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
> 
> @@ -694,10 +693,6 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
>  /*
>   * Enter an RCU extended quiescent state, which can be either the
>   * idle loop or adaptive-tickless usermode execution.
> - *
> - * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
> - * the possibility of usermode upcalls having messed up our count
> - * of interrupt nesting level during the prior busy period.
>   */
>  static void rcu_eqs_enter(bool user)
>  {
> @@ -706,11 +701,11 @@ static void rcu_eqs_enter(bool user)
>  	struct rcu_dynticks *rdtp;
> 
>  	rdtp = this_cpu_ptr(&rcu_dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>  		     rdtp->dynticks_nesting == 0);
>  	if (rdtp->dynticks_nesting != 1) {
> -		rdtp->dynticks_nesting--;
> +		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
> +			   rdtp->dynticks_nesting - 1);
>  		return;
>  	}
> 
> @@ -767,7 +762,7 @@ void rcu_user_enter(void)
>   * rcu_nmi_exit - inform RCU of exit from NMI context
>   *
>   * If we are returning from the outermost NMI handler that interrupted an
> - * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> + * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nesting
>   * to let the RCU grace-period handling know that the CPU is back to
>   * being RCU-idle.
>   *
> @@ -779,21 +774,21 @@ void rcu_nmi_exit(void)
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> 
>  	/*
> -	 * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
> +	 * Check for ->dynticks_nesting underflow and bad ->dynticks.
>  	 * (We are exiting an NMI handler, so RCU better be paying attention
>  	 * to us!)
>  	 */
> -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
> +	WARN_ON_ONCE(rdtp->dynticks_nesting <= 0);
>  	WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
> 
>  	/*
>  	 * If the nesting level is not 1, the CPU wasn't RCU-idle, so
>  	 * leave it in non-RCU-idle state.
>  	 */
> -	if (rdtp->dynticks_nmi_nesting != 1) {
> -		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nmi_nesting, rdtp->dynticks_nmi_nesting - 2, rdtp->dynticks);
> -		WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* No store tearing. */
> -			   rdtp->dynticks_nmi_nesting - 2);
> +	if (rdtp->dynticks_nesting != 1) {
> +		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1, rdtp->dynticks);
> +		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
> +			   rdtp->dynticks_nesting - 1);
>  		return;
>  	}
> 
> @@ -803,8 +798,8 @@ void rcu_nmi_exit(void)
>  	}
> 
>  	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> -	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
> +	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nesting, 0, rdtp->dynticks);
> +	WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid store tearing. */
>  	rcu_dynticks_eqs_enter();
>  }
> 
> @@ -851,10 +846,6 @@ void rcu_irq_exit_irqson(void)
>  /*
>   * Exit an RCU extended quiescent state, which can be either the
>   * idle loop or adaptive-tickless usermode execution.
> - *
> - * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
> - * allow for the possibility of usermode upcalls messing up our count of
> - * interrupt nesting level during the busy period that is just now starting.
>   */
>  static void rcu_eqs_exit(bool user)
>  {
> @@ -866,7 +857,8 @@ static void rcu_eqs_exit(bool user)
>  	oldval = rdtp->dynticks_nesting;
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
>  	if (oldval) {
> -		rdtp->dynticks_nesting++;
> +		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
> +			   rdtp->dynticks_nesting + 1);
>  		return;
>  	}
>  	rcu_dynticks_task_exit();
> @@ -875,7 +867,6 @@ static void rcu_eqs_exit(bool user)
>  	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
>  	WRITE_ONCE(rdtp->dynticks_nesting, 1);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
>  }
> 
>  /**
> @@ -915,11 +906,11 @@ void rcu_user_exit(void)
>  /**
>   * rcu_nmi_enter - inform RCU of entry to NMI context
>   *
> - * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
> - * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> - * that the CPU is active.  This implementation permits nested NMIs, as
> - * long as the nesting level does not overflow an int.  (You will probably
> - * run out of stack space first.)
> + * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks. And
> + * then update rdtp->dynticks_nesting to let the RCU grace-period handling
> + * know that the CPU is active.  This implementation permits nested NMIs,
> + * as long as the nesting level does not overflow an int.  (You will
> + * probably run out of stack space first.)
>   *
>   * If you add or remove a call to rcu_nmi_enter(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
> @@ -927,33 +918,29 @@ void rcu_user_exit(void)
>  void rcu_nmi_enter(void)
>  {
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> -	long incby = 2;
> 
>  	/* Complain about underflow. */
> -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
> +	WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
> 
> -	/*
> -	 * If idle from RCU viewpoint, atomically increment ->dynticks
> -	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> -	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> -	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> -	 * to be in the outermost NMI handler that interrupted an RCU-idle
> -	 * period (observation due to Andy Lutomirski).
> -	 */
>  	if (rcu_dynticks_curr_cpu_in_eqs()) {
>  		rcu_dynticks_eqs_exit();
> -		incby = 1;
> 
>  		if (!in_nmi()) {
>  			rcu_dynticks_task_exit();
>  			rcu_cleanup_after_idle();
>  		}
>  	}
> -	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> -			  rdtp->dynticks_nmi_nesting,
> -			  rdtp->dynticks_nmi_nesting + incby, rdtp->dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* Prevent store tearing. */
> -		   rdtp->dynticks_nmi_nesting + incby);
> +
> +	trace_rcu_dyntick(rdtp->dynticks_nesting ? TPS("++=") : TPS("Endirq"),
> +			  rdtp->dynticks_nesting,
> +			  rdtp->dynticks_nesting + 1, rdtp->dynticks);
> +	/*
> +	 * If ->dynticks_nesting is equal to one on rcu_nmi_exit(), we are
> +	 * guaranteed to be in the outermost NMI handler that interrupted
> +	 * an RCU-idle period (observation due to Andy Lutomirski).
> +	 */
> +	WRITE_ONCE(rdtp->dynticks_nesting, /* Prevent store tearing. */
> +		   rdtp->dynticks_nesting + 1);
>  	barrier();
>  }
> 
> @@ -1089,8 +1076,7 @@ bool rcu_lockdep_current_cpu_online(void)
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> -	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
> -	       __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
> +	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 1;
>  }
> 
>  /*
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4e74df7..071afe4 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -38,8 +38,8 @@
>   * Dynticks per-CPU state.
>   */
>  struct rcu_dynticks {
> -	long dynticks_nesting;      /* Track process nesting level. */
> -	long dynticks_nmi_nesting;  /* Track irq/NMI nesting level. */
> +	long dynticks_nesting __attribute__((aligned(sizeof(long))));
> +				    /* Track process nesting level. */
>  	atomic_t dynticks;	    /* Even value for idle, else odd. */
>  	bool rcu_need_heavy_qs;     /* GP old, need heavy quiescent state. */
>  	unsigned long rcu_qs_ctr;   /* Light universal quiescent state ctr. */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c1b17f5..0c57e50 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1801,7 +1801,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
>  	}
>  	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
>  	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
> -	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
> +	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n",
>  	       cpu,
>  	       "O."[!!cpu_online(cpu)],
>  	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
> @@ -1811,7 +1811,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
>  				"!."[!delta],
>  	       ticks_value, ticks_title,
>  	       rcu_dynticks_snap(rdtp) & 0xfff,
> -	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
> +	       rdtp->dynticks_nesting,
>  	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
>  	       READ_ONCE(rsp->n_force_qs) - rsp->n_force_qs_gpstart,
>  	       fast_no_hz);
> -- 
> 1.9.1
>
Byungchul Park June 20, 2018, 4:05 p.m. UTC | #2
On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
>> Hello folks,
>>
>> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
>> think it's possible since the only thing we are interested in with
>> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
>> idle or not.
>
> Please keep in mind that NMIs cannot be masked, which means that the
> rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> the process, between any consecutive pair of instructions.  The saving

I believe I understand what NMI is and why you introduced
->dynticks_nmi_nesting. Or am I missing something?

> grace is that these two functions restore state, but you cannot make them.
> After all, NMI does stand for non-maskable interrupt.

Excuse me, but I think I've considered all of that. Could you show me
what problem can happen with this?

These two functions, rcu_nmi_enter() and rcu_nmi_exit(), would still
save and restore the state with ->dynticks_nesting.

Even though I made ->dynticks_nesting shared between NMI and other
contexts entering or exiting eqs, I believe it's not a problem,
because the variable is updated and eventually restored in a *nested*
manner anyway.

> At first glance, the code below does -not- take this into account.

Excuse me, but could you explain it more? I don't understand your point :(

> What am I missing that would make this change safe?  (You would need to
> convince both me and Andy Lutomirski, who I have added on CC.)

Thanks for doing that.

>> And I'm also afraid if the assumption is correct for every archs which I
>> based on, that is, an assignment operation on a single aligned word is
>> atomic in terms of instruction.
>
> The key point is that both ->dynticks_nesting and ->dynticks_nmi_nesting
> are accessed only by the corresponding CPU (other than debug prints).
> Load and store tearing should therefore not be a problem.  Are there

Right. But I thought it could be a problem between NMI and other
contexts, because I made ->dynticks_nesting shared between them.

> other reasons to promote to READ_ONCE() and WRITE_ONCE()?  If there are,
> a separate patch doing that promotion would be good.

But the promotion is meaningless without making ->dynticks_nesting
shared, as you said. I'm afraid it is too dependent on this patch to
be separated out.

I'm sorry, I don't understand your point. It would be much appreciated
if you could explain further what I'm missing :(

Paul E. McKenney June 20, 2018, 5:40 p.m. UTC | #3
On Thu, Jun 21, 2018 at 02:15:07AM +0900, Byungchul Park wrote:
> On Thu, Jun 21, 2018 at 1:49 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> 
> [...]
> 
> > Perhaps the fact that there are architectures that can enter interrupt
> > handlers and never leave them when the CPU is non-idle.  One example of
> > this is the usermode upcalls in the comment that you removed.
> >
> > Or have all the architectures been modified so that each and every call to
> > rcu_irq_enter() and to rcu_irq_exit() are now properly paired and nested?
> >
> > Proper nesting and pairing was -not- present in the past, hence the
> > special updates (AKA "crowbar") to the counters when transitioning to
> > and from idle.
> 
> Thank you very much for explaining it in detail.
> 
> > If proper nesting and pairing of rcu_irq_enter() and rcu_irq_exit()
> > is now fully in force across all architectures and configurations, the
> > commit log needs to say this, preferably pointing to the corresponding
> > commits that made this change.
> 
> Right.
> 
> > There is a call to rcu_irq_enter() without a corresponding call to
> > rcu_irq_exit(), so that the ->dynticks_nesting counter never goes back to
> > zero so that the next time this CPU goes idle, RCU thinks that the CPU
> > is still non-idle.  This can result in excessively long grace periods
> > and needless IPIs to idle CPUs.
> 
> No doubt.
> 
> > But only if calls to rcu_irq_enter() and rcu_irq_exit() are now always
> > properly paired and nested, which was definitely -not- the case last
> > I looked.
> 
> I missed it. Right. It's worthwhile only in the case that calls to rcu_irq_enter()
> and rcu_irq_exit() are always properly paired and nested.
> 
> > OK, so I can further consider this pair of patches only if
> > all architectures now properly pair and nest rcu_irq_enter() and
> > rcu_irq_exit().  It would be very good if they did, but actually testing
> > back in the day showed that they did not.  If that has changed, that
> > would be a very good thing, but if not, this patch injects bugs.
> 
> Totally agree with you. Sorry for bothering you.

Absolutely not a problem, absolutely no need to apologize!  I am
actually very happy that you are taking RCU seriously and looking at it
in such depth.

My problem is that when I see a patch like this, something in the back of
my head screams "WRONG!!!", and I sometimes get confused about exactly
what the back of my head is screaming about, which was the case here.
Hence my misguided initial complaint about NMI nesting instead of about
the possibility of unpaired rcu_irq_enter() calls.

So apologies for that, but I unfortunately cannot promise that this
won't happen again.  I have learned the hard way to trust the back of
my head.  It sometimes makes mistakes, but less often than the rest of
my head does.  ;-)

In the meantime, is it possible to rearrange rcu_irq_enter() and
rcu_nmi_enter() (and similarly rcu_irq_exit() and rcu_nmi_exit())
to avoid the conditionals (via compiler inlining) while still keeping
function calls ordered properly?  I bet that you could do it by splitting
rcu_nmi_enter() and rcu_nmi_exit() sort of like this:

	static void rcu_nmi_enter_common(bool irq)
	{
		/*
		 * You fill this in.  Maybe __always_inline above.  The
		 * rcu_dynticks_task_exit() and rcu_cleanup_after_idle()
		 * calls need to be on opposite sides of the
		 * rcu_dynticks_eqs_exit() call, just like they are now.
		 */
	}

	void rcu_nmi_enter(void)
	{
		rcu_nmi_enter_common(false);
	}

	void rcu_irq_enter(void)
	{
		lockdep_assert_irqs_disabled();
		rcu_nmi_enter_common(true);
	}

Saving a couple of branches on the irq enter/exit paths seems like it
just might be worth something.  ;-)
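
The effect described here, a compile-time-constant argument letting the
inliner fold the conditionals away, can be sketched in plain C.  All
names below are illustrative stand-ins, not the kernel's actual
functions:

```c
#include <assert.h>

/* Illustrative stand-ins for rcu_prepare_for_idle() and
 * rcu_dynticks_eqs_enter(); they just count invocations. */
static int prepare_calls, eqs_calls;

static void prepare_for_idle(void) { prepare_calls++; }
static void eqs_enter(void)        { eqs_calls++; }

/*
 * Each caller passes a compile-time constant, so after inlining the
 * compiler resolves "if (irq)" at build time and neither wrapper
 * retains a runtime conditional.
 */
static inline __attribute__((__always_inline__)) void my_exit_common(int irq)
{
	if (irq)
		prepare_for_idle();	/* irq path only */
	eqs_enter();			/* common to both paths */
}

void my_nmi_exit(void) { my_exit_common(0); }
void my_irq_exit(void) { my_exit_common(1); }
```

Inspecting the generated assembly (for example with gcc -O2 -S) shows
the test compiled away in both wrappers, which is what the objdump
output later in this thread confirms for the real functions.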

							Thanx, Paul
Byungchul Park June 21, 2018, 6:48 a.m. UTC | #4
On Thu, Jun 21, 2018 at 03:39:49PM +0900, Byungchul Park wrote:
> On Wed, Jun 20, 2018 at 10:40:37AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 21, 2018 at 02:15:07AM +0900, Byungchul Park wrote:
> 
> [...]
> 
> > > Totally agree with you. Sorry for bothering you.
> > 
> > Absolutely not a problem, absolutely no need to apologize!  I am
> > actually very happy that you are taking RCU seriously and looking at it
> > in such depth.
> 
> Thanks a lot. :)
> 
> > My problem is that when I see a patch like this, something in the back of
> > my head screams "WRONG!!!", and I sometimes get confused about exactly
> > what the back of my head is screaming about, which was the case here.
> > Hence my misguided initial complaint about NMI nesting instead of about
> > the possibility of unpaired rcu_irq_enter() calls.
> > 
> > So apologies for that, but I unfortunately cannot promise that this
> 
> It's ok. I also made a mistake.
> 
> > won't happen again.  I have learned the hard way to trust the back of
> > my head.  It sometimes makes mistakes, but less often than the rest of
> > my head does.  ;-)
> 
> I believe it doesn't matter at all, since everybody makes mistakes. You
> must be much more careful about everything than others are, though. I
> believe the only real problem with human mistakes is the attitude of
> never even trying to communicate with others, out of a conviction that
> one has never made a mistake.
> 
> > In the meantime, is it possible to rearrange rcu_irq_enter() and
> > rcu_nmi_enter() (and similarly rcu_irq_exit() and rcu_nmi_exit())
> > to avoid the conditionals (via compiler inlining) while still keeping
> > function calls ordered properly?  I bet that you could do it by splitting
> > rcu_nmi_enter() and rcu_nmi_exit() sort of like this:
> > 
> > 	static void rcu_nmi_enter_common(bool irq)
> > 	{
> > 		/*
> > 		 * You fill this in.  Maybe __always_inline above.  The
> > 		 * rcu_dynticks_task_exit() and rcu_cleanup_after_idle()
> > 		 * calls need to be on opposite sides of the
> > 		 * rcu_dynticks_eqs_exit() call, just like they are now.
> > 		 */
> > 	}
> > 
> > 	void rcu_nmi_enter(void)
> > 	{
> > 		rcu_nmi_enter_common(false);
> > 	}
> > 
> > 	void rcu_irq_enter(void)
> > 	{
> > 		lockdep_assert_irqs_disabled();
> > 	rcu_nmi_enter_common(true);
> > 	}
> > 
> > Saving a couple of branches on the irq enter/exit paths seems like it
> > just might be worth something.  ;-)
> 
> What about the following patch?
> 
> I applied what you suggested and re-named rcu_nmi_{enter,exit} to
> rcu_irq_{enter,exit} and applied the same re-naming to
> ->dynticks_nmi_nesting as well, since those no longer have to do with
> nmi alone but with both irq and nmi.
> 
> I think "irq" represents both irq and nmi better than "nmi" does.
> Please let me know if you disagree; I can drop the re-naming from the
> patch.

Or I can split this patch into two, one of which would only do the re-naming,
including ./Documentation/RCU/Design/Data-Structures/Data-Structures.html
as well.

Just let me know. I will follow your decision.

> I will re-send this with a change log after getting your opinion.
> 
> ----->8-----
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index deb2508..413fef7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -260,7 +260,7 @@ void rcu_bh_qs(void)
>  
>  static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
>  	.dynticks_nesting = 1,
> -	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
> +	.dynticks_irq_nesting = DYNTICK_IRQ_NONIDLE,
>  	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
>  };
>  
> @@ -695,7 +695,7 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
>   * Enter an RCU extended quiescent state, which can be either the
>   * idle loop or adaptive-tickless usermode execution.
>   *
> - * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
> + * We crowbar the ->dynticks_irq_nesting field to zero to allow for
>   * the possibility of usermode upcalls having messed up our count
>   * of interrupt nesting level during the prior busy period.
>   */
> @@ -706,7 +706,7 @@ static void rcu_eqs_enter(bool user)
>  	struct rcu_dynticks *rdtp;
>  
>  	rdtp = this_cpu_ptr(&rcu_dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
> +	WRITE_ONCE(rdtp->dynticks_irq_nesting, 0);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>  		     rdtp->dynticks_nesting == 0);
>  	if (rdtp->dynticks_nesting != 1) {
> @@ -764,43 +764,58 @@ void rcu_user_enter(void)
>  #endif /* CONFIG_NO_HZ_FULL */
>  
>  /**
> - * rcu_nmi_exit - inform RCU of exit from NMI context
> + * rcu_irq_exit_common - inform RCU of exit from interrupt context
>   *
> - * If we are returning from the outermost NMI handler that interrupted an
> - * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> - * to let the RCU grace-period handling know that the CPU is back to
> - * being RCU-idle.
> + * If we are returning from the outermost interrupt handler that
> + * interrupted an RCU-idle period, update rdtp->dynticks and
> + * rdtp->dynticks_irq_nesting to let the RCU grace-period handling
> + * know that the CPU is back to being RCU-idle.
>   *
> - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> - * with CONFIG_RCU_EQS_DEBUG=y.
> + * If you add or remove a call to rcu_irq_exit_common(), be sure to
> + * test with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_nmi_exit(void)
> +static __always_inline void rcu_irq_exit_common(bool nmi)
>  {
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>  
>  	/*
> -	 * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
> +	 * Check for ->dynticks_irq_nesting underflow and bad ->dynticks.
>  	 * (We are exiting an NMI handler, so RCU better be paying attention
>  	 * to us!)
>  	 */
> -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
> +	WARN_ON_ONCE(rdtp->dynticks_irq_nesting <= 0);
>  	WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
>  
>  	/*
>  	 * If the nesting level is not 1, the CPU wasn't RCU-idle, so
>  	 * leave it in non-RCU-idle state.
>  	 */
> -	if (rdtp->dynticks_nmi_nesting != 1) {
> -		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nmi_nesting, rdtp->dynticks_nmi_nesting - 2, rdtp->dynticks);
> -		WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* No store tearing. */
> -			   rdtp->dynticks_nmi_nesting - 2);
> +	if (rdtp->dynticks_irq_nesting != 1) {
> +		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_irq_nesting, rdtp->dynticks_irq_nesting - 2, rdtp->dynticks);
> +		WRITE_ONCE(rdtp->dynticks_irq_nesting, /* No store tearing. */
> +			   rdtp->dynticks_irq_nesting - 2);
>  		return;
>  	}
>  
>  	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
> -	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
> +	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_irq_nesting, 0, rdtp->dynticks);
> +	WRITE_ONCE(rdtp->dynticks_irq_nesting, 0); /* Avoid store tearing. */
> +
> +	if (!nmi)
> +		rcu_prepare_for_idle();
> +
>  	rcu_dynticks_eqs_enter();
> +
> +	if (!nmi)
> +		rcu_dynticks_task_enter();
> +}
> +
> +/**
> + * rcu_nmi_exit - inform RCU of exit from NMI context
> + */
> +void rcu_nmi_exit(void)
> +{
> +	rcu_irq_exit_common(true);
>  }
>  
>  /**
> @@ -824,14 +839,8 @@ void rcu_nmi_exit(void)
>   */
>  void rcu_irq_exit(void)
>  {
> -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> -
>  	lockdep_assert_irqs_disabled();
> -	if (rdtp->dynticks_nmi_nesting == 1)
> -		rcu_prepare_for_idle();
> -	rcu_nmi_exit();
> -	if (rdtp->dynticks_nmi_nesting == 0)
> -		rcu_dynticks_task_enter();
> +	rcu_irq_exit_common(false);
>  }
>  
>  /*
> @@ -853,7 +862,7 @@ void rcu_irq_exit_irqson(void)
>   * Exit an RCU extended quiescent state, which can be either the
>   * idle loop or adaptive-tickless usermode execution.
>   *
> - * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
> + * We crowbar the ->dynticks_irq_nesting field to DYNTICK_IRQ_NONIDLE to
>   * allow for the possibility of usermode upcalls messing up our count of
>   * interrupt nesting level during the busy period that is just now starting.
>   */
> @@ -876,7 +885,7 @@ static void rcu_eqs_exit(bool user)
>  	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
>  	WRITE_ONCE(rdtp->dynticks_nesting, 1);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
> +	WRITE_ONCE(rdtp->dynticks_irq_nesting, DYNTICK_IRQ_NONIDLE);
>  }
>  
>  /**
> @@ -914,46 +923,62 @@ void rcu_user_exit(void)
>  #endif /* CONFIG_NO_HZ_FULL */
>  
>  /**
> - * rcu_nmi_enter - inform RCU of entry to NMI context
> + * rcu_irq_enter_common - inform RCU of entry to interrupt context
>   *
>   * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
> - * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
> - * that the CPU is active.  This implementation permits nested NMIs, as
> - * long as the nesting level does not overflow an int.  (You will probably
> - * run out of stack space first.)
> + * rdtp->dynticks_irq_nesting to let the RCU grace-period handling know
> + * that the CPU is active.  This implementation permits nested
> + * interrupts including NMIs, as long as the nesting level does not
> + * overflow an int. (You will probably run out of stack space first.)
>   *
> - * If you add or remove a call to rcu_nmi_enter(), be sure to test
> - * with CONFIG_RCU_EQS_DEBUG=y.
> + * If you add or remove a call to rcu_irq_enter_common(), be sure to
> + * test with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_nmi_enter(void)
> +static __always_inline void rcu_irq_enter_common(bool nmi)
>  {
>  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>  	long incby = 2;
>  
>  	/* Complain about underflow. */
> -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
> +	WARN_ON_ONCE(rdtp->dynticks_irq_nesting < 0);
>  
>  	/*
>  	 * If idle from RCU viewpoint, atomically increment ->dynticks
> -	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> -	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> -	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> +	 * to mark non-idle and increment ->dynticks_irq_nesting by one.
> +	 * Otherwise, increment ->dynticks_irq_nesting by two.  This means
> +	 * if ->dynticks_irq_nesting is equal to one, we are guaranteed
>  	 * to be in the outermost NMI handler that interrupted an RCU-idle
>  	 * period (observation due to Andy Lutomirski).
>  	 */
>  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> +
> +		if (!nmi)
> +			rcu_dynticks_task_exit();
> +
>  		rcu_dynticks_eqs_exit();
> +
> +		if (!nmi)
> +			rcu_cleanup_after_idle();
> +
>  		incby = 1;
>  	}
>  	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
> -			  rdtp->dynticks_nmi_nesting,
> -			  rdtp->dynticks_nmi_nesting + incby, rdtp->dynticks);
> -	WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* Prevent store tearing. */
> -		   rdtp->dynticks_nmi_nesting + incby);
> +			  rdtp->dynticks_irq_nesting,
> +			  rdtp->dynticks_irq_nesting + incby, rdtp->dynticks);
> +	WRITE_ONCE(rdtp->dynticks_irq_nesting, /* Prevent store tearing. */
> +		   rdtp->dynticks_irq_nesting + incby);
>  	barrier();
>  }
>  
>  /**
> + * rcu_nmi_enter - inform RCU of entry to NMI context
> + */
> +void rcu_nmi_enter(void)
> +{
> +	rcu_irq_enter_common(true);
> +}
> +
> +/**
>   * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
>   *
>   * Enter an interrupt handler, which might possibly result in exiting
> @@ -977,14 +1002,8 @@ void rcu_nmi_enter(void)
>   */
>  void rcu_irq_enter(void)
>  {
> -	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> -
>  	lockdep_assert_irqs_disabled();
> -	if (rdtp->dynticks_nmi_nesting == 0)
> -		rcu_dynticks_task_exit();
> -	rcu_nmi_enter();
> -	if (rdtp->dynticks_nmi_nesting == 1)
> -		rcu_cleanup_after_idle();
> +	rcu_irq_enter_common(false);
>  }
>  
>  /*
> @@ -1092,7 +1111,7 @@ bool rcu_lockdep_current_cpu_online(void)
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
>  	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
> -	       __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
> +	       __this_cpu_read(rcu_dynticks.dynticks_irq_nesting) <= 1;
>  }
>  
>  /*
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4e74df7..80ba455 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -39,7 +39,7 @@
>   */
>  struct rcu_dynticks {
>  	long dynticks_nesting;      /* Track process nesting level. */
> -	long dynticks_nmi_nesting;  /* Track irq/NMI nesting level. */
> +	long dynticks_irq_nesting;  /* Track irq/NMI nesting level. */
>  	atomic_t dynticks;	    /* Even value for idle, else odd. */
>  	bool rcu_need_heavy_qs;     /* GP old, need heavy quiescent state. */
>  	unsigned long rcu_qs_ctr;   /* Light universal quiescent state ctr. */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c1b17f5..2cd637d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1811,7 +1811,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
>  				"!."[!delta],
>  	       ticks_value, ticks_title,
>  	       rcu_dynticks_snap(rdtp) & 0xfff,
> -	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
> +	       rdtp->dynticks_nesting, rdtp->dynticks_irq_nesting,
>  	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
>  	       READ_ONCE(rsp->n_force_qs) - rsp->n_force_qs_gpstart,
>  	       fast_no_hz);
Byungchul Park June 21, 2018, 10:08 a.m. UTC | #5
On Thu, Jun 21, 2018 at 03:39:49PM +0900, Byungchul Park wrote:

[...]

> I applied what you suggested and re-named rcu_nmi_{enter,exit} to
                                               ^
                        rcu_nmi_{enter,exit}_common(bool irq)

> rcu_irq_{enter,exit} and applied the same re-naming to
          ^
     rcu_irq_{enter,exit}_common(bool nmi)

> ->dynticks_nmi_nesting as well, since those are not things to do with
           ^
    dynticks_nmi_nesting -> dynticks_irq_nesting

> nmi anymore but both irq and nmi.
> 
> I think "irq" is better to represent both irq and nmi than "nmi".
> Please let me know if you don't think so. I can get rid of the re-
> naming from the patch.

--
Thanks,
Byungchul
Byungchul Park June 22, 2018, 3 a.m. UTC | #6
On Thu, Jun 21, 2018 at 08:04:07AM -0700, Paul E. McKenney wrote:

> Nothing quite like concurrent programming to help one see one's own
> mistakes.  ;-)

Haha.

> Your reasoning has merit, but the nice thing about keeping "nmi" is
> that it helps casual readers see that NMIs must be handled.  If we
> rename this to "irq", we lose that hint and probably leave some
> readers wondering why the strange increment-by-2 code is there.
> So let's please keep the current names.

Got it. I will.

> >  /**
> > - * rcu_nmi_exit - inform RCU of exit from NMI context
> > + * rcu_irq_exit_common - inform RCU of exit from interrupt context
> >   *
> > - * If we are returning from the outermost NMI handler that interrupted an
> > - * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> > - * to let the RCU grace-period handling know that the CPU is back to
> > - * being RCU-idle.
> > + * If we are returning from the outermost interrupt handler that
> > + * interrupted an RCU-idle period, update rdtp->dynticks and
> > + * rdtp->dynticks_irq_nesting to let the RCU grace-period handling
> > + * know that the CPU is back to being RCU-idle.
> >   *
> > - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> > - * with CONFIG_RCU_EQS_DEBUG=y.
> > + * If you add or remove a call to rcu_irq_exit_common(), be sure to
> > + * test with CONFIG_RCU_EQS_DEBUG=y.
> >   */
> > -void rcu_nmi_exit(void)
> > +static __always_inline void rcu_irq_exit_common(bool nmi)
> 
> However, I suggest making this function's parameter "irq" because ...

I will.

> Does the generated code really get rid of the conditional branches?
> > I would hope that it would, but it is always good to check.  This
> should be easy to find in the assembly-language output because of the
> calls to rcu_prepare_for_idle() and rcu_dynticks_task_enter().

Good! It works as we expect, though I only checked x86_64. Let me show
you the part we are interested in. The rest is almost the same.

<rcu_nmi_exit>:
	5b                   	pop    %rbx
	5d                   	pop    %rbp
	41 5c                	pop    %r12
	41 5d                	pop    %r13
	41 5e                	pop    %r14
	41 5f                	pop    %r15
	e9 0f 75 ff ff       	jmpq   ffffffff810bb440 <rcu_dynticks_eqs_enter>

<rcu_irq_exit>:
	e8 e6 e5 ff ff       	callq  ffffffff810c26a0 <rcu_prepare_for_idle>
	e8 81 73 ff ff       	callq  ffffffff810bb440 <rcu_dynticks_eqs_enter>
	e8 ec 3a 2b 00       	callq  ffffffff81377bb0 <debug_smp_processor_id>
	65 48 8b 14 25 00 4d 	mov    %gs:0x14d00,%rdx
	01 00 
	89 82 94 03 00 00    	mov    %eax,0x394(%rdx)
	5b                   	pop    %rbx
	5d                   	pop    %rbp
	41 5c                	pop    %r12
	41 5d                	pop    %r13
	41 5e                	pop    %r14
	41 5f                	pop    %r15
	c3                   	retq

Even though they return in slightly different ways, I can see that all
the branches we are interested in were removed by the compiler!

> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  	long incby = 2;
> > 
> >  	/* Complain about underflow. */
> > -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
> > +	WARN_ON_ONCE(rdtp->dynticks_irq_nesting < 0);
> > 
> >  	/*
> >  	 * If idle from RCU viewpoint, atomically increment ->dynticks
> > -	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> > -	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> > -	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> > +	 * to mark non-idle and increment ->dynticks_irq_nesting by one.
> > +	 * Otherwise, increment ->dynticks_irq_nesting by two.  This means
> > +	 * if ->dynticks_irq_nesting is equal to one, we are guaranteed
> >  	 * to be in the outermost NMI handler that interrupted an RCU-idle
> >  	 * period (observation due to Andy Lutomirski).
> >  	 */
> >  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> > +
> > +		if (!nmi)
> > +			rcu_dynticks_task_exit();
> > +
> >  		rcu_dynticks_eqs_exit();
> > +
> > +		if (!nmi)
> 
> ... and checking for branches here.

Also good! The following is the only part that differs.

<rcu_nmi_enter>:
	e8 dc 81 ff ff       	callq  ffffffff810bc450 <rcu_dynticks_eqs_exit>

<rcu_irq_enter>:
	65 48 8b 04 25 00 4d 	mov    %gs:0x14d00,%rax
	01 00 
	c7 80 94 03 00 00 ff 	movl   $0xffffffff,0x394(%rax)
	ff ff ff 
	e8 b9 80 ff ff       	callq  ffffffff810bc450 <rcu_dynticks_eqs_exit>
	e8 d4 b9 ff ff       	callq  ffffffff810bfd70 <rcu_cleanup_after_idle>

--
Thanks,
Byungchul
Joel Fernandes June 22, 2018, 5:56 a.m. UTC | #7
Hi Paul,

On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > >> Hello folks,
> > >>
> > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > >> think it's possible since the only thing we are interested in with
> > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > >> idle or not.
> > >
> > > Please keep in mind that NMIs cannot be masked, which means that the
> > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > the process, between any consecutive pair of instructions.  The saving
> 
> And yes, I should have looked at this patch more closely before replying.
> But please see below.
> 
> > I believe I understand what NMI is and why you introduced
> > ->dynticks_nmi_nesting. Or am I missing something?
> 
> Perhaps the fact that there are architectures that can enter interrupt
> handlers and never leave them when the CPU is non-idle.  One example of
> this is the usermode upcalls in the comment that you removed.

I spent some time tonight and last night trying to understand this concept of
never leaving an interrupt, I hope you don't mind me asking this dumb
question... perhaps I will learn something: could you let me know how it is
possible that an interrupt never exits?

Typically, an interrupt never exiting sounds like a hard lockup. This is how
the hardlockup detector works: since regular interrupts in Linux can't nest,
the hardlockup detector checks if hrtimer interrupts are being handled and if
not, it throws a splat, panics the kernel, etc. So I am a bit troubled by
this concept of an interrupt never exiting.
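
The progress check described above can be modeled with a toy sketch
(purely illustrative, not the kernel's actual watchdog code):

```c
#include <assert.h>

/* Counter the hrtimer interrupt handler would bump on each tick. */
static unsigned long hrtimer_intr_count;
/* Snapshot taken at the previous watchdog check. */
static unsigned long last_snapshot;

static void sim_hrtimer_interrupt(void)
{
	hrtimer_intr_count++;
}

/* Returns 1 (a would-be splat) if no hrtimer interrupt has been
 * handled since the previous check, 0 otherwise. */
static int sim_hardlockup_check(void)
{
	int stuck = (hrtimer_intr_count == last_snapshot);

	last_snapshot = hrtimer_intr_count;
	return stuck;
}
```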

Further, since an interrupt is an atomic context, it cannot sleep or schedule
into usermode, so how are these upcalls handled from the interrupt?

Lastly, can you point me to an example of how the rcu_nmi_enter/exit() pair
can go out of sync? That is, how can they fail to be paired and nested
properly? In my mind they always should be, but I may be missing the use
case. I'm happy to try to reproduce and trace this if you can let me know
how, so that I can study it better.

Thanks a lot Paul for your help,

 - Joel
Paul E. McKenney June 22, 2018, 1:28 p.m. UTC | #8
On Thu, Jun 21, 2018 at 10:56:59PM -0700, Joel Fernandes wrote:
> Hi Paul,
> 
> On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > > >> Hello folks,
> > > >>
> > > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > > >> think it's possible since the only thing we are interested in with
> > > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > > >> idle or not.
> > > >
> > > > Please keep in mind that NMIs cannot be masked, which means that the
> > > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > > the process, between any consecutive pair of instructions.  The saving
> > 
> > And yes, I should have looked at this patch more closely before replying.
> > But please see below.
> > 
> > > I believe I understand what NMI is and why you introduced
> > > ->dynticks_nmi_nesting. Or am I missing something?
> > 
> > Perhaps the fact that there are architectures that can enter interrupt
> > handlers and never leave them when the CPU is non-idle.  One example of
> > this is the usermode upcalls in the comment that you removed.
> 
> I spent some time tonight and last night trying to understand this concept of
> never leaving an interrupt, I hope you don't mind me asking this dumb
> question... perhaps I will learn something: could you let me know how it is
> possible that an interrupt never exits?
> 
> Typically an interrupt never exiting sounds like a hard-lockup. This is how
> hardlock detector works: Since regular interrupts in linux can't nest, the
> hardlockup detector checks if hrtimer interrupts are being handled and if
> not, then it throws a splat, panics the kernel etc. So I am a bit troubled by
> this interrupt never exiting concept..
> 
> Further since an interrupt is an atomic context, it cannot sleep or schedule
> into usermode so how are these upcalls handled from the interrupt?

It has been some years since I traced the code flow, but what happened
back then is that it switches itself from being an interrupt handler to not
being one, without actually returning from the interrupt.  This can only
happen when
interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
relies on this restriction.  If I remember correctly, the code ends up
executing in the context of the interrupted process, but it has been some
years, so please apply appropriate skepticism.

Please take a look at the "Interrupts and NMIs" section of the file
Documentation/RCU/Design/Requirements/Requirements.html for a bit
more information.

> Lastly, can you point me to an example how the rcu_nmi_enter/exit() pair can go
> out of sync? That is they aren't paired and nested properly? In my mind they
> always should be but I may be missing the usecase. I'm happy to try and
> reproduce and trace this if you can let me know how to so that I can study
> it better. 

I have never seen NMIs be unpaired or improperly nested.  However,
given that rcu_irq_enter() invokes rcu_nmi_enter() and rcu_irq_exit()
invokes rcu_nmi_exit(), it is definitely the case that rcu_nmi_enter()
and rcu_nmi_exit() need to deal with unpaired and improperly nested
invocations.

So why this function-call structure?  Well, you see, NMI handlers can
take what appear to RCU to be normal interrupts...

(And I just added that fun fact to Requirements.html.)
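
The increment-by-1 versus increment-by-2 bookkeeping that makes this
work can be modeled as a small user-space sketch (illustrative only;
in_eqs stands in for rcu_dynticks_curr_cpu_in_eqs()):

```c
#include <assert.h>

/* Simulated ->dynticks_nmi_nesting: 0 means RCU-idle. */
static long nmi_nesting;
/* Simulated extended-quiescent-state flag; start out idle. */
static int in_eqs = 1;

static void sim_nmi_enter(void)
{
	long incby = 2;

	if (in_eqs) {		/* interrupting an RCU-idle period */
		in_eqs = 0;	/* models rcu_dynticks_eqs_exit() */
		incby = 1;	/* so a value of 1 flags the outermost
				 * handler that interrupted idle */
	}
	nmi_nesting += incby;
}

static void sim_nmi_exit(void)
{
	if (nmi_nesting != 1) {	/* still nested: drop by 2 */
		nmi_nesting -= 2;
		return;
	}
	nmi_nesting = 0;	/* outermost exit: back to RCU-idle */
	in_eqs = 1;		/* models rcu_dynticks_eqs_enter() */
}
```

An enter from idle leaves the counter at 1; a further nested enter
brings it to 3, and each exit undoes one step until the CPU is idle
again.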

> Thanks a lot Paul for your help,

Please feel free to take a look at Requirements.html.  There are a lot
more surprising RCU facts of life recorded there.  ;-)

							Thanx, Paul
Paul E. McKenney June 22, 2018, 1:36 p.m. UTC | #9
On Fri, Jun 22, 2018 at 12:00:32PM +0900, Byungchul Park wrote:
> On Thu, Jun 21, 2018 at 08:04:07AM -0700, Paul E. McKenney wrote:
> 
> > Nothing quite like concurrent programming to help one see one's own
> > mistakes.  ;-)
> 
> Haha.
> 
> > Your reasoning has merit, but the nice thing about keeping "nmi" is
> > that it helps casual readers see that NMIs must be handled.  If we
> > rename this to "irq", we lose that hint and probably leave some
> > readers wondering why the strange increment-by-2 code is there.
> > So let's please keep the current names.
> 
> Got it. I will.
> 
> > >  /**
> > > - * rcu_nmi_exit - inform RCU of exit from NMI context
> > > + * rcu_irq_exit_common - inform RCU of exit from interrupt context
> > >   *
> > > - * If we are returning from the outermost NMI handler that interrupted an
> > > - * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
> > > - * to let the RCU grace-period handling know that the CPU is back to
> > > - * being RCU-idle.
> > > + * If we are returning from the outermost interrupt handler that
> > > + * interrupted an RCU-idle period, update rdtp->dynticks and
> > > + * rdtp->dynticks_irq_nesting to let the RCU grace-period handling
> > > + * know that the CPU is back to being RCU-idle.
> > >   *
> > > - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> > > - * with CONFIG_RCU_EQS_DEBUG=y.
> > > + * If you add or remove a call to rcu_irq_exit_common(), be sure to
> > > + * test with CONFIG_RCU_EQS_DEBUG=y.
> > >   */
> > > -void rcu_nmi_exit(void)
> > > +static __always_inline void rcu_irq_exit_common(bool nmi)
> > 
> > However, I suggest making this function's parameter "irq" because ...
> 
> I will.
> 
> > Does the generated code really get rid of the conditional branches?
> > I would hope that it would, but it is always good to check.  This
> > should be easy to find in the assembly-language output because of the
> > calls to rcu_prepare_for_idle() and rcu_dynticks_task_enter().
> 
> Good! It works as we expect, though I only checked x86_64.

I suspect that it would be similar for most other architectures running
the same compiler version.  Might be worth firing up a cross-compiler
to check one or two more, though.

>                                                             Let me show
> you the part we are interested in. The rest are almost same.
> 
> <rcu_nmi_exit>:
> 	5b                   	pop    %rbx
> 	5d                   	pop    %rbp
> 	41 5c                	pop    %r12
> 	41 5d                	pop    %r13
> 	41 5e                	pop    %r14
> 	41 5f                	pop    %r15
> 	e9 0f 75 ff ff       	jmpq   ffffffff810bb440 <rcu_dynticks_eqs_enter>
> 
> <rcu_irq_exit>:
> 	e8 e6 e5 ff ff       	callq  ffffffff810c26a0 <rcu_prepare_for_idle>
> 	e8 81 73 ff ff       	callq  ffffffff810bb440 <rcu_dynticks_eqs_enter>
> 	e8 ec 3a 2b 00       	callq  ffffffff81377bb0 <debug_smp_processor_id>
> 	65 48 8b 14 25 00 4d 	mov    %gs:0x14d00,%rdx
> 	01 00 
> 	89 82 94 03 00 00    	mov    %eax,0x394(%rdx)
> 	5b                   	pop    %rbx
> 	5d                   	pop    %rbp
> 	41 5c                	pop    %r12
> 	41 5d                	pop    %r13
> 	41 5e                	pop    %r14
> 	41 5f                	pop    %r15
> 	c3                   	retq

This is a summary view focusing on the function calls, correct?
(I am guessing this based on your "the part we are interested in".)

> Even though they return in slightly different ways, I can see that all
> the branches we are interested in were removed by the compiler!

Yes, very nice!

The reason for the difference is that the compiler applied tail
recursion to rcu_nmi_exit()'s call to rcu_dynticks_eqs_enter(), and
inlined rcu_irq_exit()'s call to rcu_dynticks_task_enter().

> > >  {
> > >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > >  	long incby = 2;
> > > 
> > >  	/* Complain about underflow. */
> > > -	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
> > > +	WARN_ON_ONCE(rdtp->dynticks_irq_nesting < 0);
> > > 
> > >  	/*
> > >  	 * If idle from RCU viewpoint, atomically increment ->dynticks
> > > -	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
> > > -	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
> > > -	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
> > > +	 * to mark non-idle and increment ->dynticks_irq_nesting by one.
> > > +	 * Otherwise, increment ->dynticks_irq_nesting by two.  This means
> > > +	 * if ->dynticks_irq_nesting is equal to one, we are guaranteed
> > >  	 * to be in the outermost NMI handler that interrupted an RCU-idle
> > >  	 * period (observation due to Andy Lutomirski).
> > >  	 */
> > >  	if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > +
> > > +		if (!nmi)
> > > +			rcu_dynticks_task_exit();
> > > +
> > >  		rcu_dynticks_eqs_exit();
> > > +
> > > +		if (!nmi)
> > 
> > ... and checking for branches here.
> 
> Also good! The following is the only different part.
> 
> <rcu_nmi_enter>:
> 	e8 dc 81 ff ff       	callq  ffffffff810bc450 <rcu_dynticks_eqs_exit>
> 
> <rcu_irq_enter>:
> 	65 48 8b 04 25 00 4d 	mov    %gs:0x14d00,%rax
> 	01 00 
> 	c7 80 94 03 00 00 ff 	movl   $0xffffffff,0x394(%rax)
> 	ff ff ff 
> 	e8 b9 80 ff ff       	callq  ffffffff810bc450 <rcu_dynticks_eqs_exit>
> 	e8 d4 b9 ff ff       	callq  ffffffff810bfd70 <rcu_cleanup_after_idle>

Also looks good, so please do send the patches!

							Thanx, Paul
Andy Lutomirski June 22, 2018, 2:19 p.m. UTC | #10
On Fri, Jun 22, 2018 at 6:26 AM Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> On Thu, Jun 21, 2018 at 10:56:59PM -0700, Joel Fernandes wrote:
> > Hi Paul,
> >
> > On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > > > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > > > >> Hello folks,
> > > > >>
> > > > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > > > >> think it's possible since the only thing we are interested in with
> > > > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > > > >> idle or not.
> > > > >
> > > > > Please keep in mind that NMIs cannot be masked, which means that the
> > > > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > > > the process, between any consecutive pair of instructions.  The saving
> > >
> > > And yes, I should have looked at this patch more closely before replying.
> > > But please see below.
> > >
> > > > I believe I understand what NMI is and why you introduced
> > > > ->dynticks_nmi_nesting. Or am I missing something?
> > >
> > > Perhaps the fact that there are architectures that can enter interrupt
> > > handlers and never leave them when the CPU is non-idle.  One example of
> > > this is the usermode upcalls in the comment that you removed.
> >
> > I spent some time tonight and last night trying to understand this concept of
> > never leaving an interrupt, I hope you don't mind me asking this dumb
> > question... perhaps I will learn something : Could you let me know how is it
> > possible that an interrupt never exits?
> >
> > Typically an interrupt never exiting sounds like a hard-lockup. This is how
> > hardlock detector works: Since regular interrupts in linux can't nest, the
> > hardlockup detector checks if hrtimer interrupts are being handled and if
> > not, then it throws a splat, panics the kernel etc. So I am a bit troubled by
> > this interrupt never exiting concept..
> >
> > Further since an interrupt is an atomic context, it cannot sleep or schedule
> > into usermode so how are these upcalls handled from the interrupt?
>
> It has been some years since I traced the code flow, but what happened
> back then is that it switches itself from an interrupt handler to not
> without actually returning from the interrupt.  This can only happen when
> interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> relies on this restriction.  If I remember correctly, the code ends up
> executing in the context of the interrupted process, but it has been some
> years, so please apply appropriate skepticism.

...

>
> I have never seen NMIs be unpaired or improperly nested.  However,
> given that rcu_irq_enter() invokes rcu_nmi_enter() and rcu_irq_exit()
> invokes rcu_nmi_exit(), it is definitely the case that rcu_nmi_enter()
> and rcu_nmi_exit() need to deal with unpaired and improperly nested
> invocations.

This is very strange.  There are certainly cases in x86 where an
interrupt-ish code path can become less interrupt-ish without
returning (killing a task that overflows a kernel stack is an
example), but the RCU calls should still nest correctly.  Do you know
the history of this requirement?
Steven Rostedt June 22, 2018, 4:01 p.m. UTC | #11
On Fri, 22 Jun 2018 06:28:43 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> It has been some years since I traced the code flow, but what happened
> back then is that it switches itself from an interrupt handler to not
> without actually returning from the interrupt.  This can only happen when
> interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> relies on this restriction.  If I remember correctly, the code ends up
> executing in the context of the interrupted process, but it has been some
> years, so please apply appropriate skepticism.

If irq_enter() is not paired with irq_exit() then major things will
break, especially since that is what in_interrupt() and friends rely on
to work.

Now, perhaps rcu_irq_enter() is called elsewhere (as a git grep suggests
it may be), and such a rcu_irq_enter() may not be paired with
rcu_irq_exit(). But that has nothing to do with whether the irq_enter()
and irq_exit() routines themselves are paired.

-- Steve
Paul E. McKenney June 22, 2018, 4:12 p.m. UTC | #12
On Fri, Jun 22, 2018 at 07:19:13AM -0700, Andy Lutomirski wrote:
> On Fri, Jun 22, 2018 at 6:26 AM Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > On Thu, Jun 21, 2018 at 10:56:59PM -0700, Joel Fernandes wrote:
> > > Hi Paul,
> > >
> > > On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > > > > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > > > > >> Hello folks,
> > > > > >>
> > > > > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > > > > >> think it's possible since the only thing we are interested in with
> > > > > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > > > > >> idle or not.
> > > > > >
> > > > > > Please keep in mind that NMIs cannot be masked, which means that the
> > > > > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > > > > the process, between any consecutive pair of instructions.  The saving
> > > >
> > > > And yes, I should have looked at this patch more closely before replying.
> > > > But please see below.
> > > >
> > > > > I believe I understand what NMI is and why you introduced
> > > > > ->dynticks_nmi_nesting. Or am I missing something?
> > > >
> > > > Perhaps the fact that there are architectures that can enter interrupt
> > > > handlers and never leave them when the CPU is non-idle.  One example of
> > > > this is the usermode upcalls in the comment that you removed.
> > >
> > > I spent some time tonight and last night trying to understand this concept of
> > > never leaving an interrupt, I hope you don't mind me asking this dumb
> > > question... perhaps I will learn something : Could you let me know how is it
> > > possible that an interrupt never exits?
> > >
> > > Typically an interrupt never exiting sounds like a hard-lockup. This is how
> > > hardlock detector works: Since regular interrupts in linux can't nest, the
> > > hardlockup detector checks if hrtimer interrupts are being handled and if
> > > not, then it throws a splat, panics the kernel etc. So I am a bit troubled by
> > > this interrupt never exiting concept..
> > >
> > > Further since an interrupt is an atomic context, it cannot sleep or schedule
> > > into usermode so how are these upcalls handled from the interrupt?
> >
> > It has been some years since I traced the code flow, but what happened
> > back then is that it switches itself from an interrupt handler to not
> > without actually returning from the interrupt.  This can only happen when
> > interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> > relies on this restriction.  If I remember correctly, the code ends up
> > executing in the context of the interrupted process, but it has been some
> > years, so please apply appropriate skepticism.
> 
> ...
> 
> >
> > I have never seen NMIs be unpaired or improperly nested.  However,
> > given that rcu_irq_enter() invokes rcu_nmi_enter() and rcu_irq_exit()
> > invokes rcu_nmi_exit(), it is definitely the case that rcu_nmi_enter()
> > and rcu_nmi_exit() need to deal with unpaired and improperly nested
> > invocations.
> 
> This is very strange.  There are certainly cases in x86 where an
> interrupt-ish code path can become less interrupt-ish without
> returning (killing a task that overflows a kernel stack is an
> example), but the RCU calls should still nest correctly.  Do you know
> the history of this requirement?

I believe that they are called "usermode helpers", and are (were?)
used on a number of architectures to implement system calls from
within the kernel.

							Thanx, Paul
Paul E. McKenney June 22, 2018, 6:14 p.m. UTC | #13
On Fri, Jun 22, 2018 at 12:01:49PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 06:28:43 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > It has been some years since I traced the code flow, but what happened
> > back then is that it switches itself from an interrupt handler to not
> > without actually returning from the interrupt.  This can only happen when
> > interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> > relies on this restriction.  If I remember correctly, the code ends up
> > executing in the context of the interrupted process, but it has been some
> > years, so please apply appropriate skepticism.
> 
> If irq_enter() is not paired with irq_exit() then major things will
> break. Especially since that's how in_interrupt() and friends rely on to
> work.
> 
> Now, perhaps rcu_irq_enter() is called elsewhere (as a git grep appears
> it may be), and that rcu_irq_enter() may not be paired with
> rcu_irq_exit(). But that's not anything to do with the irq_enter() and
> irq_exit() routines being paired or not.

The non-irq_enter() calls to rcu_irq_enter() and the non-irq_exit()
calls to rcu_irq_exit() do appear to be balanced as of v4.17.

If I recall correctly, the offending piece of functionality was the
usermode helpers, which on some architectures did a syscall exception
from within the kernel to make a system call happen.  This seems to now
be common code using workqueues, kernel threads, and do_execve().
Here is the best reference I could find to the specific problem
I encountered back in the day:

https://groups.google.com/forum/#!msg/linux.kernel/B5hZX1tJRs8/sOVVfhrirL8J

I do recall that there were real failures.  There is no way I would have
written code tolerating half-interrupts without cause, no more than I
would have written code handling what looks to RCU like interrupts from
NMI handlers without cause.  ;-)

One approach would be for me to add a WARN_ON_ONCE() to check for
misnesting.  If this didn't trigger for a period long enough for the
check to propagate to the various distros' users, then this code could
be simplified.  Though it would not be that big a deal, just the removal
of a store or two.

							Thanx, Paul
Joel Fernandes June 22, 2018, 6:19 p.m. UTC | #14
On Fri, Jun 22, 2018 at 06:28:43AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 21, 2018 at 10:56:59PM -0700, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > > > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > > > >> Hello folks,
> > > > >>
> > > > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > > > >> think it's possible since the only thing we are interested in with
> > > > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > > > >> idle or not.
> > > > >
> > > > > Please keep in mind that NMIs cannot be masked, which means that the
> > > > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > > > the process, between any consecutive pair of instructions.  The saving
> > > 
> > > And yes, I should have looked at this patch more closely before replying.
> > > But please see below.
> > > 
> > > > I believe I understand what NMI is and why you introduced
> > > > ->dynticks_nmi_nesting. Or am I missing something?
> > > 
> > > Perhaps the fact that there are architectures that can enter interrupt
> > > handlers and never leave them when the CPU is non-idle.  One example of
> > > this is the usermode upcalls in the comment that you removed.
> > 
> > I spent some time tonight and last night trying to understand this concept of
> > never leaving an interrupt, I hope you don't mind me asking this dumb
> > question... perhaps I will learn something : Could you let me know how is it
> > possible that an interrupt never exits?
> > 
> > Typically an interrupt never exiting sounds like a hard-lockup. This is how
> > hardlock detector works: Since regular interrupts in linux can't nest, the
> > hardlockup detector checks if hrtimer interrupts are being handled and if
> > not, then it throws a splat, panics the kernel etc. So I am a bit troubled by
> > this interrupt never exiting concept..
> > 
> > Further since an interrupt is an atomic context, it cannot sleep or schedule
> > into usermode so how are these upcalls handled from the interrupt?
> 
> It has been some years since I traced the code flow, but what happened

No problem, thanks for the discussion. :)

> back then is that it switches itself from an interrupt handler to not
> without actually returning from the interrupt.  This can only happen when
> interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> relies on this restriction.  If I remember correctly, the code ends up
> executing in the context of the interrupted process, but it has been some
> years, so please apply appropriate skepticism.

Sure. So in a later thread you mentioned "usermode helpers". I took a closer
look at that subsystem, and it seems you can execute usermode helpers from
atomic sections with the help of the UMH_NO_WAIT flag.

Then I checked where this flag is used, and it turns out it's from the
mce_work_trigger() function in x86/kernel/cpu/mcheck/dev-mcelog.c, which can
in fact be called from an interrupt context (mce_notify_irq()).

Is this the use case you remember causing these weird transitions to
userspace?

> Please take a look at the "Interrupts and NMIs" section of the file
> Documentation/RCU/Design/Requirements/Requirements.html for a bit
> more information.

Sure, thanks for the pointer.

> > Lastly, can you point me to an example how the rcu_nmi_enter/exit() pair can go
> > out sync? That is they aren't paired and nested properly? In my mind they
> > always should be but I may be missing the usecase. I'm happy to try and
> > reproduce and trace this if you can let me know how to so that I can study
> > it better. 
> 
> I have never seen NMIs be unpaired or improperly nested.  However,
> given that rcu_irq_enter() invokes rcu_nmi_enter() and rcu_irq_exit()
> invokes rcu_nmi_exit(), it is definitely the case that rcu_nmi_enter()
> and rcu_nmi_exit() need to deal with unpaired and improperly nested
> invocations.

Just wondering, how would the fact that rcu_irq_enter() calls into
rcu_nmi_enter() cause improper nesting?

Just to define what "improper nesting" means, can we go through an example?
Do you mean a scenario like this?

rcu_nmi_enter (called because of NMI)
rcu_nmi_enter (called because of IRQ)
rcu_nmi_exit (called because of NMI)
rcu_nmi_exit (called because of IRQ)

Such a scenario seems impossible to me because an IRQ can't be taken after
the NMI has been entered.

On the other hand, if you meant that while an IRQ is being handled, an NMI
fires just before rcu_irq_enter() calls rcu_nmi_enter(), then the worst that
could happen seems to be that the NMI's rcu_nmi_enter/exit pair will not be
nested within the IRQ's rcu_nmi_enter/exit pair even though the NMI
interrupted the IRQ. So it'll be something like:

rcu_nmi_enter (called because of NMI)
rcu_nmi_exit (called because of NMI)
rcu_nmi_enter (called because of IRQ)
rcu_nmi_exit (called because of IRQ)

Even though what really happened in the real world is:

IRQ entered
NMI entered
NMI exited
IRQ exited

This also seems reasonable to me, but is this what you meant by improper
nesting of the rcu_nmi_enter/exit? If yes, what makes it unreasonable?

> So why this function-call structure?  Well, you see, NMI handlers can
> take what appear to RCU to be normal interrupts...
> 
> (And I just added that fun fact to Requirements.html.)

Yes, I'll definitely go through all the interrupt requirements in the doc and
thanks for referring me to it.

thanks,

 - Joel
Steven Rostedt June 22, 2018, 6:32 p.m. UTC | #15
On Fri, 22 Jun 2018 11:19:16 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> look at that subsystem, and it seems you can execute usermode helpers from
> atomic sections with help of UMH_NO_WAIT flag.
> 
> Then I checked where this flag is used and it turns out its from the
> mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> called infact from an interrupt context (mce_notify_irq).
> 
> Is this the usecase you remember causing this weird transitions to userspace?

But this case still looks like it uses work queues, it just doesn't
wait for the result.

I'll have to look at the code from what it looked like back in 2011, to
see if there was an actual issue here back then.

-- Steve
Joel Fernandes June 22, 2018, 8:05 p.m. UTC | #16
On Fri, Jun 22, 2018 at 02:32:47PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 11:19:16 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> > look at that subsystem, and it seems you can execute usermode helpers from
> > atomic sections with help of UMH_NO_WAIT flag.
> > 
> > Then I checked where this flag is used and it turns out its from the
> > mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> > called infact from an interrupt context (mce_notify_irq).
> > 
> > Is this the usecase you remember causing this weird transitions to userspace?
> 
> But this case still looks like it uses work queues, it just doesn't
> wait for the result.
> 
> I'll have to look at the code from what it looked like back in 2011, to
> see if there was an actual issue here back then.

Good point, Steve. So I guess that in the current kernel sources there's no
code that uses UMH from IRQ context, AFAICT. I'll also go through the Google
Groups thread Paul pointed to, to study the history of the problem a bit
more.

thanks,

- Joel
Paul E. McKenney June 22, 2018, 8:58 p.m. UTC | #17
On Fri, Jun 22, 2018 at 11:19:16AM -0700, Joel Fernandes wrote:
> On Fri, Jun 22, 2018 at 06:28:43AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 21, 2018 at 10:56:59PM -0700, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Wed, Jun 20, 2018 at 09:49:02AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 21, 2018 at 01:05:22AM +0900, Byungchul Park wrote:
> > > > > On Wed, Jun 20, 2018 at 11:58 PM, Paul E. McKenney
> > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > On Wed, Jun 20, 2018 at 05:47:20PM +0900, Byungchul Park wrote:
> > > > > >> Hello folks,
> > > > > >>
> > > > > >> I'm careful in saying that ->dynticks_nmi_nesting can be removed but I
> > > > > >> think it's possible since the only thing we are interested in with
> > > > > >> regard to ->dynticks_nesting or ->dynticks_nmi_nesting is whether rcu is
> > > > > >> idle or not.
> > > > > >
> > > > > > Please keep in mind that NMIs cannot be masked, which means that the
> > > > > > rcu_nmi_enter() and rcu_nmi_exit() pair can be invoked at any point in
> > > > > > the process, between any consecutive pair of instructions.  The saving
> > > > 
> > > > And yes, I should have looked at this patch more closely before replying.
> > > > But please see below.
> > > > 
> > > > > I believe I understand what NMI is and why you introduced
> > > > > ->dynticks_nmi_nesting. Or am I missing something?
> > > > 
> > > > Perhaps the fact that there are architectures that can enter interrupt
> > > > handlers and never leave them when the CPU is non-idle.  One example of
> > > > this is the usermode upcalls in the comment that you removed.
> > > 
> > > I spent some time tonight and last night trying to understand this concept of
> > > never leaving an interrupt, I hope you don't mind me asking this dumb
> > > question... perhaps I will learn something : Could you let me know how is it
> > > possible that an interrupt never exits?
> > > 
> > > Typically an interrupt never exiting sounds like a hard-lockup. This is how
> > > hardlock detector works: Since regular interrupts in linux can't nest, the
> > > hardlockup detector checks if hrtimer interrupts are being handled and if
> > > not, then it throws a splat, panics the kernel etc. So I am a bit troubled by
> > > this interrupt never exiting concept..
> > > 
> > > Further since an interrupt is an atomic context, it cannot sleep or schedule
> > > into usermode so how are these upcalls handled from the interrupt?
> > 
> > It has been some years since I traced the code flow, but what happened
> 
> No problem, thanks for the discussion. :)
> 
> > back then is that it switches itself from an interrupt handler to not
> > without actually returning from the interrupt.  This can only happen when
> > interrupting a non-idle process, thankfully, and RCU's dyntick-idle code
> > relies on this restriction.  If I remember correctly, the code ends up
> > executing in the context of the interrupted process, but it has been some
> > years, so please apply appropriate skepticism.
> 
> Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> look at that subsystem, and it seems you can execute usermode helpers from
> atomic sections with help of UMH_NO_WAIT flag.
> 
> Then I checked where this flag is used and it turns out its from the
> mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> called infact from an interrupt context (mce_notify_irq).
> 
> Is this the usecase you remember causing this weird transitions to userspace?

It was on Power, so it was not this one.

> > Please take a look at the "Interrupts and NMIs" section of the file
> > Documentation/RCU/Design/Requirements/Requirements.html for a bit
> > more information.
> 
> Sure, thanks for the pointer.
> 
> > > Lastly, can you point me to an example how the rcu_nmi_enter/exit() pair can go
> > > out sync? That is they aren't paired and nested properly? In my mind they
> > > always should be but I may be missing the usecase. I'm happy to try and
> > > reproduce and trace this if you can let me know how to so that I can study
> > > it better. 
> > 
> > I have never seen NMIs be unpaired or improperly nested.  However,
> > given that rcu_irq_enter() invokes rcu_nmi_enter() and rcu_irq_exit()
> > invokes rcu_nmi_exit(), it is definitely the case that rcu_nmi_enter()
> > and rcu_nmi_exit() need to deal with unpaired and improperly nested
> > invocations.
> 
> Just wondering how would the fact that rcu_irq_enter calls into rcu_nmi_enter
> cause an improper nesting?
> 
> Just to define what "improper nesting" means, if we can go through an
> example. Do you mean a scenario like?
> 
> rcu_nmi_enter (called because of NMI)
> rcu_nmi_enter (called because of IRQ)
> rcu_nmi_exit (called because of NMI)
> rcu_nmi_exit (called because of IRQ)
> 
> Such scenario seems impossible to me because the IRQ can't be entered after
> the NMI entered.

That is OK.  NMIs really can nest.  Or more accurately, things that
appear to be NMIs from RCU's viewpoint can nest within other things
that appear to be NMIs from RCU's viewpoint.

> On the other hand, if you meant that when IRQ is being handled, an NMI fires
> just before the rcu_irq_enter calls rcu_nmi_enter, then the worst that could
> happen seems to be that the rcu_nmi_enter/exit pairs will not be nested
> within the outer rcu_nmi_enter/exit pair even though the NMI interrupted the
> IRQ. So it'll be something like:
> 
> rcu_nmi_enter (called because of NMI)
> rcu_nmi_exit (called because of NMI)
> rcu_nmi_enter (called because of IRQ)
> rcu_nmi_exit (called because of IRQ)
> 
> Even though what really happened in the real world is:
> 
> IRQ entered
> NMI entered
> NMI exited
> IRQ exited
> 
> This also seems reasonable to me, but is this what you meant by improper
> nesting of the rcu_nmi_enter/exit? If yes, what makes it unreasonable?

Something like this:

	IRQ entered

And never exited.  Ever.  I actually saw this in 2011.

Or something like this:

	IRQ exited

Without a corresponding IRQ enter.

The current code handles both of these situations, at least assuming
that the interrupt entry/exit happens during a non-idle period.

> > So why this function-call structure?  Well, you see, NMI handlers can
> > take what appear to RCU to be normal interrupts...
> > 
> > (And I just added that fun fact to Requirements.html.)
> 
> Yes, I'll definitely go through all the interrupt requirements in the doc and
> thanks for referring me to it.

My concern may well be obsolete.  It would be good if it was!  ;-)

								Thanx, Paul
Paul E. McKenney June 22, 2018, 8:58 p.m. UTC | #18
On Fri, Jun 22, 2018 at 02:32:47PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 11:19:16 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> > look at that subsystem, and it seems you can execute usermode helpers from
> > atomic sections with help of UMH_NO_WAIT flag.
> > 
> > Then I checked where this flag is used and it turns out its from the
> > mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> > called infact from an interrupt context (mce_notify_irq).
> > 
> > Is this the usecase you remember causing this weird transitions to userspace?
> 
> But this case still looks like it uses work queues, it just doesn't
> wait for the result.
> 
> I'll have to look at the code from what it looked like back in 2011, to
> see if there was an actual issue here back then.

There was.  Been there, got the failure on real hardware.  ;-)

							Thanx, Paul
Steven Rostedt June 22, 2018, 9 p.m. UTC | #19
On Fri, 22 Jun 2018 13:58:13 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Something like this:
> 
> 	IRQ entered
> 
> And never exited.  Ever.  I actually saw this in 2011.

I still believe this was actually a bug. And perhaps you made the RCU
code robust enough to handle this bug ;-)

> 
> Or something like this:
> 
> 	IRQ exited
> 
> Without a corresponding IRQ enter.
> 
> The current code handles both of these situations, at least assuming
> that the interrupt entry/exit happens during a non-idle period.
> 
> > > So why this function-call structure?  Well, you see, NMI handlers can
> > > take what appear to RCU to be normal interrupts...
> > > 
> > > (And I just added that fun fact to Requirements.html.)  
> > 
> > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > thanks for referring me to it.  
> 
> My concern may well be obsolete.  It would be good if it was!  ;-)

I'd love to mandate that irq_enter() must be paired with irq_exit(). I
don't really see any rationale for it to be otherwise. If there is a
case, perhaps it needs to be fixed.

-- Steve
Paul E. McKenney June 22, 2018, 9:16 p.m. UTC | #20
On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> On Fri, 22 Jun 2018 13:58:13 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Something like this:
> > 
> > 	IRQ entered
> > 
> > And never exited.  Ever.  I actually saw this in 2011.
> 
> I still believe this was actually a bug. And perhaps you made the RCU
> code robust enough to handle this bug ;-)

Welcome to my world!

But I recall it being used in several places, so if it was a bug, it
was an intentional bug.  Probably the worst kind.

Sort of like nested NMIs and interrupts within NMI handlers.  ;-)

> > Or something like this:
> > 
> > 	IRQ exited
> > 
> > Without a corresponding IRQ enter.
> > 
> > The current code handles both of these situations, at least assuming
> > that the interrupt entry/exit happens during a non-idle period.
> > 
> > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > take what appear to RCU to be normal interrupts...
> > > > 
> > > > (And I just added that fun fact to Requirements.html.)  
> > > 
> > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > thanks for referring me to it.  
> > 
> > My concern may well be obsolete.  It would be good if it was!  ;-)
> 
> I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> don't really see any rationale for it to be otherwise. If there is a
> case, perhaps it needs to be fixed.

Given that the usermode helpers now look to be common code using
workqueues, kthreads, and calls to do_execve(), it might well be that
the days of half-interrupts are behind us.

But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
and waiting a few years still stands, but perhaps you have a better
approach.

							Thanx, Paul
Andy Lutomirski June 22, 2018, 10:03 p.m. UTC | #21
On Fri, Jun 22, 2018 at 2:14 PM Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > On Fri, 22 Jun 2018 13:58:13 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >
> > > Something like this:
> > >
> > >     IRQ entered
> > >
> > > And never exited.  Ever.  I actually saw this in 2011.
> >
> > I still believe this was actually a bug. And perhaps you made the RCU
> > code robust enough to handle this bug ;-)
>
> Welcome to my world!
>
> But I recall it being used in several places, so if it was a bug, it
> was an intentional bug.  Probably the worst kind.
>
> Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
>
> > > Or something like this:
> > >
> > >     IRQ exited
> > >
> > > Without a corresponding IRQ enter.
> > >
> > > The current code handles both of these situations, at least assuming
> > > that the interrupt entry/exit happens during a non-idle period.
> > >
> > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > take what appear to RCU to be normal interrupts...
> > > > >
> > > > > (And I just added that fun fact to Requirements.html.)
> > > >
> > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > thanks for referring me to it.
> > >
> > > My concern may well be obsolete.  It would be good if it was!  ;-)
> >
> > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > don't really see any rationale for it to be otherwise. If there is a
> > case, perhaps it needs to be fixed.
>
> Given that the usermode helpers now look to be common code using
> workqueues, kthreads, and calls to do_execve(), it might well be that
> the days of half-interrupts are behind us.
>
> But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> and waiting a few years still stands, but perhaps you have a better
> approach.

I think you should add a WARN_ON_ONCE().  Let's get the bugs fixed.

--Andy
Joel Fernandes June 23, 2018, 3:48 p.m. UTC | #22
On Fri, Jun 22, 2018 at 02:16:00PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > On Fri, 22 Jun 2018 13:58:13 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > Something like this:
> > > 
> > > 	IRQ entered
> > > 
> > > And never exited.  Ever.  I actually saw this in 2011.
> > 
> > I still believe this was actually a bug. And perhaps you made the RCU
> > code robust enough to handle this bug ;-)
> 
> Welcome to my world!
> 
> But I recall it being used in several places, so if it was a bug, it
> was an intentional bug.  Probably the worst kind.
> 
> Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
> 
> > > Or something like this:
> > > 
> > > 	IRQ exited
> > > 
> > > Without a corresponding IRQ enter.
> > > 
> > > The current code handles both of these situations, at least assuming
> > > that the interrupt entry/exit happens during a non-idle period.
> > > 
> > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > take what appear to RCU to be normal interrupts...
> > > > > 
> > > > > (And I just added that fun fact to Requirements.html.)  
> > > > 
> > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > thanks for referring me to it.  
> > > 
> > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > 
> > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > don't really see any rationale for it to be otherwise. If there is a
> > case, perhaps it needs to be fixed.
> 
> Given that the usermode helpers now look to be common code using
> workqueues, kthreads, and calls to do_execve(), it might well be that
> the days of half-interrupts are behind us.
> 
> But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> and waiting a few years still stands, but perhaps you have a better
> approach.

Hi Paul, I am OK with adding a warning for a couple of releases if you and
others are OK with it. How about something like this? Feel free to use the
diff as a starting point, or take a different approach if you or others
prefer something else. Thanks.

---8<-----------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8788ddbc0d13..176de74f5027 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -767,6 +767,21 @@ void rcu_idle_enter(void)
  */
 void rcu_user_enter(void)
 {
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+
+	/*
+	 * Add warning to ensure: no known instances of entering userspace from
+	 * IRQ/NMI handlers exist today. Currently special crowbarring of the
+	 * dynticks_nmi_nesting and maintaining of this separate counter when
+	 * dynticks_nesting exists, is done in order to handle entering of
+	 * userspace from an IRQ context and never returning. Lets track it for
+	 * a couple of kernel releases and then if the warning doesn't occur,
+	 * we can try to simplify the code and combine/eliminate the counters.
+	 * See: http://lkml.kernel.org/r/20180622181422.GT3593@linux.vnet.ibm.com
+	 */
+	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting &&
+		     rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
+
 	lockdep_assert_irqs_disabled();
 	rcu_eqs_enter(true);
 }
Paul E. McKenney June 23, 2018, 5:53 p.m. UTC | #23
On Fri, Jun 22, 2018 at 03:03:35PM -0700, Andy Lutomirski wrote:
> On Fri, Jun 22, 2018 at 2:14 PM Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > > On Fri, 22 Jun 2018 13:58:13 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > > Something like this:
> > > >
> > > >     IRQ entered
> > > >
> > > > And never exited.  Ever.  I actually saw this in 2011.
> > >
> > > I still believe this was actually a bug. And perhaps you made the RCU
> > > code robust enough to handle this bug ;-)
> >
> > Welcome to my world!
> >
> > But I recall it being used in several places, so if it was a bug, it
> > was an intentional bug.  Probably the worst kind.
> >
> > Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
> >
> > > > Or something like this:
> > > >
> > > >     IRQ exited
> > > >
> > > > Without a corresponding IRQ enter.
> > > >
> > > > The current code handles both of these situations, at least assuming
> > > > that the interrupt entry/exit happens during a non-idle period.
> > > >
> > > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > > take what appear to RCU to be normal interrupts...
> > > > > >
> > > > > > (And I just added that fun fact to Requirements.html.)
> > > > >
> > > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > > thanks for referring me to it.
> > > >
> > > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > >
> > > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > > don't really see any rationale for it to be otherwise. If there is a
> > > case, perhaps it needs to be fixed.
> >
> > Given that the usermode helpers now look to be common code using
> > workqueues, kthreads, and calls to do_execve(), it might well be that
> > the days of half-interrupts are behind us.
> >
> > But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> > and waiting a few years still stands, but perhaps you have a better
> > approach.
> 
> I think you should add a WARN_ON_ONCE().  Let's get the bugs fixed.

Or the obscure features identified, as the case may be.  ;-)

Either way, will do!

							Thanx, Paul
Paul E. McKenney June 23, 2018, 5:56 p.m. UTC | #24
On Sat, Jun 23, 2018 at 08:48:39AM -0700, Joel Fernandes wrote:
> On Fri, Jun 22, 2018 at 02:16:00PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > > On Fri, 22 Jun 2018 13:58:13 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > Something like this:
> > > > 
> > > > 	IRQ entered
> > > > 
> > > > And never exited.  Ever.  I actually saw this in 2011.
> > > 
> > > I still believe this was actually a bug. And perhaps you made the RCU
> > > code robust enough to handle this bug ;-)
> > 
> > Welcome to my world!
> > 
> > But I recall it being used in several places, so if it was a bug, it
> > was an intentional bug.  Probably the worst kind.
> > 
> > Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
> > 
> > > > Or something like this:
> > > > 
> > > > 	IRQ exited
> > > > 
> > > > Without a corresponding IRQ enter.
> > > > 
> > > > The current code handles both of these situations, at least assuming
> > > > that the interrupt entry/exit happens during a non-idle period.
> > > > 
> > > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > > take what appear to RCU to be normal interrupts...
> > > > > > 
> > > > > > (And I just added that fun fact to Requirements.html.)  
> > > > > 
> > > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > > thanks for referring me to it.  
> > > > 
> > > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > > 
> > > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > > don't really see any rationale for it to be otherwise. If there is a
> > > case, perhaps it needs to be fixed.
> > 
> > Given that the usermode helpers now look to be common code using
> > workqueues, kthreads, and calls to do_execve(), it might well be that
> > the days of half-interrupts are behind us.
> > 
> > But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> > and waiting a few years still stands, but perhaps you have a better
> > approach.
> 
> Hi Paul, I am Ok with adding a warning for a couple of releases if you and
> others are Ok with it, how about something like this? Feel free to use the
> diff as a starting point or a different approach if you/others prefer
> something else. Thanks.

A few years rather than a few releases, but yes.  ;-)

The checks would need to go just before the "crowbar" stores.  I will put
something together after Byungchul's patches in this area have had time
to burn in for a few days.

							Thanx, Paul

> ---8<-----------------------
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8788ddbc0d13..176de74f5027 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -767,6 +767,21 @@ void rcu_idle_enter(void)
>   */
>  void rcu_user_enter(void)
>  {
> +	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> +
> +	/*
> +	 * Add warning to ensure: no known instances of entering userspace from
> +	 * IRQ/NMI handlers exist today. Currently special crowbarring of the
> +	 * dynticks_nmi_nesting and maintaining of this separate counter when
> +	 * dynticks_nesting exists, is done in order to handle entering of
> +	 * userspace from an IRQ context and never returning. Lets track it for
> +	 * a couple of kernel releases and then if the warning doesn't occur,
> +	 * we can try to simplify the code and combine/eliminate the counters.
> +	 * See: http://lkml.kernel.org/r/20180622181422.GT3593@linux.vnet.ibm.com
> +	 */
> +	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting &&
> +		     rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
> +
>  	lockdep_assert_irqs_disabled();
>  	rcu_eqs_enter(true);
>  }
>
Joel Fernandes June 24, 2018, 3:02 a.m. UTC | #25
On Sat, Jun 23, 2018 at 10:56:34AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 23, 2018 at 08:48:39AM -0700, Joel Fernandes wrote:
> > On Fri, Jun 22, 2018 at 02:16:00PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > > > On Fri, 22 Jun 2018 13:58:13 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > Something like this:
> > > > > 
> > > > > 	IRQ entered
> > > > > 
> > > > > And never exited.  Ever.  I actually saw this in 2011.
> > > > 
> > > > I still believe this was actually a bug. And perhaps you made the RCU
> > > > code robust enough to handle this bug ;-)
> > > 
> > > Welcome to my world!
> > > 
> > > But I recall it being used in several places, so if it was a bug, it
> > > was an intentional bug.  Probably the worst kind.
> > > 
> > > Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
> > > 
> > > > > Or something like this:
> > > > > 
> > > > > 	IRQ exited
> > > > > 
> > > > > Without a corresponding IRQ enter.
> > > > > 
> > > > > The current code handles both of these situations, at least assuming
> > > > > that the interrupt entry/exit happens during a non-idle period.
> > > > > 
> > > > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > > > take what appear to RCU to be normal interrupts...
> > > > > > > 
> > > > > > > (And I just added that fun fact to Requirements.html.)  
> > > > > > 
> > > > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > > > thanks for referring me to it.  
> > > > > 
> > > > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > > > 
> > > > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > > > don't really see any rationale for it to be otherwise. If there is a
> > > > case, perhaps it needs to be fixed.
> > > 
> > > Given that the usermode helpers now look to be common code using
> > > workqueues, kthreads, and calls to do_execve(), it might well be that
> > > the days of half-interrupts are behind us.
> > > 
> > > But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> > > and waiting a few years still stands, but perhaps you have a better
> > > approach.
> > 
> > Hi Paul, I am Ok with adding a warning for a couple of releases if you and
> > others are Ok with it, how about something like this? Feel free to use the
> > diff as a starting point or a different approach if you/others prefer
> > something else. Thanks.
> 
> A few years rather than a few releases, but yes.  ;-)
> The checks would need to go just before the "crowbar" stores.  

Ok. I guess that would work too.

> I will put something together after Byungchul's patches in this area have
> had time to burn in for a few days.

Sounds great, thanks!

 - Joel
Byungchul Park June 25, 2018, 8:28 a.m. UTC | #26
On Fri, Jun 22, 2018 at 01:05:48PM -0700, Joel Fernandes wrote:
> On Fri, Jun 22, 2018 at 02:32:47PM -0400, Steven Rostedt wrote:
> > On Fri, 22 Jun 2018 11:19:16 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > > Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> > > look at that subsystem, and it seems you can execute usermode helpers from
> > > atomic sections with help of UMH_NO_WAIT flag.
> > > 
> > > Then I checked where this flag is used and it turns out its from the
> > > mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> > > called infact from an interrupt context (mce_notify_irq).
> > > 
> > > Is this the usecase you remember causing this weird transitions to userspace?
> > 
> > But this case still looks like it uses work queues, it just doesn't
> > wait for the result.
> > 
> > I'll have to look at the code from what it looked like back in 2011, to
> > see if there was an actual issue here back then.
> 
> Good point Steve. So I guess in the current kernel sources, there's no code
> that uses UMH in IRQ context AFAICT. I'll go through the google group thread
> Paul pointed as well to study the history of the problem a bit more.

Me too. Good discussion we had thanks to you, Joel.

> thanks,
> 
> - Joel
Joel Fernandes June 25, 2018, 4:39 p.m. UTC | #27
On Mon, Jun 25, 2018 at 05:28:24PM +0900, Byungchul Park wrote:
> On Fri, Jun 22, 2018 at 01:05:48PM -0700, Joel Fernandes wrote:
> > On Fri, Jun 22, 2018 at 02:32:47PM -0400, Steven Rostedt wrote:
> > > On Fri, 22 Jun 2018 11:19:16 -0700
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > 
> > > > Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> > > > look at that subsystem, and it seems you can execute usermode helpers from
> > > > atomic sections with help of UMH_NO_WAIT flag.
> > > > 
> > > > Then I checked where this flag is used and it turns out its from the
> > > > mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> > > > called infact from an interrupt context (mce_notify_irq).
> > > > 
> > > > Is this the usecase you remember causing this weird transitions to userspace?
> > > 
> > > But this case still looks like it uses work queues, it just doesn't
> > > wait for the result.
> > > 
> > > I'll have to look at the code from what it looked like back in 2011, to
> > > see if there was an actual issue here back then.
> > 
> > Good point Steve. So I guess in the current kernel sources, there's no code
> > that uses UMH in IRQ context AFAICT. I'll go through the google group thread
> > Paul pointed as well to study the history of the problem a bit more.
> 
> Me too. Good discussion we had thanks to you, Joel.

No problem, thanks for the patch in the first place which triggered this
discussion.

For whatever it's worth, I made some notes of what I understood from reading
the code and old posts because I was sure I would otherwise forget
everything:
http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html

Feel free to comment on that post directly (or here) if you feel something is
grossly wrong.

Again thank you and everyone for the discussion! ;-)

 - Joel
Paul E. McKenney June 25, 2018, 5:19 p.m. UTC | #28
On Mon, Jun 25, 2018 at 09:39:51AM -0700, Joel Fernandes wrote:
> On Mon, Jun 25, 2018 at 05:28:24PM +0900, Byungchul Park wrote:
> > On Fri, Jun 22, 2018 at 01:05:48PM -0700, Joel Fernandes wrote:
> > > On Fri, Jun 22, 2018 at 02:32:47PM -0400, Steven Rostedt wrote:
> > > > On Fri, 22 Jun 2018 11:19:16 -0700
> > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > 
> > > > > Sure. So in a later thread you mentioned "usermode helpers". I took a closer
> > > > > look at that subsystem, and it seems you can execute usermode helpers from
> > > > > atomic sections with help of UMH_NO_WAIT flag.
> > > > > 
> > > > > Then I checked where this flag is used and it turns out its from the
> > > > > mce_work_trigger function in x86/kernel/cpu/mcheck/dev-mcelog.c which can be
> > > > > called infact from an interrupt context (mce_notify_irq).
> > > > > 
> > > > > Is this the usecase you remember causing this weird transitions to userspace?
> > > > 
> > > > But this case still looks like it uses work queues, it just doesn't
> > > > wait for the result.
> > > > 
> > > > I'll have to look at the code from what it looked like back in 2011, to
> > > > see if there was an actual issue here back then.
> > > 
> > > Good point Steve. So I guess in the current kernel sources, there's no code
> > > that uses UMH in IRQ context AFAICT. I'll go through the google group thread
> > > Paul pointed as well to study the history of the problem a bit more.
> > 
> > Me too. Good discussion we had thanks to you, Joel.
> 
> No problem, thanks for the patch in the first place which triggered this
> discussion.
> 
> For whatever its worth, I made some notes of what I understood from reading
> the code and old posts because I was sure I would otherwise forget
> everything:
> http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> 
> Feel free to comment on that post directly (or here) if you feel something is
> grossly wrong.
> 
> Again thank you and everyone for the discussion! ;-)

Not a bad writeup!  A few comments, as usual...

							Thanx, Paul

------------------------------------------------------------------------

	When I traced rdtp->dynticks_nesting, I could only find its
	value to be either a 0 or a 1. However looking back at old kernel
	sources, it appears that these can be nested because of so-called
	“half-interrupts”. I believe these are basically interrupts
	that cause a transition to usermode due to usermode upcalls
	(usermode helper subsystem). So a nesting situation could be
	something like: 1. Transition from idle to process context which
	makes dynticks_nesting == 1. Next, an interrupt comes in which
	makes a usermode upcall. This usermode call now makes a system
	call causing entry back into process context, which increments
	the dynticks_nesting counter to 2. Such a crazy situation is
	perhaps possible.

The half-interrupts can instead cause ->dynticks_nmi_nesting to either
fail to return to zero or to go negative, depending on which half of
the interrupt was present.  I don't immediately recall the reason for
allowing nested process-level entry/exit.  Might be another place to
put a WARN_ON_ONCE(), as eliminating this capability would save another
conditional branch.

	Any time the rdtp->dynticks counter’s second-lowest most bit
	is not set, we are in an EQS, and if its set, then we are not
	(second lowest because lowest is reserved for something else as
	of v4.18-rc1). This function is not useful to check if we’re
	in an EQS from a timer tick though, because its possible the
	timer tick interrupt entry caused an EQS exit which updated
	the counter. IOW, the ‘dynticks’ counter is not capable of
	checking if we had already exited the EQS before. To check if
	we were in an EQS or not from the timer tick, we instead must
	use dynticks_nesting counter. More on that later. The above
	function is probably just useful to make sure that interrupt
	entry/exit is properly updating the dynticks counter, and also
	to make sure from non-interrupt context that RCU is in an EQS
	(see rcu_gp_fqs function).

You lost me on this one.  There is rcu_is_cpu_rrupt_from_idle(), but
I am not sure what you are trying to achieve here, so I am not sure
whether this function does what you want.

	When dynticks_nesting is decremented to 0 (the outermost
	process-context nesting level exit causes an eqs-entry), the
	dynticks_nmi_nesting is reset to

I think you want "0." at the end of this sentence.  Or maybe my browser
is messing things up.

							Thanx, Paul
Joel Fernandes June 25, 2018, 7:15 p.m. UTC | #29
Hi Paul,

Thanks a lot for your comments, my replies inline:

On Mon, Jun 25, 2018 at 10:19:20AM -0700, Paul E. McKenney wrote:
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> 	When I traced rdtp->dynticks_nesting, I could only find its
> 	value to be either a 0 or a 1. However looking back at old kernel
> 	sources, it appears that these can be nested becaues of so called
> 	“half-interrupts”. I believe these are basically interrupts
> 	that cause a transition to usermode due to usermode upcalls
> 	(usermode helper subsystem). So a nesting situation could be
> 	something like: 1. Transition from idle to process context which
> 	makes dynticks_nesting == 1. Next, an interrupt comes in which
> 	makes a usermode upcall. This usermode call now makes a system
> 	call causing entry back into process context, which increments
> 	the dynticks_nesting counter to 2. Such a crazy situation is
> 	perhaps possible.
> 
> The half-interrupts can instead cause ->dynticks_nmi_nesting to either
> fail to return to zero or to go negative, depending on which half of

Actually in the above paragraph I was referring to a "half interrupt" messing
up dynticks_nesting, not dynticks_nmi_nesting. I know that the latter can be
messed up too but I wasn't referring to dynticks_nmi_nesting in this part of
the article.

I was thinking more in terms of the comment in:
https://elixir.bootlin.com/linux/v3.19.8/source/kernel/rcu/rcu.h#L34
/*
 * Process-level increment to ->dynticks_nesting field.  This allows for
 * architectures that use half-interrupts and half-exceptions from
 * process context.
  ... */

In my hypothetical example above that you quoted from my notes, I was trying
to reason about how taking a half-interrupt in process context can cause
dynticks_nesting to increase to 2. Thinking some more though, I am not sure
how the above hypothetical example I mentioned can cause this ;) since the
transition to usermode from the half-interrupt should have corrected the
dynticks_nesting counter due to the callchain: rcu_user_enter->rcu_eqs_enter ?

> the interrupt was present.  I don't immediately recall the reason for
> allowing nested process-level entry/exit.  Might be another place to
> put a WARN_ON_ONCE(), as eliminating this capability would save another
> conditional branch.

Sure, sounds good to me.

> 
> 	Any time the rdtp->dynticks counter’s second-lowest most bit
> 	is not set, we are in an EQS, and if its set, then we are not
> 	(second lowest because lowest is reserved for something else as
> 	of v4.18-rc1). This function is not useful to check if we’re
> 	in an EQS from a timer tick though, because its possible the
> 	timer tick interrupt entry caused an EQS exit which updated
> 	the counter. IOW, the ‘dynticks’ counter is not capable of
> 	checking if we had already exited the EQS before. To check if
> 	we were in an EQS or not from the timer tick, we instead must
> 	use dynticks_nesting counter. More on that later. The above
> 	function is probably just useful to make sure that interrupt
> 	entry/exit is properly updating the dynticks counter, and also
> 	to make sure from non-interrupt context that RCU is in an EQS
> 	(see rcu_gp_fqs function).
> 
> You lost me on this one.  There is rcu_is_cpu_rrupt_from_idle(), but
> I am not sure what you are trying to achieve here, so I am not sure
> whether this function does what you want.

Sorry about that. Let me try to explain in detail why I wrote the above
paragraph when talking about rdtp->dynticks.

I was trying to determine how the RCU code determines if the CPU is idle. It
appears from the code that there are 2 ways it does so:

1. By calling rcu_is_cpu_rrupt_from_idle() which checks for the
dynticks_nesting counter. If the counter is 0, then the CPU was idle at the time
of the check. This is how rcu_check_callbacks knows that the CPU was idle.

2. By checking the evenness of the dynticks counter. If it's even, we were idle
(or perhaps in usermode, but I think that extra inference doesn't hurt). This
is done in rcu_dynticks_curr_cpu_in_eqs.

So basically, there are 2 different counters that seem to serve the same
purpose as far as determining if we're in an idle EQS state goes. Right?

Then I was trying to see why we can't just use method 2 in
rcu_check_callbacks to determine if the "timer interrupt was taken while the
CPU was idle".  rcu_check_callbacks could simply call
rcu_dynticks_curr_cpu_in_eqs() from rcu_check_callbacks(). I was trying to
convince myself why that wouldn't work.

I concluded that that wouldn't work because the timer interrupt that led to
the rcu_check_callbacks() call would have tainted the dynticks counter
because it would have called rcu_nmi_enter() during interrupt entry. So
there's no way to know if the CPU was really idle at the time of the
interrupt if we were to rely on rcu_dynticks_curr_cpu_in_eqs for that. Hence
we would need to rely on method 1 for the "did I take an interrupt while I
was idle" check in rcu_check_callbacks(), which uses the dynticks_nesting
counter for this determination. Does that make sense?

> 
> 	When dynticks_nesting is decremented to 0 (the outermost
> 	process-context nesting level exit causes an eqs-entry), the
> 	dynticks_nmi_nesting is reset to
> 
> I think you want "0." at the end of this sentence.  Or maybe my browser
> is messing things up.

Yes, the "0." was on the next line, but I moved it back to the previous line
so it's easier to read. Thanks for letting me know.

Thanks!

 - Joel
Steven Rostedt June 25, 2018, 8:25 p.m. UTC | #30
On Mon, 25 Jun 2018 09:39:51 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> For whatever its worth, I made some notes of what I understood from reading
> the code and old posts because I was sure I would otherwise forget
> everything:
> http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html

Nice write up. I may point some people to this ;-)

Anyway "complications due to nested NMIs (yes NMIs can nest!)"

What arch allows for NMIs to nest. Because we don't let that happen on
x86, and there's code that I know of that is called by NMIs that is not
re-entrant, and can crash if we allow for NMIs to nest. For example
"in_nmi()" will not show that we are in_nmi() if we allow for nesting
of NMIs. It has a single bit that gets incremented when we enter NMI
code, and cleared when we leave it.

-- Steve
Paul E. McKenney June 25, 2018, 8:47 p.m. UTC | #31
On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> On Mon, 25 Jun 2018 09:39:51 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > For whatever its worth, I made some notes of what I understood from reading
> > the code and old posts because I was sure I would otherwise forget
> > everything:
> > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> 
> Nice write up. I may point some people to this ;-)
> 
> Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> 
> What arch allows for NMIs to nest. Because we don't let that happen on
> x86, and there's code that I know of that is called by NMIs that is not
> re-entrant, and can crash if we allow for NMIs to nest. For example
> "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> of NMIs. It has a single bit that gets incremented when we enter NMI
> code, and cleared when we leave it.

Last I checked with Andy Lutomirski, there are a number of things that,
though not NMIs, act like NMIs and that can interrupt each other's
handlers.  This is on x86.

							Thanx, Paul
Andy Lutomirski June 25, 2018, 8:47 p.m. UTC | #32
On Mon, Jun 25, 2018 at 1:45 PM Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> > On Mon, 25 Jun 2018 09:39:51 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > > For whatever its worth, I made some notes of what I understood from reading
> > > the code and old posts because I was sure I would otherwise forget
> > > everything:
> > > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> >
> > Nice write up. I may point some people to this ;-)
> >
> > Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> >
> > What arch allows for NMIs to nest. Because we don't let that happen on
> > x86, and there's code that I know of that is called by NMIs that is not
> > re-entrant, and can crash if we allow for NMIs to nest. For example
> > "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> > of NMIs. It has a single bit that gets incremented when we enter NMI
> > code, and cleared when we leave it.
>
> Last I checked with Andy Lutomirski, there are a number of things that,
> though not NMIs, act like NMIs and that can interrupt each others'
> handlers.  This is on x86.
>

As a straightforward example, NMI and MCE can nest inside each other.
IIRC we treat #DB somewhat NMI-ish-ly as well.

--Andy


>                                                         Thanx, Paul
>
Joel Fernandes June 25, 2018, 9:25 p.m. UTC | #33
On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> On Mon, 25 Jun 2018 09:39:51 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > For whatever its worth, I made some notes of what I understood from reading
> > the code and old posts because I was sure I would otherwise forget
> > everything:
> > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> 
> Nice write up. I may point some people to this ;-)

Thanks! :)

> Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> 
> What arch allows for NMIs to nest. Because we don't let that happen on
> x86, and there's code that I know of that is called by NMIs that is not
> re-entrant, and can crash if we allow for NMIs to nest. For example
> "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> of NMIs. It has a single bit that gets incremented when we enter NMI
> code, and cleared when we leave it.

Great point. Andy mentioned something about MCEs in the other thread, I'll
let him clarify how such nesting is safe in light of what you mentioned. ;-)

Thanks,

- Joel
Steven Rostedt June 25, 2018, 10:15 p.m. UTC | #34
On Mon, 25 Jun 2018 13:47:08 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> > On Mon, 25 Jun 2018 09:39:51 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >   
> > > For whatever its worth, I made some notes of what I understood from reading
> > > the code and old posts because I was sure I would otherwise forget
> > > everything:
> > > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html  
> > 
> > Nice write up. I may point some people to this ;-)
> > 
> > Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> > 
> > What arch allows for NMIs to nest. Because we don't let that happen on
> > x86, and there's code that I know of that is called by NMIs that is not
> > re-entrant, and can crash if we allow for NMIs to nest. For example
> > "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> > of NMIs. It has a single bit that gets incremented when we enter NMI
> > code, and cleared when we leave it.  
> 
> Last I checked with Andy Lutomirski, there are a number of things that,
> though not NMIs, act like NMIs and that can interrupt each others'
> handlers.  This is on x86.
>

Perhaps things like MCEs, but they don't call nmi_enter(). And usually
when something does, it probably puts the machine into an unstable
state. Getting RCU right may be the least of the worries.

You may want to ask Andy if there are legitimate interruptions of NMIs
that don't mean "please reboot as soon as possible"?

-- Steve
Steven Rostedt June 25, 2018, 10:16 p.m. UTC | #35
On Mon, 25 Jun 2018 13:47:45 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> 
> As a straightforward example, NMI and MCE can nest inside each other.

But I'm not sure how stable the machine becomes when MCEs start
interrupting NMIs. Also, can MCE callbacks use RCU?

> IIRC we treat #DB somewhat NMI-ish-ly as well.

And #DB is something I think we finally said shouldn't be done in NMIs,
IIRC.

-- Steve
Andy Lutomirski June 25, 2018, 11:30 p.m. UTC | #36
On Mon, Jun 25, 2018 at 3:16 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 25 Jun 2018 13:47:45 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
>
> >
> > As a straightforward example, NMI and MCE can nest inside each other.
>
> But I'm not sure how stable the machine becomes when MCE's start
> interrupting NMIs. Also, can MCE callbacks use RCU?
>
> > IIRC we treat #DB somewhat NMI-ish-ly as well.
>
> And #DB is something I think we finally said shouldn't be done in NMIs,
> IIRC.
>

No one ever applied the patch to make it so, so we still support it.
Andy Lutomirski June 25, 2018, 11:32 p.m. UTC | #37
On Mon, Jun 25, 2018 at 3:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 25 Jun 2018 13:47:08 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> > > On Mon, 25 Jun 2018 09:39:51 -0700
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > > For whatever its worth, I made some notes of what I understood from reading
> > > > the code and old posts because I was sure I would otherwise forget
> > > > everything:
> > > > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> > >
> > > Nice write up. I may point some people to this ;-)
> > >
> > > Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> > >
> > > What arch allows for NMIs to nest. Because we don't let that happen on
> > > x86, and there's code that I know of that is called by NMIs that is not
> > > re-entrant, and can crash if we allow for NMIs to nest. For example
> > > "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> > > of NMIs. It has a single bit that gets incremented when we enter NMI
> > > code, and cleared when we leave it.
> >
> > Last I checked with Andy Lutomirski, there are a number of things that,
> > though not NMIs, act like NMIs and that can interrupt each others'
> > handlers.  This is on x86.
> >
>
> Perhaps things like MCEs, but they don't call nmi_enter(). And usually
> when something does, it probably puts the machine into an unstable
> state. Getting RCU right may be the least of the worries.
>
> You may want to ask Andy whether there are legitimate interruptions of
> NMIs that don't mean "please reboot as soon as possible".
>

Yes, sadly.  CPU A gets an NMI.  While processing it, CPU B has user
code access a faulty NVDIMM address, causing CPU B to generate a
machine check.  CPU A also gets a machine check because someone at
Intel thought it was a good idea.  CPU A will mostly ignore the
machine check, but it still happens.

I think it's reasonable to say that nmi_enter() won't nest, but I
don't see how we can avoid rcu_nmi_enter() nesting.
Paul E. McKenney June 28, 2018, 8:02 p.m. UTC | #38
On Sat, Jun 23, 2018 at 10:53:56AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 22, 2018 at 03:03:35PM -0700, Andy Lutomirski wrote:
> > On Fri, Jun 22, 2018 at 2:14 PM Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > On Fri, Jun 22, 2018 at 05:00:42PM -0400, Steven Rostedt wrote:
> > > > On Fri, 22 Jun 2018 13:58:13 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > >
> > > > > Something like this:
> > > > >
> > > > >     IRQ entered
> > > > >
> > > > > And never exited.  Ever.  I actually saw this in 2011.
> > > >
> > > > I still believe this was actually a bug. And perhaps you made the RCU
> > > > code robust enough to handle this bug ;-)
> > >
> > > Welcome to my world!
> > >
> > > But I recall it being used in several places, so if it was a bug, it
> > > was an intentional bug.  Probably the worst kind.
> > >
> > > Sort of like nested NMIs and interrupts within NMI handlers.  ;-)
> > >
> > > > > Or something like this:
> > > > >
> > > > >     IRQ exited
> > > > >
> > > > > Without a corresponding IRQ enter.
> > > > >
> > > > > The current code handles both of these situations, at least assuming
> > > > > that the interrupt entry/exit happens during a non-idle period.
> > > > >
> > > > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > > > take what appear to RCU to be normal interrupts...
> > > > > > >
> > > > > > > (And I just added that fun fact to Requirements.html.)
> > > > > >
> > > > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > > > thanks for referring me to it.
> > > > >
> > > > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > > >
> > > > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > > > don't really see any rationale for it to be otherwise. If there is a
> > > > case, perhaps it needs to be fixed.
> > >
> > > Given that the usermode helpers now look to be common code using
> > > workqueues, kthreads, and calls to do_execve(), it might well be that
> > > the days of half-interrupts are behind us.
> > >
> > > But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> > > and waiting a few years still stands, but perhaps you have a better
> > > approach.
> > 
> > I think you should add a WARN_ON_ONCE().  Let's get the bugs fixed.
> 
> Or the obscure features identified, as the case may be.  ;-)
> 
> Either way, will do!

And here is a prototype patch.

							Thanx, Paul

------------------------------------------------------------------------

commit ef544593a7bcad74628fa0537badc49dce1f2d95
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Jun 28 12:45:23 2018 -0700

    rcu: Add warning to detect half-interrupts
    
    RCU's dyntick-idle code is written to tolerate half-interrupts, that is,
    either an interrupt that invokes rcu_irq_enter() but never invokes the
    corresponding rcu_irq_exit() on the one hand, or an interrupt that never
    invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
    on the other.  These things really did happen at one time, as evidenced
    by this ca-2011 LKML post:
    
    http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
    
    The reason why RCU tolerates half-interrupts is that usermode helpers
    used exceptions to invoke a system call from within the kernel such that
    the system call did a normal return (not a return from exception) to
    the calling context.  This caused rcu_irq_enter() to be invoked without
    a matching rcu_irq_exit().  However, usermode helpers have since been
    rewritten to make much more housebroken use of workqueues, kernel threads,
    and do_execve(), and therefore should no longer produce half-interrupts.
    No one knows of any other source of half-interrupts, but then again,
    no one seems insane enough to go audit the entire kernel to verify that
    half-interrupts really are a relic of the past.
    
    This commit therefore adds a pair of WARN_ON_ONCE() calls that will
    trigger in the presence of half interrupts, which the code will continue
    to handle correctly.  If neither of these WARN_ON_ONCE() calls triggers
    by mid-2021, then perhaps RCU can stop handling half-interrupts, which
    would be a considerable simplification.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Reported-by: Joel Fernandes <joel@joelfernandes.org>
    Reported-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6c5a7f0daadc..37ae0d77854d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -714,6 +714,7 @@ static void rcu_eqs_enter(bool user)
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
+	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting != DYNTICK_IRQ_NONIDLE);
 	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     rdtp->dynticks_nesting == 0);
@@ -895,6 +896,7 @@ static void rcu_eqs_exit(bool user)
 	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
 	WRITE_ONCE(rdtp->dynticks_nesting, 1);
+	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting);
 	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
 }
Joel Fernandes June 28, 2018, 9:13 p.m. UTC | #39
On Thu, Jun 28, 2018 at 01:02:05PM -0700, Paul E. McKenney wrote:
> > > <paulmck@linux.vnet.ibm.com> wrote:
[..]
> > > > > > > > So why this function-call structure?  Well, you see, NMI handlers can
> > > > > > > > take what appear to RCU to be normal interrupts...
> > > > > > > >
> > > > > > > > (And I just added that fun fact to Requirements.html.)
> > > > > > >
> > > > > > > Yes, I'll definitely go through all the interrupt requirements in the doc and
> > > > > > > thanks for referring me to it.
> > > > > >
> > > > > > My concern may well be obsolete.  It would be good if it was!  ;-)
> > > > >
> > > > > I'd love to mandate that irq_enter() must be paired with irq_exit(). I
> > > > > don't really see any rationale for it to be otherwise. If there is a
> > > > > case, perhaps it needs to be fixed.
> > > >
> > > > Given that the usermode helpers now look to be common code using
> > > > workqueues, kthreads, and calls to do_execve(), it might well be that
> > > > the days of half-interrupts are behind us.
> > > >
> > > > But how to actually validate this?  My offer of adding a WARN_ON_ONCE()
> > > > and waiting a few years still stands, but perhaps you have a better
> > > > approach.
> > > 
> > > I think you should add a WARN_ON_ONCE().  Let's get the bugs fixed.
> > 
> > Or the obscure features identified, as the case may be.  ;-)
> > 
> > Either way, will do!
> 
> And here is a prototype patch.
> 
[..]

Looks good to me!

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

- Joel
Paul E. McKenney June 28, 2018, 9:41 p.m. UTC | #40
On Thu, Jun 28, 2018 at 02:13:15PM -0700, Joel Fernandes wrote:
> On Thu, Jun 28, 2018 at 01:02:05PM -0700, Paul E. McKenney wrote:
> [..]
> 
> Looks good to me!
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Applied, thank you!!!

							Thanx, Paul

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 59ae94e..61f203a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -260,7 +260,6 @@  void rcu_bh_qs(void)
 
 static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = 1,
-	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
 	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
 };
 
@@ -694,10 +693,6 @@  static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
 /*
  * Enter an RCU extended quiescent state, which can be either the
  * idle loop or adaptive-tickless usermode execution.
- *
- * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
- * the possibility of usermode upcalls having messed up our count
- * of interrupt nesting level during the prior busy period.
  */
 static void rcu_eqs_enter(bool user)
 {
@@ -706,11 +701,11 @@  static void rcu_eqs_enter(bool user)
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     rdtp->dynticks_nesting == 0);
 	if (rdtp->dynticks_nesting != 1) {
-		rdtp->dynticks_nesting--;
+		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
+			   rdtp->dynticks_nesting - 1);
 		return;
 	}
 
@@ -767,7 +762,7 @@  void rcu_user_enter(void)
  * rcu_nmi_exit - inform RCU of exit from NMI context
  *
  * If we are returning from the outermost NMI handler that interrupted an
- * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
+ * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nesting
  * to let the RCU grace-period handling know that the CPU is back to
  * being RCU-idle.
  *
@@ -779,21 +774,21 @@  void rcu_nmi_exit(void)
 	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
 	/*
-	 * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
+	 * Check for ->dynticks_nesting underflow and bad ->dynticks.
 	 * (We are exiting an NMI handler, so RCU better be paying attention
 	 * to us!)
 	 */
-	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
+	WARN_ON_ONCE(rdtp->dynticks_nesting <= 0);
 	WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
 
 	/*
 	 * If the nesting level is not 1, the CPU wasn't RCU-idle, so
 	 * leave it in non-RCU-idle state.
 	 */
-	if (rdtp->dynticks_nmi_nesting != 1) {
-		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nmi_nesting, rdtp->dynticks_nmi_nesting - 2, rdtp->dynticks);
-		WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* No store tearing. */
-			   rdtp->dynticks_nmi_nesting - 2);
+	if (rdtp->dynticks_nesting != 1) {
+		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1, rdtp->dynticks);
+		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
+			   rdtp->dynticks_nesting - 1);
 		return;
 	}
 
@@ -803,8 +798,8 @@  void rcu_nmi_exit(void)
 	}
 
 	/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
-	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
-	WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
+	trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nesting, 0, rdtp->dynticks);
+	WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid store tearing. */
 	rcu_dynticks_eqs_enter();
 }
 
@@ -851,10 +846,6 @@  void rcu_irq_exit_irqson(void)
 /*
  * Exit an RCU extended quiescent state, which can be either the
  * idle loop or adaptive-tickless usermode execution.
- *
- * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
- * allow for the possibility of usermode upcalls messing up our count of
- * interrupt nesting level during the busy period that is just now starting.
  */
 static void rcu_eqs_exit(bool user)
 {
@@ -866,7 +857,8 @@  static void rcu_eqs_exit(bool user)
 	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
 	if (oldval) {
-		rdtp->dynticks_nesting++;
+		WRITE_ONCE(rdtp->dynticks_nesting, /* No store tearing. */
+			   rdtp->dynticks_nesting + 1);
 		return;
 	}
 	rcu_dynticks_task_exit();
@@ -875,7 +867,6 @@  static void rcu_eqs_exit(bool user)
 	trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
 	WRITE_ONCE(rdtp->dynticks_nesting, 1);
-	WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
 }
 
 /**
@@ -915,11 +906,11 @@  void rcu_user_exit(void)
 /**
  * rcu_nmi_enter - inform RCU of entry to NMI context
  *
- * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
- * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
- * that the CPU is active.  This implementation permits nested NMIs, as
- * long as the nesting level does not overflow an int.  (You will probably
- * run out of stack space first.)
+ * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks. And
+ * then update rdtp->dynticks_nesting to let the RCU grace-period handling
+ * know that the CPU is active.  This implementation permits nested NMIs,
+ * as long as the nesting level does not overflow an int.  (You will
+ * probably run out of stack space first.)
  *
  * If you add or remove a call to rcu_nmi_enter(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
@@ -927,33 +918,29 @@  void rcu_user_exit(void)
 void rcu_nmi_enter(void)
 {
 	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
-	long incby = 2;
 
 	/* Complain about underflow. */
-	WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
+	WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
 
-	/*
-	 * If idle from RCU viewpoint, atomically increment ->dynticks
-	 * to mark non-idle and increment ->dynticks_nmi_nesting by one.
-	 * Otherwise, increment ->dynticks_nmi_nesting by two.  This means
-	 * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
-	 * to be in the outermost NMI handler that interrupted an RCU-idle
-	 * period (observation due to Andy Lutomirski).
-	 */
 	if (rcu_dynticks_curr_cpu_in_eqs()) {
 		rcu_dynticks_eqs_exit();
-		incby = 1;
 
 		if (!in_nmi()) {
 			rcu_dynticks_task_exit();
 			rcu_cleanup_after_idle();
 		}
 	}
-	trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
-			  rdtp->dynticks_nmi_nesting,
-			  rdtp->dynticks_nmi_nesting + incby, rdtp->dynticks);
-	WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* Prevent store tearing. */
-		   rdtp->dynticks_nmi_nesting + incby);
+
+	trace_rcu_dyntick(rdtp->dynticks_nesting ? TPS("++=") : TPS("Endirq"),
+			  rdtp->dynticks_nesting,
+			  rdtp->dynticks_nesting + 1, rdtp->dynticks);
+	/*
+	 * If ->dynticks_nesting is equal to one on rcu_nmi_exit(), we are
+	 * guaranteed to be in the outermost NMI handler that interrupted
+	 * an RCU-idle period (observation due to Andy Lutomirski).
+	 */
+	WRITE_ONCE(rdtp->dynticks_nesting, /* Prevent store tearing. */
+		   rdtp->dynticks_nesting + 1);
 	barrier();
 }
 
@@ -1089,8 +1076,7 @@  bool rcu_lockdep_current_cpu_online(void)
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
-	       __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
+	return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 1;
 }
 
 /*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4e74df7..071afe4 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -38,8 +38,8 @@ 
  * Dynticks per-CPU state.
  */
 struct rcu_dynticks {
-	long dynticks_nesting;      /* Track process nesting level. */
-	long dynticks_nmi_nesting;  /* Track irq/NMI nesting level. */
+	long dynticks_nesting __attribute__((aligned(sizeof(long))));
+				    /* Track process nesting level. */
 	atomic_t dynticks;	    /* Even value for idle, else odd. */
 	bool rcu_need_heavy_qs;     /* GP old, need heavy quiescent state. */
 	unsigned long rcu_qs_ctr;   /* Light universal quiescent state ctr. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c1b17f5..0c57e50 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1801,7 +1801,7 @@  static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
 	}
 	print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
 	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
-	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n",
+	pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n",
 	       cpu,
 	       "O."[!!cpu_online(cpu)],
 	       "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
@@ -1811,7 +1811,7 @@  static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
 				"!."[!delta],
 	       ticks_value, ticks_title,
 	       rcu_dynticks_snap(rdtp) & 0xfff,
-	       rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting,
+	       rdtp->dynticks_nesting,
 	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
 	       READ_ONCE(rsp->n_force_qs) - rsp->n_force_qs_gpstart,
 	       fast_no_hz);