From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rostedt <rostedt@goodmis.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Gustavo A. R. Silva" <gustavo@embeddedor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck <paulmck@kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>
Subject: Re: [PATCH v2] tracing/perf: Move rcu_irq_enter/exit_irqson() to perf trace point hook
Date: Tue, 11 Feb 2020 10:34:38 -0500 (EST)
Message-ID: <504801580.617591.1581435278202.JavaMail.zimbra@efficios.com>
In-Reply-To: <20200211095047.58ddf750@gandalf.local.home>

----- On Feb 11, 2020, at 9:50 AM, rostedt rostedt@goodmis.org wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Commit e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use
> SRCU") removed the calls to rcu_irq_enter/exit_irqson() and replaced it with
> srcu callbacks as that much faster for the rcuidle cases. But this caused an
> issue with perf, because perf only uses rcu to synchronize its trace point
> callback routines.
> 
> The issue was that if perf traced one of the "rcuidle" paths, that path no
> longer enabled RCU if it was not watching, and this caused lockdep to
> complain when the perf code used rcu_read_lock() and RCU was not "watching".
> 
> Commit 865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson() for
> rcuidle tracepoints") added back the rcu_irq_enter/exit_irqson() code, but
> this made the srcu changes no longer applicable.
> 
> As perf is the only callback that needs the heavier weight
> "rcu_irq_enter/exit_irqson()" calls, move it to the perf specific code and
> not bog down those that do not require it.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> Changes since v1:
> 
>  - Moved the rcu_is_watching logic to perf_tp_event() and remove the
>    exporting of rcu_irq_enter/exit_irqson().
> 
> include/linux/tracepoint.h |  8 ++------
> kernel/events/core.c       | 17 ++++++++++++++++-
> 2 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 1fb11daa5c53..a83fd076a312 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 		 * For rcuidle callers, use srcu since sched-rcu	\
> 		 * doesn't work from the idle path.			\
> 		 */							\
> -		if (rcuidle) {						\
> +		if (rcuidle)						\
> 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqson();				\
> -		}							\
> 									\
> 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> 									\
> @@ -194,10 +192,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> 			} while ((++it_func_ptr)->func);		\
> 		}							\
> 									\
> -		if (rcuidle) {						\
> -			rcu_irq_exit_irqson();				\
> +		if (rcuidle)						\
> 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
> -		}							\
> 									\
> 		preempt_enable_notrace();				\
> 	} while (0)
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 455451d24b4a..0abbf5e2ee62 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8941,6 +8941,7 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
> {
> 	struct perf_sample_data data;
> 	struct perf_event *event;
> +	bool rcu_watching = rcu_is_watching();
> 
> 	struct perf_raw_record raw = {
> 		.frag = {
> @@ -8949,6 +8950,17 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
> 		},
> 	};
> 
> +	if (!rcu_watching) {
> +		/*
> +		 * If nmi_enter() is traced, it is possible that
> +		 * RCU may not be watching "yet", and this is called.
> +		 * We can not call rcu_irq_enter_irqson() in this case.
> +		 */
> +		if (unlikely(in_nmi()))
> +			goto out;
> +		rcu_irq_enter_irqson();
> +	}
> +
> 	perf_sample_data_init(&data, 0, 0);
> 	data.raw = &raw;

I'm puzzled by this function. It does:

perf_tp_event(...)
{
     hlist_for_each_entry_rcu(event, head, hlist_entry) {
         ...
     }
     if (task && task != current) {
         rcu_read_lock();
         ... = rcu_dereference();
         list_for_each_entry_rcu(...) {
             ....
         }
         rcu_read_unlock();
     }
}

What is the purpose of the rcu_read_lock/unlock within the if (),
considering that there is already an hlist RCU iteration just before?
The function seems to assume that an RCU read-side critical section of
some kind is already ongoing.
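
To make the question concrete, here is a toy user-space model of the
shape above. It is only a sketch: every model_* name and both counters
are hypothetical and are not kernel APIs. It assumes the caller is the
tracepoint macro from the quoted diff, which ends with
preempt_enable_notrace() and therefore runs callbacks with preemption
disabled. The model counts list traversals that run with no reader
protection of any kind.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these counters stand in for kernel state. */
static int preempt_depth;	/* models preempt_disable_notrace() nesting */
static int rcu_depth;		/* models rcu_read_lock() nesting */

static void model_preempt_disable(void) { preempt_depth++; }
static void model_preempt_enable(void)  { assert(preempt_depth > 0); preempt_depth--; }
static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* True if some kind of read-side protection is in effect. */
static bool model_reader_protected(void)
{
	return preempt_depth > 0 || rcu_depth > 0;
}

/*
 * Same shape as the perf_tp_event() outline: an RCU list walk with no
 * explicit lock, then a second walk under an explicit lock/unlock pair.
 * Returns the number of walks that ran without any protection.
 */
static int model_perf_tp_event(bool other_task)
{
	int unprotected = 0;

	/* models hlist_for_each_entry_rcu(): nothing is taken here */
	if (!model_reader_protected())
		unprotected++;

	if (other_task) {
		model_rcu_read_lock();
		/* models list_for_each_entry_rcu() inside the if () */
		if (!model_reader_protected())
			unprotected++;
		model_rcu_read_unlock();
	}

	return unprotected;
}

/* Models the tracepoint macro bracketing the callback. */
static int model_tracepoint_call(bool other_task)
{
	int unprotected;

	model_preempt_disable();
	unprotected = model_perf_tp_event(other_task);
	model_preempt_enable();

	return unprotected;
}
```

In this model, model_tracepoint_call() never reports an unprotected
walk, while calling model_perf_tp_event() directly reports one for the
first loop. The first loop is only safe because of what the caller
already holds, which is the assumption I am asking about.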

Thanks,

Mathieu


> 
> @@ -8985,8 +8997,11 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
> unlock:
> 		rcu_read_unlock();
> 	}
> -
> +	if (!rcu_watching)
> +		rcu_irq_exit_irqson();
> +out:
> 	perf_swevent_put_recursion_context(rctx);
> +
> }
> EXPORT_SYMBOL_GPL(perf_tp_event);
> 
> --
> 2.20.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

