All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Binoy Jayan <binoy.jayan@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>,
	linaro-kernel@lists.linaro.org,
	Daniel Wagner <daniel.wagner@bmw-carit.de>,
	Carsten Emde <C.Emde@osadl.org>,
	linux-kernel@vger.kernel.org,
	"Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Subject: Re: [PATCH v2 1/3] tracing: Deference pointers without RCU checks
Date: Fri, 26 Aug 2016 10:49:13 +0900	[thread overview]
Message-ID: <20160826104913.71b0200cde155220cf5d5c00@kernel.org> (raw)
In-Reply-To: <1472037450-8907-2-git-send-email-binoy.jayan@linaro.org>

On Wed, 24 Aug 2016 16:47:28 +0530
Binoy Jayan <binoy.jayan@linaro.org> wrote:

> From: Daniel Wagner <daniel.wagner@bmw-carit.de>
> 
> The tracepoint can't be used in code section where we are in the
> middle of a state transition.
> 
> For example if we place a tracepoint inside start/stop_critical_section(),
> lockdep complains with
> 
> [    0.035589] WARNING: CPU: 0 PID: 3 at kernel/locking/lockdep.c:3560 \
> check_flags.part.36+0x1bc/0x210() [    0.036000] \
> DEBUG_LOCKS_WARN_ON(current->softirqs_enabled) [    0.036000] Kernel panic - not \
> syncing: panic_on_warn set ... [    0.036000]
> [    0.036000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc7+ #460
> [    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS \
> 1.7.5-20140709_153950- 04/01/2014 [    0.036000]  ffffffff81f2463a ffff88007c93bb98 \
> ffffffff81afb317 0000000000000001 [    0.036000]  ffffffff81f212b3 ffff88007c93bc18 \
> ffffffff81af7bc2 ffff88007c93bbb8 [    0.036000]  ffffffff00000008 ffff88007c93bc28 \
> ffff88007c93bbc8 000000000093bbd8 [    0.036000] Call Trace:
> [    0.036000]  [<ffffffff81afb317>] dump_stack+0x4f/0x7b
> [    0.036000]  [<ffffffff81af7bc2>] panic+0xc0/0x1e9
> [    0.036000]  [<ffffffff81b07188>] ? _raw_spin_unlock_irqrestore+0x38/0x80
> [    0.036000]  [<ffffffff81052a60>] warn_slowpath_common+0xc0/0xc0
> [    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff81052aa6>] warn_slowpath_fmt+0x46/0x50
> [    0.036000]  [<ffffffff810a468c>] check_flags.part.36+0x1bc/0x210
> [    0.036000]  [<ffffffff810a5168>] lock_is_held+0x78/0x90
> [    0.036000]  [<ffffffff81057116>] ? __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff810588ab>] ? __do_softirq+0x3db/0x500
> [    0.036000]  [<ffffffff81134495>] trace_preempt_on+0x255/0x260
> [    0.036000]  [<ffffffff81081c0b>] preempt_count_sub+0xab/0xf0
> [    0.036000]  [<ffffffff81057116>] __local_bh_enable+0x36/0x70
> [    0.036000]  [<ffffffff810588ab>] __do_softirq+0x3db/0x500
> [    0.036000]  [<ffffffff810589ef>] run_ksoftirqd+0x1f/0x60
> [    0.036000]  [<ffffffff81079653>] smpboot_thread_fn+0x193/0x2a0
> [    0.036000]  [<ffffffff810794c0>] ? SyS_setgroups+0x150/0x150
> [    0.036000]  [<ffffffff81075312>] kthread+0xf2/0x110
> [    0.036000]  [<ffffffff81b026b3>] ? wait_for_completion+0xc3/0x120
> [    0.036000]  [<ffffffff81081c0b>] ? preempt_count_sub+0xab/0xf0
> [    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
> [    0.036000]  [<ffffffff81b07cc8>] ret_from_fork+0x58/0x90
> [    0.036000]  [<ffffffff81075220>] ? kthread_create_on_node+0x240/0x240
> [    0.036000] ---[ end Kernel panic - not syncing: panic_on_warn set ...
> 
> PeterZ was so kind to explain it to me what is happening:
> 
> "__local_bh_enable() tests if this is the last SOFTIRQ_OFFSET, if so it
> tells lockdep softirqs are enabled with trace_softirqs_on() after that
> we go an actually modify the preempt_count with preempt_count_sub().
> Then in preempt_count_sub() you call into trace_preempt_on() if this
> was the last preempt_count increment but you do that _before_ you
> actually change the preempt_count with __preempt_count_sub() at this
> point lockdep and preempt_count think the world differs and *boom*"
> 
> So the simplest way to avoid this is by disabling the consistency
> checks.
> 
> We also need to take care of the iterating in trace_events_trigger.c
> to avoid a splatter in conjunction with the hist trigger.

Special care for lockdep inside tracepoint handler is reasonable.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Steven, since this seems a bugfix, could you pick this from the series?

Thank you,

> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> ---
>  include/linux/rculist.h             | 36 ++++++++++++++++++++++++++++++++++++
>  include/linux/tracepoint.h          |  4 ++--
>  kernel/trace/trace_events_trigger.c |  6 +++---
>  3 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 8beb98d..bee836b 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -279,6 +279,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>  	container_of(lockless_dereference(ptr), type, member)
>  
>  /**
> + * list_entry_rcu_notrace - get the struct for this entry (for tracing)
> + * @ptr:        the &struct list_head pointer.
> + * @type:       the type of the struct this is embedded in.
> + * @member:     the name of the list_head within the struct.
> + *
> + * This primitive may safely run concurrently with the _rcu list-mutation
> + * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> + *
> + * This is the same as list_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_entry_rcu_notrace(ptr, type, member) \
> +({ \
> +	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> +	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), type, member); \
> +})
> +
> +/**
>   * Where are list_empty_rcu() and list_first_entry_rcu()?
>   *
>   * Implementing those functions following their counterparts list_empty() and
> @@ -391,6 +409,24 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
>  	     pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_rcu_notrace	-	iterate over rcu list of given type (for tracing)
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_head within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as list_add_rcu()
> + * as long as the traversal is guarded by rcu_read_lock().
> + *
> + * This is the same as list_for_each_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_for_each_entry_rcu_notrace(pos, head, member) \
> +	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
> +		&pos->member != (head); \
> +		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given type
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index be586c6..d0e2a82 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -138,7 +138,7 @@ extern void syscall_unregfunc(void);
>  			return;						\
>  		prercu;							\
>  		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +		it_func_ptr = rcu_dereference_raw_notrace((tp)->funcs);	\
>  		if (it_func_ptr) {					\
>  			do {						\
>  				it_func = (it_func_ptr)->func;		\
> @@ -189,7 +189,7 @@ extern void syscall_unregfunc(void);
>  				TP_CONDITION(cond),,);			\
>  		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
>  			rcu_read_lock_sched_notrace();			\
> -			rcu_dereference_sched(__tracepoint_##name.funcs);\
> +			rcu_dereference_raw_notrace(__tracepoint_##name.funcs);\
>  			rcu_read_unlock_sched_notrace();		\
>  		}							\
>  	}								\
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index a975571..4489fb4 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -71,14 +71,14 @@ event_triggers_call(struct trace_event_file *file, void *rec)
>  	if (list_empty(&file->triggers))
>  		return tt;
>  
> -	list_for_each_entry_rcu(data, &file->triggers, list) {
> +	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
>  		if (data->paused)
>  			continue;
>  		if (!rec) {
>  			data->ops->func(data, rec);
>  			continue;
>  		}
> -		filter = rcu_dereference_sched(data->filter);
> +		filter = rcu_dereference_raw_notrace(data->filter);
>  		if (filter && !filter_match_preds(filter, rec))
>  			continue;
>  		if (event_command_post_trigger(data->cmd_ops)) {
> @@ -111,7 +111,7 @@ event_triggers_post_call(struct trace_event_file *file,
>  {
>  	struct event_trigger_data *data;
>  
> -	list_for_each_entry_rcu(data, &file->triggers, list) {
> +	list_for_each_entry_rcu_notrace(data, &file->triggers, list) {
>  		if (data->paused)
>  			continue;
>  		if (data->cmd_ops->trigger_type & tt)
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

  reply	other threads:[~2016-08-26  1:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-24 11:17 [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Binoy Jayan
2016-08-24 11:17 ` [PATCH v2 1/3] tracing: Deference pointers without RCU checks Binoy Jayan
2016-08-26  1:49   ` Masami Hiramatsu [this message]
2016-08-26  5:26     ` Binoy Jayan
2016-08-28 23:50       ` Masami Hiramatsu
2016-08-24 11:17 ` [PATCH v2 2/3] tracing: Add trace_irqsoff tracepoints Binoy Jayan
2016-08-25  6:35   ` kbuild test robot
2016-08-25  6:35   ` kbuild test robot
2016-08-24 11:17 ` [PATCH v2 3/3] tracing: Histogram for missed timer offsets Binoy Jayan
2016-08-25  5:26 ` [PATCH v2 0/3] *** Latency histograms - IRQSOFF,PREEMPTOFF *** Daniel Wagner
2016-08-25  6:33   ` Binoy Jayan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160826104913.71b0200cde155220cf5d5c00@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=C.Emde@osadl.org \
    --cc=arnd@arndb.de \
    --cc=binoy.jayan@linaro.org \
    --cc=daniel.wagner@bmw-carit.de \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.