From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Jim Keniston <jkenisto@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-mm <linux-mm@kvack.org>, Andi Kleen <andi@firstfloor.org>,
	Christoph Hellwig <hch@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Anton Arapov <anton@redhat.com>
Subject: Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
Date: Sun, 15 Apr 2012 23:48:33 +0200
Message-ID: <1334526513.28150.23.camel@twins>
In-Reply-To: <20120415195351.GA22095@redhat.com>

On Sun, 2012-04-15 at 21:53 +0200, Oleg Nesterov wrote:
> On 04/15, Peter Zijlstra wrote:
> >
> > On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > > > >     - can it work or I missed something "in general" ?
> > > >
> > > > So we insert in the rb-tree before we take mmap_sem, this means we can
> > > > hit a non-uprobe int3 and still find a uprobe there, no?
> > >
> > > Yes, but unless I miss something this is "off-topic", this
> > > can happen with or without these changes. If find_uprobe()
> > > succeeds we assume that this bp was inserted by uprobe.
> >
> > OK, but then I completely missed what the point of that
> > down_write() stuff is..
> 
> To ensure handle_swbp() can't race with unregister + register
> and send the wrong SIGTRAP.
> 
> handle_swbp() roughly does the following under down_read(mmap_sem):
> 
> 
> 	if (find_uprobe(vaddr))
> 		process_uprobe();
> 	else
> 	if (is_swbp_at_addr_fast(vaddr))	// non-uprobe int3
> 		send_sig(SIGTRAP);
> 	else
> 		restart_insn(vaddr);		// raced with unregister
> 
> 
> note that is_swbp_at_addr_fast() is used (currently) to detect
> the race with uprobe_unregister() and that is why we can remove
> uprobes_srcu.
> 
> But if find_uprobe() fails, there is a window before
> is_swbp_at_addr_fast() reads the memory. Suppose that the next
> uprobe_register() inserts the new uprobe at the same address.
> In this case the task will be wrongly killed.

OK, I'm still not seeing how your proposal could work. Consider the patch
comment below: I'm not seeing how is_swbp_at_addr_fast() deals with an
in-progress INT3 while we remove the probe.

By ruling out a race with reg/unreg, it will either find the uprobe
(no problem), or find neither the uprobe nor a breakpoint instruction,
even though the pending breakpoint was generated by a uprobe (which is
now gone), causing a false positive SIGTRAP.

Or am I still not getting it?

---
 kernel/events/uprobes.c |   53 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29e881b..67818ff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -723,20 +723,57 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
 }
 
 /*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * <userspace>
+ *  ...
+ *  int3 ---->	<IRQ>
+ *	   	  do_int3
+ *	  (A)	    DIE_INT3 -> uprobe_pre_sstep_notifier()
+ *	   	      ...
+ *		      set_thread_flag(TIF_UPROBE)
+ *		      srcu_read_lock_raw()
+ *		<EOI>
+ *	  (B)
+ *		ret_from_intr
+ *		  do_notify_resume()
+ *		    uprobe_notify_resume()
+ *		      handle_swbp()
+ *		        uprobe = find_uprobe()
+ *		          atomic_inc(&uprobe->ref)
+ *			srcu_read_unlock_raw()
+ *			...
+ *	  (C)
+ *	  		put_uprobe()
+ *	 <----	ret_from_intr
+ *
+ * ...
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
 	unsigned long flags;
 
+	/*
+	 * At this point all breakpoint instructions belonging to this uprobe
+	 * have been removed, so no new references to this uprobe can be
+	 * created. However:
+	 *
+	 * There could be an in-progress breakpoint from before we removed the
+	 * instruction still pending (A). synchronize_sched() ensures all CPUs
+	 * will have scheduled at least once, therefore all such pending
+	 * interrupts will hereafter have reached (B) and thus have taken their
+	 * SRCU reference.
+	 */
+	synchronize_sched();
+
+	/*
+	 * Wait for all in-progress breakpoint handlers to finish, ensuring all
+	 * have passed (C) and turned their references into active refcounts.
+	 */
 	synchronize_srcu(&uprobes_srcu);
+
+	/*
+	 * We can now safely remove the uprobe: all references are active
+	 * references, so the refcounting will work as expected.
+	 */
 	spin_lock_irqsave(&uprobes_treelock, flags);
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock_irqrestore(&uprobes_treelock, flags);

