All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-05 22:20 ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Hello.

Not for inclusion yet, only for the early review.

I didn't even try to test these changes, and I am not expert
in this area. And even _if_ this code is correct, I need to
re-split these changes anyway, update the changelogs, etc.

Questions:

	- does it make sense?

	- can it work or I missed something "in general" ?

Why:

	- It would be nice to remove a member from task_struct.

	- Afaics, the usage of uprobes_srcu does not look right,
	  at least in theory, see 6/6.

	  The comment above delete_uprobe() says:

	  	The current unregistering thread waits till all
	  	other threads have hit a breakpoint, to acquire
	  	the uprobes_treelock before the uprobe is removed
	  	from the rbtree.

	  but synchronize_srcu() can only help if a thread which
	  have hit the breakpoint has already called srcu_read_lock().
	  It can't synchronize with read_lock "in future", and there
	  is a small window.

	  We could probably add another synchronize_sched() before
	  synchronize_srcu(), but this doesn't look very nice and

	- I am not sure yet, but perhaps with these changes we can
	  also kill mm->uprobes_state.count.

Any review is very much appreciated.

Oleg.

 include/linux/sched.h   |    1 -
 kernel/events/uprobes.c |  117 ++++++++++++++++++++++++++++++-----------------
 2 files changed, 75 insertions(+), 43 deletions(-)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-05 22:20 ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Hello.

Not for inclusion yet, only for the early review.

I didn't even try to test these changes, and I am not expert
in this area. And even _if_ this code is correct, I need to
re-split these changes anyway, update the changelogs, etc.

Questions:

	- does it make sense?

	- can it work or I missed something "in general" ?

Why:

	- It would be nice to remove a member from task_struct.

	- Afaics, the usage of uprobes_srcu does not look right,
	  at least in theory, see 6/6.

	  The comment above delete_uprobe() says:

	  	The current unregistering thread waits till all
	  	other threads have hit a breakpoint, to acquire
	  	the uprobes_treelock before the uprobe is removed
	  	from the rbtree.

	  but synchronize_srcu() can only help if a thread which
	  have hit the breakpoint has already called srcu_read_lock().
	  It can't synchronize with read_lock "in future", and there
	  is a small window.

	  We could probably add another synchronize_sched() before
	  synchronize_srcu(), but this doesn't look very nice and

	- I am not sure yet, but perhaps with these changes we can
	  also kill mm->uprobes_state.count.

Any review is very much appreciated.

Oleg.

 include/linux/sched.h   |    1 -
 kernel/events/uprobes.c |  117 ++++++++++++++++++++++++++++++-----------------
 2 files changed, 75 insertions(+), 43 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 1/6] uprobes: introduce find_active_uprobe()
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:20   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

No functional changes. Move the "find uprobe" code from
handle_swbp() to the new helper, find_active_uprobe().

Note: with or without this change, the find-active-uprobe logic
is not exactly right. We can race with another thread which unmaps
the memory with the valid uprobe before we take mm->mmap_sem. We
can't find this uprobe simply because find_vma() fails. In this
case we wrongly assume that this trap was not caused by uprobe
and send the erroneous SIGTRAP.
---
 kernel/events/uprobes.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29e881b..3d0a4d6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,21 +1474,12 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
 	return false;
 }
 
-/*
- * Run handler and ask thread to singlestep.
- * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
- */
-static void handle_swbp(struct pt_regs *regs)
+static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 {
+	struct mm_struct *mm = current->mm;
+	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
-	struct uprobe_task *utask;
-	struct uprobe *uprobe;
-	struct mm_struct *mm;
-	unsigned long bp_vaddr;
 
-	uprobe = NULL;
-	bp_vaddr = uprobe_get_swbp_addr(regs);
-	mm = current->mm;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, bp_vaddr);
 
@@ -1506,6 +1497,22 @@ static void handle_swbp(struct pt_regs *regs)
 	current->uprobe_srcu_id = -1;
 	up_read(&mm->mmap_sem);
 
+	return uprobe;
+}
+
+/*
+ * Run handler and ask thread to singlestep.
+ * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
+ */
+static void handle_swbp(struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+	struct uprobe *uprobe;
+	unsigned long bp_vaddr;
+
+	bp_vaddr = uprobe_get_swbp_addr(regs);
+	uprobe = find_active_uprobe(bp_vaddr);
+
 	if (!uprobe) {
 		/* No matching uprobe; signal SIGTRAP. */
 		send_sig(SIGTRAP, current, 0);
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 1/6] uprobes: introduce find_active_uprobe()
@ 2012-04-05 22:20   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

No functional changes. Move the "find uprobe" code from
handle_swbp() to the new helper, find_active_uprobe().

Note: with or without this change, the find-active-uprobe logic
is not exactly right. We can race with another thread which unmaps
the memory with the valid uprobe before we take mm->mmap_sem. We
can't find this uprobe simply because find_vma() fails. In this
case we wrongly assume that this trap was not caused by uprobe
and send the erroneous SIGTRAP.
---
 kernel/events/uprobes.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29e881b..3d0a4d6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,21 +1474,12 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
 	return false;
 }
 
-/*
- * Run handler and ask thread to singlestep.
- * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
- */
-static void handle_swbp(struct pt_regs *regs)
+static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 {
+	struct mm_struct *mm = current->mm;
+	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
-	struct uprobe_task *utask;
-	struct uprobe *uprobe;
-	struct mm_struct *mm;
-	unsigned long bp_vaddr;
 
-	uprobe = NULL;
-	bp_vaddr = uprobe_get_swbp_addr(regs);
-	mm = current->mm;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, bp_vaddr);
 
@@ -1506,6 +1497,22 @@ static void handle_swbp(struct pt_regs *regs)
 	current->uprobe_srcu_id = -1;
 	up_read(&mm->mmap_sem);
 
+	return uprobe;
+}
+
+/*
+ * Run handler and ask thread to singlestep.
+ * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
+ */
+static void handle_swbp(struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+	struct uprobe *uprobe;
+	unsigned long bp_vaddr;
+
+	bp_vaddr = uprobe_get_swbp_addr(regs);
+	uprobe = find_active_uprobe(bp_vaddr);
+
 	if (!uprobe) {
 		/* No matching uprobe; signal SIGTRAP. */
 		send_sig(SIGTRAP, current, 0);
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:21   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Add the new helper, is_swbp_at_addr_fast(), will be used by
find_active_uprobe().

It is almost the same as is_swbp_at_addr(), but since it plays
with current->mm it can avoid the slow get_user_pages() in the
likely case.
---
 kernel/events/uprobes.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3d0a4d6..2050b1a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,6 +1474,29 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
 	return false;
 }
 
+int __weak is_swbp_at_addr_fast(unsigned long vaddr)
+{
+	uprobe_opcode_t opcode;
+	int fault;
+
+	pagefault_disable();
+	fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
+							sizeof(opcode));
+	pagefault_enable();
+
+	if (unlikely(fault)) {
+		/*
+		 * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
+		 * we race with another thread which does mprotect(NONE)
+		 * after we hit bp.
+		 */
+		if (read_opcode(current->mm, vaddr, &opcode))
+			return -EFAULT;
+	}
+
+	return is_swbp_insn(&opcode);
+}
+
 static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 {
 	struct mm_struct *mm = current->mm;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-05 22:21   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Add the new helper, is_swbp_at_addr_fast(), will be used by
find_active_uprobe().

It is almost the same as is_swbp_at_addr(), but since it plays
with current->mm it can avoid the slow get_user_pages() in the
likely case.
---
 kernel/events/uprobes.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3d0a4d6..2050b1a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,6 +1474,29 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
 	return false;
 }
 
+int __weak is_swbp_at_addr_fast(unsigned long vaddr)
+{
+	uprobe_opcode_t opcode;
+	int fault;
+
+	pagefault_disable();
+	fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
+							sizeof(opcode));
+	pagefault_enable();
+
+	if (unlikely(fault)) {
+		/*
+		 * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
+		 * we race with another thread which does mprotect(NONE)
+		 * after we hit bp.
+		 */
+		if (read_opcode(current->mm, vaddr, &opcode))
+			return -EFAULT;
+	}
+
+	return is_swbp_insn(&opcode);
+}
+
 static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 {
 	struct mm_struct *mm = current->mm;
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 3/6] uprobes: teach find_active_uprobe() to provide the "is_swbp" info
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:21   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

A separate patch to simplify the review, and for the documentation.

The patch adds another "int *is_swbp" argument to find_active_uprobe(),
so far its only caller doesn't use this info.

With this patch find_active_uprobe() additionally does:

	- if find_vma() + ->vm_start check fails, *is_swbp = -EFAULT

	- otherwise, if valid_vma() + find_uprobe() fails, we return
	  the result of is_swbp_at_addr_fast(), can be -EFAULT too.

IOW. If find_active_uprobe(&is_swbp) returns NULL, the caller can look
at is_swbp to figure out if the current insn is bp or not, or if we
can't access this memory.

Note: I think that performance-wise this change is fine. This adds
is_swbp_at_addr_fast(), but only if we race with uprobe_unregister()
or we hit the "normal" int3 but this mm has uprobes as well. And even
in this case the slow read_opcode() path is very unlikely, this insn
recently triggered do_int3(), __copy_from_user_inatomic() shouldn't
fail in the likely case.
---
 kernel/events/uprobes.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2050b1a..054c00f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1497,7 +1497,7 @@ int __weak is_swbp_at_addr_fast(unsigned long vaddr)
 	return is_swbp_insn(&opcode);
 }
 
-static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
+static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 {
 	struct mm_struct *mm = current->mm;
 	struct uprobe *uprobe = NULL;
@@ -1505,15 +1505,21 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, bp_vaddr);
+	if (vma && vma->vm_start <= bp_vaddr) {
+		if (valid_vma(vma, false)) {
+			struct inode *inode;
+			loff_t offset;
+
+			inode = vma->vm_file->f_mapping->host;
+			offset = bp_vaddr - vma->vm_start;
+			offset += (vma->vm_pgoff << PAGE_SHIFT);
+			uprobe = find_uprobe(inode, offset);
+		}
 
-	if (vma && vma->vm_start <= bp_vaddr && valid_vma(vma, false)) {
-		struct inode *inode;
-		loff_t offset;
-
-		inode = vma->vm_file->f_mapping->host;
-		offset = bp_vaddr - vma->vm_start;
-		offset += (vma->vm_pgoff << PAGE_SHIFT);
-		uprobe = find_uprobe(inode, offset);
+		if (!uprobe)
+			*is_swbp = is_swbp_at_addr_fast(bp_vaddr);
+	} else {
+		*is_swbp = -EFAULT;
 	}
 
 	srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id);
@@ -1532,9 +1538,10 @@ static void handle_swbp(struct pt_regs *regs)
 	struct uprobe_task *utask;
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
+	int is_swbp;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
-	uprobe = find_active_uprobe(bp_vaddr);
+	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 
 	if (!uprobe) {
 		/* No matching uprobe; signal SIGTRAP. */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 3/6] uprobes: teach find_active_uprobe() to provide the "is_swbp" info
@ 2012-04-05 22:21   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

A separate patch to simplify the review, and for the documentation.

The patch adds another "int *is_swbp" argument to find_active_uprobe(),
so far its only caller doesn't use this info.

With this patch find_active_uprobe() additionally does:

	- if find_vma() + ->vm_start check fails, *is_swbp = -EFAULT

	- otherwise, if valid_vma() + find_uprobe() fails, we return
	  the result of is_swbp_at_addr_fast(), can be -EFAULT too.

IOW. If find_active_uprobe(&is_swbp) returns NULL, the caller can look
at is_swbp to figure out if the current insn is bp or not, or if we
can't access this memory.

Note: I think that performance-wise this change is fine. This adds
is_swbp_at_addr_fast(), but only if we race with uprobe_unregister()
or we hit the "normal" int3 but this mm has uprobes as well. And even
in this case the slow read_opcode() path is very unlikely, this insn
recently triggered do_int3(), __copy_from_user_inatomic() shouldn't
fail in the likely case.
---
 kernel/events/uprobes.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2050b1a..054c00f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1497,7 +1497,7 @@ int __weak is_swbp_at_addr_fast(unsigned long vaddr)
 	return is_swbp_insn(&opcode);
 }
 
-static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
+static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 {
 	struct mm_struct *mm = current->mm;
 	struct uprobe *uprobe = NULL;
@@ -1505,15 +1505,21 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr)
 
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, bp_vaddr);
+	if (vma && vma->vm_start <= bp_vaddr) {
+		if (valid_vma(vma, false)) {
+			struct inode *inode;
+			loff_t offset;
+
+			inode = vma->vm_file->f_mapping->host;
+			offset = bp_vaddr - vma->vm_start;
+			offset += (vma->vm_pgoff << PAGE_SHIFT);
+			uprobe = find_uprobe(inode, offset);
+		}
 
-	if (vma && vma->vm_start <= bp_vaddr && valid_vma(vma, false)) {
-		struct inode *inode;
-		loff_t offset;
-
-		inode = vma->vm_file->f_mapping->host;
-		offset = bp_vaddr - vma->vm_start;
-		offset += (vma->vm_pgoff << PAGE_SHIFT);
-		uprobe = find_uprobe(inode, offset);
+		if (!uprobe)
+			*is_swbp = is_swbp_at_addr_fast(bp_vaddr);
+	} else {
+		*is_swbp = -EFAULT;
 	}
 
 	srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id);
@@ -1532,9 +1538,10 @@ static void handle_swbp(struct pt_regs *regs)
 	struct uprobe_task *utask;
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
+	int is_swbp;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
-	uprobe = find_active_uprobe(bp_vaddr);
+	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 
 	if (!uprobe) {
 		/* No matching uprobe; signal SIGTRAP. */
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 4/6] uprobes: change register_for_each_vma() to take mm->mmap_sem for writing
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:21   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Change register_for_each_vma() to take mm->mmap_sem for writing.
This is a bit unfortunate but hopefully not too bad, this is the
slow path anyway.

This is needed to ensure that find_active_uprobe() can not race
with uprobe_register() which adds the new bp at the same bp_vaddr,
after find_uprobe() fails and before is_swbp_at_addr_fast() checks
the memory.

IOW, this is needed to ensure that if find_active_uprobe() returns
NULL but is_swbp == true, we can safely assume that it was the
"normal" int3 and we should send SIGTRAP.
---
 kernel/events/uprobes.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 054c00f..2af458d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -838,12 +838,12 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		}
 
 		mm = vi->mm;
-		down_read(&mm->mmap_sem);
+		down_write(&mm->mmap_sem);
 		vma = find_vma(mm, (unsigned long)vi->vaddr);
 		if (!vma || !valid_vma(vma, is_register)) {
 			list_del(&vi->probe_list);
 			kfree(vi);
-			up_read(&mm->mmap_sem);
+			up_write(&mm->mmap_sem);
 			mmput(mm);
 			continue;
 		}
@@ -852,7 +852,7 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 						vaddr != vi->vaddr) {
 			list_del(&vi->probe_list);
 			kfree(vi);
-			up_read(&mm->mmap_sem);
+			up_write(&mm->mmap_sem);
 			mmput(mm);
 			continue;
 		}
@@ -862,7 +862,7 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		else
 			remove_breakpoint(uprobe, mm, vi->vaddr);
 
-		up_read(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
 		mmput(mm);
 		if (is_register) {
 			if (ret && ret == -EEXIST)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 4/6] uprobes: change register_for_each_vma() to take mm->mmap_sem for writing
@ 2012-04-05 22:21   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:21 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Change register_for_each_vma() to take mm->mmap_sem for writing.
This is a bit unfortunate but hopefully not too bad, this is the
slow path anyway.

This is needed to ensure that find_active_uprobe() can not race
with uprobe_register() which adds the new bp at the same bp_vaddr,
after find_uprobe() fails and before is_swbp_at_addr_fast() checks
the memory.

IOW, this is needed to ensure that if find_active_uprobe() returns
NULL but is_swbp == true, we can safely assume that it was the
"normal" int3 and we should send SIGTRAP.
---
 kernel/events/uprobes.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 054c00f..2af458d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -838,12 +838,12 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		}
 
 		mm = vi->mm;
-		down_read(&mm->mmap_sem);
+		down_write(&mm->mmap_sem);
 		vma = find_vma(mm, (unsigned long)vi->vaddr);
 		if (!vma || !valid_vma(vma, is_register)) {
 			list_del(&vi->probe_list);
 			kfree(vi);
-			up_read(&mm->mmap_sem);
+			up_write(&mm->mmap_sem);
 			mmput(mm);
 			continue;
 		}
@@ -852,7 +852,7 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 						vaddr != vi->vaddr) {
 			list_del(&vi->probe_list);
 			kfree(vi);
-			up_read(&mm->mmap_sem);
+			up_write(&mm->mmap_sem);
 			mmput(mm);
 			continue;
 		}
@@ -862,7 +862,7 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		else
 			remove_breakpoint(uprobe, mm, vi->vaddr);
 
-		up_read(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
 		mmput(mm);
 		if (is_register) {
 			if (ret && ret == -EEXIST)
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 5/6] uprobes: teach handle_swbp() to rely on "is_swbp" rather than uprobes_srcu
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:22   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Currently handle_swbp() assumes that it can't race with unregister,
so it roughly does:

	if (find_uprobe(vaddr))
		process_uprobe();
	else
		send_sig(SIGTRAP);

This relies on the not-really-working uprobes_srcu code we are going
to remove.

With this patch we rely on the result of is_swbp_at_addr_fast(bp_vaddr)
if find_uprobe() fails.

If is_swbp == 1, then we hit the normal int3, we should send SIGTRAP.

If is_swbp == 0, we raced with uprobe_unregister(), we simply restart
this insn again.

The "difficult" case is is_swbp == -EFAULT, when we can't read this
memory. In this case I think we should restart too, and this is more
correct compared to the current code which sends SIGTRAP.

Ignoring ENOMEM/etc from get_user_pages(), this can only happen if
another thread unmaps this memory before find_active_uprobe() takes
mmap_sem. It would be better to pretend it was unmapped before this
insn was executed, restart, and get SIGSEGV.
---
 kernel/events/uprobes.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2af458d..ed76ee5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1538,14 +1538,26 @@ static void handle_swbp(struct pt_regs *regs)
 	struct uprobe_task *utask;
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
-	int is_swbp;
+	int uninitialized_var(is_swbp);
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 
 	if (!uprobe) {
-		/* No matching uprobe; signal SIGTRAP. */
-		send_sig(SIGTRAP, current, 0);
+		if (is_swbp > 0) {
+			/* No matching uprobe; signal SIGTRAP. */
+			send_sig(SIGTRAP, current, 0);
+		} else {
+			/*
+			 * Either we raced with uprobe_unregister() or we can't
+			 * access this memory. The latter is only possible if
+			 * another thread plays with our ->mm. In both cases
+			 * we can simply restart. If this vma was unmapped we
+			 * can pretend this insn was not executed yet and get
+			 * the (correct) SIGSEGV after restart.
+			 */
+			instruction_pointer_set(regs, bp_vaddr);
+		}
 		return;
 	}
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 5/6] uprobes: teach handle_swbp() to rely on "is_swbp" rather than uprobes_srcu
@ 2012-04-05 22:22   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Currently handle_swbp() assumes that it can't race with unregister,
so it roughly does:

	if (find_uprobe(vaddr))
		process_uprobe();
	else
		send_sig(SIGTRAP);

This relies on the not-really-working uprobes_srcu code we are going
to remove.

With this patch we rely on the result of is_swbp_at_addr_fast(bp_vaddr)
if find_uprobe() fails.

If is_swbp == 1, then we hit the normal int3, we should send SIGTRAP.

If is_swbp == 0, we raced with uprobe_unregister(), we simply restart
this insn again.

The "difficult" case is is_swbp == -EFAULT, when we can't read this
memory. In this case I think we should restart too, and this is more
correct compared to the current code which sends SIGTRAP.

Ignoring ENOMEM/etc from get_user_pages(), this can only happen if
another thread unmaps this memory before find_active_uprobe() takes
mmap_sem. It would be better to pretend it was unmapped before this
insn was executed, restart, and get SIGSEGV.
---
 kernel/events/uprobes.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2af458d..ed76ee5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1538,14 +1538,26 @@ static void handle_swbp(struct pt_regs *regs)
 	struct uprobe_task *utask;
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
-	int is_swbp;
+	int uninitialized_var(is_swbp);
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
 
 	if (!uprobe) {
-		/* No matching uprobe; signal SIGTRAP. */
-		send_sig(SIGTRAP, current, 0);
+		if (is_swbp > 0) {
+			/* No matching uprobe; signal SIGTRAP. */
+			send_sig(SIGTRAP, current, 0);
+		} else {
+			/*
+			 * Either we raced with uprobe_unregister() or we can't
+			 * access this memory. The latter is only possible if
+			 * another thread plays with our ->mm. In both cases
+			 * we can simply restart. If this vma was unmapped we
+			 * can pretend this insn was not executed yet and get
+			 * the (correct) SIGSEGV after restart.
+			 */
+			instruction_pointer_set(regs, bp_vaddr);
+		}
 		return;
 	}
 
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 6/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-05 22:22   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Kill the no longer needed uprobes_srcu/uprobe_srcu_id code.

It doesn't really work anyway. synchronize_srcu() can only synchronize
with the code "inside" the srcu_read_lock/srcu_read_unlock section,
while uprobe_pre_sstep_notifier() does srcu_read_lock() _after_ we
already hit the breakpoint.

I guess this probably works "in practice". synchronize_srcu() is slow
and it implies synchronize_sched(), and the probed task enters the non-
preemptible section at the start of exception handler. Still this is not
right at least in theory, and task->uprobe_srcu_id blows task_struct.
---
 include/linux/sched.h   |    1 -
 kernel/events/uprobes.c |   22 +++-------------------
 2 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8379e37..90a1f1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1592,7 +1592,6 @@ struct task_struct {
 #endif
 #ifdef CONFIG_UPROBES
 	struct uprobe_task *utask;
-	int uprobe_srcu_id;
 #endif
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ed76ee5..221e670 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -38,7 +38,6 @@
 #define UINSNS_PER_PAGE			(PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
 #define MAX_UPROBE_XOL_SLOTS		UINSNS_PER_PAGE
 
-static struct srcu_struct uprobes_srcu;
 static struct rb_root uprobes_tree = RB_ROOT;
 
 static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
@@ -723,20 +722,14 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
 }
 
 /*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * There could be threads that have already hit the breakpoint. They
+ * will recheck the current insn and restart if find_uprobe() fails.
+ * See find_active_uprobe().
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
 	unsigned long flags;
 
-	synchronize_srcu(&uprobes_srcu);
 	spin_lock_irqsave(&uprobes_treelock, flags);
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock_irqrestore(&uprobes_treelock, flags);
@@ -1373,9 +1366,6 @@ void uprobe_free_utask(struct task_struct *t)
 {
 	struct uprobe_task *utask = t->utask;
 
-	if (t->uprobe_srcu_id != -1)
-		srcu_read_unlock_raw(&uprobes_srcu, t->uprobe_srcu_id);
-
 	if (!utask)
 		return;
 
@@ -1393,7 +1383,6 @@ void uprobe_free_utask(struct task_struct *t)
 void uprobe_copy_process(struct task_struct *t)
 {
 	t->utask = NULL;
-	t->uprobe_srcu_id = -1;
 }
 
 /*
@@ -1521,9 +1510,6 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 	} else {
 		*is_swbp = -EFAULT;
 	}
-
-	srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id);
-	current->uprobe_srcu_id = -1;
 	up_read(&mm->mmap_sem);
 
 	return uprobe;
@@ -1664,7 +1650,6 @@ int uprobe_pre_sstep_notifier(struct pt_regs *regs)
 		utask->state = UTASK_BP_HIT;
 
 	set_thread_flag(TIF_UPROBE);
-	current->uprobe_srcu_id = srcu_read_lock_raw(&uprobes_srcu);
 
 	return 1;
 }
@@ -1699,7 +1684,6 @@ static int __init init_uprobes(void)
 		mutex_init(&uprobes_mutex[i]);
 		mutex_init(&uprobes_mmap_mutex[i]);
 	}
-	init_srcu_struct(&uprobes_srcu);
 
 	return register_die_notifier(&uprobe_exception_nb);
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 6/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-05 22:22   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-05 22:22 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Andrew Morton, Linus Torvalds, Ananth N Mavinakayanahalli,
	Jim Keniston, LKML, Linux-mm, Andi Kleen, Christoph Hellwig,
	Steven Rostedt, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Thomas Gleixner, Anton Arapov

Kill the no longer needed uprobes_srcu/uprobe_srcu_id code.

It doesn't really work anyway. synchronize_srcu() can only synchronize
with the code "inside" the srcu_read_lock/srcu_read_unlock section,
while uprobe_pre_sstep_notifier() does srcu_read_lock() _after_ we
already hit the breakpoint.

I guess this probably works "in practice". synchronize_srcu() is slow
and it implies synchronize_sched(), and the probed task enters the non-
preemptible section at the start of exception handler. Still this is not
right at least in theory, and task->uprobe_srcu_id blows task_struct.
---
 include/linux/sched.h   |    1 -
 kernel/events/uprobes.c |   22 +++-------------------
 2 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8379e37..90a1f1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1592,7 +1592,6 @@ struct task_struct {
 #endif
 #ifdef CONFIG_UPROBES
 	struct uprobe_task *utask;
-	int uprobe_srcu_id;
 #endif
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ed76ee5..221e670 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -38,7 +38,6 @@
 #define UINSNS_PER_PAGE			(PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
 #define MAX_UPROBE_XOL_SLOTS		UINSNS_PER_PAGE
 
-static struct srcu_struct uprobes_srcu;
 static struct rb_root uprobes_tree = RB_ROOT;
 
 static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
@@ -723,20 +722,14 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
 }
 
 /*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * There could be threads that have already hit the breakpoint. They
+ * will recheck the current insn and restart if find_uprobe() fails.
+ * See find_active_uprobe().
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
 	unsigned long flags;
 
-	synchronize_srcu(&uprobes_srcu);
 	spin_lock_irqsave(&uprobes_treelock, flags);
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock_irqrestore(&uprobes_treelock, flags);
@@ -1373,9 +1366,6 @@ void uprobe_free_utask(struct task_struct *t)
 {
 	struct uprobe_task *utask = t->utask;
 
-	if (t->uprobe_srcu_id != -1)
-		srcu_read_unlock_raw(&uprobes_srcu, t->uprobe_srcu_id);
-
 	if (!utask)
 		return;
 
@@ -1393,7 +1383,6 @@ void uprobe_free_utask(struct task_struct *t)
 void uprobe_copy_process(struct task_struct *t)
 {
 	t->utask = NULL;
-	t->uprobe_srcu_id = -1;
 }
 
 /*
@@ -1521,9 +1510,6 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 	} else {
 		*is_swbp = -EFAULT;
 	}
-
-	srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id);
-	current->uprobe_srcu_id = -1;
 	up_read(&mm->mmap_sem);
 
 	return uprobe;
@@ -1664,7 +1650,6 @@ int uprobe_pre_sstep_notifier(struct pt_regs *regs)
 		utask->state = UTASK_BP_HIT;
 
 	set_thread_flag(TIF_UPROBE);
-	current->uprobe_srcu_id = srcu_read_lock_raw(&uprobes_srcu);
 
 	return 1;
 }
@@ -1699,7 +1684,6 @@ static int __init init_uprobes(void)
 		mutex_init(&uprobes_mutex[i]);
 		mutex_init(&uprobes_mmap_mutex[i]);
 	}
-	init_srcu_struct(&uprobes_srcu);
 
 	return register_die_notifier(&uprobe_exception_nb);
 }
-- 
1.5.5.1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-14 11:16   ` Ingo Molnar
  -1 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2012-04-14 11:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov


* Oleg Nesterov <oleg@redhat.com> wrote:

> Hello.
> 
> Not for inclusion yet, only for the early review.

Srikar, any objection to these? They look like fine changes to 
me in principle, making uprobes less of an overhead to other 
kernel subsystems.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-14 11:16   ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2012-04-14 11:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov


* Oleg Nesterov <oleg@redhat.com> wrote:

> Hello.
> 
> Not for inclusion yet, only for the early review.

Srikar, any objection to these? They look like fine changes to 
me in principle, making uprobes less of an overhead to other 
kernel subsystems.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-05 22:20 ` Oleg Nesterov
@ 2012-04-14 13:16   ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-14 13:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-06 at 00:20 +0200, Oleg Nesterov wrote:
> Hello.
> 
> Not for inclusion yet, only for the early review.
> 
> I didn't even try to test these changes, and I am not expert
> in this area. And even _if_ this code is correct, I need to
> re-split these changes anyway, update the changelogs, etc.
> 
> Questions:
> 
> 	- does it make sense?

Maybe, upside is reclaiming that int from task_struct, downside is that
down_write :/ It would be very good not to have to do that. Nor do I
really see how that works.

> 	- can it work or I missed something "in general" ?

So we insert in the rb-tree before we take mmap_sem, this means we can
hit a non-uprobe int3 and still find a uprobe there, no?

> Why:
> 
> 	- It would be nice to remove a member from task_struct.
> 
> 	- Afaics, the usage of uprobes_srcu does not look right,
> 	  at least in theory, see 6/6.
> 
> 	  The comment above delete_uprobe() says:
> 
> 	  	The current unregistering thread waits till all
> 	  	other threads have hit a breakpoint, to acquire
> 	  	the uprobes_treelock before the uprobe is removed
> 	  	from the rbtree.
> 
> 	  but synchronize_srcu() can only help if a thread which
> 	  have hit the breakpoint has already called srcu_read_lock().
> 	  It can't synchronize with read_lock "in future", and there
> 	  is a small window.
> 
> 	  We could probably add another synchronize_sched() before
> 	  synchronize_srcu(), but this doesn't look very nice and

Right, I think that all was written with the assumption that sync_srcu
implied a sync_rcu, which of course we've recently wrecked.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-14 13:16   ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-14 13:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-06 at 00:20 +0200, Oleg Nesterov wrote:
> Hello.
> 
> Not for inclusion yet, only for the early review.
> 
> I didn't even try to test these changes, and I am not expert
> in this area. And even _if_ this code is correct, I need to
> re-split these changes anyway, update the changelogs, etc.
> 
> Questions:
> 
> 	- does it make sense?

Maybe, upside is reclaiming that int from task_struct, downside is that
down_write :/ It would be very good not to have to do that. Nor do I
really see how that works.

> 	- can it work or I missed something "in general" ?

So we insert in the rb-tree before we take mmap_sem, this means we can
hit a non-uprobe int3 and still find a uprobe there, no?

> Why:
> 
> 	- It would be nice to remove a member from task_struct.
> 
> 	- Afaics, the usage of uprobes_srcu does not look right,
> 	  at least in theory, see 6/6.
> 
> 	  The comment above delete_uprobe() says:
> 
> 	  	The current unregistering thread waits till all
> 	  	other threads have hit a breakpoint, to acquire
> 	  	the uprobes_treelock before the uprobe is removed
> 	  	from the rbtree.
> 
> 	  but synchronize_srcu() can only help if a thread which
> 	  have hit the breakpoint has already called srcu_read_lock().
> 	  It can't synchronize with read_lock "in future", and there
> 	  is a small window.
> 
> 	  We could probably add another synchronize_sched() before
> 	  synchronize_srcu(), but this doesn't look very nice and

Right, I think that all was written with the assumption that sync_srcu
implied a sync_rcu, which of course we've recently wrecked.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-14 13:16   ` Peter Zijlstra
@ 2012-04-14 20:52     ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-14 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/14, Peter Zijlstra wrote:
>
> On Fri, 2012-04-06 at 00:20 +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > Not for inclusion yet, only for the early review.
> >
> > I didn't even try to test these changes, and I am not expert
> > in this area. And even _if_ this code is correct, I need to
> > re-split these changes anyway, update the changelogs, etc.
> >
> > Questions:
> >
> > 	- does it make sense?
>
> Maybe, upside is reclaiming that int from task_struct, downside is that
> down_write :/ It would be very good not to have to do that.

Yes, down_write() is pessimization, I agree.

> Nor do I
> really see how that works.
>
> > 	- can it work or I missed something "in general" ?
>
> So we insert in the rb-tree before we take mmap_sem, this means we can
> hit a non-uprobe int3 and still find a uprobe there, no?

Yes, but unless I miss something this is "off-topic", this
can happen with or without these changes. If find_uprobe()
succeeds we assume that this bp was inserted by uprobe.

Perhaps uprobe_register() should not "ignore" -EXIST from
install_breakpoint()->is_swbp_insn(), or perhaps we can
add UPROBE_SHARED_BP.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-14 20:52     ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-14 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/14, Peter Zijlstra wrote:
>
> On Fri, 2012-04-06 at 00:20 +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > Not for inclusion yet, only for the early review.
> >
> > I didn't even try to test these changes, and I am not expert
> > in this area. And even _if_ this code is correct, I need to
> > re-split these changes anyway, update the changelogs, etc.
> >
> > Questions:
> >
> > 	- does it make sense?
>
> Maybe, upside is reclaiming that int from task_struct, downside is that
> down_write :/ It would be very good not to have to do that.

Yes, down_write() is pessimization, I agree.

> Nor do I
> really see how that works.
>
> > 	- can it work or I missed something "in general" ?
>
> So we insert in the rb-tree before we take mmap_sem, this means we can
> hit a non-uprobe int3 and still find a uprobe there, no?

Yes, but unless I miss something this is "off-topic", this
can happen with or without these changes. If find_uprobe()
succeeds we assume that this bp was inserted by uprobe.

Perhaps uprobe_register() should not "ignore" -EXIST from
install_breakpoint()->is_swbp_insn(), or perhaps we can
add UPROBE_SHARED_BP.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-14 20:52     ` Oleg Nesterov
@ 2012-04-15 10:51       ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-15 10:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > >     - can it work or I missed something "in general" ?
> >
> > So we insert in the rb-tree before we take mmap_sem, this means we can
> > hit a non-uprobe int3 and still find a uprobe there, no?
> 
> Yes, but unless I miss something this is "off-topic", this
> can happen with or without these changes. If find_uprobe()
> succeeds we assume that this bp was inserted by uprobe.

OK, but then I completely missed what the point of that 
down_write() stuff is..

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-15 10:51       ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-15 10:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > >     - can it work or I missed something "in general" ?
> >
> > So we insert in the rb-tree before we take mmap_sem, this means we can
> > hit a non-uprobe int3 and still find a uprobe there, no?
> 
> Yes, but unless I miss something this is "off-topic", this
> can happen with or without these changes. If find_uprobe()
> succeeds we assume that this bp was inserted by uprobe.

OK, but then I completely missed what the point of that 
down_write() stuff is..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-15 10:51       ` Peter Zijlstra
@ 2012-04-15 19:53         ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-15 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/15, Peter Zijlstra wrote:
>
> On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > > >     - can it work or I missed something "in general" ?
> > >
> > > So we insert in the rb-tree before we take mmap_sem, this means we can
> > > hit a non-uprobe int3 and still find a uprobe there, no?
> >
> > Yes, but unless I miss something this is "off-topic", this
> > can happen with or without these changes. If find_uprobe()
> > succeeds we assume that this bp was inserted by uprobe.
>
> OK, but then I completely missed what the point of that
> down_write() stuff is..

To ensure handle_swbp() can't race with unregister + register
and send the wrong SIGTRAP.

handle_swbp() roughly does under down_read(mmap_sem)


	if (find_uprobe(vaddr))
		process_uprobe();
	else
	if (is_swbp_at_addr_fast(vaddr))	// non-uprobe int3
		send_sig(SIGTRAP);
	else
		restart_insn(vaddr);		// raced with unregister


note that is_swbp_at_addr_fast() is used (currently) to detect
the race with upbrobe_unregister() and that is why we can remove
uprobes_srcu.

But if find_uprobe() fails, there is a window before
is_swbp_at_addr_fast() reads the memory. Suppose that the next
uprobe_register() inserts the new uprobe at the same address.
In this case the task will be wrongly killed.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-15 19:53         ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-15 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/15, Peter Zijlstra wrote:
>
> On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > > >     - can it work or I missed something "in general" ?
> > >
> > > So we insert in the rb-tree before we take mmap_sem, this means we can
> > > hit a non-uprobe int3 and still find a uprobe there, no?
> >
> > Yes, but unless I miss something this is "off-topic", this
> > can happen with or without these changes. If find_uprobe()
> > succeeds we assume that this bp was inserted by uprobe.
>
> OK, but then I completely missed what the point of that
> down_write() stuff is..

To ensure handle_swbp() can't race with unregister + register
and send the wrong SIGTRAP.

handle_swbp() roughly does under down_read(mmap_sem)


	if (find_uprobe(vaddr))
		process_uprobe();
	else
	if (is_swbp_at_addr_fast(vaddr))	// non-uprobe int3
		send_sig(SIGTRAP);
	else
		restart_insn(vaddr);		// raced with unregister


note that is_swbp_at_addr_fast() is used (currently) to detect
the race with upbrobe_unregister() and that is why we can remove
uprobes_srcu.

But if find_uprobe() fails, there is a window before
is_swbp_at_addr_fast() reads the memory. Suppose that the next
uprobe_register() inserts the new uprobe at the same address.
In this case the task will be wrongly killed.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-15 19:53         ` Oleg Nesterov
@ 2012-04-15 21:48           ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-15 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Sun, 2012-04-15 at 21:53 +0200, Oleg Nesterov wrote:
> On 04/15, Peter Zijlstra wrote:
> >
> > On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > > > >     - can it work or I missed something "in general" ?
> > > >
> > > > So we insert in the rb-tree before we take mmap_sem, this means we can
> > > > hit a non-uprobe int3 and still find a uprobe there, no?
> > >
> > > Yes, but unless I miss something this is "off-topic", this
> > > can happen with or without these changes. If find_uprobe()
> > > succeeds we assume that this bp was inserted by uprobe.
> >
> > OK, but then I completely missed what the point of that
> > down_write() stuff is..
> 
> To ensure handle_swbp() can't race with unregister + register
> and send the wrong SIGTRAP.
> 
> handle_swbp() roughly does under down_read(mmap_sem)
> 
> 
> 	if (find_uprobe(vaddr))
> 		process_uprobe();
> 	else
> 	if (is_swbp_at_addr_fast(vaddr))	// non-uprobe int3
> 		send_sig(SIGTRAP);
> 	else
> 		restart_insn(vaddr);		// raced with unregister
> 
> 
> note that is_swbp_at_addr_fast() is used (currently) to detect
> the race with upbrobe_unregister() and that is why we can remove
> uprobes_srcu.
> 
> But if find_uprobe() fails, there is a window before
> is_swbp_at_addr_fast() reads the memory. Suppose that the next
> uprobe_register() inserts the new uprobe at the same address.
> In this case the task will be wrongly killed.

OK, still not seeing how your proposal could work.. consider the below
patch comment, I'm not seeing how is_swbp_at_addr_fast() deals with an
in-progress INT3 while we remove the probe.

By ensuring the non-race with reg/unreg it will either find the uprobe
(no problem) or not find it and not see a breakpoint instruction either,
even though the pending breakpoint was generated by a uprobe (which is
now gone), causing a false positive SIGTRAP.

Or am I still not getting it?

---
 kernel/events/uprobes.c |   53 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29e881b..67818ff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -723,20 +723,57 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
 }
 
 /*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * <userspace>
+ *  ...
+ *  int3 ---->	<IRQ>
+ *	   	  do_int3
+ *	  (A)	    DIE_INT3 -> uprobe_pre_sstep_notifier()
+ *	   	      ...
+ *		      set_thread_flag(TIF_UPROBE)
+ *		      srcu_read_lock_raw()
+ *		<EOI>
+ *	  (B)
+ *		ret_from_intr
+ *		  do_notify_resume()
+ *		    uprobe_notify_resume()
+ *		      handle_swbp()
+ *		        uprobe = find_uprobe()
+ *		          atomic_inc(&uprobe->ref)
+ *			srcu_read_unlock_raw()
+ *			...
+ *	  (C)
+ *	  		put_uprobe()
+ *	 <----	ret_from_intr
+ *
+ * ...
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
 	unsigned long flags;
 
+	/*
+	 * At this point all breakpoint instructions belonging to this uprobe
+	 * have been removed, so no new references to this uprobe can be
+	 * created, however!
+	 *
+	 * There could be an in-progress breakpoint from before we removed the
+	 * instruction still pending (A). synchronize_sched() insures all CPUs
+	 * will have scheduled at least once, therefore all such pending
+	 * interrupts will hereafter have reached (B) and thus have taken their
+	 * SRCU reference.
+	 */
+	synchronize_sched();
+
+	/*
+	 * Wait for all in-progress breakpoint handlers to finish, ensuring all
+	 * handlers passed (C) turning all references into active refcounts.
+	 */
 	synchronize_srcu(&uprobes_srcu);
+
+	/*
+	 * We can now safely remove the uprobe, all references are active
+	 * references and the refcounting will work as expected.
+	 */
 	spin_lock_irqsave(&uprobes_treelock, flags);
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock_irqrestore(&uprobes_treelock, flags);


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-15 21:48           ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-15 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Sun, 2012-04-15 at 21:53 +0200, Oleg Nesterov wrote:
> On 04/15, Peter Zijlstra wrote:
> >
> > On Sat, 2012-04-14 at 22:52 +0200, Oleg Nesterov wrote:
> > > > >     - can it work or I missed something "in general" ?
> > > >
> > > > So we insert in the rb-tree before we take mmap_sem, this means we can
> > > > hit a non-uprobe int3 and still find a uprobe there, no?
> > >
> > > Yes, but unless I miss something this is "off-topic", this
> > > can happen with or without these changes. If find_uprobe()
> > > succeeds we assume that this bp was inserted by uprobe.
> >
> > OK, but then I completely missed what the point of that
> > down_write() stuff is..
> 
> To ensure handle_swbp() can't race with unregister + register
> and send the wrong SIGTRAP.
> 
> handle_swbp() roughly does under down_read(mmap_sem)
> 
> 
> 	if (find_uprobe(vaddr))
> 		process_uprobe();
> 	else
> 	if (is_swbp_at_addr_fast(vaddr))	// non-uprobe int3
> 		send_sig(SIGTRAP);
> 	else
> 		restart_insn(vaddr);		// raced with unregister
> 
> 
> note that is_swbp_at_addr_fast() is used (currently) to detect
> the race with upbrobe_unregister() and that is why we can remove
> uprobes_srcu.
> 
> But if find_uprobe() fails, there is a window before
> is_swbp_at_addr_fast() reads the memory. Suppose that the next
> uprobe_register() inserts the new uprobe at the same address.
> In this case the task will be wrongly killed.

OK, still not seeing how your proposal could work.. consider the below
patch comment, I'm not seeing how is_swbp_at_addr_fast() deals with an
in-progress INT3 while we remove the probe.

By ensuring the non-race with reg/unreg it will either find the uprobe
(no problem) or not find it and not see a breakpoint instruction either,
even though the pending breakpoint was generated by a uprobe (which is
now gone), causing a false positive SIGTRAP.

Or am I still not getting it?

---
 kernel/events/uprobes.c |   53 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 29e881b..67818ff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -723,20 +723,57 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
 }
 
 /*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * <userspace>
+ *  ...
+ *  int3 ---->	<IRQ>
+ *	   	  do_int3
+ *	  (A)	    DIE_INT3 -> uprobe_pre_sstep_notifier()
+ *	   	      ...
+ *		      set_thread_flag(TIF_UPROBE)
+ *		      srcu_read_lock_raw()
+ *		<EOI>
+ *	  (B)
+ *		ret_from_intr
+ *		  do_notify_resume()
+ *		    uprobe_notify_resume()
+ *		      handle_swbp()
+ *		        uprobe = find_uprobe()
+ *		          atomic_inc(&uprobe->ref)
+ *			srcu_read_unlock_raw()
+ *			...
+ *	  (C)
+ *	  		put_uprobe()
+ *	 <----	ret_from_intr
+ *
+ * ...
  */
 static void delete_uprobe(struct uprobe *uprobe)
 {
 	unsigned long flags;
 
+	/*
+	 * At this point all breakpoint instructions belonging to this uprobe
+	 * have been removed, so no new references to this uprobe can be
+	 * created, however!
+	 *
+	 * There could be an in-progress breakpoint from before we removed the
+	 * instruction still pending (A). synchronize_sched() insures all CPUs
+	 * will have scheduled at least once, therefore all such pending
+	 * interrupts will hereafter have reached (B) and thus have taken their
+	 * SRCU reference.
+	 */
+	synchronize_sched();
+
+	/*
+	 * Wait for all in-progress breakpoint handlers to finish, ensuring all
+	 * handlers passed (C) turning all references into active refcounts.
+	 */
 	synchronize_srcu(&uprobes_srcu);
+
+	/*
+	 * We can now safely remove the uprobe, all references are active
+	 * references and the refcounting will work as expected.
+	 */
 	spin_lock_irqsave(&uprobes_treelock, flags);
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock_irqrestore(&uprobes_treelock, flags);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-15 21:48           ` Peter Zijlstra
@ 2012-04-15 23:44             ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-15 23:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

I'll write another email tomorrow, just one note...

On 04/15, Peter Zijlstra wrote:
>
> OK, still not seeing how your proposal could work.. consider the below
> patch comment, I'm not seeing how is_swbp_at_addr_fast() deals with an
> in-progress INT3 while we remove the probe.
>
> By ensuring the non-race with reg/unreg it will either find the uprobe
> (no problem)

Yes,

> or not find it and not see a breakpoint instruction either,
> even though the pending breakpoint was generated by a uprobe (which is
> now gone),

Yes,

> causing a false positive SIGTRAP.

No. Please note that if is_swbp_at_addr_fast() sets is_swbp == 0 we
restart this insn.

(note that we also restart if get_user_pages() fails, this is hopefully
 is more correct too but minor).

> Or am I still not getting it?

My experience shows this is very unlikely. I am starting to think
I missed something, will re-check.


And. I have another reason for down_write() in register/unregister.
I am still not sure this is possible (I had no time to try to
implement), but it seems to me we can kill the uprobe counter in
mm_struct.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-15 23:44             ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-15 23:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

I'll write another email tomorrow, just one note...

On 04/15, Peter Zijlstra wrote:
>
> OK, still not seeing how your proposal could work.. consider the below
> patch comment, I'm not seeing how is_swbp_at_addr_fast() deals with an
> in-progress INT3 while we remove the probe.
>
> By ensuring the non-race with reg/unreg it will either find the uprobe
> (no problem)

Yes,

> or not find it and not see a breakpoint instruction either,
> even though the pending breakpoint was generated by a uprobe (which is
> now gone),

Yes,

> causing a false positive SIGTRAP.

No. Please note that if is_swbp_at_addr_fast() sets is_swbp == 0 we
restart this insn.

(note that we also restart if get_user_pages() fails, this is hopefully
 is more correct too but minor).

> Or am I still not getting it?

My experience shows this is very unlikely. I am starting to think
I missed something, will re-check.


And. I have another reason for down_write() in register/unregister.
I am still not sure this is possible (I had no time to try to
implement), but it seems to me we can kill the uprobe counter in
mm_struct.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-05 22:21   ` Oleg Nesterov
@ 2012-04-16 10:08     ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 10:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> +{
> +       uprobe_opcode_t opcode;
> +       int fault;
> +
> +       pagefault_disable();
> +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> +                                                       sizeof(opcode));
> +       pagefault_enable();
> +
> +       if (unlikely(fault)) {
> +               /*
> +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> +                * we race with another thread which does mprotect(NONE)
> +                * after we hit bp.
> +                */
> +               if (read_opcode(current->mm, vaddr, &opcode))
> +                       return -EFAULT;
> +       }
> +
> +       return is_swbp_insn(&opcode);
> +} 

Why bother with the pagefault_disable() and unlikely fault case and not
simply do copy_from_user() and have it deal with the fault if its needed
anyway?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-16 10:08     ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 10:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> +{
> +       uprobe_opcode_t opcode;
> +       int fault;
> +
> +       pagefault_disable();
> +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> +                                                       sizeof(opcode));
> +       pagefault_enable();
> +
> +       if (unlikely(fault)) {
> +               /*
> +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> +                * we race with another thread which does mprotect(NONE)
> +                * after we hit bp.
> +                */
> +               if (read_opcode(current->mm, vaddr, &opcode))
> +                       return -EFAULT;
> +       }
> +
> +       return is_swbp_insn(&opcode);
> +} 

Why bother with the pagefault_disable() and unlikely fault case and not
simply do copy_from_user() and have it deal with the fault if its needed
anyway?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-15 23:44             ` Oleg Nesterov
@ 2012-04-16 10:16               ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 10:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:

> No. Please note that if is_swbp_at_addr_fast() sets is_swbp == 0 we
> restart this insn.

Ah, see I was missing something..  Hmm ok, let me think about this a
little more though.. but at least I think I'm now (finally!) seeing what
you propose.

> And. I have another reason for down_write() in register/unregister.
> I am still not sure this is possible (I had no time to try to
> implement), but it seems to me we can kill the uprobe counter in
> mm_struct.

You mean by making register/unregister down_write, you're exclusive with
munmap() and thus we can rely on is_swbp_at_addr_fast() to inspect the
address to see if there's a breakpoint or not and avoid the rest of the
work that way?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-16 10:16               ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 10:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:

> No. Please note that if is_swbp_at_addr_fast() sets is_swbp == 0 we
> restart this insn.

Ah, see I was missing something..  Hmm ok, let me think about this a
little more though.. but at least I think I'm now (finally!) seeing what
you propose.

> And. I have another reason for down_write() in register/unregister.
> I am still not sure this is possible (I had no time to try to
> implement), but it seems to me we can kill the uprobe counter in
> mm_struct.

You mean by making register/unregister down_write, you're exclusive with
munmap() and thus we can rely on is_swbp_at_addr_fast() to inspect the
address to see if there's a breakpoint or not and avoid the rest of the
work that way?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-14 11:16   ` Ingo Molnar
@ 2012-04-16 11:31     ` Srikar Dronamraju
  -1 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-16 11:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Ingo Molnar <mingo@kernel.org> [2012-04-14 13:16:37]:

> 
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > Hello.
> > 
> > Not for inclusion yet, only for the early review.
> 
> Srikar, any objection to these? They look like fine changes to 
> me in principle, making uprobes less of an overhead to other 
> kernel subsystems.
> 

I dont have any objections to Oleg's changes. However as Oleg has said,
they are currently untested. Once the tracing and perf bits are in -tip,
I will add these bits and test them and let you know.

Given that I have tested the current bits in -tip several times, I think
it might be worthwhile to keep these changes on hold till the current
uprobe bits in -tip go upstream. 

Oleg: Would this be acceptable?

-- 
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-16 11:31     ` Srikar Dronamraju
  0 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-16 11:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Ingo Molnar <mingo@kernel.org> [2012-04-14 13:16:37]:

> 
> * Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > Hello.
> > 
> > Not for inclusion yet, only for the early review.
> 
> Srikar, any objection to these? They look like fine changes to 
> me in principle, making uprobes less of an overhead to other 
> kernel subsystems.
> 

I dont have any objections to Oleg's changes. However as Oleg has said,
they are currently untested. Once the tracing and perf bits are in -tip,
I will add these bits and test them and let you know.

Given that I have tested the current bits in -tip several times, I think
it might be worthwhile to keep these changes on hold till the current
uprobe bits in -tip go upstream. 

Oleg: Would this be acceptable?

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-16 11:31     ` Srikar Dronamraju
@ 2012-04-16 14:41       ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 14:41 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Srikar Dronamraju wrote:
>
> Given that I have tested the current bits in -tip several times, I think
> it might be worthwhile to keep these changes on hold till the current
> uprobe bits in -tip go upstream.
>
> Oleg: Would this be acceptable?

Yes, yes, agreed.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-16 14:41       ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 14:41 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Srikar Dronamraju wrote:
>
> Given that I have tested the current bits in -tip several times, I think
> it might be worthwhile to keep these changes on hold till the current
> uprobe bits in -tip go upstream.
>
> Oleg: Would this be acceptable?

Yes, yes, agreed.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-16 10:08     ` Peter Zijlstra
@ 2012-04-16 14:44       ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 14:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> > +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> > +{
> > +       uprobe_opcode_t opcode;
> > +       int fault;
> > +
> > +       pagefault_disable();
> > +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> > +                                                       sizeof(opcode));
> > +       pagefault_enable();
> > +
> > +       if (unlikely(fault)) {
> > +               /*
> > +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> > +                * we race with another thread which does mprotect(NONE)
> > +                * after we hit bp.
> > +                */
> > +               if (read_opcode(current->mm, vaddr, &opcode))
> > +                       return -EFAULT;
> > +       }
> > +
> > +       return is_swbp_insn(&opcode);
> > +}
>
> Why bother with the pagefault_disable() and unlikely fault case and not
> simply do copy_from_user() and have it deal with the fault if its needed
> anyway?

But we can't do this under down_read(mmap_sem) ?

If another thread waits for down_write() then do_page_fault() can't take
this lock, right?

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-16 14:44       ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 14:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> > +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> > +{
> > +       uprobe_opcode_t opcode;
> > +       int fault;
> > +
> > +       pagefault_disable();
> > +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> > +                                                       sizeof(opcode));
> > +       pagefault_enable();
> > +
> > +       if (unlikely(fault)) {
> > +               /*
> > +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> > +                * we race with another thread which does mprotect(NONE)
> > +                * after we hit bp.
> > +                */
> > +               if (read_opcode(current->mm, vaddr, &opcode))
> > +                       return -EFAULT;
> > +       }
> > +
> > +       return is_swbp_insn(&opcode);
> > +}
>
> Why bother with the pagefault_disable() and unlikely fault case and not
> simply do copy_from_user() and have it deal with the fault if its needed
> anyway?

But we can't do this under down_read(mmap_sem) ?

If another thread waits for down_write() then do_page_fault() can't take
this lock, right?

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-16 14:44       ` Oleg Nesterov
@ 2012-04-16 14:55         ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 14:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 16:44 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> > > +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> > > +{
> > > +       uprobe_opcode_t opcode;
> > > +       int fault;
> > > +
> > > +       pagefault_disable();
> > > +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> > > +                                                       sizeof(opcode));
> > > +       pagefault_enable();
> > > +
> > > +       if (unlikely(fault)) {
> > > +               /*
> > > +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> > > +                * we race with another thread which does mprotect(NONE)
> > > +                * after we hit bp.
> > > +                */
> > > +               if (read_opcode(current->mm, vaddr, &opcode))
> > > +                       return -EFAULT;
> > > +       }
> > > +
> > > +       return is_swbp_insn(&opcode);
> > > +}
> >
> > Why bother with the pagefault_disable() and unlikely fault case and not
> > simply do copy_from_user() and have it deal with the fault if its needed
> > anyway?
> 
> But we can't do this under down_read(mmap_sem) ?
> 
> If another thread waits for down_write() then do_page_fault() can't take
> this lock, right?

Ah, indeed, I thought read_opcode() would do the fault, but that's
get_user_pages() which requires the caller to hold mmap_sem instead.

Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
__copy_from_user_inatomic() optimistically before going down the whole
gup()+lock+kmap path?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-16 14:55         ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-16 14:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 16:44 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > On Fri, 2012-04-06 at 00:21 +0200, Oleg Nesterov wrote:
> > > +int __weak is_swbp_at_addr_fast(unsigned long vaddr)
> > > +{
> > > +       uprobe_opcode_t opcode;
> > > +       int fault;
> > > +
> > > +       pagefault_disable();
> > > +       fault = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
> > > +                                                       sizeof(opcode));
> > > +       pagefault_enable();
> > > +
> > > +       if (unlikely(fault)) {
> > > +               /*
> > > +                * XXX: read_opcode() lacks FOLL_FORCE, it can fail if
> > > +                * we race with another thread which does mprotect(NONE)
> > > +                * after we hit bp.
> > > +                */
> > > +               if (read_opcode(current->mm, vaddr, &opcode))
> > > +                       return -EFAULT;
> > > +       }
> > > +
> > > +       return is_swbp_insn(&opcode);
> > > +}
> >
> > Why bother with the pagefault_disable() and unlikely fault case and not
> > simply do copy_from_user() and have it deal with the fault if its needed
> > anyway?
> 
> But we can't do this under down_read(mmap_sem) ?
> 
> If another thread waits for down_write() then do_page_fault() can't take
> this lock, right?

Ah, indeed, I thought read_opcode() would do the fault, but that's
get_user_pages() which requires the caller to hold mmap_sem instead.

Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
__copy_from_user_inatomic() optimistically before going down the whole
gup()+lock+kmap path?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-16 14:55         ` Peter Zijlstra
@ 2012-04-16 15:34           ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 15:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> __copy_from_user_inatomic() optimistically before going down the whole
> gup()+lock+kmap path?

Unlikely, the task is not current.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-16 15:34           ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 15:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> __copy_from_user_inatomic() optimistically before going down the whole
> gup()+lock+kmap path?

Unlikely, the task is not current.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-16 10:16               ` Peter Zijlstra
@ 2012-04-16 21:47                 ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 21:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
>
> > And. I have another reason for down_write() in register/unregister.
> > I am still not sure this is possible (I had no time to try to
> > implement), but it seems to me we can kill the uprobe counter in
> > mm_struct.
>
> You mean by making register/unregister down_write, you're exclusive with
> munmap()

.. and with register/unregister.

Why do we need mm->uprobes_state.count? It is writeonly, except we
check it in the DIE_INT3 notifier before anything else to avoid the
unnecessary uprobes overhead.

Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
install_breakpoint() sets it unconditionally,
uprobe_pre_sstep_notifier() checks it.

(And perhaps we can stop right here? I mean how often this can
 slow down the debugger which installs int3 in the same mm?)

Now we need to clear MMF_HAS_UPROBE somehowe, when the last
uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.

	- We add another flag, MMF_UPROBE_RECALC, it is set by
	  remove_breakpoint().

	- We change handle_swbp(). Ignoring all details it does:

		if (find_uprobe(vaddr))
			process_uprobe();
		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
			recalc_mmf_uprobe_flag();

	  where recalc_mmf_uprobe_flag() checks all vmas and either
	  clears both flags or MMF_UPROBE_RECALC only.

	  This is the really slow O(n) path, but it can only happen after
	  unregister, and only if we hit another non-uprobe breakpoint
	  in the same mm.

Something like this. What do you think?

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-16 21:47                 ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-16 21:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/16, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
>
> > And. I have another reason for down_write() in register/unregister.
> > I am still not sure this is possible (I had no time to try to
> > implement), but it seems to me we can kill the uprobe counter in
> > mm_struct.
>
> You mean by making register/unregister down_write, you're exclusive with
> munmap()

.. and with register/unregister.

Why do we need mm->uprobes_state.count? It is writeonly, except we
check it in the DIE_INT3 notifier before anything else to avoid the
unnecessary uprobes overhead.

Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
install_breakpoint() sets it unconditionally,
uprobe_pre_sstep_notifier() checks it.

(And perhaps we can stop right here? I mean how often this can
 slow down the debugger which installs int3 in the same mm?)

Now we need to clear MMF_HAS_UPROBE somehowe, when the last
uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.

	- We add another flag, MMF_UPROBE_RECALC, it is set by
	  remove_breakpoint().

	- We change handle_swbp(). Ignoring all details it does:

		if (find_uprobe(vaddr))
			process_uprobe();
		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
			recalc_mmf_uprobe_flag();

	  where recalc_mmf_uprobe_flag() checks all vmas and either
	  clears both flags or MMF_UPROBE_RECALC only.

	  This is the really slow O(n) path, but it can only happen after
	  unregister, and only if we hit another non-uprobe breakpoint
	  in the same mm.

Something like this. What do you think?

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-16 15:34           ` Oleg Nesterov
@ 2012-04-17 10:08             ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-17 10:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 17:34 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> > __copy_from_user_inatomic() optimistically before going down the whole
> > gup()+lock+kmap path?
> 
> Unlikely, the task is not current.

Easy enough to test that though.. and that should make the regular path
fast enough, no?


---
 kernel/events/uprobes.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..7f5d8c5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -312,6 +312,15 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
 	void *vaddr_new;
 	int ret;
 
+	if (mm == current->mm) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(opcode, (void __user *)vaddr, 
+						sizeof(*opcode));
+		pagefault_enable();
+		if (!ret)
+			return 0;
+	}
+
 	ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &page, NULL);
 	if (ret <= 0)
 		return ret;


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-17 10:08             ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-17 10:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 17:34 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> > __copy_from_user_inatomic() optimistically before going down the whole
> > gup()+lock+kmap path?
> 
> Unlikely, the task is not current.

Easy enough to test that though.. and that should make the regular path
fast enough, no?


---
 kernel/events/uprobes.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..7f5d8c5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -312,6 +312,15 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
 	void *vaddr_new;
 	int ret;
 
+	if (mm == current->mm) {
+		pagefault_disable();
+		ret = __copy_from_user_inatomic(opcode, (void __user *)vaddr, 
+						sizeof(*opcode));
+		pagefault_enable();
+		if (!ret)
+			return 0;
+	}
+
 	ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &page, NULL);
 	if (ret <= 0)
 		return ret;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-17 10:08             ` Peter Zijlstra
@ 2012-04-17 17:09               ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-17 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/17, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 17:34 +0200, Oleg Nesterov wrote:
> > On 04/16, Peter Zijlstra wrote:
> > >
> > > Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> > > __copy_from_user_inatomic() optimistically before going down the whole
> > > gup()+lock+kmap path?
> >
> > Unlikely, the task is not current.
>
> Easy enough to test that though.. and that should make the regular path
> fast enough, no?
>
>
> ---
>  kernel/events/uprobes.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 985be4d..7f5d8c5 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -312,6 +312,15 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
>  	void *vaddr_new;
>  	int ret;
>
> +	if (mm == current->mm) {
> +		pagefault_disable();
> +		ret = __copy_from_user_inatomic(opcode, (void __user *)vaddr,
> +						sizeof(*opcode));
> +		pagefault_enable();
> +		if (!ret)
> +			return 0;
> +	}

Indeed. And then we do not need is_swbp_at_addr_fast().

This reminds me. Why read_opcode() does lock_page? I was going
to send the cleanup which removes it, but I need to recheck.

Perhaps you can explain the reason?

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-17 17:09               ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-17 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/17, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 17:34 +0200, Oleg Nesterov wrote:
> > On 04/16, Peter Zijlstra wrote:
> > >
> > > Can't we 'optimize' read_opcode() by doing the pagefault_disable() +
> > > __copy_from_user_inatomic() optimistically before going down the whole
> > > gup()+lock+kmap path?
> >
> > Unlikely, the task is not current.
>
> Easy enough to test that though.. and that should make the regular path
> fast enough, no?
>
>
> ---
>  kernel/events/uprobes.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 985be4d..7f5d8c5 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -312,6 +312,15 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
>  	void *vaddr_new;
>  	int ret;
>
> +	if (mm == current->mm) {
> +		pagefault_disable();
> +		ret = __copy_from_user_inatomic(opcode, (void __user *)vaddr,
> +						sizeof(*opcode));
> +		pagefault_enable();
> +		if (!ret)
> +			return 0;
> +	}

Indeed. And then we do not need is_swbp_at_addr_fast().

This reminds me. Why read_opcode() does lock_page? I was going
to send the cleanup which removes it, but I need to recheck.

Perhaps you can explain the reason?

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
  2012-04-17 17:09               ` Oleg Nesterov
@ 2012-04-17 19:53                 ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-17 19:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Tue, 2012-04-17 at 19:09 +0200, Oleg Nesterov wrote:
> 
> This reminds me. Why read_opcode() does lock_page? I was going
> to send the cleanup which removes it, but I need to recheck.
> 
> Perhaps you can explain the reason? 

I can't seem to recall, I suspect its to serialize against
__replace_page(). I can't say if that's strictly needed though.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast()
@ 2012-04-17 19:53                 ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-17 19:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Tue, 2012-04-17 at 19:09 +0200, Oleg Nesterov wrote:
> 
> This reminds me. Why read_opcode() does lock_page? I was going
> to send the cleanup which removes it, but I need to recheck.
> 
> Perhaps you can explain the reason? 

I can't seem to recall, I suspect its to serialize against
__replace_page(). I can't say if that's strictly needed though.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-16 21:47                 ` Oleg Nesterov
@ 2012-04-20 10:14                   ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-20 10:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
> >
> > > And. I have another reason for down_write() in register/unregister.
> > > I am still not sure this is possible (I had no time to try to
> > > implement), but it seems to me we can kill the uprobe counter in
> > > mm_struct.
> >
> > You mean by making register/unregister down_write, you're exclusive with
> > munmap()
> 
> .. and with register/unregister.
> 
> Why do we need mm->uprobes_state.count? It is writeonly, except we
> check it in the DIE_INT3 notifier before anything else to avoid the
> unnecessary uprobes overhead.

and uprobe_munmap().

> Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
> install_breakpoint() sets it unconditionally,
> uprobe_pre_sstep_notifier() checks it.

Argh, why are MMF_flags part of sched.h.. one would expect those to be
in mm.h or mm_types.h.. somewhere near struct mm.

> (And perhaps we can stop right here? I mean how often this can
>  slow down the debugger which installs int3 in the same mm?)
> 
> Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
>
> 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> 	  remove_breakpoint().
> 
> 	- We change handle_swbp(). Ignoring all details it does:
> 
> 		if (find_uprobe(vaddr))
> 			process_uprobe();
> 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> 			recalc_mmf_uprobe_flag();
> 
> 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> 	  clears both flags or MMF_UPROBE_RECALC only.
> 
> 	  This is the really slow O(n) path, but it can only happen after
> 	  unregister, and only if we hit another non-uprobe breakpoint
> 	  in the same mm.
> 
> Something like this. What do you think?

I think I can live with the simple set MMF_HAS_UPROBE and leave it at
that. The better optimization seems to be to not install breakpoints
when ->filter() excludes the task..

It looks like we currently install the breakpoint unconditionally and
only ->filter() once we hit the breakpoint, which is somewhat
sub-optimal.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-20 10:14                   ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-20 10:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> On 04/16, Peter Zijlstra wrote:
> >
> > On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
> >
> > > And. I have another reason for down_write() in register/unregister.
> > > I am still not sure this is possible (I had no time to try to
> > > implement), but it seems to me we can kill the uprobe counter in
> > > mm_struct.
> >
> > You mean by making register/unregister down_write, you're exclusive with
> > munmap()
> 
> .. and with register/unregister.
> 
> Why do we need mm->uprobes_state.count? It is writeonly, except we
> check it in the DIE_INT3 notifier before anything else to avoid the
> unnecessary uprobes overhead.

and uprobe_munmap().

> Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
> install_breakpoint() sets it unconditionally,
> uprobe_pre_sstep_notifier() checks it.

Argh, why are MMF_flags part of sched.h.. one would expect those to be
in mm.h or mm_types.h.. somewhere near struct mm.

> (And perhaps we can stop right here? I mean how often this can
>  slow down the debugger which installs int3 in the same mm?)
> 
> Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
>
> 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> 	  remove_breakpoint().
> 
> 	- We change handle_swbp(). Ignoring all details it does:
> 
> 		if (find_uprobe(vaddr))
> 			process_uprobe();
> 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> 			recalc_mmf_uprobe_flag();
> 
> 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> 	  clears both flags or MMF_UPROBE_RECALC only.
> 
> 	  This is the really slow O(n) path, but it can only happen after
> 	  unregister, and only if we hit another non-uprobe breakpoint
> 	  in the same mm.
> 
> Something like this. What do you think?

I think I can live with the simple set MMF_HAS_UPROBE and leave it at
that. The better optimization seems to be to not install breakpoints
when ->filter() excludes the task..

It looks like we currently install the breakpoint unconditionally and
only ->filter() once we hit the breakpoint, which is somewhat
sub-optimal.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-20 10:14                   ` Peter Zijlstra
@ 2012-04-20 10:16                     ` Srikar Dronamraju
  -1 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-20 10:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Peter Zijlstra <peterz@infradead.org> [2012-04-20 12:14:21]:

> On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> > On 04/16, Peter Zijlstra wrote:
> > >
> > > On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
> > >
> > > > And. I have another reason for down_write() in register/unregister.
> > > > I am still not sure this is possible (I had no time to try to
> > > > implement), but it seems to me we can kill the uprobe counter in
> > > > mm_struct.
> > >
> > > You mean by making register/unregister down_write, you're exclusive with
> > > munmap()
> > 
> > .. and with register/unregister.
> > 
> > Why do we need mm->uprobes_state.count? It is writeonly, except we
> > check it in the DIE_INT3 notifier before anything else to avoid the
> > unnecessary uprobes overhead.
> 
> and uprobe_munmap().

If we can kill mm->uprobs_state.count, we can do away with
uprobe_munmap. Because uprobe_munmap is only around to manage
mm->uprobes_state.count.

> 
> > Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
> > install_breakpoint() sets it unconditionally,
> > uprobe_pre_sstep_notifier() checks it.
> 
> Argh, why are MMF_flags part of sched.h.. one would expect those to be
> in mm.h or mm_types.h.. somewhere near struct mm.
> 
> > (And perhaps we can stop right here? I mean how often this can
> >  slow down the debugger which installs int3 in the same mm?)
> > 
> > Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> > uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
> >
> > 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> > 	  remove_breakpoint().
> > 
> > 	- We change handle_swbp(). Ignoring all details it does:
> > 
> > 		if (find_uprobe(vaddr))
> > 			process_uprobe();
> > 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> > 			recalc_mmf_uprobe_flag();
> > 
> > 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> > 	  clears both flags or MMF_UPROBE_RECALC only.
> > 
> > 	  This is the really slow O(n) path, but it can only happen after
> > 	  unregister, and only if we hit another non-uprobe breakpoint
> > 	  in the same mm.
> > 
> > Something like this. What do you think?
> 
> I think I can live with the simple set MMF_HAS_UPROBE and leave it at
> that. The better optimization seems to be to not install breakpoints
> when ->filter() excludes the task..
> 
> It looks like we currently install the breakpoint unconditionally and
> only ->filter() once we hit the breakpoint, which is somewhat
> sub-optimal.
> 

Yes, We install breakpoints unconditionally, I think we had already
discussed this and Oleg had proposed a solution too.
http://lkml.org/lkml/2011/6/16/470 where we move the mm struct from task
struct to signal struct.

-- 
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-20 10:16                     ` Srikar Dronamraju
  0 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-20 10:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Peter Zijlstra <peterz@infradead.org> [2012-04-20 12:14:21]:

> On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> > On 04/16, Peter Zijlstra wrote:
> > >
> > > On Mon, 2012-04-16 at 01:44 +0200, Oleg Nesterov wrote:
> > >
> > > > And. I have another reason for down_write() in register/unregister.
> > > > I am still not sure this is possible (I had no time to try to
> > > > implement), but it seems to me we can kill the uprobe counter in
> > > > mm_struct.
> > >
> > > You mean by making register/unregister down_write, you're exclusive with
> > > munmap()
> > 
> > .. and with register/unregister.
> > 
> > Why do we need mm->uprobes_state.count? It is writeonly, except we
> > check it in the DIE_INT3 notifier before anything else to avoid the
> > unnecessary uprobes overhead.
> 
> and uprobe_munmap().

If we can kill mm->uprobs_state.count, we can do away with
uprobe_munmap. Because uprobe_munmap is only around to manage
mm->uprobes_state.count.

> 
> > Suppose we kill it, and add the new MMF_HAS_UPROBE flag instead.
> > install_breakpoint() sets it unconditionally,
> > uprobe_pre_sstep_notifier() checks it.
> 
> Argh, why are MMF_flags part of sched.h.. one would expect those to be
> in mm.h or mm_types.h.. somewhere near struct mm.
> 
> > (And perhaps we can stop right here? I mean how often this can
> >  slow down the debugger which installs int3 in the same mm?)
> > 
> > Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> > uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
> >
> > 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> > 	  remove_breakpoint().
> > 
> > 	- We change handle_swbp(). Ignoring all details it does:
> > 
> > 		if (find_uprobe(vaddr))
> > 			process_uprobe();
> > 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> > 			recalc_mmf_uprobe_flag();
> > 
> > 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> > 	  clears both flags or MMF_UPROBE_RECALC only.
> > 
> > 	  This is the really slow O(n) path, but it can only happen after
> > 	  unregister, and only if we hit another non-uprobe breakpoint
> > 	  in the same mm.
> > 
> > Something like this. What do you think?
> 
> I think I can live with the simple set MMF_HAS_UPROBE and leave it at
> that. The better optimization seems to be to not install breakpoints
> when ->filter() excludes the task..
> 
> It looks like we currently install the breakpoint unconditionally and
> only ->filter() once we hit the breakpoint, which is somewhat
> sub-optimal.
> 

Yes, We install breakpoints unconditionally, I think we had already
discussed this and Oleg had proposed a solution too.
http://lkml.org/lkml/2011/6/16/470 where we move the mm struct from task
struct to signal struct.

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-20 10:14                   ` Peter Zijlstra
@ 2012-04-20 18:37                     ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-20 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/20, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> > (And perhaps we can stop right here? I mean how often this can
> >  slow down the debugger which installs int3 in the same mm?)
> >
> > Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> > uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
> >
> > 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> > 	  remove_breakpoint().
> >
> > 	- We change handle_swbp(). Ignoring all details it does:
> >
> > 		if (find_uprobe(vaddr))
> > 			process_uprobe();
> > 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> > 			recalc_mmf_uprobe_flag();
> >
> > 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> > 	  clears both flags or MMF_UPROBE_RECALC only.
> >
> > 	  This is the really slow O(n) path, but it can only happen after
> > 	  unregister, and only if we hit another non-uprobe breakpoint
> > 	  in the same mm.
> >
> > Something like this. What do you think?
>
> I think I can live with the simple set MMF_HAS_UPROBE and leave it at
> that.

Sure, I agree.

A false positive MMF_HAS_UPROBE can only slow down the non-uprobe
int3 in the same ->mm, I think we can tolerate this.

> The better optimization seems to be to not install breakpoints
> when ->filter() excludes the task..

Ah, this is another story. And I agree this is more important.

So far I do not understand what we should do. Of course, it would
be simple to add the filtering when we install the breakpoint but
I don't think it is that simple, even if we ignore the nasty
complications with multiple consumers with different filters.

Say, a user wants to probe /sbin/init only. What if init forks?
We should remove breakpoints from child->mm somehow.

And then we also need the filtering in uprobe_mmap() at least.

But yes, I agree, it would be very nice to do this.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-20 18:37                     ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-20 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/20, Peter Zijlstra wrote:
>
> On Mon, 2012-04-16 at 23:47 +0200, Oleg Nesterov wrote:
> > (And perhaps we can stop right here? I mean how often this can
> >  slow down the debugger which installs int3 in the same mm?)
> >
> > Now we need to clear MMF_HAS_UPROBE somehowe, when the last
> > uprobe goes away. Lets ignore uprobe_map/unmap for simplicity.
> >
> > 	- We add another flag, MMF_UPROBE_RECALC, it is set by
> > 	  remove_breakpoint().
> >
> > 	- We change handle_swbp(). Ignoring all details it does:
> >
> > 		if (find_uprobe(vaddr))
> > 			process_uprobe();
> > 		else if (test_bit(MMF_HAS_UPROBE) && test_bit(MMF_UPROBE_RECALC))
> > 			recalc_mmf_uprobe_flag();
> >
> > 	  where recalc_mmf_uprobe_flag() checks all vmas and either
> > 	  clears both flags or MMF_UPROBE_RECALC only.
> >
> > 	  This is the really slow O(n) path, but it can only happen after
> > 	  unregister, and only if we hit another non-uprobe breakpoint
> > 	  in the same mm.
> >
> > Something like this. What do you think?
>
> I think I can live with the simple set MMF_HAS_UPROBE and leave it at
> that.

Sure, I agree.

A false positive MMF_HAS_UPROBE can only slow down the non-uprobe
int3 in the same ->mm, I think we can tolerate this.

> The better optimization seems to be to not install breakpoints
> when ->filter() excludes the task..

Ah, this is another story. And I agree this is more important.

So far I do not understand what we should do. Of course, it would
be simple to add the filtering when we install the breakpoint but
I don't think it is that simple, even if we ignore the nasty
complications with multiple consumers with different filters.

Say, a user wants to probe /sbin/init only. What if init forks?
We should remove breakpoints from child->mm somehow.

And then we also need the filtering in uprobe_mmap() at least.

But yes, I agree, it would be very nice to do this.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-20 10:16                     ` Srikar Dronamraju
@ 2012-04-20 18:58                       ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-20 18:58 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/20, Srikar Dronamraju wrote:
>
> Yes, We install breakpoints unconditionally, I think we had already
> discussed this and Oleg had proposed a solution too.
> http://lkml.org/lkml/2011/6/16/470 where we move the mm struct from task
> struct to signal struct.

Yes, this too.

Hmm. May be we can do something else? Currently uprobe_consumer->filter()
returns a boolean. Perhaps it can return "yes" || "no" || "remove_bp" ?
But then another uprobe_register(same_addr) should handle this case
somehow. Just a crazy idea, unlikely it makes sense.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-20 18:58                       ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-20 18:58 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/20, Srikar Dronamraju wrote:
>
> Yes, We install breakpoints unconditionally, I think we had already
> discussed this and Oleg had proposed a solution too.
> http://lkml.org/lkml/2011/6/16/470 where we move the mm struct from task
> struct to signal struct.

Yes, this too.

Hmm. May be we can do something else? Currently uprobe_consumer->filter()
returns a boolean. Perhaps it can return "yes" || "no" || "remove_bp" ?
But then another uprobe_register(same_addr) should handle this case
somehow. Just a crazy idea, unlikely it makes sense.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-20 18:37                     ` Oleg Nesterov
@ 2012-04-23  7:14                       ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23  7:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> Say, a user wants to probe /sbin/init only. What if init forks?
> We should remove breakpoints from child->mm somehow. 

How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
copy the breakpoint. So the child doesn't have a breakpoint to be
removed.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23  7:14                       ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23  7:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> Say, a user wants to probe /sbin/init only. What if init forks?
> We should remove breakpoints from child->mm somehow. 

How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
copy the breakpoint. So the child doesn't have a breakpoint to be
removed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23  7:14                       ` Peter Zijlstra
@ 2012-04-23  7:24                         ` Srikar Dronamraju
  -1 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-23  7:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:

> On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > Say, a user wants to probe /sbin/init only. What if init forks?
> > We should remove breakpoints from child->mm somehow. 
> 
> How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> copy the breakpoint. So the child doesn't have a breakpoint to be
> removed.
> 

Because the pages are COWED, the breakpoint gets copied over to the
child. If we dont want the breakpoints to be not visible to the child,
then we would have to remove them explicitly based on the filter (i.e if
and if we had inserted breakpoints conditionally based on filter). 

Once we add the conditional breakpoint insertion (which is tricky), we have
to support conditional breakpoint removal in the dup_mmap() thro the
uprobe_mmap hook (which I think is not that hard).  Conditional removal
of breakpoints in fork path would just be an extension of the
conditional breakpoint insertion.

-- 
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23  7:24                         ` Srikar Dronamraju
  0 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-23  7:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:

> On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > Say, a user wants to probe /sbin/init only. What if init forks?
> > We should remove breakpoints from child->mm somehow. 
> 
> How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> copy the breakpoint. So the child doesn't have a breakpoint to be
> removed.
> 

Because the pages are COWED, the breakpoint gets copied over to the
child. If we dont want the breakpoints to be not visible to the child,
then we would have to remove them explicitly based on the filter (i.e if
and if we had inserted breakpoints conditionally based on filter). 

Once we add the conditional breakpoint insertion (which is tricky), we have
to support conditional breakpoint removal in the dup_mmap() thro the
uprobe_mmap hook (which I think is not that hard).  Conditional removal
of breakpoints in fork path would just be an extension of the
conditional breakpoint insertion.

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23  7:24                         ` Srikar Dronamraju
@ 2012-04-23  7:40                           ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23  7:40 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> 
> > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > We should remove breakpoints from child->mm somehow. 
> > 
> > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > copy the breakpoint. So the child doesn't have a breakpoint to be
> > removed.
> > 
> 
> Because the pages are COWED, the breakpoint gets copied over to the
> child. If we dont want the breakpoints to be not visible to the child,
> then we would have to remove them explicitly based on the filter (i.e if
> and if we had inserted breakpoints conditionally based on filter). 

I thought we didn't COW shared maps since the fault handler will fill in
the pages right and only anon stuff gets copied.

> Once we add the conditional breakpoint insertion (which is tricky),

How so?

>  we have
> to support conditional breakpoint removal in the dup_mmap() thro the
> uprobe_mmap hook (which I think is not that hard).  Conditional removal
> of breakpoints in fork path would just be an extension of the
> conditional breakpoint insertion.

Right, I don't think that removal is particularly hard if needed.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23  7:40                           ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23  7:40 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> 
> > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > We should remove breakpoints from child->mm somehow. 
> > 
> > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > copy the breakpoint. So the child doesn't have a breakpoint to be
> > removed.
> > 
> 
> Because the pages are COWED, the breakpoint gets copied over to the
> child. If we dont want the breakpoints to be not visible to the child,
> then we would have to remove them explicitly based on the filter (i.e if
> and if we had inserted breakpoints conditionally based on filter). 

I thought we didn't COW shared maps since the fault handler will fill in
the pages right and only anon stuff gets copied.

> Once we add the conditional breakpoint insertion (which is tricky),

How so?

>  we have
> to support conditional breakpoint removal in the dup_mmap() thro the
> uprobe_mmap hook (which I think is not that hard).  Conditional removal
> of breakpoints in fork path would just be an extension of the
> conditional breakpoint insertion.

Right, I don't think that removal is particularly hard if needed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23  7:40                           ` Peter Zijlstra
@ 2012-04-23 17:29                             ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 17:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/23, Peter Zijlstra wrote:
>
> On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> >
> > > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > > We should remove breakpoints from child->mm somehow.
> > >
> > > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > > copy the breakpoint. So the child doesn't have a breakpoint to be
> > > removed.
> > >
> >
> > Because the pages are COWED, the breakpoint gets copied over to the
> > child. If we dont want the breakpoints to be not visible to the child,
> > then we would have to remove them explicitly based on the filter (i.e if
> > and if we had inserted breakpoints conditionally based on filter).
>
> I thought we didn't COW shared maps since the fault handler will fill in
> the pages right and only anon stuff gets copied.

Confused...

Do you mean the "Don't copy ptes where a page fault will fill them correctly"
check in copy_page_range() ? Yes, but this vma should have ->anon_vma != NULL
if it has the breakpoint installed by uprobes.

Yes, we do not COW this page during dup_mmap(), but the new child's pte
should point to the same page with bp.

OK, I guess I misunderstood.

> > Once we add the conditional breakpoint insertion (which is tricky),
>
> How so?

I agree with Srikar this doesn't look simple to me. First of all,
currently it is not easy to find the tasks which use this ->mm.
OK, we can simply do for_each_process() under tasklist, but this is
not very nice.

But again, to me this is not the main problem.

> > Conditional removal
> > of breakpoints in fork path would just be an extension of the
> > conditional breakpoint insertion.
>
> Right, I don't think that removal is particularly hard if needed.

I agree that remove_breakpoint() itself is not that hard, probably.

But the whole idea of filtering is not clear to me. I mean, when/how
we should call the filter, and what should be the argument.
task_struct? Probably, but I am not sure.

And btw fork()->dup_mmap() should call the filter too. Suppose that
uprobe_consumer wants to trace the task T and its children, this looks
very natural.

And we need to rework uprobe_register(). It can't simply return if
this (inode, offset) already has the consumer.

So far I think this needs more thinking. And imho we should merge the
working code Srikar already has, then try to add this (agreed, very
important) optimization.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23 17:29                             ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 17:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/23, Peter Zijlstra wrote:
>
> On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> >
> > > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > > We should remove breakpoints from child->mm somehow.
> > >
> > > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > > copy the breakpoint. So the child doesn't have a breakpoint to be
> > > removed.
> > >
> >
> > Because the pages are COWED, the breakpoint gets copied over to the
> > child. If we dont want the breakpoints to be not visible to the child,
> > then we would have to remove them explicitly based on the filter (i.e if
> > and if we had inserted breakpoints conditionally based on filter).
>
> I thought we didn't COW shared maps since the fault handler will fill in
> the pages right and only anon stuff gets copied.

Confused...

Do you mean the "Don't copy ptes where a page fault will fill them correctly"
check in copy_page_range() ? Yes, but this vma should have ->anon_vma != NULL
if it has the breakpoint installed by uprobes.

Yes, we do not COW this page during dup_mmap(), but the new child's pte
should point to the same page with bp.

OK, I guess I misunderstood.

> > Once we add the conditional breakpoint insertion (which is tricky),
>
> How so?

I agree with Srikar this doesn't look simple to me. First of all,
currently it is not easy to find the tasks which use this ->mm.
OK, we can simply do for_each_process() under tasklist, but this is
not very nice.

But again, to me this is not the main problem.

> > Conditional removal
> > of breakpoints in fork path would just be an extension of the
> > conditional breakpoint insertion.
>
> Right, I don't think that removal is particularly hard if needed.

I agree that remove_breakpoint() itself is not that hard, probably.

But the whole idea of filtering is not clear to me. I mean, when/how
we should call the filter, and what should be the argument.
task_struct? Probably, but I am not sure.

And btw fork()->dup_mmap() should call the filter too. Suppose that
uprobe_consumer wants to trace the task T and its children, this looks
very natural.

And we need to rework uprobe_register(). It can't simply return if
this (inode, offset) already has the consumer.

So far I think this needs more thinking. And imho we should merge the
working code Srikar already has, then try to add this (agreed, very
important) optimization.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23 17:29                             ` Oleg Nesterov
@ 2012-04-23 19:18                               ` Peter Zijlstra
  -1 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23 19:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-23 at 19:29 +0200, Oleg Nesterov wrote:
> On 04/23, Peter Zijlstra wrote:
> >
> > On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> > > * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> > >
> > > > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > > > We should remove breakpoints from child->mm somehow.
> > > >
> > > > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > > > copy the breakpoint. So the child doesn't have a breakpoint to be
> > > > removed.
> > > >
> > >
> > > Because the pages are COWED, the breakpoint gets copied over to the
> > > child. If we dont want the breakpoints to be not visible to the child,
> > > then we would have to remove them explicitly based on the filter (i.e if
> > > and if we had inserted breakpoints conditionally based on filter).
> >
> > I thought we didn't COW shared maps since the fault handler will fill in
> > the pages right and only anon stuff gets copied.
> 
> Confused...
> 
> Do you mean the "Don't copy ptes where a page fault will fill them correctly"
> check in copy_page_range() ? Yes, but this vma should have ->anon_vma != NULL
> if it has the breakpoint installed by uprobes.

Oh, argh yeah, we add an anon_vma there..

> > > Once we add the conditional breakpoint insertion (which is tricky),
> >
> > How so?
> 
> I agree with Srikar this doesn't look simple to me. First of all,
> currently it is not easy to find the tasks which use this ->mm.
> OK, we can simply do for_each_process() under tasklist, but this is
> not very nice.
> 
> But again, to me this is not the main problem.

CLONE_VM without CLONE_THREAD is the problem, right?

Can we get away with not supporting that, at least initially?

> > > Conditional removal
> > > of breakpoints in fork path would just be an extension of the
> > > conditional breakpoint insertion.
> >
> > Right, I don't think that removal is particularly hard if needed.
> 
> I agree that remove_breakpoint() itself is not that hard, probably.
> 
> But the whole idea of filtering is not clear to me. I mean, when/how
> we should call the filter, and what should be the argument.
> task_struct? Probably, but I am not sure.

Well, the idea is really very simple: if for a probe an {mm,tasks} set
has all negative filters we do not install the probe on that mm.

The filters already take a uprobe_consumer and task_struct as argument.

> And btw fork()->dup_mmap() should call the filter too. Suppose that
> uprobe_consumer wants to trace the task T and its children, this looks
> very natural.

Agreed.

> And we need to rework uprobe_register(). It can't simply return if
> this (inode, offset) already has the consumer.

Not quite sure what you mean. uprobe_register() doesn't have such a
return value. It returns 0 on success and an error otherwise. Do you
mean __uprobe_register() ? That calls register_for_each_vma() and that
can simply call ->filter() for each vma it iterates. In fact, it can get
away with only calling the filter for the new consumer.

> So far I think this needs more thinking. And imho we should merge the
> working code Srikar already has, then try to add this (agreed, very
> important) optimization.

Sure..



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23 19:18                               ` Peter Zijlstra
  0 siblings, 0 replies; 76+ messages in thread
From: Peter Zijlstra @ 2012-04-23 19:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On Mon, 2012-04-23 at 19:29 +0200, Oleg Nesterov wrote:
> On 04/23, Peter Zijlstra wrote:
> >
> > On Mon, 2012-04-23 at 12:54 +0530, Srikar Dronamraju wrote:
> > > * Peter Zijlstra <peterz@infradead.org> [2012-04-23 09:14:00]:
> > >
> > > > On Fri, 2012-04-20 at 20:37 +0200, Oleg Nesterov wrote:
> > > > > Say, a user wants to probe /sbin/init only. What if init forks?
> > > > > We should remove breakpoints from child->mm somehow.
> > > >
> > > > How is that hard? dup_mmap() only copies the VMAs, this doesn't actually
> > > > copy the breakpoint. So the child doesn't have a breakpoint to be
> > > > removed.
> > > >
> > >
> > > Because the pages are COWED, the breakpoint gets copied over to the
> > > child. If we dont want the breakpoints to be not visible to the child,
> > > then we would have to remove them explicitly based on the filter (i.e if
> > > and if we had inserted breakpoints conditionally based on filter).
> >
> > I thought we didn't COW shared maps since the fault handler will fill in
> > the pages right and only anon stuff gets copied.
> 
> Confused...
> 
> Do you mean the "Don't copy ptes where a page fault will fill them correctly"
> check in copy_page_range() ? Yes, but this vma should have ->anon_vma != NULL
> if it has the breakpoint installed by uprobes.

Oh, argh yeah, we add an anon_vma there..

> > > Once we add the conditional breakpoint insertion (which is tricky),
> >
> > How so?
> 
> I agree with Srikar this doesn't look simple to me. First of all,
> currently it is not easy to find the tasks which use this ->mm.
> OK, we can simply do for_each_process() under tasklist, but this is
> not very nice.
> 
> But again, to me this is not the main problem.

CLONE_VM without CLONE_THREAD is the problem, right?

Can we get away with not supporting that, at least initially?

> > > Conditional removal
> > > of breakpoints in fork path would just be an extension of the
> > > conditional breakpoint insertion.
> >
> > Right, I don't think that removal is particularly hard if needed.
> 
> I agree that remove_breakpoint() itself is not that hard, probably.
> 
> But the whole idea of filtering is not clear to me. I mean, when/how
> we should call the filter, and what should be the argument.
> task_struct? Probably, but I am not sure.

Well, the idea is really very simple: if for a probe an {mm,tasks} set
has all negative filters we do not install the probe on that mm.

The filters already take a uprobe_consumer and task_struct as argument.

> And btw fork()->dup_mmap() should call the filter too. Suppose that
> uprobe_consumer wants to trace the task T and its children, this looks
> very natural.

Agreed.

> And we need to rework uprobe_register(). It can't simply return if
> this (inode, offset) already has the consumer.

Not quite sure what you mean. uprobe_register() doesn't have such a
return value. It returns 0 on success and an error otherwise. Do you
mean __uprobe_register() ? That calls register_for_each_vma() and that
can simply call ->filter() for each vma it iterates. In fact, it can get
away with only calling the filter for the new consumer.

> So far I think this needs more thinking. And imho we should merge the
> working code Srikar already has, then try to add this (agreed, very
> important) optimization.

Sure..


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23 19:18                               ` Peter Zijlstra
@ 2012-04-23 20:50                                 ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 20:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/23, Peter Zijlstra wrote:
>
> On Mon, 2012-04-23 at 19:29 +0200, Oleg Nesterov wrote:
> >
> > I agree with Srikar this doesn't look simple to me. First of all,
> > currently it is not easy to find the tasks which use this ->mm.
> > OK, we can simply do for_each_process() under tasklist, but this is
> > not very nice.
> >
> > But again, to me this is not the main problem.
>
> CLONE_VM without CLONE_THREAD is the problem, right?

Not even CLONE_VM without CLONE_THREAD... And perhaps I overestimate
the problem, I dunno. Just it seems to me there are to many "details"
we should discuss to make the filtering reasonable.

> Can we get away with not supporting that, at least initially?

I dunno. But yes, in this case one of the problems goes away, no
need to find all mm users. We need to find at least one task which
uses this mm if we want to pass its task_struct to ->filter(),
perhaps we can rely on mm->owner.

> > But the whole idea of filtering is not clear to me. I mean, when/how
> > we should call the filter, and what should be the argument.
> > task_struct? Probably, but I am not sure.
>
> Well, the idea is really very simple: if for a probe an {mm,tasks} set
> has all negative filters we do not install the probe on that mm.

Sure, but this is not enough. Assuming you mean uprobe_register().
And note that in this case we have a single uprobe.

We already discussed fork()->dup_mmap() a bit. However mmap needs
to call the filter too. Otherwise, how can you probe, say, some
function in /bin/true (with filtering)? By the time uprobe_register()
is called nobody mmaps this file.

Now potentionally we have multiple uprobes, each should be consulted..
OK, if we ignore "CLONE_VM without CLONE_THREAD" this is not that
bad probably.

> The filters already take a uprobe_consumer and task_struct as argument.

Yes, and probably this makes sense for handler_chain(). Although otoh
I do not really understand what this filter buys us at this point.
And note that this task is current.

But if we call ->filter() from, say, uprobe_register(), should we call
it for each thread? This looks strange a bit. Perhaps we should do
this only once and pass the group_leader. And then we need more locking,
say to avoid the races with exec/exit. So may be we should simply pass
tgid for example, I dunno.

> > And we need to rework uprobe_register(). It can't simply return if
> > this (inode, offset) already has the consumer.
>
> Not quite sure what you mean. uprobe_register() doesn't have such a
> return value.

Yes, I wasn't clear.

Suppose we have a consumer which wants to probe, say, sys_getpid()
and its ->filter() only wants to trace the task T1. So _register()
installs the breakpoint into T1->mm.

Then comes another consumer which wants to trace the task T2, but
(inode, offset) is the same: sys_getpid() from libc.

Now. Currently the 2nd uprobes_register() does nothing. It finds
the old uprobe we already have and assumes that all we need is to
add the new consumer to uprobe->consumers list.

If we add the filtering, we need register_for_each_vma() in this
case too, but it would be nice to skip the mm_struct's which were
already "acked" by the previous consumers and thus already have
this breakpoint installed.

Or, somehow we need to know that this int3 in this mm was inserted
by this uprobe. Currently this doesn't really matter (afaics), but
it seems that the current -EXIST logic is not exactly correct.

And uprobe_register() probably needs the changes too. Suppose that
the first consumer does uprobe_unregister(). Currently this only
removes the consumer, but with the filtering we probably want to
cleanup T1->mm.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23 20:50                                 ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 20:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/23, Peter Zijlstra wrote:
>
> On Mon, 2012-04-23 at 19:29 +0200, Oleg Nesterov wrote:
> >
> > I agree with Srikar this doesn't look simple to me. First of all,
> > currently it is not easy to find the tasks which use this ->mm.
> > OK, we can simply do for_each_process() under tasklist, but this is
> > not very nice.
> >
> > But again, to me this is not the main problem.
>
> CLONE_VM without CLONE_THREAD is the problem, right?

Not even CLONE_VM without CLONE_THREAD... And perhaps I overestimate
the problem, I dunno. Just it seems to me there are to many "details"
we should discuss to make the filtering reasonable.

> Can we get away with not supporting that, at least initially?

I dunno. But yes, in this case one of the problems goes away, no
need to find all mm users. We need to find at least one task which
uses this mm if we want to pass its task_struct to ->filter(),
perhaps we can rely on mm->owner.

> > But the whole idea of filtering is not clear to me. I mean, when/how
> > we should call the filter, and what should be the argument.
> > task_struct? Probably, but I am not sure.
>
> Well, the idea is really very simple: if for a probe an {mm,tasks} set
> has all negative filters we do not install the probe on that mm.

Sure, but this is not enough. Assuming you mean uprobe_register().
And note that in this case we have a single uprobe.

We already discussed fork()->dup_mmap() a bit. However mmap needs
to call the filter too. Otherwise, how can you probe, say, some
function in /bin/true (with filtering)? By the time uprobe_register()
is called nobody mmaps this file.

Now potentionally we have multiple uprobes, each should be consulted..
OK, if we ignore "CLONE_VM without CLONE_THREAD" this is not that
bad probably.

> The filters already take a uprobe_consumer and task_struct as argument.

Yes, and probably this makes sense for handler_chain(). Although otoh
I do not really understand what this filter buys us at this point.
And note that this task is current.

But if we call ->filter() from, say, uprobe_register(), should we call
it for each thread? This looks strange a bit. Perhaps we should do
this only once and pass the group_leader. And then we need more locking,
say to avoid the races with exec/exit. So may be we should simply pass
tgid for example, I dunno.

> > And we need to rework uprobe_register(). It can't simply return if
> > this (inode, offset) already has the consumer.
>
> Not quite sure what you mean. uprobe_register() doesn't have such a
> return value.

Yes, I wasn't clear.

Suppose we have a consumer which wants to probe, say, sys_getpid()
and its ->filter() only wants to trace the task T1. So _register()
installs the breakpoint into T1->mm.

Then comes another consumer which wants to trace the task T2, but
(inode, offset) is the same: sys_getpid() from libc.

Now. Currently the 2nd uprobes_register() does nothing. It finds
the old uprobe we already have and assumes that all we need is to
add the new consumer to uprobe->consumers list.

If we add the filtering, we need register_for_each_vma() in this
case too, but it would be nice to skip the mm_struct's which were
already "acked" by the previous consumers and thus already have
this breakpoint installed.

Or, somehow we need to know that this int3 in this mm was inserted
by this uprobe. Currently this doesn't really matter (afaics), but
it seems that the current -EXIST logic is not exactly correct.

And uprobe_register() probably needs the changes too. Suppose that
the first consumer does uprobe_unregister(). Currently this only
removes the consumer, but with the filtering we probably want to
cleanup T1->mm.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-23 20:50                                 ` Oleg Nesterov
@ 2012-04-23 21:25                                   ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 21:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

forgot to mention,

On 04/23, Oleg Nesterov wrote:
>
> Just it seems to me there are to many "details"
> we should discuss to make the filtering reasonable.

And so far we assumed that consumer->filter() is "stable" and never
changes its mind.

Perhaps this is fine, but I am not sure. May we need need some
interface to add/del the task. Probably not, but unregister + register
doesn't look very convenient and can miss a hit.

> Yes, and probably this makes sense for handler_chain(). Although otoh
> I do not really understand what this filter buys us at this point.

But if we change the rules so that ->filter() or ->handler() itself can
return the "please remove this bp from ->mm" then perhaps it makes more
sense for the filtering. Again, not sure.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-23 21:25                                   ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-23 21:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Andrew Morton, Linus Torvalds,
	Ananth N Mavinakayanahalli, Jim Keniston, LKML, Linux-mm,
	Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

forgot to mention,

On 04/23, Oleg Nesterov wrote:
>
> Just it seems to me there are to many "details"
> we should discuss to make the filtering reasonable.

And so far we assumed that consumer->filter() is "stable" and never
changes its mind.

Perhaps this is fine, but I am not sure. May we need need some
interface to add/del the task. Probably not, but unregister + register
doesn't look very convenient and can miss a hit.

> Yes, and probably this makes sense for handler_chain(). Although otoh
> I do not really understand what this filter buys us at this point.

But if we change the rules so that ->filter() or ->handler() itself can
return the "please remove this bp from ->mm" then perhaps it makes more
sense for the filtering. Again, not sure.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-16 14:41       ` Oleg Nesterov
@ 2012-04-25 12:52         ` Srikar Dronamraju
  -1 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-25 12:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Oleg Nesterov <oleg@redhat.com> [2012-04-16 16:41:16]:

> On 04/16, Srikar Dronamraju wrote:
> >
> > Given that I have tested the current bits in -tip several times, I think
> > it might be worthwhile to keep these changes on hold till the current
> > uprobe bits in -tip go upstream.
> >
> > Oleg: Would this be acceptable?
> 
> Yes, yes, agreed.
> 

I applied the patches and ran all my tests. 
Everything works as expected.


--
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-25 12:52         ` Srikar Dronamraju
  0 siblings, 0 replies; 76+ messages in thread
From: Srikar Dronamraju @ 2012-04-25 12:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

* Oleg Nesterov <oleg@redhat.com> [2012-04-16 16:41:16]:

> On 04/16, Srikar Dronamraju wrote:
> >
> > Given that I have tested the current bits in -tip several times, I think
> > it might be worthwhile to keep these changes on hold till the current
> > uprobe bits in -tip go upstream.
> >
> > Oleg: Would this be acceptable?
> 
> Yes, yes, agreed.
> 

I applied the patches and ran all my tests. 
Everything works as expected.


--
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
  2012-04-25 12:52         ` Srikar Dronamraju
@ 2012-04-25 14:22           ` Oleg Nesterov
  -1 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-25 14:22 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/25, Srikar Dronamraju wrote:
>
> I applied the patches and ran all my tests.
> Everything works as expected.

Thanks a lot Srikar.

I'll resend this series with some cleanups. Probably we do not need
is_swbp_at_addr_fast, we can check mm == current->mm in read_opcode()
as Peter suggests. Plus I'll try to make the MMF_UPROBE changes we
discussed. And a couple of really minor and off-topic cleanups.

But I'll wait until we have all pending patches in -tip to avoid
the unnecessary noise at this stage.

Oleg.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id
@ 2012-04-25 14:22           ` Oleg Nesterov
  0 siblings, 0 replies; 76+ messages in thread
From: Oleg Nesterov @ 2012-04-25 14:22 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Ingo Molnar, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Ananth N Mavinakayanahalli, Jim Keniston, LKML,
	Linux-mm, Andi Kleen, Christoph Hellwig, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Thomas Gleixner,
	Anton Arapov

On 04/25, Srikar Dronamraju wrote:
>
> I applied the patches and ran all my tests.
> Everything works as expected.

Thanks a lot Srikar.

I'll resend this series with some cleanups. Probably we do not need
is_swbp_at_addr_fast, we can check mm == current->mm in read_opcode()
as Peter suggests. Plus I'll try to make the MMF_UPROBE changes we
discussed. And a couple of really minor and off-topic cleanups.

But I'll wait until we have all pending patches in -tip to avoid
the unnecessary noise at this stage.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2012-04-25 14:24 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-05 22:20 [RFC 0/6] uprobes: kill uprobes_srcu/uprobe_srcu_id Oleg Nesterov
2012-04-05 22:20 ` Oleg Nesterov
2012-04-05 22:20 ` [PATCH 1/6] uprobes: introduce find_active_uprobe() Oleg Nesterov
2012-04-05 22:20   ` Oleg Nesterov
2012-04-05 22:21 ` [PATCH 2/6] uprobes: introduce is_swbp_at_addr_fast() Oleg Nesterov
2012-04-05 22:21   ` Oleg Nesterov
2012-04-16 10:08   ` Peter Zijlstra
2012-04-16 10:08     ` Peter Zijlstra
2012-04-16 14:44     ` Oleg Nesterov
2012-04-16 14:44       ` Oleg Nesterov
2012-04-16 14:55       ` Peter Zijlstra
2012-04-16 14:55         ` Peter Zijlstra
2012-04-16 15:34         ` Oleg Nesterov
2012-04-16 15:34           ` Oleg Nesterov
2012-04-17 10:08           ` Peter Zijlstra
2012-04-17 10:08             ` Peter Zijlstra
2012-04-17 17:09             ` Oleg Nesterov
2012-04-17 17:09               ` Oleg Nesterov
2012-04-17 19:53               ` Peter Zijlstra
2012-04-17 19:53                 ` Peter Zijlstra
2012-04-05 22:21 ` [PATCH 3/6] uprobes: teach find_active_uprobe() to provide the "is_swbp" info Oleg Nesterov
2012-04-05 22:21   ` Oleg Nesterov
2012-04-05 22:21 ` [PATCH 4/6] uprobes: change register_for_each_vma() to take mm->mmap_sem for writing Oleg Nesterov
2012-04-05 22:21   ` Oleg Nesterov
2012-04-05 22:22 ` [PATCH 5/6] uprobes: teach handle_swbp() to rely on "is_swbp" rather than uprobes_srcu Oleg Nesterov
2012-04-05 22:22   ` Oleg Nesterov
2012-04-05 22:22 ` [PATCH 6/6] uprobes: kill uprobes_srcu/uprobe_srcu_id Oleg Nesterov
2012-04-05 22:22   ` Oleg Nesterov
2012-04-14 11:16 ` [RFC 0/6] " Ingo Molnar
2012-04-14 11:16   ` Ingo Molnar
2012-04-16 11:31   ` Srikar Dronamraju
2012-04-16 11:31     ` Srikar Dronamraju
2012-04-16 14:41     ` Oleg Nesterov
2012-04-16 14:41       ` Oleg Nesterov
2012-04-25 12:52       ` Srikar Dronamraju
2012-04-25 12:52         ` Srikar Dronamraju
2012-04-25 14:22         ` Oleg Nesterov
2012-04-25 14:22           ` Oleg Nesterov
2012-04-14 13:16 ` Peter Zijlstra
2012-04-14 13:16   ` Peter Zijlstra
2012-04-14 20:52   ` Oleg Nesterov
2012-04-14 20:52     ` Oleg Nesterov
2012-04-15 10:51     ` Peter Zijlstra
2012-04-15 10:51       ` Peter Zijlstra
2012-04-15 19:53       ` Oleg Nesterov
2012-04-15 19:53         ` Oleg Nesterov
2012-04-15 21:48         ` Peter Zijlstra
2012-04-15 21:48           ` Peter Zijlstra
2012-04-15 23:44           ` Oleg Nesterov
2012-04-15 23:44             ` Oleg Nesterov
2012-04-16 10:16             ` Peter Zijlstra
2012-04-16 10:16               ` Peter Zijlstra
2012-04-16 21:47               ` Oleg Nesterov
2012-04-16 21:47                 ` Oleg Nesterov
2012-04-20 10:14                 ` Peter Zijlstra
2012-04-20 10:14                   ` Peter Zijlstra
2012-04-20 10:16                   ` Srikar Dronamraju
2012-04-20 10:16                     ` Srikar Dronamraju
2012-04-20 18:58                     ` Oleg Nesterov
2012-04-20 18:58                       ` Oleg Nesterov
2012-04-20 18:37                   ` Oleg Nesterov
2012-04-20 18:37                     ` Oleg Nesterov
2012-04-23  7:14                     ` Peter Zijlstra
2012-04-23  7:14                       ` Peter Zijlstra
2012-04-23  7:24                       ` Srikar Dronamraju
2012-04-23  7:24                         ` Srikar Dronamraju
2012-04-23  7:40                         ` Peter Zijlstra
2012-04-23  7:40                           ` Peter Zijlstra
2012-04-23 17:29                           ` Oleg Nesterov
2012-04-23 17:29                             ` Oleg Nesterov
2012-04-23 19:18                             ` Peter Zijlstra
2012-04-23 19:18                               ` Peter Zijlstra
2012-04-23 20:50                               ` Oleg Nesterov
2012-04-23 20:50                                 ` Oleg Nesterov
2012-04-23 21:25                                 ` Oleg Nesterov
2012-04-23 21:25                                   ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.