All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] uprobes: misc minor changes
@ 2012-05-30 16:57 Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-30 16:57 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, Anton Arapov, Linus Torvalds,
	Masami Hiramatsu, linux-kernel

Hello,

A couple of random/minor changes before I'll try to kill
uprobes_state->count.

On top of 1-7 I sent yesterday, but doesn't depend on it.

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 16:57 [PATCH 0/3] uprobes: misc minor changes Oleg Nesterov
@ 2012-05-30 16:58 ` Oleg Nesterov
  2012-05-30 17:28   ` Peter Zijlstra
  2012-05-30 16:58 ` [PATCH 2/3] uprobes: remove the unnecessary initialization in add_utask() Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list Oleg Nesterov
  2 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-30 16:58 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, Anton Arapov, Linus Torvalds,
	Masami Hiramatsu, linux-kernel

install_breakpoint() returns -EEXIST if is_swbp_insn(orig_insn) == T,
the caller treats this code as success.

This is doubly wrong. The successful return should set UPROBE_COPY_INSN,
but the real problem is that it shouldn't succeed. If the probed insn is
int3 the application should get SIGTRAP, this won't happen with uprobe.

Probably we can fix this, we can add the UPROBE_SHARED_BP flag and teach
handle_swbp/set_orig_insn to handle this case correctly. But this needs
some complications and we have other insns which can't be probed, lets
make a simple fix for now.

I think this needs a cleanup. UPROBE_COPY_INSN should die, copy_insn()
should be called by alloc_uprobe(). arch_uprobe_analyze_insn() depends
on ->mm (ia32_compat) but it is called only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8c5e043..1593b43 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -704,7 +704,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
 			return ret;
 
 		if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
-			return -EEXIST;
+			return -ENOTSUPP;
 
 		ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
 		if (ret)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] uprobes: remove the unnecessary initialization in add_utask()
  2012-05-30 16:57 [PATCH 0/3] uprobes: misc minor changes Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T Oleg Nesterov
@ 2012-05-30 16:58 ` Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list Oleg Nesterov
  2 siblings, 0 replies; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-30 16:58 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, Anton Arapov, Linus Torvalds,
	Masami Hiramatsu, linux-kernel

Trivial cleanup. No need to nullify ->active_uprobe after kzalloc().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1593b43..50c8a9c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1416,7 +1416,6 @@ static struct uprobe_task *add_utask(void)
 	if (unlikely(!utask))
 		return NULL;
 
-	utask->active_uprobe = NULL;
 	current->utask = utask;
 	return utask;
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list
  2012-05-30 16:57 [PATCH 0/3] uprobes: misc minor changes Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T Oleg Nesterov
  2012-05-30 16:58 ` [PATCH 2/3] uprobes: remove the unnecessary initialization in add_utask() Oleg Nesterov
@ 2012-05-30 16:58 ` Oleg Nesterov
  2012-05-30 17:48   ` Srikar Dronamraju
  2 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-30 16:58 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
  Cc: Ananth N Mavinakayanahalli, Anton Arapov, Linus Torvalds,
	Masami Hiramatsu, linux-kernel

uprobe->pending_list is only used to create the temporary list,
it has no meaning after we drop uprobes_mmap_hash(inode).

No need to initialize this node or remove it from tmp_list, and
we can use list_for_each_entry().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 50c8a9c..de931e7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -529,7 +529,6 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 	uprobe->inode = igrab(inode);
 	uprobe->offset = offset;
 	init_rwsem(&uprobe->consumer_rwsem);
-	INIT_LIST_HEAD(&uprobe->pending_list);
 
 	/* add to uprobes_tree, sorted on inode:offset */
 	cur_uprobe = insert_uprobe(uprobe);
@@ -1051,7 +1050,7 @@ static void build_probe_list(struct inode *inode, struct list_head *head)
 int uprobe_mmap(struct vm_area_struct *vma)
 {
 	struct list_head tmp_list;
-	struct uprobe *uprobe, *u;
+	struct uprobe *uprobe;
 	struct inode *inode;
 	int ret, count;
 
@@ -1069,10 +1068,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	ret = 0;
 	count = 0;
 
-	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
+	list_for_each_entry(uprobe, &tmp_list, pending_list) {
 		loff_t vaddr;
 
-		list_del(&uprobe->pending_list);
 		if (!ret) {
 			vaddr = vma_address(vma, uprobe->offset);
 
@@ -1118,7 +1116,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end)
 {
 	struct list_head tmp_list;
-	struct uprobe *uprobe, *u;
+	struct uprobe *uprobe;
 	struct inode *inode;
 
 	if (!atomic_read(&uprobe_events) || !valid_vma(vma, false))
@@ -1135,12 +1133,10 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
 	mutex_lock(uprobes_mmap_hash(inode));
 	build_probe_list(inode, &tmp_list);
 
-	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
+	list_for_each_entry(uprobe, &tmp_list, pending_list) {
 		loff_t vaddr;
 
-		list_del(&uprobe->pending_list);
 		vaddr = vma_address(vma, uprobe->offset);
-
 		if (vaddr >= start && vaddr < end) {
 			/*
 			 * An unregister could have removed the probe before
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 16:58 ` [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T Oleg Nesterov
@ 2012-05-30 17:28   ` Peter Zijlstra
  2012-05-30 17:37     ` Srikar Dronamraju
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-30 17:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Srikar Dronamraju, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 18:58 +0200, Oleg Nesterov wrote:
> install_breakpoint() returns -EEXIST if is_swbp_insn(orig_insn) == T,
> the caller treats this code as success.
> 
> This is doubly wrong. The successful return should set UPROBE_COPY_INSN,
> but the real problem is that it shouldn't succeed. If the probed insn is
> int3 the application should get SIGTRAP, this won't happen with uprobe.
> 
> Probably we can fix this, we can add the UPROBE_SHARED_BP flag and teach
> handle_swbp/set_orig_insn to handle this case correctly. But this needs
> some complications and we have other insns which can't be probed, lets
> make a simple fix for now.
> 
> I think this needs a cleanup. UPROBE_COPY_INSN should die, copy_insn()
> should be called by alloc_uprobe(). arch_uprobe_analyze_insn() depends
> on ->mm (ia32_compat) but it is called only once.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/events/uprobes.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 8c5e043..1593b43 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -704,7 +704,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
>  			return ret;
>  
>  		if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
> -			return -EEXIST;
> +			return -ENOTSUPP;
>  
>  		ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
>  		if (ret)

IIRC this -EEXIST existed because the vma iteration it does is racy and
one can encounter the same vma twice or so. See the special -EEXIST
handling in register_for_each_vma().

Changing it like this would break stuff. 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:28   ` Peter Zijlstra
@ 2012-05-30 17:37     ` Srikar Dronamraju
  2012-05-30 17:49       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2012-05-30 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

> > 
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  kernel/events/uprobes.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 8c5e043..1593b43 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -704,7 +704,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> >  			return ret;
> >  
> >  		if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
> > -			return -EEXIST;
> > +			return -ENOTSUPP;
> >  
> >  		ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
> >  		if (ret)
> 
> IIRC this -EEXIST existed because the vma iteration it does is racy and
> one can encounter the same vma twice or so. See the special -EEXIST
> handling in register_for_each_vma().
> 
> Changing it like this would break stuff. 
> 

Peter, 

is_swbp_insn() is looking at the copy of the instruction thats read from
the file. This path is only taken even before any mm's are inserted with
the breakpoint instruction.

We still check and return -EEXIST if the memory while inserting the breakpoint
instruction  already has a breakpoint.

Hence this change is correct. 

-- 
thanks and regards
Srikar


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list
  2012-05-30 16:58 ` [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list Oleg Nesterov
@ 2012-05-30 17:48   ` Srikar Dronamraju
  2012-05-30 18:10     ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2012-05-30 17:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2012-05-30 18:58:46]:

> uprobe->pending_list is only used to create the temporary list,
> it has no meaning after we drop uprobes_mmap_hash(inode).
> 
> No need to initialize this node or remove it from tmp_list, and
> we can use list_for_each_entry().


I actually dont see the patch that removed the uprobe->pending_list.
I dont see it as part of the 7 that you sent yesterday. Nor does is it
part of the series that you sent today.

the uprobe_mmap_hash was required because we can do only one insert or
remove operation at a time. 

How are we going to synchronize uprobe_mmap() for two different vmas
that belong two different mms but map the same inode?

May be I am missing a patch somewhere.

-- 
thanks and regards
Srikar


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:37     ` Srikar Dronamraju
@ 2012-05-30 17:49       ` Peter Zijlstra
  2012-05-30 17:54         ` Peter Zijlstra
  2012-05-31 12:15         ` Peter Zijlstra
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-30 17:49 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 23:07 +0530, Srikar Dronamraju wrote:
> > > 
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > > ---
> > >  kernel/events/uprobes.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > index 8c5e043..1593b43 100644
> > > --- a/kernel/events/uprobes.c
> > > +++ b/kernel/events/uprobes.c
> > > @@ -704,7 +704,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> > >  			return ret;
> > >  
> > >  		if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
> > > -			return -EEXIST;
> > > +			return -ENOTSUPP;
> > >  
> > >  		ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
> > >  		if (ret)
> > 
> > IIRC this -EEXIST existed because the vma iteration it does is racy and
> > one can encounter the same vma twice or so. See the special -EEXIST
> > handling in register_for_each_vma().
> > 
> > Changing it like this would break stuff. 
> > 
> 
> Peter, 
> 
> is_swbp_insn() is looking at the copy of the instruction thats read from
> the file. This path is only taken even before any mm's are inserted with
> the breakpoint instruction.
> 
> We still check and return -EEXIST if the memory while inserting the breakpoint
> instruction  already has a breakpoint.
> 
> Hence this change is correct. 

Oh, indeed. I overlooked copy_insn() is taking the page from the
page-cache instead of the page-tables and will thus get the original.

OK, no worries then.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:49       ` Peter Zijlstra
@ 2012-05-30 17:54         ` Peter Zijlstra
  2012-05-30 18:03           ` Peter Zijlstra
                             ` (2 more replies)
  2012-05-31 12:15         ` Peter Zijlstra
  1 sibling, 3 replies; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-30 17:54 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 19:49 +0200, Peter Zijlstra wrote:
> Oh, indeed. I overlooked copy_insn() is taking the page from the
> page-cache instead of the page-tables and will thus get the original.
> 
> OK, no worries then. 

So its the -EEXIST from set_swbp() that I was thinking about. I think I
was also wrong on the reason that that can happen. register's vma
iteration is very careful not to have the same vma twice, but it can
race against mmap() because of the uprobe_hash() vs uprobe_mmap_hash()
madness, right?

Something like so?

---
 kernel/events/uprobes.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..b4e749e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -45,6 +45,19 @@ static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
 
 #define UPROBES_HASH_SZ	13
 
+/*
+ * We need separate register/unregister and mmap/munmap lock hashes because of
+ * mmap_sem nesting.
+ *
+ * {,un}egister_uprobe() needs to install probes on (potentially) all processes
+ * and thus need to acquire multiple mmap_sems (consecutively, not
+ * concurrently), whereas uprobe_m{,un}map() is called while holding mmap_sem
+ * for the particular process doing the mmap.
+ *
+ * This all means that register_uprobe() can race with uprobe_mmap() and we
+ * can try and install a probe where one is already installed.
+ */
+
 /* serialize (un)register */
 static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
 
@@ -356,6 +369,9 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 {
 	int result;
 
+	/*
+	 * See the comment near uprobes_hash().
+	 */
 	result = is_swbp_at_addr(mm, vaddr);
 	if (result == 1)
 		return -EEXIST;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:54         ` Peter Zijlstra
@ 2012-05-30 18:03           ` Peter Zijlstra
  2012-05-30 18:04           ` Srikar Dronamraju
  2012-05-31 18:53           ` Oleg Nesterov
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-30 18:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 19:54 +0200, Peter Zijlstra wrote:
> Something like so?

---
 kernel/events/uprobes.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..16f986d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -45,6 +45,19 @@ static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
 
 #define UPROBES_HASH_SZ	13
 
+/*
+ * We need separate register/unregister and mmap/munmap lock hashes because of
+ * mmap_sem nesting.
+ *
+ * {,un}egister_uprobe() need to install probes on (potentially) all processes
+ * and thus need to acquire multiple mmap_sems (consequtively, not
+ * concurrently), whereas uprobe_m{,un}map() are called while holding mmap_sem
+ * for the particular process doing the mmap.
+ *
+ * This all means that register_uprobe() can race with uprobe_mmap() and we
+ * can try and install a probe where one is already installed.
+ */
+
 /* serialize (un)register */
 static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
 
@@ -356,6 +369,9 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 {
 	int result;
 
+	/*
+	 * See the comment near uprobes_hash().
+	 */
 	result = is_swbp_at_addr(mm, vaddr);
 	if (result == 1)
 		return -EEXIST;
@@ -870,6 +886,10 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		up_read(&mm->mmap_sem);
 		mmput(mm);
 		if (is_register) {
+			/*
+			 * We can race against uprobe_mmap() see the comment
+			 * near uprobe_hash().
+			 */
 			if (ret && ret == -EEXIST)
 				ret = 0;
 			if (ret)
@@ -1080,7 +1100,10 @@ int uprobe_mmap(struct vm_area_struct *vma)
 
 			ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
 
-			/* Ignore double add: */
+			/* 
+			 * We can race against register_uprobe(), see the
+			 * comment near uprobe_hash().
+			 */
 			if (ret == -EEXIST) {
 				ret = 0;
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:54         ` Peter Zijlstra
  2012-05-30 18:03           ` Peter Zijlstra
@ 2012-05-30 18:04           ` Srikar Dronamraju
  2012-05-30 18:21             ` Peter Zijlstra
  2012-05-31 18:53           ` Oleg Nesterov
  2 siblings, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2012-05-30 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

> 
> So its the -EEXIST from set_swbp() that I was thinking about. I think I
> was also wrong on the reason that that can happen. register's vma
> iteration is very careful not to have the same vma twice, but it can
> race against mmap() because of the uprobe_hash() vs uprobe_mmap_hash()
> madness, right?


right. 

> 
> Something like so?
> 
> ---
>  kernel/events/uprobes.c |   16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 985be4d..b4e749e 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -45,6 +45,19 @@ static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
>  
>  #define UPROBES_HASH_SZ	13
>  
> +/*
> + * We need separate register/unregister and mmap/munmap lock hashes because of
> + * mmap_sem nesting.
> + *
> + * {,un}egister_uprobe() needs to install probes on (potentially) all processes
> + * and thus need to acquire multiple mmap_sems (consecutively, not
> + * concurrently), whereas uprobe_m{,un}map() is called while holding mmap_sem
> + * for the particular process doing the mmap.
> + *
> + * This all means that register_uprobe() can race with uprobe_mmap() and we
> + * can try and install a probe where one is already installed.
> + */

Nit: {,un}egister_uprobe should have been uprobe_{,un}register  at
couple of places.

> +
>  /* serialize (un)register */
>  static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
>  
> @@ -356,6 +369,9 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
>  {
>  	int result;
>  
> +	/*
> +	 * See the comment near uprobes_hash().
> +	 */
>  	result = is_swbp_at_addr(mm, vaddr);
>  	if (result == 1)
>  		return -EEXIST;
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list
  2012-05-30 17:48   ` Srikar Dronamraju
@ 2012-05-30 18:10     ` Oleg Nesterov
  2012-05-30 18:23       ` Srikar Dronamraju
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-30 18:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 05/30, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <oleg@redhat.com> [2012-05-30 18:58:46]:
>
> > uprobe->pending_list is only used to create the temporary list,
> > it has no meaning after we drop uprobes_mmap_hash(inode).
> >
> > No need to initialize this node or remove it from tmp_list, and
> > we can use list_for_each_entry().
>
>
> I actually dont see the patch that removed the uprobe->pending_list.

I think you misread this cleanup. Or may be I misunderstood you...

The patch only removes the unnecessary INIT_LIST_HEAD/list_del and
changes the code to use list_for_each_entry (_safe is not needed).

list_add() doesn't need the initialized entry, and there is no need
to "cleanup" ->pending_list after list_for_each().

However. If you dislike this change, feel free to nack it.. Cleanups
are always subjective, I won't argue if you prefer the current code.

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 18:04           ` Srikar Dronamraju
@ 2012-05-30 18:21             ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-30 18:21 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 23:34 +0530, Srikar Dronamraju wrote:
> Nit: {,un}egister_uprobe should have been uprobe_{,un}register  at
> couple of places. 

Ah indeed.. very silly.. hopefully last version.


---
Subject: uprobe: Document uprobe_register() vs uprobe_mmap() race

Because the mind is treacherous and makes us forget we need to write
stuff down.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/events/uprobes.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..fd6fb30 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -45,6 +45,23 @@ static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
 
 #define UPROBES_HASH_SZ	13
 
+/*
+ * We need separate register/unregister and mmap/munmap lock hashes because of
+ * mmap_sem nesting.
+ *
+ * uprobe_register() needs to install probes on (potentially) all processes
+ * and thus needs to acquire multiple mmap_sems (consequtively, not
+ * concurrently), whereas uprobe_mmap() is called while holding mmap_sem
+ * for the particular process doing the mmap.
+ *
+ * uprobe_register()->register_for_each_vma() needs to drop/acquire mmap_sem
+ * because of lock order against i_mmap_mutex. This means there's a hole in the
+ * register vma iteration where a mmap() can happen.
+ *
+ * Thus uprobe_register() can race with uprobe_mmap() and we can try and
+ * install a probe where one is already installed.
+ */
+
 /* serialize (un)register */
 static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
 
@@ -356,6 +373,9 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 {
 	int result;
 
+	/*
+	 * See the comment near uprobes_hash().
+	 */
 	result = is_swbp_at_addr(mm, vaddr);
 	if (result == 1)
 		return -EEXIST;
@@ -870,6 +890,10 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 		up_read(&mm->mmap_sem);
 		mmput(mm);
 		if (is_register) {
+			/*
+			 * We can race against uprobe_mmap() see the comment
+			 * near uprobe_hash().
+			 */
 			if (ret && ret == -EEXIST)
 				ret = 0;
 			if (ret)
@@ -1080,7 +1104,10 @@ int uprobe_mmap(struct vm_area_struct *vma)
 
 			ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
 
-			/* Ignore double add: */
+			/* 
+			 * We can race against uprobe_register(), see the
+			 * comment near uprobe_hash().
+			 */
 			if (ret == -EEXIST) {
 				ret = 0;
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list
  2012-05-30 18:10     ` Oleg Nesterov
@ 2012-05-30 18:23       ` Srikar Dronamraju
  0 siblings, 0 replies; 24+ messages in thread
From: Srikar Dronamraju @ 2012-05-30 18:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2012-05-30 20:10:06]:

> On 05/30, Srikar Dronamraju wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> [2012-05-30 18:58:46]:
> >
> > > uprobe->pending_list is only used to create the temporary list,
> > > it has no meaning after we drop uprobes_mmap_hash(inode).
> > >
> > > No need to initialize this node or remove it from tmp_list, and
> > > we can use list_for_each_entry().
> >
> >
> > I actually dont see the patch that removed the uprobe->pending_list.
> 
> I think you misread this cleanup. Or may be I misunderstood you...

Yes, I  misread the patch. Sorry for the noise. 
I will take a look again tomorrow.

-- 
Thanks and regards
Srikar


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:49       ` Peter Zijlstra
  2012-05-30 17:54         ` Peter Zijlstra
@ 2012-05-31 12:15         ` Peter Zijlstra
  2012-05-31 12:54           ` Srikar Dronamraju
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2012-05-31 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On Wed, 2012-05-30 at 19:49 +0200, Peter Zijlstra wrote:
> 
> Oh, indeed. I overlooked copy_insn() is taking the page from the
> page-cache instead of the page-tables and will thus get the original.
> 

Related to this, what happens when we try to install a probe on a page
that's already been COWed by a ptrace user?

Say gdb did a code modification at or near the intended probe site.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-31 12:15         ` Peter Zijlstra
@ 2012-05-31 12:54           ` Srikar Dronamraju
  0 siblings, 0 replies; 24+ messages in thread
From: Srikar Dronamraju @ 2012-05-31 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

* Peter Zijlstra <peterz@infradead.org> [2012-05-31 14:15:18]:

> On Wed, 2012-05-30 at 19:49 +0200, Peter Zijlstra wrote:
> > 
> > Oh, indeed. I overlooked copy_insn() is taking the page from the
> > page-cache instead of the page-tables and will thus get the original.
> > 
> 
> Related to this, what happens when we try to install a probe on a page
> that's already been COWed by a ptrace user?
> 
> Say gdb did a code modification at or near the intended probe site.
> 

If the page is already COWED because of an existing code modification
near the intended probe site, we are replacing it with another page
keeping the previous modification as is.

ptrace and uprobes tracing the same instruction wont work cleanly for
now.

-- 
thanks and regards
Srikar


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-30 17:54         ` Peter Zijlstra
  2012-05-30 18:03           ` Peter Zijlstra
  2012-05-30 18:04           ` Srikar Dronamraju
@ 2012-05-31 18:53           ` Oleg Nesterov
  2012-06-01 15:53             ` Oleg Nesterov
  2012-06-01 16:47             ` Srikar Dronamraju
  2 siblings, 2 replies; 24+ messages in thread
From: Oleg Nesterov @ 2012-05-31 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 05/30, Peter Zijlstra wrote:
>
> register's vma
> iteration is very careful not to have the same vma twice,

Hmm. I am wondering if it is careful enough...

Just in case, I think your patch is great. But it seems to me
there is another problem.

__find_next_vma_info() checks tmpvi->mm == vma->vm_mm to detect the
already visited mm/vma. However, afaics this can be false positive?

The caller, register_for_each_vma(), does mmput() and after that
this memory can be freed and re-used as another mm_struct.

I'll recheck this, but perhaps we need something like below?

Oleg.

--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -851,7 +851,6 @@ static int register_for_each_vma(struct 
 			list_del(&vi->probe_list);
 			kfree(vi);
 			up_write(&mm->mmap_sem);
-			mmput(mm);
 			continue;
 		}
 		vaddr = vma_address(vma, uprobe->offset);
@@ -860,7 +859,6 @@ static int register_for_each_vma(struct 
 			list_del(&vi->probe_list);
 			kfree(vi);
 			up_write(&mm->mmap_sem);
-			mmput(mm);
 			continue;
 		}
 
@@ -870,7 +868,6 @@ static int register_for_each_vma(struct 
 			remove_breakpoint(uprobe, mm, vi->vaddr);
 
 		up_write(&mm->mmap_sem);
-		mmput(mm);
 		if (is_register) {
 			if (ret && ret == -EEXIST)
 				ret = 0;
@@ -881,6 +878,7 @@ static int register_for_each_vma(struct 
 
 	list_for_each_entry_safe(vi, tmpvi, &try_list, probe_list) {
 		list_del(&vi->probe_list);
+		mmput(vi->mm)
 		kfree(vi);
 	}
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-31 18:53           ` Oleg Nesterov
@ 2012-06-01 15:53             ` Oleg Nesterov
  2012-06-01 16:33               ` Srikar Dronamraju
  2012-06-01 16:47             ` Srikar Dronamraju
  1 sibling, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-06-01 15:53 UTC (permalink / raw)
  To: Peter Zijlstra, Srikar Dronamraju
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Anton Arapov,
	Linus Torvalds, Masami Hiramatsu, linux-kernel

On 05/31, Oleg Nesterov wrote:
>
> __find_next_vma_info() checks tmpvi->mm == vma->vm_mm to detect the
> already visited mm/vma. However, afaics this can be false positive?

Yes, but I guess this is harmless, we can rely on uprobe_mmap.



But. Doesn't this mean we can greatly simplify register_for_each_vma()
and make it O(n) ?

Unless I missed something, we can simply create the list of
mm/vaddr structures under ->i_mmap_mutex (vma_prio_tree_foreach), then
register_for_each_vma() can process the list and that is all.

If another mapping comes after we drop i_mmap_mutex, uprobe_mmap()
should be called and it should install the bp.


Srikar, Peter, what do you think?

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-06-01 15:53             ` Oleg Nesterov
@ 2012-06-01 16:33               ` Srikar Dronamraju
  2012-06-01 17:20                 ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2012-06-01 16:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2012-06-01 17:53:12]:

> On 05/31, Oleg Nesterov wrote:
> >
> > __find_next_vma_info() checks tmpvi->mm == vma->vm_mm to detect the
> > already visited mm/vma. However, afaics this can be false positive?
> 
> Yes, but I guess this is harmless, we can rely on uprobe_mmap.
> 
> 
> 
> But. Doesn't this mean we can greatly simplify register_for_each_vma()
> and make it O(n) ?
> 
> Unless I missed something, we can simply create the list of
> mm/vaddr structures under ->i_mmap_mutex (vma_prio_tree_foreach), then
> register_for_each_vma() can process the list and that is all.


If I remember correctly, we cannot allocate the list elements under
i_mmap_mutex. We dont know how many list elements to allocate.

This is what Peter had to say : https://lkml.org/lkml/2011/6/27/72

"Because we try to take i_mmap_mutex during reclaim, trying to unmap
pages. So suppose we do an allocation while holding i_mmap_mutex, find
there's no free memory, try and unmap a page in order to free it, and
we're stuck."

> 
> If another mapping comes after we drop i_mmap_mutex, uprobe_mmap()
> should be called and it should install the bp.
> 

-- 
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-05-31 18:53           ` Oleg Nesterov
  2012-06-01 15:53             ` Oleg Nesterov
@ 2012-06-01 16:47             ` Srikar Dronamraju
  2012-06-01 18:38               ` Oleg Nesterov
  1 sibling, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2012-06-01 16:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

* Oleg Nesterov <oleg@redhat.com> [2012-05-31 20:53:39]:

> On 05/30, Peter Zijlstra wrote:
> >
> > register's vma
> > iteration is very careful not to have the same vma twice,
> 
> Hmm. I am wondering if it is careful enough...
> 
> Just in case, I think your patch is great. But it seems to me
> there is another problem.
> 
> __find_next_vma_info() checks tmpvi->mm == vma->vm_mm to detect the
> already visited mm/vma. However, afaics this can be false positive?
> 
> The caller, register_for_each_vma(), does mmput() and after that
> this memory can be freed and re-used as another mm_struct.
> 

Before we do a mmput(), we take the mmap_sem for that mm. So this mm
cannot be freed until we complete insertion/removal.  If the mm gets
reused after the insertion/removal, and maps the inode, then we are
doing the right thing by parsing it.

Or are you hinting at some other problem?


> I'll recheck this, but perhaps we need something like below?
> 
> Oleg.
> 
> --- x/kernel/events/uprobes.c
> +++ x/kernel/events/uprobes.c
> @@ -851,7 +851,6 @@ static int register_for_each_vma(struct 
>  			list_del(&vi->probe_list);
>  			kfree(vi);
>  			up_write(&mm->mmap_sem);
> -			mmput(mm);

If we want to do this, then we have to even move the list_del and kfree
along with mmput.  Otherwise we may leak mm's.  I see no advantages
except for decrease in code.  I might be missing something trivial. 

On the other side, moving the lines below would mean keeping extra
elements in the list that have to be traversed again. i.e if we
determine in this pass that elements of the list can be removed, then
why keep them till the next pass.

>  			continue;
>  		}
>  		vaddr = vma_address(vma, uprobe->offset);
> @@ -860,7 +859,6 @@ static int register_for_each_vma(struct 
>  			list_del(&vi->probe_list);
>  			kfree(vi);
>  			up_write(&mm->mmap_sem);
> -			mmput(mm);
>  			continue;
>  		}
> 
> @@ -870,7 +868,6 @@ static int register_for_each_vma(struct 
>  			remove_breakpoint(uprobe, mm, vi->vaddr);
> 
>  		up_write(&mm->mmap_sem);
> -		mmput(mm);
>  		if (is_register) {
>  			if (ret && ret == -EEXIST)
>  				ret = 0;
> @@ -881,6 +878,7 @@ static int register_for_each_vma(struct 
> 
>  	list_for_each_entry_safe(vi, tmpvi, &try_list, probe_list) {
>  		list_del(&vi->probe_list);
> +		mmput(vi->mm)
>  		kfree(vi);
>  	}
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-06-01 16:33               ` Srikar Dronamraju
@ 2012-06-01 17:20                 ` Oleg Nesterov
  2012-06-01 18:31                   ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-06-01 17:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 06/01, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <oleg@redhat.com> [2012-06-01 17:53:12]:
>
> > But. Doesn't this mean we can greatly simplify register_for_each_vma()
> > and make it O(n) ?
> >
> > Unless I missed something, we can simply create the list of
> > mm/vaddr structures under ->i_mmap_mutex (vma_prio_tree_foreach), then
> > register_for_each_vma() can process the list and that is all.
>
>
> If I remember correctly, we cannot allocate the list elements under
> i_mmap_mutex. We dont know how many list elements to allocate.
>
> This is what Peter had to say : https://lkml.org/lkml/2011/6/27/72
>
> "Because we try to take i_mmap_mutex during reclaim, trying to unmap
> pages. So suppose we do an allocation while holding i_mmap_mutex, find
> there's no free memory, try and unmap a page in order to free it, and
> we're stuck."

Yes, try_to_unmap.

But. What do you think about the pseudo-code below? Only to illustrate
the approach, the code is not complete.

In the "likely" case we do vma_prio_tree_foreach() twice, this is
better than the current quadratic behaviour.

Oleg.

struct map_info {
	struct map_info *next;
	struct mm_struct *mm;
	loff_t vaddr;
};

static struct map_info *
build_map_info_list(struct address_space *mapping, loff_t offset,
							bool is_register)
{
	struct map_info *prev = NULL;
	struct map_info *curr;
	int more;

again:
	more = 0;
	curr = NULL;
	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
		struct map_info *info = prev;

		if (!valid_vma(vma, is_register))
			continue;

		if (!info) {
			more++;
			continue;
		}

		if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
			contunue;

		prev = info->next;

		info->mm = vma->vm_mm;
		info->vaddr = vma_address(vma, offset);

		info->next = curr;
		curr = info;
	}

	if (!more) {
		while (prev) {
			map_info *tmp = prev;
			prev = prev->next;
			kfree(tmp);
		}

		return curr;
	}

	prev = curr;

	while (curr) {
		mmput(curr->mm);
		curr = curr->next;
	}

	while (more--) {
		struct map_info *info = kmalloc(...);
		if (!info)
			return ERR_PTR(-ENOMEM);	// MEMORY LEAK
		info->next = prev;
		prev = info;
	}

	goto again;
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-06-01 17:20                 ` Oleg Nesterov
@ 2012-06-01 18:31                   ` Oleg Nesterov
  2012-06-02 18:21                     ` Oleg Nesterov
  0 siblings, 1 reply; 24+ messages in thread
From: Oleg Nesterov @ 2012-06-01 18:31 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 06/01, Oleg Nesterov wrote:
>
> But. What do you think about the pseudo-code below? Only to illustrate
> the approach, the code is not complete.
>
> In the "likely" case we do vma_prio_tree_foreach() twice, this is
> better than the current quadratic behaviour.

See the "full" version. Untested of course, most probably has bugs,
but hopefully close enough.

What do you think?

Oleg.

struct map_info {
	struct map_info *next;
	struct mm_struct *mm;
	loff_t vaddr;
};

static inline struct map_info *free_map_info(struct map_info *info)
{
	struct map_info *next = info->next;
	kfree(info);
	return next;
}

static struct map_info *
build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
{
	unsigned long pgoff = offset >> PAGE_SHIFT;
	struct prio_tree_iter iter;
	struct vm_area_struct *vma;
	struct map_info *curr = NULL;
	struct map_info *prev = NULL;
	struct map_info *info;
	int more = 0;

 again:
 	mutex_lock(&mapping->i_mmap_mutex);
	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
		if (!valid_vma(vma, is_register))
			continue;

		if (!prev) {
			more++;
			continue;
		}

		if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
			continue;

		info = prev;
		prev = prev->next;
		info->next = curr;
		curr = info;

		info->mm = vma->vm_mm;
		info->vaddr = vma_address(vma, offset);
	}
	mutex_unlock(&mapping->i_mmap_mutex);

	if (!more)
		goto out;

	prev = curr;
	while (curr) {
		mmput(curr->mm);
		curr = curr->next;
	}

	while (more--) {
		info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
		if (!info) {
			curr = ERR_PTR(-ENOMEM);
			goto out;
		}
		info->next = prev;
		prev = info;
	}

	goto again;
 out:
	while (prev)
		prev = free_map_info(prev);
	return curr;
}

static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
{
	struct map_info *info;
	int err = 0;

	info = build_map_info(uprobe->inode->i_mapping,
					uprobe->offset, is_register);
	if (IS_ERR(info))
		return PTR_ERR(info);

	while (info) {
		struct mm_struct *mm = info->mm;
		struct vm_area_struct *vma;
		loff_t vaddr;

		if (err)
			goto free;

		down_write(&mm->mmap_sem);
		vma = find_vma(mm, (unsigned long)info->vaddr);
		if (!vma || !valid_vma(vma, is_register))
			goto unlock;

		vaddr = vma_address(vma, uprobe->offset);
		if (vma->vm_file->f_mapping->host != uprobe->inode ||
						vaddr != info->vaddr)
			goto unlock;

		if (is_register) {
			err = install_breakpoint(uprobe, mm, vma, info->vaddr);
			if (err == -EEXIST)
				err = 0;
		} else {
			remove_breakpoint(uprobe, mm, info->vaddr);
		}
 unlock:
		up_write(&mm->mmap_sem);
 free:
		mmput(mm);
		info = free_map_info(info);
	}

	return err;
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-06-01 16:47             ` Srikar Dronamraju
@ 2012-06-01 18:38               ` Oleg Nesterov
  0 siblings, 0 replies; 24+ messages in thread
From: Oleg Nesterov @ 2012-06-01 18:38 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 06/01, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <oleg@redhat.com> [2012-05-31 20:53:39]:
>
> > __find_next_vma_info() checks tmpvi->mm == vma->vm_mm to detect the
> > already visited mm/vma. However, afaics this can be false positive?
> >
> > The caller, register_for_each_vma(), does mmput() and after that
> > this memory can be freed and re-used as another mm_struct.
> >
>
> Before we do a mmput(), we take the mmap_sem for that mm. So this mm
> cannot be freed until we complete insertion/removal.  If the mm gets
> reused after the insertion/removal, and maps the inode,

Yes, see the previous email from me, we can rely on uprobe_mmap().

> Or are you hinting at some other problem?

Perhaps this deserves a comment. I mean, to explain that yes,
tmpvi->mm == vma->vm_mm can be wrong but this is fine.

However, I hope we simply can kill this code. See build_map_info()
I sent.

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T
  2012-06-01 18:31                   ` Oleg Nesterov
@ 2012-06-02 18:21                     ` Oleg Nesterov
  0 siblings, 0 replies; 24+ messages in thread
From: Oleg Nesterov @ 2012-06-02 18:21 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Ananth N Mavinakayanahalli,
	Anton Arapov, Linus Torvalds, Masami Hiramatsu, linux-kernel

On 06/01, Oleg Nesterov wrote:
>
> 	while (more--) {
> 		info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
> 		if (!info) {
> 			curr = ERR_PTR(-ENOMEM);
> 			goto out;
> 		}
> 		info->next = prev;
> 		prev = info;
> 	}

This should be "do ... while (--more)", otherwise it seems to work
and really helps.

I did the simple test under qemu. Currently uprobe_register() hangs
"forever" if the probed addr has 10000 mappings, I'have stopped qemu
after several minutes.

With the new code uprobe_register() needs 0.403s to complete.

We can also change build_map_info() to try GFP_NOWAIT | GFP_NOMEMALLOC
under i_mmap_mutex first, not sure this is really needed.

I'll write the changelog and send the patch...

Do you see any problems with this approach?

Oleg.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-06-02 18:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-30 16:57 [PATCH 0/3] uprobes: misc minor changes Oleg Nesterov
2012-05-30 16:58 ` [PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T Oleg Nesterov
2012-05-30 17:28   ` Peter Zijlstra
2012-05-30 17:37     ` Srikar Dronamraju
2012-05-30 17:49       ` Peter Zijlstra
2012-05-30 17:54         ` Peter Zijlstra
2012-05-30 18:03           ` Peter Zijlstra
2012-05-30 18:04           ` Srikar Dronamraju
2012-05-30 18:21             ` Peter Zijlstra
2012-05-31 18:53           ` Oleg Nesterov
2012-06-01 15:53             ` Oleg Nesterov
2012-06-01 16:33               ` Srikar Dronamraju
2012-06-01 17:20                 ` Oleg Nesterov
2012-06-01 18:31                   ` Oleg Nesterov
2012-06-02 18:21                     ` Oleg Nesterov
2012-06-01 16:47             ` Srikar Dronamraju
2012-06-01 18:38               ` Oleg Nesterov
2012-05-31 12:15         ` Peter Zijlstra
2012-05-31 12:54           ` Srikar Dronamraju
2012-05-30 16:58 ` [PATCH 2/3] uprobes: remove the unnecessary initialization in add_utask() Oleg Nesterov
2012-05-30 16:58 ` [PATCH 3/3] uprobes: simplify the usage of uprobe->pending_list Oleg Nesterov
2012-05-30 17:48   ` Srikar Dronamraju
2012-05-30 18:10     ` Oleg Nesterov
2012-05-30 18:23       ` Srikar Dronamraju

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.