All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] [PATCH 2.6.37-rc5-tip 0/20]  0: Inode based uprobes
@ 2010-12-16  9:57 Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 1/20] 1: mm: Move replace_page() / write_protect_page() to mm/memory.c Srikar Dronamraju
                   ` (20 more replies)
  0 siblings, 21 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	LKML, Linux-mm, Jim Keniston, Frederic Weisbecker, SystemTap,
	Andrew Morton, Paul E. McKenney

into any routine in a user space application and collect information
non-disruptively.

For previous posting: please refer: http://lkml.org/lkml/2010/8/25/165
http://lkml.org/lkml/2010/7/27/121, http://lkml.org/lkml/2010/7/12/67,
http://lkml.org/lkml/2010/7/8/239, http://lkml.org/lkml/2010/6/29/299,
http://lkml.org/lkml/2010/6/14/41, http://lkml.org/lkml/2010/3/20/107
and http://lkml.org/lkml/2010/5/18/307

Uprobes Patches
Unlike the previous postings where a probe was specified as pid:vaddr,
this patchset implements inode based uprobes which are specified as
<file>:<offset> where offset is the offset from start of the map.
The probehit overhead is around 3X times the previous patchset overhead.

This patchset is a rework based on suggestions from discussions on lkml
in September, March and January 2010 (http://lkml.org/lkml/2010/1/11/92,
http://lkml.org/lkml/2010/1/27/19, http://lkml.org/lkml/2010/3/20/107
and http://lkml.org/lkml/2010/3/31/199 ). This implementation of uprobes
doesnt depend on utrace.

When a uprobe is registered, Uprobes makes a copy of the probed
instruction, replaces the first byte(s) of the probed instruction with a
breakpoint instruction. (Uprobes uses background page replacement
mechanism and ensures that the breakpoint affects only that process.)

When a CPU hits the breakpoint instruction, Uprobes gets notified of
trap and finds the associated uprobe. It then executes the associated
handler. Uprobes single-steps its copy of the probed instruction and
resumes execution of the probed process at the instruction following the
probepoint. Instruction copies to be single-stepped are stored in a
per-mm "execution out of line (XOL) area". Currently XOL area is
allocated as one page vma.

Advantages of uprobes over conventional debugging include:

1. Non-disruptive.
Unlike current ptrace based mechanisms, uprobes tracing wouldnt
involve signals, stopping threads and context switching between the
tracer and tracee.

2. Much better handling of multithreaded programs because of XOL.
Current ptrace based mechanisms use single stepping inline, i.e they
copy back the original instruction on hitting a breakpoint.  In such
mechanisms tracers have to stop all the threads on a breakpoint hit or
tracers will not be able to handle all hits to the location of
interest. Uprobes uses execution out of line, where the instruction to
be traced is analysed at the time of breakpoint insertion and a copy
of instruction is stored at a different location.  On breakpoint hit,
uprobes jumps to that copied location and singlesteps the same
instruction and does the necessary fixups post singlestepping.

3. Multiple tracers for an application.
Multiple uprobes based tracer could work in unison to trace an
application. There could one tracer that could be interested in
generic events for a particular set of process. While there could be
another tracer that is just interested in one specific event of a
particular process thats part of the previous set of process.

4. Corelating events from kernels and userspace.
Uprobes could be used with other tools like kprobes, tracepoints or as
part of higher level tools like perf to give a consolidated set of
events from kernel and userspace.  In future we could look at a single
backtrace showing application, library and kernel calls.

Here is the list of TODO Items.

- Integrating perf probe with this patchset.
- Prefiltering (i.e filtering at the time of probe insertion)
 (Can be achieved if we can dynamically assign consumers at uprobe tracer
  enable time; Suggestions on how to do this are welcome)
- Signal handling.
	- queueing non-uprobes based INT3 as SIGTRAPS.
	- delaying signals from INT3 till post singlestep and queueing the
	  delayed signals.
- Return probes.
- Support for other architectures.
- Uprobes booster.
- replace macro W with bits in inat table.
- Bulk registration/unregisteration.

To try please fetch using
git fetch \
git://git.kernel.org/pub/scm/linux/kernel/git/srikar/linux-uprobes.git \
tip_inode_uprobes_161210:tip_inode_uprobes

Please refer "[RFC] [PATCH 2.6.37-rc5-tip 20/20] 20: tracing: uprobes
trace_event infrastructure" on how to use uprobe_tracer.

Please do provide your valuable comments.

Thanks in advance.
Srikar

 Srikar Dronamraju(20)
 0: Inode based uprobes
 1: mm: Move replace_page() / write_protect_page() to mm/memory.c
 2: X86 specific breakpoint definitions.
 3: uprobes: Breakground page replacement.
 4: uprobes: Adding and remove a uprobe in a rb tree.
 5: Uprobes: register/unregister probes.
 6: x86: analyze instruction and determine fixups.
 7: uprobes: store/restore original instruction.
 8: uprobes: mmap and fork hooks.
 9: x86: architecture specific task information.
10: uprobes: task specific information.
11: uprobes: slot allocation for uprobes
12: uprobes: get the breakpoint address.
13: x86: x86 specific probe handling
14: uprobes: Handing int3 and singlestep exception.
15: x86: uprobes exception notifier for x86.
16: uprobes: register a notifier for uprobes.
17: uprobes: filter chain
18: uprobes: commonly used filters.
19: tracing: Extract out common code for kprobes/uprobes traceevents.
20: tracing: uprobes trace_event interface

 arch/Kconfig                       |    4 +
 arch/x86/Kconfig                   |    3 +
 arch/x86/include/asm/thread_info.h |    2 +
 arch/x86/include/asm/uprobes.h     |   55 ++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/signal.c           |   14 +
 arch/x86/kernel/uprobes.c          |  599 +++++++++++++++++
 include/linux/mm.h                 |    4 +
 include/linux/mm_types.h           |    9 +
 include/linux/sched.h              |    3 +
 include/linux/uprobes.h            |  186 ++++++
 kernel/Makefile                    |    1 +
 kernel/fork.c                      |   10 +
 kernel/trace/Kconfig               |   20 +
 kernel/trace/Makefile              |    2 +
 kernel/trace/trace.h               |    5 +
 kernel/trace/trace_kprobe.c        |  752 +---------------------
 kernel/trace/trace_probe.c         |  654 +++++++++++++++++++
 kernel/trace/trace_probe.h         |  157 +++++
 kernel/trace/trace_uprobe.c        |  753 ++++++++++++++++++++++
 kernel/uprobes.c                   | 1250 ++++++++++++++++++++++++++++++++++++
 mm/ksm.c                           |  114 ----
 mm/memory.c                        |  122 ++++
 mm/mmap.c                          |    2 +
 24 files changed, 3871 insertions(+), 851 deletions(-)

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 1/20]  1: mm: Move replace_page() / write_protect_page() to mm/memory.c
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
@ 2010-12-16  9:57 ` Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 2/20] 2: X86 specific breakpoint definitions Srikar Dronamraju
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney


User bkpt will use background page replacement approach to insert/delete
breakpoints. Background page replacement approach will be based on
replace_page and write_protect_page.  Now replace_page() and
write_protect_page loses their static attribute.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
---
 include/linux/mm.h |    4 ++
 mm/ksm.c           |  114 -------------------------------------------------
 mm/memory.c        |  122 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 126 insertions(+), 114 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 721f451..24f8bb0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -874,6 +874,10 @@ void account_page_writeback(struct page *page);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
+int replace_page(struct vm_area_struct *vma, struct page *page,
+					struct page *kpage, pte_t orig_pte);
+int write_protect_page(struct vm_area_struct *vma, struct page *page,
+						      pte_t *orig_pte);
 
 /* Is the vma a continuation of the stack vma above it? */
 static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
diff --git a/mm/ksm.c b/mm/ksm.c
index 43bc893..0169c6b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -694,120 +694,6 @@ static inline int pages_identical(struct page *page1, struct page *page2)
 	return !memcmp_pages(page1, page2);
 }
 
-static int write_protect_page(struct vm_area_struct *vma, struct page *page,
-			      pte_t *orig_pte)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long addr;
-	pte_t *ptep;
-	spinlock_t *ptl;
-	int swapped;
-	int err = -EFAULT;
-
-	addr = page_address_in_vma(page, vma);
-	if (addr == -EFAULT)
-		goto out;
-
-	ptep = page_check_address(page, mm, addr, &ptl, 0);
-	if (!ptep)
-		goto out;
-
-	if (pte_write(*ptep) || pte_dirty(*ptep)) {
-		pte_t entry;
-
-		swapped = PageSwapCache(page);
-		flush_cache_page(vma, addr, page_to_pfn(page));
-		/*
-		 * Ok this is tricky, when get_user_pages_fast() run it doesnt
-		 * take any lock, therefore the check that we are going to make
-		 * with the pagecount against the mapcount is racey and
-		 * O_DIRECT can happen right after the check.
-		 * So we clear the pte and flush the tlb before the check
-		 * this assure us that no O_DIRECT can happen after the check
-		 * or in the middle of the check.
-		 */
-		entry = ptep_clear_flush(vma, addr, ptep);
-		/*
-		 * Check that no O_DIRECT or similar I/O is in progress on the
-		 * page
-		 */
-		if (page_mapcount(page) + 1 + swapped != page_count(page)) {
-			set_pte_at(mm, addr, ptep, entry);
-			goto out_unlock;
-		}
-		if (pte_dirty(entry))
-			set_page_dirty(page);
-		entry = pte_mkclean(pte_wrprotect(entry));
-		set_pte_at_notify(mm, addr, ptep, entry);
-	}
-	*orig_pte = *ptep;
-	err = 0;
-
-out_unlock:
-	pte_unmap_unlock(ptep, ptl);
-out:
-	return err;
-}
-
-/**
- * replace_page - replace page in vma by new ksm page
- * @vma:      vma that holds the pte pointing to page
- * @page:     the page we are replacing by kpage
- * @kpage:    the ksm page we replace page by
- * @orig_pte: the original value of the pte
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int replace_page(struct vm_area_struct *vma, struct page *page,
-			struct page *kpage, pte_t orig_pte)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *ptep;
-	spinlock_t *ptl;
-	unsigned long addr;
-	int err = -EFAULT;
-
-	addr = page_address_in_vma(page, vma);
-	if (addr == -EFAULT)
-		goto out;
-
-	pgd = pgd_offset(mm, addr);
-	if (!pgd_present(*pgd))
-		goto out;
-
-	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
-		goto out;
-
-	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
-		goto out;
-
-	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	if (!pte_same(*ptep, orig_pte)) {
-		pte_unmap_unlock(ptep, ptl);
-		goto out;
-	}
-
-	get_page(kpage);
-	page_add_anon_rmap(kpage, vma, addr);
-
-	flush_cache_page(vma, addr, pte_pfn(*ptep));
-	ptep_clear_flush(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
-
-	page_remove_rmap(page);
-	put_page(page);
-
-	pte_unmap_unlock(ptep, ptl);
-	err = 0;
-out:
-	return err;
-}
-
 /*
  * try_to_merge_one_page - take two pages and merge them into one
  * @vma: the vma that holds the pte pointing to page
diff --git a/mm/memory.c b/mm/memory.c
index 02e48aa..28f79bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2591,6 +2591,128 @@ void unmap_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
+/**
+ * replace_page - replace page in vma by new ksm page
+ * @vma:      vma that holds the pte pointing to page
+ * @page:     the page we are replacing by kpage
+ * @kpage:    the ksm page we replace page by
+ * @orig_pte: the original value of the pte
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ */
+int replace_page(struct vm_area_struct *vma, struct page *page,
+			struct page *kpage, pte_t orig_pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *ptep;
+	spinlock_t *ptl;
+	unsigned long addr;
+	int err = -EFAULT;
+
+	addr = page_address_in_vma(page, vma);
+	if (addr == -EFAULT)
+		goto out;
+
+	pgd = pgd_offset(mm, addr);
+	if (!pgd_present(*pgd))
+		goto out;
+
+	pud = pud_offset(pgd, addr);
+	if (!pud_present(*pud))
+		goto out;
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd))
+		goto out;
+
+	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!pte_same(*ptep, orig_pte)) {
+		pte_unmap_unlock(ptep, ptl);
+		goto out;
+	}
+
+	get_page(kpage);
+	page_add_anon_rmap(kpage, vma, addr);
+
+	flush_cache_page(vma, addr, pte_pfn(*ptep));
+	ptep_clear_flush(vma, addr, ptep);
+	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
+
+	page_remove_rmap(page);
+	put_page(page);
+
+	pte_unmap_unlock(ptep, ptl);
+	err = 0;
+out:
+	return err;
+}
+
+/**
+ * write_protect_page - mark the page readonly
+ * @vma:      vma that holds the page we want to mark
+ * @page:     page that needs to be marked readonly
+ * @orig_pte: pte for the protected page.
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ */
+int write_protect_page(struct vm_area_struct *vma, struct page *page,
+						      pte_t *orig_pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long addr;
+	pte_t *ptep;
+	spinlock_t *ptl;
+	int swapped;
+	int err = -EFAULT;
+
+	addr = page_address_in_vma(page, vma);
+	if (addr == -EFAULT)
+		goto out;
+
+	ptep = page_check_address(page, mm, addr, &ptl, 0);
+	if (!ptep)
+		goto out;
+
+	if (pte_write(*ptep) || pte_dirty(*ptep)) {
+		pte_t entry;
+
+		swapped = PageSwapCache(page);
+		flush_cache_page(vma, addr, page_to_pfn(page));
+		/*
+		 * Ok this is tricky, when get_user_pages_fast() run it doesnt
+		 * take any lock, therefore the check that we are going to make
+		 * with the pagecount against the mapcount is racey and
+		 * O_DIRECT can happen right after the check.
+		 * So we clear the pte and flush the tlb before the check
+		 * this assure us that no O_DIRECT can happen after the check
+		 * or in the middle of the check.
+		 */
+		entry = ptep_clear_flush(vma, addr, ptep);
+		/*
+		 * Check that no O_DIRECT or similar I/O is in progress on the
+		 * page
+		 */
+		if (page_mapcount(page) + 1 + swapped != page_count(page)) {
+			set_pte_at(mm, addr, ptep, entry);
+			goto out_unlock;
+		}
+		if (pte_dirty(entry))
+			set_page_dirty(page);
+		entry = pte_mkclean(pte_wrprotect(entry));
+		set_pte_at_notify(mm, addr, ptep, entry);
+	}
+	*orig_pte = *ptep;
+	err = 0;
+
+out_unlock:
+	pte_unmap_unlock(ptep, ptl);
+out:
+	return err;
+}
+
 int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
 {
 	struct address_space *mapping = inode->i_mapping;

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 2/20]  2: X86 specific breakpoint definitions.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 1/20] 1: mm: Move replace_page() / write_protect_page() to mm/memory.c Srikar Dronamraju
@ 2010-12-16  9:57 ` Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 3/20] 3: uprobes: Breakground page replacement Srikar Dronamraju
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney


Provides definitions for the breakpoint instruction and x86 specific
uprobe info structure.

Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/Kconfig               |    3 +++
 arch/x86/include/asm/uprobes.h |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/uprobes.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b6fccb0..bcbbc52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -239,6 +239,9 @@ config ARCH_CPU_PROBE_RELEASE
 	def_bool y
 	depends on HOTPLUG_CPU
 
+config ARCH_SUPPORTS_UPROBES
+	def_bool y
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
new file mode 100644
index 0000000..0e8ad5d
--- /dev/null
+++ b/arch/x86/include/asm/uprobes.h
@@ -0,0 +1,40 @@
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+/*
+ * Userspace Probes (UProbes) for x86
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008-2010
+ * Authors:
+ *	Srikar Dronamraju
+ *	Jim Keniston
+ */
+
+typedef u8 uprobe_opcode_t;
+#define MAX_UINSN_BYTES 16
+#define UPROBES_XOL_SLOT_BYTES (MAX_UINSN_BYTES)
+
+#define UPROBES_BKPT_INSN 0xcc
+#define UPROBES_BKPT_INSN_SIZE 1
+
+#ifdef CONFIG_X86_64
+struct uprobe_arch_info {
+	unsigned long rip_rela_target_address;
+};
+#else
+struct uprobe_arch_info {};
+#endif
+#endif	/* _ASM_UPROBES_H */

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 3/20]  3: uprobes: Breakground page replacement.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 1/20] 1: mm: Move replace_page() / write_protect_page() to mm/memory.c Srikar Dronamraju
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 2/20] 2: X86 specific breakpoint definitions Srikar Dronamraju
@ 2010-12-16  9:57 ` Srikar Dronamraju
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	Andrew Morton, Linux-mm, Jim Keniston, Frederic Weisbecker,
	SystemTap, LKML, Paul E. McKenney


Provides Background page replacement using replace_page() routine.
Also provides routines to read/write few bytes to vm and for
verifying if a instruction is a breakpoint instruction.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
---
 arch/Kconfig                   |   11 ++
 arch/x86/include/asm/uprobes.h |    2 
 include/linux/uprobes.h        |   76 ++++++++++++
 kernel/Makefile                |    1 
 kernel/uprobes.c               |  252 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 341 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/uprobes.h
 create mode 100644 kernel/uprobes.c

diff --git a/arch/Kconfig b/arch/Kconfig
index f78c2be..6e8f26e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -61,6 +61,17 @@ config OPTPROBES
 	depends on KPROBES && HAVE_OPTPROBES
 	depends on !PREEMPT
 
+config UPROBES
+	bool "User-space probes (EXPERIMENTAL)"
+	depends on ARCH_SUPPORTS_UPROBES
+	depends on MMU
+	help
+	  Uprobes enables kernel subsystems to establish probepoints
+	  in user applications and execute handler functions when
+	  the probepoints are hit. For more information, refer to
+	  Documentation/uprobes.txt.
+	  If in doubt, say "N".
+
 config HAVE_EFFICIENT_UNALIGNED_ACCESS
 	bool
 	help
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 0e8ad5d..5026359 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -25,7 +25,7 @@
 
 typedef u8 uprobe_opcode_t;
 #define MAX_UINSN_BYTES 16
-#define UPROBES_XOL_SLOT_BYTES (MAX_UINSN_BYTES)
+#define UPROBES_XOL_SLOT_BYTES	128	/* to keep it cache aligned */
 
 #define UPROBES_BKPT_INSN 0xcc
 #define UPROBES_BKPT_INSN_SIZE 1
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
new file mode 100644
index 0000000..952e9d7
--- /dev/null
+++ b/include/linux/uprobes.h
@@ -0,0 +1,76 @@
+#ifndef _LINUX_UPROBES_H
+#define _LINUX_UPROBES_H
+/*
+ * Userspace Probes (UProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008-2010
+ * Authors:
+ *	Srikar Dronamraju
+ *	Jim Keniston
+ */
+
+#ifdef CONFIG_ARCH_SUPPORTS_UPROBES
+#include <asm/uprobes.h>
+#else
+/*
+ * ARCH_SUPPORTS_UPROBES has not be defined.
+ */
+typedef u8 uprobe_opcode_t;
+
+/* Post-execution fixups.  Some architectures may define others. */
+#endif /* CONFIG_ARCH_SUPPORTS_UPROBES */
+
+/* No fixup needed */
+#define UPROBES_FIX_NONE	0x0
+/* Adjust IP back to vicinity of actual insn */
+#define UPROBES_FIX_IP	0x1
+/* Adjust the return address of a call insn */
+#define UPROBES_FIX_CALL	0x2
+/* Might sleep while doing Fixup */
+#define UPROBES_FIX_SLEEPY	0x4
+
+#ifndef UPROBES_FIX_DEFAULT
+#define UPROBES_FIX_DEFAULT UPROBES_FIX_IP
+#endif
+
+/* Unexported functions & macros for use by arch-specific code */
+#define uprobe_opcode_sz (sizeof(uprobe_opcode_t))
+extern unsigned long uprobes_read_vm(struct task_struct *tsk,
+			void __user *vaddr, void *kbuf,
+			unsigned long nbytes);
+extern unsigned long uprobes_write_vm(struct task_struct *tsk,
+			void __user *vaddr, const void *kbuf,
+			unsigned long nbytes);
+
+/*
+ * Most architectures can use the default versions of @read_opcode(),
+ * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn();
+ *
+ * @set_ip:
+ *	Set the instruction pointer in @regs to @vaddr.
+ * @analyze_insn:
+ *	Analyze @user_bkpt->insn.  Return 0 if @user_bkpt->insn is an
+ *	instruction you can probe, or a negative errno (typically -%EPERM)
+ *	otherwise. Determine what sort of
+ * @pre_xol:
+ * @post_xol:
+ *	XOL-related fixups @post_xol() (and possibly @pre_xol()) will need
+ *	to do for this instruction, and annotate @user_bkpt accordingly.
+ *	You may modify @user_bkpt->insn (e.g., the x86_64 port does this
+ *	for rip-relative instructions).
+ */
+#endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 0b5ff08..e53c0c4 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_event.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 obj-$(CONFIG_PADATA) += padata.o
+obj-$(CONFIG_UPROBES) += uprobes.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
new file mode 100644
index 0000000..cb5884b
--- /dev/null
+++ b/kernel/uprobes.c
@@ -0,0 +1,252 @@
+/*
+ * Userspace Probes (UProbes)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008-2010
+ * Authors:
+ *	Srikar Dronamraju
+ *	Jim Keniston
+ */
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/uprobes.h>
+#include <linux/rmap.h> /* needed for anon_vma_prepare */
+
+struct uprobe {
+	uprobe_opcode_t		opcode;
+	u16			fixups;
+};
+
+/**
+ * uprobes_read_vm - Read @nbytes at @vaddr from @tsk into @kbuf.
+ * @tsk: The probed task
+ * @vaddr: Source address, in user space to be read.
+ * @kbuf: Destination address, in kernel space.
+ *
+ * Context: This function may sleep.
+ *
+ * Returns number of bytes that could be copied.
+ */
+unsigned long uprobes_read_vm(struct task_struct *tsk, void __user *vaddr,
+					void *kbuf, unsigned long nbytes)
+{
+	if (tsk == current) {
+		unsigned long nleft = copy_from_user(kbuf, vaddr, nbytes);
+		return nbytes - nleft;
+	} else
+		return access_process_vm(tsk, (unsigned long) vaddr, kbuf,
+							nbytes, 0);
+}
+
+/**
+ * uprobes_write_vm - Write @nbytes from @kbuf at @vaddr in @tsk.
+ * Can be used to write to stack or data VM areas, but not instructions.
+ * Not exported, but available for use by arch-specific uprobes code.
+ * @tsk: The probed task
+ * @vaddr: Destination address, in user space.
+ * @kbuf: Source address, in kernel space to be read.
+ *
+ * Context: This function may sleep.
+ *
+ * Return number of bytes written.
+ */
+unsigned long uprobes_write_vm(struct task_struct *tsk, void __user *vaddr,
+				const void *kbuf, unsigned long nbytes)
+{
+	unsigned long nleft;
+
+	if (tsk == current) {
+		nleft = copy_to_user(vaddr, kbuf, nbytes);
+		return nbytes - nleft;
+	} else
+		return access_process_vm(tsk, (unsigned long) vaddr,
+						(void *) kbuf, nbytes, 1);
+}
+
+static int write_opcode(struct task_struct *tsk, unsigned long vaddr,
+						uprobe_opcode_t opcode)
+{
+	struct page *old_page, *new_page;
+	void *vaddr_old, *vaddr_new;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	pte_t orig_pte;
+	int ret = -EINVAL;
+
+	mm = get_task_mm(tsk);
+	down_read(&mm->mmap_sem);
+
+	/* Read the page with vaddr into memory */
+	ret = get_user_pages(tsk, mm, vaddr, 1, 0, 0, &old_page, &vma);
+	if (ret <= 0)
+		goto mmput_out;
+
+	/*
+	 * check if the page we are interested is read-only mapped
+	 * Since we are interested in text pages, Our pages of interest
+	 * should be mapped read-only.
+	 */
+	if ((vma->vm_flags && (VM_READ|VM_WRITE)) != VM_READ) {
+		ret = -EINVAL;
+		goto put_out;
+	}
+
+	/* If its VM_SHARED vma, lets not write to such vma's.  */
+	if (vma->vm_flags & VM_SHARED) {
+		ret = -EINVAL;
+		goto put_out;
+	}
+
+	/* Allocate a page */
+	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
+	if (!new_page) {
+		ret = -ENOMEM;
+		goto put_out;
+	}
+
+	/*
+	 * lock page will serialize against do_wp_page()'s
+	 * PageAnon() handling
+	 */
+	lock_page(old_page);
+	/* copy the page now that we've got it stable */
+	vaddr_old = kmap_atomic(old_page, KM_USER0);
+	vaddr_new = kmap_atomic(new_page, KM_USER1);
+
+	memcpy(vaddr_new, vaddr_old, PAGE_SIZE);
+	/* poke the new insn in, ASSUMES we don't cross page boundary */
+	vaddr &= ~PAGE_MASK;
+	memcpy(vaddr_new + vaddr, &opcode, uprobe_opcode_sz);
+
+	kunmap_atomic(vaddr_new, KM_USER1);
+	kunmap_atomic(vaddr_old, KM_USER0);
+
+	/* mark page RO so any concurrent access will end up in do_wp_page() */
+	if (write_protect_page(vma, old_page, &orig_pte))
+		goto unlock_out;
+
+	lock_page(new_page);
+	if (!anon_vma_prepare(vma))
+		/* flip pages, do_wp_page() will fail pte_same() and bail */
+		ret = replace_page(vma, old_page, new_page, orig_pte);
+
+	unlock_page(new_page);
+	if (ret != 0)
+		page_cache_release(new_page);
+unlock_out:
+	unlock_page(old_page);
+
+put_out:
+	put_page(old_page); /* we did a get_page in the beginning */
+
+mmput_out:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+	return ret;
+}
+
+/**
+ * read_opcode - read the opcode at a given virtual address.
+ * @tsk: the probed task.
+ * @vaddr: the virtual address to store the opcode.
+ * @opcode: location to store the read opcode.
+ *
+ * For task @tsk, read the opcode at @vaddr and store it in @opcode.
+ * Return 0 (success) or a negative errno.
+ */
+int __weak read_opcode(struct task_struct *tsk, unsigned long vaddr,
+						uprobe_opcode_t *opcode)
+{
+	unsigned long bytes_read;
+
+	bytes_read = uprobes_read_vm(tsk, (void __user *) vaddr, opcode,
+						uprobe_opcode_sz);
+	return (bytes_read == uprobe_opcode_sz ? 0 : -EFAULT);
+}
+
+/**
+ * set_bkpt - store breakpoint at a given address.
+ * @tsk: the probed task
+ * @vaddr: the virtual address to insert the opcode.
+ *
+ * For task @tsk, store the breakpoint instruction at @vaddr.
+ * Return 0 (success) or a negative errno.
+ */
+int __weak set_bkpt(struct task_struct *tsk, unsigned long vaddr)
+{
+	return write_opcode(tsk, vaddr, UPROBES_BKPT_INSN);
+}
+
+/**
+ * set_orig_insn - Restore the original instruction.
+ * @tsk: the probed task
+ * @vaddr: the virtual address to insert the opcode.
+ * @verify: if true, verify existance of breakpoint instruction.
+ *
+ * For task @tsk, restore the original opcode (opcode) at @vaddr.
+ * Return 0 (success) or a negative errno.
+ */
+int __weak set_orig_insn(struct task_struct *tsk, unsigned long vaddr,
+				bool verify, struct uprobe *uprobe)
+{
+	if (verify) {
+		uprobe_opcode_t opcode;
+		int result = read_opcode(tsk, vaddr, &opcode);
+		if (result)
+			return result;
+		if (opcode != UPROBES_BKPT_INSN)
+			return -EINVAL;
+	}
+	return write_opcode(tsk, vaddr, uprobe->opcode);
+}
+
+static void print_insert_fail(struct task_struct *tsk,
+			unsigned long vaddr, const char *why)
+{
+	printk(KERN_ERR "Can't place breakpoint at pid %d vaddr %#lx: %s\n",
+					tsk->pid, vaddr, why);
+}
+
+/*
+ * uprobes_resume_can_sleep - Check if fixup might result in sleep.
+ * @uprobes: the probepoint information.
+ *
+ * Returns true if fixup might result in sleep.
+ */
+static bool uprobes_resume_can_sleep(struct uprobe *uprobe)
+{
+	return uprobe->fixups & UPROBES_FIX_SLEEPY;
+}
+
+/**
+ * is_bkpt_insn - check if instruction is breakpoint instruction.
+ * @uprobe: the probepoint information.
+ * Default implementation of is_bkpt_insn
+ * Returns true if @uprobe->opcode is @bkpt_insn.
+ */
+bool __weak is_bkpt_insn(struct uprobe *uprobe)
+{
+	return (uprobe->opcode == UPROBES_BKPT_INSN);
+}
+

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 3/20] 3: uprobes: Breakground page replacement Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
                     ` (3 more replies)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 5/20] 5: Uprobes: register/unregister probes Srikar Dronamraju
                   ` (16 subsequent siblings)
  20 siblings, 4 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney


Provides interfaces to add and remove uprobes from the global rb tree.
Also provides definitions for uprobe_consumer, interfaces to add and
remove a consumer to a uprobe.  There is a unique uprobe element in the
rbtree for each unique inode:offset pair.

Uprobe gets added to the global rb tree when the first consumer for that
uprobe gets registered. It gets removed from the tree only when all
registered consumers are unregistered.

Multiple consumers can share the same probe. Each consumer provides a
filter to limit the tasks on which the handler should run, a handler
that runs on probe hit and a value which helps filter callback to limit
the tasks on which the handler should run.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/uprobes.h |   14 +++
 kernel/uprobes.c        |  209 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 952e9d7..94557ff 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -23,6 +23,7 @@
  *	Jim Keniston
  */
 
+#include <linux/rbtree.h>
 #ifdef CONFIG_ARCH_SUPPORTS_UPROBES
 #include <asm/uprobes.h>
 #else
@@ -56,6 +57,19 @@ extern unsigned long uprobes_write_vm(struct task_struct *tsk,
 			void __user *vaddr, const void *kbuf,
 			unsigned long nbytes);
 
+struct uprobe_consumer {
+	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+	/*
+	 * filter is optional; If a filter exists, handler is run
+	 * if and only if filter returns true.
+	 */
+	bool (*filter)(struct uprobe_consumer *self, struct task_struct *task);
+
+	struct uprobe_consumer *next;
+	void *fvalue;	/* filter value */
+};
+
+
 /*
  * Most architectures can use the default versions of @read_opcode(),
  * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn();
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index cb5884b..ba8ff99 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -34,6 +34,12 @@
 #include <linux/rmap.h> /* needed for anon_vma_prepare */
 
 struct uprobe {
+	struct rb_node		rb_node;	/* node in the rb tree */
+	atomic_t		ref;		/* lifetime muck */
+	struct rw_semaphore	consumer_rwsem;
+	struct uprobe_consumer	*consumers;
+	struct inode		*inode;		/* we hold a ref */
+	unsigned long		offset;
 	uprobe_opcode_t		opcode;
 	u16			fixups;
 };
@@ -250,3 +256,206 @@ bool __weak is_bkpt_insn(struct uprobe *uprobe)
 	return (uprobe->opcode == UPROBES_BKPT_INSN);
 }
 
+static struct rb_root uprobes_tree = RB_ROOT;
+static DEFINE_MUTEX(uprobes_mutex);
+static DEFINE_SPINLOCK(treelock);
+
+static int match_inode(struct uprobe *uprobe, struct inode *inode,
+						struct rb_node **p)
+{
+	struct rb_node *n = *p;
+
+	if (inode < uprobe->inode)
+		*p = n->rb_left;
+	else if (inode > uprobe->inode)
+		*p = n->rb_right;
+	else
+		return 1;
+	return 0;
+}
+
+static int match_offset(struct uprobe *uprobe, unsigned long offset,
+						struct rb_node **p)
+{
+	struct rb_node *n = *p;
+
+	if (offset < uprobe->offset)
+		*p = n->rb_left;
+	else if (offset > uprobe->offset)
+		*p = n->rb_right;
+	else
+		return 1;
+	return 0;
+}
+
+/*
+ * Find a uprobe corresponding to a given inode:offset
+ * Acquires treelock
+ */
+static struct uprobe *find_uprobe(struct inode * inode,
+					 unsigned long offset)
+{
+	struct rb_node *n = uprobes_tree.rb_node;
+	struct uprobe *uprobe, *u = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&treelock, flags);
+	while (n) {
+		uprobe = rb_entry(n, struct uprobe, rb_node);
+
+		if (match_inode(uprobe, inode, &n)) {
+			if (match_offset(uprobe, offset, &n)) {
+				if (atomic_inc_not_zero(&uprobe->ref))
+					u = uprobe;
+				break;
+			}
+		}
+	}
+	spin_unlock_irqrestore(&treelock, flags);
+	return u;
+}
+
+/*
+ * Check if a uprobe is already inserted;
+ *	If it does; return refcount incremented uprobe
+ *	else add the current uprobe and return NULL
+ * Acquires treelock.
+ */
+static struct uprobe *insert_uprobe_rb_node(struct uprobe *uprobe)
+{
+	struct rb_node **p = &uprobes_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct uprobe *u;
+	unsigned long flags;
+
+	spin_lock_irqsave(&treelock, flags);
+	while (*p) {
+		parent = *p;
+		u = rb_entry(parent, struct uprobe, rb_node);
+		if (u->inode > uprobe->inode)
+			p = &(*p)->rb_left;
+		else if (u->inode < uprobe->inode)
+			p = &(*p)->rb_right;
+		else {
+			if (u->offset > uprobe->offset)
+				p = &(*p)->rb_left;
+			else if (u->offset < uprobe->offset)
+				p = &(*p)->rb_right;
+			else {
+				atomic_inc(&u->ref);
+				goto unlock_return;
+			}
+		}
+	}
+	u = NULL;
+	rb_link_node(&uprobe->rb_node, parent, p);
+	rb_insert_color(&uprobe->rb_node, &uprobes_tree);
+	atomic_set(&uprobe->ref, 2);
+
+unlock_return:
+	spin_unlock_irqrestore(&treelock, flags);
+	return u;
+}
+
+/* Should be called lock-less */
+static void put_uprobe(struct uprobe *uprobe)
+{
+	if (atomic_dec_and_test(&uprobe->ref))
+		kfree(uprobe);
+}
+
+static int valid_vma(struct vm_area_struct *vma)
+{
+	if (!vma->vm_file)
+		return 0;
+
+	if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) ==
+						(VM_READ|VM_EXEC))
+		return 1;
+
+	return 0;
+}
+
+/* Acquires uprobes_mutex */
+static struct uprobe *uprobes_add(struct inode *inode,
+					unsigned long offset)
+{
+	struct uprobe *uprobe, *cur_uprobe;
+
+	__iget(inode);
+	uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
+
+	if (!uprobe) {
+		iput(inode);
+		return NULL;
+	}
+	uprobe->inode = inode;
+	uprobe->offset = offset;
+
+	/* add to uprobes_tree, sorted on inode:offset */
+	cur_uprobe = insert_uprobe_rb_node(uprobe);
+
+	/* a uprobe exists for this inode:offset combination*/
+	if (cur_uprobe) {
+		kfree(uprobe);
+		uprobe = cur_uprobe;
+		iput(inode);
+	} else
+		init_rwsem(&uprobe->consumer_rwsem);
+
+	return uprobe;
+}
+
+/* Acquires uprobe->consumer_rwsem */
+static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
+{
+	struct uprobe_consumer *consumer = uprobe->consumers;
+
+	down_read(&uprobe->consumer_rwsem);
+	while (consumer) {
+		if (!consumer->filter || consumer->filter(consumer, current))
+			consumer->handler(consumer, regs);
+
+		consumer = consumer->next;
+	}
+	up_read(&uprobe->consumer_rwsem);
+}
+
+/* Acquires uprobe->consumer_rwsem */
+static void add_consumer(struct uprobe *uprobe,
+				struct uprobe_consumer *consumer)
+{
+	down_write(&uprobe->consumer_rwsem);
+	consumer->next = uprobe->consumers;
+	uprobe->consumers = consumer;
+	up_write(&uprobe->consumer_rwsem);
+	return;
+}
+
+/* Acquires uprobe->consumer_rwsem */
+static int del_consumer(struct uprobe *uprobe,
+				struct uprobe_consumer *consumer)
+{
+	struct uprobe_consumer *con;
+	int ret = 0;
+
+	down_write(&uprobe->consumer_rwsem);
+	con = uprobe->consumers;
+	if (consumer == con) {
+		uprobe->consumers = con->next;
+		if (!con->next)
+			put_uprobe(uprobe);
+		ret = 1;
+	} else {
+		for (; con; con = con->next) {
+			if (con->next == consumer) {
+				con->next = consumer->next;
+				ret = 1;
+				break;
+			}
+		}
+	}
+	up_write(&uprobe->consumer_rwsem);
+	return ret;
+}
+

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (3 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  2011-01-25 12:15   ` Peter Zijlstra
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 6/20] 6: x86: analyze instruction and determine fixups Srikar Dronamraju
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, Andrew Morton, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, LKML, Paul E. McKenney


A probe is specified by a file:offset.  While registering, a breakpoint
is inserted for the first consumer, On subsequent probes, the consumer
gets appended to the existing consumers. While unregistering a
breakpoint is removed if the consumer happens to be the last consumer.
All other unregisterations, the consumer is deleted from the list of
consumers.

Probe specifications are maintained in a rb tree. A probe specification
is converted into a uprobe before store in a rb tree.  A uprobe can be
shared by many consumers.

Given a inode, we get a list of mm's that have mapped the inode.
However we want to limit the probes to certain processes/threads.  The
filtering should be at thread level. To limit the probes to a certain
processes/threads, we would want to walk through the list of threads
whose mm member refer to a given mm.

Here are the options that I thought of:
1. Use mm->owner and walk thro the thread_group of mm->owner, siblings
of mm->owner, siblings of parent of mm->owner.  This should be
good list to traverse. Not sure if this is an exhaustive
enough list that all tasks that have a mm set to this mm_struct are
walked through.

2. Install probes on all mm's that have mapped the probes and filter
only at probe hit time.

3. walk thro do_each_thread; while_each_thread; I think this will catch
all tasks that have a mm set to the given mm. However this might
be too heavy esp if mm corresponds to a library.

4. add a list_head element to the mm struct and update the list whenever
the task->mm thread gets updated. This could mean extending the current
mm->owner. However there is some maintainance overhead.

Currently we use the second approach, i.e probe all mm's that have mapped
the probes and filter only at probe hit.

Also would be interested to know if there are ways to call
replace_page without having to take mmap_sem.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/mm_types.h |    5 +
 include/linux/uprobes.h  |   32 +++++++++
 kernel/uprobes.c         |  161 +++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 187 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index bb7288a..af2b55d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -312,6 +312,11 @@ struct mm_struct {
 #endif
 	/* How many tasks sharing this mm are OOM_DISABLE */
 	atomic_t oom_disable_count;
+#ifdef CONFIG_UPROBES
+	unsigned long uprobes_vaddr;
+	struct list_head uprobes_list;
+	atomic_t uprobes_count;
+#endif
 };
 
 /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 94557ff..f62c7b0 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -31,6 +31,7 @@
  * ARCH_SUPPORTS_UPROBES has not be defined.
  */
 typedef u8 uprobe_opcode_t;
+struct uprobe_arch_info	{};		/* arch specific info*/
 
 /* Post-execution fixups.  Some architectures may define others. */
 #endif /* CONFIG_ARCH_SUPPORTS_UPROBES */
@@ -69,6 +70,19 @@ struct uprobe_consumer {
 	void *fvalue;	/* filter value */
 };
 
+struct uprobe {
+	struct rb_node		rb_node;	/* node in the rb tree */
+	atomic_t		ref;
+	struct rw_semaphore	consumer_rwsem;
+	struct uprobe_arch_info	arch_info;	/* arch specific info if any */
+	struct uprobe_consumer	*consumers;
+	struct inode		*inode;		/* Also hold a ref to inode */
+	unsigned long		offset;
+	uprobe_opcode_t		opcode;
+	u16			fixups;
+	int			copy;
+	u8			insn[MAX_UINSN_BYTES];	/* orig instruction */
+};
 
 /*
  * Most architectures can use the default versions of @read_opcode(),
@@ -87,4 +101,22 @@ struct uprobe_consumer {
  *	You may modify @user_bkpt->insn (e.g., the x86_64 port does this
  *	for rip-relative instructions).
  */
+
+#ifdef CONFIG_UPROBES
+extern int register_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer);
+extern void unregister_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer);
+#else /* CONFIG_UPROBES is not defined */
+static inline int register_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer)
+{
+	return -ENOSYS;
+}
+static inline void unregister_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer)
+{
+}
+
+#endif /* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index ba8ff99..8a5da38 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -33,17 +33,6 @@
 #include <linux/uprobes.h>
 #include <linux/rmap.h> /* needed for anon_vma_prepare */
 
-struct uprobe {
-	struct rb_node		rb_node;	/* node in the rb tree */
-	atomic_t		ref;		/* lifetime muck */
-	struct rw_semaphore	consumer_rwsem;
-	struct uprobe_consumer	*consumers;
-	struct inode		*inode;		/* we hold a ref */
-	unsigned long		offset;
-	uprobe_opcode_t		opcode;
-	u16			fixups;
-};
-
 /**
  * uprobes_read_vm - Read @nbytes at @vaddr from @tsk into @kbuf.
  * @tsk: The probed task
@@ -459,3 +448,153 @@ static int del_consumer(struct uprobe *uprobe,
 	return ret;
 }
 
+static int install_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
+{
+	int ret = 0;
+
+	/*TODO: install breakpoint */
+	if (!ret)
+		atomic_inc(&mm->uprobes_count);
+	return ret;
+}
+
+static int remove_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
+{
+	int ret = 0;
+
+	/*TODO: remove breakpoint */
+	if (!ret)
+		atomic_dec(&mm->uprobes_count);
+
+	return ret;
+}
+
+/* Returns 0 if it can install one probe */
+int register_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer)
+{
+	struct prio_tree_iter iter;
+	struct list_head tmp_list;
+	struct address_space *mapping;
+	struct mm_struct *mm, *tmpmm;
+	struct vm_area_struct *vma;
+	struct uprobe *uprobe;
+	int ret = -1;
+
+	if (!inode || !consumer || consumer->next)
+		return -EINVAL;
+	uprobe = uprobes_add(inode, offset);
+	INIT_LIST_HEAD(&tmp_list);
+
+	mapping = inode->i_mapping;
+
+	mutex_lock(&uprobes_mutex);
+	if (uprobe->consumers) {
+		ret = 0;
+		goto consumers_add;
+	}
+
+	spin_lock(&mapping->i_mmap_lock);
+	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, 0) {
+		if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
+			continue;
+
+		mm = vma->vm_mm;
+		if (!valid_vma(vma)) {
+			mmput(mm);
+			continue;
+		}
+
+		list_add(&mm->uprobes_list, &tmp_list);
+		mm->uprobes_vaddr = vma->vm_start + offset;
+	}
+	spin_unlock(&mapping->i_mmap_lock);
+
+	if (list_empty(&tmp_list)) {
+		ret = 0;
+		goto consumers_add;
+	}
+	list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
+		if (!install_uprobe(mm, uprobe))
+			ret = 0;
+		list_del(&mm->uprobes_list);
+		mmput(mm);
+	}
+
+consumers_add:
+	add_consumer(uprobe, consumer);
+	mutex_unlock(&uprobes_mutex);
+	put_uprobe(uprobe);
+	return ret;
+}
+
+void unregister_uprobe(struct inode *inode, unsigned long offset,
+				struct uprobe_consumer *consumer)
+{
+	struct prio_tree_iter iter;
+	struct list_head tmp_list;
+	struct address_space *mapping;
+	struct mm_struct *mm, *tmpmm;
+	struct vm_area_struct *vma;
+	struct uprobe *uprobe;
+
+	if (!inode || !consumer)
+		return;
+
+	uprobe = find_uprobe(inode, offset);
+	if (!uprobe) {
+		printk(KERN_ERR "No uprobe found with inode:offset %p %lu\n",
+				inode, offset);
+		return;
+	}
+
+	if (!del_consumer(uprobe, consumer)) {
+		printk(KERN_ERR "No uprobe found with consumer %p\n",
+				consumer);
+		return;
+	}
+
+	INIT_LIST_HEAD(&tmp_list);
+
+	mapping = inode->i_mapping;
+
+	mutex_lock(&uprobes_mutex);
+	if (uprobe->consumers)
+		goto put_unlock;
+
+	spin_lock(&mapping->i_mmap_lock);
+	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, 0) {
+		if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
+			continue;
+
+		mm = vma->vm_mm;
+
+		if (!atomic_read(&mm->uprobes_count)) {
+			mmput(mm);
+			continue;
+		}
+
+		if (valid_vma(vma)) {
+			list_add(&mm->uprobes_list, &tmp_list);
+			mm->uprobes_vaddr = vma->vm_start + offset;
+		} else
+			mmput(mm);
+	}
+	spin_unlock(&mapping->i_mmap_lock);
+	list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
+		remove_uprobe(mm, uprobe);
+		list_del(&mm->uprobes_list);
+		mmput(mm);
+	}
+
+	if (atomic_read(&uprobe->ref) == 1) {
+		synchronize_sched();
+		rb_erase(&uprobe->rb_node, &uprobes_tree);
+		iput(uprobe->inode);
+	}
+
+put_unlock:
+	mutex_unlock(&uprobes_mutex);
+	put_uprobe(uprobe);
+}
+

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 6/20]  6: x86: analyze instruction and determine fixups.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (4 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 5/20] 5: Uprobes: register/unregister probes Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 7/20] 7: uprobes: store/restore original instruction Srikar Dronamraju
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	LKML, Linux-mm, Jim Keniston, Frederic Weisbecker, SystemTap,
	Andrew Morton, Paul E. McKenney


The instruction analysis is based on x86 instruction decoder and
determines if an instruction can be probed and determines the necessary
fixups after singlestep.  Instruction analysis is done at probe
insertion time so that we avoid having to repeat the same analysis every
time a probe is hit.

Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/uprobes.h |    2 
 arch/x86/kernel/Makefile       |    1 
 arch/x86/kernel/uprobes.c      |  415 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 418 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/uprobes.c

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 5026359..0063207 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -37,4 +37,6 @@ struct uprobe_arch_info {
 #else
 struct uprobe_arch_info {};
 #endif
+struct uprobe;
+extern int analyze_insn(struct task_struct *tsk, struct uprobe *uprobe);
 #endif	/* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f60153d..2146b87 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_MICROCODE)			+= microcode.o
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
 
 obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb.o
+obj-$(CONFIG_UPROBES)			+= uprobes.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
new file mode 100644
index 0000000..352c71f
--- /dev/null
+++ b/arch/x86/kernel/uprobes.c
@@ -0,0 +1,415 @@
+/*
+ * Userspace Probes (UProbes) for x86
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008-2010
+ * Authors:
+ *	Srikar Dronamraju
+ *	Jim Keniston
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+#include <linux/uprobes.h>
+
+#include <linux/kdebug.h>
+#include <asm/insn.h>
+
+#ifdef CONFIG_X86_32
+#define is_32bit_app(tsk) 1
+#else
+#define is_32bit_app(tsk) (test_tsk_thread_flag(tsk, TIF_IA32))
+#endif
+
+#define UPROBES_FIX_RIP_AX	0x8000
+#define UPROBES_FIX_RIP_CX	0x4000
+
+/* Adaptations for mhiramat x86 decoder v14. */
+#define OPCODE1(insn) ((insn)->opcode.bytes[0])
+#define OPCODE2(insn) ((insn)->opcode.bytes[1])
+#define OPCODE3(insn) ((insn)->opcode.bytes[2])
+#define MODRM_REG(insn) X86_MODRM_REG(insn->modrm.value)
+
+#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
+	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
+	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \
+	  (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) |   \
+	  (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf))    \
+	 << (row % 32))
+
+
+static const u32 good_insns_64[256 / 32] = {
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+	/*      ----------------------------------------------         */
+	W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */
+	W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */
+	W(0x20, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 20 */
+	W(0x30, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 30 */
+	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
+	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
+	W(0x60, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
+	W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
+	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+	W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
+	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
+	W(0xc0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
+	W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
+	W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
+	/*      ----------------------------------------------         */
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+};
+
+/* Good-instruction tables for 32-bit apps */
+
+static const u32 good_insns_32[256 / 32] = {
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+	/*      ----------------------------------------------         */
+	W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */
+	W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */
+	W(0x20, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* 20 */
+	W(0x30, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) , /* 30 */
+	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
+	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
+	W(0x60, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
+	W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
+	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+	W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
+	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
+	W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
+	W(0xd0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
+	W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
+	/*      ----------------------------------------------         */
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+};
+
+/* Using this for both 64-bit and 32-bit apps */
+static const u32 good_2byte_insns[256 / 32] = {
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+	/*      ----------------------------------------------         */
+	W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1) | /* 00 */
+	W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* 10 */
+	W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
+	W(0x30, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 30 */
+	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
+	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
+	W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 60 */
+	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) , /* 70 */
+	W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
+	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+	W(0xa0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* a0 */
+	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
+	W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* c0 */
+	W(0xd0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+	W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* e0 */
+	W(0xf0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* f0 */
+	/*      ----------------------------------------------         */
+	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
+};
+#undef W
+
+/*
+ * opcodes we'll probably never support:
+ * 6c-6d, e4-e5, ec-ed - in
+ * 6e-6f, e6-e7, ee-ef - out
+ * cc, cd - int3, int
+ * cf - iret
+ * d6 - illegal instruction
+ * f1 - int1/icebp
+ * f4 - hlt
+ * fa, fb - cli, sti
+ * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
+ *
+ * invalid opcodes in 64-bit mode:
+ * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
+ *
+ * 63 - we support this opcode in x86_64 but not in i386.
+ *
+ * opcodes we may need to refine support for:
+ * 0f - 2-byte instructions: For many of these instructions, the validity
+ * depends on the prefix and/or the reg field.  On such instructions, we
+ * just consider the opcode combination valid if it corresponds to any
+ * valid instruction.
+ * 8f - Group 1 - only reg = 0 is OK
+ * c6-c7 - Group 11 - only reg = 0 is OK
+ * d9-df - fpu insns with some illegal encodings
+ * f2, f3 - repnz, repz prefixes.  These are also the first byte for
+ * certain floating-point instructions, such as addsd.
+ * fe - Group 4 - only reg = 0 or 1 is OK
+ * ff - Group 5 - only reg = 0-6 is OK
+ *
+ * others -- Do we need to support these?
+ * 0f - (floating-point?) prefetch instructions
+ * 07, 17, 1f - pop es, pop ss, pop ds
+ * 26, 2e, 36, 3e - es:, cs:, ss:, ds: segment prefixes --
+ *	but 64 and 65 (fs: and gs:) seem to be used, so we support them
+ * 67 - addr16 prefix
+ * ce - into
+ * f0 - lock prefix
+ */
+
+/*
+ * TODO:
+ * - Where necessary, examine the modrm byte and allow only valid instructions
+ * in the different Groups and fpu instructions.
+ */
+
+static bool is_prefix_bad(struct insn *insn)
+{
+	int i;
+
+	for (i = 0; i < insn->prefixes.nbytes; i++) {
+		switch (insn->prefixes.bytes[i]) {
+		case 0x26:	 /*INAT_PFX_ES   */
+		case 0x2E:	 /*INAT_PFX_CS   */
+		case 0x36:	 /*INAT_PFX_DS   */
+		case 0x3E:	 /*INAT_PFX_SS   */
+		case 0xF0:	 /*INAT_PFX_LOCK */
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static void report_bad_prefix(void)
+{
+	printk(KERN_ERR "uprobes does not currently support probing "
+		"instructions with any of the following prefixes: "
+		"cs:, ds:, es:, ss:, lock:\n");
+}
+
+static void report_bad_1byte_opcode(int mode, uprobe_opcode_t op)
+{
+	printk(KERN_ERR "In %d-bit apps, "
+		"uprobes does not currently support probing "
+		"instructions whose first byte is 0x%2.2x\n", mode, op);
+}
+
+static void report_bad_2byte_opcode(uprobe_opcode_t op)
+{
+	printk(KERN_ERR "uprobes does not currently support probing "
+		"instructions with the 2-byte opcode 0x0f 0x%2.2x\n", op);
+}
+
+static int validate_insn_32bits(struct uprobe *uprobe, struct insn *insn)
+{
+	insn_init(insn, uprobe->insn, false);
+
+	/* Skip good instruction prefixes; reject "bad" ones. */
+	insn_get_opcode(insn);
+	if (is_prefix_bad(insn)) {
+		report_bad_prefix();
+		return -EPERM;
+	}
+	if (test_bit(OPCODE1(insn), (unsigned long *) good_insns_32))
+		return 0;
+	if (insn->opcode.nbytes == 2) {
+		if (test_bit(OPCODE2(insn),
+					(unsigned long *) good_2byte_insns))
+			return 0;
+		report_bad_2byte_opcode(OPCODE2(insn));
+	} else
+		report_bad_1byte_opcode(32, OPCODE1(insn));
+	return -EPERM;
+}
+
+static int validate_insn_64bits(struct uprobe *uprobe, struct insn *insn)
+{
+	insn_init(insn, uprobe->insn, true);
+
+	/* Skip good instruction prefixes; reject "bad" ones. */
+	insn_get_opcode(insn);
+	if (is_prefix_bad(insn)) {
+		report_bad_prefix();
+		return -EPERM;
+	}
+	if (test_bit(OPCODE1(insn), (unsigned long *) good_insns_64))
+		return 0;
+	if (insn->opcode.nbytes == 2) {
+		if (test_bit(OPCODE2(insn),
+					(unsigned long *) good_2byte_insns))
+			return 0;
+		report_bad_2byte_opcode(OPCODE2(insn));
+	} else
+		report_bad_1byte_opcode(64, OPCODE1(insn));
+	return -EPERM;
+}
+
+/*
+ * Figure out which fixups post_xol() will need to perform, and annotate
+ * uprobe->fixups accordingly.  To start with, uprobe->fixups is
+ * either zero or it reflects rip-related fixups.
+ */
+static void prepare_fixups(struct uprobe *uprobe, struct insn *insn)
+{
+	bool fix_ip = true, fix_call = false;	/* defaults */
+	insn_get_opcode(insn);	/* should be a nop */
+
+	switch (OPCODE1(insn)) {
+	case 0xc3:		/* ret/lret */
+	case 0xcb:
+	case 0xc2:
+	case 0xca:
+		/* ip is correct */
+		fix_ip = false;
+		break;
+	case 0xe8:		/* call relative - Fix return addr */
+		fix_call = true;
+		break;
+	case 0x9a:		/* call absolute - Fix return addr, not ip */
+		fix_call = true;
+		fix_ip = false;
+		break;
+	case 0xff:
+	    {
+		int reg;
+		insn_get_modrm(insn);
+		reg = MODRM_REG(insn);
+		if (reg == 2 || reg == 3) {
+			/* call or lcall, indirect */
+			/* Fix return addr; ip is correct. */
+			fix_call = true;
+			fix_ip = false;
+		} else if (reg == 4 || reg == 5) {
+			/* jmp or ljmp, indirect */
+			/* ip is correct. */
+			fix_ip = false;
+		}
+		break;
+	    }
+	case 0xea:		/* jmp absolute -- ip is correct */
+		fix_ip = false;
+		break;
+	default:
+		break;
+	}
+	if (fix_ip)
+		uprobe->fixups |= UPROBES_FIX_IP;
+	if (fix_call)
+		uprobe->fixups |=
+			(UPROBES_FIX_CALL | UPROBES_FIX_SLEEPY);
+}
+
+#ifdef CONFIG_X86_64
+/*
+ * If uprobe->insn doesn't use rip-relative addressing, return 0.  Otherwise,
+ * rewrite the instruction so that it accesses its memory operand
+ * indirectly through a scratch register.  Set uprobe->fixups and
+ * uprobe->arch_info.rip_rela_target_address accordingly.  (The contents of the
+ * scratch register will be saved before we single-step the modified
+ * instruction, and restored afterward.)  Return 1.
+ *
+ * We do this because a rip-relative instruction can access only a
+ * relatively small area (+/- 2 GB from the instruction), and the XOL
+ * area typically lies beyond that area.  At least for instructions
+ * that store to memory, we can't execute the original instruction
+ * and "fix things up" later, because the misdirected store could be
+ * disastrous.
+ *
+ * Some useful facts about rip-relative instructions:
+ * - There's always a modrm byte.
+ * - There's never a SIB byte.
+ * - The displacement is always 4 bytes.
+ */
+static int handle_riprel_insn(struct uprobe *uprobe, struct insn *insn)
+{
+	u8 *cursor;
+	u8 reg;
+
+	if (!insn_rip_relative(insn))
+		return 0;
+	/*
+	 * Point cursor at the modrm byte.  The next 4 bytes are the
+	 * displacement.  Beyond the displacement, for some instructions,
+	 * is the immediate operand.
+	 */
+	cursor = uprobe->insn + insn->prefixes.nbytes
+			+ insn->rex_prefix.nbytes + insn->opcode.nbytes;
+	insn_get_length(insn);
+
+	/*
+	 * Convert from rip-relative addressing to indirect addressing
+	 * via a scratch register.  Change the r/m field from 0x5 (%rip)
+	 * to 0x0 (%rax) or 0x1 (%rcx), and squeeze out the offset field.
+	 */
+	reg = MODRM_REG(insn);
+	if (reg == 0) {
+		/*
+		 * The register operand (if any) is either the A register
+		 * (%rax, %eax, etc.) or (if the 0x4 bit is set in the
+		 * REX prefix) %r8.  In any case, we know the C register
+		 * is NOT the register operand, so we use %rcx (register
+		 * #1) for the scratch register.
+		 */
+		uprobe->fixups = UPROBES_FIX_RIP_CX;
+		/* Change modrm from 00 000 101 to 00 000 001. */
+		*cursor = 0x1;
+	} else {
+		/* Use %rax (register #0) for the scratch register. */
+		uprobe->fixups = UPROBES_FIX_RIP_AX;
+		/* Change modrm from 00 xxx 101 to 00 xxx 000 */
+		*cursor = (reg << 3);
+	}
+
+	/* Target address = address of next instruction + (signed) offset */
+	uprobe->arch_info.rip_rela_target_address = (long) insn->length
+					+ insn->displacement.value;
+	/* Displacement field is gone; slide immediate field (if any) over. */
+	if (insn->immediate.nbytes) {
+		cursor++;
+		memmove(cursor, cursor + insn->displacement.nbytes,
+						insn->immediate.nbytes);
+	}
+	return 1;
+}
+#endif /* CONFIG_X86_64 */
+
+/**
+ * analyze_insn - instruction analysis including validity and fixups.
+ * @tsk: the probed task.
+ * @uprobe: the probepoint information.
+ * Return 0 on success or a -ve number on error.
+ */
+int analyze_insn(struct task_struct *tsk, struct uprobe *uprobe)
+{
+	int ret;
+	struct insn insn;
+
+	uprobe->fixups = 0;
+#ifdef CONFIG_X86_64
+	uprobe->arch_info.rip_rela_target_address = 0x0;
+#endif
+
+	if (is_32bit_app(tsk))
+		ret = validate_insn_32bits(uprobe, &insn);
+	else
+		ret = validate_insn_64bits(uprobe, &insn);
+	if (ret != 0)
+		return ret;
+#ifdef CONFIG_X86_64
+	ret = handle_riprel_insn(uprobe, &insn);
+	if (ret == -1)
+		/* rip-relative; can't XOL */
+		return 0;
+#endif
+	prepare_fixups(uprobe, &insn);
+	return 0;
+}
+

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 7/20]  7: uprobes: store/restore original instruction.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (5 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 6/20] 6: x86: analyze instruction and determine fixups Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 8/20] 8: uprobes: mmap and fork hooks Srikar Dronamraju
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney


On the first probe insertion, copy the original instruction and opcode.
If multiple vmas map the same text area corresponding to an inode, we
only need to copy the instruction just once.
The copied instruction is further copied to a designated slot on probe
hit.  Its also used at the time of probe removal to restore the original
instruction.
opcode is used to analyze the instruction and determine the fixups.
Determining fixups at probe hit time would result in doing the same
operation on every probe hit. Hence Instruction analysis using the
opcode is done at probe insertion time.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/Kconfig     |    1 +
 kernel/uprobes.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 6e8f26e..bba8108 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -65,6 +65,7 @@ config UPROBES
 	bool "User-space probes (EXPERIMENTAL)"
 	depends on ARCH_SUPPORTS_UPROBES
 	depends on MMU
+	select MM_OWNER
 	help
 	  Uprobes enables kernel subsystems to establish probepoints
 	  in user applications and execute handler functions when
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 8a5da38..858ddb1 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -448,21 +448,72 @@ static int del_consumer(struct uprobe *uprobe,
 	return ret;
 }
 
+static int copy_insn(struct task_struct *tsk, unsigned long vaddr,
+						struct uprobe *uprobe)
+{
+	int len;
+
+	len = uprobes_read_vm(tsk, (void __user *)vaddr, uprobe->insn,
+						MAX_UINSN_BYTES);
+	if (len < uprobe_opcode_sz) {
+		print_insert_fail(tsk, vaddr,
+				"error reading original instruction");
+		return -EINVAL;
+	}
+	memcpy(&uprobe->opcode, uprobe->insn, uprobe_opcode_sz);
+	if (is_bkpt_insn(uprobe)) {
+		print_insert_fail(tsk, vaddr,
+				"breakpoint instruction already exists");
+		return -EEXIST;
+	}
+	if (analyze_insn(tsk, uprobe)) {
+		print_insert_fail(tsk, vaddr,
+					"instruction type cannot be probed");
+		return -EINVAL;
+	}
+	uprobe->copy = 1;
+	return 0;
+}
+
 static int install_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
 {
-	int ret = 0;
+	struct task_struct *tsk;
+	int ret = -EINVAL;
 
-	/*TODO: install breakpoint */
-	if (!ret)
+	get_task_struct(mm->owner);
+	tsk = mm->owner;
+	if (!tsk)
+		return ret;
+
+	if (!uprobe->copy) {
+		ret = copy_insn(tsk, mm->uprobes_vaddr, uprobe);
+		if (ret)
+			goto put_return;
+	}
+
+	ret = set_bkpt(tsk, mm->uprobes_vaddr);
+	if (ret < 0)
+		print_insert_fail(tsk, mm->uprobes_vaddr,
+					"failed to insert bkpt instruction");
+	else
 		atomic_inc(&mm->uprobes_count);
+
+put_return:
+	put_task_struct(tsk);
 	return ret;
 }
 
 static int remove_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
 {
-	int ret = 0;
+	struct task_struct *tsk;
+	int ret;
+
+	get_task_struct(mm->owner);
+	tsk = mm->owner;
+	if (!tsk)
+		return -EINVAL;
 
-	/*TODO: remove breakpoint */
+	ret = set_orig_insn(tsk, mm->uprobes_vaddr, true, uprobe);
 	if (!ret)
 		atomic_dec(&mm->uprobes_count);
 

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (6 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 7/20] 7: uprobes: store/restore original instruction Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  2011-01-25 12:15   ` Peter Zijlstra
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 9/20] 9: x86: architecture specific task information Srikar Dronamraju
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney


Provides hooks in mmap and fork.

On fork, after the new mm is created, we need to set the count of
uprobes.  On mmap, check if the mmap region is an executable page and if
its a executable page, walk through the rbtree and insert actual
breakpoints for already registered probes corresponding to this inode.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/uprobes.h |   11 +++++
 kernel/fork.c           |    2 +
 kernel/uprobes.c        |   96 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/mmap.c               |    2 +
 4 files changed, 110 insertions(+), 1 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f62c7b0..0d4f5e3 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -73,6 +73,7 @@ struct uprobe_consumer {
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
 	atomic_t		ref;
+	struct list_head	pending_list;
 	struct rw_semaphore	consumer_rwsem;
 	struct uprobe_arch_info	arch_info;	/* arch specific info if any */
 	struct uprobe_consumer	*consumers;
@@ -107,6 +108,10 @@ extern int register_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer);
 extern void unregister_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer);
+
+struct vm_area_struct;
+extern void uprobe_mmap(struct vm_area_struct *vma);
+extern void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm);
 #else /* CONFIG_UPROBES is not defined */
 static inline int register_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer)
@@ -117,6 +122,10 @@ static inline void unregister_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer)
 {
 }
-
+static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
+		struct mm_struct *mm)
+{
+}
+static inline void uprobe_mmap(struct vm_area_struct *vma) { }
 #endif /* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 70ea75f..b135d1b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -66,6 +66,7 @@
 #include <linux/posix-timers.h>
 #include <linux/user-return-notifier.h>
 #include <linux/oom.h>
+#include <linux/uprobes.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -415,6 +416,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	}
 	/* a new mm has just been created */
 	arch_dup_mmap(oldmm, mm);
+	uprobe_dup_mmap(oldmm, mm);
 	retval = 0;
 out:
 	up_write(&mm->mmap_sem);
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 858ddb1..31867a6 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -380,6 +380,7 @@ static struct uprobe *uprobes_add(struct inode *inode,
 	}
 	uprobe->inode = inode;
 	uprobe->offset = offset;
+	INIT_LIST_HEAD(&uprobe->pending_list);
 
 	/* add to uprobes_tree, sorted on inode:offset */
 	cur_uprobe = insert_uprobe_rb_node(uprobe);
@@ -649,3 +650,98 @@ put_unlock:
 	put_uprobe(uprobe);
 }
 
+static void search_within_subtree(struct rb_node *n, struct inode *inode,
+		struct list_head *tmp_list);
+
+static void add_to_temp_list(struct vm_area_struct *vma, struct inode *inode,
+		struct list_head *tmp_list)
+{
+	struct uprobe *uprobe;
+	struct rb_node *n;
+	unsigned long flags;
+
+	n = uprobes_tree.rb_node;
+	spin_lock_irqsave(&treelock, flags);
+	while (n) {
+		uprobe = rb_entry(n, struct uprobe, rb_node);
+		if (match_inode(uprobe, inode, &n)) {
+			list_add(&uprobe->pending_list, tmp_list);
+			search_within_subtree(n, inode, tmp_list);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&treelock, flags);
+}
+
+static void __search_within_subtree(struct rb_node *p, struct inode *inode,
+		struct list_head *tmp_list)
+{
+	struct uprobe *uprobe;
+
+	uprobe = rb_entry(p, struct uprobe, rb_node);
+	if (match_inode(uprobe, inode, &p)) {
+		list_add(&uprobe->pending_list, tmp_list);
+		search_within_subtree(p, inode, tmp_list);
+	}
+
+
+}
+
+static void search_within_subtree(struct rb_node *n, struct inode *inode,
+		struct list_head *tmp_list)
+{
+	struct rb_node *p;
+
+	p = n->rb_left;
+	if (p)
+		__search_within_subtree(p, inode, tmp_list);
+
+	p = n->rb_right;
+	if (p)
+		__search_within_subtree(p, inode, tmp_list);
+}
+
+/*
+ * Called from dup_mmap.
+ * called with mm->mmap_sem and old_mm->mmap_sem acquired.
+ */
+void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm)
+{
+	atomic_set(&old_mm->uprobes_count,
+			atomic_read(&mm->uprobes_count));
+}
+
+void uprobe_mmap(struct vm_area_struct *vma)
+{
+	struct list_head tmp_list;
+	struct uprobe *uprobe, *u;
+	struct mm_struct *mm;
+	struct inode *inode;
+
+	if (!valid_vma(vma))
+		return;
+
+	INIT_LIST_HEAD(&tmp_list);
+
+	/*
+	 * The vma was just allocated and this routine gets called
+	 * while holding write lock for mmap_sem.  Function called
+	 * in context of a thread that has a reference to mm.
+	 * Hence no need to take a reference to mm
+	 */
+	mm = vma->vm_mm;
+	up_write(&mm->mmap_sem);
+	mutex_lock(&uprobes_mutex);
+
+	inode = vma->vm_file->f_mapping->host;
+	add_to_temp_list(vma, inode, &tmp_list);
+
+	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
+		mm->uprobes_vaddr = vma->vm_start + uprobe->offset;
+		install_uprobe(mm, uprobe);
+		list_del(&uprobe->pending_list);
+	}
+	mutex_unlock(&uprobes_mutex);
+	down_write(&mm->mmap_sem);
+}
+
diff --git a/mm/mmap.c b/mm/mmap.c
index b179abb..df7307f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -29,6 +29,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/perf_event.h>
 #include <linux/audit.h>
+#include <linux/uprobes.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1353,6 +1354,7 @@ out:
 			mm->locked_vm += (len >> PAGE_SHIFT);
 	} else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
 		make_pages_present(addr, addr + len);
+	uprobe_mmap(vma);
 	return addr;
 
 unmap_and_free_vma:

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 9/20]  9: x86: architecture specific task information.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (7 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 8/20] 8: uprobes: mmap and fork hooks Srikar Dronamraju
@ 2010-12-16  9:58 ` Srikar Dronamraju
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information Srikar Dronamraju
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	Andrew Morton, Linux-mm, Jim Keniston, Frederic Weisbecker,
	SystemTap, LKML, Paul E. McKenney


On X86_64, we need to support rip relative instructions.
Rip relative instructions are handled by saving the scratch register
on probe hit and then retrieving the previously saved scratch register
after single-step. This value stored at probe hit is specific to each
task. Hence this is implemented as part of uprobe_task_arch_info.

Since x86_32 has no support for rip relative instructions, we dont need to
bother for x86_32.

Signed-off-by: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/uprobes.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 0063207..e38950f 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -34,8 +34,13 @@ typedef u8 uprobe_opcode_t;
 struct uprobe_arch_info {
 	unsigned long rip_rela_target_address;
 };
+
+struct uprobe_task_arch_info {
+	unsigned long saved_scratch_register;
+};
 #else
 struct uprobe_arch_info {};
+struct uprobe_task_arch_info {};
 #endif
 struct uprobe;
 extern int analyze_insn(struct task_struct *tsk, struct uprobe *uprobe);

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (8 preceding siblings ...)
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 9/20] 9: x86: architecture specific task information Srikar Dronamraju
@ 2010-12-16  9:59 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes Srikar Dronamraju
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney


Uprobes needs to maintain some task specific information include if a
task is currently uprobed, the currently handing uprobe, any arch
specific information (for example to handle rip relative instructions),
the per-task slot where the original instruction is copied to before
single-stepping.

Provides routines to create/manage and free the task specific
information.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/sched.h   |    3 +++
 include/linux/uprobes.h |   25 +++++++++++++++++++++++++
 kernel/fork.c           |    4 ++++
 kernel/uprobes.c        |   37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f4e90b6..5a3ebea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1523,6 +1523,9 @@ struct task_struct {
 		unsigned long memsw_bytes; /* uncharged mem+swap usage */
 	} memcg_batch;
 #endif
+#ifdef CONFIG_UPROBES
+	struct uprobe_task *utask;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0d4f5e3..14a4fce 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -26,12 +26,14 @@
 #include <linux/rbtree.h>
 #ifdef CONFIG_ARCH_SUPPORTS_UPROBES
 #include <asm/uprobes.h>
+struct uprobe_task_arch_info;	/* arch specific task info */
 #else
 /*
  * ARCH_SUPPORTS_UPROBES has not be defined.
  */
 typedef u8 uprobe_opcode_t;
 struct uprobe_arch_info	{};		/* arch specific info*/
+struct uprobe_task_arch_info {};	/* arch specific task info */
 
 /* Post-execution fixups.  Some architectures may define others. */
 #endif /* CONFIG_ARCH_SUPPORTS_UPROBES */
@@ -85,6 +87,27 @@ struct uprobe {
 	u8			insn[MAX_UINSN_BYTES];	/* orig instruction */
 };
 
+enum uprobe_task_state {
+	UTASK_RUNNING,
+	UTASK_BP_HIT,
+	UTASK_SSTEP
+};
+
+/*
+ * uprobe_utask -- not a user-visible struct.
+ * Corresponds to a thread in a probed process.
+ * Guarded by uproc->mutex.
+ */
+struct uprobe_task {
+	unsigned long xol_vaddr;
+	unsigned long vaddr;
+
+	enum uprobe_task_state state;
+	struct uprobe_task_arch_info tskinfo;
+
+	struct uprobe *active_uprobe;
+};
+
 /*
  * Most architectures can use the default versions of @read_opcode(),
  * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn();
@@ -108,6 +131,7 @@ extern int register_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer);
 extern void unregister_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer);
+extern void uprobe_free_utask(struct task_struct *tsk);
 
 struct vm_area_struct;
 extern void uprobe_mmap(struct vm_area_struct *vma);
@@ -126,6 +150,7 @@ static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
 		struct mm_struct *mm)
 {
 }
+static inline void uprobe_free_utask(struct task_struct *tsk) {}
 static inline void uprobe_mmap(struct vm_area_struct *vma) { }
 #endif /* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index b135d1b..ae99239 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -191,6 +191,7 @@ void __put_task_struct(struct task_struct *tsk)
 	delayacct_tsk_free(tsk);
 	put_signal_struct(tsk->signal);
 
+	uprobe_free_utask(tsk);
 	if (!profile_handoff_task(tsk))
 		free_task(tsk);
 }
@@ -1202,6 +1203,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	INIT_LIST_HEAD(&p->pi_state_list);
 	p->pi_state_cache = NULL;
 #endif
+#ifdef CONFIG_UPROBES
+	p->utask = NULL;
+#endif
 	/*
 	 * sigaltstack should be cleared when sharing the same VM
 	 */
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 31867a6..f182fe6 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -745,3 +745,40 @@ void uprobe_mmap(struct vm_area_struct *vma)
 	down_write(&mm->mmap_sem);
 }
 
+/*
+ * Called with no locks held.
+ * Called in context of a exiting or a exec-ing thread.
+ */
+void uprobe_free_utask(struct task_struct *tsk)
+{
+	struct uprobe_task *utask = tsk->utask;
+
+	if (!utask)
+		return;
+
+	if (utask->active_uprobe)
+		put_uprobe(utask->active_uprobe);
+	kfree(utask);
+	tsk->utask = NULL;
+}
+
+/*
+ * Allocate a uprobe_task object for the task.
+ * Called when the thread hits a breakpoint for the first time.
+ *
+ * Returns:
+ * - pointer to new uprobe_task on success
+ * - negative errno otherwise
+ */
+static struct uprobe_task *add_utask(void)
+{
+	struct uprobe_task *utask;
+
+	utask = kzalloc(sizeof *utask, GFP_KERNEL);
+	if (unlikely(utask == NULL))
+		return ERR_PTR(-ENOMEM);
+
+	utask->active_uprobe = NULL;
+	current->utask = utask;
+	return utask;
+}

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (9 preceding siblings ...)
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information Srikar Dronamraju
@ 2010-12-16  9:59 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address Srikar Dronamraju
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, Andrew Morton, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, LKML, Paul E. McKenney


Every task is allocated a fixed slot. When a probe is hit, the original
instruction corresponding to the probe hit is copied to per-task fixed
slot. Currently we allocate one page of slots for each mm. Bitmaps are
used to know which slots are free. Each slot is made of 128 bytes so
that its cache aligned.

TODO: On massively threaded processes (or if a huge number of processes
share the same mm), there is a possiblilty of running out of slots.
One alternative could be to extend the slots as when slots are required.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
---
 include/linux/mm_types.h |    4 +
 include/linux/uprobes.h  |   21 ++++
 kernel/fork.c            |    4 +
 kernel/uprobes.c         |  233 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 262 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index af2b55d..2bc97cc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -12,6 +12,9 @@
 #include <linux/completion.h>
 #include <linux/cpumask.h>
 #include <linux/page-debug-flags.h>
+#ifdef CONFIG_UPROBES
+#include <linux/uprobes.h>
+#endif
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -316,6 +319,7 @@ struct mm_struct {
 	unsigned long uprobes_vaddr;
 	struct list_head uprobes_list;
 	atomic_t uprobes_count;
+	struct uprobes_xol_area *uprobes_xol_area;
 #endif
 };
 
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 14a4fce..a631c42 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -109,6 +109,25 @@ struct uprobe_task {
 };
 
 /*
+ * Every thread gets its own slot.  Once it's assigned a slot, it
+ * keeps that slot until the thread exits. Only definite number
+ * of slots are allocated.
+ */
+
+struct uprobes_xol_area {
+	spinlock_t slot_lock;	/* protects bitmap and slot (de)allocation*/
+	unsigned long *bitmap;	/* 0 = free slot */
+	struct page *page;
+
+	/*
+	 * We keep the vma's vm_start rather than a pointer to the vma
+	 * itself.  The probed process or a naughty kernel module could make
+	 * the vma go away, and we must handle that reasonably gracefully.
+	 */
+	unsigned long vaddr;		/* Page(s) of instruction slots */
+};
+
+/*
  * Most architectures can use the default versions of @read_opcode(),
  * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn();
  *
@@ -136,6 +155,7 @@ extern void uprobe_free_utask(struct task_struct *tsk);
 struct vm_area_struct;
 extern void uprobe_mmap(struct vm_area_struct *vma);
 extern void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm);
+extern void uprobes_free_xol_area(struct mm_struct *mm);
 #else /* CONFIG_UPROBES is not defined */
 static inline int register_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer)
@@ -152,5 +172,6 @@ static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
 }
 static inline void uprobe_free_utask(struct task_struct *tsk) {}
 static inline void uprobe_mmap(struct vm_area_struct *vma) { }
+static inline void uprobes_free_xol_area(struct mm_struct *mm) {}
 #endif /* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index ae99239..76538fd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -545,6 +545,7 @@ void mmput(struct mm_struct *mm)
 	might_sleep();
 
 	if (atomic_dec_and_test(&mm->mm_users)) {
+		uprobes_free_xol_area(mm);
 		exit_aio(mm);
 		ksm_exit(mm);
 		exit_mmap(mm);
@@ -670,6 +671,9 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
 	memcpy(mm, oldmm, sizeof(*mm));
 
 	/* Initializing for Swap token stuff */
+#ifdef CONFIG_UPROBES
+	mm->uprobes_xol_area = NULL;
+#endif
 	mm->token_priority = 0;
 	mm->last_interval = 0;
 
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index f182fe6..09e36f6 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -32,6 +32,11 @@
 #include <linux/slab.h>
 #include <linux/uprobes.h>
 #include <linux/rmap.h> /* needed for anon_vma_prepare */
+#include <linux/mman.h>	/* needed for PROT_EXEC, MAP_PRIVATE */
+#include <linux/file.h> /* needed for fput() */
+
+#define UINSNS_PER_PAGE	(PAGE_SIZE/UPROBES_XOL_SLOT_BYTES)
+#define MAX_UPROBES_XOL_SLOTS UINSNS_PER_PAGE
 
 /**
  * uprobes_read_vm - Read @nbytes at @vaddr from @tsk into @kbuf.
@@ -353,11 +358,22 @@ static void put_uprobe(struct uprobe *uprobe)
 		kfree(uprobe);
 }
 
+/*
+ * valid_vma: Verify if the specified vma is an executable vma,
+ * but not an XOL vma.
+ *	- Return 1 if the specified virtual address is in an
+ *	  executable vma, but not in an XOL vma.
+ */
 static int valid_vma(struct vm_area_struct *vma)
 {
+	struct uprobes_xol_area *area = current->mm->uprobes_xol_area;
+
 	if (!vma->vm_file)
 		return 0;
 
+	if (area && (area->vaddr == vma->vm_start))
+			return 0;
+
 	if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) ==
 						(VM_READ|VM_EXEC))
 		return 1;
@@ -745,6 +761,221 @@ void uprobe_mmap(struct vm_area_struct *vma)
 	down_write(&mm->mmap_sem);
 }
 
+/* Slot allocation for XOL */
+
+static int xol_add_vma(struct uprobes_xol_area *area)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct file *file;
+	unsigned long addr;
+	int ret = -ENOMEM;
+
+	mm = get_task_mm(current);
+	if (!mm)
+		return -ESRCH;
+
+	down_write(&mm->mmap_sem);
+	if (mm->uprobes_xol_area) {
+		ret = -EALREADY;
+		goto fail;
+	}
+
+	/*
+	 * Find the end of the top mapping and skip a page.
+	 * If there is no space for PAGE_SIZE above
+	 * that, mmap will ignore our address hint.
+	 *
+	 * We allocate a "fake" unlinked shmem file because
+	 * anonymous memory might not be granted execute
+	 * permission when the selinux security hooks have
+	 * their way.
+	 */
+	vma = rb_entry(rb_last(&mm->mm_rb), struct vm_area_struct, vm_rb);
+	addr = vma->vm_end + PAGE_SIZE;
+	file = shmem_file_setup("uprobes/xol", PAGE_SIZE, VM_NORESERVE);
+	if (!file) {
+		printk(KERN_ERR "uprobes_xol failed to setup shmem_file "
+			"while allocating vma for pid/tgid %d/%d for "
+			"single-stepping out of line.\n",
+			current->pid, current->tgid);
+		goto fail;
+	}
+	addr = do_mmap_pgoff(file, addr, PAGE_SIZE, PROT_EXEC, MAP_PRIVATE, 0);
+	fput(file);
+
+	if (addr & ~PAGE_MASK) {
+		printk(KERN_ERR "uprobes_xol failed to allocate a vma for "
+				"pid/tgid %d/%d for single-stepping out of "
+				"line.\n", current->pid, current->tgid);
+		goto fail;
+	}
+	vma = find_vma(mm, addr);
+
+	/* Don't expand vma on mremap(). */
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY;
+	area->vaddr = vma->vm_start;
+	if (get_user_pages(current, mm, area->vaddr, 1, 1, 1, &area->page,
+				&vma) > 0)
+		ret = 0;
+
+fail:
+	up_write(&mm->mmap_sem);
+	mmput(mm);
+	return ret;
+}
+
+/*
+ * xol_alloc_area - Allocate process's uprobes_xol_area.
+ * This area will be used for storing instructions for execution out of
+ * line.
+ *
+ * Returns the allocated area or NULL.
+ */
+static struct uprobes_xol_area *xol_alloc_area(void)
+{
+	struct uprobes_xol_area *area = NULL;
+
+	area = kzalloc(sizeof(*area), GFP_USER);
+	if (unlikely(!area))
+		return NULL;
+
+	area->bitmap = kzalloc(BITS_TO_LONGS(UINSNS_PER_PAGE) * sizeof(long),
+								GFP_USER);
+
+	if (!area->bitmap)
+		goto fail;
+
+	spin_lock_init(&area->slot_lock);
+	if (!xol_add_vma(area))
+		return area;
+
+fail:
+	if (area) {
+		if (area->bitmap)
+			kfree(area->bitmap);
+		kfree(area);
+	}
+	return current->mm->uprobes_xol_area;
+}
+
+/*
+ * uprobes_free_xol_area - Free the area allocated for slots.
+ */
+void uprobes_free_xol_area(struct mm_struct *mm)
+{
+	struct uprobes_xol_area *area = mm->uprobes_xol_area;
+
+	if (!area)
+		return;
+
+	put_page(area->page);
+	kfree(area->bitmap);
+	kfree(area);
+}
+
+/*
+ * Find a slot
+ *  - searching in existing vmas for a free slot.
+ *  - If no free slot in existing vmas, return 0;
+ *
+ * Called when holding uprobes_xol_area->slot_lock
+ */
+static unsigned long xol_take_insn_slot(struct uprobes_xol_area *area)
+{
+	unsigned long slot_addr;
+	int slot_nr;
+
+	slot_nr = find_first_zero_bit(area->bitmap, UINSNS_PER_PAGE);
+	if (slot_nr < UINSNS_PER_PAGE) {
+		__set_bit(slot_nr, area->bitmap);
+		slot_addr = area->vaddr +
+				(slot_nr * UPROBES_XOL_SLOT_BYTES);
+		return slot_addr;
+	}
+
+	return 0;
+}
+
+/*
+ * xol_get_insn_slot - If was not allocated a slot, then
+ * allocate a slot.
+ * Returns the allocated slot address or 0.
+ */
+static unsigned long xol_get_insn_slot(struct uprobe *uprobe,
+					unsigned long slot_addr)
+{
+	struct uprobes_xol_area *area = current->mm->uprobes_xol_area;
+	unsigned long flags, xol_vaddr = 0;
+	int len = 0;
+
+	if (!current->utask->xol_vaddr) {
+		if (!area)
+			area = xol_alloc_area();
+
+		if (!area)
+			return 0;
+
+		spin_lock_irqsave(&area->slot_lock, flags);
+		xol_vaddr = xol_take_insn_slot(area);
+		spin_unlock_irqrestore(&area->slot_lock, flags);
+	}
+
+	/*
+	 * Initialize the slot if xol_vaddr points to valid
+	 * instruction slot.
+	 */
+	if (likely(xol_vaddr)) {
+		void *vaddr;
+
+		current->utask->xol_vaddr = xol_vaddr;
+		current->utask->vaddr = slot_addr;
+		vaddr = kmap_atomic(area->page, KM_USER0);
+		xol_vaddr &= ~PAGE_MASK;
+		memcpy(vaddr + xol_vaddr, uprobe->insn, MAX_UINSN_BYTES);
+		kunmap_atomic(vaddr, KM_USER0);
+	}
+	return current->utask->xol_vaddr;
+}
+
+/*
+ * xol_free_insn_slot - If slot was earlier allocated by
+ * @xol_get_insn_slot(), make the slot available for
+ * subsequent requests.
+ */
+static void xol_free_insn_slot(struct task_struct *tsk, unsigned long slot_addr)
+{
+	struct uprobes_xol_area *area;
+	unsigned long vma_end;
+
+	if (!tsk->mm || tsk->mm->uprobes_xol_area)
+		return;
+
+	area = tsk->mm->uprobes_xol_area;
+
+	if (unlikely(!slot_addr || IS_ERR_VALUE(slot_addr)))
+		return;
+
+	vma_end = area->vaddr + PAGE_SIZE;
+	if (area->vaddr <= slot_addr && slot_addr < vma_end) {
+		int slot_nr;
+		unsigned long offset = slot_addr - area->vaddr;
+		unsigned long flags;
+
+		BUG_ON(offset % UPROBES_XOL_SLOT_BYTES);
+
+		slot_nr = offset / UPROBES_XOL_SLOT_BYTES;
+		BUG_ON(slot_nr >= UINSNS_PER_PAGE);
+
+		spin_lock_irqsave(&area->slot_lock, flags);
+		__clear_bit(slot_nr, area->bitmap);
+		spin_unlock_irqrestore(&area->slot_lock, flags);
+		return;
+	}
+	printk(KERN_ERR "%s: no XOL vma for slot address %#lx\n",
+						__func__, slot_addr);
+}
+
 /*
  * Called with no locks held.
  * Called in context of a exiting or a exec-ing thread.
@@ -758,6 +989,8 @@ void uprobe_free_utask(struct task_struct *tsk)
 
 	if (utask->active_uprobe)
 		put_uprobe(utask->active_uprobe);
+
+	xol_free_insn_slot(tsk, utask->xol_vaddr);
 	kfree(utask);
 	tsk->utask = NULL;
 }

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (10 preceding siblings ...)
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes Srikar Dronamraju
@ 2010-12-16  9:59 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling Srikar Dronamraju
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	LKML, Linux-mm, Jim Keniston, Frederic Weisbecker, SystemTap,
	Andrew Morton, Paul E. McKenney


On a breakpoint hit, perform a architecture specific calculation to
return the address where the breakpoint was hit.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
---
 include/linux/uprobes.h |    5 +++++
 kernel/uprobes.c        |   11 +++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index a631c42..ee12b2e 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -154,6 +154,7 @@ extern void uprobe_free_utask(struct task_struct *tsk);
 
 struct vm_area_struct;
 extern void uprobe_mmap(struct vm_area_struct *vma);
+extern unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs);
 extern void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm);
 extern void uprobes_free_xol_area(struct mm_struct *mm);
 #else /* CONFIG_UPROBES is not defined */
@@ -173,5 +174,9 @@ static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
 static inline void uprobe_free_utask(struct task_struct *tsk) {}
 static inline void uprobe_mmap(struct vm_area_struct *vma) { }
 static inline void uprobes_free_xol_area(struct mm_struct *mm) {}
+static inline unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs)
+{
+	return 0;
+}
 #endif /* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 09e36f6..f486c4f 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -976,6 +976,17 @@ static void xol_free_insn_slot(struct task_struct *tsk, unsigned long slot_addr)
 						__func__, slot_addr);
 }
 
+/**
+ * uprobes_get_bkpt_addr - compute address of bkpt given post-bkpt regs
+ * @regs: Reflects the saved state of the task after it has hit a breakpoint
+ * instruction.
+ * Return the address of the breakpoint instruction.
+ */
+unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs)
+{
+	return instruction_pointer(regs) - UPROBES_BKPT_INSN_SIZE;
+}
+
 /*
  * Called with no locks held.
  * Called in context of a exiting or a exec-ing thread.

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (11 preceding siblings ...)
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address Srikar Dronamraju
@ 2010-12-16  9:59 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception Srikar Dronamraju
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney


Provides x86 specific implementations for setting the current
instruction pointer, pre single-step and post-singlestep handling,
enabling and disabling singlestep.

This patch also introduces TIF_UPROBE which is set by uprobes notifier
code. TIF_UPROBE indicates that there is pending work that needs to be
done at do_notify_resume time.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/thread_info.h |    2 
 arch/x86/include/asm/uprobes.h     |    5 +
 arch/x86/kernel/uprobes.c          |  155 ++++++++++++++++++++++++++++++++++++
 3 files changed, 162 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f0b6e5d..5b9c9f0 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -84,6 +84,7 @@ struct thread_info {
 #define TIF_SECCOMP		8	/* secure computing */
 #define TIF_MCE_NOTIFY		10	/* notify userspace of an MCE */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
+#define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* 32bit process */
 #define TIF_FORK		18	/* ret_from_fork */
@@ -107,6 +108,7 @@ struct thread_info {
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
 #define _TIF_MCE_NOTIFY		(1 << TIF_MCE_NOTIFY)
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
+#define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_FORK		(1 << TIF_FORK)
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index e38950f..0c9c8b6 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -44,4 +44,9 @@ struct uprobe_task_arch_info {};
 #endif
 struct uprobe;
 extern int analyze_insn(struct task_struct *tsk, struct uprobe *uprobe);
+extern void set_ip(struct pt_regs *regs, unsigned long vaddr);
+extern int pre_xol(struct uprobe *uprobe, struct pt_regs *regs);
+extern int post_xol(struct uprobe *uprobe, struct pt_regs *regs);
+extern void arch_uprobe_enable_sstep(struct pt_regs *regs);
+extern void arch_uprobe_disable_sstep(struct pt_regs *regs);
 #endif	/* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 352c71f..9a0b8a9 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -413,3 +413,158 @@ int analyze_insn(struct task_struct *tsk, struct uprobe *uprobe)
 	return 0;
 }
 
+/*
+ * @reg: reflects the saved state of the task
+ * @vaddr: the virtual address to jump to.
+ * Return 0 on success or a -ve number on error.
+ */
+void set_ip(struct pt_regs *regs, unsigned long vaddr)
+{
+	regs->ip = vaddr;
+}
+
+/*
+ * pre_xol - prepare to execute out of line.
+ * @uprobe: the probepoint information.
+ * @regs: reflects the saved user state of @tsk.
+ *
+ * If we're emulating a rip-relative instruction, save the contents
+ * of the scratch register and store the target address in that register.
+ *
+ * Returns true if @uprobe->opcode is @bkpt_insn.
+ */
+int pre_xol(struct uprobe *uprobe, struct pt_regs *regs)
+{
+	struct uprobe_task_arch_info *tskinfo = &current->utask->tskinfo;
+
+	regs->ip = current->utask->xol_vaddr;
+#ifdef CONFIG_X86_64
+	if (uprobe->fixups & UPROBES_FIX_RIP_AX) {
+		tskinfo->saved_scratch_register = regs->ax;
+		regs->ax = current->utask->vaddr;
+		regs->ax += uprobe->arch_info.rip_rela_target_address;
+	} else if (uprobe->fixups & UPROBES_FIX_RIP_CX) {
+		tskinfo->saved_scratch_register = regs->cx;
+		regs->cx = current->utask->vaddr;
+		regs->cx += uprobe->arch_info.rip_rela_target_address;
+	}
+#endif
+	return 0;
+}
+
+/*
+ * Called by post_xol() to adjust the return address pushed by a call
+ * instruction executed out of line.
+ */
+static int adjust_ret_addr(unsigned long sp, long correction)
+{
+	int rasize, ncopied;
+	long ra = 0;
+
+	if (is_32bit_app(current))
+		rasize = 4;
+	else
+		rasize = 8;
+	ncopied = uprobes_read_vm(current, (void __user *) sp, &ra, rasize);
+	if (unlikely(ncopied != rasize))
+		goto fail;
+	ra += correction;
+	ncopied = uprobes_write_vm(current, (void __user *) sp, &ra, rasize);
+	if (unlikely(ncopied != rasize))
+		goto fail;
+	return 0;
+
+fail:
+	printk(KERN_ERR
+		"uprobes: Failed to adjust return address after"
+		" single-stepping call instruction;"
+		" pid=%d, sp=%#lx\n", current->pid, sp);
+	return -EFAULT;
+}
+
+#ifdef CONFIG_X86_64
+static bool is_riprel_insn(struct uprobe *uprobe)
+{
+	return ((uprobe->fixups &
+			(UPROBES_FIX_RIP_AX | UPROBES_FIX_RIP_CX)) != 0);
+}
+
+#endif	/* CONFIG_X86_64 */
+
+/*
+ * Called after single-stepping. To avoid the SMP problems that can
+ * occur when we temporarily put back the original opcode to
+ * single-step, we single-stepped a copy of the instruction.
+ *
+ * This function prepares to resume execution after the single-step.
+ * We have to fix things up as follows:
+ *
+ * Typically, the new ip is relative to the copied instruction.  We need
+ * to make it relative to the original instruction (FIX_IP).  Exceptions
+ * are return instructions and absolute or indirect jump or call instructions.
+ *
+ * If the single-stepped instruction was a call, the return address that
+ * is atop the stack is the address following the copied instruction.  We
+ * need to make it the address following the original instruction (FIX_CALL).
+ *
+ * If the original instruction was a rip-relative instruction such as
+ * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent
+ * instruction using a scratch register -- e.g., "movl %edx,(%rax)".
+ * We need to restore the contents of the scratch register and adjust
+ * the ip, keeping in mind that the instruction we executed is 4 bytes
+ * shorter than the original instruction (since we squeezed out the offset
+ * field).  (FIX_RIP_AX or FIX_RIP_CX)
+ */
+int post_xol(struct uprobe *uprobe, struct pt_regs *regs)
+{
+	struct uprobe_task *utask = current->utask;
+	int result = 0;
+	long correction;
+
+	correction = (long)(utask->vaddr - utask->xol_vaddr);
+#ifdef CONFIG_X86_64
+	if (is_riprel_insn(uprobe)) {
+		struct uprobe_task_arch_info *tskinfo;
+		tskinfo = &current->utask->tskinfo;
+
+		if (uprobe->fixups & UPROBES_FIX_RIP_AX)
+			regs->ax = tskinfo->saved_scratch_register;
+		else
+			regs->cx = tskinfo->saved_scratch_register;
+		/*
+		 * The original instruction includes a displacement, and so
+		 * is 4 bytes longer than what we've just single-stepped.
+		 * Fall through to handle stuff like "jmpq *...(%rip)" and
+		 * "callq *...(%rip)".
+		 */
+		correction += 4;
+	}
+#endif
+	if (uprobe->fixups & UPROBES_FIX_IP)
+		regs->ip += correction;
+	if (uprobe->fixups & UPROBES_FIX_CALL)
+		result = adjust_ret_addr(regs->sp, correction);
+	return result;
+}
+
+void arch_uprobe_enable_sstep(struct pt_regs *regs)
+{
+	/*
+	 * Enable single-stepping by
+	 * - Set TF on stack
+	 * - Set TIF_SINGLESTEP: Guarantees that TF is set when
+	 *	returning to user mode.
+	 *  - Indicate that TF is set by us.
+	 */
+	regs->flags |= X86_EFLAGS_TF;
+	set_thread_flag(TIF_SINGLESTEP);
+	set_thread_flag(TIF_FORCED_TF);
+}
+
+void arch_uprobe_disable_sstep(struct pt_regs *regs)
+{
+	/* Disable single-stepping by clearing what we set */
+	clear_thread_flag(TIF_SINGLESTEP);
+	clear_thread_flag(TIF_FORCED_TF);
+	regs->flags &= ~X86_EFLAGS_TF;
+}

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (12 preceding siblings ...)
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling Srikar Dronamraju
@ 2010-12-16  9:59 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 15/20] 15: x86: uprobes exception notifier for x86 Srikar Dronamraju
                   ` (6 subsequent siblings)
  20 siblings, 2 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16  9:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney


On int3, set the TIF_UPROBE flag and if a task specific info is
available, indicate the task state as breakpoint hit.  Setting the
TIF_UPROBE flag results in uprobe_notify_resume being called.
uprobe_notify_resume walks thro the list of vmas and then matches the
inode and offset corresponding to the instruction pointer to enteries in
rbtree. Once a matcing uprobes is found, run the handlers for all the
consumers that have registered.

On singlestep exception, perform the necessary fixups and allow the
process to continue. The necessary fixups are determined at instruction
analysis time.

TODO: If there is no matching uprobe, signal a trap to the process.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/uprobes.h |    4 +
 kernel/uprobes.c        |  144 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 0 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index ee12b2e..a91ff42 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -157,6 +157,9 @@ extern void uprobe_mmap(struct vm_area_struct *vma);
 extern unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs);
 extern void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm);
 extern void uprobes_free_xol_area(struct mm_struct *mm);
+extern int uprobe_post_notifier(struct pt_regs *regs);
+extern int uprobe_bkpt_notifier(struct pt_regs *regs);
+extern void uprobe_notify_resume(struct pt_regs *regs);
 #else /* CONFIG_UPROBES is not defined */
 static inline int register_uprobe(struct inode *inode, unsigned long offset,
 				struct uprobe_consumer *consumer)
@@ -174,6 +177,7 @@ static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
 static inline void uprobe_free_utask(struct task_struct *tsk) {}
 static inline void uprobe_mmap(struct vm_area_struct *vma) { }
 static inline void uprobes_free_xol_area(struct mm_struct *mm) {}
+static inline void uprobe_notify_resume(struct pt_regs *regs) {}
 static inline unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs)
 {
 	return 0;
diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index f486c4f..3d21d8f 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -1026,3 +1026,147 @@ static struct uprobe_task *add_utask(void)
 	current->utask = utask;
 	return utask;
 }
+
+/* Prepare to single-step probed instruction out of line. */
+static int pre_ssout(struct uprobe *uprobe, struct pt_regs *regs,
+				unsigned long vaddr)
+{
+	xol_get_insn_slot(uprobe, vaddr);
+	BUG_ON(!current->utask->xol_vaddr);
+	if (!pre_xol(uprobe, regs)) {
+		set_ip(regs, current->utask->xol_vaddr);
+		return 0;
+	}
+	return -EFAULT;
+}
+
+/*
+ * Verify from Instruction Pointer if singlestep has indeed occurred.
+ * If Singlestep has occurred, then do post singlestep fix-ups.
+ */
+static bool sstep_complete(struct uprobe *uprobe, struct pt_regs *regs)
+{
+	unsigned long vaddr = instruction_pointer(regs);
+
+	/*
+	 * If we have executed out of line, Instruction pointer
+	 * cannot be same as virtual address of XOL slot.
+	 */
+	if (vaddr == current->utask->xol_vaddr)
+		return false;
+	post_xol(uprobe, regs);
+	return true;
+}
+
+/*
+ * uprobe_notify_resume gets called in task context just before returning
+ * to userspace.
+ *
+ *  If its the first time the probepoint is hit, slot gets allocated here.
+ *  If its the first time the thread hit a breakpoint, utask gets
+ *  allocated here.
+ */
+void uprobe_notify_resume(struct pt_regs *regs)
+{
+	struct vm_area_struct *vma;
+	struct uprobe_task *utask;
+	struct mm_struct *mm;
+	struct uprobe *u = NULL;
+	unsigned long probept;
+
+	utask = current->utask;
+	mm = current->mm;
+	if (unlikely(!utask)) {
+		utask = add_utask();
+
+		/* Failed to allocate utask for the current task. */
+		BUG_ON(!utask);
+		utask->state = UTASK_BP_HIT;
+	}
+	if (utask->state == UTASK_BP_HIT) {
+		probept = uprobes_get_bkpt_addr(regs);
+		down_read(&mm->mmap_sem);
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			if (!valid_vma(vma))
+				continue;
+			if (probept < vma->vm_start || probept > vma->vm_end)
+				continue;
+			u = find_uprobe(vma->vm_file->f_mapping->host,
+					probept - vma->vm_start);
+			if (u)
+				break;
+		}
+		up_read(&mm->mmap_sem);
+		/*TODO Return SIGTRAP signal */
+		/*if (!u) {
+			;
+		} */
+		/* TODO Start queueing signals. */
+		utask->active_uprobe = u;
+		handler_chain(u, regs);
+		utask->state = UTASK_SSTEP;
+		if (!pre_ssout(u, regs, probept))
+			arch_uprobe_enable_sstep(regs);
+	} else if (utask->state == UTASK_SSTEP) {
+		u = utask->active_uprobe;
+		if (sstep_complete(u, regs)) {
+			put_uprobe(u);
+			utask->active_uprobe = NULL;
+			utask->state = UTASK_RUNNING;
+		/* TODO Stop queueing signals. */
+			arch_uprobe_disable_sstep(regs);
+		}
+	}
+}
+
+/*
+ * uprobe_bkpt_notifier gets called from interrupt context
+ * it gets a reference to the ppt and sets TIF_UPROBE flag,
+ */
+int uprobe_bkpt_notifier(struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+
+	if (!current->mm || !atomic_read(&current->mm->uprobes_count))
+		/* task is currently not uprobed */
+		return 0;
+
+	utask = current->utask;
+	if (utask)
+		utask->state = UTASK_BP_HIT;
+	set_thread_flag(TIF_UPROBE);
+	return 1;
+}
+
+/*
+ * uprobe_post_notifier gets called in interrupt context.
+ * It completes the single step operation.
+ */
+int uprobe_post_notifier(struct pt_regs *regs)
+{
+	struct uprobe *uprobe;
+	struct uprobe_task *utask;
+
+	if (!current->mm || !current->utask || !current->utask->active_uprobe)
+		/* task is currently not uprobed */
+		return 0;
+
+	utask = current->utask;
+	uprobe = utask->active_uprobe;
+	if (!uprobe)
+		return 0;
+
+	if (uprobes_resume_can_sleep(uprobe)) {
+		set_thread_flag(TIF_UPROBE);
+		return 1;
+	}
+	if (sstep_complete(uprobe, regs)) {
+		put_uprobe(uprobe);
+		utask->active_uprobe = NULL;
+		utask->state = UTASK_RUNNING;
+		/* TODO Stop queueing signals. */
+		arch_uprobe_disable_sstep(regs);
+		return 1;
+	}
+	return 0;
+}

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 15/20] 15: x86: uprobes exception notifier for x86.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (13 preceding siblings ...)
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception Srikar Dronamraju
@ 2010-12-16 10:00 ` Srikar Dronamraju
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes Srikar Dronamraju
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	Andrew Morton, Linux-mm, Jim Keniston, Frederic Weisbecker,
	SystemTap, LKML, Paul E. McKenney


Provides a uprobes exception notifier for x86.  This uprobes_exception
notifier gets called in interrupt context and routes int3 and singlestep
exception when a uprobed process encounters a INT3 or a singlestep exception.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/uprobes.h |    3 +++
 arch/x86/kernel/signal.c       |   14 ++++++++++++++
 arch/x86/kernel/uprobes.c      |   29 +++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 0c9c8b6..bb5c03c 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -22,6 +22,7 @@
  *	Srikar Dronamraju
  *	Jim Keniston
  */
+#include <linux/notifier.h>
 
 typedef u8 uprobe_opcode_t;
 #define MAX_UINSN_BYTES 16
@@ -49,4 +50,6 @@ extern int pre_xol(struct uprobe *uprobe, struct pt_regs *regs);
 extern int post_xol(struct uprobe *uprobe, struct pt_regs *regs);
 extern void arch_uprobe_enable_sstep(struct pt_regs *regs);
 extern void arch_uprobe_disable_sstep(struct pt_regs *regs);
+extern int uprobes_exception_notify(struct notifier_block *self,
+				       unsigned long val, void *data);
 #endif	/* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4fd173c..0b06b94 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -20,6 +20,7 @@
 #include <linux/personality.h>
 #include <linux/uaccess.h>
 #include <linux/user-return-notifier.h>
+#include <linux/uprobes.h>
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
@@ -848,6 +849,19 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 	if (thread_info_flags & _TIF_SIGPENDING)
 		do_signal(regs);
 
+	if (thread_info_flags & _TIF_UPROBE) {
+		clear_thread_flag(TIF_UPROBE);
+#ifdef CONFIG_X86_32
+		/*
+		 * On x86_32, do_notify_resume() gets called with
+		 * interrupts disabled. Hence enable interrupts if they
+		 * are still disabled.
+		 */
+		local_irq_enable();
+#endif
+		uprobe_notify_resume(regs);
+	}
+
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9a0b8a9..c4239b3 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -568,3 +568,32 @@ void arch_uprobe_disable_sstep(struct pt_regs *regs)
 	clear_thread_flag(TIF_FORCED_TF);
 	regs->flags &= ~X86_EFLAGS_TF;
 }
+
+/*
+ * Wrapper routine for handling exceptions.
+ */
+int uprobes_exception_notify(struct notifier_block *self,
+				       unsigned long val, void *data)
+{
+	struct die_args *args = data;
+	struct pt_regs *regs = args->regs;
+	int ret = NOTIFY_DONE;
+
+	/* We are only interested in userspace traps */
+	if (regs && !user_mode_vm(regs))
+		return NOTIFY_DONE;
+
+	switch (val) {
+	case DIE_INT3:
+		/* Run your handler here */
+		if (uprobe_bkpt_notifier(regs))
+			ret = NOTIFY_STOP;
+		break;
+	case DIE_DEBUG:
+		if (uprobe_post_notifier(regs))
+			ret = NOTIFY_STOP;
+	default:
+		break;
+	}
+	return ret;
+}

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (14 preceding siblings ...)
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 15/20] 15: x86: uprobes exception notifier for x86 Srikar Dronamraju
@ 2010-12-16 10:00 ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 17/20] 17: uprobes: filter chain Srikar Dronamraju
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney


Uprobe needs to be intimated on int3 and singlestep exceptions.
Hence uprobes registers a die notifier so that its notified of the events.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/uprobes.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 3d21d8f..93a3118 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -34,6 +34,7 @@
 #include <linux/rmap.h> /* needed for anon_vma_prepare */
 #include <linux/mman.h>	/* needed for PROT_EXEC, MAP_PRIVATE */
 #include <linux/file.h> /* needed for fput() */
+#include <linux/kdebug.h> /* needed for notifier mechanism */
 
 #define UINSNS_PER_PAGE	(PAGE_SIZE/UPROBES_XOL_SLOT_BYTES)
 #define MAX_UPROBES_XOL_SLOTS UINSNS_PER_PAGE
@@ -1170,3 +1171,21 @@ int uprobe_post_notifier(struct pt_regs *regs)
 	}
 	return 0;
 }
+
+struct notifier_block uprobes_exception_nb = {
+	.notifier_call = uprobes_exception_notify,
+	.priority = 0x7ffffff0,
+};
+
+static int __init init_uprobes(void)
+{
+	register_die_notifier(&uprobes_exception_nb);
+	return 0;
+}
+
+static void __exit exit_uprobes(void)
+{
+}
+
+module_init(init_uprobes);
+module_exit(exit_uprobes);

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 17/20] 17: uprobes: filter chain
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (15 preceding siblings ...)
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes Srikar Dronamraju
@ 2010-12-16 10:00 ` Srikar Dronamraju
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters Srikar Dronamraju
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, Andrew Morton, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, LKML, Paul E. McKenney


Loops through the filters callbacks of currently registered
consumers to see if any consumer is interested in tracing this task.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/uprobes.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index 93a3118..b0d323c 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -429,6 +429,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 }
 
 /* Acquires uprobe->consumer_rwsem */
+static bool filter_chain(struct uprobe *uprobe, struct task_struct *t)
+{
+	struct uprobe_consumer *consumer;
+	bool ret = false;
+
+	down_read(&uprobe->consumer_rwsem);
+	for (consumer = uprobe->consumers; consumer;
+					consumer = consumer->next) {
+		if (!consumer->filter || consumer->filter(consumer, t)) {
+			ret = true;
+			break;
+		}
+	}
+	up_read(&uprobe->consumer_rwsem);
+	return ret;
+}
+
+/* Acquires uprobe->consumer_rwsem */
 static void add_consumer(struct uprobe *uprobe,
 				struct uprobe_consumer *consumer)
 {

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (16 preceding siblings ...)
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 17/20] 17: uprobes: filter chain Srikar Dronamraju
@ 2010-12-16 10:00 ` Srikar Dronamraju
  2010-12-17 19:32   ` Valdis.Kletnieks
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 19/20] 19: tracing: Extract out common code for kprobes/uprobes traceevents Srikar Dronamraju
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	LKML, Linux-mm, Jim Keniston, Frederic Weisbecker, SystemTap,
	Andrew Morton, Paul E. McKenney


Provides most commonly used filters. Prevents users from having to define
their own filters. However this would be useful once we can
dynamically associate a filter with a uprobe-event tracer.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/uprobes.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/kernel/uprobes.c b/kernel/uprobes.c
index b0d323c..44a7ff6 100644
--- a/kernel/uprobes.c
+++ b/kernel/uprobes.c
@@ -1190,6 +1190,46 @@ int uprobe_post_notifier(struct pt_regs *regs)
 	return 0;
 }
 
+bool uprobes_pid_filter(struct uprobe_consumer *self, struct task_struct *t)
+{
+	if (t->tgid == (long)self->fvalue)
+		return true;
+	return false;
+}
+
+bool uprobes_tid_filter(struct uprobe_consumer *self, struct task_struct *t)
+{
+	if (t->pid == (long)self->fvalue)
+		return true;
+	return false;
+}
+
+bool uprobes_ppid_filter(struct uprobe_consumer *self, struct task_struct *t)
+{
+	pid_t pid;
+
+	rcu_read_lock();
+	pid = task_tgid_vnr(t->real_parent);
+	rcu_read_unlock();
+
+	if (pid == (long)self->fvalue)
+		return true;
+	return false;
+}
+
+bool uprobes_sid_filter(struct uprobe_consumer *self, struct task_struct *t)
+{
+	pid_t pid;
+
+	rcu_read_lock();
+	pid = pid_vnr(task_session(t));
+	rcu_read_unlock();
+
+	if (pid == (long)self->fvalue)
+		return true;
+	return false;
+}
+
 struct notifier_block uprobes_exception_nb = {
 	.notifier_call = uprobes_exception_notify,
 	.priority = 0x7ffffff0,
@@ -1207,3 +1247,4 @@ static void __exit exit_uprobes(void)
 
 module_init(init_uprobes);
 module_exit(exit_uprobes);
+

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 19/20] 19: tracing: Extract out common code for kprobes/uprobes traceevents.
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (17 preceding siblings ...)
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters Srikar Dronamraju
@ 2010-12-16 10:00 ` Srikar Dronamraju
  2010-12-16 10:01 ` [RFC] [PATCH 2.6.37-rc5-tip 20/20] 20: tracing: uprobes trace_event interface Srikar Dronamraju
  2010-12-16 10:07 ` [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney


Move parts of trace_kprobe.c that can be shared with upcoming
trace_uprobe.c. Common code to kernel/trace/trace_probe.h and
kernel/trace/trace_probe.c.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/trace/Kconfig        |    4 
 kernel/trace/Makefile       |    1 
 kernel/trace/trace_kprobe.c |  752 +------------------------------------------
 kernel/trace/trace_probe.c  |  648 +++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_probe.h  |  155 +++++++++
 5 files changed, 823 insertions(+), 737 deletions(-)
 create mode 100644 kernel/trace/trace_probe.c
 create mode 100644 kernel/trace/trace_probe.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ea37e2f..09ad930 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -358,6 +358,7 @@ config KPROBE_EVENT
 	depends on HAVE_REGS_AND_STACK_ACCESS_API
 	bool "Enable kprobes-based dynamic events"
 	select TRACING
+	select PROBE_EVENTS
 	default y
 	help
 	  This allows the user to add tracing events (similar to tracepoints)
@@ -370,6 +371,9 @@ config KPROBE_EVENT
 	  This option is also required by perf-probe subcommand of perf tools.
 	  If you want to use perf tools, this option is strongly recommended.
 
+config PROBE_EVENTS
+	def_bool n
+
 config DYNAMIC_FTRACE
 	bool "enable/disable ftrace tracepoints dynamically"
 	depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 53f3381..95d2043 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -56,5 +56,6 @@ obj-$(CONFIG_EVENT_TRACING) += power-traces.o
 ifeq ($(CONFIG_TRACING),y)
 obj-$(CONFIG_KGDB_KDB) += trace_kdb.o
 endif
+obj-$(CONFIG_PROBE_EVENTS) +=trace_probe.o
 
 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 2dec9bc..1045ed7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -19,462 +19,15 @@
 
 #include <linux/module.h>
 #include <linux/uaccess.h>
-#include <linux/kprobes.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-#include <linux/smp.h>
-#include <linux/debugfs.h>
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/ctype.h>
-#include <linux/ptrace.h>
-#include <linux/perf_event.h>
-#include <linux/stringify.h>
-#include <linux/limits.h>
-#include <asm/bitsperlong.h>
-
-#include "trace.h"
-#include "trace_output.h"
-
-#define MAX_TRACE_ARGS 128
-#define MAX_ARGSTR_LEN 63
-#define MAX_EVENT_NAME_LEN 64
-#define MAX_STRING_SIZE PATH_MAX
-#define KPROBE_EVENT_SYSTEM "kprobes"
-
-/* Reserved field names */
-#define FIELD_STRING_IP "__probe_ip"
-#define FIELD_STRING_RETIP "__probe_ret_ip"
-#define FIELD_STRING_FUNC "__probe_func"
-
-const char *reserved_field_names[] = {
-	"common_type",
-	"common_flags",
-	"common_preempt_count",
-	"common_pid",
-	"common_tgid",
-	"common_lock_depth",
-	FIELD_STRING_IP,
-	FIELD_STRING_RETIP,
-	FIELD_STRING_FUNC,
-};
-
-/* Printing function type */
-typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *,
-				 void *);
-#define PRINT_TYPE_FUNC_NAME(type)	print_type_##type
-#define PRINT_TYPE_FMT_NAME(type)	print_type_format_##type
-
-/* Printing  in basic type function template */
-#define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt, cast)			\
-static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,	\
-						const char *name,	\
-						void *data, void *ent)\
-{									\
-	return trace_seq_printf(s, " %s=" fmt, name, (cast)*(type *)data);\
-}									\
-static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
-
-DEFINE_BASIC_PRINT_TYPE_FUNC(u8, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "%lx", unsigned long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%llx", unsigned long long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s8, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%ld", long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%lld", long long)
-
-/* data_rloc: data relative location, compatible with u32 */
-#define make_data_rloc(len, roffs)	\
-	(((u32)(len) << 16) | ((u32)(roffs) & 0xffff))
-#define get_rloc_len(dl)	((u32)(dl) >> 16)
-#define get_rloc_offs(dl)	((u32)(dl) & 0xffff)
-
-static inline void *get_rloc_data(u32 *dl)
-{
-	return (u8 *)dl + get_rloc_offs(*dl);
-}
-
-/* For data_loc conversion */
-static inline void *get_loc_data(u32 *dl, void *ent)
-{
-	return (u8 *)ent + get_rloc_offs(*dl);
-}
-
-/*
- * Convert data_rloc to data_loc:
- *  data_rloc stores the offset from data_rloc itself, but data_loc
- *  stores the offset from event entry.
- */
-#define convert_rloc_to_loc(dl, offs)	((u32)(dl) + (offs))
-
-/* For defining macros, define string/string_size types */
-typedef u32 string;
-typedef u32 string_size;
-
-/* Print type function for string type */
-static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
-						  const char *name,
-						  void *data, void *ent)
-{
-	int len = *(u32 *)data >> 16;
-
-	if (!len)
-		return trace_seq_printf(s, " %s=(fault)", name);
-	else
-		return trace_seq_printf(s, " %s=\"%s\"", name,
-					(const char *)get_loc_data(data, ent));
-}
-static const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
-
-/* Data fetch function type */
-typedef	void (*fetch_func_t)(struct pt_regs *, void *, void *);
-
-struct fetch_param {
-	fetch_func_t	fn;
-	void *data;
-};
-
-static __kprobes void call_fetch(struct fetch_param *fprm,
-				 struct pt_regs *regs, void *dest)
-{
-	return fprm->fn(regs, fprm->data, dest);
-}
-
-#define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
-/*
- * Define macro for basic types - we don't need to define s* types, because
- * we have to care only about bitwidth at recording time.
- */
-#define DEFINE_BASIC_FETCH_FUNCS(method) \
-DEFINE_FETCH_##method(u8)		\
-DEFINE_FETCH_##method(u16)		\
-DEFINE_FETCH_##method(u32)		\
-DEFINE_FETCH_##method(u64)
-
-#define CHECK_FETCH_FUNCS(method, fn)			\
-	(((FETCH_FUNC_NAME(method, u8) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u16) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u32) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, u64) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, string) == fn) ||	\
-	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
-	 && (fn != NULL))
-
-/* Data fetch function templates */
-#define DEFINE_FETCH_reg(type)						\
-static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
-					void *offset, void *dest)	\
-{									\
-	*(type *)dest = (type)regs_get_register(regs,			\
-				(unsigned int)((unsigned long)offset));	\
-}
-DEFINE_BASIC_FETCH_FUNCS(reg)
-/* No string on the register */
-#define fetch_reg_string NULL
-#define fetch_reg_string_size NULL
-
-#define DEFINE_FETCH_stack(type)					\
-static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
-					  void *offset, void *dest)	\
-{									\
-	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
-				(unsigned int)((unsigned long)offset));	\
-}
-DEFINE_BASIC_FETCH_FUNCS(stack)
-/* No string on the stack entry */
-#define fetch_stack_string NULL
-#define fetch_stack_string_size NULL
-
-#define DEFINE_FETCH_retval(type)					\
-static __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,\
-					  void *dummy, void *dest)	\
-{									\
-	*(type *)dest = (type)regs_return_value(regs);			\
-}
-DEFINE_BASIC_FETCH_FUNCS(retval)
-/* No string on the retval */
-#define fetch_retval_string NULL
-#define fetch_retval_string_size NULL
-
-#define DEFINE_FETCH_memory(type)					\
-static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
-					  void *addr, void *dest)	\
-{									\
-	type retval;							\
-	if (probe_kernel_address(addr, retval))				\
-		*(type *)dest = 0;					\
-	else								\
-		*(type *)dest = retval;					\
-}
-DEFINE_BASIC_FETCH_FUNCS(memory)
-/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
- * length and relative data location.
- */
-static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
-						      void *addr, void *dest)
-{
-	long ret;
-	int maxlen = get_rloc_len(*(u32 *)dest);
-	u8 *dst = get_rloc_data(dest);
-	u8 *src = addr;
-	mm_segment_t old_fs = get_fs();
-	if (!maxlen)
-		return;
-	/*
-	 * Try to get string again, since the string can be changed while
-	 * probing.
-	 */
-	set_fs(KERNEL_DS);
-	pagefault_disable();
-	do
-		ret = __copy_from_user_inatomic(dst++, src++, 1);
-	while (dst[-1] && ret == 0 && src - (u8 *)addr < maxlen);
-	dst[-1] = '\0';
-	pagefault_enable();
-	set_fs(old_fs);
-
-	if (ret < 0) {	/* Failed to fetch string */
-		((u8 *)get_rloc_data(dest))[0] = '\0';
-		*(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-	} else
-		*(u32 *)dest = make_data_rloc(src - (u8 *)addr,
-					      get_rloc_offs(*(u32 *)dest));
-}
-/* Return the length of string -- including null terminal byte */
-static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
-							void *addr, void *dest)
-{
-	int ret, len = 0;
-	u8 c;
-	mm_segment_t old_fs = get_fs();
-
-	set_fs(KERNEL_DS);
-	pagefault_disable();
-	do {
-		ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
-		len++;
-	} while (c && ret == 0 && len < MAX_STRING_SIZE);
-	pagefault_enable();
-	set_fs(old_fs);
-
-	if (ret < 0)	/* Failed to check the length */
-		*(u32 *)dest = 0;
-	else
-		*(u32 *)dest = len;
-}
-
-/* Memory fetching by symbol */
-struct symbol_cache {
-	char *symbol;
-	long offset;
-	unsigned long addr;
-};
-
-static unsigned long update_symbol_cache(struct symbol_cache *sc)
-{
-	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-	if (sc->addr)
-		sc->addr += sc->offset;
-	return sc->addr;
-}
-
-static void free_symbol_cache(struct symbol_cache *sc)
-{
-	kfree(sc->symbol);
-	kfree(sc);
-}
-
-static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
-{
-	struct symbol_cache *sc;
-
-	if (!sym || strlen(sym) == 0)
-		return NULL;
-	sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
-	if (!sc)
-		return NULL;
-
-	sc->symbol = kstrdup(sym, GFP_KERNEL);
-	if (!sc->symbol) {
-		kfree(sc);
-		return NULL;
-	}
-	sc->offset = offset;
-
-	update_symbol_cache(sc);
-	return sc;
-}
-
-#define DEFINE_FETCH_symbol(type)					\
-static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
-					  void *data, void *dest)	\
-{									\
-	struct symbol_cache *sc = data;					\
-	if (sc->addr)							\
-		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
-	else								\
-		*(type *)dest = 0;					\
-}
-DEFINE_BASIC_FETCH_FUNCS(symbol)
-DEFINE_FETCH_symbol(string)
-DEFINE_FETCH_symbol(string_size)
-
-/* Dereference memory access function */
-struct deref_fetch_param {
-	struct fetch_param orig;
-	long offset;
-};
 
-#define DEFINE_FETCH_deref(type)					\
-static __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,\
-					    void *data, void *dest)	\
-{									\
-	struct deref_fetch_param *dprm = data;				\
-	unsigned long addr;						\
-	call_fetch(&dprm->orig, regs, &addr);				\
-	if (addr) {							\
-		addr += dprm->offset;					\
-		fetch_memory_##type(regs, (void *)addr, dest);		\
-	} else								\
-		*(type *)dest = 0;					\
-}
-DEFINE_BASIC_FETCH_FUNCS(deref)
-DEFINE_FETCH_deref(string)
-DEFINE_FETCH_deref(string_size)
+#include "trace_probe.h"
 
-static __kprobes void free_deref_fetch_param(struct deref_fetch_param *data)
-{
-	if (CHECK_FETCH_FUNCS(deref, data->orig.fn))
-		free_deref_fetch_param(data->orig.data);
-	else if (CHECK_FETCH_FUNCS(symbol, data->orig.fn))
-		free_symbol_cache(data->orig.data);
-	kfree(data);
-}
-
-/* Default (unsigned long) fetch type */
-#define __DEFAULT_FETCH_TYPE(t) u##t
-#define _DEFAULT_FETCH_TYPE(t) __DEFAULT_FETCH_TYPE(t)
-#define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG)
-#define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE)
-
-/* Fetch types */
-enum {
-	FETCH_MTD_reg = 0,
-	FETCH_MTD_stack,
-	FETCH_MTD_retval,
-	FETCH_MTD_memory,
-	FETCH_MTD_symbol,
-	FETCH_MTD_deref,
-	FETCH_MTD_END,
-};
-
-#define ASSIGN_FETCH_FUNC(method, type)	\
-	[FETCH_MTD_##method] = FETCH_FUNC_NAME(method, type)
-
-#define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)	\
-	{.name = _name,				\
-	 .size = _size,					\
-	 .is_signed = sign,				\
-	 .print = PRINT_TYPE_FUNC_NAME(ptype),		\
-	 .fmt = PRINT_TYPE_FMT_NAME(ptype),		\
-	 .fmttype = _fmttype,				\
-	 .fetch = {					\
-ASSIGN_FETCH_FUNC(reg, ftype),				\
-ASSIGN_FETCH_FUNC(stack, ftype),			\
-ASSIGN_FETCH_FUNC(retval, ftype),			\
-ASSIGN_FETCH_FUNC(memory, ftype),			\
-ASSIGN_FETCH_FUNC(symbol, ftype),			\
-ASSIGN_FETCH_FUNC(deref, ftype),			\
-	  }						\
-	}
-
-#define ASSIGN_FETCH_TYPE(ptype, ftype, sign)			\
-	__ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
-
-#define FETCH_TYPE_STRING 0
-#define FETCH_TYPE_STRSIZE 1
-
-/* Fetch type information table */
-static const struct fetch_type {
-	const char	*name;		/* Name of type */
-	size_t		size;		/* Byte size of type */
-	int		is_signed;	/* Signed flag */
-	print_type_func_t	print;	/* Print functions */
-	const char	*fmt;		/* Fromat string */
-	const char	*fmttype;	/* Name in format file */
-	/* Fetch functions */
-	fetch_func_t	fetch[FETCH_MTD_END];
-} fetch_type_table[] = {
-	/* Special types */
-	[FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
-					sizeof(u32), 1, "__data_loc char[]"),
-	[FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
-					string_size, sizeof(u32), 0, "u32"),
-	/* Basic types */
-	ASSIGN_FETCH_TYPE(u8,  u8,  0),
-	ASSIGN_FETCH_TYPE(u16, u16, 0),
-	ASSIGN_FETCH_TYPE(u32, u32, 0),
-	ASSIGN_FETCH_TYPE(u64, u64, 0),
-	ASSIGN_FETCH_TYPE(s8,  u8,  1),
-	ASSIGN_FETCH_TYPE(s16, u16, 1),
-	ASSIGN_FETCH_TYPE(s32, u32, 1),
-	ASSIGN_FETCH_TYPE(s64, u64, 1),
-};
-
-static const struct fetch_type *find_fetch_type(const char *type)
-{
-	int i;
-
-	if (!type)
-		type = DEFAULT_FETCH_TYPE_STR;
-
-	for (i = 0; i < ARRAY_SIZE(fetch_type_table); i++)
-		if (strcmp(type, fetch_type_table[i].name) == 0)
-			return &fetch_type_table[i];
-	return NULL;
-}
-
-/* Special function : only accept unsigned long */
-static __kprobes void fetch_stack_address(struct pt_regs *regs,
-					  void *dummy, void *dest)
-{
-	*(unsigned long *)dest = kernel_stack_pointer(regs);
-}
-
-static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
-					    fetch_func_t orig_fn)
-{
-	int i;
-
-	if (type != &fetch_type_table[FETCH_TYPE_STRING])
-		return NULL;	/* Only string type needs size function */
-	for (i = 0; i < FETCH_MTD_END; i++)
-		if (type->fetch[i] == orig_fn)
-			return fetch_type_table[FETCH_TYPE_STRSIZE].fetch[i];
-
-	WARN_ON(1);	/* This should not happen */
-	return NULL;
-}
+#define KPROBE_EVENT_SYSTEM "kprobes"
 
 /**
  * Kprobe event core functions
  */
 
-struct probe_arg {
-	struct fetch_param	fetch;
-	struct fetch_param	fetch_size;
-	unsigned int		offset;	/* Offset from argument entry */
-	const char		*name;	/* Name of this argument */
-	const char		*comm;	/* Command of this argument */
-	const struct fetch_type	*type;	/* Type of this argument */
-};
-
-/* Flags for trace_probe */
-#define TP_FLAG_TRACE	1
-#define TP_FLAG_PROFILE	2
-
 struct trace_probe {
 	struct list_head	list;
 	struct kretprobe	rp;	/* Use rp.kp for kprobe use */
@@ -513,18 +66,6 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
 				struct pt_regs *regs);
 
-/* Check the name is good for event/group/fields */
-static int is_good_name(const char *name)
-{
-	if (!isalpha(*name) && *name != '_')
-		return 0;
-	while (*++name != '\0') {
-		if (!isalpha(*name) && !isdigit(*name) && *name != '_')
-			return 0;
-	}
-	return 1;
-}
-
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
@@ -533,7 +74,7 @@ static struct trace_probe *alloc_trace_probe(const char *group,
 					     void *addr,
 					     const char *symbol,
 					     unsigned long offs,
-					     int nargs, int is_return)
+					     int nargs, bool is_return)
 {
 	struct trace_probe *tp;
 	int ret = -ENOMEM;
@@ -584,22 +125,12 @@ error:
 	return ERR_PTR(ret);
 }
 
-static void free_probe_arg(struct probe_arg *arg)
-{
-	if (CHECK_FETCH_FUNCS(deref, arg->fetch.fn))
-		free_deref_fetch_param(arg->fetch.data);
-	else if (CHECK_FETCH_FUNCS(symbol, arg->fetch.fn))
-		free_symbol_cache(arg->fetch.data);
-	kfree(arg->name);
-	kfree(arg->comm);
-}
-
 static void free_trace_probe(struct trace_probe *tp)
 {
 	int i;
 
 	for (i = 0; i < tp->nr_args; i++)
-		free_probe_arg(&tp->args[i]);
+		traceprobe_free_probe_arg(&tp->args[i]);
 
 	kfree(tp->call.class->system);
 	kfree(tp->call.name);
@@ -673,191 +204,6 @@ end:
 	return ret;
 }
 
-/* Split symbol and offset. */
-static int split_symbol_offset(char *symbol, unsigned long *offset)
-{
-	char *tmp;
-	int ret;
-
-	if (!offset)
-		return -EINVAL;
-
-	tmp = strchr(symbol, '+');
-	if (tmp) {
-		/* skip sign because strict_strtol doesn't accept '+' */
-		ret = strict_strtoul(tmp + 1, 0, offset);
-		if (ret)
-			return ret;
-		*tmp = '\0';
-	} else
-		*offset = 0;
-	return 0;
-}
-
-#define PARAM_MAX_ARGS 16
-#define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
-
-static int parse_probe_vars(char *arg, const struct fetch_type *t,
-			    struct fetch_param *f, int is_return)
-{
-	int ret = 0;
-	unsigned long param;
-
-	if (strcmp(arg, "retval") == 0) {
-		if (is_return)
-			f->fn = t->fetch[FETCH_MTD_retval];
-		else
-			ret = -EINVAL;
-	} else if (strncmp(arg, "stack", 5) == 0) {
-		if (arg[5] == '\0') {
-			if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR) == 0)
-				f->fn = fetch_stack_address;
-			else
-				ret = -EINVAL;
-		} else if (isdigit(arg[5])) {
-			ret = strict_strtoul(arg + 5, 10, &param);
-			if (ret || param > PARAM_MAX_STACK)
-				ret = -EINVAL;
-			else {
-				f->fn = t->fetch[FETCH_MTD_stack];
-				f->data = (void *)param;
-			}
-		} else
-			ret = -EINVAL;
-	} else
-		ret = -EINVAL;
-	return ret;
-}
-
-/* Recursive argument parser */
-static int __parse_probe_arg(char *arg, const struct fetch_type *t,
-			     struct fetch_param *f, int is_return)
-{
-	int ret = 0;
-	unsigned long param;
-	long offset;
-	char *tmp;
-
-	switch (arg[0]) {
-	case '$':
-		ret = parse_probe_vars(arg + 1, t, f, is_return);
-		break;
-	case '%':	/* named register */
-		ret = regs_query_register_offset(arg + 1);
-		if (ret >= 0) {
-			f->fn = t->fetch[FETCH_MTD_reg];
-			f->data = (void *)(unsigned long)ret;
-			ret = 0;
-		}
-		break;
-	case '@':	/* memory or symbol */
-		if (isdigit(arg[1])) {
-			ret = strict_strtoul(arg + 1, 0, &param);
-			if (ret)
-				break;
-			f->fn = t->fetch[FETCH_MTD_memory];
-			f->data = (void *)param;
-		} else {
-			ret = split_symbol_offset(arg + 1, &offset);
-			if (ret)
-				break;
-			f->data = alloc_symbol_cache(arg + 1, offset);
-			if (f->data)
-				f->fn = t->fetch[FETCH_MTD_symbol];
-		}
-		break;
-	case '+':	/* deref memory */
-	case '-':
-		tmp = strchr(arg, '(');
-		if (!tmp)
-			break;
-		*tmp = '\0';
-		ret = strict_strtol(arg + 1, 0, &offset);
-		if (ret)
-			break;
-		if (arg[0] == '-')
-			offset = -offset;
-		arg = tmp + 1;
-		tmp = strrchr(arg, ')');
-		if (tmp) {
-			struct deref_fetch_param *dprm;
-			const struct fetch_type *t2 = find_fetch_type(NULL);
-			*tmp = '\0';
-			dprm = kzalloc(sizeof(struct deref_fetch_param),
-				       GFP_KERNEL);
-			if (!dprm)
-				return -ENOMEM;
-			dprm->offset = offset;
-			ret = __parse_probe_arg(arg, t2, &dprm->orig,
-						is_return);
-			if (ret)
-				kfree(dprm);
-			else {
-				f->fn = t->fetch[FETCH_MTD_deref];
-				f->data = (void *)dprm;
-			}
-		}
-		break;
-	}
-	if (!ret && !f->fn) {	/* Parsed, but do not find fetch method */
-		pr_info("%s type has no corresponding fetch method.\n",
-			t->name);
-		ret = -EINVAL;
-	}
-	return ret;
-}
-
-/* String length checking wrapper */
-static int parse_probe_arg(char *arg, struct trace_probe *tp,
-			   struct probe_arg *parg, int is_return)
-{
-	const char *t;
-	int ret;
-
-	if (strlen(arg) > MAX_ARGSTR_LEN) {
-		pr_info("Argument is too long.: %s\n",  arg);
-		return -ENOSPC;
-	}
-	parg->comm = kstrdup(arg, GFP_KERNEL);
-	if (!parg->comm) {
-		pr_info("Failed to allocate memory for command '%s'.\n", arg);
-		return -ENOMEM;
-	}
-	t = strchr(parg->comm, ':');
-	if (t) {
-		arg[t - parg->comm] = '\0';
-		t++;
-	}
-	parg->type = find_fetch_type(t);
-	if (!parg->type) {
-		pr_info("Unsupported type: %s\n", t);
-		return -EINVAL;
-	}
-	parg->offset = tp->size;
-	tp->size += parg->type->size;
-	ret = __parse_probe_arg(arg, parg->type, &parg->fetch, is_return);
-	if (ret >= 0) {
-		parg->fetch_size.fn = get_fetch_size_function(parg->type,
-							      parg->fetch.fn);
-		parg->fetch_size.data = parg->fetch.data;
-	}
-	return ret;
-}
-
-/* Return 1 if name is reserved or already used by another argument */
-static int conflict_field_name(const char *name,
-			       struct probe_arg *args, int narg)
-{
-	int i;
-	for (i = 0; i < ARRAY_SIZE(reserved_field_names); i++)
-		if (strcmp(reserved_field_names[i], name) == 0)
-			return 1;
-	for (i = 0; i < narg; i++)
-		if (strcmp(args[i].name, name) == 0)
-			return 1;
-	return 0;
-}
-
 static int create_trace_probe(int argc, char **argv)
 {
 	/*
@@ -880,7 +226,7 @@ static int create_trace_probe(int argc, char **argv)
 	 */
 	struct trace_probe *tp;
 	int i, ret = 0;
-	int is_return = 0, is_delete = 0;
+	bool is_return = false, is_delete = false;
 	char *symbol = NULL, *event = NULL, *group = NULL;
 	char *arg;
 	unsigned long offset = 0;
@@ -889,11 +235,11 @@ static int create_trace_probe(int argc, char **argv)
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == 'p')
-		is_return = 0;
+		is_return = false;
 	else if (argv[0][0] == 'r')
-		is_return = 1;
+		is_return = true;
 	else if (argv[0][0] == '-')
-		is_delete = 1;
+		is_delete = true;
 	else {
 		pr_info("Probe definition must be started with 'p', 'r' or"
 			" '-'.\n");
@@ -957,7 +303,7 @@ static int create_trace_probe(int argc, char **argv)
 		/* a symbol specified */
 		symbol = argv[1];
 		/* TODO: support .init module functions */
-		ret = split_symbol_offset(symbol, &offset);
+		ret = traceprobe_split_symbol_offset(symbol, &offset);
 		if (ret) {
 			pr_info("Failed to parse symbol.\n");
 			return ret;
@@ -1019,7 +365,8 @@ static int create_trace_probe(int argc, char **argv)
 			goto error;
 		}
 
-		if (conflict_field_name(tp->args[i].name, tp->args, i)) {
+		if (traceprobe_conflict_field_name(tp->args[i].name,
+							tp->args, i)) {
 			pr_info("Argument[%d] name '%s' conflicts with "
 				"another field.\n", i, argv[i]);
 			ret = -EINVAL;
@@ -1027,7 +374,8 @@ static int create_trace_probe(int argc, char **argv)
 		}
 
 		/* Parse fetch argument */
-		ret = parse_probe_arg(arg, tp, &tp->args[i], is_return);
+		ret = traceprobe_parse_probe_arg(arg, &tp->size, &tp->args[i],
+								is_return);
 		if (ret) {
 			pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
 			goto error;
@@ -1114,70 +462,11 @@ static int probes_open(struct inode *inode, struct file *file)
 	return seq_open(file, &probes_seq_op);
 }
 
-static int command_trace_probe(const char *buf)
-{
-	char **argv;
-	int argc = 0, ret = 0;
-
-	argv = argv_split(GFP_KERNEL, buf, &argc);
-	if (!argv)
-		return -ENOMEM;
-
-	if (argc)
-		ret = create_trace_probe(argc, argv);
-
-	argv_free(argv);
-	return ret;
-}
-
-#define WRITE_BUFSIZE 128
-
 static ssize_t probes_write(struct file *file, const char __user *buffer,
 			    size_t count, loff_t *ppos)
 {
-	char *kbuf, *tmp;
-	int ret;
-	size_t done;
-	size_t size;
-
-	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
-	if (!kbuf)
-		return -ENOMEM;
-
-	ret = done = 0;
-	while (done < count) {
-		size = count - done;
-		if (size >= WRITE_BUFSIZE)
-			size = WRITE_BUFSIZE - 1;
-		if (copy_from_user(kbuf, buffer + done, size)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		kbuf[size] = '\0';
-		tmp = strchr(kbuf, '\n');
-		if (tmp) {
-			*tmp = '\0';
-			size = tmp - kbuf + 1;
-		} else if (done + size < count) {
-			pr_warning("Line length is too long: "
-				   "Should be less than %d.", WRITE_BUFSIZE);
-			ret = -EINVAL;
-			goto out;
-		}
-		done += size;
-		/* Remove comments */
-		tmp = strchr(kbuf, '#');
-		if (tmp)
-			*tmp = '\0';
-
-		ret = command_trace_probe(kbuf);
-		if (ret)
-			goto out;
-	}
-	ret = done;
-out:
-	kfree(kbuf);
-	return ret;
+	return traceprobe_probes_write(file, buffer, count, ppos,
+			create_trace_probe);
 }
 
 static const struct file_operations kprobe_events_ops = {
@@ -1435,17 +724,6 @@ static void probe_event_disable(struct ftrace_event_call *call)
 	}
 }
 
-#undef DEFINE_FIELD
-#define DEFINE_FIELD(type, item, name, is_signed)			\
-	do {								\
-		ret = trace_define_field(event_call, #type, name,	\
-					 offsetof(typeof(field), item),	\
-					 sizeof(field.item), is_signed, \
-					 FILTER_OTHER);			\
-		if (ret)						\
-			return ret;					\
-	} while (0)
-
 static int kprobe_event_define_fields(struct ftrace_event_call *event_call)
 {
 	int ret, i;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
new file mode 100644
index 0000000..4ca48c6
--- /dev/null
+++ b/kernel/trace/trace_probe.c
@@ -0,0 +1,648 @@
+/*
+ * Common code for probe-based Dynamic events.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Copyright (C) IBM Corporation, 2010
+ * Author:     Srikar Dronamraju
+ *
+ * Derived from kernel/trace/trace_kprobe.c written by
+ * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ */
+
+#include "trace_probe.h"
+
+const char *reserved_field_names[] = {
+	"common_type",
+	"common_flags",
+	"common_preempt_count",
+	"common_pid",
+	"common_tgid",
+	"common_lock_depth",
+	FIELD_STRING_IP,
+	FIELD_STRING_RETIP,
+	FIELD_STRING_FUNC,
+};
+
+/* Printing function type */
+#define PRINT_TYPE_FUNC_NAME(type)	print_type_##type
+#define PRINT_TYPE_FMT_NAME(type)	print_type_format_##type
+
+/* Printing  in basic type function template */
+#define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt, cast)			\
+static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,	\
+						const char *name,	\
+						void *data, void *ent)\
+{									\
+	return trace_seq_printf(s, " %s=" fmt, name, (cast)*(type *)data);\
+}									\
+static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
+
+DEFINE_BASIC_PRINT_TYPE_FUNC(u8, "%x", unsigned int)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned int)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "%lx", unsigned long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%llx", unsigned long long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s8, "%d", int)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d", int)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%ld", long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%lld", long long)
+
+static inline void *get_rloc_data(u32 *dl)
+{
+	return (u8 *)dl + get_rloc_offs(*dl);
+}
+
+/* For data_loc conversion */
+static inline void *get_loc_data(u32 *dl, void *ent)
+{
+	return (u8 *)ent + get_rloc_offs(*dl);
+}
+
+/* For defining macros, define string/string_size types */
+typedef u32 string;
+typedef u32 string_size;
+
+/* Print type function for string type */
+static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
+						  const char *name,
+						  void *data, void *ent)
+{
+	int len = *(u32 *)data >> 16;
+
+	if (!len)
+		return trace_seq_printf(s, " %s=(fault)", name);
+	else
+		return trace_seq_printf(s, " %s=\"%s\"", name,
+					(const char *)get_loc_data(data, ent));
+}
+static const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
+
+#define FETCH_FUNC_NAME(method, type)	fetch_##method##_##type
+/*
+ * Define macro for basic types - we don't need to define s* types, because
+ * we have to care only about bitwidth at recording time.
+ */
+#define DEFINE_BASIC_FETCH_FUNCS(method) \
+DEFINE_FETCH_##method(u8)		\
+DEFINE_FETCH_##method(u16)		\
+DEFINE_FETCH_##method(u32)		\
+DEFINE_FETCH_##method(u64)
+
+#define CHECK_FETCH_FUNCS(method, fn)			\
+	(((FETCH_FUNC_NAME(method, u8) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u16) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u32) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, u64) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, string) == fn) ||	\
+	  (FETCH_FUNC_NAME(method, string_size) == fn)) \
+	 && (fn != NULL))
+
+/* Data fetch function templates */
+#define DEFINE_FETCH_reg(type)						\
+static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
+					void *offset, void *dest)	\
+{									\
+	*(type *)dest = (type)regs_get_register(regs,			\
+				(unsigned int)((unsigned long)offset));	\
+}
+DEFINE_BASIC_FETCH_FUNCS(reg)
+/* No string on the register */
+#define fetch_reg_string NULL
+#define fetch_reg_string_size NULL
+
+#define DEFINE_FETCH_stack(type)					\
+static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
+					  void *offset, void *dest)	\
+{									\
+	*(type *)dest = (type)regs_get_kernel_stack_nth(regs,		\
+				(unsigned int)((unsigned long)offset));	\
+}
+DEFINE_BASIC_FETCH_FUNCS(stack)
+/* No string on the stack entry */
+#define fetch_stack_string NULL
+#define fetch_stack_string_size NULL
+
+#define DEFINE_FETCH_retval(type)					\
+static __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,\
+					  void *dummy, void *dest)	\
+{									\
+	*(type *)dest = (type)regs_return_value(regs);			\
+}
+DEFINE_BASIC_FETCH_FUNCS(retval)
+/* No string on the retval */
+#define fetch_retval_string NULL
+#define fetch_retval_string_size NULL
+
+#define DEFINE_FETCH_memory(type)					\
+static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
+					  void *addr, void *dest)	\
+{									\
+	type retval;							\
+	if (probe_kernel_address(addr, retval))				\
+		*(type *)dest = 0;					\
+	else								\
+		*(type *)dest = retval;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(memory)
+/*
+ * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
+ * length and relative data location.
+ */
+static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
+						      void *addr, void *dest)
+{
+	long ret;
+	int maxlen = get_rloc_len(*(u32 *)dest);
+	u8 *dst = get_rloc_data(dest);
+	u8 *src = addr;
+	mm_segment_t old_fs = get_fs();
+	if (!maxlen)
+		return;
+	/*
+	 * Try to get string again, since the string can be changed while
+	 * probing.
+	 */
+	set_fs(KERNEL_DS);
+	pagefault_disable();
+	do
+		ret = __copy_from_user_inatomic(dst++, src++, 1);
+	while (dst[-1] && ret == 0 && src - (u8 *)addr < maxlen);
+	dst[-1] = '\0';
+	pagefault_enable();
+	set_fs(old_fs);
+
+	if (ret < 0) {	/* Failed to fetch string */
+		((u8 *)get_rloc_data(dest))[0] = '\0';
+		*(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
+	} else
+		*(u32 *)dest = make_data_rloc(src - (u8 *)addr,
+					      get_rloc_offs(*(u32 *)dest));
+}
+/* Return the length of string -- including null terminal byte */
+static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
+							void *addr, void *dest)
+{
+	int ret, len = 0;
+	u8 c;
+	mm_segment_t old_fs = get_fs();
+
+	set_fs(KERNEL_DS);
+	pagefault_disable();
+	do {
+		ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
+		len++;
+	} while (c && ret == 0 && len < MAX_STRING_SIZE);
+	pagefault_enable();
+	set_fs(old_fs);
+
+	if (ret < 0)	/* Failed to check the length */
+		*(u32 *)dest = 0;
+	else
+		*(u32 *)dest = len;
+}
+
+/* Memory fetching by symbol */
+struct symbol_cache {
+	char *symbol;
+	long offset;
+	unsigned long addr;
+};
+
+static unsigned long update_symbol_cache(struct symbol_cache *sc)
+{
+	sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
+	if (sc->addr)
+		sc->addr += sc->offset;
+	return sc->addr;
+}
+
+static void free_symbol_cache(struct symbol_cache *sc)
+{
+	kfree(sc->symbol);
+	kfree(sc);
+}
+
+static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
+{
+	struct symbol_cache *sc;
+
+	if (!sym || strlen(sym) == 0)
+		return NULL;
+	sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
+	if (!sc)
+		return NULL;
+
+	sc->symbol = kstrdup(sym, GFP_KERNEL);
+	if (!sc->symbol) {
+		kfree(sc);
+		return NULL;
+	}
+	sc->offset = offset;
+
+	update_symbol_cache(sc);
+	return sc;
+}
+
+#define DEFINE_FETCH_symbol(type)					\
+static __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs,\
+					  void *data, void *dest)	\
+{									\
+	struct symbol_cache *sc = data;					\
+	if (sc->addr)							\
+		fetch_memory_##type(regs, (void *)sc->addr, dest);	\
+	else								\
+		*(type *)dest = 0;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(symbol)
+DEFINE_FETCH_symbol(string)
+DEFINE_FETCH_symbol(string_size)
+
+/* Dereference memory access function */
+struct deref_fetch_param {
+	struct fetch_param orig;
+	long offset;
+};
+
+#define DEFINE_FETCH_deref(type)					\
+static __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs,\
+					    void *data, void *dest)	\
+{									\
+	struct deref_fetch_param *dprm = data;				\
+	unsigned long addr;						\
+	call_fetch(&dprm->orig, regs, &addr);				\
+	if (addr) {							\
+		addr += dprm->offset;					\
+		fetch_memory_##type(regs, (void *)addr, dest);		\
+	} else								\
+		*(type *)dest = 0;					\
+}
+DEFINE_BASIC_FETCH_FUNCS(deref)
+DEFINE_FETCH_deref(string)
+DEFINE_FETCH_deref(string_size)
+
+static __kprobes void free_deref_fetch_param(struct deref_fetch_param *data)
+{
+	if (CHECK_FETCH_FUNCS(deref, data->orig.fn))
+		free_deref_fetch_param(data->orig.data);
+	else if (CHECK_FETCH_FUNCS(symbol, data->orig.fn))
+		free_symbol_cache(data->orig.data);
+	kfree(data);
+}
+
+/* Default (unsigned long) fetch type */
+#define __DEFAULT_FETCH_TYPE(t) u##t
+#define _DEFAULT_FETCH_TYPE(t) __DEFAULT_FETCH_TYPE(t)
+#define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG)
+#define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE)
+
+#define ASSIGN_FETCH_FUNC(method, type)	\
+	[FETCH_MTD_##method] = FETCH_FUNC_NAME(method, type)
+
+#define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)	\
+	{.name = _name,				\
+	 .size = _size,					\
+	 .is_signed = sign,				\
+	 .print = PRINT_TYPE_FUNC_NAME(ptype),		\
+	 .fmt = PRINT_TYPE_FMT_NAME(ptype),		\
+	 .fmttype = _fmttype,				\
+	 .fetch = {					\
+ASSIGN_FETCH_FUNC(reg, ftype),				\
+ASSIGN_FETCH_FUNC(stack, ftype),			\
+ASSIGN_FETCH_FUNC(retval, ftype),			\
+ASSIGN_FETCH_FUNC(memory, ftype),			\
+ASSIGN_FETCH_FUNC(symbol, ftype),			\
+ASSIGN_FETCH_FUNC(deref, ftype),			\
+	  }						\
+	}
+
+#define ASSIGN_FETCH_TYPE(ptype, ftype, sign)			\
+	__ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
+
+#define FETCH_TYPE_STRING 0
+#define FETCH_TYPE_STRSIZE 1
+
+/* Fetch type information table */
+static const struct fetch_type fetch_type_table[] = {
+	/* Special types */
+	[FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
+					sizeof(u32), 1, "__data_loc char[]"),
+	[FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
+					string_size, sizeof(u32), 0, "u32"),
+	/* Basic types */
+	ASSIGN_FETCH_TYPE(u8,  u8,  0),
+	ASSIGN_FETCH_TYPE(u16, u16, 0),
+	ASSIGN_FETCH_TYPE(u32, u32, 0),
+	ASSIGN_FETCH_TYPE(u64, u64, 0),
+	ASSIGN_FETCH_TYPE(s8,  u8,  1),
+	ASSIGN_FETCH_TYPE(s16, u16, 1),
+	ASSIGN_FETCH_TYPE(s32, u32, 1),
+	ASSIGN_FETCH_TYPE(s64, u64, 1),
+};
+
+static const struct fetch_type *find_fetch_type(const char *type)
+{
+	int i;
+
+	if (!type)
+		type = DEFAULT_FETCH_TYPE_STR;
+
+	for (i = 0; i < ARRAY_SIZE(fetch_type_table); i++)
+		if (strcmp(type, fetch_type_table[i].name) == 0)
+			return &fetch_type_table[i];
+	return NULL;
+}
+
+/* Special function : only accept unsigned long */
+static __kprobes void fetch_stack_address(struct pt_regs *regs,
+					  void *dummy, void *dest)
+{
+	*(unsigned long *)dest = kernel_stack_pointer(regs);
+}
+
+static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
+					    fetch_func_t orig_fn)
+{
+	int i;
+
+	if (type != &fetch_type_table[FETCH_TYPE_STRING])
+		return NULL;	/* Only string type needs size function */
+	for (i = 0; i < FETCH_MTD_END; i++)
+		if (type->fetch[i] == orig_fn)
+			return fetch_type_table[FETCH_TYPE_STRSIZE].fetch[i];
+
+	WARN_ON(1);	/* This should not happen */
+	return NULL;
+}
+
+/* Split symbol and offset. */
+int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
+{
+	char *tmp;
+	int ret;
+
+	if (!offset)
+		return -EINVAL;
+
+	tmp = strchr(symbol, '+');
+	if (tmp) {
+		/* skip sign because strict_strtol doesn't accept '+' */
+		ret = strict_strtoul(tmp + 1, 0, offset);
+		if (ret)
+			return ret;
+		*tmp = '\0';
+	} else
+		*offset = 0;
+	return 0;
+}
+
+
+#define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
+
+static int parse_probe_vars(char *arg, const struct fetch_type *t,
+			    struct fetch_param *f, bool is_return)
+{
+	int ret = 0;
+	unsigned long param;
+
+	if (strcmp(arg, "retval") == 0) {
+		if (is_return)
+			f->fn = t->fetch[FETCH_MTD_retval];
+		else
+			ret = -EINVAL;
+	} else if (strncmp(arg, "stack", 5) == 0) {
+		if (arg[5] == '\0') {
+			if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR) == 0)
+				f->fn = fetch_stack_address;
+			else
+				ret = -EINVAL;
+		} else if (isdigit(arg[5])) {
+			ret = strict_strtoul(arg + 5, 10, &param);
+			if (ret || param > PARAM_MAX_STACK)
+				ret = -EINVAL;
+			else {
+				f->fn = t->fetch[FETCH_MTD_stack];
+				f->data = (void *)param;
+			}
+		} else
+			ret = -EINVAL;
+	} else
+		ret = -EINVAL;
+	return ret;
+}
+
+/* Recursive argument parser */
+static int parse_probe_arg(char *arg, const struct fetch_type *t,
+		     struct fetch_param *f, bool is_return)
+{
+	int ret = 0;
+	unsigned long param;
+	long offset;
+	char *tmp;
+
+	switch (arg[0]) {
+	case '$':
+		ret = parse_probe_vars(arg + 1, t, f, is_return);
+		break;
+	case '%':	/* named register */
+		ret = regs_query_register_offset(arg + 1);
+		if (ret >= 0) {
+			f->fn = t->fetch[FETCH_MTD_reg];
+			f->data = (void *)(unsigned long)ret;
+			ret = 0;
+		}
+		break;
+	case '@':	/* memory or symbol */
+		if (isdigit(arg[1])) {
+			ret = strict_strtoul(arg + 1, 0, &param);
+			if (ret)
+				break;
+			f->fn = t->fetch[FETCH_MTD_memory];
+			f->data = (void *)param;
+		} else {
+			ret = traceprobe_split_symbol_offset(arg + 1, &offset);
+			if (ret)
+				break;
+			f->data = alloc_symbol_cache(arg + 1, offset);
+			if (f->data)
+				f->fn = t->fetch[FETCH_MTD_symbol];
+		}
+		break;
+	case '+':	/* deref memory */
+	case '-':
+		tmp = strchr(arg, '(');
+		if (!tmp)
+			break;
+		*tmp = '\0';
+		ret = strict_strtol(arg + 1, 0, &offset);
+		if (ret)
+			break;
+		if (arg[0] == '-')
+			offset = -offset;
+		arg = tmp + 1;
+		tmp = strrchr(arg, ')');
+		if (tmp) {
+			struct deref_fetch_param *dprm;
+			const struct fetch_type *t2 = find_fetch_type(NULL);
+			*tmp = '\0';
+			dprm = kzalloc(sizeof(struct deref_fetch_param),
+				       GFP_KERNEL);
+			if (!dprm)
+				return -ENOMEM;
+			dprm->offset = offset;
+			ret = parse_probe_arg(arg, t2, &dprm->orig, is_return);
+			if (ret)
+				kfree(dprm);
+			else {
+				f->fn = t->fetch[FETCH_MTD_deref];
+				f->data = (void *)dprm;
+			}
+		}
+		break;
+	}
+	if (!ret && !f->fn) {	/* Parsed, but do not find fetch method */
+		pr_info("%s type has no corresponding fetch method.\n",
+			t->name);
+		ret = -EINVAL;
+	}
+	return ret;
+}
+
+/* String length checking wrapper */
+int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
+		struct probe_arg *parg, bool is_return)
+{
+	const char *t;
+	int ret;
+
+	if (strlen(arg) > MAX_ARGSTR_LEN) {
+		pr_info("Argument is too long.: %s\n",  arg);
+		return -ENOSPC;
+	}
+	parg->comm = kstrdup(arg, GFP_KERNEL);
+	if (!parg->comm) {
+		pr_info("Failed to allocate memory for command '%s'.\n", arg);
+		return -ENOMEM;
+	}
+	t = strchr(parg->comm, ':');
+	if (t) {
+		arg[t - parg->comm] = '\0';
+		t++;
+	}
+	parg->type = find_fetch_type(t);
+	if (!parg->type) {
+		pr_info("Unsupported type: %s\n", t);
+		return -EINVAL;
+	}
+	parg->offset = *size;
+	*size += parg->type->size;
+	ret = parse_probe_arg(arg, parg->type, &parg->fetch, is_return);
+
+	if (ret >= 0) {
+		parg->fetch_size.fn = get_fetch_size_function(parg->type,
+							      parg->fetch.fn);
+		parg->fetch_size.data = parg->fetch.data;
+	}
+	return ret;
+}
+
+/* Return 1 if name is reserved or already used by another argument */
+int traceprobe_conflict_field_name(const char *name,
+			       struct probe_arg *args, int narg)
+{
+	int i;
+	for (i = 0; i < ARRAY_SIZE(reserved_field_names); i++)
+		if (strcmp(reserved_field_names[i], name) == 0)
+			return 1;
+	for (i = 0; i < narg; i++)
+		if (strcmp(args[i].name, name) == 0)
+			return 1;
+	return 0;
+}
+
+void traceprobe_free_probe_arg(struct probe_arg *arg)
+{
+	if (CHECK_FETCH_FUNCS(deref, arg->fetch.fn))
+		free_deref_fetch_param(arg->fetch.data);
+	else if (CHECK_FETCH_FUNCS(symbol, arg->fetch.fn))
+		free_symbol_cache(arg->fetch.data);
+	kfree(arg->name);
+	kfree(arg->comm);
+}
+
+static int command_trace_probe(const char *buf,
+			int (*createfn)(int, char**))
+{
+	char **argv;
+	int argc = 0, ret = 0;
+
+	argv = argv_split(GFP_KERNEL, buf, &argc);
+	if (!argv)
+		return -ENOMEM;
+
+	if (argc)
+		ret = createfn(argc, argv);
+
+	argv_free(argv);
+	return ret;
+}
+
+#define WRITE_BUFSIZE 128
+
+ssize_t traceprobe_probes_write(struct file *file, const char __user *buffer,
+	    size_t count, loff_t *ppos, int (*createfn)(int, char**))
+{
+	char *kbuf, *tmp;
+	int ret = 0;
+	size_t done = 0;
+	size_t size;
+
+	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	while (done < count) {
+		size = count - done;
+		if (size >= WRITE_BUFSIZE)
+			size = WRITE_BUFSIZE - 1;
+		if (copy_from_user(kbuf, buffer + done, size)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		kbuf[size] = '\0';
+		tmp = strchr(kbuf, '\n');
+		if (tmp) {
+			*tmp = '\0';
+			size = tmp - kbuf + 1;
+		} else if (done + size < count) {
+			pr_warning("Line length is too long: "
+				   "Should be less than %d.", WRITE_BUFSIZE);
+			ret = -EINVAL;
+			goto out;
+		}
+		done += size;
+		/* Remove comments */
+		tmp = strchr(kbuf, '#');
+		if (tmp)
+			*tmp = '\0';
+
+		ret = command_trace_probe(kbuf, createfn);
+		if (ret)
+			goto out;
+	}
+	ret = done;
+out:
+	kfree(kbuf);
+	return ret;
+}
+
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
new file mode 100644
index 0000000..5352fd2
--- /dev/null
+++ b/kernel/trace/trace_probe.h
@@ -0,0 +1,155 @@
+/*
+ * Common header file for probe-based Dynamic events.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Copyright (C) IBM Corporation, 2010
+ * Author:     Srikar Dronamraju
+ *
+ * Derived from kernel/trace/trace_kprobe.c written by
+ * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
+ */
+
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/debugfs.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/ctype.h>
+#include <linux/ptrace.h>
+#include <linux/perf_event.h>
+#include <linux/kprobes.h>
+#include <linux/stringify.h>
+#include <linux/limits.h>
+#include <linux/uaccess.h>
+#include <asm/bitsperlong.h>
+
+#include "trace.h"
+#include "trace_output.h"
+
+#define MAX_TRACE_ARGS 128
+#define MAX_ARGSTR_LEN 63
+#define MAX_EVENT_NAME_LEN 64
+#define MAX_STRING_SIZE PATH_MAX
+
+/* Reserved field names */
+#define FIELD_STRING_IP "__probe_ip"
+#define FIELD_STRING_RETIP "__probe_ret_ip"
+#define FIELD_STRING_FUNC "__probe_func"
+
+#undef DEFINE_FIELD
+#define DEFINE_FIELD(type, item, name, is_signed)			\
+	do {								\
+		ret = trace_define_field(event_call, #type, name,	\
+					 offsetof(typeof(field), item),	\
+					 sizeof(field.item), is_signed, \
+					 FILTER_OTHER);			\
+		if (ret)						\
+			return ret;					\
+	} while (0)
+
+
+/* Flags for trace_probe */
+#define TP_FLAG_TRACE	1
+#define TP_FLAG_PROFILE	2
+
+
+/* data_rloc: data relative location, compatible with u32 */
+#define make_data_rloc(len, roffs)	\
+	(((u32)(len) << 16) | ((u32)(roffs) & 0xffff))
+#define get_rloc_len(dl)	((u32)(dl) >> 16)
+#define get_rloc_offs(dl)	((u32)(dl) & 0xffff)
+
+/*
+ * Convert data_rloc to data_loc:
+ *  data_rloc stores the offset from data_rloc itself, but data_loc
+ *  stores the offset from event entry.
+ */
+#define convert_rloc_to_loc(dl, offs)	((u32)(dl) + (offs))
+
+/* Data fetch function type */
+typedef	void (*fetch_func_t)(struct pt_regs *, void *, void *);
+/* Printing function type */
+typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *,
+				 void *);
+
+/* Fetch types */
+enum {
+	FETCH_MTD_reg = 0,
+	FETCH_MTD_stack,
+	FETCH_MTD_retval,
+	FETCH_MTD_memory,
+	FETCH_MTD_symbol,
+	FETCH_MTD_deref,
+	FETCH_MTD_END,
+};
+
+/* Fetch type information table */
+struct fetch_type {
+	const char	*name;		/* Name of type */
+	size_t		size;		/* Byte size of type */
+	int		is_signed;	/* Signed flag */
+	print_type_func_t	print;	/* Print functions */
+	const char	*fmt;		/* Fromat string */
+	const char	*fmttype;	/* Name in format file */
+	/* Fetch functions */
+	fetch_func_t	fetch[FETCH_MTD_END];
+};
+
+struct fetch_param {
+	fetch_func_t	fn;
+	void *data;
+};
+
+struct probe_arg {
+	struct fetch_param	fetch;
+	struct fetch_param	fetch_size;
+	unsigned int		offset;	/* Offset from argument entry */
+	const char		*name;	/* Name of this argument */
+	const char		*comm;	/* Command of this argument */
+	const struct fetch_type	*type;	/* Type of this argument */
+};
+
+static inline __kprobes void call_fetch(struct fetch_param *fprm,
+				 struct pt_regs *regs, void *dest)
+{
+	return fprm->fn(regs, fprm->data, dest);
+}
+
+/* Check the name is good for event/group/fields */
+static int is_good_name(const char *name)
+{
+	if (!isalpha(*name) && *name != '_')
+		return 0;
+	while (*++name != '\0') {
+		if (!isalpha(*name) && !isdigit(*name) && *name != '_')
+			return 0;
+	}
+	return 1;
+}
+
+extern int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
+		   struct probe_arg *parg, bool is_return);
+
+extern int traceprobe_conflict_field_name(const char *name,
+			       struct probe_arg *args, int narg);
+
+extern void traceprobe_free_probe_arg(struct probe_arg *arg);
+
+extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);
+
+extern ssize_t traceprobe_probes_write(struct file *file,
+		const char __user *buffer, size_t count, loff_t *ppos,
+		int (*createfn)(int, char**));

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [RFC] [PATCH 2.6.37-rc5-tip 20/20] 20: tracing: uprobes trace_event interface
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (18 preceding siblings ...)
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 19/20] 19: tracing: Extract out common code for kprobes/uprobes traceevents Srikar Dronamraju
@ 2010-12-16 10:01 ` Srikar Dronamraju
  2010-12-16 10:07 ` [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Srikar Dronamraju, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney


Implements trace_event support for uprobes. In its current form it can
be used to put probes at a specified offset in a file and dump the
required registers when the code flow reaches the probed address.

The following example shows how to dump the instruction pointer and %ax
a register at the probed text address.  Here we are trying to probe
zfree in /bin/zsh

# cd /sys/kernel/debug/tracing/
# cat /proc/`pgrep  zsh`/maps | grep /bin/zsh | grep r-xp
00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
# objdump -T /bin/zsh | grep -w zfree
0000000000446420 g    DF .text  0000000000000012  Base        zfree
# echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
# cat uprobe_events
p:uprobes/p_zsh_0x46420 /bin/zsh:0x0000000000046420
# echo 1 > events/uprobes/enable
# sleep 20
# echo 0 > events/uprobes/enable
# cat trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
             zsh-24842 [006] 258544.995456: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
             zsh-24842 [007] 258545.000270: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
             zsh-24842 [002] 258545.043929: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
             zsh-24842 [004] 258547.046129: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79

TODO: Documentation/trace/uprobetrace.txt
TODO: dynamically allocate a consumer at probe enable time and remove the
      consumer from trace_uprobe structure.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/Kconfig                |   10 -
 kernel/trace/Kconfig        |   16 +
 kernel/trace/Makefile       |    1 
 kernel/trace/trace.h        |    5 
 kernel/trace/trace_kprobe.c |    4 
 kernel/trace/trace_probe.c  |   14 +
 kernel/trace/trace_probe.h  |    6 
 kernel/trace/trace_uprobe.c |  753 +++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 792 insertions(+), 17 deletions(-)
 create mode 100644 kernel/trace/trace_uprobe.c

diff --git a/arch/Kconfig b/arch/Kconfig
index bba8108..c4e9663 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -62,16 +62,8 @@ config OPTPROBES
 	depends on !PREEMPT
 
 config UPROBES
-	bool "User-space probes (EXPERIMENTAL)"
-	depends on ARCH_SUPPORTS_UPROBES
-	depends on MMU
 	select MM_OWNER
-	help
-	  Uprobes enables kernel subsystems to establish probepoints
-	  in user applications and execute handler functions when
-	  the probepoints are hit. For more information, refer to
-	  Documentation/uprobes.txt.
-	  If in doubt, say "N".
+	def_bool n
 
 config HAVE_EFFICIENT_UNALIGNED_ACCESS
 	bool
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 09ad930..0030f1e 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -371,6 +371,22 @@ config KPROBE_EVENT
 	  This option is also required by perf-probe subcommand of perf tools.
 	  If you want to use perf tools, this option is strongly recommended.
 
+config UPROBE_EVENT
+	bool "Enable uprobes-based dynamic events"
+	depends on ARCH_SUPPORTS_UPROBES
+	depends on MMU
+	select UPROBES
+	select PROBE_EVENTS
+	select TRACING
+	default n
+	help
+	  This allows the user to add tracing events on top of userspace dynamic
+	  events (similar to tracepoints) on the fly via the traceevents interface.
+	  Those events can be inserted wherever uprobes can probe, and record
+	  various registers.
+	  This option is required if you plan to use perf-probe subcommand of perf
+	  tools on user space applications.
+
 config PROBE_EVENTS
 	def_bool n
 
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 95d2043..67a12e3 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -57,5 +57,6 @@ ifeq ($(CONFIG_TRACING),y)
 obj-$(CONFIG_KGDB_KDB) += trace_kdb.o
 endif
 obj-$(CONFIG_PROBE_EVENTS) +=trace_probe.o
+obj-$(CONFIG_UPROBE_EVENT) += trace_uprobe.o
 
 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9021f8c..73ca49e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -97,6 +97,11 @@ struct kretprobe_trace_entry_head {
 	unsigned long		ret_ip;
 };
 
+struct uprobe_trace_entry_head {
+	struct trace_entry	ent;
+	unsigned long		ip;
+};
+
 /*
  * trace_flag_type is an enumeration that holds different
  * states when a trace occurs. These are:
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1045ed7..bb116b6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -374,8 +374,8 @@ static int create_trace_probe(int argc, char **argv)
 		}
 
 		/* Parse fetch argument */
-		ret = traceprobe_parse_probe_arg(arg, &tp->size, &tp->args[i],
-								is_return);
+		ret = traceprobe_parse_probe_arg(arg, &tp->size,
+					&tp->args[i], is_return, true);
 		if (ret) {
 			pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
 			goto error;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 4ca48c6..5123d1c 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -443,13 +443,17 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 
 /* Recursive argument parser */
 static int parse_probe_arg(char *arg, const struct fetch_type *t,
-		     struct fetch_param *f, bool is_return)
+		     struct fetch_param *f, bool is_return, bool is_kprobe)
 {
 	int ret = 0;
 	unsigned long param;
 	long offset;
 	char *tmp;
 
+	/* Until uprobe_events supports only reg arguments */
+	if (!is_kprobe && arg[0] != '%')
+		return -EINVAL;
+
 	switch (arg[0]) {
 	case '$':
 		ret = parse_probe_vars(arg + 1, t, f, is_return);
@@ -500,7 +504,8 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 			if (!dprm)
 				return -ENOMEM;
 			dprm->offset = offset;
-			ret = parse_probe_arg(arg, t2, &dprm->orig, is_return);
+			ret = parse_probe_arg(arg, t2, &dprm->orig, is_return,
+							is_kprobe);
 			if (ret)
 				kfree(dprm);
 			else {
@@ -520,7 +525,7 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 
 /* String length checking wrapper */
 int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
-		struct probe_arg *parg, bool is_return)
+		struct probe_arg *parg, bool is_return, bool is_kprobe)
 {
 	const char *t;
 	int ret;
@@ -546,7 +551,8 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 	}
 	parg->offset = *size;
 	*size += parg->type->size;
-	ret = parse_probe_arg(arg, parg->type, &parg->fetch, is_return);
+	ret = parse_probe_arg(arg, parg->type, &parg->fetch, is_return,
+							is_kprobe);
 
 	if (ret >= 0) {
 		parg->fetch_size.fn = get_fetch_size_function(parg->type,
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 5352fd2..6f9a75d 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -48,6 +48,7 @@
 #define FIELD_STRING_IP "__probe_ip"
 #define FIELD_STRING_RETIP "__probe_ret_ip"
 #define FIELD_STRING_FUNC "__probe_func"
+#define FIELD_STRING_PID "__probe_pid"
 
 #undef DEFINE_FIELD
 #define DEFINE_FIELD(type, item, name, is_signed)			\
@@ -64,6 +65,7 @@
 /* Flags for trace_probe */
 #define TP_FLAG_TRACE	1
 #define TP_FLAG_PROFILE	2
+#define TP_FLAG_UPROBE	4
 
 
 /* data_rloc: data relative location, compatible with u32 */
@@ -129,7 +131,7 @@ static inline __kprobes void call_fetch(struct fetch_param *fprm,
 }
 
 /* Check the name is good for event/group/fields */
-static int is_good_name(const char *name)
+static inline int is_good_name(const char *name)
 {
 	if (!isalpha(*name) && *name != '_')
 		return 0;
@@ -141,7 +143,7 @@ static int is_good_name(const char *name)
 }
 
 extern int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
-		   struct probe_arg *parg, bool is_return);
+		   struct probe_arg *parg, bool is_return, bool is_kprobe);
 
 extern int traceprobe_conflict_field_name(const char *name,
 			       struct probe_arg *args, int narg);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
new file mode 100644
index 0000000..f165504
--- /dev/null
+++ b/kernel/trace/trace_uprobe.c
@@ -0,0 +1,753 @@
+/*
+ * uprobes-based tracing events
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * Copyright (C) IBM Corporation, 2010
+ * Author:	Srikar Dronamraju
+ */
+
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/uprobes.h>
+#include <linux/namei.h>
+
+#include "trace_probe.h"
+
+#define UPROBE_EVENT_SYSTEM "uprobes"
+
+/**
+ * uprobe event core functions
+ */
+
+struct trace_uprobe {
+	struct list_head	list;
+	struct uprobe_consumer  consumer;
+	struct ftrace_event_class	class;
+	struct ftrace_event_call	call;
+	struct inode		*inode;
+	char			*filename;
+	unsigned long		offset;
+	unsigned long		nhit;
+	unsigned int		flags;	/* For TP_FLAG_* */
+	ssize_t			size;		/* trace entry size */
+	unsigned int		nr_args;
+	struct probe_arg	args[];
+};
+
+#define SIZEOF_TRACE_UPROBE(n)			\
+	(offsetof(struct trace_uprobe, args) +	\
+	(sizeof(struct probe_arg) * (n)))
+
+static int register_uprobe_event(struct trace_uprobe *tp);
+static void unregister_uprobe_event(struct trace_uprobe *tp);
+
+static DEFINE_MUTEX(uprobe_lock);
+static LIST_HEAD(uprobe_list);
+
+static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
+
+/*
+ * Allocate new trace_uprobe and initialize it (including uprobes).
+ */
+static struct trace_uprobe *alloc_trace_uprobe(const char *group,
+				const char *event, int nargs)
+{
+	struct trace_uprobe *tp;
+
+	if (!event || !is_good_name(event))  {
+		printk(KERN_ERR "%s\n", event);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (!group || !is_good_name(group)) {
+		printk(KERN_ERR "%s\n", group);
+		return ERR_PTR(-EINVAL);
+	}
+
+	tp = kzalloc(SIZEOF_TRACE_UPROBE(nargs), GFP_KERNEL);
+	if (!tp)
+		return ERR_PTR(-ENOMEM);
+
+	tp->call.class = &tp->class;
+	tp->call.name = kstrdup(event, GFP_KERNEL);
+	if (!tp->call.name)
+		goto error;
+
+	tp->class.system = kstrdup(group, GFP_KERNEL);
+	if (!tp->class.system)
+		goto error;
+
+	INIT_LIST_HEAD(&tp->list);
+	return tp;
+error:
+	kfree(tp->call.name);
+	kfree(tp);
+	return ERR_PTR(-ENOMEM);
+}
+
+static void free_trace_uprobe(struct trace_uprobe *tp)
+{
+	int i;
+
+	for (i = 0; i < tp->nr_args; i++)
+		traceprobe_free_probe_arg(&tp->args[i]);
+
+	iput(tp->inode);
+	kfree(tp->call.class->system);
+	kfree(tp->call.name);
+	kfree(tp->filename);
+	kfree(tp);
+}
+
+static struct trace_uprobe *find_probe_event(const char *event,
+					const char *group)
+{
+	struct trace_uprobe *tp;
+
+	list_for_each_entry(tp, &uprobe_list, list)
+		if (strcmp(tp->call.name, event) == 0 &&
+		    strcmp(tp->call.class->system, group) == 0)
+			return tp;
+	return NULL;
+}
+
+/* Unregister a trace_uprobe and probe_event: call with locking uprobe_lock */
+static void unregister_trace_uprobe(struct trace_uprobe *tp)
+{
+	list_del(&tp->list);
+	unregister_uprobe_event(tp);
+	free_trace_uprobe(tp);
+}
+
+/* Register a trace_uprobe and probe_event */
+static int register_trace_uprobe(struct trace_uprobe *tp)
+{
+	struct trace_uprobe *old_tp;
+	int ret;
+
+	mutex_lock(&uprobe_lock);
+
+	/* register as an event */
+	old_tp = find_probe_event(tp->call.name, tp->call.class->system);
+	if (old_tp)
+		/* delete old event */
+		unregister_trace_uprobe(old_tp);
+
+	ret = register_uprobe_event(tp);
+	if (ret) {
+		pr_warning("Failed to register probe event(%d)\n", ret);
+		goto end;
+	}
+
+	list_add_tail(&tp->list, &uprobe_list);
+end:
+	mutex_unlock(&uprobe_lock);
+	return ret;
+}
+
+static int create_trace_uprobe(int argc, char **argv)
+{
+	/*
+	 * Argument syntax:
+	 *  - Add uprobe: p[:[GRP/]EVENT] VADDR@PID [%REG]
+	 *
+	 *  - Remove uprobe: -:[GRP/]EVENT
+	 */
+	struct path path;
+	struct inode *inode;
+	struct trace_uprobe *tp;
+	int i, ret = 0;
+	int is_delete = 0;
+	char *arg = NULL, *event = NULL, *group = NULL;
+	unsigned long offset;
+	char buf[MAX_EVENT_NAME_LEN];
+	char *filename;
+
+	/* argc must be >= 1 */
+	if (argv[0][0] == '-')
+		is_delete = 1;
+	else if (argv[0][0] != 'p') {
+		pr_info("Probe definition must be started with 'p', 'r' or"
+			" '-'.\n");
+		return -EINVAL;
+	}
+
+	if (argv[0][1] == ':') {
+		event = &argv[0][2];
+		if (strchr(event, '/')) {
+			group = event;
+			event = strchr(group, '/') + 1;
+			event[-1] = '\0';
+			if (strlen(group) == 0) {
+				pr_info("Group name is not specified\n");
+				return -EINVAL;
+			}
+		}
+		if (strlen(event) == 0) {
+			pr_info("Event name is not specified\n");
+			return -EINVAL;
+		}
+	}
+	if (!group)
+		group = UPROBE_EVENT_SYSTEM;
+
+	if (is_delete) {
+		if (!event) {
+			pr_info("Delete command needs an event name.\n");
+			return -EINVAL;
+		}
+		mutex_lock(&uprobe_lock);
+		tp = find_probe_event(event, group);
+		if (!tp) {
+			mutex_unlock(&uprobe_lock);
+			pr_info("Event %s/%s doesn't exist.\n", group, event);
+			return -ENOENT;
+		}
+		/* delete an event */
+		unregister_trace_uprobe(tp);
+		mutex_unlock(&uprobe_lock);
+		return 0;
+	}
+
+	if (argc < 2) {
+		pr_info("Probe point is not specified.\n");
+		return -EINVAL;
+	}
+	if (isdigit(argv[1][0])) {
+		pr_info("probe point must be have a filename.\n");
+		return -EINVAL;
+	}
+	arg = strchr(argv[1], ':');
+	if (!arg)
+		goto fail_address_parse;
+
+	*arg++ = '\0';
+	filename = argv[1];
+	ret = kern_path(filename, LOOKUP_FOLLOW, &path);
+	if (ret)
+		goto fail_address_parse;
+	inode = path.dentry->d_inode;
+	__iget(inode);
+
+	ret = strict_strtoul(arg, 0, &offset);
+		if (ret)
+			goto fail_address_parse;
+	argc -= 2; argv += 2;
+
+	/* setup a probe */
+	if (!event) {
+		char *tail = strrchr(filename, '/');
+
+		snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p',
+				(tail ? tail + 1 : filename), offset);
+		event = buf;
+	}
+	tp = alloc_trace_uprobe(group, event, argc);
+	if (IS_ERR(tp)) {
+		pr_info("Failed to allocate trace_uprobe.(%d)\n",
+			(int)PTR_ERR(tp));
+		iput(inode);
+		return PTR_ERR(tp);
+	}
+	tp->offset = offset;
+	tp->inode = inode;
+	tp->consumer.handler = uprobe_dispatcher;
+	tp->consumer.filter = NULL;
+	tp->consumer.fvalue = NULL;
+	tp->filename = kstrdup(filename, GFP_KERNEL);
+	if (!tp->filename) {
+			pr_info("Failed to allocate filename.\n");
+			ret = -ENOMEM;
+			goto error;
+	}
+
+	/* parse arguments */
+	ret = 0;
+	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+		/* Increment count for freeing args in error case */
+		tp->nr_args++;
+
+		/* Parse argument name */
+		arg = strchr(argv[i], '=');
+		if (arg) {
+			*arg++ = '\0';
+			tp->args[i].name = kstrdup(argv[i], GFP_KERNEL);
+		} else {
+			arg = argv[i];
+			/* If argument name is omitted, set "argN" */
+			snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
+			tp->args[i].name = kstrdup(buf, GFP_KERNEL);
+		}
+
+		if (!tp->args[i].name) {
+			pr_info("Failed to allocate argument[%d] name.\n", i);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		if (!is_good_name(tp->args[i].name)) {
+			pr_info("Invalid argument[%d] name: %s\n",
+				i, tp->args[i].name);
+			ret = -EINVAL;
+			goto error;
+		}
+
+		if (traceprobe_conflict_field_name(tp->args[i].name,
+							tp->args, i)) {
+			pr_info("Argument[%d] name '%s' conflicts with "
+				"another field.\n", i, argv[i]);
+			ret = -EINVAL;
+			goto error;
+		}
+
+		/* Parse fetch argument */
+		ret = traceprobe_parse_probe_arg(arg, &tp->size, &tp->args[i],
+								false, false);
+		if (ret) {
+			pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
+			goto error;
+		}
+	}
+
+	ret = register_trace_uprobe(tp);
+	if (ret)
+		goto error;
+	return 0;
+
+error:
+	free_trace_uprobe(tp);
+	return ret;
+
+fail_address_parse:
+	pr_info("Failed to parse address.\n");
+	return ret;
+}
+
+static void cleanup_all_probes(void)
+{
+	struct trace_uprobe *tp;
+
+	mutex_lock(&uprobe_lock);
+	while (!list_empty(&uprobe_list)) {
+		tp = list_entry(uprobe_list.next, struct trace_uprobe, list);
+		unregister_trace_uprobe(tp);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
+
+/* Probes listing interfaces */
+static void *probes_seq_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&uprobe_lock);
+	return seq_list_start(&uprobe_list, *pos);
+}
+
+static void *probes_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &uprobe_list, pos);
+}
+
+static void probes_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&uprobe_lock);
+}
+
+static int probes_seq_show(struct seq_file *m, void *v)
+{
+	struct trace_uprobe *tp = v;
+	int i;
+
+	seq_printf(m, "p:%s/%s", tp->call.class->system, tp->call.name);
+	seq_printf(m, " %s:0x%p", tp->filename, (void *)tp->offset);
+
+	for (i = 0; i < tp->nr_args; i++)
+		seq_printf(m, " %s=%s", tp->args[i].name, tp->args[i].comm);
+	seq_printf(m, "\n");
+	return 0;
+}
+
+static const struct seq_operations probes_seq_op = {
+	.start  = probes_seq_start,
+	.next   = probes_seq_next,
+	.stop   = probes_seq_stop,
+	.show   = probes_seq_show
+};
+
+static int probes_open(struct inode *inode, struct file *file)
+{
+	if ((file->f_mode & FMODE_WRITE) &&
+	    (file->f_flags & O_TRUNC))
+		cleanup_all_probes();
+
+	return seq_open(file, &probes_seq_op);
+}
+
+static ssize_t probes_write(struct file *file, const char __user *buffer,
+			    size_t count, loff_t *ppos)
+{
+	return traceprobe_probes_write(file, buffer, count, ppos,
+			create_trace_uprobe);
+}
+
+static const struct file_operations uprobe_events_ops = {
+	.owner          = THIS_MODULE,
+	.open           = probes_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+	.write		= probes_write,
+};
+
+/* Probes profiling interfaces */
+static int probes_profile_seq_show(struct seq_file *m, void *v)
+{
+	struct trace_uprobe *tp = v;
+
+	seq_printf(m, "  %s %-44s %15lu\n", tp->filename, tp->call.name,
+								tp->nhit);
+	return 0;
+}
+
+static const struct seq_operations profile_seq_op = {
+	.start  = probes_seq_start,
+	.next   = probes_seq_next,
+	.stop   = probes_seq_stop,
+	.show   = probes_profile_seq_show
+};
+
+static int profile_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &profile_seq_op);
+}
+
+static const struct file_operations uprobe_profile_ops = {
+	.owner          = THIS_MODULE,
+	.open           = profile_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+};
+
+/* uprobe handler */
+static void uprobe_trace_func(struct trace_uprobe *tp, struct pt_regs *regs)
+{
+	struct uprobe_trace_entry_head *entry;
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	u8 *data;
+	int size, i, pc;
+	unsigned long irq_flags;
+	struct ftrace_event_call *call = &tp->call;
+
+	tp->nhit++;
+
+	local_save_flags(irq_flags);
+	pc = preempt_count();
+
+	size = sizeof(*entry) + tp->size;
+
+	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
+						  size, irq_flags, pc);
+	if (!event)
+		return;
+
+	entry = ring_buffer_event_data(event);
+	entry->ip = uprobes_get_bkpt_addr(task_pt_regs(current));
+	data = (u8 *)&entry[1];
+	for (i = 0; i < tp->nr_args; i++)
+		call_fetch(&tp->args[i].fetch, regs,
+						data + tp->args[i].offset);
+
+	if (!filter_current_check_discard(buffer, call, entry, event))
+		trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
+}
+
+/* Event entry printers */
+enum print_line_t
+print_uprobe_event(struct trace_iterator *iter, int flags,
+		   struct trace_event *event)
+{
+	struct uprobe_trace_entry_head *field;
+	struct trace_seq *s = &iter->seq;
+	struct trace_uprobe *tp;
+	u8 *data;
+	int i;
+
+	field = (struct uprobe_trace_entry_head *)iter->ent;
+	tp = container_of(event, struct trace_uprobe, call.event);
+
+	if (!trace_seq_printf(s, "%s: (", tp->call.name))
+		goto partial;
+
+	if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
+		goto partial;
+
+	if (!trace_seq_puts(s, ")"))
+		goto partial;
+
+	data = (u8 *)&field[1];
+	for (i = 0; i < tp->nr_args; i++)
+		if (!tp->args[i].type->print(s, tp->args[i].name,
+					     data + tp->args[i].offset, field))
+			goto partial;
+
+	if (!trace_seq_puts(s, "\n"))
+		goto partial;
+
+	return TRACE_TYPE_HANDLED;
+partial:
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static int probe_event_enable(struct ftrace_event_call *call)
+{
+	int ret = 0;
+	struct trace_uprobe *tp = (struct trace_uprobe *)call->data;
+
+	ret = register_uprobe(tp->inode, tp->offset, &tp->consumer);
+	if (!ret)
+		tp->flags |= TP_FLAG_TRACE;
+	return ret;
+}
+
+static void probe_event_disable(struct ftrace_event_call *call)
+{
+	struct trace_uprobe *tp = (struct trace_uprobe *)call->data;
+
+	unregister_uprobe(tp->inode, tp->offset, &tp->consumer);
+	tp->flags &= ~TP_FLAG_TRACE;
+}
+
+static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
+{
+	int ret, i;
+	struct uprobe_trace_entry_head field;
+	struct trace_uprobe *tp = (struct trace_uprobe *)event_call->data;
+
+	DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
+	/* Set argument names as fields */
+	for (i = 0; i < tp->nr_args; i++) {
+		ret = trace_define_field(event_call, tp->args[i].type->fmttype,
+					 tp->args[i].name,
+					 sizeof(field) + tp->args[i].offset,
+					 tp->args[i].type->size,
+					 tp->args[i].type->is_signed,
+					 FILTER_OTHER);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int __set_print_fmt(struct trace_uprobe *tp, char *buf, int len)
+{
+	int i;
+	int pos = 0;
+
+	const char *fmt, *arg;
+
+	fmt = "(%lx)";
+	arg = "REC->" FIELD_STRING_IP;
+
+	/* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
+
+	for (i = 0; i < tp->nr_args; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
+				tp->args[i].name, tp->args[i].type->fmt);
+	}
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
+
+	for (i = 0; i < tp->nr_args; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
+				tp->args[i].name);
+	}
+
+#undef LEN_OR_ZERO
+
+	/* return the length of print_fmt */
+	return pos;
+}
+
+static int set_print_fmt(struct trace_uprobe *tp)
+{
+	int len;
+	char *print_fmt;
+
+	/* First: called with 0 length to calculate the needed length */
+	len = __set_print_fmt(tp, NULL, 0);
+	print_fmt = kmalloc(len + 1, GFP_KERNEL);
+	if (!print_fmt)
+		return -ENOMEM;
+
+	/* Second: actually write the @print_fmt */
+	__set_print_fmt(tp, print_fmt, len + 1);
+	tp->call.print_fmt = print_fmt;
+
+	return 0;
+}
+
+#ifdef CONFIG_PERF_EVENTS
+
+/* uprobe profile handler */
+static void uprobe_perf_func(struct trace_uprobe *tp,
+					 struct pt_regs *regs)
+{
+	struct ftrace_event_call *call = &tp->call;
+	struct uprobe_trace_entry_head *entry;
+	struct hlist_head *head;
+	u8 *data;
+	int size, __size, i;
+	int rctx;
+
+	__size = sizeof(*entry) + tp->size;
+	size = ALIGN(__size + sizeof(u32), sizeof(u64));
+	size -= sizeof(u32);
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
+		     "profile buffer not large enough"))
+		return;
+
+	entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
+	if (!entry)
+		return;
+
+	entry->ip = uprobes_get_bkpt_addr(task_pt_regs(current));
+	data = (u8 *)&entry[1];
+	for (i = 0; i < tp->nr_args; i++)
+		call_fetch(&tp->args[i].fetch, regs,
+						data + tp->args[i].offset);
+
+	head = this_cpu_ptr(call->perf_events);
+	perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head);
+}
+
+static int probe_perf_enable(struct ftrace_event_call *call)
+{
+	int ret = 0;
+	struct trace_uprobe *tp = (struct trace_uprobe *)call->data;
+
+	ret = register_uprobe(tp->inode, tp->offset, &tp->consumer);
+	if (!ret)
+		tp->flags |= TP_FLAG_PROFILE;
+	return ret;
+}
+
+static void probe_perf_disable(struct ftrace_event_call *call)
+{
+	struct trace_uprobe *tp = (struct trace_uprobe *)call->data;
+
+	unregister_uprobe(tp->inode, tp->offset, &tp->consumer);
+	tp->flags &= ~TP_FLAG_PROFILE;
+}
+#endif	/* CONFIG_PERF_EVENTS */
+
+static
+int uprobe_register(struct ftrace_event_call *event, enum trace_reg type)
+{
+	switch (type) {
+	case TRACE_REG_REGISTER:
+		return probe_event_enable(event);
+	case TRACE_REG_UNREGISTER:
+		probe_event_disable(event);
+		return 0;
+
+#ifdef CONFIG_PERF_EVENTS
+	case TRACE_REG_PERF_REGISTER:
+		return probe_perf_enable(event);
+	case TRACE_REG_PERF_UNREGISTER:
+		probe_perf_disable(event);
+		return 0;
+#endif
+	}
+	return 0;
+}
+
+static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
+{
+	struct trace_uprobe *tp;
+
+	tp = container_of(con, struct trace_uprobe, consumer);
+	if (tp->flags & TP_FLAG_TRACE)
+		uprobe_trace_func(tp, regs);
+#ifdef CONFIG_PERF_EVENTS
+	if (tp->flags & TP_FLAG_PROFILE)
+		uprobe_perf_func(tp, regs);
+#endif
+	return 0;
+}
+
+
+static struct trace_event_functions uprobe_funcs = {
+	.trace		= print_uprobe_event
+};
+
+static int register_uprobe_event(struct trace_uprobe *tp)
+{
+	struct ftrace_event_call *call = &tp->call;
+	int ret;
+
+	/* Initialize ftrace_event_call */
+	INIT_LIST_HEAD(&call->class->fields);
+	call->event.funcs = &uprobe_funcs;
+	call->class->define_fields = uprobe_event_define_fields;
+	if (set_print_fmt(tp) < 0)
+		return -ENOMEM;
+	ret = register_ftrace_event(&call->event);
+	if (!ret) {
+		kfree(call->print_fmt);
+		return -ENODEV;
+	}
+	call->flags = 0;
+	call->class->reg = uprobe_register;
+	call->data = tp;
+	ret = trace_add_event_call(call);
+	if (ret) {
+		pr_info("Failed to register uprobe event: %s\n", call->name);
+		kfree(call->print_fmt);
+		unregister_ftrace_event(&call->event);
+	}
+	return ret;
+}
+
+static void unregister_uprobe_event(struct trace_uprobe *tp)
+{
+	/* tp->event is unregistered in trace_remove_event_call() */
+	trace_remove_event_call(&tp->call);
+	kfree(tp->call.print_fmt);
+}
+
+/* Make a trace interface for controling probe points */
+static __init int init_uprobe_trace(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+	if (!d_tracer)
+		return 0;
+
+	entry = trace_create_file("uprobe_events", 0644, d_tracer,
+				    NULL, &uprobe_events_ops);
+	/* Profile interface */
+	entry = trace_create_file("uprobe_profile", 0444, d_tracer,
+				    NULL, &uprobe_profile_ops);
+	return 0;
+}
+fs_initcall(init_uprobe_trace);

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 0/20]  0: Inode based uprobes
  2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
                   ` (19 preceding siblings ...)
  2010-12-16 10:01 ` [RFC] [PATCH 2.6.37-rc5-tip 20/20] 20: tracing: uprobes trace_event interface Srikar Dronamraju
@ 2010-12-16 10:07 ` Srikar Dronamraju
  20 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-16 10:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Linus Torvalds,
	Andi Kleen, Christoph Hellwig, Ananth N Mavinakayanahalli,
	Masami Hiramatsu, Oleg Nesterov, LKML, Jim Keniston,
	Frederic Weisbecker, SystemTap, Andrew Morton, Paul E. McKenney

Sorry, the first line got edited while sending, please read this as.

This patchset implements Uprobes which enables you to dynamically break
> into any routine in a user space application and collect information
> non-disruptively.
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters.
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters Srikar Dronamraju
@ 2010-12-17 19:32   ` Valdis.Kletnieks
  2010-12-18  3:04     ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Valdis.Kletnieks @ 2010-12-17 19:32 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Andi Kleen,
	Christoph Hellwig, Ananth N Mavinakayanahalli, Masami Hiramatsu,
	Oleg Nesterov, LKML, Linux-mm, Jim Keniston, Frederic Weisbecker,
	SystemTap, Andrew Morton, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 416 bytes --]

On Thu, 16 Dec 2010 15:30:49 +0530, Srikar Dronamraju said:

> Provides most commonly used filters. Prevents users from having to define
> their own filters. However this would be useful once we can
> dynamically associate a filter with a uprobe-event tracer.

Unclear/awkward language here...

Did you mean "Prevents users from defining their own filters" or "Allows
users to not have to define their own filters"?

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters.
  2010-12-17 19:32   ` Valdis.Kletnieks
@ 2010-12-18  3:04     ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2010-12-18  3:04 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Andi Kleen,
	Christoph Hellwig, Ananth N Mavinakayanahalli, Masami Hiramatsu,
	Oleg Nesterov, LKML, Jim Keniston, Frederic Weisbecker,
	SystemTap, Andrew Morton, Paul E. McKenney

* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> [2010-12-17 14:32:59]:

> On Thu, 16 Dec 2010 15:30:49 +0530, Srikar Dronamraju said:
> 
> > Provides most commonly used filters. Prevents users from having to define
> > their own filters. However this would be useful once we can
> > dynamically associate a filter with a uprobe-event tracer.
> 
> Unclear/awkward language here...
> 
> Did you mean "Prevents users from defining their own filters" or "Allows
> users to not have to define their own filters"?

The later one. From my perspective, users would prefer to filter
based on tid, pid, sid. So by defining these common/generic filters
within uprobes, we help users from not having to redine these
filters. However it no way prevents users from defining their
filters. Also users can still go ahead and add a filter for the same
purpose for which we already have defined a filter.

If you think user would like to filter based on some other
parameter that most users would be interested, then do let me know,
I will try to add a filter for the same.

--
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 8/20] 8: uprobes: mmap and fork hooks Srikar Dronamraju
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  9:03       ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  1 sibling, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +void uprobe_mmap(struct vm_area_struct *vma)
> +{
> +       struct list_head tmp_list;
> +       struct uprobe *uprobe, *u;
> +       struct mm_struct *mm;
> +       struct inode *inode;
> +
> +       if (!valid_vma(vma))
> +               return;
> +
> +       INIT_LIST_HEAD(&tmp_list);
> +
> +       /*
> +        * The vma was just allocated and this routine gets called
> +        * while holding write lock for mmap_sem.  Function called
> +        * in context of a thread that has a reference to mm.
> +        * Hence no need to take a reference to mm
> +        */
> +       mm = vma->vm_mm;
> +       up_write(&mm->mmap_sem);

Are you very very sure its a good thing to simply drop the mmap_sem
here? Also, why?

> +       mutex_lock(&uprobes_mutex);
> +
> +       inode = vma->vm_file->f_mapping->host;

Since you just dropped the mmap_sem, what's keeping that vma from going
away?

> +       add_to_temp_list(vma, inode, &tmp_list);
> +
> +       list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> +               mm->uprobes_vaddr = vma->vm_start + uprobe->offset;
> +               install_uprobe(mm, uprobe);
> +               list_del(&uprobe->pending_list);
> +       }
> +       mutex_unlock(&uprobes_mutex);
> +       down_write(&mm->mmap_sem);
> +} 



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  8:37       ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +static int match_inode(struct uprobe *uprobe, struct inode *inode,
> +                                               struct rb_node **p)
> +{
> +       struct rb_node *n = *p;
> +
> +       if (inode < uprobe->inode)
> +               *p = n->rb_left;
> +       else if (inode > uprobe->inode)
> +               *p = n->rb_right;
> +       else
> +               return 1;
> +       return 0;
> +}
> +
> +static int match_offset(struct uprobe *uprobe, unsigned long offset,
> +                                               struct rb_node **p)
> +{
> +       struct rb_node *n = *p;
> +
> +       if (offset < uprobe->offset)
> +               *p = n->rb_left;
> +       else if (offset > uprobe->offset)
> +               *p = n->rb_right;
> +       else
> +               return 1;
> +       return 0;
> +}
> +
> +/*
> + * Find a uprobe corresponding to a given inode:offset
> + * Acquires treelock
> + */
> +static struct uprobe *find_uprobe(struct inode * inode,
> +                                        unsigned long offset)
> +{
> +       struct rb_node *n = uprobes_tree.rb_node;
> +       struct uprobe *uprobe, *u = NULL;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&treelock, flags);
> +       while (n) {
> +               uprobe = rb_entry(n, struct uprobe, rb_node);
> +
> +               if (match_inode(uprobe, inode, &n)) {
> +                       if (match_offset(uprobe, offset, &n)) {
> +                               if (atomic_inc_not_zero(&uprobe->ref))
> +                                       u = uprobe;
> +                               break;
> +                       }
> +               }
> +       }
> +       spin_unlock_irqrestore(&treelock, flags);
> +       return u;
> +}
> +
> +/*
> + * Check if a uprobe is already inserted;
> + *     If it does; return refcount incremented uprobe
> + *     else add the current uprobe and return NULL
> + * Acquires treelock.
> + */
> +static struct uprobe *insert_uprobe_rb_node(struct uprobe *uprobe)
> +{
> +       struct rb_node **p = &uprobes_tree.rb_node;
> +       struct rb_node *parent = NULL;
> +       struct uprobe *u;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&treelock, flags);
> +       while (*p) {
> +               parent = *p;
> +               u = rb_entry(parent, struct uprobe, rb_node);
> +               if (u->inode > uprobe->inode)
> +                       p = &(*p)->rb_left;
> +               else if (u->inode < uprobe->inode)
> +                       p = &(*p)->rb_right;
> +               else {
> +                       if (u->offset > uprobe->offset)
> +                               p = &(*p)->rb_left;
> +                       else if (u->offset < uprobe->offset)
> +                               p = &(*p)->rb_right;
> +                       else {
> +                               atomic_inc(&u->ref);

If the lookup can find a 'dead' entry, then why can't we here?

> +                               goto unlock_return;
> +                       }
> +               }
> +       }
> +       u = NULL;
> +       rb_link_node(&uprobe->rb_node, parent, p);
> +       rb_insert_color(&uprobe->rb_node, &uprobes_tree);
> +       atomic_set(&uprobe->ref, 2);
> +
> +unlock_return:
> +       spin_unlock_irqrestore(&treelock, flags);
> +       return u;
> +} 

It would be nice if you could merge the find and 'acquire' thing, the
lookup is basically the same in both cases.

Also, I'm not quite sure on the name of that last function, its not a
strict insert and what's the trailing _rb_node about? That lookup isn't
called find_uprobe_rb_node() either is it?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 8/20] 8: uprobes: mmap and fork hooks Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-25 20:05     ` Steven Rostedt
  2011-01-26 15:09       ` Srikar Dronamraju
  1 sibling, 2 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> +               struct list_head *tmp_list);
> +
> +static void add_to_temp_list(struct vm_area_struct *vma, struct inode *inode,
> +               struct list_head *tmp_list)
> +{
> +       struct uprobe *uprobe;
> +       struct rb_node *n;
> +       unsigned long flags;
> +
> +       n = uprobes_tree.rb_node;
> +       spin_lock_irqsave(&treelock, flags);
> +       while (n) {
> +               uprobe = rb_entry(n, struct uprobe, rb_node);
> +               if (match_inode(uprobe, inode, &n)) {
> +                       list_add(&uprobe->pending_list, tmp_list);
> +                       search_within_subtree(n, inode, tmp_list);
> +                       break;
> +               }
> +       }
> +       spin_unlock_irqrestore(&treelock, flags);
> +}
> +
> +static void __search_within_subtree(struct rb_node *p, struct inode *inode,
> +               struct list_head *tmp_list)
> +{
> +       struct uprobe *uprobe;
> +
> +       uprobe = rb_entry(p, struct uprobe, rb_node);
> +       if (match_inode(uprobe, inode, &p)) {
> +               list_add(&uprobe->pending_list, tmp_list);
> +               search_within_subtree(p, inode, tmp_list);
> +       }
> +
> +
> +}
> +
> +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> +               struct list_head *tmp_list)
> +{
> +       struct rb_node *p;
> +
> +       if (p)
> +               __search_within_subtree(p, inode, tmp_list);
> +
> +       p = n->rb_right;
> +       if (p)
> +               __search_within_subtree(p, inode, tmp_list);
> +} 

Whee recursion FTW!, you just blew your kernel stack :-)

Since you sort inode first, offset second, I think you can simply look
for the first matching inode entry and simply rb_next() until you don't
match.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  8:41     ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  2011-01-25 13:56   ` Peter Zijlstra
  3 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +/* Should be called lock-less */
> +static void put_uprobe(struct uprobe *uprobe)
> +{
> +       if (atomic_dec_and_test(&uprobe->ref))
> +               kfree(uprobe);
> +} 

Since this instantly frees the uprobe once ref hits 0, the
atomic_inc_not_zero() in find_uprobe() doesn't really make sense does
it?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 7/20]  7: uprobes: store/restore original instruction.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 7/20] 7: uprobes: store/restore original instruction Srikar Dronamraju
@ 2011-01-25 12:15   ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> On the first probe insertion, copy the original instruction and opcode.
> If multiple vmas map the same text area corresponding to an inode, we
> only need to copy the instruction just once.
> The copied instruction is further copied to a designated slot on probe
> hit.  Its also used at the time of probe removal to restore the original
> instruction.
> opcode is used to analyze the instruction and determine the fixups.
> Determining fixups at probe hit time would result in doing the same
> operation on every probe hit. Hence Instruction analysis using the
> opcode is done at probe insertion time.
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/Kconfig     |    1 +
>  kernel/uprobes.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 6e8f26e..bba8108 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -65,6 +65,7 @@ config UPROBES
>  	bool "User-space probes (EXPERIMENTAL)"
>  	depends on ARCH_SUPPORTS_UPROBES
>  	depends on MMU
> +	select MM_OWNER
>  	help
>  	  Uprobes enables kernel subsystems to establish probepoints
>  	  in user applications and execute handler functions when
> diff --git a/kernel/uprobes.c b/kernel/uprobes.c
> index 8a5da38..858ddb1 100644
> --- a/kernel/uprobes.c
> +++ b/kernel/uprobes.c
> @@ -448,21 +448,72 @@ static int del_consumer(struct uprobe *uprobe,
>  	return ret;
>  }
>  
> +static int copy_insn(struct task_struct *tsk, unsigned long vaddr,
> +						struct uprobe *uprobe)
> +{
> +	int len;
> +
> +	len = uprobes_read_vm(tsk, (void __user *)vaddr, uprobe->insn,
> +						MAX_UINSN_BYTES);
> +	if (len < uprobe_opcode_sz) {
> +		print_insert_fail(tsk, vaddr,
> +				"error reading original instruction");
> +		return -EINVAL;
> +	}
> +	memcpy(&uprobe->opcode, uprobe->insn, uprobe_opcode_sz);
> +	if (is_bkpt_insn(uprobe)) {
> +		print_insert_fail(tsk, vaddr,
> +				"breakpoint instruction already exists");
> +		return -EEXIST;
> +	}
> +	if (analyze_insn(tsk, uprobe)) {
> +		print_insert_fail(tsk, vaddr,
> +					"instruction type cannot be probed");
> +		return -EINVAL;
> +	}
> +	uprobe->copy = 1;
> +	return 0;
> +}

Since you actually have the inode, you could read it from the
page-cache. Also, why do you have the whole opcode/insn thing, that
looks like its data duplication.

>  static int install_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
>  {
> -	int ret = 0;
> +	struct task_struct *tsk;
> +	int ret = -EINVAL;
>  
> -	/*TODO: install breakpoint */
> -	if (!ret)
> +	get_task_struct(mm->owner);
> +	tsk = mm->owner;
> +	if (!tsk)
> +		return ret;
> +
> +	if (!uprobe->copy) {
> +		ret = copy_insn(tsk, mm->uprobes_vaddr, uprobe);
> +		if (ret)
> +			goto put_return;
> +	}

So you do know that uprobes_vaddr can point to some random piece of
memory by now, right? :-)

> +	ret = set_bkpt(tsk, mm->uprobes_vaddr);
> +	if (ret < 0)
> +		print_insert_fail(tsk, mm->uprobes_vaddr,
> +					"failed to insert bkpt instruction");
> +	else
>  		atomic_inc(&mm->uprobes_count);
> +
> +put_return:
> +	put_task_struct(tsk);
>  	return ret;
>  }
>  
>  static int remove_uprobe(struct mm_struct *mm, struct uprobe *uprobe)
>  {
> -	int ret = 0;
> +	struct task_struct *tsk;
> +	int ret;
> +
> +	get_task_struct(mm->owner);
> +	tsk = mm->owner;
> +	if (!tsk)
> +		return -EINVAL;
>  
> -	/*TODO: remove breakpoint */
> +	ret = set_orig_insn(tsk, mm->uprobes_vaddr, true, uprobe);
>  	if (!ret)
>  		atomic_dec(&mm->uprobes_count);

Same here, there is no guarantee vaddr is even still mapped.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 5/20] 5: Uprobes: register/unregister probes Srikar Dronamraju
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  7:55       ` Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  1 sibling, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +/* Returns 0 if it can install one probe */
> +int register_uprobe(struct inode *inode, unsigned long offset,
> +                               struct uprobe_consumer *consumer)
> +{
> +       struct prio_tree_iter iter;
> +       struct list_head tmp_list;
> +       struct address_space *mapping;
> +       struct mm_struct *mm, *tmpmm;
> +       struct vm_area_struct *vma;
> +       struct uprobe *uprobe;
> +       int ret = -1;
> +
> +       if (!inode || !consumer || consumer->next)
> +               return -EINVAL;
> +       uprobe = uprobes_add(inode, offset);
> +       INIT_LIST_HEAD(&tmp_list);
> +
> +       mapping = inode->i_mapping;
> +
> +       mutex_lock(&uprobes_mutex);
> +       if (uprobe->consumers) {
> +               ret = 0;
> +               goto consumers_add;
> +       }
> +
> +       spin_lock(&mapping->i_mmap_lock);
> +       vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, 0) {
> +               if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
> +                       continue;
> +
> +               mm = vma->vm_mm;
> +               if (!valid_vma(vma)) {
> +                       mmput(mm);
> +                       continue;
> +               }
> +
> +               list_add(&mm->uprobes_list, &tmp_list);
> +               mm->uprobes_vaddr = vma->vm_start + offset;
> +       }
> +       spin_unlock(&mapping->i_mmap_lock);

Both this and unregister are racy, what is to say:
 - the vma didn't get removed from the mm
 - no new matching vma got added

> +       if (list_empty(&tmp_list)) {
> +               ret = 0;
> +               goto consumers_add;
> +       }
> +       list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> +               if (!install_uprobe(mm, uprobe))
> +                       ret = 0;
> +               list_del(&mm->uprobes_list);
> +               mmput(mm);
> +       }
> +
> +consumers_add:
> +       add_consumer(uprobe, consumer);
> +       mutex_unlock(&uprobes_mutex);
> +       put_uprobe(uprobe);
> +       return ret;
> +}
> + 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 5/20] 5: Uprobes: register/unregister probes Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  7:47       ` Srikar Dronamraju
  1 sibling, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +void unregister_uprobe(struct inode *inode, unsigned long offset,
> +                               struct uprobe_consumer *consumer)
> +{
> +       struct prio_tree_iter iter;
> +       struct list_head tmp_list;
> +       struct address_space *mapping;
> +       struct mm_struct *mm, *tmpmm;
> +       struct vm_area_struct *vma;
> +       struct uprobe *uprobe;
> +
> +       if (!inode || !consumer)
> +               return;
> +
> +       uprobe = find_uprobe(inode, offset);
> +       if (!uprobe) {
> +               printk(KERN_ERR "No uprobe found with inode:offset %p %lu\n",
> +                               inode, offset);
> +               return;
> +       }
> +
> +       if (!del_consumer(uprobe, consumer)) {
> +               printk(KERN_ERR "No uprobe found with consumer %p\n",
> +                               consumer);
> +               return;
> +       }
> +
> +       INIT_LIST_HEAD(&tmp_list);
> +
> +       mapping = inode->i_mapping;
> +
> +       mutex_lock(&uprobes_mutex);
> +       if (uprobe->consumers)
> +               goto put_unlock;
> +
> +       spin_lock(&mapping->i_mmap_lock);
> +       vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, 0) {
> +               if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
> +                       continue;
> +
> +               mm = vma->vm_mm;
> +
> +               if (!atomic_read(&mm->uprobes_count)) {
> +                       mmput(mm);
> +                       continue;
> +               }
> +
> +               if (valid_vma(vma)) {
> +                       list_add(&mm->uprobes_list, &tmp_list);
> +                       mm->uprobes_vaddr = vma->vm_start + offset;
> +               } else
> +                       mmput(mm);
> +       }
> +       spin_unlock(&mapping->i_mmap_lock);
> +       list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> +               remove_uprobe(mm, uprobe);
> +               list_del(&mm->uprobes_list);
> +               mmput(mm);
> +       }
> +
> +       if (atomic_read(&uprobe->ref) == 1) {
> +               synchronize_sched();
> +               rb_erase(&uprobe->rb_node, &uprobes_tree);

How is that safe without holding the treelock?

> +               iput(uprobe->inode);
> +       }
> +
> +put_unlock:
> +       mutex_unlock(&uprobes_mutex);
> +       put_uprobe(uprobe);
> +} 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
  2011-01-25 12:15   ` Peter Zijlstra
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 12:15   ` Peter Zijlstra
  2011-01-26  8:38     ` Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
  3 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 12:15 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +/* Acquires uprobes_mutex */

Requires? afaict uprobes_mutex isn't actually used anywhere in this
patch.

Its use is added in the next patch.

> +static struct uprobe *uprobes_add(struct inode *inode,
> +                                       unsigned long offset)
> +{
> +       struct uprobe *uprobe, *cur_uprobe;
> +
> +       __iget(inode);
> +       uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
> +
> +       if (!uprobe) {
> +               iput(inode);
> +               return NULL;
> +       }
> +       uprobe->inode = inode;
> +       uprobe->offset = offset;
> +
> +       /* add to uprobes_tree, sorted on inode:offset */
> +       cur_uprobe = insert_uprobe_rb_node(uprobe);
> +
> +       /* a uprobe exists for this inode:offset combination*/
> +       if (cur_uprobe) {
> +               kfree(uprobe);
> +               uprobe = cur_uprobe;
> +               iput(inode);
> +       } else
> +               init_rwsem(&uprobe->consumer_rwsem);
> +
> +       return uprobe;
> +} 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
                     ` (2 preceding siblings ...)
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-26  8:45     ` Srikar Dronamraju
  3 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> +struct uprobe_consumer {
> +       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
> +       /*
> +        * filter is optional; If a filter exists, handler is run
> +        * if and only if filter returns true.
> +        */
> +       bool (*filter)(struct uprobe_consumer *self, struct task_struct *task);
> +
> +       struct uprobe_consumer *next;
> +       void *fvalue;   /* filter value */
> +}; 

Since you pass in a pointer to this structure at register_uprobe() its
user allocated, hence you can remove the fvalue thing and let the user
embed this in a larger struct if needed, the filter functions can then
use container_of() to get at the larger data structure.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-25 13:56   ` Peter Zijlstra
  1 sibling, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> +       if (unlikely(!utask)) {
> +               utask = add_utask();
> +
> +               /* Failed to allocate utask for the current task. */
> +               BUG_ON(!utask);

That's not really sane...

> +               utask->state = UTASK_BP_HIT;
> +       } 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information.
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-25 18:38     ` Josh Stone
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> Uprobes needs to maintain some task specific information include if a
> task is currently uprobed, the currently handing uprobe, any arch
> specific information (for example to handle rip relative instructions),
> the per-task slot where the original instruction is copied to before
> single-stepping.

This can go away once you have per-task xol slots and boosted probes,
because then you can write the complete replacement sequence on trap and
never need to come back until you hit another probe, right?

> +/*
> + * uprobe_utask -- not a user-visible struct.
> + * Corresponds to a thread in a probed process.
> + * Guarded by uproc->mutex.
> + */
> +struct uprobe_task {
> +	unsigned long xol_vaddr;
> +	unsigned long vaddr;
> +
> +	enum uprobe_task_state state;
> +	struct uprobe_task_arch_info tskinfo;
> +
> +	struct uprobe *active_uprobe;
> +};

So xol_vaddr is the start of the xol slot,
vaddr is the trap address, we store those so that you still have the
state during the single-step things?

I guess you could obtain the xol slot information from the IP during
single-step, but since you have storage anyway, this might be cheaper.

And the active_probe is again due to single-step, right? Why exactly do
you need that? If you trap, acquire a new slot, write the replacement
sequence, single step through it, and release the slot once you're back
to the original code stream. I'm not quite seeing where you need the
probe during stepping.

Ah, I think I found it while reading patch 13, you need the pre/post_xol
callbacks, can't you simply synthesize their effect into the replacement
sequence?

  push %rax
  mov $vaddr, %rax
  $INSN
  pop %rax
  jmp $next_insn

like replacements would obviate the need for the pre/post callbacks and
allow you to run straight through.

It doesn't look too hard to create simple sequences for each
UPROBE_FIX_* thingy:

pre:
  push %rax; mov $vaddr, %rax && UPROBE_FIX_RIP_AX
  push %rcx; mov $vaddr, %rcx && UPROBE_FIX_RIP_CX

INSN

post:
  pop %rax && UPROBE_FIX_RIP_AX
  pop %rcx && UPROBE_FIX_RIP_CX
  add $correction, $offset(%rsp) && UPROBE_FIX_CALL
  jmp $next_insn

you already have all the logic of computing the various constants there.
And your slots are 128bytes long, which should fit sequences like that
just fine I think.

It would also remove the whole single-step need since they're proper
boosted probes.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> +       /*
> +        * We keep the vma's vm_start rather than a pointer to the vma
> +        * itself.  The probed process or a naughty kernel module could make
> +        * the vma go away, and we must handle that reasonably gracefully.
> +        */
> +       unsigned long vaddr;            /* Page(s) of instruction slots */ 

You could simply refuse to let the user unmap that area and rogue kernel
modules aren't something you can do anything about anyway.

But yeah..

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes.
  2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-27  6:50     ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:30 +0530, Srikar Dronamraju wrote:
> Uprobe needs to be intimated on int3 and singlestep exceptions.
> Hence uprobes registers a die notifier so that its notified of the events.

Why isn't this part of the previous patch? This splitup really doesn't
make sense.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception Srikar Dronamraju
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-26  8:52       ` Srikar Dronamraju
  1 sibling, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> +               down_read(&mm->mmap_sem);
> +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +                       if (!valid_vma(vma))
> +                               continue;
> +                       if (probept < vma->vm_start || probept > vma->vm_end)
> +                               continue;
> +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> +                                       probept - vma->vm_start);
> +                       if (u)
> +                               break;
> +               }
> +               up_read(&mm->mmap_sem); 

One has to ask, what's wrong with find_vma() ?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address.
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Andi Kleen, Christoph Hellwig,
	Ananth N Mavinakayanahalli, Masami Hiramatsu, Oleg Nesterov,
	LKML, Linux-mm, Jim Keniston, Frederic Weisbecker, SystemTap,
	Andrew Morton, Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> On a breakpoint hit, perform a architecture specific calculation to
> return the address where the breakpoint was hit.

And yet all the code added is generic ;-)

> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
> ---
>  include/linux/uprobes.h |    5 +++++
>  kernel/uprobes.c        |   11 +++++++++++
>  2 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index a631c42..ee12b2e 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -154,6 +154,7 @@ extern void uprobe_free_utask(struct task_struct *tsk);
>  
>  struct vm_area_struct;
>  extern void uprobe_mmap(struct vm_area_struct *vma);
> +extern unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs);
>  extern void uprobe_dup_mmap(struct mm_struct *old_mm, struct mm_struct *mm);
>  extern void uprobes_free_xol_area(struct mm_struct *mm);
>  #else /* CONFIG_UPROBES is not defined */
> @@ -173,5 +174,9 @@ static inline void uprobe_dup_mmap(struct mm_struct *old_mm,
>  static inline void uprobe_free_utask(struct task_struct *tsk) {}
>  static inline void uprobe_mmap(struct vm_area_struct *vma) { }
>  static inline void uprobes_free_xol_area(struct mm_struct *mm) {}
> +static inline unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_UPROBES */
>  #endif	/* _LINUX_UPROBES_H */
> diff --git a/kernel/uprobes.c b/kernel/uprobes.c
> index 09e36f6..f486c4f 100644
> --- a/kernel/uprobes.c
> +++ b/kernel/uprobes.c
> @@ -976,6 +976,17 @@ static void xol_free_insn_slot(struct task_struct *tsk, unsigned long slot_addr)
>  						__func__, slot_addr);
>  }
>  
> +/**
> + * uprobes_get_bkpt_addr - compute address of bkpt given post-bkpt regs
> + * @regs: Reflects the saved state of the task after it has hit a breakpoint
> + * instruction.
> + * Return the address of the breakpoint instruction.
> + */
> +unsigned long uprobes_get_bkpt_addr(struct pt_regs *regs)
> +{
> +	return instruction_pointer(regs) - UPROBES_BKPT_INSN_SIZE;
> +}
> +
>  /*
>   * Called with no locks held.
>   * Called in context of a exiting or a exec-ing thread.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling Srikar Dronamraju
@ 2011-01-25 13:56   ` Peter Zijlstra
  2011-01-27  9:40     ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Paul E. McKenney

On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> 
> +void arch_uprobe_enable_sstep(struct pt_regs *regs)
> +{
> +       /*
> +        * Enable single-stepping by
> +        * - Set TF on stack
> +        * - Set TIF_SINGLESTEP: Guarantees that TF is set when
> +        *      returning to user mode.
> +        *  - Indicate that TF is set by us.
> +        */
> +       regs->flags |= X86_EFLAGS_TF;
> +       set_thread_flag(TIF_SINGLESTEP);
> +       set_thread_flag(TIF_FORCED_TF);
> +}
> +
> +void arch_uprobe_disable_sstep(struct pt_regs *regs)
> +{
> +       /* Disable single-stepping by clearing what we set */
> +       clear_thread_flag(TIF_SINGLESTEP);
> +       clear_thread_flag(TIF_FORCED_TF);
> +       regs->flags &= ~X86_EFLAGS_TF;
> +} 

Why not use the code from arch/x86/kernel/step.c?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information.
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-25 18:38     ` Josh Stone
  2011-01-25 18:55       ` Roland McGrath
  2011-01-25 19:56       ` Peter Zijlstra
  0 siblings, 2 replies; 116+ messages in thread
From: Josh Stone @ 2011-01-25 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, LKML, SystemTap,
	Linux-mm, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, Andrew Morton, Paul E. McKenney

On 01/25/2011 05:56 AM, Peter Zijlstra wrote:
> Ah, I think I found it while reading patch 13, you need the pre/post_xol
> callbacks, can't you simply synthesize their effect into the replacement
> sequence?
> 
>   push %rax
>   mov $vaddr, %rax
>   $INSN
>   pop %rax
>   jmp $next_insn
> 
> like replacements would obviate the need for the pre/post callbacks and
> allow you to run straight through.

For this particular example, you'd better be sure that $INSN doesn't
need %rsp intact.

Control flow in general also makes this challenging.  If $INSN is a
call, then any inline fixups won't get a chance until after return.  If
$INSN is a jump, then its target must be modified so that both taken and
not-taken paths land in respective fixup locations.  I'm sure there are
more cases that I'm not thinking of.

> It would also remove the whole single-step need since they're proper
> boosted probes.

Kprobes has boosting, but it doesn't apply to all opcodes.  I would
guess that the same could be done for uprobes, where certain opcodes get
a fixup sequence like you suggest, but the pre/post_xol mechanism is
still needed in general.

Josh

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information.
  2011-01-25 18:38     ` Josh Stone
@ 2011-01-25 18:55       ` Roland McGrath
  2011-01-25 19:56       ` Peter Zijlstra
  1 sibling, 0 replies; 116+ messages in thread
From: Roland McGrath @ 2011-01-25 18:55 UTC (permalink / raw)
  To: Josh Stone
  Cc: Peter Zijlstra, Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, LKML, SystemTap,
	Linux-mm, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, Andrew Morton, Paul E. McKenney

> On 01/25/2011 05:56 AM, Peter Zijlstra wrote:
> > Ah, I think I found it while reading patch 13, you need the pre/post_xol
> > callbacks, can't you simply synthesize their effect into the replacement
> > sequence?
> > 
> >   push %rax
> >   mov $vaddr, %rax
> >   $INSN
> >   pop %rax
> >   jmp $next_insn
> > 
> > like replacements would obviate the need for the pre/post callbacks and
> > allow you to run straight through.
> 
> For this particular example, you'd better be sure that $INSN doesn't
> need %rsp intact.

In general it is quite bad form to touch the user's stack at all for
instrumentation purposes.  Unexpected stack usage might be what you are
trying to debug, after all.

On x86-64 in particular, it is strictly verboten to touch the user's stack
immediately below the SP.  In the x86-64 ABI, the 128 bytes below %rsp are
a scratch area for leaf functions that normal compiled user code will use
to store data that must not be clobbered.  (Normal signal handler frames
start 128 bytes below %rsp for this reason.)

That's aside from the more obvious issues Josh mentioned, where the
instruction itself is a push/pop/call/ret or uses an addressing mode
relative to %rsp.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information.
  2011-01-25 18:38     ` Josh Stone
  2011-01-25 18:55       ` Roland McGrath
@ 2011-01-25 19:56       ` Peter Zijlstra
  1 sibling, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-25 19:56 UTC (permalink / raw)
  To: Josh Stone
  Cc: Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, LKML, SystemTap,
	Linux-mm, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, Andrew Morton, Paul E. McKenney

On Tue, 2011-01-25 at 10:38 -0800, Josh Stone wrote:
> On 01/25/2011 05:56 AM, Peter Zijlstra wrote:
> > Ah, I think I found it while reading patch 13, you need the pre/post_xol
> > callbacks, can't you simply synthesize their effect into the replacement
> > sequence?
> > 
> >   push %rax
> >   mov $vaddr, %rax
> >   $INSN
> >   pop %rax
> >   jmp $next_insn
> > 
> > like replacements would obviate the need for the pre/post callbacks and
> > allow you to run straight through.
> 
> For this particular example, you'd better be sure that $INSN doesn't
> need %rsp intact.

Well, either that of fix up the %rsp offset, but yes I had not
considered this.

> Control flow in general also makes this challenging.  If $INSN is a
> call, then any inline fixups won't get a chance until after return.  If
> $INSN is a jump, then its target must be modified so that both taken and
> not-taken paths land in respective fixup locations.  I'm sure there are
> more cases that I'm not thinking of.

Right.

> > It would also remove the whole single-step need since they're proper
> > boosted probes.
> 
> Kprobes has boosting, but it doesn't apply to all opcodes.  I would
> guess that the same could be done for uprobes, where certain opcodes get
> a fixup sequence like you suggest, but the pre/post_xol mechanism is
> still needed in general.

Bummer..

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-25 20:05     ` Steven Rostedt
  2011-01-26  9:06       ` Srikar Dronamraju
  2011-01-26 15:09       ` Srikar Dronamraju
  1 sibling, 1 reply; 116+ messages in thread
From: Steven Rostedt @ 2011-01-25 20:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Linux-mm,
	Arnaldo Carvalho de Melo, Linus Torvalds,
	Ananth N Mavinakayanahalli, Christoph Hellwig, Masami Hiramatsu,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Tue, 2011-01-25 at 13:15 +0100, Peter Zijlstra wrote:
> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:

> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct rb_node *p;
> > +
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +
> > +       p = n->rb_right;
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +} 
> 
> Whee recursion FTW!, you just blew your kernel stack :-)
> 
> Since you sort inode first, offset second, I think you can simply look
> for the first matching inode entry and simply rb_next() until you don't
> match.

Not to mention that p is uninitialized. Did this code ever work?

-- Steve



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  7:47       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  7:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:45]:

> > +
> > +       if (atomic_read(&uprobe->ref) == 1) {
> > +               synchronize_sched();
> > +               rb_erase(&uprobe->rb_node, &uprobes_tree);
> 
> How is that safe without holding the treelock?

Right, 
Something like this should be good enuf right?

if (atomic_read(&uprobe->ref) == 1) {
	synchronize_sched();
	spin_lock_irqsave(&treelock, flags);
	rb_erase(&uprobe->rb_node, &uprobes_tree);
	spin_lock_irqrestore(&treelock, flags);
	iput(uprobe->inode);
}
	
-- 
Thanks and Regards
Srikar

PS: Last time I had goofed up with Linux-mm mailing alias. 
Hopefully this time it goes to the right list.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26  7:47       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  7:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:45]:

> > +
> > +       if (atomic_read(&uprobe->ref) == 1) {
> > +               synchronize_sched();
> > +               rb_erase(&uprobe->rb_node, &uprobes_tree);
> 
> How is that safe without holding the treelock?

Right, 
Something like this should be good enuf right?

if (atomic_read(&uprobe->ref) == 1) {
	synchronize_sched();
	spin_lock_irqsave(&treelock, flags);
	rb_erase(&uprobe->rb_node, &uprobes_tree);
	spin_lock_irqrestore(&treelock, flags);
	iput(uprobe->inode);
}
	
-- 
Thanks and Regards
Srikar

PS: Last time I had goofed up with Linux-mm mailing alias. 
Hopefully this time it goes to the right list.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  7:55       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

> > +
> > +               list_add(&mm->uprobes_list, &tmp_list);
> > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > +       }
> > +       spin_unlock(&mapping->i_mmap_lock);
> 
> Both this and unregister are racy, what is to say:
>  - the vma didn't get removed from the mm
>  - no new matching vma got added
> 

register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
uprobes_mutex. So I dont see one unregister_uprobe getting thro when
another register_uprobe is working with a vma.

If I am missing something elementary, please explain a bit more.

> > +       if (list_empty(&tmp_list)) {
> > +               ret = 0;
> > +               goto consumers_add;
> > +       }
> > +       list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> > +               if (!install_uprobe(mm, uprobe))
> > +                       ret = 0;
> > +               list_del(&mm->uprobes_list);
> > +               mmput(mm);
> > +       }
> > +
> > +consumers_add:
> > +       add_consumer(uprobe, consumer);
> > +       mutex_unlock(&uprobes_mutex);
> > +       put_uprobe(uprobe);
> > +       return ret;
> > +}
> > + 

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26  7:55       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

> > +
> > +               list_add(&mm->uprobes_list, &tmp_list);
> > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > +       }
> > +       spin_unlock(&mapping->i_mmap_lock);
> 
> Both this and unregister are racy, what is to say:
>  - the vma didn't get removed from the mm
>  - no new matching vma got added
> 

register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
uprobes_mutex. So I dont see one unregister_uprobe getting thro when
another register_uprobe is working with a vma.

If I am missing something elementary, please explain a bit more.

> > +       if (list_empty(&tmp_list)) {
> > +               ret = 0;
> > +               goto consumers_add;
> > +       }
> > +       list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> > +               if (!install_uprobe(mm, uprobe))
> > +                       ret = 0;
> > +               list_del(&mm->uprobes_list);
> > +               mmput(mm);
> > +       }
> > +
> > +consumers_add:
> > +       add_consumer(uprobe, consumer);
> > +       mutex_unlock(&uprobes_mutex);
> > +       put_uprobe(uprobe);
> > +       return ret;
> > +}
> > + 

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  8:37       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

> > +       spin_lock_irqsave(&treelock, flags);
> > +       while (*p) {
> > +               parent = *p;
> > +               u = rb_entry(parent, struct uprobe, rb_node);
> > +               if (u->inode > uprobe->inode)
> > +                       p = &(*p)->rb_left;
> > +               else if (u->inode < uprobe->inode)
> > +                       p = &(*p)->rb_right;
> > +               else {
> > +                       if (u->offset > uprobe->offset)
> > +                               p = &(*p)->rb_left;
> > +                       else if (u->offset < uprobe->offset)
> > +                               p = &(*p)->rb_right;
> > +                       else {
> > +                               atomic_inc(&u->ref);
> 
> If the lookup can find a 'dead' entry, then why can't we here?
> 

If a new user of a uprobe comes up as when the last registered user was
removing the uprobe, we keep the uprobe entry till the new user
loses interest in that uprobe.

> > +                               goto unlock_return;
> > +                       }
> > +               }
> > +       }
> > +       u = NULL;
> > +       rb_link_node(&uprobe->rb_node, parent, p);
> > +       rb_insert_color(&uprobe->rb_node, &uprobes_tree);
> > +       atomic_set(&uprobe->ref, 2);
> > +
> > +unlock_return:
> > +       spin_unlock_irqrestore(&treelock, flags);
> > +       return u;
> > +} 
> 
> It would be nice if you could merge the find and 'acquire' thing, the
> lookup is basically the same in both cases.
> 
> Also, I'm not quite sure on the name of that last function, its not a
> strict insert and what's the trailing _rb_node about? That lookup isn't
> called find_uprobe_rb_node() either is it?

Since we already have a install_uprobe, register_uprobe, I thought
insert_uprobe_rb_node would give context to that function that it was
only inserting an rb_node but not installing the actual breakpoint.
I am okay to rename it to insert_uprobe(). 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
@ 2011-01-26  8:37       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

> > +       spin_lock_irqsave(&treelock, flags);
> > +       while (*p) {
> > +               parent = *p;
> > +               u = rb_entry(parent, struct uprobe, rb_node);
> > +               if (u->inode > uprobe->inode)
> > +                       p = &(*p)->rb_left;
> > +               else if (u->inode < uprobe->inode)
> > +                       p = &(*p)->rb_right;
> > +               else {
> > +                       if (u->offset > uprobe->offset)
> > +                               p = &(*p)->rb_left;
> > +                       else if (u->offset < uprobe->offset)
> > +                               p = &(*p)->rb_right;
> > +                       else {
> > +                               atomic_inc(&u->ref);
> 
> If the lookup can find a 'dead' entry, then why can't we here?
> 

If a new user of a uprobe comes up as when the last registered user was
removing the uprobe, we keep the uprobe entry till the new user
loses interest in that uprobe.

> > +                               goto unlock_return;
> > +                       }
> > +               }
> > +       }
> > +       u = NULL;
> > +       rb_link_node(&uprobe->rb_node, parent, p);
> > +       rb_insert_color(&uprobe->rb_node, &uprobes_tree);
> > +       atomic_set(&uprobe->ref, 2);
> > +
> > +unlock_return:
> > +       spin_unlock_irqrestore(&treelock, flags);
> > +       return u;
> > +} 
> 
> It would be nice if you could merge the find and 'acquire' thing, the
> lookup is basically the same in both cases.
> 
> Also, I'm not quite sure on the name of that last function, its not a
> strict insert and what's the trailing _rb_node about? That lookup isn't
> called find_uprobe_rb_node() either is it?

Since we already have a install_uprobe, register_uprobe, I thought
insert_uprobe_rb_node would give context to that function that it was
only inserting an rb_node but not installing the actual breakpoint.
I am okay to rename it to insert_uprobe(). 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  8:38     ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:46]:

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +/* Acquires uprobes_mutex */
> 
> Requires? afaict uprobes_mutex isn't actually used anywhere in this
> patch.
> 
> Its use is added in the next patch.
> 

Right, Stale  comment, will be removed in the next patch series.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  8:41     ` Srikar Dronamraju
  2011-01-26 10:13       ` Peter Zijlstra
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:42]:

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +/* Should be called lock-less */
> > +static void put_uprobe(struct uprobe *uprobe)
> > +{
> > +       if (atomic_dec_and_test(&uprobe->ref))
> > +               kfree(uprobe);
> > +} 
> 
> Since this instantly frees the uprobe once ref hits 0, the
> atomic_inc_not_zero() in find_uprobe() doesn't really make sense does
> it?

Okay, I can move the atomic_inc_not_zero() in find_uprobe() to
atomic_inc().

Do you see any side-effects of using atomic_inc_not_zero?

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-26  8:45     ` Srikar Dronamraju
  2011-01-26 10:14       ` Peter Zijlstra
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:13]:

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +struct uprobe_consumer {
> > +       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
> > +       /*
> > +        * filter is optional; If a filter exists, handler is run
> > +        * if and only if filter returns true.
> > +        */
> > +       bool (*filter)(struct uprobe_consumer *self, struct task_struct *task);
> > +
> > +       struct uprobe_consumer *next;
> > +       void *fvalue;   /* filter value */
> > +}; 
> 
> Since you pass in a pointer to this structure at register_uprobe() its
> user allocated, hence you can remove the fvalue thing and let the user
> embed this in a larger struct if needed, the filter functions can then
> use container_of() to get at the larger data structure.


Okay, Will do, but Is there a reason for moving the fvalue out of the
uprobe_consumer? Except for reducing the size of the structure, I am
unable to see advantage.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-26  8:52       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:

> On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > +               down_read(&mm->mmap_sem);
> > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > +                       if (!valid_vma(vma))
> > +                               continue;
> > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > +                               continue;
> > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > +                                       probept - vma->vm_start);
> > +                       if (u)
> > +                               break;
> > +               }
> > +               up_read(&mm->mmap_sem); 
> 
> One has to ask, what's wrong with find_vma() ?

Are you looking for something like this.

       down_read(&mm->mmap_sem);
	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
	       if (!valid_vma(vma))
		       continue;
	       u = find_uprobe(vma->vm_file->f_mapping->host,
			       probept - vma->vm_start);
	       if (u)
		       break;
       }
       up_read(&mm->mmap_sem); 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
@ 2011-01-26  8:52       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  8:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:

> On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > +               down_read(&mm->mmap_sem);
> > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > +                       if (!valid_vma(vma))
> > +                               continue;
> > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > +                               continue;
> > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > +                                       probept - vma->vm_start);
> > +                       if (u)
> > +                               break;
> > +               }
> > +               up_read(&mm->mmap_sem); 
> 
> One has to ask, what's wrong with find_vma() ?

Are you looking for something like this.

       down_read(&mm->mmap_sem);
	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
	       if (!valid_vma(vma))
		       continue;
	       u = find_uprobe(vma->vm_file->f_mapping->host,
			       probept - vma->vm_start);
	       if (u)
		       break;
       }
       up_read(&mm->mmap_sem); 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26  9:03       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +void uprobe_mmap(struct vm_area_struct *vma)
> > +{
> > +       struct list_head tmp_list;
> > +       struct uprobe *uprobe, *u;
> > +       struct mm_struct *mm;
> > +       struct inode *inode;
> > +
> > +       if (!valid_vma(vma))
> > +               return;
> > +
> > +       INIT_LIST_HEAD(&tmp_list);
> > +
> > +       /*
> > +        * The vma was just allocated and this routine gets called
> > +        * while holding write lock for mmap_sem.  Function called
> > +        * in context of a thread that has a reference to mm.
> > +        * Hence no need to take a reference to mm
> > +        */
> > +       mm = vma->vm_mm;
> > +       up_write(&mm->mmap_sem);
> 
> Are you very very sure its a good thing to simply drop the mmap_sem
> here? Also, why?
> 

I actually dont like to release the write_lock and then reacquire it.
write_opcode, which is called thro install_uprobe, i.e to insert the
actual breakpoint instruction takes a read lock on the mmap_sem.
Hence uprobe_mmap gets called in context with write lock on mmap_sem
held, I had to release it before calling install_uprobe.

Another solution, I thought of was to pass a context to write_opcode to
say that map-sem is already acquired by us. But I am not sure that
idea is good enuf. 

> > +       mutex_lock(&uprobes_mutex);
> > +
> > +       inode = vma->vm_file->f_mapping->host;
> 
> Since you just dropped the mmap_sem, what's keeping that vma from going
> away?
> 

How about dropping the mmap_sem after add_to_temp_list and cachng the
vma->vm_start value before calling add_to_temp_list?

Or if you have better ideas, then that would be great.

> > +       add_to_temp_list(vma, inode, &tmp_list);
> > +
> > +       list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> > +               mm->uprobes_vaddr = vma->vm_start + uprobe->offset;
> > +               install_uprobe(mm, uprobe);
> > +               list_del(&uprobe->pending_list);
> > +       }
> > +       mutex_unlock(&uprobes_mutex);
> > +       down_write(&mm->mmap_sem);
> > +} 
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26  9:03       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +void uprobe_mmap(struct vm_area_struct *vma)
> > +{
> > +       struct list_head tmp_list;
> > +       struct uprobe *uprobe, *u;
> > +       struct mm_struct *mm;
> > +       struct inode *inode;
> > +
> > +       if (!valid_vma(vma))
> > +               return;
> > +
> > +       INIT_LIST_HEAD(&tmp_list);
> > +
> > +       /*
> > +        * The vma was just allocated and this routine gets called
> > +        * while holding write lock for mmap_sem.  Function called
> > +        * in context of a thread that has a reference to mm.
> > +        * Hence no need to take a reference to mm
> > +        */
> > +       mm = vma->vm_mm;
> > +       up_write(&mm->mmap_sem);
> 
> Are you very very sure its a good thing to simply drop the mmap_sem
> here? Also, why?
> 

I actually dont like to release the write_lock and then reacquire it.
write_opcode, which is called thro install_uprobe, i.e to insert the
actual breakpoint instruction takes a read lock on the mmap_sem.
Hence uprobe_mmap gets called in context with write lock on mmap_sem
held, I had to release it before calling install_uprobe.

Another solution, I thought of was to pass a context to write_opcode to
say that map-sem is already acquired by us. But I am not sure that
idea is good enuf. 

> > +       mutex_lock(&uprobes_mutex);
> > +
> > +       inode = vma->vm_file->f_mapping->host;
> 
> Since you just dropped the mmap_sem, what's keeping that vma from going
> away?
> 

How about dropping the mmap_sem after add_to_temp_list and cachng the
vma->vm_start value before calling add_to_temp_list?

Or if you have better ideas, then that would be great.

> > +       add_to_temp_list(vma, inode, &tmp_list);
> > +
> > +       list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> > +               mm->uprobes_vaddr = vma->vm_start + uprobe->offset;
> > +               install_uprobe(mm, uprobe);
> > +               list_del(&uprobe->pending_list);
> > +       }
> > +       mutex_unlock(&uprobes_mutex);
> > +       down_write(&mm->mmap_sem);
> > +} 
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-25 20:05     ` Steven Rostedt
@ 2011-01-26  9:06       ` Srikar Dronamraju
  2011-01-27 17:03         ` Steven Rostedt
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26  9:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Steven Rostedt <rostedt@goodmis.org> [2011-01-25 15:05:26]:

> On Tue, 2011-01-25 at 13:15 +0100, Peter Zijlstra wrote:
> > On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> 
> > > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > > +               struct list_head *tmp_list)
> > > +{
> > > +       struct rb_node *p;
> > > +
> > > +       if (p)
> > > +               __search_within_subtree(p, inode, tmp_list);
> > > +
> > > +       p = n->rb_right;
> > > +       if (p)
> > > +               __search_within_subtree(p, inode, tmp_list);
> > > +} 
> > 
> > Whee recursion FTW!, you just blew your kernel stack :-)
> > 
> > Since you sort inode first, offset second, I think you can simply look
> > for the first matching inode entry and simply rb_next() until you don't
> > match.
> 
> Not to mention that p is uninitialized. Did this code ever work?

I think the original patch that I sent had p initialized. I think it got
dropped off by Peter when he replied. Please do confirm.


-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26  7:47       ` Srikar Dronamraju
@ 2011-01-26 10:10         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 13:17 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:45]:
> 
> > > +
> > > +       if (atomic_read(&uprobe->ref) == 1) {
> > > +               synchronize_sched();
> > > +               rb_erase(&uprobe->rb_node, &uprobes_tree);
> > 
> > How is that safe without holding the treelock?
> 
> Right, 
> Something like this should be good enuf right?
> 
> if (atomic_read(&uprobe->ref) == 1) {
> 	synchronize_sched();
> 	spin_lock_irqsave(&treelock, flags);
> 	rb_erase(&uprobe->rb_node, &uprobes_tree);
> 	spin_lock_irqrestore(&treelock, flags);
> 	iput(uprobe->inode);
> }
> 	

How is the atomic_read() not racy with a future increment, and what is
that synchronize_sched() thing for?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 10:10         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 13:17 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:45]:
> 
> > > +
> > > +       if (atomic_read(&uprobe->ref) == 1) {
> > > +               synchronize_sched();
> > > +               rb_erase(&uprobe->rb_node, &uprobes_tree);
> > 
> > How is that safe without holding the treelock?
> 
> Right, 
> Something like this should be good enuf right?
> 
> if (atomic_read(&uprobe->ref) == 1) {
> 	synchronize_sched();
> 	spin_lock_irqsave(&treelock, flags);
> 	rb_erase(&uprobe->rb_node, &uprobes_tree);
> 	spin_lock_irqrestore(&treelock, flags);
> 	iput(uprobe->inode);
> }
> 	

How is the atomic_read() not racy with a future increment, and what is
that synchronize_sched() thing for?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26  7:55       ` Srikar Dronamraju
@ 2011-01-26 10:11         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:11 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> 
> > > +
> > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > +       }
> > > +       spin_unlock(&mapping->i_mmap_lock);
> > 
> > Both this and unregister are racy, what is to say:
> >  - the vma didn't get removed from the mm
> >  - no new matching vma got added
> > 
> 
> register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> another register_uprobe is working with a vma.
> 
> If I am missing something elementary, please explain a bit more.

afaict you're not holding the mmap_sem, so userspace can simply unmap
the vma.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 10:11         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:11 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> 
> > > +
> > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > +       }
> > > +       spin_unlock(&mapping->i_mmap_lock);
> > 
> > Both this and unregister are racy, what is to say:
> >  - the vma didn't get removed from the mm
> >  - no new matching vma got added
> > 
> 
> register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> another register_uprobe is working with a vma.
> 
> If I am missing something elementary, please explain a bit more.

afaict you're not holding the mmap_sem, so userspace can simply unmap
the vma.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-26  8:41     ` Srikar Dronamraju
@ 2011-01-26 10:13       ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:13 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Wed, 2011-01-26 at 14:11 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:42]:
> 
> > On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > > +/* Should be called lock-less */
> > > +static void put_uprobe(struct uprobe *uprobe)
> > > +{
> > > +       if (atomic_dec_and_test(&uprobe->ref))
> > > +               kfree(uprobe);
> > > +} 
> > 
> > Since this instantly frees the uprobe once ref hits 0, the
> > atomic_inc_not_zero() in find_uprobe() doesn't really make sense does
> > it?
> 
> Okay, I can move the atomic_inc_not_zero() in find_uprobe() to
> atomic_inc().
> 
> Do you see any side-effects of using atomic_inc_not_zero?

No, its just slower, once you want to start doing RCU lookups in the
probe tree you'll need it through.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-26  8:45     ` Srikar Dronamraju
@ 2011-01-26 10:14       ` Peter Zijlstra
  2011-01-26 15:18         ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:14 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Wed, 2011-01-26 at 14:15 +0530, Srikar Dronamraju wrote:
> 
> 
> Okay, Will do, but Is there a reason for moving the fvalue out of the
> uprobe_consumer? Except for reducing the size of the structure, I am
> unable to see advantage. 

That's about it, and its the normal way to do such things in kernel
space.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2011-01-26  8:52       ` Srikar Dronamraju
@ 2011-01-26 10:17         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:17 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 14:22 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:
> 
> > On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > > +               down_read(&mm->mmap_sem);
> > > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > +                       if (!valid_vma(vma))
> > > +                               continue;
> > > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > > +                               continue;
> > > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > > +                                       probept - vma->vm_start);
> > > +                       if (u)
> > > +                               break;
> > > +               }
> > > +               up_read(&mm->mmap_sem); 
> > 
> > One has to ask, what's wrong with find_vma() ?
> 
> Are you looking for something like this.
> 
>        down_read(&mm->mmap_sem);
> 	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
> 	       if (!valid_vma(vma))
> 		       continue;
> 	       u = find_uprobe(vma->vm_file->f_mapping->host,
> 			       probept - vma->vm_start);
> 	       if (u)
> 		       break;
>        }
>        up_read(&mm->mmap_sem); 

How could you ever need to iterate here? There is only a single vma that
covers the probe point, if that doesn't find a uprobe, there isn't any.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
@ 2011-01-26 10:17         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:17 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 14:22 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:
> 
> > On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > > +               down_read(&mm->mmap_sem);
> > > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > +                       if (!valid_vma(vma))
> > > +                               continue;
> > > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > > +                               continue;
> > > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > > +                                       probept - vma->vm_start);
> > > +                       if (u)
> > > +                               break;
> > > +               }
> > > +               up_read(&mm->mmap_sem); 
> > 
> > One has to ask, what's wrong with find_vma() ?
> 
> Are you looking for something like this.
> 
>        down_read(&mm->mmap_sem);
> 	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
> 	       if (!valid_vma(vma))
> 		       continue;
> 	       u = find_uprobe(vma->vm_file->f_mapping->host,
> 			       probept - vma->vm_start);
> 	       if (u)
> 		       break;
>        }
>        up_read(&mm->mmap_sem); 

How could you ever need to iterate here? There is only a single vma that
covers the probe point, if that doesn't find a uprobe, there isn't any.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26  9:03       ` Srikar Dronamraju
@ 2011-01-26 10:20         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 14:33 +0530, Srikar Dronamraju wrote:
> 
> 
> I actually dont like to release the write_lock and then reacquire it.
> write_opcode, which is called thro install_uprobe, i.e to insert the
> actual breakpoint instruction takes a read lock on the mmap_sem.
> Hence uprobe_mmap gets called in context with write lock on mmap_sem
> held, I had to release it before calling install_uprobe. 

Ah, right, so that's going to give you a head-ache ;-)

The moment you release this mmap_sem, the map you're going to install
the probe point in can go away.

The only way to make this work seems to start by holding the mmap_sem
for writing and make a breakpoint install function that assumes its
taken and doesn't try to acquire it again.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 10:20         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 10:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 14:33 +0530, Srikar Dronamraju wrote:
> 
> 
> I actually dont like to release the write_lock and then reacquire it.
> write_opcode, which is called thro install_uprobe, i.e to insert the
> actual breakpoint instruction takes a read lock on the mmap_sem.
> Hence uprobe_mmap gets called in context with write lock on mmap_sem
> held, I had to release it before calling install_uprobe. 

Ah, right, so that's going to give you a head-ache ;-)

The moment you release this mmap_sem, the map you're going to install
the probe point in can go away.

The only way to make this work seems to start by holding the mmap_sem
for writing and make a breakpoint install function that assumes its
taken and doesn't try to acquire it again.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26 10:20         ` Peter Zijlstra
@ 2011-01-26 14:59           ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:20:39]:

> On Wed, 2011-01-26 at 14:33 +0530, Srikar Dronamraju wrote:
> > 
> > 
> > I actually dont like to release the write_lock and then reacquire it.
> > write_opcode, which is called thro install_uprobe, i.e to insert the
> > actual breakpoint instruction takes a read lock on the mmap_sem.
> > Hence uprobe_mmap gets called in context with write lock on mmap_sem
> > held, I had to release it before calling install_uprobe. 
> 
> Ah, right, so that's going to give you a head-ache ;-)
> 
> The moment you release this mmap_sem, the map you're going to install
> the probe point in can go away.
> 
> The only way to make this work seems to start by holding the mmap_sem
> for writing and make a breakpoint install function that assumes its
> taken and doesn't try to acquire it again.
> 


Yes, this can be done.
I would have to do something like this in register_uprobe().

list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
		down_read(&mm->map_sem);
                if (!install_uprobe(mm, uprobe))
                        ret = 0;
		up_read(&mm->map_sem);
                list_del(&mm->uprobes_list);
                mmput(mm);
}

Agree that this is much better than what we have now.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 14:59           ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:20:39]:

> On Wed, 2011-01-26 at 14:33 +0530, Srikar Dronamraju wrote:
> > 
> > 
> > I actually dont like to release the write_lock and then reacquire it.
> > write_opcode, which is called thro install_uprobe, i.e to insert the
> > actual breakpoint instruction takes a read lock on the mmap_sem.
> > Hence uprobe_mmap gets called in context with write lock on mmap_sem
> > held, I had to release it before calling install_uprobe. 
> 
> Ah, right, so that's going to give you a head-ache ;-)
> 
> The moment you release this mmap_sem, the map you're going to install
> the probe point in can go away.
> 
> The only way to make this work seems to start by holding the mmap_sem
> for writing and make a breakpoint install function that assumes its
> taken and doesn't try to acquire it again.
> 


Yes, this can be done.
I would have to do something like this in register_uprobe().

list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
		down_read(&mm->map_sem);
                if (!install_uprobe(mm, uprobe))
                        ret = 0;
		up_read(&mm->map_sem);
                list_del(&mm->uprobes_list);
                mmput(mm);
}

Agree that this is much better than what we have now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-25 12:15   ` Peter Zijlstra
@ 2011-01-26 15:09       ` Srikar Dronamraju
  2011-01-26 15:09       ` Srikar Dronamraju
  1 sibling, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:41]:

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list);
> > +
> > +static void add_to_temp_list(struct vm_area_struct *vma, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct uprobe *uprobe;
> > +       struct rb_node *n;
> > +       unsigned long flags;
> > +
> > +       n = uprobes_tree.rb_node;
> > +       spin_lock_irqsave(&treelock, flags);
> > +       while (n) {
> > +               uprobe = rb_entry(n, struct uprobe, rb_node);
> > +               if (match_inode(uprobe, inode, &n)) {
> > +                       list_add(&uprobe->pending_list, tmp_list);
> > +                       search_within_subtree(n, inode, tmp_list);
> > +                       break;
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&treelock, flags);
> > +}
> > +
> > +static void __search_within_subtree(struct rb_node *p, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct uprobe *uprobe;
> > +
> > +       uprobe = rb_entry(p, struct uprobe, rb_node);
> > +       if (match_inode(uprobe, inode, &p)) {
> > +               list_add(&uprobe->pending_list, tmp_list);
> > +               search_within_subtree(p, inode, tmp_list);
> > +       }
> > +
> > +
> > +}
> > +
> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct rb_node *p;
> > +
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +
> > +       p = n->rb_right;
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +} 
> 
> Whee recursion FTW!, you just blew your kernel stack :-)
> 
> Since you sort inode first, offset second, I think you can simply look
> for the first matching inode entry and simply rb_next() until you don't
> match.

Agree that we should get rid of recursion.

I dont think we can simply use rb_next() once we have the first
matching function. There could be a matching inode but a smaller
offset in left that will be missed by rb_next(). (Unless I have
misunderstood rb_next() !!!)

Here are the ways I think we can workaround.
A. change the match_inode() logic to use rb_first/rb_next.
This would make negate the benefit we get from rb_trees because we
have to match every node. Also match_offset might get a little tricky.

B. use the current match_inode but change the search_within_subtree
logic. search_within_subtree() would first find the leftmode node
within the subtree that still has the same inode. Thereafter it will use
rb_next().

Do you have any other ideas?

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 15:09       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 13:15:41]:

> On Thu, 2010-12-16 at 15:28 +0530, Srikar Dronamraju wrote:
> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list);
> > +
> > +static void add_to_temp_list(struct vm_area_struct *vma, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct uprobe *uprobe;
> > +       struct rb_node *n;
> > +       unsigned long flags;
> > +
> > +       n = uprobes_tree.rb_node;
> > +       spin_lock_irqsave(&treelock, flags);
> > +       while (n) {
> > +               uprobe = rb_entry(n, struct uprobe, rb_node);
> > +               if (match_inode(uprobe, inode, &n)) {
> > +                       list_add(&uprobe->pending_list, tmp_list);
> > +                       search_within_subtree(n, inode, tmp_list);
> > +                       break;
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&treelock, flags);
> > +}
> > +
> > +static void __search_within_subtree(struct rb_node *p, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct uprobe *uprobe;
> > +
> > +       uprobe = rb_entry(p, struct uprobe, rb_node);
> > +       if (match_inode(uprobe, inode, &p)) {
> > +               list_add(&uprobe->pending_list, tmp_list);
> > +               search_within_subtree(p, inode, tmp_list);
> > +       }
> > +
> > +
> > +}
> > +
> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct rb_node *p;
> > +
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +
> > +       p = n->rb_right;
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +} 
> 
> Whee recursion FTW!, you just blew your kernel stack :-)
> 
> Since you sort inode first, offset second, I think you can simply look
> for the first matching inode entry and simply rb_next() until you don't
> match.

Agree that we should get rid of recursion.

I dont think we can simply use rb_next() once we have the first
matching function. There could be a matching inode but a smaller
offset in left that will be missed by rb_next(). (Unless I have
misunderstood rb_next() !!!)

Here are the ways I think we can workaround.
A. change the match_inode() logic to use rb_first/rb_next.
This would make negate the benefit we get from rb_trees because we
have to match every node. Also match_offset might get a little tricky.

B. use the current match_inode but change the search_within_subtree
logic. search_within_subtree() would first find the leftmode node
within the subtree that still has the same inode. Thereafter it will use
rb_next().

Do you have any other ideas?

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2011-01-26 10:17         ` Peter Zijlstra
@ 2011-01-26 15:14           ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:17:11]:

> On Wed, 2011-01-26 at 14:22 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:
> > 
> > > On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > > > +               down_read(&mm->mmap_sem);
> > > > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > > +                       if (!valid_vma(vma))
> > > > +                               continue;
> > > > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > > > +                               continue;
> > > > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > > > +                                       probept - vma->vm_start);
> > > > +                       if (u)
> > > > +                               break;
> > > > +               }
> > > > +               up_read(&mm->mmap_sem); 
> > > 
> > > One has to ask, what's wrong with find_vma() ?
> > 
> > Are you looking for something like this.
> > 
> >        down_read(&mm->mmap_sem);
> > 	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
> > 	       if (!valid_vma(vma))
> > 		       continue;
> > 	       u = find_uprobe(vma->vm_file->f_mapping->host,
> > 			       probept - vma->vm_start);
> > 	       if (u)
> > 		       break;
> >        }
> >        up_read(&mm->mmap_sem); 
> 
> How could you ever need to iterate here? There is only a single vma that
> covers the probe point, if that doesn't find a uprobe, there isn't any.

Agree.
So it simplifies to 

        down_read(&mm->mmap_sem);
 	vma = find_vma(mm, probept);
        if (valid_vma(vma)) {
 	       u = find_uprobe(vma->vm_file->f_mapping->host,
 			       probept - vma->vm_start);
        }
        up_read(&mm->mmap_sem); 

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
@ 2011-01-26 15:14           ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:17:11]:

> On Wed, 2011-01-26 at 14:22 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:19]:
> > 
> > > On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > > > +               down_read(&mm->mmap_sem);
> > > > +               for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > > +                       if (!valid_vma(vma))
> > > > +                               continue;
> > > > +                       if (probept < vma->vm_start || probept > vma->vm_end)
> > > > +                               continue;
> > > > +                       u = find_uprobe(vma->vm_file->f_mapping->host,
> > > > +                                       probept - vma->vm_start);
> > > > +                       if (u)
> > > > +                               break;
> > > > +               }
> > > > +               up_read(&mm->mmap_sem); 
> > > 
> > > One has to ask, what's wrong with find_vma() ?
> > 
> > Are you looking for something like this.
> > 
> >        down_read(&mm->mmap_sem);
> > 	for (vma = find_vma(mm, probept); ; vma = vma->vm_next) {
> > 	       if (!valid_vma(vma))
> > 		       continue;
> > 	       u = find_uprobe(vma->vm_file->f_mapping->host,
> > 			       probept - vma->vm_start);
> > 	       if (u)
> > 		       break;
> >        }
> >        up_read(&mm->mmap_sem); 
> 
> How could you ever need to iterate here? There is only a single vma that
> covers the probe point, if that doesn't find a uprobe, there isn't any.

Agree.
So it simplifies to 

        down_read(&mm->mmap_sem);
 	vma = find_vma(mm, probept);
        if (valid_vma(vma)) {
 	       u = find_uprobe(vma->vm_file->f_mapping->host,
 			       probept - vma->vm_start);
        }
        up_read(&mm->mmap_sem); 

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26 14:59           ` Srikar Dronamraju
@ 2011-01-26 15:16             ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:16 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:29 +0530, Srikar Dronamraju wrote:
> list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
>                 down_read(&mm->map_sem);
>                 if (!install_uprobe(mm, uprobe))
>                         ret = 0;
>                 up_read(&mm->map_sem);
>                 list_del(&mm->uprobes_list);
>                 mmput(mm);
> } 

and the tmp_list thing works because new mm's will hit the mmap callback
and you cannot loose mm's due to the refcount, right?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 15:16             ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:16 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:29 +0530, Srikar Dronamraju wrote:
> list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
>                 down_read(&mm->map_sem);
>                 if (!install_uprobe(mm, uprobe))
>                         ret = 0;
>                 up_read(&mm->map_sem);
>                 list_del(&mm->uprobes_list);
>                 mmput(mm);
> } 

and the tmp_list thing works because new mm's will hit the mmap callback
and you cannot loose mm's due to the refcount, right?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-26 10:14       ` Peter Zijlstra
@ 2011-01-26 15:18         ` Srikar Dronamraju
  2011-01-26 15:33           ` Peter Zijlstra
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:14:07]:

> On Wed, 2011-01-26 at 14:15 +0530, Srikar Dronamraju wrote:
> > 
> > 
> > Okay, Will do, but Is there a reason for moving the fvalue out of the
> > uprobe_consumer? Except for reducing the size of the structure, I am
> > unable to see advantage. 
> 
> That's about it, and its the normal way to do such things in kernel
> space.

But the disadvantage would be we wont be able to share the filter
functions. Currently i had one patch that implemented the common
filter functions that tracers could reuse.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26 15:09       ` Srikar Dronamraju
@ 2011-01-26 15:20         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:39 +0530, Srikar Dronamraju wrote:
> 
> B. use the current match_inode but change the search_within_subtree
> logic. search_within_subtree() would first find the leftmode node
> within the subtree that still has the same inode. Thereafter it will use
> rb_next().
> 
> Do you have any other ideas? 

Look for the right inode but with offset 0, that should get you the
leftmost matching inode, or the entry left of that (depending on how you
build the tree), after that you should be able to iterate all probes of
that inode by using rb_next.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 15:20         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:39 +0530, Srikar Dronamraju wrote:
> 
> B. use the current match_inode but change the search_within_subtree
> logic. search_within_subtree() would first find the leftmode node
> within the subtree that still has the same inode. Thereafter it will use
> rb_next().
> 
> Do you have any other ideas? 

Look for the right inode but with offset 0, that should get you the
leftmost matching inode, or the entry left of that (depending on how you
build the tree), after that you should be able to iterate all probes of
that inode by using rb_next.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
  2011-01-26 15:14           ` Srikar Dronamraju
@ 2011-01-26 15:29             ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:44 +0530, Srikar Dronamraju wrote:
> So it simplifies to 
> 
>         down_read(&mm->mmap_sem);
>         vma = find_vma(mm, probept);
>         if (valid_vma(vma)) {
>                u = find_uprobe(vma->vm_file->f_mapping->host,
>                                probept - vma->vm_start);
>         }
>         up_read(&mm->mmap_sem); 

Almost, the offset within a file is something like:

  (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT)



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception.
@ 2011-01-26 15:29             ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 20:44 +0530, Srikar Dronamraju wrote:
> So it simplifies to 
> 
>         down_read(&mm->mmap_sem);
>         vma = find_vma(mm, probept);
>         if (valid_vma(vma)) {
>                u = find_uprobe(vma->vm_file->f_mapping->host,
>                                probept - vma->vm_start);
>         }
>         up_read(&mm->mmap_sem); 

Almost, the offset within a file is something like:

  (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26 10:11         ` Peter Zijlstra
@ 2011-01-26 15:30           ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:

> On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > 
> > > > +
> > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > +       }
> > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > 
> > > Both this and unregister are racy, what is to say:
> > >  - the vma didn't get removed from the mm
> > >  - no new matching vma got added
> > > 
> > 
> > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > another register_uprobe is working with a vma.
> > 
> > If I am missing something elementary, please explain a bit more.
> 
> afaict you're not holding the mmap_sem, so userspace can simply unmap
> the vma.

When we do the actual insert/remove of the breakpoint we hold the
mmap_sem. During the actual insertion/removal, if the vma for the
specific inode is not found, we just come out without doing the
actual insertion/deletion.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 15:30           ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:

> On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > 
> > > > +
> > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > +       }
> > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > 
> > > Both this and unregister are racy, what is to say:
> > >  - the vma didn't get removed from the mm
> > >  - no new matching vma got added
> > > 
> > 
> > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > another register_uprobe is working with a vma.
> > 
> > If I am missing something elementary, please explain a bit more.
> 
> afaict you're not holding the mmap_sem, so userspace can simply unmap
> the vma.

When we do the actual insert/remove of the breakpoint we hold the
mmap_sem. During the actual insertion/removal, if the vma for the
specific inode is not found, we just come out without doing the
actual insertion/deletion.

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 4/20]  4: uprobes: Adding and remove a uprobe in a rb tree.
  2011-01-26 15:18         ` Srikar Dronamraju
@ 2011-01-26 15:33           ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:33 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

On Wed, 2011-01-26 at 20:48 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:14:07]:
> 
> > On Wed, 2011-01-26 at 14:15 +0530, Srikar Dronamraju wrote:
> > > 
> > > 
> > > Okay, Will do, but Is there a reason for moving the fvalue out of the
> > > uprobe_consumer? Except for reducing the size of the structure, I am
> > > unable to see advantage. 
> > 
> > That's about it, and its the normal way to do such things in kernel
> > space.
> 
> But the disadvantage would be we wont be able to share the filter
> functions. Currently i had one patch that implemented the common
> filter functions that tracers could reuse.

But you could still do that, just make then use something like:

struct uprobe_simple_consumer {
	struct uprobe_consumer consumer;
	unsigned long value;
};




^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26 15:30           ` Srikar Dronamraju
@ 2011-01-26 15:45             ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 21:00 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:
> 
> > On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > > 
> > > > > +
> > > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > > +       }
> > > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > > 
> > > > Both this and unregister are racy, what is to say:
> > > >  - the vma didn't get removed from the mm
> > > >  - no new matching vma got added
> > > > 
> > > 
> > > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > > another register_uprobe is working with a vma.
> > > 
> > > If I am missing something elementary, please explain a bit more.
> > 
> > afaict you're not holding the mmap_sem, so userspace can simply unmap
> > the vma.
> 
> When we do the actual insert/remove of the breakpoint we hold the
> mmap_sem. During the actual insertion/removal, if the vma for the
> specific inode is not found, we just come out without doing the
> actual insertion/deletion.

Right, but then install_uprobe() should:

 - lookup the vma relating to the address you stored,
 - validate that the vma is indeed a map of the right inode
 - validate that the offset of the probe corresponds with the stored
address

Otherwise you can race with unmap/map and end up installing the probe in
a random location.

Also, I think the whole thing goes funny if someone maps the same text
twice ;-)

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 15:45             ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 15:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 21:00 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:
> 
> > On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > > 
> > > > > +
> > > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > > +       }
> > > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > > 
> > > > Both this and unregister are racy, what is to say:
> > > >  - the vma didn't get removed from the mm
> > > >  - no new matching vma got added
> > > > 
> > > 
> > > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > > another register_uprobe is working with a vma.
> > > 
> > > If I am missing something elementary, please explain a bit more.
> > 
> > afaict you're not holding the mmap_sem, so userspace can simply unmap
> > the vma.
> 
> When we do the actual insert/remove of the breakpoint we hold the
> mmap_sem. During the actual insertion/removal, if the vma for the
> specific inode is not found, we just come out without doing the
> actual insertion/deletion.

Right, but then install_uprobe() should:

 - lookup the vma relating to the address you stored,
 - validate that the vma is indeed a map of the right inode
 - validate that the offset of the probe corresponds with the stored
address

Otherwise you can race with unmap/map and end up installing the probe in
a random location.

Also, I think the whole thing goes funny if someone maps the same text
twice ;-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26 15:16             ` Peter Zijlstra
@ 2011-01-26 16:30               ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 16:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 16:16:49]:

> On Wed, 2011-01-26 at 20:29 +0530, Srikar Dronamraju wrote:
> > list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> >                 down_read(&mm->map_sem);
> >                 if (!install_uprobe(mm, uprobe))
> >                         ret = 0;
> >                 up_read(&mm->map_sem);
> >                 list_del(&mm->uprobes_list);
> >                 mmput(mm);
> > } 
> 
> and the tmp_list thing works because new mm's will hit the mmap callback
> and you cannot loose mm's due to the refcount, right?
> 

Right, In other words, the tmp_list has all mm's that have already
running and have this inode mapped as executable text. Those process
that are yet to start or yet to map the inode as executable text
will hit mmap and then we look at inserting the probes thro
uprobes_mmap. 

-- 
Thanks and Regards
Srikar
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
@ 2011-01-26 16:30               ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 16:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 16:16:49]:

> On Wed, 2011-01-26 at 20:29 +0530, Srikar Dronamraju wrote:
> > list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> >                 down_read(&mm->map_sem);
> >                 if (!install_uprobe(mm, uprobe))
> >                         ret = 0;
> >                 up_read(&mm->map_sem);
> >                 list_del(&mm->uprobes_list);
> >                 mmput(mm);
> > } 
> 
> and the tmp_list thing works because new mm's will hit the mmap callback
> and you cannot loose mm's due to the refcount, right?
> 

Right, In other words, the tmp_list has all mm's that have already
running and have this inode mapped as executable text. Those process
that are yet to start or yet to map the inode as executable text
will hit mmap and then we look at inserting the probes thro
uprobes_mmap. 

-- 
Thanks and Regards
Srikar
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26 15:45             ` Peter Zijlstra
@ 2011-01-26 16:56               ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 16:45:56]:

> On Wed, 2011-01-26 at 21:00 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:
> > 
> > > On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > > > 
> > > > > > +
> > > > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > > > +       }
> > > > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > > > 
> > > > > Both this and unregister are racy, what is to say:
> > > > >  - the vma didn't get removed from the mm
> > > > >  - no new matching vma got added
> > > > > 
> > > > 
> > > > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > > > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > > > another register_uprobe is working with a vma.
> > > > 
> > > > If I am missing something elementary, please explain a bit more.
> > > 
> > > afaict you're not holding the mmap_sem, so userspace can simply unmap
> > > the vma.
> > 
> > When we do the actual insert/remove of the breakpoint we hold the
> > mmap_sem. During the actual insertion/removal, if the vma for the
> > specific inode is not found, we just come out without doing the
> > actual insertion/deletion.
> 
> Right, but then install_uprobe() should:
> 
>  - lookup the vma relating to the address you stored,

We already do this thro get_user_pages in write_opcode().

>  - validate that the vma is indeed a map of the right inode

We can add a check in write_opcode( we need to pass the inode to
write_opcode).

>  - validate that the offset of the probe corresponds with the stored
> address

I am not clear on this. We would have derived the address from the
offset. So is that we check for
 (vaddr == vma->vm_start + uprobe->offset)

> 
> Otherwise you can race with unmap/map and end up installing the probe in
> a random location.
> 
> Also, I think the whole thing goes funny if someone maps the same text
> twice ;-)

I am not sure if we can map the same text twice. If something like
this is possible then we would have 2 addresses for each function.
So how does the linker know which address to jump to out of the 2 or
multiple matching addresses. What would be the usecases for same
text being mapped multiple times and both being executable?

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 16:56               ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-26 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 16:45:56]:

> On Wed, 2011-01-26 at 21:00 +0530, Srikar Dronamraju wrote:
> > * Peter Zijlstra <peterz@infradead.org> [2011-01-26 11:11:48]:
> > 
> > > On Wed, 2011-01-26 at 13:25 +0530, Srikar Dronamraju wrote:
> > > > 
> > > > > > +
> > > > > > +               list_add(&mm->uprobes_list, &tmp_list);
> > > > > > +               mm->uprobes_vaddr = vma->vm_start + offset;
> > > > > > +       }
> > > > > > +       spin_unlock(&mapping->i_mmap_lock);
> > > > > 
> > > > > Both this and unregister are racy, what is to say:
> > > > >  - the vma didn't get removed from the mm
> > > > >  - no new matching vma got added
> > > > > 
> > > > 
> > > > register_uprobe, unregister_uprobe, uprobe_mmap are all synchronized by
> > > > uprobes_mutex. So I dont see one unregister_uprobe getting thro when
> > > > another register_uprobe is working with a vma.
> > > > 
> > > > If I am missing something elementary, please explain a bit more.
> > > 
> > > afaict you're not holding the mmap_sem, so userspace can simply unmap
> > > the vma.
> > 
> > When we do the actual insert/remove of the breakpoint we hold the
> > mmap_sem. During the actual insertion/removal, if the vma for the
> > specific inode is not found, we just come out without doing the
> > actual insertion/deletion.
> 
> Right, but then install_uprobe() should:
> 
>  - lookup the vma relating to the address you stored,

We already do this thro get_user_pages in write_opcode().

>  - validate that the vma is indeed a map of the right inode

We can add a check in write_opcode( we need to pass the inode to
write_opcode).

>  - validate that the offset of the probe corresponds with the stored
> address

I am not clear on this. We would have derived the address from the
offset. So is that we check for
 (vaddr == vma->vm_start + uprobe->offset)

> 
> Otherwise you can race with unmap/map and end up installing the probe in
> a random location.
> 
> Also, I think the whole thing goes funny if someone maps the same text
> twice ;-)

I am not sure if we can map the same text twice. If something like
this is possible then we would have 2 addresses for each function.
So how does the linker know which address to jump to out of the 2 or
multiple matching addresses. What would be the usecases for same
text being mapped multiple times and both being executable?

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26 16:56               ` Srikar Dronamraju
@ 2011-01-26 17:12                 ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 17:12 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 22:26 +0530, Srikar Dronamraju wrote:

> >  - lookup the vma relating to the address you stored,
> 
> We already do this thro get_user_pages in write_opcode().

Ah, I didn't read that far..

> >  - validate that the vma is indeed a map of the right inode
> 
> We can add a check in write_opcode( we need to pass the inode to
> write_opcode).

sure..

> >  - validate that the offset of the probe corresponds with the stored
> > address
> 
> I am not clear on this. We would have derived the address from the
> offset. So is that we check for
>  (vaddr == vma->vm_start + uprobe->offset)

Sure, but the vma might have changed since you computed the offset -)

> > 
> > Otherwise you can race with unmap/map and end up installing the probe in
> > a random location.
> > 
> > Also, I think the whole thing goes funny if someone maps the same text
> > twice ;-)
> 
> I am not sure if we can map the same text twice. If something like
> this is possible then we would have 2 addresses for each function.
> So how does the linker know which address to jump to out of the 2 or
> multiple matching addresses. What would be the usecases for same
> text being mapped multiple times and both being executable?

You can, if only to wreck your thing, you can call mmap() as often as
you like (until your virtual memory space runs out) and get many many
mapping of the same file.

It doesn't need to make sense to the linker, all it needs to do is
confuse your code ;-)

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-26 17:12                 ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-26 17:12 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Wed, 2011-01-26 at 22:26 +0530, Srikar Dronamraju wrote:

> >  - lookup the vma relating to the address you stored,
> 
> We already do this thro get_user_pages in write_opcode().

Ah, I didn't read that far..

> >  - validate that the vma is indeed a map of the right inode
> 
> We can add a check in write_opcode( we need to pass the inode to
> write_opcode).

sure..

> >  - validate that the offset of the probe corresponds with the stored
> > address
> 
> I am not clear on this. We would have derived the address from the
> offset. So is that we check for
>  (vaddr == vma->vm_start + uprobe->offset)

Sure, but the vma might have changed since you computed the offset -)

> > 
> > Otherwise you can race with unmap/map and end up installing the probe in
> > a random location.
> > 
> > Also, I think the whole thing goes funny if someone maps the same text
> > twice ;-)
> 
> I am not sure if we can map the same text twice. If something like
> this is possible then we would have 2 addresses for each function.
> So how does the linker know which address to jump to out of the 2 or
> multiple matching addresses. What would be the usecases for same
> text being mapped multiple times and both being executable?

You can, if only to wreck your thing, you can call mmap() as often as
you like (until your virtual memory space runs out) and get many many
mapping of the same file.

It doesn't need to make sense to the linker, all it needs to do is
confuse your code ;-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes.
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-27  6:50     ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27  6:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, LKML, SystemTap, Linux-mm, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, Andrew Morton,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:18]:

> On Thu, 2010-12-16 at 15:30 +0530, Srikar Dronamraju wrote:
> > Uprobe needs to be intimated on int3 and singlestep exceptions.
> > Hence uprobes registers a die notifier so that its notified of the events.
> 
> Why isn't this part of the previous patch? This splitup really doesn't
> make sense.

The die notifier which is introduced in patch 15 is arch dependent
(i.e x86). Whereever possible I have kept the arch dependent parts
separate from the arch independent parts so that porting to
another architecture becomes easier and clear.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-25 13:56   ` Peter Zijlstra
@ 2011-01-27  9:40     ` Srikar Dronamraju
  2011-01-27 10:22       ` Peter Zijlstra
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Roland McGrath, Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:22]:

> On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > 
> > +void arch_uprobe_enable_sstep(struct pt_regs *regs)
> > +{
> > +       /*
> > +        * Enable single-stepping by
> > +        * - Set TF on stack
> > +        * - Set TIF_SINGLESTEP: Guarantees that TF is set when
> > +        *      returning to user mode.
> > +        *  - Indicate that TF is set by us.
> > +        */
> > +       regs->flags |= X86_EFLAGS_TF;
> > +       set_thread_flag(TIF_SINGLESTEP);
> > +       set_thread_flag(TIF_FORCED_TF);
> > +}
> > +
> > +void arch_uprobe_disable_sstep(struct pt_regs *regs)
> > +{
> > +       /* Disable single-stepping by clearing what we set */
> > +       clear_thread_flag(TIF_SINGLESTEP);
> > +       clear_thread_flag(TIF_FORCED_TF);
> > +       regs->flags &= ~X86_EFLAGS_TF;
> > +} 
> 
> Why not use the code from arch/x86/kernel/step.c?

user_enable_single_step and user_disable_single_step that are
defined in arch/x86/kernel/step.c cant be called in interrupt context.

Initially we were looking at enabling/disabling singlestep in
interrupt context. Even now we disable singlestep in post notifier in
interrupt context.

Though  arch/x86/kernel/step.c has a static function
enable_single_step which is identical to arch_uprobe_enable_sstep;
there is no equivalent function for arch_uprobe_disable_sstep.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-26 17:12                 ` Peter Zijlstra
@ 2011-01-27 10:01                   ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27 10:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 18:12:29]:

> On Wed, 2011-01-26 at 22:26 +0530, Srikar Dronamraju wrote:
> 
> > >  - lookup the vma relating to the address you stored,
> > 
> > We already do this thro get_user_pages in write_opcode().
> 
> Ah, I didn't read that far..
> 
> > >  - validate that the vma is indeed a map of the right inode
> > 
> > We can add a check in write_opcode( we need to pass the inode to
> > write_opcode).
> 
> sure..
> 
> > >  - validate that the offset of the probe corresponds with the stored
> > > address
> > 
> > I am not clear on this. We would have derived the address from the
> > offset. So is that we check for
> >  (vaddr == vma->vm_start + uprobe->offset)
> 
> Sure, but the vma might have changed since you computed the offset -)

If the vma has changed then it would fail the 2nd validation i.e vma
corresponds to the uprobe inode right. If the vma was unmapped and
mapped back at the same place, then I guess we are okay to probe.

> 
> > > 
> > > Otherwise you can race with unmap/map and end up installing the probe in
> > > a random location.
> > > 
> > > Also, I think the whole thing goes funny if someone maps the same text
> > > twice ;-)
> > 
> > I am not sure if we can map the same text twice. If something like
> > this is possible then we would have 2 addresses for each function.
> > So how does the linker know which address to jump to out of the 2 or
> > multiple matching addresses. What would be the usecases for same
> > text being mapped multiple times and both being executable?
> 
> You can, if only to wreck your thing, you can call mmap() as often as
> you like (until your virtual memory space runs out) and get many many
> mapping of the same file.
> 
> It doesn't need to make sense to the linker, all it needs to do is
> confuse your code ;-)

Currently if there are multiple mappings of the same executable
code, only one mapped area would have the breakpoint inserted.

If the code were to execute from some other mapping, then it would
work as if there are no probes.  However if the code from the
mapping that had the breakpoint executes then we would see the
probes.

If we want to insert breakpoints in each of the maps then we
would have to extend mm->uprobes_vaddr.

Do you have any other ideas to tackle this?
Infact do you think we should be handling this case?

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-27 10:01                   ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27 10:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-26 18:12:29]:

> On Wed, 2011-01-26 at 22:26 +0530, Srikar Dronamraju wrote:
> 
> > >  - lookup the vma relating to the address you stored,
> > 
> > We already do this thro get_user_pages in write_opcode().
> 
> Ah, I didn't read that far..
> 
> > >  - validate that the vma is indeed a map of the right inode
> > 
> > We can add a check in write_opcode( we need to pass the inode to
> > write_opcode).
> 
> sure..
> 
> > >  - validate that the offset of the probe corresponds with the stored
> > > address
> > 
> > I am not clear on this. We would have derived the address from the
> > offset. So is that we check for
> >  (vaddr == vma->vm_start + uprobe->offset)
> 
> Sure, but the vma might have changed since you computed the offset -)

If the vma has changed then it would fail the 2nd validation i.e vma
corresponds to the uprobe inode right. If the vma was unmapped and
mapped back at the same place, then I guess we are okay to probe.

> 
> > > 
> > > Otherwise you can race with unmap/map and end up installing the probe in
> > > a random location.
> > > 
> > > Also, I think the whole thing goes funny if someone maps the same text
> > > twice ;-)
> > 
> > I am not sure if we can map the same text twice. If something like
> > this is possible then we would have 2 addresses for each function.
> > So how does the linker know which address to jump to out of the 2 or
> > multiple matching addresses. What would be the usecases for same
> > text being mapped multiple times and both being executable?
> 
> You can, if only to wreck your thing, you can call mmap() as often as
> you like (until your virtual memory space runs out) and get many many
> mapping of the same file.
> 
> It doesn't need to make sense to the linker, all it needs to do is
> confuse your code ;-)

Currently if there are multiple mappings of the same executable
code, only one mapped area would have the breakpoint inserted.

If the code were to execute from some other mapping, then it would
work as if there are no probes.  However if the code from the
mapping that had the breakpoint executes then we would see the
probes.

If we want to insert breakpoints in each of the maps then we
would have to extend mm->uprobes_vaddr.

Do you have any other ideas to tackle this?
Infact do you think we should be handling this case?

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-27  9:40     ` Srikar Dronamraju
@ 2011-01-27 10:22       ` Peter Zijlstra
  2011-01-27 19:11         ` Roland McGrath
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:22 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Arnaldo Carvalho de Melo,
	Linus Torvalds, Masami Hiramatsu, Christoph Hellwig, Andi Kleen,
	Oleg Nesterov, Andrew Morton, SystemTap, Jim Keniston,
	Frederic Weisbecker, Ananth N Mavinakayanahalli, LKML,
	Roland McGrath, Paul E. McKenney

On Thu, 2011-01-27 at 15:10 +0530, Srikar Dronamraju wrote:
> * Peter Zijlstra <peterz@infradead.org> [2011-01-25 14:56:22]:
> 
> > On Thu, 2010-12-16 at 15:29 +0530, Srikar Dronamraju wrote:
> > > 
> > > +void arch_uprobe_enable_sstep(struct pt_regs *regs)
> > > +{
> > > +       /*
> > > +        * Enable single-stepping by
> > > +        * - Set TF on stack
> > > +        * - Set TIF_SINGLESTEP: Guarantees that TF is set when
> > > +        *      returning to user mode.
> > > +        *  - Indicate that TF is set by us.
> > > +        */
> > > +       regs->flags |= X86_EFLAGS_TF;
> > > +       set_thread_flag(TIF_SINGLESTEP);
> > > +       set_thread_flag(TIF_FORCED_TF);
> > > +}
> > > +
> > > +void arch_uprobe_disable_sstep(struct pt_regs *regs)
> > > +{
> > > +       /* Disable single-stepping by clearing what we set */
> > > +       clear_thread_flag(TIF_SINGLESTEP);
> > > +       clear_thread_flag(TIF_FORCED_TF);
> > > +       regs->flags &= ~X86_EFLAGS_TF;
> > > +} 
> > 
> > Why not use the code from arch/x86/kernel/step.c?
> 
> user_enable_single_step and user_disable_single_step that are
> defined in arch/x86/kernel/step.c cant be called in interrupt context.

Right, because of is_setting_trap_flag()..

> Initially we were looking at enabling/disabling singlestep in
> interrupt context. Even now we disable singlestep in post notifier in
> interrupt context.
> 
> Though  arch/x86/kernel/step.c has a static function
> enable_single_step which is identical to arch_uprobe_enable_sstep;
> there is no equivalent function for arch_uprobe_disable_sstep.

Its not even close to identical, its very careful to deal with user-mode
already doing single step.

But I'll leave this to the x86 people who actually know the intricacies
of the single step cruft, I was just wondering why you weren't using (or
extending) the existing code.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-27 10:01                   ` Srikar Dronamraju
@ 2011-01-27 10:23                     ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> > > >  - validate that the vma is indeed a map of the right inode
> > > 
> > > We can add a check in write_opcode( we need to pass the inode to
> > > write_opcode).
> > 
> > sure..
> > 
> > > >  - validate that the offset of the probe corresponds with the stored
> > > > address
> > > 
> > > I am not clear on this. We would have derived the address from the
> > > offset. So is that we check for
> > >  (vaddr == vma->vm_start + uprobe->offset)
> > 
> > Sure, but the vma might have changed since you computed the offset -)
> 
> If the vma has changed then it would fail the 2nd validation i.e vma
> corresponds to the uprobe inode right. If the vma was unmapped and
> mapped back at the same place, then I guess we are okay to probe.

It can be unmapped and mapped back slightly different. A map of the same
file doesn't need to mean its in the exact same location or has the
exact same pgoffset.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-27 10:23                     ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> > > >  - validate that the vma is indeed a map of the right inode
> > > 
> > > We can add a check in write_opcode( we need to pass the inode to
> > > write_opcode).
> > 
> > sure..
> > 
> > > >  - validate that the offset of the probe corresponds with the stored
> > > > address
> > > 
> > > I am not clear on this. We would have derived the address from the
> > > offset. So is that we check for
> > >  (vaddr == vma->vm_start + uprobe->offset)
> > 
> > Sure, but the vma might have changed since you computed the offset -)
> 
> If the vma has changed then it would fail the 2nd validation i.e vma
> corresponds to the uprobe inode right. If the vma was unmapped and
> mapped back at the same place, then I guess we are okay to probe.

It can be unmapped and mapped back slightly different. A map of the same
file doesn't need to mean its in the exact same location or has the
exact same pgoffset.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-27 10:23                     ` Peter Zijlstra
@ 2011-01-27 10:25                       ` Srikar Dronamraju
  -1 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-27 11:23:37]:

> On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> > > > >  - validate that the vma is indeed a map of the right inode
> > > > 
> > > > We can add a check in write_opcode( we need to pass the inode to
> > > > write_opcode).
> > > 
> > > sure..
> > > 
> > > > >  - validate that the offset of the probe corresponds with the stored
> > > > > address
> > > > 
> > > > I am not clear on this. We would have derived the address from the
> > > > offset. So is that we check for
> > > >  (vaddr == vma->vm_start + uprobe->offset)
> > > 
> > > Sure, but the vma might have changed since you computed the offset -)
> > 
> > If the vma has changed then it would fail the 2nd validation i.e vma
> > corresponds to the uprobe inode right. If the vma was unmapped and
> > mapped back at the same place, then I guess we are okay to probe.
> 
> It can be unmapped and mapped back slightly different. A map of the same
> file doesn't need to mean its in the exact same location or has the
> exact same pgoffset.
> 
> 

If its not at the exact same location, then our third validation of
checking that (vaddr == vma->vm_start + uprobe->offset)  should fail
right?

Also should it be (vaddr == uprobe->offset + vma->vm_start -
vma->pgoff << PAGE_SHIFT) ?

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-27 10:25                       ` Srikar Dronamraju
  0 siblings, 0 replies; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-27 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

* Peter Zijlstra <peterz@infradead.org> [2011-01-27 11:23:37]:

> On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> > > > >  - validate that the vma is indeed a map of the right inode
> > > > 
> > > > We can add a check in write_opcode( we need to pass the inode to
> > > > write_opcode).
> > > 
> > > sure..
> > > 
> > > > >  - validate that the offset of the probe corresponds with the stored
> > > > > address
> > > > 
> > > > I am not clear on this. We would have derived the address from the
> > > > offset. So is that we check for
> > > >  (vaddr == vma->vm_start + uprobe->offset)
> > > 
> > > Sure, but the vma might have changed since you computed the offset -)
> > 
> > If the vma has changed then it would fail the 2nd validation i.e vma
> > corresponds to the uprobe inode right. If the vma was unmapped and
> > mapped back at the same place, then I guess we are okay to probe.
> 
> It can be unmapped and mapped back slightly different. A map of the same
> file doesn't need to mean its in the exact same location or has the
> exact same pgoffset.
> 
> 

If its not at the exact same location, then our third validation of
checking that (vaddr == vma->vm_start + uprobe->offset)  should fail
right?

Also should it be (vaddr == uprobe->offset + vma->vm_start -
vma->pgoff << PAGE_SHIFT) ?

-- 
Thanks and Regards
Srikar

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-27 10:01                   ` Srikar Dronamraju
@ 2011-01-27 10:29                     ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> 
> > You can, if only to wreck your thing, you can call mmap() as often as
> > you like (until your virtual memory space runs out) and get many many
> > mapping of the same file.
> > 
> > It doesn't need to make sense to the linker, all it needs to do is
> > confuse your code ;-)
> 
> Currently if there are multiple mappings of the same executable
> code, only one mapped area would have the breakpoint inserted.

Right, so you could use it to make debugging harder..

> If the code were to execute from some other mapping, then it would
> work as if there are no probes.  However if the code from the
> mapping that had the breakpoint executes then we would see the
> probes.
> 
> If we want to insert breakpoints in each of the maps then we
> would have to extend mm->uprobes_vaddr.
> 
> Do you have any other ideas to tackle this?

Supposing I can get my preemptible mmu patches anywhere.. you could
simply call install_uprobe() while holding the i_mmap_mutex ;-)

> Infact do you think we should be handling this case?

I'm really not sure how often this would happen, but dealing with it
sure makes me feel better..

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-27 10:29                     ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:31 +0530, Srikar Dronamraju wrote:
> 
> > You can, if only to wreck your thing, you can call mmap() as often as
> > you like (until your virtual memory space runs out) and get many many
> > mapping of the same file.
> > 
> > It doesn't need to make sense to the linker, all it needs to do is
> > confuse your code ;-)
> 
> Currently if there are multiple mappings of the same executable
> code, only one mapped area would have the breakpoint inserted.

Right, so you could use it to make debugging harder..

> If the code were to execute from some other mapping, then it would
> work as if there are no probes.  However if the code from the
> mapping that had the breakpoint executes then we would see the
> probes.
> 
> If we want to insert breakpoints in each of the maps then we
> would have to extend mm->uprobes_vaddr.
> 
> Do you have any other ideas to tackle this?

Supposing I can get my preemptible mmu patches anywhere.. you could
simply call install_uprobe() while holding the i_mmap_mutex ;-)

> Infact do you think we should be handling this case?

I'm really not sure how often this would happen, but dealing with it
sure makes me feel better..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
  2011-01-27 10:25                       ` Srikar Dronamraju
@ 2011-01-27 10:41                         ` Peter Zijlstra
  -1 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:41 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:55 +0530, Srikar Dronamraju wrote:
> 
> 
> If its not at the exact same location, then our third validation of
> checking that (vaddr == vma->vm_start + uprobe->offset)  should fail
> right?
> 
> Also should it be (vaddr == uprobe->offset + vma->vm_start -
> vma->pgoff << PAGE_SHIFT) ?

Yeah, although I just realized that ->offset should be a u64, since
pgoff is a unsigned long, we can have files up to 44 bit (assuming the
page-size is 12bits).

But yes, this matches the validation I mentioned.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 5/20]  5: Uprobes: register/unregister probes.
@ 2011-01-27 10:41                         ` Peter Zijlstra
  0 siblings, 0 replies; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-27 10:41 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, Steven Rostedt, Linux-mm, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, Andrew Morton, SystemTap,
	Jim Keniston, Frederic Weisbecker, Andi Kleen, LKML,
	Paul E. McKenney

On Thu, 2011-01-27 at 15:55 +0530, Srikar Dronamraju wrote:
> 
> 
> If its not at the exact same location, then our third validation of
> checking that (vaddr == vma->vm_start + uprobe->offset)  should fail
> right?
> 
> Also should it be (vaddr == uprobe->offset + vma->vm_start -
> vma->pgoff << PAGE_SHIFT) ?

Yeah, although I just realized that ->offset should be a u64, since
pgoff is a unsigned long, we can have files up to 44 bit (assuming the
page-size is 12bits).

But yes, this matches the validation I mentioned.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-26  9:06       ` Srikar Dronamraju
@ 2011-01-27 17:03         ` Steven Rostedt
  2011-01-28  4:53           ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Steven Rostedt @ 2011-01-27 17:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Wed, 2011-01-26 at 14:36 +0530, Srikar Dronamraju wrote:

> > Not to mention that p is uninitialized. Did this code ever work?
> 
> I think the original patch that I sent had p initialized. I think it got
> dropped off by Peter when he replied. Please do confirm.


> +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> +               struct list_head *tmp_list)
> +{
> +       struct rb_node *p;
> +
> +       if (p)
> +               __search_within_subtree(p, inode, tmp_list);
> +
> +       p = n->rb_right;
> +       if (p)
> +               __search_within_subtree(p, inode, tmp_list);
> +}
> +
> 
The above is from the original patch. 'p' does not look initialized to
me.

-- Steve



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-27 10:22       ` Peter Zijlstra
@ 2011-01-27 19:11         ` Roland McGrath
  2011-01-28  4:57           ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Roland McGrath @ 2011-01-27 19:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, Andrew Morton,
	SystemTap, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, LKML, Paul E. McKenney

> But I'll leave this to the x86 people who actually know the intricacies
> of the single step cruft, I was just wondering why you weren't using (or
> extending) the existing code.

The hairy aspects of the step.c code are hairy (and not usable at interrupt
level) because they do some instruction analysis.  Since uprobes already
does its own instruction analysis, reusing step.c's separate hacks makes
less sense to me than integrating knowledge of the single-step vs
pushf/popf issues into the uprobes instruction analysis.

That said, there is further nontriviality just to do with the block-step
support and with not clobbering user-visible usage of TF in eflags, which
uprobes needs to handle as well.  It makes sense to share that code rather
than repeating it, even if that entails changes to the step.c code.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-27 17:03         ` Steven Rostedt
@ 2011-01-28  4:53           ` Srikar Dronamraju
  2011-01-28 13:57             ` Steven Rostedt
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-28  4:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Steven Rostedt <rostedt@goodmis.org> [2011-01-27 12:03:57]:

> On Wed, 2011-01-26 at 14:36 +0530, Srikar Dronamraju wrote:
> 
> > > Not to mention that p is uninitialized. Did this code ever work?
> > 
> > I think the original patch that I sent had p initialized. I think it got
> > dropped off by Peter when he replied. Please do confirm.
> 
> 
> > +static void search_within_subtree(struct rb_node *n, struct inode *inode,
> > +               struct list_head *tmp_list)
> > +{
> > +       struct rb_node *p;
> > +
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +
> > +       p = n->rb_right;
> > +       if (p)
> > +               __search_within_subtree(p, inode, tmp_list);
> > +}
> > +
> > 
> The above is from the original patch. 'p' does not look initialized to
> me.
> 

> -- Steve
> 
> 

Here is the extract from the original patch at
https://lkml.org/lkml/2010/12/16/74 that I sent to LKML and I dont see
p being uninitialized. 

+
+static void search_within_subtree(struct rb_node *n, struct inode *inode,
+		struct list_head *tmp_list)
+{
+	struct rb_node *p;
+
+	p = n->rb_left;
+	if (p)
+		__search_within_subtree(p, inode, tmp_list);
+
+	p = n->rb_right;
+	if (p)
+		__search_within_subtree(p, inode, tmp_list);
+}
+

However I have already agreed to remove this recursion and replace it
with a rb_next() logic.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-27 19:11         ` Roland McGrath
@ 2011-01-28  4:57           ` Srikar Dronamraju
  2011-01-28  6:23             ` Roland McGrath
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-28  4:57 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, Andrew Morton,
	SystemTap, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, LKML, Paul E. McKenney

Hi Roland,

> > But I'll leave this to the x86 people who actually know the intricacies
> > of the single step cruft, I was just wondering why you weren't using (or
> > extending) the existing code.
> 
> The hairy aspects of the step.c code are hairy (and not usable at interrupt
> level) because they do some instruction analysis.  Since uprobes already
> does its own instruction analysis, reusing step.c's separate hacks makes
> less sense to me than integrating knowledge of the single-step vs
> pushf/popf issues into the uprobes instruction analysis.
> 
> That said, there is further nontriviality just to do with the block-step
> support and with not clobbering user-visible usage of TF in eflags, which
> uprobes needs to handle as well.  It makes sense to share that code rather
> than repeating it, even if that entails changes to the step.c code.
> 

Uprobes doesn't request/handle block-step for now. So can we postpone
your suggested changes till uprobes needs to handle block-step?

-- 
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-28  4:57           ` Srikar Dronamraju
@ 2011-01-28  6:23             ` Roland McGrath
  2011-01-28  8:36               ` Peter Zijlstra
  0 siblings, 1 reply; 116+ messages in thread
From: Roland McGrath @ 2011-01-28  6:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, Andrew Morton,
	SystemTap, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, LKML, Paul E. McKenney

> Uprobes doesn't request/handle block-step for now. So can we postpone
> your suggested changes till uprobes needs to handle block-step?

That's not the issue.  The way the hardware works is that if the bit is set
in the MSR, then the TF eflags bit means block-step instead of single-step.
So if PTRACE_SINGLEBLOCK has been used (i.e. user_enable_block_step), then
this can interfere with your use of single-step.  You need to do the work
in the else branch of step.c:enable_step to ensure that the hardware is not
left in the state where it will do block-step instead of single-step when
uprobes wants a single-step done.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-28  6:23             ` Roland McGrath
@ 2011-01-28  8:36               ` Peter Zijlstra
  2011-01-28 18:23                 ` Roland McGrath
  0 siblings, 1 reply; 116+ messages in thread
From: Peter Zijlstra @ 2011-01-28  8:36 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, Andrew Morton,
	SystemTap, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, LKML, Paul E. McKenney

On Thu, 2011-01-27 at 22:23 -0800, Roland McGrath wrote:
> > Uprobes doesn't request/handle block-step for now. So can we postpone
> > your suggested changes till uprobes needs to handle block-step?
> 
> That's not the issue.  The way the hardware works is that if the bit is set
> in the MSR, then the TF eflags bit means block-step instead of single-step.
> So if PTRACE_SINGLEBLOCK has been used (i.e. user_enable_block_step), then
> this can interfere with your use of single-step.  You need to do the work
> in the else branch of step.c:enable_step to ensure that the hardware is not
> left in the state where it will do block-step instead of single-step when
> uprobes wants a single-step done. 

And reset the hardware back to block step when done, and provide the
actual break blockstep would have.

Suppose you hit a breakpoint on the return path while the user it
debugging in blockstep mode, that should all just work.

So there you trap on the return, switch to single step to execute the
return out of line, when done you need to actually break to userspace
since its the end of a block, as well as reset block mode.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-28  4:53           ` Srikar Dronamraju
@ 2011-01-28 13:57             ` Steven Rostedt
  2011-01-28 14:28               ` Steven Rostedt
  0 siblings, 1 reply; 116+ messages in thread
From: Steven Rostedt @ 2011-01-28 13:57 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Fri, 2011-01-28 at 10:23 +0530, Srikar Dronamraju wrote:

> Here is the extract from the original patch at
> https://lkml.org/lkml/2010/12/16/74 that I sent to LKML and I dont see
> p being uninitialized. 

I very much apologize. Seems that Evolution was playing tricks on me.

  http://rostedt.homelinux.com/private/evolution-wtf.png

That's a screen shot of your email in the preview window. Where 'p' is
definitely not initialized. WTF happened to my email?

But all the replies to that email does not have it initialized either ??
I was not the only one affected by this.

-- Steve



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-28 13:57             ` Steven Rostedt
@ 2011-01-28 14:28               ` Steven Rostedt
  2011-01-28 14:46                 ` Srikar Dronamraju
  0 siblings, 1 reply; 116+ messages in thread
From: Steven Rostedt @ 2011-01-28 14:28 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Fri, 2011-01-28 at 08:57 -0500, Steven Rostedt wrote:
> On Fri, 2011-01-28 at 10:23 +0530, Srikar Dronamraju wrote:
> 
> > Here is the extract from the original patch at
> > https://lkml.org/lkml/2010/12/16/74 that I sent to LKML and I dont see
> > p being uninitialized. 
> 
> I very much apologize. Seems that Evolution was playing tricks on me.
> 
>   http://rostedt.homelinux.com/private/evolution-wtf.png
> 
> That's a screen shot of your email in the preview window. Where 'p' is
> definitely not initialized. WTF happened to my email?
> 
> But all the replies to that email does not have it initialized either ??
> I was not the only one affected by this.
> 

Seems to be a bug in Evolution :-p  As both Peter and I tested out this
email in mutt, and that line appears ?? And Peter tried it out in raw
mode (^u) and it shows up there in Evolution.

I know, I know, GUI email clients suck. And for good reason too. I guess
I just spent too many years in the corporate environment to know any
better.

-- Steve



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-28 14:28               ` Steven Rostedt
@ 2011-01-28 14:46                 ` Srikar Dronamraju
  2011-01-28 15:02                   ` Steven Rostedt
  0 siblings, 1 reply; 116+ messages in thread
From: Srikar Dronamraju @ 2011-01-28 14:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

* Steven Rostedt <rostedt@goodmis.org> [2011-01-28 09:28:46]:

> On Fri, 2011-01-28 at 08:57 -0500, Steven Rostedt wrote:
> > On Fri, 2011-01-28 at 10:23 +0530, Srikar Dronamraju wrote:
> > 
> > > Here is the extract from the original patch at
> > > https://lkml.org/lkml/2010/12/16/74 that I sent to LKML and I dont see
> > > p being uninitialized. 
> > 
> > I very much apologize. Seems that Evolution was playing tricks on me.
> > 
> >   http://rostedt.homelinux.com/private/evolution-wtf.png
> > 
> > That's a screen shot of your email in the preview window. Where 'p' is
> > definitely not initialized. WTF happened to my email?
> > 
> > But all the replies to that email does not have it initialized either ??
> > I was not the only one affected by this.
> > 
> 
> Seems to be a bug in Evolution :-p  As both Peter and I tested out this
> email in mutt, and that line appears ?? And Peter tried it out in raw
> mode (^u) and it shows up there in Evolution.
> 
> I know, I know, GUI email clients suck. And for good reason too. I guess
> I just spent too many years in the corporate environment to know any
> better.
> 

Interesting, Nothing you can do if your mail client decides to
delete one arbitrary line. Its Good that you told what you saw and
hence found a bug in evolution.

-- 
Thanks and Regards
Srikar

> -- Steve
> 
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 8/20]  8: uprobes: mmap and fork hooks.
  2011-01-28 14:46                 ` Srikar Dronamraju
@ 2011-01-28 15:02                   ` Steven Rostedt
  0 siblings, 0 replies; 116+ messages in thread
From: Steven Rostedt @ 2011-01-28 15:02 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Linus Torvalds, Ananth N Mavinakayanahalli, Christoph Hellwig,
	Masami Hiramatsu, Oleg Nesterov, LKML, SystemTap, Jim Keniston,
	Frederic Weisbecker, Andi Kleen, Andrew Morton, Paul E. McKenney

On Fri, 2011-01-28 at 20:16 +0530, Srikar Dronamraju wrote:

> Interesting, Nothing you can do if your mail client decides to
> delete one arbitrary line. Its Good that you told what you saw and
> hence found a bug in evolution.
> 

Yep, I'm filling out a BZ report right now.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling
  2011-01-28  8:36               ` Peter Zijlstra
@ 2011-01-28 18:23                 ` Roland McGrath
  0 siblings, 0 replies; 116+ messages in thread
From: Roland McGrath @ 2011-01-28 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, Ingo Molnar, Steven Rostedt,
	Arnaldo Carvalho de Melo, Linus Torvalds, Masami Hiramatsu,
	Christoph Hellwig, Andi Kleen, Oleg Nesterov, Andrew Morton,
	SystemTap, Jim Keniston, Frederic Weisbecker,
	Ananth N Mavinakayanahalli, LKML, Paul E. McKenney

> And reset the hardware back to block step when done, and provide the
> actual break blockstep would have.

Oh, sure, that too.  If you're that ambitious, then the place to start
first is with plain single-step working right.  When TF was already set
(either via user_enable_single_step, so TIF_SINGLESTEP is set, or just from
user mode, so it and TIF_FORCED_TF are not set, but TF is in the user
state's eflags) and you hit a uprobe, then after servicing the uprobe and
stepping over the copied original instruction and restoring the PC to where
it should be, you should let the trap turn into a SIGTRAP as normal rather
than swallowing it.

To support block-step correctly, you have to do something more clever.
If block-step was enabled (TIF_BLOCKSTEP set), then you need to figure
out which of two things is the right one to do.  If the copied original
instruction uprobes just single-stepped over is one that would trigger
block-step, then you should treat it as if plain single-step were
enabled, i.e. let that SIGTRAP go as above.  If not, then you should
swallow the signal, re-enable block-step and set TF (i.e. do the work of
user_enable_block_step) before resuming.  You have to decide which case
it is based on instruction analysis.  If it's a control-flow instruction
(including the syscall instructions), then it would trigger block-step.
IIRC a conditional branch instruction triggers it only if the branch is
taken (check the book), so you have to notice that too.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread, other threads:[~2011-01-28 18:23 UTC | newest]

Thread overview: 116+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-16  9:57 [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju
2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 1/20] 1: mm: Move replace_page() / write_protect_page() to mm/memory.c Srikar Dronamraju
2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 2/20] 2: X86 specific breakpoint definitions Srikar Dronamraju
2010-12-16  9:57 ` [RFC] [PATCH 2.6.37-rc5-tip 3/20] 3: uprobes: Breakground page replacement Srikar Dronamraju
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 4/20] 4: uprobes: Adding and remove a uprobe in a rb tree Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  8:37     ` Srikar Dronamraju
2011-01-26  8:37       ` Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  8:41     ` Srikar Dronamraju
2011-01-26 10:13       ` Peter Zijlstra
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  8:38     ` Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2011-01-26  8:45     ` Srikar Dronamraju
2011-01-26 10:14       ` Peter Zijlstra
2011-01-26 15:18         ` Srikar Dronamraju
2011-01-26 15:33           ` Peter Zijlstra
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 5/20] 5: Uprobes: register/unregister probes Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  7:55     ` Srikar Dronamraju
2011-01-26  7:55       ` Srikar Dronamraju
2011-01-26 10:11       ` Peter Zijlstra
2011-01-26 10:11         ` Peter Zijlstra
2011-01-26 15:30         ` Srikar Dronamraju
2011-01-26 15:30           ` Srikar Dronamraju
2011-01-26 15:45           ` Peter Zijlstra
2011-01-26 15:45             ` Peter Zijlstra
2011-01-26 16:56             ` Srikar Dronamraju
2011-01-26 16:56               ` Srikar Dronamraju
2011-01-26 17:12               ` Peter Zijlstra
2011-01-26 17:12                 ` Peter Zijlstra
2011-01-27 10:01                 ` Srikar Dronamraju
2011-01-27 10:01                   ` Srikar Dronamraju
2011-01-27 10:23                   ` Peter Zijlstra
2011-01-27 10:23                     ` Peter Zijlstra
2011-01-27 10:25                     ` Srikar Dronamraju
2011-01-27 10:25                       ` Srikar Dronamraju
2011-01-27 10:41                       ` Peter Zijlstra
2011-01-27 10:41                         ` Peter Zijlstra
2011-01-27 10:29                   ` Peter Zijlstra
2011-01-27 10:29                     ` Peter Zijlstra
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  7:47     ` Srikar Dronamraju
2011-01-26  7:47       ` Srikar Dronamraju
2011-01-26 10:10       ` Peter Zijlstra
2011-01-26 10:10         ` Peter Zijlstra
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 6/20] 6: x86: analyze instruction and determine fixups Srikar Dronamraju
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 7/20] 7: uprobes: store/restore original instruction Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 8/20] 8: uprobes: mmap and fork hooks Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2011-01-26  9:03     ` Srikar Dronamraju
2011-01-26  9:03       ` Srikar Dronamraju
2011-01-26 10:20       ` Peter Zijlstra
2011-01-26 10:20         ` Peter Zijlstra
2011-01-26 14:59         ` Srikar Dronamraju
2011-01-26 14:59           ` Srikar Dronamraju
2011-01-26 15:16           ` Peter Zijlstra
2011-01-26 15:16             ` Peter Zijlstra
2011-01-26 16:30             ` Srikar Dronamraju
2011-01-26 16:30               ` Srikar Dronamraju
2011-01-25 12:15   ` Peter Zijlstra
2011-01-25 20:05     ` Steven Rostedt
2011-01-26  9:06       ` Srikar Dronamraju
2011-01-27 17:03         ` Steven Rostedt
2011-01-28  4:53           ` Srikar Dronamraju
2011-01-28 13:57             ` Steven Rostedt
2011-01-28 14:28               ` Steven Rostedt
2011-01-28 14:46                 ` Srikar Dronamraju
2011-01-28 15:02                   ` Steven Rostedt
2011-01-26 15:09     ` Srikar Dronamraju
2011-01-26 15:09       ` Srikar Dronamraju
2011-01-26 15:20       ` Peter Zijlstra
2011-01-26 15:20         ` Peter Zijlstra
2010-12-16  9:58 ` [RFC] [PATCH 2.6.37-rc5-tip 9/20] 9: x86: architecture specific task information Srikar Dronamraju
2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 10/20] 10: uprobes: task specific information Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2011-01-25 18:38     ` Josh Stone
2011-01-25 18:55       ` Roland McGrath
2011-01-25 19:56       ` Peter Zijlstra
2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 11/20] 11: uprobes: slot allocation for uprobes Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 12/20] 12: uprobes: get the breakpoint address Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 13/20] 13: x86: x86 specific probe handling Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2011-01-27  9:40     ` Srikar Dronamraju
2011-01-27 10:22       ` Peter Zijlstra
2011-01-27 19:11         ` Roland McGrath
2011-01-28  4:57           ` Srikar Dronamraju
2011-01-28  6:23             ` Roland McGrath
2011-01-28  8:36               ` Peter Zijlstra
2011-01-28 18:23                 ` Roland McGrath
2010-12-16  9:59 ` [RFC] [PATCH 2.6.37-rc5-tip 14/20] 14: uprobes: Handing int3 and singlestep exception Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2011-01-25 13:56   ` Peter Zijlstra
2011-01-26  8:52     ` Srikar Dronamraju
2011-01-26  8:52       ` Srikar Dronamraju
2011-01-26 10:17       ` Peter Zijlstra
2011-01-26 10:17         ` Peter Zijlstra
2011-01-26 15:14         ` Srikar Dronamraju
2011-01-26 15:14           ` Srikar Dronamraju
2011-01-26 15:29           ` Peter Zijlstra
2011-01-26 15:29             ` Peter Zijlstra
2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 15/20] 15: x86: uprobes exception notifier for x86 Srikar Dronamraju
2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 16/20] 16: uprobes: register a notifier for uprobes Srikar Dronamraju
2011-01-25 13:56   ` Peter Zijlstra
2011-01-27  6:50     ` Srikar Dronamraju
2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 17/20] 17: uprobes: filter chain Srikar Dronamraju
2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 18/20] 18: uprobes: commonly used filters Srikar Dronamraju
2010-12-17 19:32   ` Valdis.Kletnieks
2010-12-18  3:04     ` Srikar Dronamraju
2010-12-16 10:00 ` [RFC] [PATCH 2.6.37-rc5-tip 19/20] 19: tracing: Extract out common code for kprobes/uprobes traceevents Srikar Dronamraju
2010-12-16 10:01 ` [RFC] [PATCH 2.6.37-rc5-tip 20/20] 20: tracing: uprobes trace_event interface Srikar Dronamraju
2010-12-16 10:07 ` [RFC] [PATCH 2.6.37-rc5-tip 0/20] 0: Inode based uprobes Srikar Dronamraju

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.