linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC V2] Add gup trace points support
@ 2015-12-02 22:53 Yang Shi
  2015-12-02 22:53 ` [PATCH V2 1/7] trace/events: Add gup trace events Yang Shi
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi


Changelog V1 --> V2:
Adopted commetns from Steven
* remove all reference to tsk->comm since it is unnecessary for non-sched
  trace points
* reduce arguments for __get_user_pages trace point and update mm/gup.c
  accordingly
* Added Ralf's acked-by for patch 4/7.

There is not content change for the trace points in arch specific mm/gup.c.


Some background about why I think this might be useful.

When I was profiling some hugetlb related program, I got page-faults event
doubled when hugetlb is enabled. When I looked into the code, I found page-faults
come from two places, do_page_fault and gup. So, I tried to figure out which
play a role (or both) in my use case. But I can't find existing finer tracing
event for sub page-faults in current mainline kernel.

So, I added the gup trace points support to have finer tracing events for
page-faults. The below events are added:

__get_user_pages
__get_user_pages_fast
fixup_user_fault

Both __get_user_pages and fixup_user_fault call handle_mm_fault.

Just added trace points to raw version __get_user_pages since all variants
will call it finally to do real work.

Although __get_user_pages_fast doesn't call handle_mm_fault, it might be useful
to have it to distinguish between slow and fast version.

Yang Shi (7):
      trace/events: Add gup trace events
      mm/gup: add gup trace points
      x86: mm/gup: add gup trace points
      mips: mm/gup: add gup trace points
      s390: mm/gup: add gup trace points
      sh: mm/gup: add gup trace points
      sparc64: mm/gup: add gup trace points

 arch/mips/mm/gup.c         |  7 +++++++
 arch/s390/mm/gup.c         |  7 +++++++
 arch/sh/mm/gup.c           |  8 ++++++++
 arch/sparc/mm/gup.c        |  8 ++++++++
 arch/x86/mm/gup.c          |  7 +++++++
 include/trace/events/gup.h | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/gup.c                   |  8 ++++++++
 7 files changed, 116 insertions(+)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH V2 1/7] trace/events: Add gup trace events
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-03  4:07   ` Steven Rostedt
  2015-12-02 22:53 ` [PATCH V2 2/7] mm/gup: add gup trace points Yang Shi
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi

page-faults events record the invoke to handle_mm_fault, but the invoke
may come from do_page_fault or gup. In some use cases, the finer event count
mey be needed, so add trace events support for:

__get_user_pages
__get_user_pages_fast
fixup_user_fault

Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 include/trace/events/gup.h

diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
new file mode 100644
index 0000000..03a4674
--- /dev/null
+++ b/include/trace/events/gup.h
@@ -0,0 +1,71 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gup
+
+#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_GUP_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(gup_fixup_user_fault,
+
+	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
+			unsigned long address, unsigned int fault_flags),
+
+	TP_ARGS(tsk, mm, address, fault_flags),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	address		)
+	),
+
+	TP_fast_assign(
+		__entry->address	= address;
+	),
+
+	TP_printk("address=%lx",  __entry->address)
+);
+
+TRACE_EVENT(gup_get_user_pages,
+
+	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
+			unsigned long start, unsigned long nr_pages),
+
+	TP_ARGS(tsk, mm, start, nr_pages),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	start		)
+		__field(	unsigned long,	nr_pages	)
+	),
+
+	TP_fast_assign(
+		__entry->start		= start;
+		__entry->nr_pages	= nr_pages;
+	),
+
+	TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
+);
+
+TRACE_EVENT(gup_get_user_pages_fast,
+
+	TP_PROTO(unsigned long start, int nr_pages, int write,
+			struct page **pages),
+
+	TP_ARGS(start, nr_pages, write, pages),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	start		)
+		__field(	unsigned long,	nr_pages	)
+	),
+
+	TP_fast_assign(
+		__entry->start  	= start;
+		__entry->nr_pages	= nr_pages;
+	),
+
+	TP_printk("start=%lx nr_pages=%lu",  __entry->start, __entry->nr_pages)
+);
+
+#endif /* _TRACE_GUP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
  2015-12-02 22:53 ` [PATCH V2 1/7] trace/events: Add gup trace events Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-02 23:36   ` Dave Hansen
  2015-12-02 22:53 ` [PATCH V2 3/7] x86: " Yang Shi
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi

For slow version, just add trace point for raw __get_user_pages since all
slow variants call it to do the real work finally.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 mm/gup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..10245a4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -13,6 +13,9 @@
 #include <linux/rwsem.h>
 #include <linux/hugetlb.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
@@ -462,6 +465,8 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
+	trace_gup_get_user_pages(tsk, mm, start, nr_pages);
+
 	VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
 
 	/*
@@ -599,6 +604,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	if (!(vm_flags & vma->vm_flags))
 		return -EFAULT;
 
+	trace_gup_fixup_user_fault(tsk, mm, address, fault_flags);
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
@@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 					start, len)))
 		return 0;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * Disable interrupts.  We use the nested form as we can already have
 	 * interrupts disabled by get_futex_key.
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 3/7] x86: mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
  2015-12-02 22:53 ` [PATCH V2 1/7] trace/events: Add gup trace events Yang Shi
  2015-12-02 22:53 ` [PATCH V2 2/7] mm/gup: add gup trace points Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-02 22:53 ` [PATCH V2 4/7] mips: " Yang Shi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo
  Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi, Thomas Gleixner,
	H. Peter Anvin, x86

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 arch/x86/mm/gup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index ae9a37b..ed6cca9 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -10,6 +10,9 @@
 #include <linux/highmem.h>
 #include <linux/swap.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>>
+
 #include <asm/pgtable.h>
 
 static inline pte_t gup_get_pte(pte_t *ptep)
@@ -270,6 +273,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 					(void __user *)start, len)))
 		return 0;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
 	 * needs some instrumenting to determine the common sizes used by
@@ -342,6 +347,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		goto slow_irqon;
 #endif
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
 	 * needs some instrumenting to determine the common sizes used by
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 4/7] mips: mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
                   ` (2 preceding siblings ...)
  2015-12-02 22:53 ` [PATCH V2 3/7] x86: " Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-02 22:53 ` [PATCH V2 5/7] s390: " Yang Shi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo
  Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi, linux-mips

Cc: linux-mips@linux-mips.org
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 arch/mips/mm/gup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 349995d..3c5b8c8 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -12,6 +12,9 @@
 #include <linux/swap.h>
 #include <linux/hugetlb.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
 #include <asm/cpu-features.h>
 #include <asm/pgtable.h>
 
@@ -211,6 +214,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 					(void __user *)start, len)))
 		return 0;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
 	 * needs some instrumenting to determine the common sizes used by
@@ -277,6 +282,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	if (end < start || cpu_has_dc_aliases)
 		goto slow_irqon;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/* XXX: batch / limit 'nr' */
 	local_irq_disable();
 	pgdp = pgd_offset(mm, addr);
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 5/7] s390: mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
                   ` (3 preceding siblings ...)
  2015-12-02 22:53 ` [PATCH V2 4/7] mips: " Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-02 22:53 ` [PATCH V2 6/7] sh: " Yang Shi
  2015-12-02 22:53 ` [PATCH V2 7/7] sparc64: " Yang Shi
  6 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo
  Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi,
	Martin Schwidefsky, Heiko Carstens, linux-s390

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 arch/s390/mm/gup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 12bbf0e..ac25e28 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -10,6 +10,10 @@
 #include <linux/vmstat.h>
 #include <linux/pagemap.h>
 #include <linux/rwsem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
 #include <asm/pgtable.h>
 
 /*
@@ -188,6 +192,9 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	end = start + len;
 	if ((end <= start) || (end > TASK_SIZE))
 		return 0;
+
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * local_irq_save() doesn't prevent pagetable teardown, but does
 	 * prevent the pagetables from being freed on s390.
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 6/7] sh: mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
                   ` (4 preceding siblings ...)
  2015-12-02 22:53 ` [PATCH V2 5/7] s390: " Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  2015-12-02 22:53 ` [PATCH V2 7/7] sparc64: " Yang Shi
  6 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo
  Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi, linux-sh

Cc: linux-sh@vger.kernel.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
 arch/sh/mm/gup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/sh/mm/gup.c b/arch/sh/mm/gup.c
index e7af6a6..6df3e97 100644
--- a/arch/sh/mm/gup.c
+++ b/arch/sh/mm/gup.c
@@ -12,6 +12,10 @@
 #include <linux/mm.h>
 #include <linux/vmstat.h>
 #include <linux/highmem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
 #include <asm/pgtable.h>
 
 static inline pte_t gup_get_pte(pte_t *ptep)
@@ -178,6 +182,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 					(void __user *)start, len)))
 		return 0;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * This doesn't prevent pagetable teardown, but does prevent
 	 * the pagetables and pages from being freed.
@@ -231,6 +237,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	if (end < start)
 		goto slow_irqon;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	local_irq_disable();
 	pgdp = pgd_offset(mm, addr);
 	do {
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 7/7] sparc64: mm/gup: add gup trace points
  2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
                   ` (5 preceding siblings ...)
  2015-12-02 22:53 ` [PATCH V2 6/7] sh: " Yang Shi
@ 2015-12-02 22:53 ` Yang Shi
  6 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2015-12-02 22:53 UTC (permalink / raw)
  To: akpm, rostedt, mingo
  Cc: linux-kernel, linux-mm, linaro-kernel, yang.shi, David S. Miller,
	sparclinux

Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
---
The context depends on the below patch:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1028752.html

 arch/sparc/mm/gup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index cf4fb47..6dcfc4d 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -10,6 +10,10 @@
 #include <linux/vmstat.h>
 #include <linux/pagemap.h>
 #include <linux/rwsem.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gup.h>
+
 #include <asm/pgtable.h>
 
 /*
@@ -177,6 +181,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 					(void __user *)start, len)))
 		return 0;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	local_irq_save(flags);
 	pgdp = pgd_offset(mm, addr);
 	do {
@@ -209,6 +215,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	if (end < start)
 		goto slow_irqon;
 
+	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
+
 	/*
 	 * XXX: batch / limit 'nr', to avoid large irq off latency
 	 * needs some instrumenting to determine the common sizes used by
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-02 22:53 ` [PATCH V2 2/7] mm/gup: add gup trace points Yang Shi
@ 2015-12-02 23:36   ` Dave Hansen
  2015-12-03  0:11     ` Shi, Yang
  2015-12-03  4:13     ` Steven Rostedt
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Hansen @ 2015-12-02 23:36 UTC (permalink / raw)
  To: Yang Shi, akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel

On 12/02/2015 02:53 PM, Yang Shi wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index deafa2c..10245a4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -13,6 +13,9 @@
>  #include <linux/rwsem.h>
>  #include <linux/hugetlb.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/gup.h>
> +
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>

This needs to be _the_ last thing that gets #included.  Otherwise, you
risk colliding with any other trace header that gets implicitly included
below.

> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  					start, len)))
>  		return 0;
>  
> +	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
> +
>  	/*
>  	 * Disable interrupts.  We use the nested form as we can already have
>  	 * interrupts disabled by get_futex_key.

It would be _really_ nice to be able to see return values from the
various gup calls as well.  Is that feasible?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-02 23:36   ` Dave Hansen
@ 2015-12-03  0:11     ` Shi, Yang
  2015-12-03  0:52       ` Shi, Yang
  2015-12-03  4:13     ` Steven Rostedt
  1 sibling, 1 reply; 17+ messages in thread
From: Shi, Yang @ 2015-12-03  0:11 UTC (permalink / raw)
  To: Dave Hansen, akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel

On 12/2/2015 3:36 PM, Dave Hansen wrote:
> On 12/02/2015 02:53 PM, Yang Shi wrote:
>> diff --git a/mm/gup.c b/mm/gup.c
>> index deafa2c..10245a4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -13,6 +13,9 @@
>>   #include <linux/rwsem.h>
>>   #include <linux/hugetlb.h>
>>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/gup.h>
>> +
>>   #include <asm/pgtable.h>
>>   #include <asm/tlbflush.h>
>
> This needs to be _the_ last thing that gets #included.  Otherwise, you
> risk colliding with any other trace header that gets implicitly included
> below.

Thanks for the suggestion, will move it to the last.

>
>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>   					start, len)))
>>   		return 0;
>>
>> +	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>> +
>>   	/*
>>   	 * Disable interrupts.  We use the nested form as we can already have
>>   	 * interrupts disabled by get_futex_key.
>
> It would be _really_ nice to be able to see return values from the
> various gup calls as well.  Is that feasible?

I think it should be feasible. kmem_cache_alloc trace event could show 
return value. I'm supposed gup trace events should be able to do the 
same thing.

Regards,
Yang

>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-03  0:11     ` Shi, Yang
@ 2015-12-03  0:52       ` Shi, Yang
  0 siblings, 0 replies; 17+ messages in thread
From: Shi, Yang @ 2015-12-03  0:52 UTC (permalink / raw)
  To: Dave Hansen, akpm, rostedt, mingo; +Cc: linux-kernel, linux-mm, linaro-kernel

On 12/2/2015 4:11 PM, Shi, Yang wrote:
> On 12/2/2015 3:36 PM, Dave Hansen wrote:
>> On 12/02/2015 02:53 PM, Yang Shi wrote:
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index deafa2c..10245a4 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -13,6 +13,9 @@
>>>   #include <linux/rwsem.h>
>>>   #include <linux/hugetlb.h>
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/gup.h>
>>> +
>>>   #include <asm/pgtable.h>
>>>   #include <asm/tlbflush.h>
>>
>> This needs to be _the_ last thing that gets #included.  Otherwise, you
>> risk colliding with any other trace header that gets implicitly included
>> below.
>
> Thanks for the suggestion, will move it to the last.
>
>>
>>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start,
>>> int nr_pages, int write,
>>>                       start, len)))
>>>           return 0;
>>>
>>> +    trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>>> +
>>>       /*
>>>        * Disable interrupts.  We use the nested form as we can
>>> already have
>>>        * interrupts disabled by get_futex_key.
>>
>> It would be _really_ nice to be able to see return values from the
>> various gup calls as well.  Is that feasible?
>
> I think it should be feasible. kmem_cache_alloc trace event could show
> return value. I'm supposed gup trace events should be able to do the
> same thing.

Just did a quick test, it is definitely feasible. Please check the below 
test log:

        trace-cmd-200   [000]    99.221486: gup_get_user_pages: 
start=8000000ff0 nr_pages=1 ret=1
        trace-cmd-200   [000]    99.223215: gup_get_user_pages: 
start=8000000fdb nr_pages=1 ret=1
        trace-cmd-200   [000]    99.223298: gup_get_user_pages: 
start=8000000ed0 nr_pages=1 ret=1

nr_pages is the number of pages requested by the gup, ret is the return 
value.

If nobody has objection, I will add it into V3.

Regards,
Yang

>
> Regards,
> Yang
>
>>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 1/7] trace/events: Add gup trace events
  2015-12-02 22:53 ` [PATCH V2 1/7] trace/events: Add gup trace events Yang Shi
@ 2015-12-03  4:07   ` Steven Rostedt
  2015-12-03 18:36     ` Shi, Yang
  0 siblings, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Yang Shi; +Cc: akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On Wed,  2 Dec 2015 14:53:27 -0800
Yang Shi <yang.shi@linaro.org> wrote:

> page-faults events record the invoke to handle_mm_fault, but the invoke
> may come from do_page_fault or gup. In some use cases, the finer event count
> mey be needed, so add trace events support for:
> 
> __get_user_pages
> __get_user_pages_fast
> fixup_user_fault
> 
> Signed-off-by: Yang Shi <yang.shi@linaro.org>
> ---
>  include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100644 include/trace/events/gup.h
> 
> diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
> new file mode 100644
> index 0000000..03a4674
> --- /dev/null
> +++ b/include/trace/events/gup.h
> @@ -0,0 +1,71 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM gup
> +
> +#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_GUP_H
> +
> +#include <linux/types.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(gup_fixup_user_fault,
> +
> +	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
> +			unsigned long address, unsigned int fault_flags),
> +
> +	TP_ARGS(tsk, mm, address, fault_flags),

Arges added and not used by TP_fast_assign(), this will slow down the
code while tracing is enabled, as they need to be added to the trace
function call.

> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	address		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->address	= address;
> +	),
> +
> +	TP_printk("address=%lx",  __entry->address)
> +);
> +
> +TRACE_EVENT(gup_get_user_pages,
> +
> +	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
> +			unsigned long start, unsigned long nr_pages),
> +
> +	TP_ARGS(tsk, mm, start, nr_pages),

Here too but this is worse. See below.

> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	start		)
> +		__field(	unsigned long,	nr_pages	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start		= start;
> +		__entry->nr_pages	= nr_pages;
> +	),
> +
> +	TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
> +);
> +
> +TRACE_EVENT(gup_get_user_pages_fast,
> +
> +	TP_PROTO(unsigned long start, int nr_pages, int write,
> +			struct page **pages),
> +
> +	TP_ARGS(start, nr_pages, write, pages),

This and the above "gup_get_user_pages" have the same entry field,
assign and printk. They should be combined into a DECLARE_EVENT_CLASS()
and two DEFINE_EVENT()s. That will save on size as the
DECLARE_EVENT_CLASS() is the biggest part of each TRACE_EVENT().

-- Steve


> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	start		)
> +		__field(	unsigned long,	nr_pages	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start  	= start;
> +		__entry->nr_pages	= nr_pages;
> +	),
> +
> +	TP_printk("start=%lx nr_pages=%lu",  __entry->start, __entry->nr_pages)
> +);
> +
> +#endif /* _TRACE_GUP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-02 23:36   ` Dave Hansen
  2015-12-03  0:11     ` Shi, Yang
@ 2015-12-03  4:13     ` Steven Rostedt
  2015-12-03 18:36       ` Shi, Yang
  1 sibling, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2015-12-03  4:13 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Yang Shi, akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On Wed, 2 Dec 2015 15:36:50 -0800
Dave Hansen <dave.hansen@intel.com> wrote:

> On 12/02/2015 02:53 PM, Yang Shi wrote:
> > diff --git a/mm/gup.c b/mm/gup.c
> > index deafa2c..10245a4 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -13,6 +13,9 @@
> >  #include <linux/rwsem.h>
> >  #include <linux/hugetlb.h>
> >  
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/gup.h>
> > +
> >  #include <asm/pgtable.h>
> >  #include <asm/tlbflush.h>  
> 
> This needs to be _the_ last thing that gets #included.  Otherwise, you
> risk colliding with any other trace header that gets implicitly included
> below.

Agreed.

> 
> > @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >  					start, len)))
> >  		return 0;
> >  
> > +	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
> > +
> >  	/*
> >  	 * Disable interrupts.  We use the nested form as we can already have
> >  	 * interrupts disabled by get_futex_key.  
> 
> It would be _really_ nice to be able to see return values from the
> various gup calls as well.  Is that feasible?

Only if you rewrite the functions to have a single return code path
that we can add a tracepoint too. Or have a wrapper function that gets
called directly that calls these functions internally and the tracepoint
can trap the return value.

I can probably make function_graph tracer give return values, although
it will give a return value for void functions as well. And it may give
long long returns for int returns that may have bogus data in the
higher bits.

-- Steve


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-03  4:13     ` Steven Rostedt
@ 2015-12-03 18:36       ` Shi, Yang
  2015-12-03 19:06         ` Steven Rostedt
  0 siblings, 1 reply; 17+ messages in thread
From: Shi, Yang @ 2015-12-03 18:36 UTC (permalink / raw)
  To: Steven Rostedt, Dave Hansen
  Cc: akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On 12/2/2015 8:13 PM, Steven Rostedt wrote:
> On Wed, 2 Dec 2015 15:36:50 -0800
> Dave Hansen <dave.hansen@intel.com> wrote:
>
>> On 12/02/2015 02:53 PM, Yang Shi wrote:
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index deafa2c..10245a4 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -13,6 +13,9 @@
>>>   #include <linux/rwsem.h>
>>>   #include <linux/hugetlb.h>
>>>
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/gup.h>
>>> +
>>>   #include <asm/pgtable.h>
>>>   #include <asm/tlbflush.h>
>>
>> This needs to be _the_ last thing that gets #included.  Otherwise, you
>> risk colliding with any other trace header that gets implicitly included
>> below.
>
> Agreed.
>
>>
>>> @@ -1340,6 +1346,8 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>>   					start, len)))
>>>   		return 0;
>>>
>>> +	trace_gup_get_user_pages_fast(start, nr_pages, write, pages);
>>> +
>>>   	/*
>>>   	 * Disable interrupts.  We use the nested form as we can already have
>>>   	 * interrupts disabled by get_futex_key.
>>
>> It would be _really_ nice to be able to see return values from the
>> various gup calls as well.  Is that feasible?
>
> Only if you rewrite the functions to have a single return code path
> that we can add a tracepoint too. Or have a wrapper function that gets

Yes. My preliminary test just could cover the success case. gup could 
return errno from different a few code path.

> called directly that calls these functions internally and the tracepoint
> can trap the return value.

This will incur more changes in other subsystems (futex, kvm, etc), I'm 
not sure if it is worth making such changes to get return value.

> I can probably make function_graph tracer give return values, although
> it will give a return value for void functions as well. And it may give
> long long returns for int returns that may have bogus data in the
> higher bits.

If the return value requirement is not limited to gup, the approach 
sounds more reasonable.

Thanks,
Yang

>
> -- Steve
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 1/7] trace/events: Add gup trace events
  2015-12-03  4:07   ` Steven Rostedt
@ 2015-12-03 18:36     ` Shi, Yang
  0 siblings, 0 replies; 17+ messages in thread
From: Shi, Yang @ 2015-12-03 18:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On 12/2/2015 8:07 PM, Steven Rostedt wrote:
> On Wed,  2 Dec 2015 14:53:27 -0800
> Yang Shi <yang.shi@linaro.org> wrote:
>
>> page-faults events record the invoke to handle_mm_fault, but the invoke
>> may come from do_page_fault or gup. In some use cases, the finer event count
>> mey be needed, so add trace events support for:
>>
>> __get_user_pages
>> __get_user_pages_fast
>> fixup_user_fault
>>
>> Signed-off-by: Yang Shi <yang.shi@linaro.org>
>> ---
>>   include/trace/events/gup.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 71 insertions(+)
>>   create mode 100644 include/trace/events/gup.h
>>
>> diff --git a/include/trace/events/gup.h b/include/trace/events/gup.h
>> new file mode 100644
>> index 0000000..03a4674
>> --- /dev/null
>> +++ b/include/trace/events/gup.h
>> @@ -0,0 +1,71 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM gup
>> +
>> +#if !defined(_TRACE_GUP_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_GUP_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/tracepoint.h>
>> +
>> +TRACE_EVENT(gup_fixup_user_fault,
>> +
>> +	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
>> +			unsigned long address, unsigned int fault_flags),
>> +
>> +	TP_ARGS(tsk, mm, address, fault_flags),
>
> Arges added and not used by TP_fast_assign(), this will slow down the
> code while tracing is enabled, as they need to be added to the trace
> function call.
>
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned long,	address		)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->address	= address;
>> +	),
>> +
>> +	TP_printk("address=%lx",  __entry->address)
>> +);
>> +
>> +TRACE_EVENT(gup_get_user_pages,
>> +
>> +	TP_PROTO(struct task_struct *tsk, struct mm_struct *mm,
>> +			unsigned long start, unsigned long nr_pages),
>> +
>> +	TP_ARGS(tsk, mm, start, nr_pages),
>
> Here too but this is worse. See below.
>
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned long,	start		)
>> +		__field(	unsigned long,	nr_pages	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->start		= start;
>> +		__entry->nr_pages	= nr_pages;
>> +	),
>> +
>> +	TP_printk("start=%lx nr_pages=%lu", __entry->start, __entry->nr_pages)
>> +);
>> +
>> +TRACE_EVENT(gup_get_user_pages_fast,
>> +
>> +	TP_PROTO(unsigned long start, int nr_pages, int write,
>> +			struct page **pages),
>> +
>> +	TP_ARGS(start, nr_pages, write, pages),
>
> This and the above "gup_get_user_pages" have the same entry field,
> assign and printk. They should be combined into a DECLARE_EVENT_CLASS()
> and two DEFINE_EVENT()s. That will save on size as the
> DECLARE_EVENT_CLASS() is the biggest part of each TRACE_EVENT().

Thanks for the suggestion, will fix them in V3.

Regards,
Yang

>
> -- Steve
>
>
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	unsigned long,	start		)
>> +		__field(	unsigned long,	nr_pages	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->start  	= start;
>> +		__entry->nr_pages	= nr_pages;
>> +	),
>> +
>> +	TP_printk("start=%lx nr_pages=%lu",  __entry->start, __entry->nr_pages)
>> +);
>> +
>> +#endif /* _TRACE_GUP_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-03 18:36       ` Shi, Yang
@ 2015-12-03 19:06         ` Steven Rostedt
  2015-12-03 22:10           ` Shi, Yang
  0 siblings, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2015-12-03 19:06 UTC (permalink / raw)
  To: Shi, Yang; +Cc: Dave Hansen, akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On Thu, 03 Dec 2015 10:36:18 -0800
"Shi, Yang" <yang.shi@linaro.org> wrote:

> > called directly that calls these functions internally and the tracepoint
> > can trap the return value.  
> 
> This will incur more changes in other subsystems (futex, kvm, etc), I'm 
> not sure if it is worth making such changes to get return value.

No, it wouldn't require any changes outside of this.

-long __get_user_pages(..)
+static long __get_user_pages_internal(..)
{
  [..]
}
+
+long __get_user_pages(..)
+{
+	long ret;
+	ret = __get_user_pages_internal(..);
+	trace_get_user_pages(.., ret)
+}

> 
> > I can probably make function_graph tracer give return values, although
> > it will give a return value for void functions as well. And it may give
> > long long returns for int returns that may have bogus data in the
> > higher bits.  
> 
> If the return value requirement is not limited to gup, the approach 
> sounds more reasonable.
>

Others have asked about it. Maybe I should do it.

-- Steve


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/7] mm/gup: add gup trace points
  2015-12-03 19:06         ` Steven Rostedt
@ 2015-12-03 22:10           ` Shi, Yang
  0 siblings, 0 replies; 17+ messages in thread
From: Shi, Yang @ 2015-12-03 22:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Hansen, akpm, mingo, linux-kernel, linux-mm, linaro-kernel

On 12/3/2015 11:06 AM, Steven Rostedt wrote:
> On Thu, 03 Dec 2015 10:36:18 -0800
> "Shi, Yang" <yang.shi@linaro.org> wrote:
>
>>> called directly that calls these functions internally and the tracepoint
>>> can trap the return value.
>>
>> This will incur more changes in other subsystems (futex, kvm, etc), I'm
>> not sure if it is worth making such changes to get return value.
>
> No, it wouldn't require any changes outside of this.
>
> -long __get_user_pages(..)
> +static long __get_user_pages_internal(..)
> {
>    [..]
> }
> +
> +long __get_user_pages(..)
> +{
> +	long ret;
> +	ret = __get_user_pages_internal(..);
> +	trace_get_user_pages(.., ret)
> +}

Thanks for this. I just checked the fast version, it looks it just has 
single return path, so this should be just needed by slow version.

>
>>
>>> I can probably make function_graph tracer give return values, although
>>> it will give a return value for void functions as well. And it may give
>>> long long returns for int returns that may have bogus data in the
>>> higher bits.
>>
>> If the return value requirement is not limited to gup, the approach
>> sounds more reasonable.
>>
>
> Others have asked about it. Maybe I should do it.

If you are going to add return value in common trace code, I won't do 
the gup specific one in V3.

Thanks,
Yang

>
> -- Steve
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-12-03 22:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-02 22:53 [RFC V2] Add gup trace points support Yang Shi
2015-12-02 22:53 ` [PATCH V2 1/7] trace/events: Add gup trace events Yang Shi
2015-12-03  4:07   ` Steven Rostedt
2015-12-03 18:36     ` Shi, Yang
2015-12-02 22:53 ` [PATCH V2 2/7] mm/gup: add gup trace points Yang Shi
2015-12-02 23:36   ` Dave Hansen
2015-12-03  0:11     ` Shi, Yang
2015-12-03  0:52       ` Shi, Yang
2015-12-03  4:13     ` Steven Rostedt
2015-12-03 18:36       ` Shi, Yang
2015-12-03 19:06         ` Steven Rostedt
2015-12-03 22:10           ` Shi, Yang
2015-12-02 22:53 ` [PATCH V2 3/7] x86: " Yang Shi
2015-12-02 22:53 ` [PATCH V2 4/7] mips: " Yang Shi
2015-12-02 22:53 ` [PATCH V2 5/7] s390: " Yang Shi
2015-12-02 22:53 ` [PATCH V2 6/7] sh: " Yang Shi
2015-12-02 22:53 ` [PATCH V2 7/7] sparc64: " Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).