linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/6] introduce DAX tracepoint support
@ 2016-11-30 23:45 Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 1/6] tracing: add __print_flags_u64() Ross Zwisler
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Tracepoints are the standard way to capture debugging and tracing
information in many parts of the kernel, including the XFS and ext4
filesystems.  This series creates a tracepoint header for FS DAX and adds
the first few DAX tracepoints to the PMD fault handler.  This allows DAX
tracing to be done in the same way as filesystem tracing, so that
developers can view both sets of events together and get a coherent idea
of what the system is doing.

I do intend to add tracepoints to the normal 4k DAX fault path and to the
DAX I/O path, but I wanted to get feedback on the PMD tracepoints before I
went any further.

This series is based on Jan Kara's "dax: Clear dirty bits after flushing
caches" series:

https://lists.01.org/pipermail/linux-nvdimm/2016-November/007864.html

I've pushed a git tree with this work here:

https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_tracepoints_v2

Changes since v1:
 - Dropped the patch fixing the build issue between DAX, ext4 and FS_IOMAP.
   I'll resend an updated patch if needed once Jan's patches for this issue
   are applied.
 - Added include/linux/dax.h to MAINTAINERS in patch 4. (Matthew)
 - Begin each DAX tracepoint with the device major/minor and the inode so
   that we are consistent with the XFS tracepoints. This will allow for
   easy grepping of the tracepoint output. (Dave)
 - Print all PMD fault flags, not just whether we are doing a read or a
   write. (Jan)
 - Added __print_flags_u64() and the necessary helpers to the tracing
   infrastructure.  These functions allow us to print symbols associated
   with flags that are 64 bits wide even on 32 bit machines.  We need this
   for the pfn_t flags.

Ross Zwisler (6):
  tracing: add __print_flags_u64()
  dax: remove leading space from labels
  dax: add tracepoint infrastructure, PMD tracing
  dax: update MAINTAINERS entries for FS DAX
  dax: add tracepoints to dax_pmd_load_hole()
  dax: add tracepoints to dax_pmd_insert_mapping()

 MAINTAINERS                   |   5 +-
 fs/dax.c                      |  80 +++++++++++++--------
 include/linux/mm.h            |  25 +++++++
 include/linux/pfn_t.h         |   6 ++
 include/linux/trace_events.h  |   4 ++
 include/trace/events/fs_dax.h | 161 ++++++++++++++++++++++++++++++++++++++++++
 include/trace/trace_events.h  |  11 +++
 kernel/trace/trace_output.c   |  38 ++++++++++
 8 files changed, 300 insertions(+), 30 deletions(-)
 create mode 100644 include/trace/events/fs_dax.h

-- 
2.7.4

* [PATCH v2 1/6] tracing: add __print_flags_u64()
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-12-01 14:12   ` Steven Rostedt
  2016-11-30 23:45 ` [PATCH v2 2/6] dax: remove leading space from labels Ross Zwisler
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Add __print_flags_u64() and the helper trace_print_flags_seq_u64() in the
same spirit as __print_symbolic_u64() and trace_print_symbols_seq_u64().
These functions allow us to print symbols associated with flags that are 64
bits wide even on 32 bit machines.

These will be used by the DAX code so that we can print the flags set in a
pfn_t such as PFN_SG_CHAIN, PFN_SG_LAST, PFN_DEV and PFN_MAP.
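
As an illustration only (this is not part of the patch), the decoding loop
can be sketched in userspace C.  The names struct flag_name_u64,
decode_flags_u64(), DEMO_PFN_DEV and DEMO_PFN_MAP are hypothetical
stand-ins; only the bit positions are taken from the PFN_DEV and PFN_MAP
definitions, assuming a 64-bit long long.  The point of the sketch is that
the per-entry mask has to be held in a 64-bit variable, otherwise the high
flag bits are lost on a 32-bit build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct trace_print_flags_u64. */
struct flag_name_u64 {
	uint64_t mask;
	const char *name;
};

/* Bit positions mirror PFN_DEV and PFN_MAP, assuming 64-bit long long. */
#define DEMO_PFN_DEV (1ULL << (64 - 3))
#define DEMO_PFN_MAP (1ULL << (64 - 4))

static const struct flag_name_u64 demo_pfn_flags[] = {
	{ DEMO_PFN_DEV, "DEV" },
	{ DEMO_PFN_MAP, "MAP" },
	{ 0, NULL },	/* a NULL name terminates the table */
};

/*
 * Write the names of all set flags into buf, separated by delim.  Bits
 * that have no name in the table are appended in hex.  Returns buf.
 */
static const char *decode_flags_u64(char *buf, size_t len, const char *delim,
				    uint64_t flags,
				    const struct flag_name_u64 *tbl)
{
	size_t off = 0;
	int first = 1;
	int i;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; tbl[i].name && flags; i++) {
		uint64_t mask = tbl[i].mask;	/* must be 64 bits wide */

		if ((flags & mask) != mask)
			continue;

		flags &= ~mask;
		off += snprintf(buf + off, len - off, "%s%s",
				first ? "" : delim, tbl[i].name);
		first = 0;
		if (off >= len)
			return buf;	/* output was truncated */
	}

	/* Report any left over bits that the table does not name. */
	if (flags)
		snprintf(buf + off, len - off, "%s0x%llx",
			 first ? "" : delim, (unsigned long long)flags);

	return buf;
}
```

Calling decode_flags_u64() with DEMO_PFN_DEV | DEMO_PFN_MAP and "|" as the
delimiter yields the same "DEV|MAP" style of output that the DAX
tracepoints print.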

Without this new function I was getting warnings like the following when
compiling for i386:

./include/linux/pfn_t.h:13:22: warning: large integer implicitly truncated
to unsigned type [-Woverflow]
 #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
  ^

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 include/linux/trace_events.h |  4 ++++
 include/trace/trace_events.h | 11 +++++++++++
 kernel/trace/trace_output.c  | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index be00761..db2c3ba 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -23,6 +23,10 @@ const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 				    const struct trace_print_flags *symbol_array);
 
 #if BITS_PER_LONG == 32
+const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
+		      unsigned long long flags,
+		      const struct trace_print_flags_u64 *flag_array);
+
 const char *trace_print_symbols_seq_u64(struct trace_seq *p,
 					unsigned long long val,
 					const struct trace_print_flags_u64
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 467e12f..c6e9f72 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -283,8 +283,16 @@ TRACE_MAKE_SYSTEM_STR();
 		trace_print_symbols_seq(p, value, symbols);		\
 	})
 
+#undef __print_flags_u64
 #undef __print_symbolic_u64
 #if BITS_PER_LONG == 32
+#define __print_flags_u64(flag, delim, flag_array...)			\
+	({								\
+		static const struct trace_print_flags_u64 __flags[] =	\
+			{ flag_array, { -1, NULL } };			\
+		trace_print_flags_seq_u64(p, delim, flag, __flags);	\
+	})
+
 #define __print_symbolic_u64(value, symbol_array...)			\
 	({								\
 		static const struct trace_print_flags_u64 symbols[] =	\
@@ -292,6 +300,9 @@ TRACE_MAKE_SYSTEM_STR();
 		trace_print_symbols_seq_u64(p, value, symbols);	\
 	})
 #else
+#define __print_flags_u64(flag, delim, flag_array...)			\
+			__print_flags(flag, delim, flag_array)
+
 #define __print_symbolic_u64(value, symbol_array...)			\
 			__print_symbolic(value, symbol_array)
 #endif
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 3fc2042..ed4398f 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -124,6 +124,44 @@ EXPORT_SYMBOL(trace_print_symbols_seq);
 
 #if BITS_PER_LONG == 32
 const char *
+trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
+		      unsigned long long flags,
+		      const struct trace_print_flags_u64 *flag_array)
+{
+	unsigned long long mask;
+	const char *str;
+	const char *ret = trace_seq_buffer_ptr(p);
+	int i, first = 1;
+
+	for (i = 0;  flag_array[i].name && flags; i++) {
+
+		mask = flag_array[i].mask;
+		if ((flags & mask) != mask)
+			continue;
+
+		str = flag_array[i].name;
+		flags &= ~mask;
+		if (!first && delim)
+			trace_seq_puts(p, delim);
+		else
+			first = 0;
+		trace_seq_puts(p, str);
+	}
+
+	/* check for left over flags */
+	if (flags) {
+		if (!first && delim)
+			trace_seq_puts(p, delim);
+		trace_seq_printf(p, "0x%llx", flags);
+	}
+
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+EXPORT_SYMBOL(trace_print_flags_seq_u64);
+
+const char *
 trace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
 			 const struct trace_print_flags_u64 *symbol_array)
 {
-- 
2.7.4

* [PATCH v2 2/6] dax: remove leading space from labels
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 1/6] tracing: add __print_flags_u64() Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-12-01  8:11   ` Jan Kara
  2016-11-30 23:45 ` [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing Ross Zwisler
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

No functional change.

As of this commit:

commit 218dd85887da (".gitattributes: set git diff driver for C source code
files")

git-diff and git-format-patch both generate diffs whose hunks are correctly
prefixed by function names instead of labels, even if those labels aren't
indented with spaces.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index be39633..b14335c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -422,7 +422,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 		return page;
 	}
 	entry = lock_slot(mapping, slot);
- out_unlock:
+out_unlock:
 	spin_unlock_irq(&mapping->tree_lock);
 	return entry;
 }
@@ -557,7 +557,7 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
 				   vmf->gfp_mask | __GFP_ZERO);
 	if (!page)
 		return VM_FAULT_OOM;
- out:
+out:
 	vmf->page = page;
 	ret = finish_fault(vmf);
 	vmf->page = NULL;
@@ -659,7 +659,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 	}
 	if (vmf->flags & FAULT_FLAG_WRITE)
 		radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
- unlock:
+unlock:
 	spin_unlock_irq(&mapping->tree_lock);
 	if (hole_fill) {
 		radix_tree_preload_end();
@@ -812,12 +812,12 @@ static int dax_writeback_one(struct block_device *bdev,
 	spin_lock_irq(&mapping->tree_lock);
 	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
 	spin_unlock_irq(&mapping->tree_lock);
- unmap:
+unmap:
 	dax_unmap_atomic(bdev, &dax);
 	put_locked_mapping_entry(mapping, index, entry);
 	return ret;
 
- put_unlocked:
+put_unlocked:
 	put_unlocked_mapping_entry(mapping, index, entry2);
 	spin_unlock_irq(&mapping->tree_lock);
 	return ret;
@@ -1194,11 +1194,11 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		break;
 	}
 
- error_unlock_entry:
+error_unlock_entry:
 	vmf_ret = dax_fault_return(error) | major;
- unlock_entry:
+unlock_entry:
 	put_locked_mapping_entry(mapping, vmf->pgoff, entry);
- finish_iomap:
+finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PAGE_SIZE;
 
@@ -1255,7 +1255,7 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 
 	return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
 
- unmap_fallback:
+unmap_fallback:
 	dax_unmap_atomic(bdev, &dax);
 	return VM_FAULT_FALLBACK;
 }
@@ -1379,9 +1379,9 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		break;
 	}
 
- unlock_entry:
+unlock_entry:
 	put_locked_mapping_entry(mapping, pgoff, entry);
- finish_iomap:
+finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PMD_SIZE;
 
@@ -1396,7 +1396,7 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		ops->iomap_end(inode, pos, PMD_SIZE, copied, iomap_flags,
 				&iomap);
 	}
- fallback:
+fallback:
 	if (result == VM_FAULT_FALLBACK) {
 		split_huge_pmd(vma, pmd, address);
 		count_vm_event(THP_FAULT_FALLBACK);
-- 
2.7.4

* [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 1/6] tracing: add __print_flags_u64() Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 2/6] dax: remove leading space from labels Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-12-01  8:10   ` Jan Kara
  2016-12-01 14:16   ` Steven Rostedt
  2016-11-30 23:45 ` [PATCH v2 4/6] dax: update MAINTAINERS entries for FS DAX Ross Zwisler
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Tracepoints are the standard way to capture debugging and tracing
information in many parts of the kernel, including the XFS and ext4
filesystems.  Create a tracepoint header for FS DAX and add the first DAX
tracepoints to the PMD fault handler.  This allows the tracing for DAX to
be done in the same way as the filesystem tracing so that developers can
look at them together and get a coherent idea of what the system is doing.

I added both an entry and exit tracepoint because future patches will add
tracepoints to child functions of dax_iomap_pmd_fault() like
dax_pmd_load_hole() and dax_pmd_insert_mapping(). We want those messages to
be wrapped by the parent function tracepoints so the code flow is more
easily understood.  Having entry and exit tracepoints for faults also
allows us to easily see which filesystem functions were called during the
fault.  These filesystem functions get executed via iomap_begin() and
iomap_end() calls, for example, and will have their own tracepoints.

For PMD faults we primarily want to understand the type of mapping, the
fault flags, the faulting address and whether it fell back to 4k faults.
If it fell back to 4k faults the tracepoints should let us understand why.
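
As an illustration only (this is not the kernel code), the entry/exit
pairing can be sketched in userspace C.  Every name here is hypothetical:
demo_trace_fault() and demo_trace_fault_done() stand in for the two
events, and DEMO_PMD_COLOUR assumes 2 MiB PMDs over 4 KiB pages.  The
sketch leaves out the work done at the real fallback label and shows only
why every early exit is routed through one label: the "done" event then
fires exactly once per entry event and carries the final result.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 2 MiB PMDs over 4 KiB pages, i.e. 512 pages per PMD. */
#define DEMO_PMD_COLOUR 0x1ffUL

enum demo_result { DEMO_NOPAGE, DEMO_FALLBACK, DEMO_SIGBUS };

/* Counters standing in for the two trace events. */
static int demo_entry_events;
static int demo_done_events;

static void demo_trace_fault(unsigned long pgoff, unsigned long max_pgoff)
{
	demo_entry_events++;
	printf("demo_fault: pgoff %#lx max_pgoff %#lx\n", pgoff, max_pgoff);
}

static void demo_trace_fault_done(unsigned long pgoff,
				  unsigned long max_pgoff,
				  enum demo_result result)
{
	static const char * const names[] = { "NOPAGE", "FALLBACK", "SIGBUS" };

	demo_done_events++;
	printf("demo_fault_done: pgoff %#lx max_pgoff %#lx %s\n",
	       pgoff, max_pgoff, names[result]);
}

static enum demo_result demo_pmd_fault(unsigned long pgoff,
				       unsigned long max_pgoff, int cow)
{
	enum demo_result result = DEMO_FALLBACK;

	demo_trace_fault(pgoff, max_pgoff);

	/* Fall back if we would have to COW. */
	if (cow)
		goto out;

	/* Offset beyond end of file: no early return, jump to out. */
	if (pgoff > max_pgoff) {
		result = DEMO_SIGBUS;
		goto out;
	}

	/* Fall back if the PMD would extend beyond the file size. */
	if ((pgoff | DEMO_PMD_COLOUR) > max_pgoff)
		goto out;

	result = DEMO_NOPAGE;
out:
	demo_trace_fault_done(pgoff, max_pgoff, result);
	/* Every entry event has a matching done event. */
	assert(demo_entry_events == demo_done_events);
	return result;
}
```

With pgoff 0x200 and max_pgoff 0x1400, as in the example output below, the
sketch reaches the success path and reports NOPAGE.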

I named the new tracepoint header file "fs_dax.h" to allow for device DAX
to have its own separate tracing header in the same directory at some
point.

Here is an example output for these events from a successful PMD fault:

big-1441  [005] ....    32.582758: xfs_filemap_pmd_fault: dev 259:0 ino
0x1003

big-1441  [005] ....    32.582776: dax_pmd_fault: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400

big-1441  [005] ....    32.583292: dax_pmd_fault_done: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400 NOPAGE

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
---
 fs/dax.c                      | 30 ++++++++++++-------
 include/linux/mm.h            | 25 ++++++++++++++++
 include/trace/events/fs_dax.h | 68 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 10 deletions(-)
 create mode 100644 include/trace/events/fs_dax.h

diff --git a/fs/dax.c b/fs/dax.c
index b14335c..4a99c2e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -35,6 +35,9 @@
 #include <linux/iomap.h>
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/fs_dax.h>
+
 /* We choose 4096 entries - same as per-zone page wait tables */
 #define DAX_WAIT_TABLE_BITS 12
 #define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS)
@@ -1311,6 +1314,16 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	loff_t pos;
 	int error;
 
+	/*
+	 * Check whether offset isn't beyond end of file now. Caller is
+	 * supposed to hold locks serializing us with truncate / punch hole so
+	 * this is a reliable test.
+	 */
+	pgoff = linear_page_index(vma, pmd_addr);
+	max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
+
+	trace_dax_pmd_fault(inode, vma, address, flags, pgoff, max_pgoff, 0);
+
 	/* Fall back to PTEs if we're going to COW */
 	if (write && !(vma->vm_flags & VM_SHARED))
 		goto fallback;
@@ -1321,16 +1334,10 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if ((pmd_addr + PMD_SIZE) > vma->vm_end)
 		goto fallback;
 
-	/*
-	 * Check whether offset isn't beyond end of file now. Caller is
-	 * supposed to hold locks serializing us with truncate / punch hole so
-	 * this is a reliable test.
-	 */
-	pgoff = linear_page_index(vma, pmd_addr);
-	max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
-
-	if (pgoff > max_pgoff)
-		return VM_FAULT_SIGBUS;
+	if (pgoff > max_pgoff) {
+		result = VM_FAULT_SIGBUS;
+		goto out;
+	}
 
 	/* If the PMD would extend beyond the file size */
 	if ((pgoff | PG_PMD_COLOUR) > max_pgoff)
@@ -1401,6 +1408,9 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		split_huge_pmd(vma, pmd, address);
 		count_vm_event(THP_FAULT_FALLBACK);
 	}
+out:
+	trace_dax_pmd_fault_done(inode, vma, address, flags, pgoff, max_pgoff,
+			result);
 	return result;
 }
 EXPORT_SYMBOL_GPL(dax_iomap_pmd_fault);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5f52c0..30f416a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -281,6 +281,17 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
 
+#define FAULT_FLAG_TRACE \
+	{ FAULT_FLAG_WRITE,		"WRITE" }, \
+	{ FAULT_FLAG_MKWRITE,		"MKWRITE" }, \
+	{ FAULT_FLAG_ALLOW_RETRY,	"ALLOW_RETRY" }, \
+	{ FAULT_FLAG_RETRY_NOWAIT,	"RETRY_NOWAIT" }, \
+	{ FAULT_FLAG_KILLABLE,		"KILLABLE" }, \
+	{ FAULT_FLAG_TRIED,		"TRIED" }, \
+	{ FAULT_FLAG_USER,		"USER" }, \
+	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
+	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }
+
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
  * ->fault function. The vma's ->fault is responsible for returning a bitmask
@@ -1107,6 +1118,20 @@ static inline void clear_page_pfmemalloc(struct page *page)
 			 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
 			 VM_FAULT_FALLBACK)
 
+#define VM_FAULT_RESULT_TRACE \
+	{ VM_FAULT_OOM,			"OOM" }, \
+	{ VM_FAULT_SIGBUS,		"SIGBUS" }, \
+	{ VM_FAULT_MAJOR,		"MAJOR" }, \
+	{ VM_FAULT_WRITE,		"WRITE" }, \
+	{ VM_FAULT_HWPOISON,		"HWPOISON" }, \
+	{ VM_FAULT_HWPOISON_LARGE,	"HWPOISON_LARGE" }, \
+	{ VM_FAULT_SIGSEGV,		"SIGSEGV" }, \
+	{ VM_FAULT_NOPAGE,		"NOPAGE" }, \
+	{ VM_FAULT_LOCKED,		"LOCKED" }, \
+	{ VM_FAULT_RETRY,		"RETRY" }, \
+	{ VM_FAULT_FALLBACK,		"FALLBACK" }, \
+	{ VM_FAULT_DONE_COW,		"DONE_COW" }
+
 /* Encode hstate index for a hwpoisoned large page */
 #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
 #define VM_FAULT_GET_HINDEX(x) (((x) >> 12) & 0xf)
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
new file mode 100644
index 0000000..5acc016
--- /dev/null
+++ b/include/trace/events/fs_dax.h
@@ -0,0 +1,68 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fs_dax
+
+#if !defined(_TRACE_FS_DAX_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FS_DAX_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(dax_pmd_fault_class,
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
+		unsigned long address, unsigned int flags, pgoff_t pgoff,
+		pgoff_t max_pgoff, int result),
+	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, ino)
+		__field(unsigned long, vm_start)
+		__field(unsigned long, vm_end)
+		__field(unsigned long, vm_flags)
+		__field(unsigned long, address)
+		__field(unsigned int, flags)
+		__field(pgoff_t, pgoff)
+		__field(pgoff_t, max_pgoff)
+		__field(int, result)
+	),
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->vm_start = vma->vm_start;
+		__entry->vm_end = vma->vm_end;
+		__entry->vm_flags = vma->vm_flags;
+		__entry->address = address;
+		__entry->flags = flags;
+		__entry->pgoff = pgoff;
+		__entry->max_pgoff = max_pgoff;
+		__entry->result = result;
+	),
+	TP_printk("dev %d:%d ino %#lx %s %s address %#lx vm_start "
+			"%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->ino,
+		__entry->vm_flags & VM_SHARED ? "shared" : "private",
+		__print_flags(__entry->flags, "|", FAULT_FLAG_TRACE),
+		__entry->address,
+		__entry->vm_start,
+		__entry->vm_end,
+		__entry->pgoff,
+		__entry->max_pgoff,
+		__print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE)
+	)
+)
+
+#define DEFINE_PMD_FAULT_EVENT(name) \
+DEFINE_EVENT(dax_pmd_fault_class, name, \
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
+		unsigned long address, unsigned int flags, pgoff_t pgoff, \
+		pgoff_t max_pgoff, int result), \
+	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result))
+
+DEFINE_PMD_FAULT_EVENT(dax_pmd_fault);
+DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
+
+
+#endif /* _TRACE_FS_DAX_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.7.4

* [PATCH v2 4/6] dax: update MAINTAINERS entries for FS DAX
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
                   ` (2 preceding siblings ...)
  2016-11-30 23:45 ` [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 5/6] dax: add tracepoints to dax_pmd_load_hole() Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping() Ross Zwisler
  5 siblings, 0 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Add the new include/trace/events/fs_dax.h tracepoint header and the
existing include/linux/dax.h header to the DAX entry, update Matthew's
email address, and add myself as a maintainer for filesystem DAX.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 MAINTAINERS | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2a58eea..b93ea2a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3855,10 +3855,13 @@ S:	Maintained
 F:	drivers/i2c/busses/i2c-diolan-u2c.c
 
 DIRECT ACCESS (DAX)
-M:	Matthew Wilcox <willy@linux.intel.com>
+M:	Matthew Wilcox <mawilcox@microsoft.com>
+M:	Ross Zwisler <ross.zwisler@linux.intel.com>
 L:	linux-fsdevel@vger.kernel.org
 S:	Supported
 F:	fs/dax.c
+F:	include/linux/dax.h
+F:	include/trace/events/fs_dax.h
 
 DIRECTORY NOTIFICATION (DNOTIFY)
 M:	Eric Paris <eparis@parisplace.org>
-- 
2.7.4

* [PATCH v2 5/6] dax: add tracepoints to dax_pmd_load_hole()
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
                   ` (3 preceding siblings ...)
  2016-11-30 23:45 ` [PATCH v2 4/6] dax: update MAINTAINERS entries for FS DAX Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-11-30 23:45 ` [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping() Ross Zwisler
  5 siblings, 0 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Add tracepoints to dax_pmd_load_hole(), following the same logging
conventions as the tracepoints in dax_iomap_pmd_fault().

Here is an example PMD fault showing the new tracepoints:

read_big-1478  [004] ....   238.242188: xfs_filemap_pmd_fault: dev 259:0
ino 0x1003

read_big-1478  [004] ....   238.242191: dax_pmd_fault: dev 259:0 ino 0x1003
shared ALLOW_RETRY|KILLABLE|USER address 0x10400000 vm_start 0x10200000
vm_end 0x10600000 pgoff 0x200 max_pgoff 0x1400

read_big-1478  [004] ....   238.242390: dax_pmd_load_hole: dev 259:0 ino
0x1003 shared address 0x10400000 zero_page ffffea0002c20000 radix_entry
0x1e

read_big-1478  [004] ....   238.242392: dax_pmd_fault_done: dev 259:0 ino
0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10400000 vm_start
0x10200000 vm_end 0x10600000 pgoff 0x200 max_pgoff 0x1400 NOPAGE

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c                      | 14 ++++++++++----
 include/trace/events/fs_dax.h | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4a99c2e..ad18366 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1269,33 +1269,39 @@ static int dax_pmd_load_hole(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	unsigned long pmd_addr = address & PMD_MASK;
+	struct inode *inode = mapping->host;
 	struct page *zero_page;
+	void *ret = NULL;
 	spinlock_t *ptl;
 	pmd_t pmd_entry;
-	void *ret;
 
 	zero_page = mm_get_huge_zero_page(vma->vm_mm);
 
 	if (unlikely(!zero_page))
-		return VM_FAULT_FALLBACK;
+		goto fallback;
 
 	ret = dax_insert_mapping_entry(mapping, vmf, *entryp, 0,
 			RADIX_DAX_PMD | RADIX_DAX_HZP);
 	if (IS_ERR(ret))
-		return VM_FAULT_FALLBACK;
+		goto fallback;
 	*entryp = ret;
 
 	ptl = pmd_lock(vma->vm_mm, pmd);
 	if (!pmd_none(*pmd)) {
 		spin_unlock(ptl);
-		return VM_FAULT_FALLBACK;
+		goto fallback;
 	}
 
 	pmd_entry = mk_pmd(zero_page, vma->vm_page_prot);
 	pmd_entry = pmd_mkhuge(pmd_entry);
 	set_pmd_at(vma->vm_mm, pmd_addr, pmd, pmd_entry);
 	spin_unlock(ptl);
+	trace_dax_pmd_load_hole(inode, vma, address, zero_page, ret);
 	return VM_FAULT_NOPAGE;
+
+fallback:
+	trace_dax_pmd_load_hole_fallback(inode, vma, address, zero_page, ret);
+	return VM_FAULT_FALLBACK;
 }
 
 int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index 5acc016..9f0a455 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -61,6 +61,48 @@ DEFINE_EVENT(dax_pmd_fault_class, name, \
 DEFINE_PMD_FAULT_EVENT(dax_pmd_fault);
 DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
 
+DECLARE_EVENT_CLASS(dax_pmd_load_hole_class,
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
+		unsigned long address, struct page *zero_page,
+		void *radix_entry),
+	TP_ARGS(inode, vma, address, zero_page, radix_entry),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, ino)
+		__field(unsigned long, vm_flags)
+		__field(unsigned long, address)
+		__field(struct page *, zero_page)
+		__field(void *, radix_entry)
+	),
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->vm_flags = vma->vm_flags;
+		__entry->address = address;
+		__entry->zero_page = zero_page;
+		__entry->radix_entry = radix_entry;
+	),
+	TP_printk("dev %d:%d ino %#lx %s address %#lx zero_page %p "
+			"radix_entry %#lx",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->ino,
+		__entry->vm_flags & VM_SHARED ? "shared" : "private",
+		__entry->address,
+		__entry->zero_page,
+		(unsigned long)__entry->radix_entry
+	)
+)
+
+#define DEFINE_PMD_LOAD_HOLE_EVENT(name) \
+DEFINE_EVENT(dax_pmd_load_hole_class, name, \
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
+		unsigned long address, struct page *zero_page, \
+		void *radix_entry), \
+	TP_ARGS(inode, vma, address, zero_page, radix_entry))
+
+DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole);
+DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
 
 #endif /* _TRACE_FS_DAX_H */
 
-- 
2.7.4

* [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping()
  2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
                   ` (4 preceding siblings ...)
  2016-11-30 23:45 ` [PATCH v2 5/6] dax: add tracepoints to dax_pmd_load_hole() Ross Zwisler
@ 2016-11-30 23:45 ` Ross Zwisler
  2016-12-01 14:19   ` Steven Rostedt
  5 siblings, 1 reply; 18+ messages in thread
From: Ross Zwisler @ 2016-11-30 23:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

Add tracepoints to dax_pmd_insert_mapping(), following the same logging
conventions as the tracepoints in dax_iomap_pmd_fault().

Here is an example PMD fault showing the new tracepoints:

big-1504  [001] ....   326.960743: xfs_filemap_pmd_fault: dev 259:0 ino
0x1003

big-1504  [001] ....   326.960753: dax_pmd_fault: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400

big-1504  [001] ....   326.960981: dax_pmd_insert_mapping: dev 259:0 ino
0x1003 shared write address 0x10505000 length 0x200000 pfn 0x100600 DEV|MAP
radix_entry 0xc000e

big-1504  [001] ....   326.960986: dax_pmd_fault_done: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400 NOPAGE
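
As an illustration only, here is how a single 64-bit pfn_t value could be
split into the "pfn 0x100600" and "DEV|MAP" parts shown above.  This is a
userspace sketch under stated assumptions: 4 KiB pages, a 64-bit value,
and flags stored in the top 12 bits.  DEMO_PFN_FLAGS_MASK,
demo_split_pfn() and demo_print_pfn() are hypothetical names, not kernel
API; only the DEV and MAP bit positions are taken from pfn_t.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Assumption: the flags occupy the top DEMO_PAGE_SHIFT bits. */
#define DEMO_PFN_FLAGS_MASK \
	(((1ULL << DEMO_PAGE_SHIFT) - 1) << (64 - DEMO_PAGE_SHIFT))

/* Bit positions mirror PFN_DEV and PFN_MAP, assuming 64-bit long long. */
#define DEMO_PFN_DEV (1ULL << (64 - 3))
#define DEMO_PFN_MAP (1ULL << (64 - 4))

struct demo_pfn_parts {
	uint64_t pfn;	/* page frame number with the flag bits cleared */
	uint64_t flags;	/* flag bits only */
};

static struct demo_pfn_parts demo_split_pfn(uint64_t val)
{
	struct demo_pfn_parts parts = {
		.pfn = val & ~DEMO_PFN_FLAGS_MASK,
		.flags = val & DEMO_PFN_FLAGS_MASK,
	};

	/* The two halves must never overlap. */
	assert((parts.pfn & parts.flags) == 0);
	return parts;
}

static void demo_print_pfn(uint64_t val)
{
	struct demo_pfn_parts parts = demo_split_pfn(val);

	printf("pfn %#llx flags %#llx\n",
	       (unsigned long long)parts.pfn,
	       (unsigned long long)parts.flags);
}
```

Both halves are 64 bits wide, which is why the flags half needs a 64-bit
aware helper when it is decoded on a 32-bit machine.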

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c                      | 12 +++++++---
 include/linux/pfn_t.h         |  6 +++++
 include/trace/events/fs_dax.h | 51 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index ad18366..66bbd2d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1232,15 +1232,16 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	struct block_device *bdev = iomap->bdev;
+	struct inode *inode = mapping->host;
 	struct blk_dax_ctl dax = {
 		.sector = dax_iomap_sector(iomap, pos),
 		.size = PMD_SIZE,
 	};
 	long length = dax_map_atomic(bdev, &dax);
-	void *ret;
+	void *ret = NULL;
 
 	if (length < 0) /* dax_map_atomic() failed */
-		return VM_FAULT_FALLBACK;
+		goto fallback;
 	if (length < PMD_SIZE)
 		goto unmap_fallback;
 	if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR)
@@ -1253,13 +1254,18 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
 	ret = dax_insert_mapping_entry(mapping, vmf, *entryp, dax.sector,
 			RADIX_DAX_PMD);
 	if (IS_ERR(ret))
-		return VM_FAULT_FALLBACK;
+		goto fallback;
 	*entryp = ret;
 
+	trace_dax_pmd_insert_mapping(inode, vma, address, write, length,
+			dax.pfn, ret);
 	return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
 
 unmap_fallback:
 	dax_unmap_atomic(bdev, &dax);
+fallback:
+	trace_dax_pmd_insert_mapping_fallback(inode, vma, address, write,
+			length, dax.pfn, ret);
 	return VM_FAULT_FALLBACK;
 }
 
diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
index a3d90b9..033fc7b 100644
--- a/include/linux/pfn_t.h
+++ b/include/linux/pfn_t.h
@@ -15,6 +15,12 @@
 #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
 #define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
 
+#define PFN_FLAGS_TRACE \
+	{ PFN_SG_CHAIN,	"SG_CHAIN" }, \
+	{ PFN_SG_LAST,	"SG_LAST" }, \
+	{ PFN_DEV,	"DEV" }, \
+	{ PFN_MAP,	"MAP" }
+
 static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
 {
 	pfn_t pfn_t = { .val = pfn | (flags & PFN_FLAGS_MASK), };
diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
index 9f0a455..7d0ea33 100644
--- a/include/trace/events/fs_dax.h
+++ b/include/trace/events/fs_dax.h
@@ -104,6 +104,57 @@ DEFINE_EVENT(dax_pmd_load_hole_class, name, \
 DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole);
 DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
 
+DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
+		unsigned long address, int write, long length, pfn_t pfn,
+		void *radix_entry),
+	TP_ARGS(inode, vma, address, write, length, pfn, radix_entry),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, ino)
+		__field(unsigned long, vm_flags)
+		__field(unsigned long, address)
+		__field(int, write)
+		__field(long, length)
+		__field(u64, pfn_val)
+		__field(void *, radix_entry)
+	),
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->vm_flags = vma->vm_flags;
+		__entry->address = address;
+		__entry->write = write;
+		__entry->length = length;
+		__entry->pfn_val = pfn.val;
+		__entry->radix_entry = radix_entry;
+	),
+	TP_printk("dev %d:%d ino %#lx %s %s address %#lx length %#lx "
+			"pfn %#llx %s radix_entry %#lx",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->ino,
+		__entry->vm_flags & VM_SHARED ? "shared" : "private",
+		__entry->write ? "write" : "read",
+		__entry->address,
+		__entry->length,
+		__entry->pfn_val & ~PFN_FLAGS_MASK,
+		__print_flags_u64(__entry->pfn_val & PFN_FLAGS_MASK, "|",
+			PFN_FLAGS_TRACE),
+		(unsigned long)__entry->radix_entry
+	)
+)
+
+#define DEFINE_PMD_INSERT_MAPPING_EVENT(name) \
+DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \
+	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
+		unsigned long address, int write, long length, pfn_t pfn, \
+		void *radix_entry), \
+	TP_ARGS(inode, vma, address, write, length, pfn, radix_entry))
+
+DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping);
+DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping_fallback);
+
 #endif /* _TRACE_FS_DAX_H */
 
 /* This part must be outside protection */
-- 
2.7.4


* Re: [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing
  2016-11-30 23:45 ` [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing Ross Zwisler
@ 2016-12-01  8:10   ` Jan Kara
  2016-12-01 14:16   ` Steven Rostedt
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Kara @ 2016-12-01  8:10 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

On Wed 30-11-16 16:45:30, Ross Zwisler wrote:
> Tracepoints are the standard way to capture debugging and tracing
> information in many parts of the kernel, including the XFS and ext4
> filesystems.  Create a tracepoint header for FS DAX and add the first DAX
> tracepoints to the PMD fault handler.  This allows the tracing for DAX to
> be done in the same way as the filesystem tracing so that developers can
> look at them together and get a coherent idea of what the system is doing.
> 
> I added both an entry and exit tracepoint because future patches will add
> tracepoints to child functions of dax_iomap_pmd_fault() like
> dax_pmd_load_hole() and dax_pmd_insert_mapping(). We want those messages to
> be wrapped by the parent function tracepoints so the code flow is more
> easily understood.  Having entry and exit tracepoints for faults also
> allows us to easily see what filesystem functions were called during the
> fault.  These filesystem functions get executed via iomap_begin() and
> iomap_end() calls, for example, and will have their own tracepoints.
> 
> For PMD faults we primarily want to understand the type of mapping, the
> fault flags, the faulting address and whether it fell back to 4k faults.
> If it fell back to 4k faults the tracepoints should let us understand why.
> 
> I named the new tracepoint header file "fs_dax.h" to allow for device DAX
> to have its own separate tracing header in the same directory at some
> point.
> 
> Here is an example output for these events from a successful PMD fault:
> 
> big-1441  [005] ....    32.582758: xfs_filemap_pmd_fault: dev 259:0 ino
> 0x1003
> 
> big-1441  [005] ....    32.582776: dax_pmd_fault: dev 259:0 ino 0x1003
> shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
> 0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400
> 
> big-1441  [005] ....    32.583292: dax_pmd_fault_done: dev 259:0 ino 0x1003
> shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start
> 0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400 NOPAGE
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Suggested-by: Dave Chinner <david@fromorbit.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/dax.c                      | 30 ++++++++++++-------
>  include/linux/mm.h            | 25 ++++++++++++++++
>  include/trace/events/fs_dax.h | 68 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 113 insertions(+), 10 deletions(-)
>  create mode 100644 include/trace/events/fs_dax.h
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index b14335c..4a99c2e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -35,6 +35,9 @@
>  #include <linux/iomap.h>
>  #include "internal.h"
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/fs_dax.h>
> +
>  /* We choose 4096 entries - same as per-zone page wait tables */
>  #define DAX_WAIT_TABLE_BITS 12
>  #define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS)
> @@ -1311,6 +1314,16 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  	loff_t pos;
>  	int error;
>  
> +	/*
> +	 * Check whether offset isn't beyond end of file now. Caller is
> +	 * supposed to hold locks serializing us with truncate / punch hole so
> +	 * this is a reliable test.
> +	 */
> +	pgoff = linear_page_index(vma, pmd_addr);
> +	max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
> +
> +	trace_dax_pmd_fault(inode, vma, address, flags, pgoff, max_pgoff, 0);
> +
>  	/* Fall back to PTEs if we're going to COW */
>  	if (write && !(vma->vm_flags & VM_SHARED))
>  		goto fallback;
> @@ -1321,16 +1334,10 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  	if ((pmd_addr + PMD_SIZE) > vma->vm_end)
>  		goto fallback;
>  
> -	/*
> -	 * Check whether offset isn't beyond end of file now. Caller is
> -	 * supposed to hold locks serializing us with truncate / punch hole so
> -	 * this is a reliable test.
> -	 */
> -	pgoff = linear_page_index(vma, pmd_addr);
> -	max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT;
> -
> -	if (pgoff > max_pgoff)
> -		return VM_FAULT_SIGBUS;
> +	if (pgoff > max_pgoff) {
> +		result = VM_FAULT_SIGBUS;
> +		goto out;
> +	}
>  
>  	/* If the PMD would extend beyond the file size */
>  	if ((pgoff | PG_PMD_COLOUR) > max_pgoff)
> @@ -1401,6 +1408,9 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  		split_huge_pmd(vma, pmd, address);
>  		count_vm_event(THP_FAULT_FALLBACK);
>  	}
> +out:
> +	trace_dax_pmd_fault_done(inode, vma, address, flags, pgoff, max_pgoff,
> +			result);
>  	return result;
>  }
>  EXPORT_SYMBOL_GPL(dax_iomap_pmd_fault);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a5f52c0..30f416a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -281,6 +281,17 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
>  #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
>  
> +#define FAULT_FLAG_TRACE \
> +	{ FAULT_FLAG_WRITE,		"WRITE" }, \
> +	{ FAULT_FLAG_MKWRITE,		"MKWRITE" }, \
> +	{ FAULT_FLAG_ALLOW_RETRY,	"ALLOW_RETRY" }, \
> +	{ FAULT_FLAG_RETRY_NOWAIT,	"RETRY_NOWAIT" }, \
> +	{ FAULT_FLAG_KILLABLE,		"KILLABLE" }, \
> +	{ FAULT_FLAG_TRIED,		"TRIED" }, \
> +	{ FAULT_FLAG_USER,		"USER" }, \
> +	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
> +	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }
> +
>  /*
>   * vm_fault is filled by the the pagefault handler and passed to the vma's
>   * ->fault function. The vma's ->fault is responsible for returning a bitmask
> @@ -1107,6 +1118,20 @@ static inline void clear_page_pfmemalloc(struct page *page)
>  			 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
>  			 VM_FAULT_FALLBACK)
>  
> +#define VM_FAULT_RESULT_TRACE \
> +	{ VM_FAULT_OOM,			"OOM" }, \
> +	{ VM_FAULT_SIGBUS,		"SIGBUS" }, \
> +	{ VM_FAULT_MAJOR,		"MAJOR" }, \
> +	{ VM_FAULT_WRITE,		"WRITE" }, \
> +	{ VM_FAULT_HWPOISON,		"HWPOISON" }, \
> +	{ VM_FAULT_HWPOISON_LARGE,	"HWPOISON_LARGE" }, \
> +	{ VM_FAULT_SIGSEGV,		"SIGSEGV" }, \
> +	{ VM_FAULT_NOPAGE,		"NOPAGE" }, \
> +	{ VM_FAULT_LOCKED,		"LOCKED" }, \
> +	{ VM_FAULT_RETRY,		"RETRY" }, \
> +	{ VM_FAULT_FALLBACK,		"FALLBACK" }, \
> +	{ VM_FAULT_DONE_COW,		"DONE_COW" }
> +
>  /* Encode hstate index for a hwpoisoned large page */
>  #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
>  #define VM_FAULT_GET_HINDEX(x) (((x) >> 12) & 0xf)
> diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
> new file mode 100644
> index 0000000..5acc016
> --- /dev/null
> +++ b/include/trace/events/fs_dax.h
> @@ -0,0 +1,68 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM fs_dax
> +
> +#if !defined(_TRACE_FS_DAX_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_FS_DAX_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(dax_pmd_fault_class,
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
> +		unsigned long address, unsigned int flags, pgoff_t pgoff,
> +		pgoff_t max_pgoff, int result),
> +	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned long, ino)
> +		__field(unsigned long, vm_start)
> +		__field(unsigned long, vm_end)
> +		__field(unsigned long, vm_flags)
> +		__field(unsigned long, address)
> +		__field(unsigned int, flags)
> +		__field(pgoff_t, pgoff)
> +		__field(pgoff_t, max_pgoff)
> +		__field(int, result)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = inode->i_sb->s_dev;
> +		__entry->ino = inode->i_ino;
> +		__entry->vm_start = vma->vm_start;
> +		__entry->vm_end = vma->vm_end;
> +		__entry->vm_flags = vma->vm_flags;
> +		__entry->address = address;
> +		__entry->flags = flags;
> +		__entry->pgoff = pgoff;
> +		__entry->max_pgoff = max_pgoff;
> +		__entry->result = result;
> +	),
> +	TP_printk("dev %d:%d ino %#lx %s %s address %#lx vm_start "
> +			"%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s",
> +		MAJOR(__entry->dev),
> +		MINOR(__entry->dev),
> +		__entry->ino,
> +		__entry->vm_flags & VM_SHARED ? "shared" : "private",
> +		__print_flags(__entry->flags, "|", FAULT_FLAG_TRACE),
> +		__entry->address,
> +		__entry->vm_start,
> +		__entry->vm_end,
> +		__entry->pgoff,
> +		__entry->max_pgoff,
> +		__print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE)
> +	)
> +)
> +
> +#define DEFINE_PMD_FAULT_EVENT(name) \
> +DEFINE_EVENT(dax_pmd_fault_class, name, \
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
> +		unsigned long address, unsigned int flags, pgoff_t pgoff, \
> +		pgoff_t max_pgoff, int result), \
> +	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result))
> +
> +DEFINE_PMD_FAULT_EVENT(dax_pmd_fault);
> +DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
> +
> +
> +#endif /* _TRACE_FS_DAX_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> -- 
> 2.7.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 2/6] dax: remove leading space from labels
  2016-11-30 23:45 ` [PATCH v2 2/6] dax: remove leading space from labels Ross Zwisler
@ 2016-12-01  8:11   ` Jan Kara
  2016-12-01 15:26     ` Ross Zwisler
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2016-12-01  8:11 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

On Wed 30-11-16 16:45:29, Ross Zwisler wrote:
> No functional change.
> 
> As of this commit:
> 
> commit 218dd85887da (".gitattributes: set git diff driver for C source code
> files")
> 
> git-diff and git-format-patch both generate diffs whose hunks are correctly
> prefixed by function names instead of labels, even if those labels aren't
> indented with spaces.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Didn't we agree to leave this for a bit later?

								Honza

> ---
>  fs/dax.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index be39633..b14335c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -422,7 +422,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
>  		return page;
>  	}
>  	entry = lock_slot(mapping, slot);
> - out_unlock:
> +out_unlock:
>  	spin_unlock_irq(&mapping->tree_lock);
>  	return entry;
>  }
> @@ -557,7 +557,7 @@ static int dax_load_hole(struct address_space *mapping, void **entry,
>  				   vmf->gfp_mask | __GFP_ZERO);
>  	if (!page)
>  		return VM_FAULT_OOM;
> - out:
> +out:
>  	vmf->page = page;
>  	ret = finish_fault(vmf);
>  	vmf->page = NULL;
> @@ -659,7 +659,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
>  	}
>  	if (vmf->flags & FAULT_FLAG_WRITE)
>  		radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
> - unlock:
> +unlock:
>  	spin_unlock_irq(&mapping->tree_lock);
>  	if (hole_fill) {
>  		radix_tree_preload_end();
> @@ -812,12 +812,12 @@ static int dax_writeback_one(struct block_device *bdev,
>  	spin_lock_irq(&mapping->tree_lock);
>  	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
>  	spin_unlock_irq(&mapping->tree_lock);
> - unmap:
> +unmap:
>  	dax_unmap_atomic(bdev, &dax);
>  	put_locked_mapping_entry(mapping, index, entry);
>  	return ret;
>  
> - put_unlocked:
> +put_unlocked:
>  	put_unlocked_mapping_entry(mapping, index, entry2);
>  	spin_unlock_irq(&mapping->tree_lock);
>  	return ret;
> @@ -1194,11 +1194,11 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
>  		break;
>  	}
>  
> - error_unlock_entry:
> +error_unlock_entry:
>  	vmf_ret = dax_fault_return(error) | major;
> - unlock_entry:
> +unlock_entry:
>  	put_locked_mapping_entry(mapping, vmf->pgoff, entry);
> - finish_iomap:
> +finish_iomap:
>  	if (ops->iomap_end) {
>  		int copied = PAGE_SIZE;
>  
> @@ -1255,7 +1255,7 @@ static int dax_pmd_insert_mapping(struct vm_area_struct *vma, pmd_t *pmd,
>  
>  	return vmf_insert_pfn_pmd(vma, address, pmd, dax.pfn, write);
>  
> - unmap_fallback:
> +unmap_fallback:
>  	dax_unmap_atomic(bdev, &dax);
>  	return VM_FAULT_FALLBACK;
>  }
> @@ -1379,9 +1379,9 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  		break;
>  	}
>  
> - unlock_entry:
> +unlock_entry:
>  	put_locked_mapping_entry(mapping, pgoff, entry);
> - finish_iomap:
> +finish_iomap:
>  	if (ops->iomap_end) {
>  		int copied = PMD_SIZE;
>  
> @@ -1396,7 +1396,7 @@ int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  		ops->iomap_end(inode, pos, PMD_SIZE, copied, iomap_flags,
>  				&iomap);
>  	}
> - fallback:
> +fallback:
>  	if (result == VM_FAULT_FALLBACK) {
>  		split_huge_pmd(vma, pmd, address);
>  		count_vm_event(THP_FAULT_FALLBACK);
> -- 
> 2.7.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/6] tracing: add __print_flags_u64()
  2016-11-30 23:45 ` [PATCH v2 1/6] tracing: add __print_flags_u64() Ross Zwisler
@ 2016-12-01 14:12   ` Steven Rostedt
  2016-12-01 15:35     ` Ross Zwisler
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2016-12-01 14:12 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Wed, 30 Nov 2016 16:45:28 -0700
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:

> Add __print_flags_u64() and the helper trace_print_flags_seq_u64() in the
> same spirit as __print_symbolic_u64() and trace_print_symbols_seq_u64().
> These functions allow us to print symbols associated with flags that are 64
> bits wide even on 32 bit machines.
> 
> These will be used by the DAX code so that we can print the flags set in a
> pfn_t such as PFN_SG_CHAIN, PFN_SG_LAST, PFN_DEV and PFN_MAP.
> 
> Without this new function I was getting errors like the following when
> compiling for i386:
> 
> ./include/linux/pfn_t.h:13:22: warning: large integer implicitly truncated
> to unsigned type [-Woverflow]
>  #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
>   ^
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  include/linux/trace_events.h |  4 ++++
>  include/trace/trace_events.h | 11 +++++++++++
>  kernel/trace/trace_output.c  | 38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index be00761..db2c3ba 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -23,6 +23,10 @@ const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
>  				    const struct trace_print_flags *symbol_array);
>  
>  #if BITS_PER_LONG == 32
> +const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
> +		      unsigned long long flags,
> +		      const struct trace_print_flags_u64 *flag_array);
> +
>  const char *trace_print_symbols_seq_u64(struct trace_seq *p,
>  					unsigned long long val,
>  					const struct trace_print_flags_u64
> diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
> index 467e12f..c6e9f72 100644
> --- a/include/trace/trace_events.h
> +++ b/include/trace/trace_events.h
> @@ -283,8 +283,16 @@ TRACE_MAKE_SYSTEM_STR();
>  		trace_print_symbols_seq(p, value, symbols);		\
>  	})
>  
> +#undef __print_flags_u64
>  #undef __print_symbolic_u64
>  #if BITS_PER_LONG == 32
> +#define __print_flags_u64(flag, delim, flag_array...)			\
> +	({								\
> +		static const struct trace_print_flags_u64 __flags[] =	\
> +			{ flag_array, { -1, NULL } };			\
> +		trace_print_flags_seq_u64(p, delim, flag, __flags);	\
> +	})
> +
>  #define __print_symbolic_u64(value, symbol_array...)			\
>  	({								\
>  		static const struct trace_print_flags_u64 symbols[] =	\
> @@ -292,6 +300,9 @@ TRACE_MAKE_SYSTEM_STR();
>  		trace_print_symbols_seq_u64(p, value, symbols);	\
>  	})
>  #else
> +#define __print_flags_u64(flag, delim, flag_array...)			\
> +			__print_flags(flag, delim, flag_array)
> +
>  #define __print_symbolic_u64(value, symbol_array...)			\
>  			__print_symbolic(value, symbol_array)
>  #endif
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 3fc2042..ed4398f 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -124,6 +124,44 @@ EXPORT_SYMBOL(trace_print_symbols_seq);
>  
>  #if BITS_PER_LONG == 32
>  const char *
> +trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
> +		      unsigned long long flags,
> +		      const struct trace_print_flags_u64 *flag_array)
> +{
> +	unsigned long mask;

Don't you want mask to be unsigned long long?

-- Steve

> +	const char *str;
> +	const char *ret = trace_seq_buffer_ptr(p);
> +	int i, first = 1;
> +
> +	for (i = 0;  flag_array[i].name && flags; i++) {
> +
> +		mask = flag_array[i].mask;
> +		if ((flags & mask) != mask)
> +			continue;
> +
> +		str = flag_array[i].name;
> +		flags &= ~mask;
> +		if (!first && delim)
> +			trace_seq_puts(p, delim);
> +		else
> +			first = 0;
> +		trace_seq_puts(p, str);
> +	}
> +
> +	/* check for left over flags */
> +	if (flags) {
> +		if (!first && delim)
> +			trace_seq_puts(p, delim);
> +		trace_seq_printf(p, "0x%llx", flags);
> +	}
> +
> +	trace_seq_putc(p, 0);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(trace_print_flags_seq_u64);
> +
> +const char *
>  trace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
>  			 const struct trace_print_flags_u64 *symbol_array)
>  {


* Re: [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing
  2016-11-30 23:45 ` [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing Ross Zwisler
  2016-12-01  8:10   ` Jan Kara
@ 2016-12-01 14:16   ` Steven Rostedt
  2016-12-01 15:39     ` Ross Zwisler
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2016-12-01 14:16 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Wed, 30 Nov 2016 16:45:30 -0700
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:


> --- /dev/null
> +++ b/include/trace/events/fs_dax.h
> @@ -0,0 +1,68 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM fs_dax
> +
> +#if !defined(_TRACE_FS_DAX_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_FS_DAX_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(dax_pmd_fault_class,
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
> +		unsigned long address, unsigned int flags, pgoff_t pgoff,
> +		pgoff_t max_pgoff, int result),
> +	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned long, ino)
> +		__field(unsigned long, vm_start)
> +		__field(unsigned long, vm_end)
> +		__field(unsigned long, vm_flags)
> +		__field(unsigned long, address)
> +		__field(unsigned int, flags)
> +		__field(pgoff_t, pgoff)
> +		__field(pgoff_t, max_pgoff)
> +		__field(int, result)

For better compaction, I would put flags and result together, as they
are both ints. Otherwise, you'll probably have 4 empty bytes after
flags.

-- Steve

> +	),
> +	TP_fast_assign(
> +		__entry->dev = inode->i_sb->s_dev;
> +		__entry->ino = inode->i_ino;
> +		__entry->vm_start = vma->vm_start;
> +		__entry->vm_end = vma->vm_end;
> +		__entry->vm_flags = vma->vm_flags;
> +		__entry->address = address;
> +		__entry->flags = flags;
> +		__entry->pgoff = pgoff;
> +		__entry->max_pgoff = max_pgoff;
> +		__entry->result = result;
> +	),
> +	TP_printk("dev %d:%d ino %#lx %s %s address %#lx vm_start "
> +			"%#lx vm_end %#lx pgoff %#lx max_pgoff %#lx %s",
> +		MAJOR(__entry->dev),
> +		MINOR(__entry->dev),
> +		__entry->ino,
> +		__entry->vm_flags & VM_SHARED ? "shared" : "private",
> +		__print_flags(__entry->flags, "|", FAULT_FLAG_TRACE),
> +		__entry->address,
> +		__entry->vm_start,
> +		__entry->vm_end,
> +		__entry->pgoff,
> +		__entry->max_pgoff,
> +		__print_flags(__entry->result, "|", VM_FAULT_RESULT_TRACE)
> +	)
> +)
> +
> +#define DEFINE_PMD_FAULT_EVENT(name) \
> +DEFINE_EVENT(dax_pmd_fault_class, name, \
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
> +		unsigned long address, unsigned int flags, pgoff_t pgoff, \
> +		pgoff_t max_pgoff, int result), \
> +	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result))
> +
> +DEFINE_PMD_FAULT_EVENT(dax_pmd_fault);
> +DEFINE_PMD_FAULT_EVENT(dax_pmd_fault_done);
> +
> +
> +#endif /* _TRACE_FS_DAX_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>


* Re: [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping()
  2016-11-30 23:45 ` [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping() Ross Zwisler
@ 2016-12-01 14:19   ` Steven Rostedt
  2016-12-01 15:44     ` Ross Zwisler
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2016-12-01 14:19 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Wed, 30 Nov 2016 16:45:33 -0700
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:

> diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
> index a3d90b9..033fc7b 100644
> --- a/include/linux/pfn_t.h
> +++ b/include/linux/pfn_t.h
> @@ -15,6 +15,12 @@
>  #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
>  #define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
>  
> +#define PFN_FLAGS_TRACE \
> +	{ PFN_SG_CHAIN,	"SG_CHAIN" }, \
> +	{ PFN_SG_LAST,	"SG_LAST" }, \
> +	{ PFN_DEV,	"DEV" }, \
> +	{ PFN_MAP,	"MAP" }
> +
>  static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
>  {
>  	pfn_t pfn_t = { .val = pfn | (flags & PFN_FLAGS_MASK), };
> diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
> index 9f0a455..7d0ea33 100644
> --- a/include/trace/events/fs_dax.h
> +++ b/include/trace/events/fs_dax.h
> @@ -104,6 +104,57 @@ DEFINE_EVENT(dax_pmd_load_hole_class, name, \
>  DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole);
>  DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
>  
> +DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
> +		unsigned long address, int write, long length, pfn_t pfn,
> +		void *radix_entry),
> +	TP_ARGS(inode, vma, address, write, length, pfn, radix_entry),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(unsigned long, ino)
> +		__field(unsigned long, vm_flags)
> +		__field(unsigned long, address)
> +		__field(int, write)

Place "write" at the end. The ring buffer is 4 byte aligned, so on
archs that can access 8 bytes on 4 byte alignment, this will be packed
tighter. Otherwise, you'll get 4 empty bytes after "write".

-- Steve

> +		__field(long, length)
> +		__field(u64, pfn_val)
> +		__field(void *, radix_entry)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = inode->i_sb->s_dev;
> +		__entry->ino = inode->i_ino;
> +		__entry->vm_flags = vma->vm_flags;
> +		__entry->address = address;
> +		__entry->write = write;
> +		__entry->length = length;
> +		__entry->pfn_val = pfn.val;
> +		__entry->radix_entry = radix_entry;
> +	),
> +	TP_printk("dev %d:%d ino %#lx %s %s address %#lx length %#lx "
> +			"pfn %#llx %s radix_entry %#lx",
> +		MAJOR(__entry->dev),
> +		MINOR(__entry->dev),
> +		__entry->ino,
> +		__entry->vm_flags & VM_SHARED ? "shared" : "private",
> +		__entry->write ? "write" : "read",
> +		__entry->address,
> +		__entry->length,
> +		__entry->pfn_val & ~PFN_FLAGS_MASK,
> +		__print_flags_u64(__entry->pfn_val & PFN_FLAGS_MASK, "|",
> +			PFN_FLAGS_TRACE),
> +		(unsigned long)__entry->radix_entry
> +	)
> +)
> +
> +#define DEFINE_PMD_INSERT_MAPPING_EVENT(name) \
> +DEFINE_EVENT(dax_pmd_insert_mapping_class, name, \
> +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma, \
> +		unsigned long address, int write, long length, pfn_t pfn, \
> +		void *radix_entry), \
> +	TP_ARGS(inode, vma, address, write, length, pfn, radix_entry))
> +
> +DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping);
> +DEFINE_PMD_INSERT_MAPPING_EVENT(dax_pmd_insert_mapping_fallback);
> +
>  #endif /* _TRACE_FS_DAX_H */
>  
>  /* This part must be outside protection */


* Re: [PATCH v2 2/6] dax: remove leading space from labels
  2016-12-01  8:11   ` Jan Kara
@ 2016-12-01 15:26     ` Ross Zwisler
  2016-12-01 16:33       ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Ross Zwisler @ 2016-12-01 15:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Alexander Viro, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Ingo Molnar,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

On Thu, Dec 01, 2016 at 09:11:44AM +0100, Jan Kara wrote:
> On Wed 30-11-16 16:45:29, Ross Zwisler wrote:
> > No functional change.
> > 
> > As of this commit:
> > 
> > commit 218dd85887da (".gitattributes: set git diff driver for C source code
> > files")
> > 
> > git-diff and git-format-patch both generate diffs whose hunks are correctly
> > prefixed by function names instead of labels, even if those labels aren't
> > indented with spaces.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Didn't we agree to leave this for a bit later?

Sorry, I thought you just asked to not have to edit your "Page invalidation
fixes" series because of this change.  This series is based on a tree that
already includes your page invalidation work, so it shouldn't cause you any
thrash.

I'll pull it out of the next version of this series.


* Re: [PATCH v2 1/6] tracing: add __print_flags_u64()
  2016-12-01 14:12   ` Steven Rostedt
@ 2016-12-01 15:35     ` Ross Zwisler
  0 siblings, 0 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-12-01 15:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ross Zwisler, linux-kernel, Alexander Viro, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Ingo Molnar,
	Jan Kara, Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Thu, Dec 01, 2016 at 09:12:54AM -0500, Steven Rostedt wrote:
> On Wed, 30 Nov 2016 16:45:28 -0700
> Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 3fc2042..ed4398f 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -124,6 +124,44 @@ EXPORT_SYMBOL(trace_print_symbols_seq);
> >  
> >  #if BITS_PER_LONG == 32
> >  const char *
> > +trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
> > +		      unsigned long long flags,
> > +		      const struct trace_print_flags_u64 *flag_array)
> > +{
> > +	unsigned long mask;
> 
> Don't you want mask to be unsigned long long?

Yep, thanks for spotting that.  I'll fix it in v3.
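
For illustration only (a user-space sketch, not code from this series): on a
32-bit build "unsigned long" is 32 bits wide, so storing a flag such as
PFN_SG_CHAIN in "mask" drops the only set bit and leaves mask == 0.  The test
"(flags & mask) != mask" then never skips the entry.  The sketch uses uint32_t
as a stand-in for a 32-bit unsigned long; the helper names
flag_matches_narrow() and flag_matches_wide() are hypothetical, and the PFN_*
definitions are copied from the pfn_t.h hunk quoted in this thread.

```c
#include <stdint.h>

/* Copied from the quoted include/linux/pfn_t.h context, for illustration. */
#define BITS_PER_LONG_LONG 64
#define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
#define PFN_DEV      (1ULL << (BITS_PER_LONG_LONG - 3))

/*
 * Hypothetical helper: same comparison as the loop body, but with the mask
 * held in a 32-bit variable, as "unsigned long" would be on i386.
 */
static int flag_matches_narrow(uint64_t flags, uint64_t flag_mask)
{
	uint32_t mask = (uint32_t)flag_mask;	/* upper 32 bits are lost */

	return (flags & mask) == mask;
}

/* Hypothetical helper: the same comparison with a 64-bit mask. */
static int flag_matches_wide(uint64_t flags, uint64_t flag_mask)
{
	uint64_t mask = flag_mask;

	return (flags & mask) == mask;
}
```

With flags == PFN_DEV, flag_matches_narrow(PFN_DEV, PFN_SG_CHAIN) reports a
match because the truncated mask is zero, while flag_matches_wide() does not.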


* Re: [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing
  2016-12-01 14:16   ` Steven Rostedt
@ 2016-12-01 15:39     ` Ross Zwisler
  0 siblings, 0 replies; 18+ messages in thread
From: Ross Zwisler @ 2016-12-01 15:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ross Zwisler, linux-kernel, Alexander Viro, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Ingo Molnar,
	Jan Kara, Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Thu, Dec 01, 2016 at 09:16:28AM -0500, Steven Rostedt wrote:
> On Wed, 30 Nov 2016 16:45:30 -0700
> Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> 
> 
> > --- /dev/null
> > +++ b/include/trace/events/fs_dax.h
> > @@ -0,0 +1,68 @@
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM fs_dax
> > +
> > +#if !defined(_TRACE_FS_DAX_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_FS_DAX_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DECLARE_EVENT_CLASS(dax_pmd_fault_class,
> > +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
> > +		unsigned long address, unsigned int flags, pgoff_t pgoff,
> > +		pgoff_t max_pgoff, int result),
> > +	TP_ARGS(inode, vma, address, flags, pgoff, max_pgoff, result),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(unsigned long, ino)
> > +		__field(unsigned long, vm_start)
> > +		__field(unsigned long, vm_end)
> > +		__field(unsigned long, vm_flags)
> > +		__field(unsigned long, address)
> > +		__field(unsigned int, flags)
> > +		__field(pgoff_t, pgoff)
> > +		__field(pgoff_t, max_pgoff)
> > +		__field(int, result)
> 
> For better compaction, I would put flags and result together, as they
> are both ints. Otherwise, you'll probably have 4 empty bytes after
> flags.

Sure, will do for v3.
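
A rough user-space sketch of the effect, assuming a 64-bit kernel where
unsigned long and pgoff_t are 8 bytes and dev_t is 4 bytes.  The struct names
and bytes_saved() are hypothetical, fixed-width types stand in for the kernel
types, and the common struct trace_entry header that precedes the fields in a
real event record is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Field order as posted in v2. */
struct pmd_fault_entry_v2 {
	uint32_t dev;		/* dev_t */
	uint64_t ino;
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t vm_flags;
	uint64_t address;
	uint32_t flags;		/* an 8-byte aligned field follows */
	uint64_t pgoff;
	uint64_t max_pgoff;
	int32_t  result;
};

/* Same fields with the two 32-bit values placed next to each other. */
struct pmd_fault_entry_grouped {
	uint32_t dev;
	uint64_t ino;
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t vm_flags;
	uint64_t address;
	uint32_t flags;
	int32_t  result;	/* fills the slot right after flags */
	uint64_t pgoff;
	uint64_t max_pgoff;
};

/* Difference in record size between the two orderings, in bytes. */
static size_t bytes_saved(void)
{
	return sizeof(struct pmd_fault_entry_v2) -
	       sizeof(struct pmd_fault_entry_grouped);
}
```

On an ABI that aligns 64-bit fields to 8 bytes (x86-64, for example) the
grouped layout should come out 8 bytes smaller; on ABIs with 4-byte alignment
for 64-bit fields the two sizes may be equal.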


* Re: [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping()
  2016-12-01 14:19   ` Steven Rostedt
@ 2016-12-01 15:44     ` Ross Zwisler
  2016-12-01 16:11       ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Ross Zwisler @ 2016-12-01 15:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ross Zwisler, linux-kernel, Alexander Viro, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Ingo Molnar,
	Jan Kara, Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Thu, Dec 01, 2016 at 09:19:30AM -0500, Steven Rostedt wrote:
> On Wed, 30 Nov 2016 16:45:33 -0700
> Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> 
> > diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
> > index a3d90b9..033fc7b 100644
> > --- a/include/linux/pfn_t.h
> > +++ b/include/linux/pfn_t.h
> > @@ -15,6 +15,12 @@
> >  #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
> >  #define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
> >  
> > +#define PFN_FLAGS_TRACE \
> > +	{ PFN_SG_CHAIN,	"SG_CHAIN" }, \
> > +	{ PFN_SG_LAST,	"SG_LAST" }, \
> > +	{ PFN_DEV,	"DEV" }, \
> > +	{ PFN_MAP,	"MAP" }
> > +
> >  static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
> >  {
> >  	pfn_t pfn_t = { .val = pfn | (flags & PFN_FLAGS_MASK), };
> > diff --git a/include/trace/events/fs_dax.h b/include/trace/events/fs_dax.h
> > index 9f0a455..7d0ea33 100644
> > --- a/include/trace/events/fs_dax.h
> > +++ b/include/trace/events/fs_dax.h
> > @@ -104,6 +104,57 @@ DEFINE_EVENT(dax_pmd_load_hole_class, name, \
> >  DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole);
> >  DEFINE_PMD_LOAD_HOLE_EVENT(dax_pmd_load_hole_fallback);
> >  
> > +DECLARE_EVENT_CLASS(dax_pmd_insert_mapping_class,
> > +	TP_PROTO(struct inode *inode, struct vm_area_struct *vma,
> > +		unsigned long address, int write, long length, pfn_t pfn,
> > +		void *radix_entry),
> > +	TP_ARGS(inode, vma, address, write, length, pfn, radix_entry),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(unsigned long, ino)
> > +		__field(unsigned long, vm_flags)
> > +		__field(unsigned long, address)
> > +		__field(int, write)
> 
> Place "write" at the end. The ring buffer is 4 byte aligned, so on
> archs that can access 8 bytes on 4 byte alignment, this will be packed
> tighter. Otherwise, you'll get 4 empty bytes after "write".

Actually I think it may be ideal to stick it as the 2nd entry after 'dev'.
dev_t is:

typedef __u32 __kernel_dev_t;
typedef __kernel_dev_t		dev_t;

So those two 32 bit values should combine into a single 64 bit space.

Thanks for the help, I obviously wasn't considering packing when ordering the
elements.
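
[Editor's sketch] The arithmetic behind pairing 'write' with 'dev' can be worked through with a small layout calculator. It assumes natural alignment, meaning each scalar field is aligned to its own size, and does not model the ring buffer itself. The quoted hunk stops at 'write', so the single 8-byte field that follows it in the first ordering is a hypothetical placeholder. All names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/*
 * Size of a struct made of scalar fields, assuming each field's alignment
 * equals its size and the struct is padded to its largest alignment.
 */
size_t layout_size(const size_t *field_sizes, size_t nfields)
{
	size_t offset = 0, max_align = 1;

	for (size_t i = 0; i < nfields; i++) {
		size_t align = field_sizes[i];

		offset = align_up(offset, align) + field_sizes[i];
		if (align > max_align)
			max_align = align;
	}
	return align_up(offset, max_align);
}

/* dev, ino, vm_flags, address, write, then one assumed 8-byte field */
static const size_t order_v2[] = { 4, 8, 8, 8, 4, 8 };
/* dev, write, ino, vm_flags, address, then the same assumed 8-byte field */
static const size_t order_dev_write[] = { 4, 4, 8, 8, 8, 8 };

void print_insert_mapping_sizes(void)
{
	size_t v2 = layout_size(order_v2, 6);
	size_t paired = layout_size(order_dev_write, 6);

	/* Pairing the two 32-bit fields should not grow the layout. */
	assert(paired <= v2);
	printf("write after address: %zu bytes\n", v2);
	printf("write after dev:     %zu bytes\n", paired);
}
```

Under these assumptions the first ordering loses 4 bytes after 'dev' and 4 more after 'write', while the second lets the two 32-bit values fill one 8-byte slot.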


* Re: [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping()
  2016-12-01 15:44     ` Ross Zwisler
@ 2016-12-01 16:11       ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2016-12-01 16:11 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Ingo Molnar, Jan Kara,
	Matthew Wilcox, linux-fsdevel, linux-mm, linux-nvdimm

On Thu, 1 Dec 2016 08:44:32 -0700
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:


> Actually I think it may be ideal to stick it as the 2nd entry after 'dev'.
> dev_t is:
> 
> typedef __u32 __kernel_dev_t;
> typedef __kernel_dev_t		dev_t;
> 
> So those two 32 bit values should combine into a single 64 bit space.

Yeah that should work too.

-- Steve

> 
> Thanks for the help, I obviously wasn't considering packing when ordering the
> elements.


* Re: [PATCH v2 2/6] dax: remove leading space from labels
  2016-12-01 15:26     ` Ross Zwisler
@ 2016-12-01 16:33       ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2016-12-01 16:33 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-kernel, Alexander Viro, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Ingo Molnar,
	Matthew Wilcox, Steven Rostedt, linux-fsdevel, linux-mm,
	linux-nvdimm

On Thu 01-12-16 08:26:19, Ross Zwisler wrote:
> On Thu, Dec 01, 2016 at 09:11:44AM +0100, Jan Kara wrote:
> > On Wed 30-11-16 16:45:29, Ross Zwisler wrote:
> > > No functional change.
> > > 
> > > As of this commit:
> > > 
> > > commit 218dd85887da (".gitattributes: set git diff driver for C source code
> > > files")
> > > 
> > > git-diff and git-format-patch both generate diffs whose hunks are correctly
> > > prefixed by function names instead of labels, even if those labels aren't
> > > indented with spaces.
> > > 
> > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > 
> > Didn't we agree to leave this for a bit later?
> 
> Sorry, I thought you just asked to not have to edit your "Page invalidation
> fixes" series because of this change.  This series is based on a tree that
> already includes your page invalidation work, so it shouldn't cause you any
> thrash.

Ah, I see, I didn't notice. Then it's fine :). Thanks.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


end of thread, other threads:[~2016-12-01 16:33 UTC | newest]

Thread overview: 18+ messages
2016-11-30 23:45 [PATCH v2 0/6] introduce DAX tracepoint support Ross Zwisler
2016-11-30 23:45 ` [PATCH v2 1/6] tracing: add __print_flags_u64() Ross Zwisler
2016-12-01 14:12   ` Steven Rostedt
2016-12-01 15:35     ` Ross Zwisler
2016-11-30 23:45 ` [PATCH v2 2/6] dax: remove leading space from labels Ross Zwisler
2016-12-01  8:11   ` Jan Kara
2016-12-01 15:26     ` Ross Zwisler
2016-12-01 16:33       ` Jan Kara
2016-11-30 23:45 ` [PATCH v2 3/6] dax: add tracepoint infrastructure, PMD tracing Ross Zwisler
2016-12-01  8:10   ` Jan Kara
2016-12-01 14:16   ` Steven Rostedt
2016-12-01 15:39     ` Ross Zwisler
2016-11-30 23:45 ` [PATCH v2 4/6] dax: update MAINTAINERS entries for FS DAX Ross Zwisler
2016-11-30 23:45 ` [PATCH v2 5/6] dax: add tracepoints to dax_pmd_load_hole() Ross Zwisler
2016-11-30 23:45 ` [PATCH v2 6/6] dax: add tracepoints to dax_pmd_insert_mapping() Ross Zwisler
2016-12-01 14:19   ` Steven Rostedt
2016-12-01 15:44     ` Ross Zwisler
2016-12-01 16:11       ` Steven Rostedt
