All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: small debug improvements
@ 2016-01-06 23:31 Dave Chinner
  2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
  2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
  To: xfs

Hi folks,

These are a couple of simple patches I have recently been using when
trying to diagnose issues that have been reported to me. They seem
useful enough to include for everyone, so here they are.

-Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] xfs: add tracepoints to readpage calls
  2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner
@ 2016-01-06 23:31 ` Dave Chinner
  2016-01-07 12:53   ` Brian Foster
  2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

This allows us to see page cache driven readahead in action as it
passes through XFS. This helps to understand buffered read
throughput problems such as readahead IO IO sizes being too small
for the underlying device to reach max throughput.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c  |  2 ++
 fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 29e7e5d..379c089 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1917,6 +1917,7 @@ xfs_vm_readpage(
 	struct file		*unused,
 	struct page		*page)
 {
+	trace_xfs_vm_readpage(page->mapping->host, 1);
 	return mpage_readpage(page, xfs_get_blocks);
 }
 
@@ -1927,6 +1928,7 @@ xfs_vm_readpages(
 	struct list_head	*pages,
 	unsigned		nr_pages)
 {
+	trace_xfs_vm_readpages(mapping->host, nr_pages);
 	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
 }
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb..391d797 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage);
 DEFINE_PAGE_EVENT(xfs_releasepage);
 DEFINE_PAGE_EVENT(xfs_invalidatepage);
 
+DECLARE_EVENT_CLASS(xfs_readpage_class,
+	TP_PROTO(struct inode *inode, int nr_pages),
+	TP_ARGS(inode, nr_pages),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, nr_pages)
+	),
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->nr_pages = nr_pages;
+	),
+	TP_printk("dev %d:%d ino 0x%llx nr_pages %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->nr_pages)
+)
+
+#define DEFINE_READPAGE_EVENT(name)		\
+DEFINE_EVENT(xfs_readpage_class, name,	\
+	TP_PROTO(struct inode *inode, int nr_pages), \
+	TP_ARGS(inode, nr_pages))
+DEFINE_READPAGE_EVENT(xfs_vm_readpage);
+DEFINE_READPAGE_EVENT(xfs_vm_readpages);
+
 DECLARE_EVENT_CLASS(xfs_imap_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
 		 int type, struct xfs_bmbt_irec *irec),
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive
  2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner
  2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
@ 2016-01-06 23:31 ` Dave Chinner
  2016-01-07 12:53   ` Brian Foster
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

For large sparse or fragmented files, checking every single entry in
the bmapbt on every operation is prohibitively expensive. Especially
as such checks rarely discover problems during normal operations on
high extent coutn files. Our regression tests don't tend to exercise
files with hundreds of thousands to millions of extents, so mostly
this isn't noticed.

However, trying to run things like xfs_mdrestore of large filesystem
dumps on a debug kernel quickly becomes impossible as the CPU is
completely burnt up repeatedly walking the sparse file bmapbt that
is generated for every allocation that is made.

Hence, if the file has more than 10,000 extents, just don't bother
with walking the tree to check it exhaustively. The btree code has
checks that ensure that the newly inserted/removed/modified record
is correctly ordered, so the entrie tree walk in thses cases has
limited additional value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7388495..bc7e7d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -325,9 +325,11 @@ xfs_check_block(
 
 /*
  * Check that the extents for the inode ip are in the right order in all
- * btree leaves.
+ * btree leaves. THis becomes prohibitively expensive for large extent count
+ * files, so don't bother with inodes that have more than 10,000 extents in
+ * them. The btree record ordering checks will still be done, so for such large
+ * bmapbt constructs that is going to catch most corruptions.
  */
-
 STATIC void
 xfs_bmap_check_leaf_extents(
 	xfs_btree_cur_t		*cur,	/* btree cursor or null */
@@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents(
 		return;
 	}
 
+	/* skip large extent count inodes */
+	if (ip->i_d.di_nextents > 10000)
+		return;
+
 	bno = NULLFSBLOCK;
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] xfs: add tracepoints to readpage calls
  2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
@ 2016-01-07 12:53   ` Brian Foster
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Jan 07, 2016 at 10:31:26AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This allows us to see page cache driven readahead in action as it
> passes through XFS. This helps to understand buffered read
> throughput problems such as readahead IO IO sizes being too small

Extra "IO" above:			^

> for the underlying device to reach max throughput.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c  |  2 ++
>  fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 29e7e5d..379c089 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1917,6 +1917,7 @@ xfs_vm_readpage(
>  	struct file		*unused,
>  	struct page		*page)
>  {
> +	trace_xfs_vm_readpage(page->mapping->host, 1);
>  	return mpage_readpage(page, xfs_get_blocks);
>  }
>  
> @@ -1927,6 +1928,7 @@ xfs_vm_readpages(
>  	struct list_head	*pages,
>  	unsigned		nr_pages)
>  {
> +	trace_xfs_vm_readpages(mapping->host, nr_pages);
>  	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
>  }
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 877079eb..391d797 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage);
>  DEFINE_PAGE_EVENT(xfs_releasepage);
>  DEFINE_PAGE_EVENT(xfs_invalidatepage);
>  
> +DECLARE_EVENT_CLASS(xfs_readpage_class,
> +	TP_PROTO(struct inode *inode, int nr_pages),
> +	TP_ARGS(inode, nr_pages),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(int, nr_pages)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = inode->i_sb->s_dev;
> +		__entry->ino = inode->i_ino;
> +		__entry->nr_pages = nr_pages;
> +	),
> +	TP_printk("dev %d:%d ino 0x%llx nr_pages %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->nr_pages)
> +)
> +
> +#define DEFINE_READPAGE_EVENT(name)		\
> +DEFINE_EVENT(xfs_readpage_class, name,	\
> +	TP_PROTO(struct inode *inode, int nr_pages), \
> +	TP_ARGS(inode, nr_pages))
> +DEFINE_READPAGE_EVENT(xfs_vm_readpage);
> +DEFINE_READPAGE_EVENT(xfs_vm_readpages);
> +
>  DECLARE_EVENT_CLASS(xfs_imap_class,
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
>  		 int type, struct xfs_bmbt_irec *irec),
> -- 
> 2.5.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive
  2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
@ 2016-01-07 12:53   ` Brian Foster
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Jan 07, 2016 at 10:31:27AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For large sparse or fragmented files, checking every single entry in
> the bmapbt on every operation is prohibitively expensive. Especially
> as such checks rarely discover problems during normal operations on
> high extent coutn files. Our regression tests don't tend to exercise

	      count

> files with hundreds of thousands to millions of extents, so mostly
> this isn't noticed.
> 
> However, trying to run things like xfs_mdrestore of large filesystem
> dumps on a debug kernel quickly becomes impossible as the CPU is
> completely burnt up repeatedly walking the sparse file bmapbt that
> is generated for every allocation that is made.
> 
> Hence, if the file has more than 10,000 extents, just don't bother
> with walking the tree to check it exhaustively. The btree code has
> checks that ensure that the newly inserted/removed/modified record
> is correctly ordered, so the entrie tree walk in thses cases has

						   these

> limited additional value.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7388495..bc7e7d5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -325,9 +325,11 @@ xfs_check_block(
>  
>  /*
>   * Check that the extents for the inode ip are in the right order in all
> - * btree leaves.
> + * btree leaves. THis becomes prohibitively expensive for large extent count

		    This

Reviewed-by: Brian Foster <bfoster@redhat.com>

> + * files, so don't bother with inodes that have more than 10,000 extents in
> + * them. The btree record ordering checks will still be done, so for such large
> + * bmapbt constructs that is going to catch most corruptions.
>   */
> -
>  STATIC void
>  xfs_bmap_check_leaf_extents(
>  	xfs_btree_cur_t		*cur,	/* btree cursor or null */
> @@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents(
>  		return;
>  	}
>  
> +	/* skip large extent count inodes */
> +	if (ip->i_d.di_nextents > 10000)
> +		return;
> +
>  	bno = NULLFSBLOCK;
>  	mp = ip->i_mount;
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
> -- 
> 2.5.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-01-07 12:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner
2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
2016-01-07 12:53   ` Brian Foster
2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
2016-01-07 12:53   ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.