* [PATCH 0/2] xfs: small debug improvements
@ 2016-01-06 23:31 Dave Chinner
2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
To: xfs
Hi folks,
These are a couple of simple patches I have recently been using when
trying to diagnose issues that have been reported to me. They seem
useful enough to include for everyone, so here they are.
-Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/2] xfs: add tracepoints to readpage calls
2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner
@ 2016-01-06 23:31 ` Dave Chinner
2016-01-07 12:53 ` Brian Foster
2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
This allows us to see page cache driven readahead in action as it
passes through XFS. This helps to understand buffered read
throughput problems such as readahead IO sizes being too small
for the underlying device to reach max throughput.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_aops.c | 2 ++
fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 29e7e5d..379c089 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1917,6 +1917,7 @@ xfs_vm_readpage(
struct file *unused,
struct page *page)
{
+ trace_xfs_vm_readpage(page->mapping->host, 1);
return mpage_readpage(page, xfs_get_blocks);
}
@@ -1927,6 +1928,7 @@ xfs_vm_readpages(
struct list_head *pages,
unsigned nr_pages)
{
+ trace_xfs_vm_readpages(mapping->host, nr_pages);
return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb..391d797 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage);
DEFINE_PAGE_EVENT(xfs_releasepage);
DEFINE_PAGE_EVENT(xfs_invalidatepage);
+DECLARE_EVENT_CLASS(xfs_readpage_class,
+ TP_PROTO(struct inode *inode, int nr_pages),
+ TP_ARGS(inode, nr_pages),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(int, nr_pages)
+ ),
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->nr_pages = nr_pages;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx nr_pages %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->nr_pages)
+)
+
+#define DEFINE_READPAGE_EVENT(name) \
+DEFINE_EVENT(xfs_readpage_class, name, \
+ TP_PROTO(struct inode *inode, int nr_pages), \
+ TP_ARGS(inode, nr_pages))
+DEFINE_READPAGE_EVENT(xfs_vm_readpage);
+DEFINE_READPAGE_EVENT(xfs_vm_readpages);
+
DECLARE_EVENT_CLASS(xfs_imap_class,
TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
int type, struct xfs_bmbt_irec *irec),
--
2.5.0
* [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive
2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner
2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
@ 2016-01-06 23:31 ` Dave Chinner
2016-01-07 12:53 ` Brian Foster
1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
For large sparse or fragmented files, checking every single entry in
the bmapbt on every operation is prohibitively expensive, especially
as such checks rarely discover problems during normal operations on
high extent count files. Our regression tests don't tend to exercise
files with hundreds of thousands to millions of extents, so mostly
this isn't noticed.
However, trying to run things like xfs_mdrestore of large filesystem
dumps on a debug kernel quickly becomes impossible as the CPU is
completely burnt up repeatedly walking the sparse file bmapbt that
is generated for every allocation that is made.
Hence, if the file has more than 10,000 extents, just don't bother
with walking the tree to check it exhaustively. The btree code has
checks that ensure that the newly inserted/removed/modified record
is correctly ordered, so the entire tree walk in these cases has
limited additional value.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7388495..bc7e7d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -325,9 +325,11 @@ xfs_check_block(
/*
* Check that the extents for the inode ip are in the right order in all
- * btree leaves.
+ * btree leaves. This becomes prohibitively expensive for large extent count
+ * files, so don't bother with inodes that have more than 10,000 extents in
+ * them. The btree record ordering checks will still be done, so for such large
+ * bmapbt constructs that is going to catch most corruptions.
*/
-
STATIC void
xfs_bmap_check_leaf_extents(
xfs_btree_cur_t *cur, /* btree cursor or null */
@@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents(
return;
}
+ /* skip large extent count inodes */
+ if (ip->i_d.di_nextents > 10000)
+ return;
+
bno = NULLFSBLOCK;
mp = ip->i_mount;
ifp = XFS_IFORK_PTR(ip, whichfork);
--
2.5.0
* Re: [PATCH 1/2] xfs: add tracepoints to readpage calls
2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner
@ 2016-01-07 12:53 ` Brian Foster
0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Jan 07, 2016 at 10:31:26AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> This allows us to see page cache driven readahead in action as it
> passes through XFS. This helps to understand buffered read
> throughput problems such as readahead IO IO sizes being too small
Extra "IO" above: ^
> for the underlying device to reach max throughput.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_aops.c | 2 ++
> fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 29e7e5d..379c089 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1917,6 +1917,7 @@ xfs_vm_readpage(
> struct file *unused,
> struct page *page)
> {
> + trace_xfs_vm_readpage(page->mapping->host, 1);
> return mpage_readpage(page, xfs_get_blocks);
> }
>
> @@ -1927,6 +1928,7 @@ xfs_vm_readpages(
> struct list_head *pages,
> unsigned nr_pages)
> {
> + trace_xfs_vm_readpages(mapping->host, nr_pages);
> return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
> }
>
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 877079eb..391d797 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage);
> DEFINE_PAGE_EVENT(xfs_releasepage);
> DEFINE_PAGE_EVENT(xfs_invalidatepage);
>
> +DECLARE_EVENT_CLASS(xfs_readpage_class,
> + TP_PROTO(struct inode *inode, int nr_pages),
> + TP_ARGS(inode, nr_pages),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_ino_t, ino)
> + __field(int, nr_pages)
> + ),
> + TP_fast_assign(
> + __entry->dev = inode->i_sb->s_dev;
> + __entry->ino = inode->i_ino;
> + __entry->nr_pages = nr_pages;
> + ),
> + TP_printk("dev %d:%d ino 0x%llx nr_pages %d",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->ino,
> + __entry->nr_pages)
> +)
> +
> +#define DEFINE_READPAGE_EVENT(name) \
> +DEFINE_EVENT(xfs_readpage_class, name, \
> + TP_PROTO(struct inode *inode, int nr_pages), \
> + TP_ARGS(inode, nr_pages))
> +DEFINE_READPAGE_EVENT(xfs_vm_readpage);
> +DEFINE_READPAGE_EVENT(xfs_vm_readpages);
> +
> DECLARE_EVENT_CLASS(xfs_imap_class,
> TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count,
> int type, struct xfs_bmbt_irec *irec),
> --
> 2.5.0
>
* Re: [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive
2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
@ 2016-01-07 12:53 ` Brian Foster
0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Jan 07, 2016 at 10:31:27AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> For large sparse or fragmented files, checking every single entry in
> the bmapbt on every operation is prohibitively expensive. Especially
> as such checks rarely discover problems during normal operations on
> high extent coutn files. Our regression tests don't tend to exercise
count
> files with hundreds of thousands to millions of extents, so mostly
> this isn't noticed.
>
> However, trying to run things like xfs_mdrestore of large filesystem
> dumps on a debug kernel quickly becomes impossible as the CPU is
> completely burnt up repeatedly walking the sparse file bmapbt that
> is generated for every allocation that is made.
>
> Hence, if the file has more than 10,000 extents, just don't bother
> with walking the tree to check it exhaustively. The btree code has
> checks that ensure that the newly inserted/removed/modified record
> is correctly ordered, so the entrie tree walk in thses cases has
these
> limited additional value.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7388495..bc7e7d5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -325,9 +325,11 @@ xfs_check_block(
>
> /*
> * Check that the extents for the inode ip are in the right order in all
> - * btree leaves.
> + * btree leaves. THis becomes prohibitively expensive for large extent count
This
Reviewed-by: Brian Foster <bfoster@redhat.com>
> + * files, so don't bother with inodes that have more than 10,000 extents in
> + * them. The btree record ordering checks will still be done, so for such large
> + * bmapbt constructs that is going to catch most corruptions.
> */
> -
> STATIC void
> xfs_bmap_check_leaf_extents(
> xfs_btree_cur_t *cur, /* btree cursor or null */
> @@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents(
> return;
> }
>
> + /* skip large extent count inodes */
> + if (ip->i_d.di_nextents > 10000)
> + return;
> +
> bno = NULLFSBLOCK;
> mp = ip->i_mount;
> ifp = XFS_IFORK_PTR(ip, whichfork);
> --
> 2.5.0
>