* [PATCH 0/2] xfs: small debug improvements @ 2016-01-06 23:31 Dave Chinner 2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner 2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner 0 siblings, 2 replies; 5+ messages in thread From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw) To: xfs Hi folks, These are a couple of simple patches I have recently been using when trying to diagnose issues that have been reported to me. They seem useful enough to include for everyone, so here they are. -Dave. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] xfs: add tracepoints to readpage calls 2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner @ 2016-01-06 23:31 ` Dave Chinner 2016-01-07 12:53 ` Brian Foster 2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner 1 sibling, 1 reply; 5+ messages in thread From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> This allows us to see page cache driven readahead in action as it passes through XFS. This helps to understand buffered read throughput problems such as readahead IO IO sizes being too small for the underlying device to reach max throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_aops.c | 2 ++ fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 29e7e5d..379c089 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -1917,6 +1917,7 @@ xfs_vm_readpage( struct file *unused, struct page *page) { + trace_xfs_vm_readpage(page->mapping->host, 1); return mpage_readpage(page, xfs_get_blocks); } @@ -1927,6 +1928,7 @@ xfs_vm_readpages( struct list_head *pages, unsigned nr_pages) { + trace_xfs_vm_readpages(mapping->host, nr_pages); return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks); } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 877079eb..391d797 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage); DEFINE_PAGE_EVENT(xfs_releasepage); DEFINE_PAGE_EVENT(xfs_invalidatepage); +DECLARE_EVENT_CLASS(xfs_readpage_class, + TP_PROTO(struct inode *inode, int nr_pages), + TP_ARGS(inode, nr_pages), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino) + __field(int, nr_pages) + ), + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->nr_pages = nr_pages; + ), + TP_printk("dev %d:%d ino 0x%llx nr_pages %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __entry->nr_pages) +) + +#define DEFINE_READPAGE_EVENT(name) \ +DEFINE_EVENT(xfs_readpage_class, name, \ + TP_PROTO(struct inode *inode, int nr_pages), \ + TP_ARGS(inode, nr_pages)) +DEFINE_READPAGE_EVENT(xfs_vm_readpage); +DEFINE_READPAGE_EVENT(xfs_vm_readpages); + DECLARE_EVENT_CLASS(xfs_imap_class, TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count, int type, struct xfs_bmbt_irec *irec), -- 2.5.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] xfs: add tracepoints to readpage calls 2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner @ 2016-01-07 12:53 ` Brian Foster 0 siblings, 0 replies; 5+ messages in thread From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Thu, Jan 07, 2016 at 10:31:26AM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > This allows us to see page cache driven readahead in action as it > passes through XFS. This helps to understand buffered read > throughput problems such as readahead IO IO sizes being too small Extra "IO" above: ^ > for the underlying device to reach max throughput. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- Reviewed-by: Brian Foster <bfoster@redhat.com> > fs/xfs/xfs_aops.c | 2 ++ > fs/xfs/xfs_trace.h | 26 ++++++++++++++++++++++++++ > 2 files changed, 28 insertions(+) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 29e7e5d..379c089 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -1917,6 +1917,7 @@ xfs_vm_readpage( > struct file *unused, > struct page *page) > { > + trace_xfs_vm_readpage(page->mapping->host, 1); > return mpage_readpage(page, xfs_get_blocks); > } > > @@ -1927,6 +1928,7 @@ xfs_vm_readpages( > struct list_head *pages, > unsigned nr_pages) > { > + trace_xfs_vm_readpages(mapping->host, nr_pages); > return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks); > } > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 877079eb..391d797 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -1222,6 +1222,32 @@ DEFINE_PAGE_EVENT(xfs_writepage); > DEFINE_PAGE_EVENT(xfs_releasepage); > DEFINE_PAGE_EVENT(xfs_invalidatepage); > > +DECLARE_EVENT_CLASS(xfs_readpage_class, > + TP_PROTO(struct inode *inode, int nr_pages), > + TP_ARGS(inode, nr_pages), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(xfs_ino_t, ino) > + __field(int, nr_pages) > + ), > + TP_fast_assign( > + __entry->dev = inode->i_sb->s_dev; > + __entry->ino = inode->i_ino; > + __entry->nr_pages = nr_pages; > + ), > + TP_printk("dev %d:%d ino 0x%llx nr_pages %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __entry->ino, > + __entry->nr_pages) > +) > + > +#define DEFINE_READPAGE_EVENT(name) \ > +DEFINE_EVENT(xfs_readpage_class, name, \ > + TP_PROTO(struct inode *inode, int nr_pages), \ > + TP_ARGS(inode, nr_pages)) > +DEFINE_READPAGE_EVENT(xfs_vm_readpage); > +DEFINE_READPAGE_EVENT(xfs_vm_readpages); > + > DECLARE_EVENT_CLASS(xfs_imap_class, > TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count, > int type, struct xfs_bmbt_irec *irec), > -- > 2.5.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive 2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner 2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner @ 2016-01-06 23:31 ` Dave Chinner 2016-01-07 12:53 ` Brian Foster 1 sibling, 1 reply; 5+ messages in thread From: Dave Chinner @ 2016-01-06 23:31 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> For large sparse or fragmented files, checking every single entry in the bmapbt on every operation is prohibitively expensive. Especially as such checks rarely discover problems during normal operations on high extent coutn files. Our regression tests don't tend to exercise files with hundreds of thousands to millions of extents, so mostly this isn't noticed. However, trying to run things like xfs_mdrestore of large filesystem dumps on a debug kernel quickly becomes impossible as the CPU is completely burnt up repeatedly walking the sparse file bmapbt that is generated for every allocation that is made. Hence, if the file has more than 10,000 extents, just don't bother with walking the tree to check it exhaustively. The btree code has checks that ensure that the newly inserted/removed/modified record is correctly ordered, so the entrie tree walk in thses cases has limited additional value. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 7388495..bc7e7d5 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -325,9 +325,11 @@ xfs_check_block( /* * Check that the extents for the inode ip are in the right order in all - * btree leaves. + * btree leaves. THis becomes prohibitively expensive for large extent count + * files, so don't bother with inodes that have more than 10,000 extents in + * them. The btree record ordering checks will still be done, so for such large + * bmapbt constructs that is going to catch most corruptions. */ - STATIC void xfs_bmap_check_leaf_extents( xfs_btree_cur_t *cur, /* btree cursor or null */ @@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents( return; } + /* skip large extent count inodes */ + if (ip->i_d.di_nextents > 10000) + return; + bno = NULLFSBLOCK; mp = ip->i_mount; ifp = XFS_IFORK_PTR(ip, whichfork); -- 2.5.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive 2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner @ 2016-01-07 12:53 ` Brian Foster 0 siblings, 0 replies; 5+ messages in thread From: Brian Foster @ 2016-01-07 12:53 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Thu, Jan 07, 2016 at 10:31:27AM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > For large sparse or fragmented files, checking every single entry in > the bmapbt on every operation is prohibitively expensive. Especially > as such checks rarely discover problems during normal operations on > high extent coutn files. Our regression tests don't tend to exercise count > files with hundreds of thousands to millions of extents, so mostly > this isn't noticed. > > However, trying to run things like xfs_mdrestore of large filesystem > dumps on a debug kernel quickly becomes impossible as the CPU is > completely burnt up repeatedly walking the sparse file bmapbt that > is generated for every allocation that is made. > > Hence, if the file has more than 10,000 extents, just don't bother > with walking the tree to check it exhaustively. The btree code has > checks that ensure that the newly inserted/removed/modified record > is correctly ordered, so the entrie tree walk in thses cases has these > limited additional value. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 7388495..bc7e7d5 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -325,9 +325,11 @@ xfs_check_block( > > /* > * Check that the extents for the inode ip are in the right order in all > - * btree leaves. > + * btree leaves. THis becomes prohibitively expensive for large extent count This Reviewed-by: Brian Foster <bfoster@redhat.com> > + * files, so don't bother with inodes that have more than 10,000 extents in > + * them. The btree record ordering checks will still be done, so for such large > + * bmapbt constructs that is going to catch most corruptions. > */ > - > STATIC void > xfs_bmap_check_leaf_extents( > xfs_btree_cur_t *cur, /* btree cursor or null */ > @@ -352,6 +354,10 @@ xfs_bmap_check_leaf_extents( > return; > } > > + /* skip large extent count inodes */ > + if (ip->i_d.di_nextents > 10000) > + return; > + > bno = NULLFSBLOCK; > mp = ip->i_mount; > ifp = XFS_IFORK_PTR(ip, whichfork); > -- > 2.5.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-01-07 12:53 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-06 23:31 [PATCH 0/2] xfs: small debug improvements Dave Chinner 2016-01-06 23:31 ` [PATCH 1/2] xfs: add tracepoints to readpage calls Dave Chinner 2016-01-07 12:53 ` Brian Foster 2016-01-06 23:31 ` [PATCH 2/2] xfs: bmapbt checking on debug kernels too expensive Dave Chinner 2016-01-07 12:53 ` Brian Foster
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.