From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 10/10] xfs: refactor INUMBERS to use iwalk functions
Date: Tue, 11 Jun 2019 13:01:41 -0400 [thread overview]
Message-ID: <20190611170137.GC12395@bfoster> (raw)
In-Reply-To: <20190611162127.GT1871505@magnolia>
On Tue, Jun 11, 2019 at 09:21:28AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 11, 2019 at 11:08:51AM -0400, Brian Foster wrote:
> > On Tue, Jun 04, 2019 at 02:50:33PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > Now that we have generic functions to walk inode records, refactor the
> > > INUMBERS implementation to use it.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > fs/xfs/xfs_ioctl.c | 20 ++++--
> > > fs/xfs/xfs_ioctl.h | 2 +
> > > fs/xfs/xfs_ioctl32.c | 35 ++++------
> > > fs/xfs/xfs_itable.c | 168 ++++++++++++++++++++------------------------------
> > > fs/xfs/xfs_itable.h | 22 +------
> > > fs/xfs/xfs_iwalk.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++--
> > > fs/xfs/xfs_iwalk.h | 12 ++++
> >
> > It looks like there's a decent amount of xfs_iwalk code changes in this
> > patch. That should probably be a separate patch, at minimum to have a
> > separate changelog to document the changes/updates required for
> > inumbers.
>
> <nod> I'll break out the iwalk changes into a separate patch so that
> this one only has the changes needed to wire up the ioctl.
>
> > > 7 files changed, 262 insertions(+), 158 deletions(-)
> > >
> > >
> > ...
> > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > index 06abe5c9c0ee..bade54d6ac64 100644
> > > --- a/fs/xfs/xfs_itable.c
> > > +++ b/fs/xfs/xfs_itable.c
> > > @@ -259,121 +259,85 @@ xfs_bulkstat(
> > > return error;
> > > }
> > >
> > > -int
> > > -xfs_inumbers_fmt(
> > > - void __user *ubuffer, /* buffer to write to */
> > > - const struct xfs_inogrp *buffer, /* buffer to read from */
> > > - long count, /* # of elements to read */
> > > - long *written) /* # of bytes written */
> > > +struct xfs_inumbers_chunk {
> > > + inumbers_fmt_pf formatter;
> > > + struct xfs_ibulk *breq;
> > > +};
> > > +
> > > +/*
> > > + * INUMBERS
> > > + * ========
> > > + * This is how we export inode btree records to userspace, so that XFS tools
> > > + * can figure out where inodes are allocated.
> > > + */
> > > +
> > > +/*
> > > + * Format the inode group structure and report it somewhere.
> > > + *
> > > + * Similar to xfs_bulkstat_one_int, lastino is the inode cursor as we walk
> > > + * through the filesystem so we move it forward unless there was a runtime
> > > + * error. If the formatter tells us the buffer is now full we also move the
> > > + * cursor forward and abort the walk.
> > > + */
> > > +STATIC int
> > > +xfs_inumbers_walk(
> > > + struct xfs_mount *mp,
> > > + struct xfs_trans *tp,
> > > + xfs_agnumber_t agno,
> > > + const struct xfs_inobt_rec_incore *irec,
> > > + void *data)
> > > {
> > > - if (copy_to_user(ubuffer, buffer, count * sizeof(*buffer)))
> > > - return -EFAULT;
> > > - *written = count * sizeof(*buffer);
> > > - return 0;
> > > + struct xfs_inogrp inogrp = {
> > > + .xi_startino = XFS_AGINO_TO_INO(mp, agno, irec->ir_startino),
> > > + .xi_alloccount = irec->ir_count - irec->ir_freecount,
> > > + .xi_allocmask = ~irec->ir_free,
> > > + };
> >
> > Not related to this patch, but I'm wondering if we should be using
> > xfs_inobt_irec_to_allocmask() here to mask off holes from the resulting
> > allocation bitmap. Eh, I guess it's misleading either way..
>
> Holes were supposed to be marked in ir_free also, right?
>
Hm, yeah I think so. So right now holes would be reported as free via
inumbers, which is probably what we want. Must have been thinking about
this backwards... disregard!
Brian
> So (assuming the irec isn't corrupt) we should be protected against
> reporting a hole as an "allocated" inode, right?
>
> > > + struct xfs_inumbers_chunk *ic = data;
> > > + xfs_agino_t agino;
> > > + int error;
> > > +
> > > + error = ic->formatter(ic->breq, &inogrp);
> > > + if (error && error != XFS_IBULK_BUFFER_FULL)
> > > + return error;
> > > + if (error == XFS_IBULK_BUFFER_FULL)
> > > + error = XFS_INOBT_WALK_ABORT;
> > > +
> > > + agino = irec->ir_startino + XFS_INODES_PER_CHUNK;
> > > + ic->breq->startino = XFS_AGINO_TO_INO(mp, agno, agino);
> > > + return error;
> > > }
> > >
> > ...
> > > diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> > > index c4a9c4c246b7..3a35d1cf7e14 100644
> > > --- a/fs/xfs/xfs_iwalk.c
> > > +++ b/fs/xfs/xfs_iwalk.c
> > ...
> > > @@ -286,7 +298,7 @@ xfs_iwalk_ag_start(
> > > * have to deal with tearing down the cursor to walk the records.
> > > */
> > > error = xfs_iwalk_grab_ichunk(*curpp, agino, &icount,
> > > - &iwag->recs[iwag->nr_recs]);
> > > + &iwag->recs[iwag->nr_recs], trim);
> >
> > Hmm, it looks like we could lift the lookup from xfs_iwalk_grab_ichunk()
> > up into xfs_iwalk_ag_start() and avoid needing to pass trim down
> > multiple levels. In fact, if we're not trimming the record we don't need
> > to grab the record at all in this path. We could do the lookup (setting
> > has_more) then bounce right up to the core walker algorithm, right? If
> > so, that seems a bit cleaner in terms of only using special cased code
> > when it's actually necessary.
>
> Right.
>
> >
> > > if (error)
> > > return error;
> > > if (icount)
> > ...
> > > @@ -561,3 +574,135 @@ xfs_iwalk_threaded(
> > ...
> > > +/*
> > > + * Walk all inode btree records in a single AG, from @iwag->startino to the end
> > > + * of the AG.
> > > + */
> > > +STATIC int
> > > +xfs_inobt_walk_ag(
> > > + struct xfs_iwalk_ag *iwag)
> > > +{
> > > + struct xfs_mount *mp = iwag->mp;
> > > + struct xfs_trans *tp = iwag->tp;
> > > + struct xfs_buf *agi_bp = NULL;
> > > + struct xfs_btree_cur *cur = NULL;
> > > + xfs_agnumber_t agno;
> > > + xfs_agino_t agino;
> > > + int has_more;
> > > + int error = 0;
> > > +
> > > + /* Set up our cursor at the right place in the inode btree. */
> > > + agno = XFS_INO_TO_AGNO(mp, iwag->startino);
> > > + agino = XFS_INO_TO_AGINO(mp, iwag->startino);
> > > + error = xfs_iwalk_ag_start(iwag, agno, agino, &cur, &agi_bp, &has_more,
> > > + false);
> > > +
> > > + while (!error && has_more && !xfs_pwork_want_abort(&iwag->pwork)) {
> > > + struct xfs_inobt_rec_incore *irec;
> > > +
> > > + cond_resched();
> > > +
> > > + /* Fetch the inobt record. */
> > > + irec = &iwag->recs[iwag->nr_recs];
> > > + error = xfs_inobt_get_rec(cur, irec, &has_more);
> > > + if (error || !has_more)
> > > + break;
> > > +
> > > + /*
> > > + * If there's space in the buffer for more records, increment
> > > + * the btree cursor and grab more.
> > > + */
> > > + if (++iwag->nr_recs < iwag->sz_recs) {
> > > + error = xfs_btree_increment(cur, 0, &has_more);
> > > + if (error || !has_more)
> > > + break;
> > > + continue;
> > > + }
> > > +
> > > + /*
> > > + * Otherwise, we need to save cursor state and run the callback
> > > + * function on the cached records. The run_callbacks function
> > > + * is supposed to return a cursor pointing to the record where
> > > + * we would be if we had been able to increment like above.
> > > + */
> > > + error = xfs_iwalk_run_callbacks(iwag, xfs_inobt_walk_ag_recs,
> > > + agno, &cur, &agi_bp, &has_more);
> > > + }
> > > +
> > > + xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
> > > +
> > > + /* Walk any records left behind in the cache. */
> > > + if (iwag->nr_recs == 0 || error || xfs_pwork_want_abort(&iwag->pwork))
> > > + return error;
> > > +
> > > + return xfs_inobt_walk_ag_recs(iwag);
> > > +}
> >
> > Similar comments apply here as for the previous xfs_iwalk_ag() patch.
> > Though looking at it, the only differences here are the lack of free
> > inode check, readahead and the callback function (particularly once you
> > consider the partial completion refactoring we discussed earlier). I
> > think this could all be generalized with a single flag such that there's
> > no need for separate xfs_[inobt|iwalk]_ag() functions.
>
> Yep. Already refactoring that. :)
>
> > Hmmm.. perhaps we could use a flag or separate function pointers in
> > struct xfs_iwalk_ag to accomplish the same thing all the way up through
> > the record walker functions. IOW, xfs_iwalk_ag_recs() looks like it
> > could call either callback based on which is defined in the
> > xfs_iwalk_ag.
>
> <nod>
>
> > This could all be done as a separate patch of course, if that's easier.
>
> I might just go back and remove the function pointer from run_callbacks
> in the earlier patches...
>
> >
> > > +
> > > +/*
> > > + * Walk all inode btree records in the filesystem starting from @startino. The
> > > + * @inobt_walk_fn will be called for each btree record, being passed the incore
> > > + * record and @data. @max_prefetch controls how many inobt records we try to
> > > + * cache ahead of time.
> > > + */
> > > +int
> > > +xfs_inobt_walk(
> > > + struct xfs_mount *mp,
> > > + struct xfs_trans *tp,
> > > + xfs_ino_t startino,
> > > + xfs_inobt_walk_fn inobt_walk_fn,
> > > + unsigned int max_prefetch,
> > > + void *data)
> > > +{
> > > + struct xfs_iwalk_ag iwag = {
> > > + .mp = mp,
> > > + .tp = tp,
> > > + .inobt_walk_fn = inobt_walk_fn,
> > > + .data = data,
> > > + .startino = startino,
> > > + .pwork = XFS_PWORK_SINGLE_THREADED,
> > > + };
> > > + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino);
> > > + int error;
> > > +
> > > + ASSERT(agno < mp->m_sb.sb_agcount);
> > > +
> > > + xfs_iwalk_set_prefetch(&iwag, max_prefetch * XFS_INODES_PER_CHUNK);
> >
> > A brief comment above this line would be helpful. Something like:
> >
> > /* translate inumbers record count to inode count */
>
> Done. Thanks for the review!
>
> --D
>
> > Brian
> >
> > > + error = xfs_iwalk_alloc(&iwag);
> > > + if (error)
> > > + return error;
> > > +
> > > + for (; agno < mp->m_sb.sb_agcount; agno++) {
> > > + error = xfs_inobt_walk_ag(&iwag);
> > > + if (error)
> > > + break;
> > > + iwag.startino = XFS_AGINO_TO_INO(mp, agno + 1, 0);
> > > + }
> > > +
> > > + xfs_iwalk_free(&iwag);
> > > + return error;
> > > +}
> > > diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h
> > > index 76d8f87a39ef..20bee93d4676 100644
> > > --- a/fs/xfs/xfs_iwalk.h
> > > +++ b/fs/xfs/xfs_iwalk.h
> > > @@ -18,4 +18,16 @@ int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino,
> > > xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, bool poll,
> > > void *data);
> > >
> > > +/* Walk all inode btree records in the filesystem starting from @startino. */
> > > +typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp,
> > > + xfs_agnumber_t agno,
> > > + const struct xfs_inobt_rec_incore *irec,
> > > + void *data);
> > > +/* Return value (for xfs_inobt_walk_fn) that aborts the walk immediately. */
> > > +#define XFS_INOBT_WALK_ABORT (XFS_IWALK_ABORT)
> > > +
> > > +int xfs_inobt_walk(struct xfs_mount *mp, struct xfs_trans *tp,
> > > + xfs_ino_t startino, xfs_inobt_walk_fn inobt_walk_fn,
> > > + unsigned int max_prefetch, void *data);
> > > +
> > > #endif /* __XFS_IWALK_H__ */
> > >
prev parent reply other threads:[~2019-06-11 17:01 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-04 21:49 [PATCH v2 00/10] xfs: refactor and improve inode iteration Darrick J. Wong
2019-06-04 21:49 ` [PATCH 01/10] xfs: create simplified inode walk function Darrick J. Wong
2019-06-10 13:58 ` Brian Foster
2019-06-10 16:59 ` Darrick J. Wong
2019-06-10 17:55 ` Brian Foster
2019-06-10 23:11 ` Darrick J. Wong
2019-06-11 22:33 ` Dave Chinner
2019-06-11 23:05 ` Darrick J. Wong
2019-06-12 12:13 ` Brian Foster
2019-06-12 16:53 ` Darrick J. Wong
2019-06-12 17:54 ` Darrick J. Wong
2019-06-04 21:49 ` [PATCH 02/10] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
2019-06-10 13:58 ` Brian Foster
2019-06-10 17:10 ` Darrick J. Wong
2019-06-11 23:23 ` Dave Chinner
2019-06-12 0:32 ` Darrick J. Wong
2019-06-12 12:55 ` Brian Foster
2019-06-12 23:33 ` Dave Chinner
2019-06-13 18:34 ` Brian Foster
2019-06-04 21:49 ` [PATCH 03/10] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
2019-06-10 13:59 ` Brian Foster
2019-06-04 21:49 ` [PATCH 04/10] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
2019-06-10 14:02 ` Brian Foster
2019-06-10 17:38 ` Darrick J. Wong
2019-06-10 18:29 ` Brian Foster
2019-06-10 23:42 ` Darrick J. Wong
2019-06-04 21:49 ` [PATCH 05/10] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
2019-06-10 14:02 ` Brian Foster
2019-06-04 21:50 ` [PATCH 06/10] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
2019-06-10 19:32 ` Brian Foster
2019-06-04 21:50 ` [PATCH 07/10] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
2019-06-10 19:32 ` Brian Foster
2019-06-04 21:50 ` [PATCH 08/10] xfs: multithreaded iwalk implementation Darrick J. Wong
2019-06-10 19:40 ` Brian Foster
2019-06-11 1:10 ` Darrick J. Wong
2019-06-11 13:13 ` Brian Foster
2019-06-11 15:29 ` Darrick J. Wong
2019-06-11 17:00 ` Brian Foster
2019-06-04 21:50 ` [PATCH 09/10] xfs: poll waiting for quotacheck Darrick J. Wong
2019-06-11 15:07 ` Brian Foster
2019-06-11 16:06 ` Darrick J. Wong
2019-06-11 17:01 ` Brian Foster
2019-06-04 21:50 ` [PATCH 10/10] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
2019-06-11 15:08 ` Brian Foster
2019-06-11 16:21 ` Darrick J. Wong
2019-06-11 17:01 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190611170137.GC12395@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).