From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: david@fromorbit.com, linux-fsdevel@vger.kernel.org,
	vishal.l.verma@intel.com, bfoster@redhat.com, xfs@oss.sgi.com
Subject: Re: [PATCH 08/47] xfs: support btrees with overlapping intervals for keys
Date: Mon, 1 Aug 2016 12:11:26 -0700
Message-ID: <20160801191126.GE8590@birch.djwong.org>
In-Reply-To: <20160801064818.GJ15590@infradead.org>

On Sun, Jul 31, 2016 at 11:48:18PM -0700, Christoph Hellwig wrote:
> > v2: When we're deleting a record in a btree that supports overlapped
> > interval records and the deletion results in two btree blocks being
> > joined, we defer updating the high/low keys until after all possible
> > joining (at higher levels in the tree) have finished.  At this point,
> > the btree pointers at all levels have been updated to remove the empty
> > blocks and we can update the low and high keys.
> > 
> > When we're doing this, we must be careful to update the keys of all
> > node pointers up to the root instead of stopping at the first set of
> > keys that don't need updating.  This is because it's possible for a
> > single deletion to cause joining of multiple levels of tree, and so
> > we need to update everything going back to the root.
> > 
> > v3: Make diff_two_keys return < 0, 0, or > 0 if key1 is less than,
> > equal to, or greater than key2, respectively.  This is consistent
> > with the rest of the kernel and the C library.  Clarify some comments
> > and refactor the sibling_update function out of existence.  Check the
> > return value of btree_updkeys().
> 
> The changelogs go below the "-- " marker so that they don't appear
> in the git log.  That is unless they actually are useful like this
> one and should be merged into the actual patch description instead
> of being worded incrementally.
> 
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -51,7 +51,6 @@ static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
> >  #define xfs_btree_magic(cur) \
> >  	xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
> >  
> > -
> >  STATIC int				/* error (0 or EFSCORRUPTED) */
> >  xfs_btree_check_lblock(
> >  	struct xfs_btree_cur	*cur,	/* btree cursor */
> 
> Random whitespace change that probably shouldn't be in the patch.

Oops.

> > @@ -428,6 +427,50 @@ xfs_btree_dup_cursor(
> >   * into a btree block (xfs_btree_*_offset) or return a pointer to the given
> >   * record, key or pointer (xfs_btree_*_addr).  Note that all addressing
> >   * inside the btree block is done using indices starting at one, not zero!
> > + *
> > + * If XFS_BTREE_OVERLAPPING is set, then this btree supports keys containing
> 
> And here we already have the flag I asked for in the last patch.  I
> think that should be enough to drop the new methods.

(As I mentioned in a previous reply, I used to open code this:

if (cur->bc_flags & XFS_BTREE_OVERLAPPING)
	xfs_btree_get_node_overlapped(...);
else
	xfs_btree_get_node(...);

but Dave prefers to dispatch this through function pointers so that
the switching logic occurs in only one place.)
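
A rough userspace sketch of the function-pointer approach, for
illustration only.  Every demo_* name below is made up and the return
values are placeholders; this is not the real XFS cursor or ops
structure.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_cur;

/* Hypothetical ops table; the real btree ops have many more methods. */
struct demo_ops {
	int (*get_node_keys)(const struct demo_cur *cur);
};

struct demo_cur {
	const struct demo_ops *ops;
};

/* Placeholder: a plain btree keeps one key per pointer. */
static int demo_get_node_keys(const struct demo_cur *cur)
{
	(void)cur;
	return 1;
}

/* Placeholder: an overlapped btree keeps a low and a high key. */
static int demo_get_node_keys_overlapped(const struct demo_cur *cur)
{
	(void)cur;
	return 2;
}

static const struct demo_ops demo_plain_ops = {
	.get_node_keys = demo_get_node_keys,
};

static const struct demo_ops demo_overlapped_ops = {
	.get_node_keys = demo_get_node_keys_overlapped,
};

/* The only place that looks at the "overlapping" property. */
static void demo_init_cursor(struct demo_cur *cur, bool overlapping)
{
	cur->ops = overlapping ? &demo_overlapped_ops : &demo_plain_ops;
}

/* Callers dispatch through the table and never test a flag. */
static int demo_keys_per_ptr(const struct demo_cur *cur)
{
	assert(cur->ops && cur->ops->get_node_keys);
	return cur->ops->get_node_keys(cur);
}
```

The switch happens once in demo_init_cursor(); everything after that
calls through cur->ops without knowing which kind of btree it has.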

> > +/*
> > + * In-core key that holds both low and high keys for overlapped btrees.
> > + * The two keys are packed next to each other on disk, so do the same
> > + * in memory.  Preserve the existing xfs_btree_key as a single key to
> > + * avoid the mental model breakage that would happen if we passed a
> > + * bigkey into a function that operates on a single key.
> > + */
> > +union xfs_btree_bigkey {
> > +	struct xfs_bmbt_key		bmbt;
> > +	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
> > +	xfs_alloc_key_t			alloc;
> > +	struct xfs_inobt_key		inobt;
> > +};
> 
> I don't understand the purpose of this union at all, and the comment
> seems misleading.  Compared to union xfs_btree_key the only difference
> seems to be that xfs_btree_bigkey is missing the
> 'struct xfs_rmap_key rmap' member.  How does that enable us to hold

I think you might be missing a later patch, wherein we add the rmap
stuff to the btree structures, which expands bigkey to look like this:

union xfs_btree_bigkey {
	struct xfs_bmbt_key		bmbt;
	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
	xfs_alloc_key_t			alloc;
	struct xfs_inobt_key		inobt;
	struct {
		struct xfs_rmap_key	rmap;
		struct xfs_rmap_key	rmap_hi;
	};
	struct xfs_refcount_key		refc;
};

bigkey.rmap is the low key, bigkey.rmap_hi is the high key.  None of
the other btrees are overlapped, so they don't get a high key.

> low and high keys?  Also every single user seems to cast it to
> xfs_btree_key which is a little odd and smells unsafe.

On disk, the low and high keys of a pointer reside next to each other.
The btree_split code wants to store the new block's keys somewhere so
that the block can later be insrec'd into a higher btree level.  It
would be convenient if this incore storage could also store the two
keys right next to each other so that we can memcpy key_len bytes from
the temporary storage into the on-disk btree block and not have to
special case that code.
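
A minimal userspace sketch of that idea, assuming simplified stand-in
types: the demo_* names are hypothetical, the fields are host-endian
rather than __be32/__be64, and the "block slot" is just a byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the rmap key: same field widths, packed like on disk. */
struct demo_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
} __attribute__((packed));

/* Low and high key side by side, as they would sit in a node block. */
union demo_bigkey {
	struct {
		struct demo_rmap_key	rmap;
		struct demo_rmap_key	rmap_hi;
	};
};

/* For an overlapped btree, key_len covers both keys. */
#define DEMO_KEY_LEN	(2 * sizeof(struct demo_rmap_key))

static_assert(sizeof(union demo_bigkey) == DEMO_KEY_LEN,
	      "in-core layout matches the packed pair of keys");

/* One memcpy moves both keys; no special case for the high key. */
static void demo_copy_keys(void *block_slot, const union demo_bigkey *src)
{
	memcpy(block_slot, src, DEMO_KEY_LEN);
}
```

Because the in-core storage has the same layout as the slot in the
block, the copy is the same code whether key_len covers one key or two.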

I thought about simply declaring an on-stack array of two union
xfs_btree_keys.  The array is big enough to contain both keys and
eliminates the need for casting.  On the other hand it's weird because
the two keys have to be aligned to xfs_rmap_key boundaries, not
xfs_btree_key, which means that the high key isn't necessarily stored
in the second array element like the code would suggest.
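
The mismatch can be sketched in userspace like this.  The demo_* types
are hypothetical stand-ins, and the padding assumes an ABI where an
8-byte integer is 8-byte aligned, which is typical for 64-bit targets
but not guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the rmap key: 4 + 8 + 8 = 20 bytes when packed. */
struct demo_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
} __attribute__((packed));

/* Stand-in for the key union: one naturally aligned member is enough
 * to make the compiler round the union size up past 20 bytes. */
union demo_key {
	uint64_t		bmbt;
	struct demo_rmap_key	rmap;
};

static_assert(sizeof(struct demo_rmap_key) == 20,
	      "packed key has no padding");

/* Where the high key sits when the two keys are packed back to back. */
static size_t demo_hi_offset_on_disk(void)
{
	return sizeof(struct demo_rmap_key);
}

/* Where element [1] of a two-element array of the union sits. */
static size_t demo_hi_offset_in_array(void)
{
	union demo_key keys[2];

	return (size_t)((char *)&keys[1] - (char *)&keys[0]);
}
```

Wherever the union is padded beyond the size of the packed key, the
two offsets differ, so key[1] is not where a memcpy of key_len bytes
would put the high key.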

Then I thought about stuffing both low and high keys into
xfs_rmap_key like so:

struct xfs_rmap_key {
	__be32		rm_startblock;	/* extent start block */
	__be64		rm_owner;	/* extent owner */
	__be64		rm_offset;	/* offset within the owner */
	__be32		rm_hi_startblock;	/* extent start block */
	__be64		rm_hi_owner;	/* extent owner */
	__be64		rm_hi_offset;	/* offset within the owner */
} __attribute__((packed));

But that was even uglier, because an overlapped btree has two keys
associated with a pointer, not one gigantic key.  It's also a
non-starter because sometimes we want to treat the high fields as a
distinct key and feed that key to the btree key handling functions;
when we point an xfs_rmap_key at the rm_hi_* fields, that key's own
rm_hi_* fields land past the end of the allotted space.  The
overlapped query range function and the btree scrubbers in later
patches want to use high keys in this manner.
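
By contrast, when the low and high keys are two separate keys that
merely sit next to each other, either one can be handed to a
key-handling function as is.  A userspace sketch, assuming made-up
demo_* types and a comparison that follows the < 0 / 0 / > 0
convention from the v3 changelog (the field ordering used here is an
assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the rmap key, host-endian for simplicity. */
struct demo_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
} __attribute__((packed));

/* Two distinct keys stored back to back. */
union demo_bigkey {
	struct {
		struct demo_rmap_key	rmap;
		struct demo_rmap_key	rmap_hi;
	};
};

static_assert(sizeof(union demo_bigkey) == 2 * sizeof(struct demo_rmap_key),
	      "high key starts right after the low key");

/* Returns < 0, 0, or > 0 if k1 is less than, equal to, or greater
 * than k2, comparing startblock, then owner, then offset. */
static int demo_diff_two_keys(const struct demo_rmap_key *k1,
			      const struct demo_rmap_key *k2)
{
	if (k1->startblock != k2->startblock)
		return k1->startblock < k2->startblock ? -1 : 1;
	if (k1->owner != k2->owner)
		return k1->owner < k2->owner ? -1 : 1;
	if (k1->offset != k2->offset)
		return k1->offset < k2->offset ? -1 : 1;
	return 0;
}

/* Build a zeroed bigkey with the given low and high start blocks. */
static union demo_bigkey demo_make_bigkey(uint32_t lo_start,
					  uint32_t hi_start)
{
	union demo_bigkey big;

	memset(&big, 0, sizeof(big));
	big.rmap.startblock = lo_start;
	big.rmap_hi.startblock = hi_start;
	return big;
}
```

Here &big.rmap_hi is a complete key in its own right, so
demo_diff_two_keys(&big.rmap, &big.rmap_hi) never reads beyond the
storage that was set aside.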

So then there was this way:

union xfs_btree_key {
	struct xfs_bmbt_key		bmbt;
	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
	xfs_alloc_key_t			alloc;
	struct xfs_inobt_key		inobt;
	struct xfs_rmap_key		rmap[2];
	struct xfs_refcount_key		refc;
};

This gives us the storage we want and avoids casts, but it still
doesn't fix the problem that sometimes we want to create a key pointer
to just the high fields and treat that as a key in its own right.

So I created the separate bigkey structure to get the storage size I
wanted, and cast it to xfs_btree_key wherever it gets fed into the
other parts of the btree code.  It's smelly like you say, but at least
we have a distinct type to help future us identify the three smelly
places where we do this.

What I really wanted to do instead of bigkey was this:

union xfs_btree_key *key = kmalloc(cur->bc_ops->key_len, GFP_NOFS);

...except then we have a memory allocation.

<shrug> I don't have a problem with replacing the bigkey variables
with a two-element array and just living with the fact that the high
key will not be found at key[1], but I worry that future me won't
remember that subtlety.  Whereas tracing the key pointers back to the
bigkey on the stack is not subtle and, even better, the debugger
correctly locates the high key contents.

--D
