All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Mahoney <jeffm@suse.de>
To: Mark Fasheh <mfasheh@suse.de>
Cc: linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>,
	Josef Bacik <josef@redhat.com>
Subject: Re: [PATCH 0/3] btrfs: extended inode refs
Date: Thu, 05 Apr 2012 17:13:29 -0400	[thread overview]
Message-ID: <4F7E0AF9.7070305@suse.de> (raw)
In-Reply-To: <1333656543-4843-1-git-send-email-mfasheh@suse.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/05/2012 04:09 PM, Mark Fasheh wrote:
> Currently btrfs has a limitation on the maximum number of hard
> links an inode can have. Specifically, links are stored in an array
> of ref items:
> 
> struct btrfs_inode_ref { __le64 index; __le16 name_len; /* name
> goes here */ } __attribute__ ((__packed__));
> 
> The ref arrays are found via key triple:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid)
> 
> Since items can not exceed the size of a leaf, the total number of
> links that can be stored for a given inode / parent dir pair is
> limited to under 4k. This works fine for the most common case of
> few to only a handful of links. Once the link count gets higher
> however, we begin to return EMLINK.
> 
> 
> The following patches fix this situation by introducing a new ref
> item:
> 
> struct btrfs_inode_extref { __le64 parent_objectid; __le64 index; 
> __le16 name_len; __u8   name[0]; /* name goes here */ }
> __attribute__ ((__packed__));
> 
> Extended refs behave differently from ref arrays in several key
> areas.

Thanks for digging into this. It's been heating up on the list lately.

> Each extended refs is it's own item so there is no ref array (and 
> therefore no limit on size).
> 
> As a result, we must use a different addressing scheme. Extended
> ref keys look like:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
> 
> Where hash is defined as a function of the parent objectid and link
> name.

I think this is effective. It will essentially have the same
properties as a dirent but seeds the hash at objectid instead of ~1.

> This effectively fixes the limitation, though we have a slightly
> less efficient packing of link data. To keep the best of both
> worlds then, I implemented the following behavior:
> 
> Extended refs don't replace the existing ref array. An inode gets
> an extended ref for a given link _only_ after the ref array has
> been filled.  So the most common cases shouldn't actually see any
> difference in performance or disk usage as they'll never get to the
> point where we're using an extended ref.
> 
> It's important while reading the patches however that there's still
> the possibility that we can have a set of operations that grow out
> an inode ref array (adding some extended refs) and then remove only
> the refs in the array.  I don't really see this being common but
> it's a case we always have to consider when coding these changes.
> 
> Right now there is a limitation for extrefs in that we're not
> handling the possibility of a hash collision. There are two ways I
> see we can deal with this:
> 
> We can use a 56-bit hash and keep a generation counter in the lower
> 8 bits of the offset field.  The cost would be an additional tree
> search (between offset <hash>00 and <hash>FF) if we don't find
> exactly the name we were looking for.
> 
> An alternative solution to dealing with collisions could be to
> emulate the dir-item insertion code - specifically something like
> insert_with_overflow() which will stuff multiple items under one
> key. I tend to prefer the idea of

I vote for this option. The code for insert_with_overflow is already
well tested and anything that will generate a collision in dirent
insertion will generate a collision for the backref insertion. The
dirent structure is larger than the extref structure, so there should
always be space to match the dirent leaf so that a failure occurs
there first.

- -Jeff
- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJPfgr5AAoJEB57S2MheeWyt6cQAJPHUlCtB3+huJTodA7ow+jy
3WPhrbTPYME6lLpC/JQH8XbogKL1IqLbsvl9M3KzHMZAJ4fRzNJXmMCFgIou4cgu
v2cnNwkU1r5LJF/M3HMk1nhxABCeSONNTFqDbEp/eiTbI7X/UsM6q0vdPpj0vYih
20kWZhazmgx4pUPrtldKU+k91jjsZRoZyn8Bx6lEYPKIx1RQuBDPDH8q3ep5og2d
OQHDfMVNEJJ9Mz9lv+BZDqx/Q2Om8wyaM5GfjhtSocT+XrpTT4tC8FnKiUuOE1Ej
dy1Td43t+cCWvglGRDFj5I6ObfW7x4aUDTizo1hMUuGEQFQVc3vGY3OtDuqPxwoV
XS/4XRR3GU2cmsyjTky4Twa+81RF84cUl05+gM9VMaD7BJX60kIp1SXnE/TpyVUd
oVgZsgkV75R4XmjBp1zDrDRHTzpsgxm+zaVhSTxOKWhnV1bWLZ36SXw/1LRr0IVP
K+0xbnds7WWoRJNMSpJTBgAr7N4A5QEYvIUSQEO4UpgX4EXHYkfWstvvYNx1vUOT
HiHtS3ZKbrfzcmD1ZNSOPAcxold5BsVisuBwoyJnvOyCFeCi+Q+S0W8wA0kI6iS8
57nWYZfGelKLm/ATL8vPkpj6BCkFfhAK8neNdd80v1SpkUJOZr5tMTxaYO1r3QvC
T0e3jAC32pgrJ/2fByT+
=dx+I
-----END PGP SIGNATURE-----

  parent reply	other threads:[~2012-04-05 21:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-05 20:09 Mark Fasheh
2012-04-05 20:09 ` [PATCH 1/3] " Mark Fasheh
2012-04-12 13:08   ` Jan Schmidt
2012-04-24 22:23     ` Mark Fasheh
2012-04-25 10:19       ` Jan Schmidt
2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh
2012-04-12 13:08   ` Jan Schmidt
2012-05-03 23:12     ` Mark Fasheh
2012-05-04 11:39       ` David Sterba
2012-04-12 15:53   ` Jan Schmidt
2012-05-01 18:39     ` Mark Fasheh
2012-04-05 20:09 ` [PATCH 3/3] " Mark Fasheh
2012-04-12 17:59   ` Jan Schmidt
2012-04-12 18:38     ` Jan Schmidt
2012-05-08 22:57     ` Mark Fasheh
2012-05-09 17:02       ` Chris Mason
2012-05-10  8:23         ` Jan Schmidt
2012-05-10 13:35           ` Chris Mason
2012-04-05 21:13 ` Jeff Mahoney [this message]
2012-04-11 13:11   ` [PATCH 0/3] " Jan Schmidt
2012-04-11 13:29     ` Jan Schmidt
2012-04-12 16:11     ` Chris Mason
2012-04-12 16:19       ` Mark Fasheh
2012-04-06  1:24 ` Liu Bo
2012-04-06  2:12   ` Liu Bo
2012-05-21 21:46 Mark Fasheh
2012-08-08 18:55 Mark Fasheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F7E0AF9.7070305@suse.de \
    --to=jeffm@suse.de \
    --cc=chris.mason@oracle.com \
    --cc=josef@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=mfasheh@suse.de \
    --subject='Re: [PATCH 0/3] btrfs: extended inode refs' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.