From: Jeff Mahoney <jeffm@suse.de>
To: Mark Fasheh <mfasheh@suse.de>
Cc: linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>,
	Josef Bacik <josef@redhat.com>
Subject: Re: [PATCH 0/3] btrfs: extended inode refs
Date: Thu, 05 Apr 2012 17:13:29 -0400
Message-ID: <4F7E0AF9.7070305@suse.de>
In-Reply-To: <1333656543-4843-1-git-send-email-mfasheh@suse.de>

On 04/05/2012 04:09 PM, Mark Fasheh wrote:
> Currently btrfs has a limitation on the maximum number of hard
> links an inode can have. Specifically, links are stored in an array
> of ref items:
> 
> struct btrfs_inode_ref {
> 	__le64 index;
> 	__le16 name_len;
> 	/* name goes here */
> } __attribute__ ((__packed__));
> 
> The ref arrays are found via key triple:
> 
> (inode objectid, BTRFS_INODE_REF_KEY, parent dir objectid)
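
As a rough illustration of how such a key triple orders items, here is a minimal C sketch. It assumes btrfs-style keys compare by objectid, then type, then offset, so every ref item for one inode sits next to the others in the tree. The names `sketch_key`, `key_cmp` and `inode_ref_key`, and the numeric type codes, are hypothetical stand-ins rather than code from these patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical item type codes; the real values live in the btrfs headers. */
enum { SKETCH_INODE_REF_KEY = 12, SKETCH_INODE_EXTREF_KEY = 13 };

/* Simplified in-memory key: (objectid, type, offset). */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare objectid first, then type, then offset. Returns <0, 0 or >0. */
static int key_cmp(const struct sketch_key *a, const struct sketch_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Key of the ref array holding the links from directory `dir` to inode `ino`. */
static struct sketch_key inode_ref_key(uint64_t ino, uint64_t dir)
{
	struct sketch_key k = { ino, SKETCH_INODE_REF_KEY, dir };
	return k;
}
```

Because the parent directory is the offset, one inode gets one ref array per parent directory, and all of them are adjacent in key order.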
> 
> Since items cannot exceed the size of a leaf, the whole ref array
> for a given inode / parent dir pair must fit in just under 4k,
> which caps the number of links it can hold. This works fine for the
> most common case of a handful of links. Once the link count gets
> higher, however, we begin to return EMLINK.
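
To put a number on the leaf-size limit, here is a back-of-the-envelope C sketch. It assumes a 4096-byte leaf, a 101-byte leaf header and a 25-byte item header, leaving 3970 bytes for the largest item; those sizes and every `sketch_`/`SKETCH_` name are assumptions for illustration, and fixed-width integers stand in for the `__le` types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ref entry header quoted above; byte order is ignored. */
struct sketch_inode_ref {
	uint64_t index;
	uint16_t name_len;
	/* name goes here */
} __attribute__ ((__packed__));

/* Assumed sizes: 4 KiB leaf, 101-byte leaf header, 25-byte item header. */
#define SKETCH_LEAF_SIZE   4096u
#define SKETCH_HEADER_SIZE  101u
#define SKETCH_ITEM_HDR      25u
#define SKETCH_MAX_ITEM (SKETCH_LEAF_SIZE - SKETCH_HEADER_SIZE - SKETCH_ITEM_HDR)

/* Upper bound on links from one parent directory when every name is
 * `name_len` bytes long and all refs must share a single item. */
static size_t max_links_in_ref_array(size_t max_item, size_t name_len)
{
	return max_item / (sizeof(struct sketch_inode_ref) + name_len);
}
```

Under those assumptions one parent directory can hold roughly 360 links with one-byte names but only 14 with 255-byte names, so long names reach EMLINK much sooner than short ones.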
> 
> 
> The following patches fix this situation by introducing a new ref
> item:
> 
> struct btrfs_inode_extref {
> 	__le64 parent_objectid;
> 	__le64 index;
> 	__le16 name_len;
> 	__u8   name[0];
> 	/* name goes here */
> } __attribute__ ((__packed__));
> 
> Extended refs behave differently from ref arrays in several key
> areas.

Thanks for digging into this. It's been heating up on the list lately.

> Each extended ref is its own item, so there is no ref array (and
> therefore no limit on size).
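
A minimal C sketch of the per-link layout described above: each extended ref is a self-contained item made of a fixed 18-byte header plus the name, rather than one more entry appended to a shared array. The `sketch_inode_extref` type and its helpers are hypothetical; fixed-width integers stand in for the little-endian `__le` types, a C99 flexible array replaces `name[0]`, and byte order is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct btrfs_inode_extref. */
struct sketch_inode_extref {
	uint64_t parent_objectid;
	uint64_t index;
	uint16_t name_len;
	uint8_t name[];
} __attribute__ ((__packed__));

/* Bytes needed for one extref item: fixed header plus the name. */
static size_t extref_item_size(size_t name_len)
{
	return sizeof(struct sketch_inode_extref) + name_len;
}

/* Build one standalone extref item on the heap; the caller frees it.
 * Each link gets its own item, so nothing is appended to an array. */
static struct sketch_inode_extref *make_extref(uint64_t parent, uint64_t index,
					       const char *name)
{
	size_t len = strlen(name);
	struct sketch_inode_extref *ref;

	assert(len <= UINT16_MAX);
	ref = malloc(extref_item_size(len));
	if (!ref)
		return NULL;
	ref->parent_objectid = parent;
	ref->index = index;
	ref->name_len = (uint16_t)len;
	memcpy(ref->name, name, len);
	return ref;
}
```

Compared with the 10-byte array entry, each link now also carries its parent objectid, which is part of the slightly less efficient packing mentioned below.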
> 
> As a result, we must use a different addressing scheme. Extended
> ref keys look like:
> 
> (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)
> 
> Where hash is defined as a function of the parent objectid and link
> name.

I think this is effective. It will have essentially the same
properties as a dirent hash, but seeds the hash with the parent
objectid instead of ~1.
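
The seeding difference can be sketched in C as one CRC-32C routine called with two different seeds. btrfs hashes dirent names with CRC-32C seeded with ~1; this sketch assumes the extref hash uses the same CRC seeded with the parent objectid. The bitwise implementation, the absence of a final inversion and the truncation of the objectid to 32 bits are assumptions of the sketch, not details confirmed in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). The seed
 * is used as-is and there is no final inversion. */
static uint32_t crc32c_raw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Dirent-style name hash: CRC-32C seeded with ~1. */
static uint64_t dirent_name_hash(const char *name, size_t len)
{
	return crc32c_raw((uint32_t)~1u, name, len);
}

/* Extref-style hash: the same CRC seeded with the parent objectid
 * (truncated to 32 bits in this sketch). */
static uint64_t extref_hash(uint64_t parent_objectid, const char *name,
			    size_t len)
{
	return crc32c_raw((uint32_t)parent_objectid, name, len);
}
```

Changing the seed changes the hash of every name, but distinct names can still collide under a single seed, which is the collision problem raised below.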

> This effectively fixes the limitation, though we have a slightly
> less efficient packing of link data. To keep the best of both
> worlds, then, I implemented the following behavior:
> 
> Extended refs don't replace the existing ref array. An inode gets
> an extended ref for a given link _only_ after the ref array has
> been filled.  So the most common cases shouldn't actually see any
> difference in performance or disk usage as they'll never get to the
> point where we're using an extended ref.
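
The fallback policy can be written as a small C decision function. This is a hypothetical sketch: it assumes the array grows by a 10-byte header plus the name for each link and that the caller knows the largest item a leaf can hold; `choose_link_slot` and its parameters are illustrative names, not the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a new link for one (inode, parent dir) pair should be stored. */
enum link_slot { SLOT_REF_ARRAY, SLOT_EXTREF };

/* Per-entry header of the classic ref array: __le64 index + __le16 name_len. */
#define SKETCH_REF_HDR 10u

/* Keep appending to the ref array while the grown item still fits, and fall
 * back to an extended ref only once it would not. `array_bytes` is the
 * current size of the array item (0 if none exists yet) and `max_item` is
 * the largest item a leaf can hold. */
static enum link_slot choose_link_slot(size_t array_bytes, size_t name_len,
				       size_t max_item)
{
	bool fits = array_bytes + SKETCH_REF_HDR + name_len <= max_item;

	return fits ? SLOT_REF_ARRAY : SLOT_EXTREF;
}
```

Inodes with only a few links never leave the first branch, so under this policy their on-disk layout is the same as before.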
> 
> While reading the patches, it's important to keep in mind that a
> set of operations can still grow out an inode ref array (adding
> some extended refs) and then remove only the refs in the array. I
> don't really see this being common, but it's a case we always have
> to consider when coding these changes.
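
Here is a toy in-memory model in C of that mixed state, showing why a lookup has to check both the ref array and the extended refs. The `link_set` structure, its fixed capacity, and the rule that extrefs are never moved back into the array are simplifying assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LINKS 8

/* One inode's links from a single directory: some names live in the ref
 * array, others in separate extref items. */
struct link_set {
	const char *array[SKETCH_MAX_LINKS];
	size_t n_array;
	const char *extrefs[SKETCH_MAX_LINKS];
	size_t n_extrefs;
};

static bool in_list(const char *const *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return true;
	return false;
}

/* An empty ref array does not imply that no extended refs are left. */
static bool link_exists(const struct link_set *s, const char *name)
{
	return in_list(s->array, s->n_array, name) ||
	       in_list(s->extrefs, s->n_extrefs, name);
}

/* Remove a name from whichever place holds it. Extrefs are never migrated
 * back into the array, which is how the mixed state arises. */
static bool unlink_name(struct link_set *s, const char *name)
{
	for (size_t i = 0; i < s->n_array; i++) {
		if (strcmp(s->array[i], name) == 0) {
			s->array[i] = s->array[--s->n_array];
			return true;
		}
	}
	for (size_t i = 0; i < s->n_extrefs; i++) {
		if (strcmp(s->extrefs[i], name) == 0) {
			s->extrefs[i] = s->extrefs[--s->n_extrefs];
			return true;
		}
	}
	return false;
}
```

After both array names are unlinked the array is empty, yet the inode still has a link that only the extref search can find.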
> 
> Right now extrefs have a limitation: we don't handle the
> possibility of a hash collision. I see two ways we can deal with
> this:
> 
> We can use a 56-bit hash and keep a generation counter in the lower
> 8 bits of the offset field.  The cost would be an additional tree
> search (between offset <hash>00 and <hash>FF) if we don't find
> exactly the name we were looking for.
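
The first option's key layout can be sketched in C as simple bit packing: the top 56 bits of the offset carry the hash and the low 8 bits carry a generation counter. The helper names are hypothetical, and since the proposal leaves open where the 56-bit hash comes from, the sketch just masks whatever value it is given.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HASH_BITS 56
#define SKETCH_HASH_MASK ((UINT64_C(1) << SKETCH_HASH_BITS) - 1)

/* Pack a 56-bit name hash and an 8-bit collision generation into the
 * 64-bit key offset: <hash><gen>. */
static uint64_t extref_offset(uint64_t hash, uint8_t gen)
{
	return ((hash & SKETCH_HASH_MASK) << 8) | gen;
}

/* Inclusive offset range [<hash>00, <hash>FF] that a lookup would scan
 * when the first item found does not carry the wanted name. */
static void extref_search_range(uint64_t hash, uint64_t *lo, uint64_t *hi)
{
	*lo = extref_offset(hash, 0x00);
	*hi = extref_offset(hash, 0xFF);
}
```

With this layout up to 256 colliding names can share one hash before the scheme runs out of generations.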
> 
> An alternative solution to dealing with collisions could be to
> emulate the dir-item insertion code, specifically something like
> insert_with_overflow(), which packs multiple entries under one
> key. I tend to prefer the idea of

I vote for this option. The insert_with_overflow code is already
well tested, and anything that generates a collision in dirent
insertion will also generate a collision in backref insertion. The
dirent structure is larger than the extref structure, so wherever
the colliding dirents fit there should always be room for the
matching backrefs, and any failure will occur in dirent insertion
first.
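
The insert_with_overflow() approach amounts to stacking colliding entries inside one item under a single key and telling them apart by parent and full name on lookup. Here is a minimal C sketch of that idea; `shared_item`, `entry_hdr`, the fixed capacity and the helpers are hypothetical and do not reproduce the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy item: several packed entries that share one key. */
#define SKETCH_ITEM_CAP 256

struct shared_item {
	uint8_t data[SKETCH_ITEM_CAP];
	size_t used;
};

struct entry_hdr {
	uint64_t parent_objectid;
	uint16_t name_len;
} __attribute__ ((__packed__));

/* Append an entry under the shared key; returns false when the item has
 * no room left (what the real code does then is not modeled here). */
static bool item_append(struct shared_item *it, uint64_t parent, const char *name)
{
	struct entry_hdr h;

	assert(strlen(name) <= UINT16_MAX);
	h.parent_objectid = parent;
	h.name_len = (uint16_t)strlen(name);
	if (it->used + sizeof(h) + h.name_len > SKETCH_ITEM_CAP)
		return false;
	memcpy(it->data + it->used, &h, sizeof(h));
	memcpy(it->data + it->used + sizeof(h), name, h.name_len);
	it->used += sizeof(h) + h.name_len;
	return true;
}

/* Walk every entry under the key until both parent and name match. */
static bool item_find(const struct shared_item *it, uint64_t parent,
		      const char *name)
{
	size_t off = 0, len = strlen(name);

	while (off < it->used) {
		struct entry_hdr h;

		memcpy(&h, it->data + off, sizeof(h));
		if (h.parent_objectid == parent && h.name_len == len &&
		    memcmp(it->data + off + sizeof(h), name, len) == 0)
			return true;
		off += sizeof(h) + h.name_len;
	}
	return false;
}
```

A hash collision then costs only a longer walk within one item instead of a failed insert, which is the property that makes reusing the dir-item approach attractive.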

-Jeff
-- 
Jeff Mahoney
SUSE Labs
