From: Liu Bo <liubo2009@cn.fujitsu.com> To: Mark Fasheh <mfasheh@suse.de> Cc: linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>, Josef Bacik <josef@redhat.com> Subject: Re: [PATCH 0/3] btrfs: extended inode refs Date: Fri, 06 Apr 2012 09:24:21 +0800 [thread overview] Message-ID: <4F7E45C5.5090306@cn.fujitsu.com> (raw) In-Reply-To: <1333656543-4843-1-git-send-email-mfasheh@suse.de> On 04/06/2012 04:09 AM, Mark Fasheh wrote: > Currently btrfs has a limitation on the maximum number of hard links an > inode can have. Specifically, links are stored in an array of ref > items: > > struct btrfs_inode_ref { > __le64 index; > __le16 name_len; > /* name goes here */ > } __attribute__ ((__packed__)); > > The ref arrays are found via key triple: > > (inode objectid, BTRFS_INODE_EXTREF_KEY, parent dir objectid) > > Since items can not exceed the size of a leaf, the total number of links > that can be stored for a given inode / parent dir pair is limited to under > 4k. This works fine for the most common case of few to only a handful of > links. Once the link count gets higher however, we begin to return EMLINK. > > > The following patches fix this situation by introducing a new ref item: > > struct btrfs_inode_extref { > __le64 parent_objectid; > __le64 index; > __le16 name_len; > __u8 name[0]; > /* name goes here */ > } __attribute__ ((__packed__)); > > Extended refs behave differently from ref arrays in several key areas. > > Each extended refs is it's own item so there is no ref array (and > therefore no limit on size). > > As a result, we must use a different addressing scheme. Extended ref keys > look like: > > (inode objectid, BTRFS_INODE_EXTREF_KEY, hash) > > Where hash is defined as a function of the parent objectid and link name. > > This effectively fixes the limitation, though we have a slightly less > efficient packing of link data. To keep the best of both worlds then, I > implemented the following behavior: > > Extended refs don't replace the existing ref array. An inode gets an > extended ref for a given link _only_ after the ref array has been filled. So > the most common cases shouldn't actually see any difference in performance > or disk usage as they'll never get to the point where we're using an > extended ref. > > It's important while reading the patches however that there's still the > possibility that we can have a set of operations that grow out an inode ref > array (adding some extended refs) and then remove only the refs in the > array. I don't really see this being common but it's a case we always have > to consider when coding these changes. > > Right now there is a limitation for extrefs in that we're not handling the > possibility of a hash collision. There are two ways I see we can deal with > this: > > We can use a 56-bit hash and keep a generation counter in the lower 8 > bits of the offset field. The cost would be an additional tree search > (between offset <hash>00 and <hash>FF) if we don't find exactly the name we > were looking for. > > An alternative solution to dealing with collisions could be to emulate the > dir-item insertion code - specifically something like insert_with_overflow() > which will stuff multiple items under one key. I tend to prefer the idea of > simply including a generation in the key offset however since it maintains > the 1:1 relationship of keys to names which turns out to be much nicer to > code for in my honest opinion. Also none of the code which iterates the tree > looking for refs would have to change as the only difference is in the key > offset and not in the actual item structure. > > > Testing wise, the patches are in an intermediate state. I've debugged a fair > bit but I'm certain there's gremlins lurking in there. The basic namespace > operations work well enough (link, unlink, etc). I've done light testing of > my changes in backref.c by exercising BTRFS_IOC_INO_PATHS. The changes in > tree-log.c need the most review and testing - I haven't really figured out a > great way to exercise the code in tree-log yet (suggestions would be > great!). > For the log recover test, I used to sysrq+b to make sure our log remains on disk. Will also test this patchset sooner or later. thanks, liubo > > Finally, these patches are based off Linux v3.3. > --Mark > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
next prev parent reply other threads:[~2012-04-06 1:24 UTC|newest] Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top 2012-04-05 20:09 Mark Fasheh 2012-04-05 20:09 ` [PATCH 1/3] " Mark Fasheh 2012-04-12 13:08 ` Jan Schmidt 2012-04-24 22:23 ` Mark Fasheh 2012-04-25 10:19 ` Jan Schmidt 2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh 2012-04-12 13:08 ` Jan Schmidt 2012-05-03 23:12 ` Mark Fasheh 2012-05-04 11:39 ` David Sterba 2012-04-12 15:53 ` Jan Schmidt 2012-05-01 18:39 ` Mark Fasheh 2012-04-05 20:09 ` [PATCH 3/3] " Mark Fasheh 2012-04-12 17:59 ` Jan Schmidt 2012-04-12 18:38 ` Jan Schmidt 2012-05-08 22:57 ` Mark Fasheh 2012-05-09 17:02 ` Chris Mason 2012-05-10 8:23 ` Jan Schmidt 2012-05-10 13:35 ` Chris Mason 2012-04-05 21:13 ` [PATCH 0/3] " Jeff Mahoney 2012-04-11 13:11 ` Jan Schmidt 2012-04-11 13:29 ` Jan Schmidt 2012-04-12 16:11 ` Chris Mason 2012-04-12 16:19 ` Mark Fasheh 2012-04-06 1:24 ` Liu Bo [this message] 2012-04-06 2:12 ` Liu Bo 2012-05-21 21:46 Mark Fasheh 2012-08-08 18:55 Mark Fasheh
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=4F7E45C5.5090306@cn.fujitsu.com \ --to=liubo2009@cn.fujitsu.com \ --cc=chris.mason@oracle.com \ --cc=josef@redhat.com \ --cc=linux-btrfs@vger.kernel.org \ --cc=mfasheh@suse.de \ --subject='Re: [PATCH 0/3] btrfs: extended inode refs' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.