* [PATCH 0/3] btrfs: extended inode refs
@ 2012-05-21 21:46 Mark Fasheh
  2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-05-21 21:46 UTC
  To: linux-btrfs; +Cc: Chris Mason, Jan Schmidt, Mark Fasheh

Currently btrfs has a limitation on the maximum number of hard links an
inode can have. Specifically, links are stored in an array of ref
items:

struct btrfs_inode_ref {
	__le64 index;
	__le16 name_len;
	/* name goes here */
} __attribute__ ((__packed__));

The ref arrays are found via key triple:

(inode objectid, BTRFS_INODE_REF_KEY, parent dir objectid)

Since items can not exceed the size of a leaf, the total number of links
that can be stored for a given inode / parent dir pair is limited to under
4k. This works fine for the most common case of only a handful of links.
Once the link count gets higher, however, we begin to return EMLINK.
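
As a rough illustration of where that limit comes from, the sketch below
estimates how many refs fit in a single item when every link name has the
same length. The 10-byte ref size follows from the packed struct shown above
(__le64 plus __le16). The leaf header and item header sizes, and the helper
name max_refs_per_item(), are assumptions made for this example and are not
taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed sizes, not stated in this thread: a 101-byte leaf header and a
 * 25-byte item header (17-byte key plus 32-bit offset and size fields).
 */
#define LEAF_HEADER_SIZE 101u
#define ITEM_HEADER_SIZE 25u

/* struct btrfs_inode_ref without the name: __le64 index + __le16 name_len. */
#define INODE_REF_SIZE 10u

/*
 * Upper bound on the number of refs that fit in one item when every
 * name is name_len bytes long and the item has the leaf to itself.
 */
static uint32_t max_refs_per_item(uint32_t leaf_size, uint32_t name_len)
{
	assert(name_len >= 1 && name_len <= 255);
	assert(leaf_size > LEAF_HEADER_SIZE + ITEM_HEADER_SIZE);

	uint32_t usable = leaf_size - LEAF_HEADER_SIZE - ITEM_HEADER_SIZE;

	return usable / (INODE_REF_SIZE + name_len);
}
```

Under these assumptions a 4096-byte leaf holds at most a few hundred links
with short names in one directory, and far fewer with long names.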


The following patches fix this situation by introducing a new ref item:

struct btrfs_inode_extref {
	__le64 parent_objectid;
	__le64 index;
	__le16 name_len;
	__u8   name[0];
	/* name goes here */
} __attribute__ ((__packed__));

Extended refs use a different addressing scheme. Extended ref keys
look like:

(inode objectid, BTRFS_INODE_EXTREF_KEY, hash)

Where hash is defined as a function of the parent objectid and link name.
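
A minimal sketch of how such a key could be built is shown below. It follows
the btrfs_extref_hash() helper from patch 1, which seeds crc32c with the
parent objectid and runs it over the link name. The bitwise crc32c_raw()
routine is a stand-in for the kernel's crc32c() (assumed here to be the raw
CRC-32C with a caller-supplied seed and no final inversion), and
struct extref_key and make_extref_key() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INODE_EXTREF_KEY 13	/* value of BTRFS_INODE_EXTREF_KEY in the patches */

/*
 * Raw CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) with a
 * caller-supplied seed and no final inversion.
 */
static uint32_t crc32c_raw(uint32_t seed, const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = seed;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical in-memory key, mirroring (objectid, type, offset). */
struct extref_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

static struct extref_key make_extref_key(uint64_t inode_objectid,
					 uint64_t parent_objectid,
					 const char *name, size_t name_len)
{
	assert(name_len <= 255);

	struct extref_key key = {
		.objectid = inode_objectid,
		.type = INODE_EXTREF_KEY,
		/* The seed is 32 bits wide, so only the low half of the
		 * parent objectid takes part in the hash. */
		.offset = crc32c_raw((uint32_t)parent_objectid, name, name_len),
	};

	return key;
}
```

Because the hash is only 32 bits wide, two different (parent, name) pairs can
map to the same key offset, which is why collision handling is needed.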

This effectively fixes the limitation, though we have a slightly less
efficient packing of link data. To keep the best of both worlds then, I
implemented the following behavior:

Extended refs don't replace the existing ref array. An inode gets an
extended ref for a given link _only_ after the ref array has been filled.  So
the most common cases shouldn't actually see any difference in performance
or disk usage as they'll never get to the point where we're using an
extended ref.
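
The fallback policy can be modelled with a toy example: try the ref array
first, and only when it reports EMLINK (and the filesystem has the feature
flag set) add an extended ref instead. Everything below (struct toy_inode,
toy_insert_array_ref(), toy_add_link()) is a hypothetical model written for
this illustration. It counts bytes rather than touching a real tree, and it
is not the code from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FEATURE_INCOMPAT_EXTENDED_IREF (1ULL << 6)	/* bit used in the patches */
#define INODE_REF_SIZE 10u	/* __le64 index + __le16 name_len */

/* Toy model of one inode's refs for a single parent directory. */
struct toy_inode {
	uint32_t array_cap;	/* bytes available to the ref array */
	uint32_t array_used;	/* bytes currently used by the ref array */
	uint32_t nr_array_refs;	/* links stored in the ref array */
	uint32_t nr_extrefs;	/* links stored as extended refs */
};

/* Add a ref to the array, or fail with -EMLINK when it no longer fits. */
static int toy_insert_array_ref(struct toy_inode *inode, uint32_t name_len)
{
	uint32_t need = INODE_REF_SIZE + name_len;

	if (need > inode->array_cap - inode->array_used)
		return -EMLINK;

	inode->array_used += need;
	inode->nr_array_refs++;
	return 0;
}

/* Prefer the ref array and fall back to an extended ref only on -EMLINK. */
static int toy_add_link(struct toy_inode *inode, uint64_t incompat_flags,
			uint32_t name_len)
{
	int ret;

	assert(name_len >= 1 && name_len <= 255);

	ret = toy_insert_array_ref(inode, name_len);
	if (ret == -EMLINK &&
	    (incompat_flags & FEATURE_INCOMPAT_EXTENDED_IREF)) {
		inode->nr_extrefs++;
		ret = 0;
	}
	return ret;
}
```

Without the feature flag the model keeps the old behavior and returns
-EMLINK once the array is full.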

While reading the patches, however, it's important to keep in mind that a
set of operations can grow out an inode ref array (adding some extended
refs) and then remove only the refs in the array, leaving an inode that has
extended refs but free space in its ref array.  I don't really see this being
common but it's a case we always have to consider when coding these changes.
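
One consequence is that a lookup can never assume a link lives in the ref
array: it has to check the array first and then the extended refs, in the
spirit of btrfs_get_inode_ref_index() in patch 1. The sketch below models
that order with plain in-memory tables. The types and function names are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One link: its name and its index in the parent directory. */
struct toy_ref {
	const char *name;
	uint64_t index;
};

/* Toy model: refs from the INODE_REF array and from INODE_EXTREF items. */
struct toy_inode_refs {
	const struct toy_ref *array;
	size_t nr_array;
	const struct toy_ref *ext;
	size_t nr_ext;
};

static const struct toy_ref *toy_find(const struct toy_ref *refs, size_t nr,
				      const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(refs[i].name, name) == 0)
			return &refs[i];
	return NULL;
}

/* Look in the ref array first, then in the extended refs. */
static int toy_get_ref_index(const struct toy_inode_refs *refs,
			     const char *name, uint64_t *ret_index)
{
	const struct toy_ref *ref;

	assert(ret_index != NULL);

	ref = toy_find(refs->array, refs->nr_array, name);
	if (!ref)
		ref = toy_find(refs->ext, refs->nr_ext, name);
	if (!ref)
		return -ENOENT;

	*ret_index = ref->index;
	return 0;
}
```

The same two-step order applies to removal: an empty or missing ref array
does not mean the inode has no links left in this directory.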

Extended refs handle the case of a hash collision by storing items with the
same key in an array just like the dir item code. This means we have to
search an array on rare occasion.
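
The array search can be sketched as a walk over variable-length records
packed back to back in one item: an 18-byte header (parent objectid, index,
name_len, all little-endian) followed by the name. The helpers below operate
on a plain byte buffer rather than an extent buffer, and put_extref() and
find_extref() are hypothetical names. Comparing the parent objectid as well
as the name is a choice made for this sketch. The
find_name_in_ext_backref() helper in patch 1 compares only the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXTREF_HDR 18u	/* parent_objectid (8) + index (8) + name_len (2) */

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Append one record at byte offset off and return the new end offset. */
static size_t put_extref(uint8_t *item, size_t off, uint64_t parent,
			 uint64_t index, const char *name, size_t name_len)
{
	assert(name_len <= 255);

	for (int i = 0; i < 8; i++) {
		item[off + i] = (uint8_t)(parent >> (8 * i));
		item[off + 8 + i] = (uint8_t)(index >> (8 * i));
	}
	item[off + 16] = (uint8_t)(name_len & 0xff);
	item[off + 17] = (uint8_t)(name_len >> 8);
	memcpy(item + off + EXTREF_HDR, name, name_len);

	return off + EXTREF_HDR + name_len;
}

/* Return the byte offset of the matching record, or -1 if there is none. */
static long find_extref(const uint8_t *item, size_t item_size, uint64_t parent,
			const char *name, size_t name_len)
{
	size_t cur = 0;

	while (cur + EXTREF_HDR <= item_size) {
		const uint8_t *rec = item + cur;
		size_t rec_name_len = get_le16(rec + 16);

		/* Stop on a record that would run past the end of the item. */
		if (cur + EXTREF_HDR + rec_name_len > item_size)
			break;

		if (rec_name_len == name_len && get_le64(rec) == parent &&
		    memcmp(rec + EXTREF_HDR, name, name_len) == 0)
			return (long)cur;

		cur += EXTREF_HDR + rec_name_len;
	}
	return -1;
}
```

In the common case the item holds a single record, so the loop runs once.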

Testing-wise, the basic namespace operations (link, unlink, etc.) work well.
The rest has gotten less debugging (and I really don't have a great way of
testing the code in tree-log.c).


Finally, these patches are based off Linux v3.3.
	--Mark

Changes from the first version of this patch:

Thanks to Jan Schmidt for giving it a very nice review. Most of the changes
are from his suggestions.

- Implemented collision handling.

- Standardized naming of extended ref variables (extref).

- Moved hashing code to hash.h and gave the function a better name
  (btrfs_extref_hash).

- A few cleanups of error handling.

- Fixed a bug where btrfs_find_one_extref() was erroneously incrementing the
  extref offset before returning it.

- Moved btrfs_find_one_extref() into backref.c. This means that backref.c no
  longer has to include tree-log.h.

- Fixed a bug in iref_to_path() where we were looking for extended refs
  (this actually led to other bugs). Since iref_to_path() only deals with
  directory inodes we would never have an extended ref.

- Added some explicit locking calls in the backref.c changes.

- Instead of adding a second iterate function for extended refs, I fixed up
  iterate_irefs_t arguments to take the raw information from whatever ref
  version we're coming from. This removed a bunch of duplicated code.

- I am actually including a patch to btrfs-progs with this drop. :)



From: Mark Fasheh <mfasheh@suse.com>

[PATCH] btrfs-progs: basic support for extended inode refs

This patch adds enough mkfs support to turn on the superblock flag and
btrfs-debug-tree support so that we can visualize the state of extended refs
on disk.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 ctree.h      |   27 ++++++++++++++++++++++++++-
 mkfs.c       |   14 +++++++++-----
 print-tree.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index 6545c50..ebf38fe 100644
--- a/ctree.h
+++ b/ctree.h
@@ -115,6 +115,13 @@ struct btrfs_trans_handle;
  */
 #define BTRFS_NAME_LEN 255
 
+/*
+ * Theoretical limit is larger, but we keep this down to a sane
+ * value. That should limit greatly the possibility of collisions on
+ * inode ref items.
+ */
+#define	BTRFS_LINK_MAX	65535U
+
 /* 32 bytes in various csum fields */
 #define BTRFS_CSUM_SIZE 32
 
@@ -412,6 +419,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
+
 /*
  * some patches floated around with a second compression method
  * lets save that incompat here for when they do get in
@@ -426,6 +434,7 @@ struct btrfs_super_block {
  */
 #define BTRFS_FEATURE_INCOMPAT_BIG_METADATA     (1ULL << 5)
 
+#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF   (1ULL << 6)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP		0ULL
@@ -434,7 +443,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
 	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
 	 BTRFS_FEATURE_INCOMPAT_BIG_METADATA |		\
-	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
+	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
+	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
@@ -573,6 +583,13 @@ struct btrfs_inode_ref {
 	/* name goes here */
 } __attribute__ ((__packed__));
 
+struct btrfs_inode_extref {
+	__le64 parent_objectid;
+	__le64 index;
+	__le16 name_len;
+	__u8   name[0]; /* name goes here */
+} __attribute__ ((__packed__));
+
 struct btrfs_timespec {
 	__le64 sec;
 	__le32 nsec;
@@ -866,6 +883,7 @@ struct btrfs_root {
  */
 #define BTRFS_INODE_ITEM_KEY		1
 #define BTRFS_INODE_REF_KEY		12
+#define BTRFS_INODE_EXTREF_KEY		13
 #define BTRFS_XATTR_ITEM_KEY		24
 #define BTRFS_ORPHAN_ITEM_KEY		48
 
@@ -1145,6 +1163,13 @@ BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len, 16);
 BTRFS_SETGET_STACK_FUNCS(stack_inode_ref_name_len, struct btrfs_inode_ref, name_len, 16);
 BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64);
 
+/* struct btrfs_inode_extref */
+BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref,
+		   parent_objectid, 64);
+BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref,
+		   name_len, 16);
+BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index, 64);
+
 /* struct btrfs_inode_item */
 BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation, 64);
 BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64);
diff --git a/mkfs.c b/mkfs.c
index c531ef2..5c18a6d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1225,6 +1225,9 @@ int main(int ac, char **av)
 	u64 source_dir_size = 0;
 	char *pretty_buf;
 
+	struct btrfs_super_block *super;
+	u64 flags;
+
 	while(1) {
 		int c;
 		c = getopt_long(ac, av, "A:b:l:n:s:m:d:L:r:VM", long_options,
@@ -1426,13 +1429,14 @@ raid_groups:
 	ret = create_data_reloc_tree(trans, root);
 	BUG_ON(ret);
 
-	if (mixed) {
-		struct btrfs_super_block *super = &root->fs_info->super_copy;
-		u64 flags = btrfs_super_incompat_flags(super);
+	super = &root->fs_info->super_copy;
+	flags = btrfs_super_incompat_flags(super);
+	flags |= BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF;
 
+	if (mixed)
 		flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;
-		btrfs_set_super_incompat_flags(super, flags);
-	}
+
+	btrfs_set_super_incompat_flags(super, flags);
 
 	printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
 	    "sectorsize %u size %s\n",
diff --git a/print-tree.c b/print-tree.c
index fc134c0..6012df8 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -55,6 +55,42 @@ static int print_dir_item(struct extent_buffer *eb, struct btrfs_item *item,
 	return 0;
 }
 
+static int print_inode_extref_item(struct extent_buffer *eb,
+				   struct btrfs_item *item,
+				   struct btrfs_inode_extref *extref)
+{
+	u32 total;
+	u32 cur = 0;
+	u32 len;
+	u32 name_len = 0;
+	u64 index = 0;
+	u64 parent_objid;
+	char namebuf[BTRFS_NAME_LEN];
+
+	total = btrfs_item_size(eb, item);
+
+	while (cur < total) {
+		index = btrfs_inode_extref_index(eb, extref);
+		name_len = btrfs_inode_extref_name_len(eb, extref);
+		parent_objid = btrfs_inode_extref_parent(eb, extref);
+
+		len = (name_len <= sizeof(namebuf))? name_len: sizeof(namebuf);
+
+		read_extent_buffer(eb, namebuf, (unsigned long)(extref->name), len);
+
+		printf("\t\tinode extref index %llu parent %llu namelen %u "
+		       "name: %.*s\n",
+		       (unsigned long long)index,
+		       (unsigned long long)parent_objid,
+		       name_len, len, namebuf);
+
+		len = sizeof(*extref) + name_len;
+		extref = (struct btrfs_inode_extref *)((char *)extref + len);
+		cur += len;
+	}
+	return 0;
+}
+
 static int print_inode_ref_item(struct extent_buffer *eb, struct btrfs_item *item,
 				struct btrfs_inode_ref *ref)
 {
@@ -285,6 +321,9 @@ static void print_key_type(u8 type)
 	case BTRFS_INODE_REF_KEY:
 		printf("INODE_REF");
 		break;
+	case BTRFS_INODE_EXTREF_KEY:
+		printf("INODE_EXTREF");
+		break;
 	case BTRFS_DIR_ITEM_KEY:
 		printf("DIR_ITEM");
 		break;
@@ -454,6 +493,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l)
 	struct btrfs_extent_data_ref *dref;
 	struct btrfs_shared_data_ref *sref;
 	struct btrfs_inode_ref *iref;
+	struct btrfs_inode_extref *iref2;
 	struct btrfs_dev_extent *dev_extent;
 	struct btrfs_disk_key disk_key;
 	struct btrfs_root_item root_item;
@@ -492,6 +532,10 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l)
 			iref = btrfs_item_ptr(l, i, struct btrfs_inode_ref);
 			print_inode_ref_item(l, item, iref);
 			break;
+		case BTRFS_INODE_EXTREF_KEY:
+			iref2 = btrfs_item_ptr(l, i, struct btrfs_inode_extref);
+			print_inode_extref_item(l, item, iref2);
+			break;
 		case BTRFS_DIR_ITEM_KEY:
 		case BTRFS_DIR_INDEX_KEY:
 		case BTRFS_XATTR_ITEM_KEY:
-- 
1.7.7



* [PATCH 1/3] btrfs: extended inode refs
  2012-05-21 21:46 [PATCH 0/3] btrfs: extended inode refs Mark Fasheh
@ 2012-05-21 21:46 ` Mark Fasheh
  2012-07-06 14:56   ` Jan Schmidt
  2012-05-21 21:46 ` [PATCH 2/3] " Mark Fasheh
  2012-05-21 21:46 ` [PATCH 3/3] " Mark Fasheh
  2 siblings, 1 reply; 21+ messages in thread
From: Mark Fasheh @ 2012-05-21 21:46 UTC
  To: linux-btrfs; +Cc: Chris Mason, Jan Schmidt, Mark Fasheh, Mark Fasheh

From: Mark Fasheh <mfasheh@suse.com>

This patch adds basic support for extended inode refs. This includes support
for link and unlink of the refs, which basically gets us support for rename
as well.

Inode creation does not need changing - extended refs are only added after
the ref array is full.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ctree.h      |   52 ++++++++--
 fs/btrfs/hash.h       |   10 ++
 fs/btrfs/inode-item.c |  279 +++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/inode.c      |   23 +++--
 4 files changed, 338 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 80b6486..3882813 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -143,6 +143,13 @@ struct btrfs_ordered_sum;
  */
 #define BTRFS_NAME_LEN 255
 
+/*
+ * Theoretical limit is larger, but we keep this down to a sane
+ * value. That should limit greatly the possibility of collisions on
+ * inode ref items.
+ */
+#define BTRFS_LINK_MAX 65535U
+
 /* 32 bytes in various csum fields */
 #define BTRFS_CSUM_SIZE 32
 
@@ -462,13 +469,16 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
 
+#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF	(1ULL << 6)
+
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP		0ULL
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
 	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
-	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
+	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
@@ -615,6 +625,14 @@ struct btrfs_inode_ref {
 	/* name goes here */
 } __attribute__ ((__packed__));
 
+struct btrfs_inode_extref {
+	__le64 parent_objectid;
+	__le64 index;
+	__le16 name_len;
+	__u8   name[0];
+	/* name goes here */
+} __attribute__ ((__packed__));
+
 struct btrfs_timespec {
 	__le64 sec;
 	__le32 nsec;
@@ -1400,6 +1418,7 @@ struct btrfs_ioctl_defrag_range_args {
  */
 #define BTRFS_INODE_ITEM_KEY		1
 #define BTRFS_INODE_REF_KEY		12
+#define BTRFS_INODE_EXTREF_KEY		13
 #define BTRFS_XATTR_ITEM_KEY		24
 #define BTRFS_ORPHAN_ITEM_KEY		48
 /* reserve 2-15 close to the inode for later flexibility */
@@ -1701,6 +1720,13 @@ BTRFS_SETGET_STACK_FUNCS(block_group_flags,
 BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len, 16);
 BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64);
 
+/* struct btrfs_inode_extref */
+BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref,
+		   parent_objectid, 64);
+BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref,
+		   name_len, 16);
+BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index, 64);
+
 /* struct btrfs_inode_item */
 BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation, 64);
 BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64);
@@ -2791,12 +2817,12 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   const char *name, int name_len,
 			   u64 inode_objectid, u64 ref_objectid, u64 *index);
-struct btrfs_inode_ref *
-btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
-			struct btrfs_root *root,
-			struct btrfs_path *path,
-			const char *name, int name_len,
-			u64 inode_objectid, u64 ref_objectid, int mod);
+int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root,
+			      struct btrfs_path *path,
+			      const char *name, int name_len,
+			      u64 inode_objectid, u64 ref_objectid, int mod,
+			      u64 *ret_index);
 int btrfs_insert_empty_inode(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     struct btrfs_path *path, u64 objectid);
@@ -2804,6 +2830,18 @@ int btrfs_lookup_inode(struct btrfs_trans_handle *trans, struct btrfs_root
 		       *root, struct btrfs_path *path,
 		       struct btrfs_key *location, int mod);
 
+struct btrfs_inode_extref *
+btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
+			  struct btrfs_root *root,
+			  struct btrfs_path *path,
+			  const char *name, int name_len,
+			  u64 inode_objectid, u64 ref_objectid, int ins_len,
+			  int cow);
+
+int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
+			     int name_len,
+			     struct btrfs_inode_extref **extref_ret);
+
 /* file-item.c */
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
diff --git a/fs/btrfs/hash.h b/fs/btrfs/hash.h
index db2ff97..1d98281 100644
--- a/fs/btrfs/hash.h
+++ b/fs/btrfs/hash.h
@@ -24,4 +24,14 @@ static inline u64 btrfs_name_hash(const char *name, int len)
 {
 	return crc32c((u32)~1, name, len);
 }
+
+/*
+ * Figure the key offset of an extended inode ref
+ */
+static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name,
+				    int len)
+{
+	return (u64) crc32c(parent_objectid, name, len);
+}
+
 #endif
diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
index baa74f3..496fb1c 100644
--- a/fs/btrfs/inode-item.c
+++ b/fs/btrfs/inode-item.c
@@ -18,6 +18,7 @@
 
 #include "ctree.h"
 #include "disk-io.h"
+#include "hash.h"
 #include "transaction.h"
 
 static int find_name_in_backref(struct btrfs_path *path, const char *name,
@@ -49,18 +50,56 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name,
 	return 0;
 }
 
-struct btrfs_inode_ref *
+int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
+			     int name_len,
+			     struct btrfs_inode_extref **extref_ret)
+{
+	struct extent_buffer *leaf;
+	struct btrfs_inode_extref *extref;
+	unsigned long ptr;
+	unsigned long name_ptr;
+	u32 item_size;
+	u32 cur_offset = 0;
+	int ref_name_len;
+
+	leaf = path->nodes[0];
+	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+
+	/*
+	 * Search all extended backrefs in this item. We're only
+	 * looking through any collisions so most of the time this is
+	 * just going to compare against one buffer. If all is well,
+	 * we'll return success and the inode ref object.
+	 */
+	while (cur_offset < item_size) {
+		extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
+		name_ptr = (unsigned long)(&extref->name);
+		ref_name_len = btrfs_inode_extref_name_len(leaf, extref);
+
+		if (ref_name_len == name_len
+		    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0)) {
+			if (extref_ret)
+				*extref_ret = extref;
+			return 1;
+		}
+
+		cur_offset += ref_name_len + sizeof(*extref);
+	}
+	return 0;
+}
+
+static struct btrfs_inode_ref *
 btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
-			struct btrfs_root *root,
-			struct btrfs_path *path,
-			const char *name, int name_len,
-			u64 inode_objectid, u64 ref_objectid, int mod)
+		       struct btrfs_root *root,
+		       struct btrfs_path *path,
+		       const char *name, int name_len,
+		       u64 inode_objectid, u64 ref_objectid, int ins_len,
+		       int cow)
 {
+	int ret;
 	struct btrfs_key key;
 	struct btrfs_inode_ref *ref;
-	int ins_len = mod < 0 ? -1 : 0;
-	int cow = mod != 0;
-	int ret;
 
 	key.objectid = inode_objectid;
 	key.type = BTRFS_INODE_REF_KEY;
@@ -76,20 +115,156 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
 	return ref;
 }
 
-int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
+struct btrfs_inode_extref *
+btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
+			  struct btrfs_root *root,
+			  struct btrfs_path *path,
+			  const char *name, int name_len,
+			  u64 inode_objectid, u64 ref_objectid, int ins_len,
+			  int cow)
+{
+	int ret;
+	struct btrfs_key key;
+	struct btrfs_inode_extref *extref;
+
+	key.objectid = inode_objectid;
+	key.type = BTRFS_INODE_EXTREF_KEY;
+	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
+
+	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	if (ret > 0)
+		return NULL;
+	if (!find_name_in_ext_backref(path, name, name_len, &extref))
+		return NULL;
+	return extref;
+}
+
+int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root,
+			      struct btrfs_path *path,
+			      const char *name, int name_len,
+			      u64 inode_objectid, u64 ref_objectid, int mod,
+			      u64 *ret_index)
+{
+	struct btrfs_inode_ref *ref1;
+	struct btrfs_inode_extref *extref;
+	int ins_len = mod < 0 ? -1 : 0;
+	int cow = mod != 0;
+
+	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
+				      inode_objectid, ref_objectid, ins_len,
+				      cow);
+	if (IS_ERR(ref1))
+		return PTR_ERR(ref1);
+
+	if (ref1 != NULL) {
+		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
+		return 0;
+	}
+
+	btrfs_release_path(path);
+
+	extref = btrfs_lookup_inode_extref(trans, root, path, name,
+					   name_len, inode_objectid,
+					   ref_objectid, ins_len, cow);
+	if (IS_ERR(extref))
+		return PTR_ERR(extref);
+
+	if (extref) {
+		*ret_index = btrfs_inode_extref_index(path->nodes[0], extref);
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   const char *name, int name_len,
 			   u64 inode_objectid, u64 ref_objectid, u64 *index)
 {
 	struct btrfs_path *path;
 	struct btrfs_key key;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
+	int ret;
+	int del_len = name_len + sizeof(*extref);
+	unsigned long ptr;
+	unsigned long item_start;
+	u32 item_size;
+
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->leave_spinning = 1;
+
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret > 0)
+		ret = -ENOENT;
+	if (ret < 0)
+		goto out;
+
+	/*
+	 * Sanity check - did we find the right item for this name?
+	 * This should always succeed so error here will make the FS
+	 * readonly.
+	 */
+	if (!find_name_in_ext_backref(path, name, name_len, &extref)) {
+		btrfs_std_error(root->fs_info, -ENOENT);
+		ret = -EROFS;
+		goto out;
+	}
+
+	leaf = path->nodes[0];
+	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+	if (index)
+		*index = btrfs_inode_extref_index(leaf, extref);
+
+	if (del_len == item_size) {
+		/*
+		 * Common case only one ref in the item, remove the
+		 * whole item.
+		 */
+		ret = btrfs_del_item(trans, root, path);
+		goto out;
+	}
+
+	ptr = (unsigned long)extref;
+	item_start = btrfs_item_ptr_offset(leaf, path->slots[0]);
+
+	memmove_extent_buffer(leaf, ptr, ptr + del_len,
+			      item_size - (ptr + del_len - item_start));
+
+	ret = btrfs_truncate_item(trans, root, path,
+				  item_size - del_len, 1);
+
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
+
+int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
+			struct btrfs_root *root,
+			const char *name, int name_len,
+			u64 inode_objectid, u64 ref_objectid, u64 *index)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
 	struct btrfs_inode_ref *ref;
 	struct extent_buffer *leaf;
 	unsigned long ptr;
 	unsigned long item_start;
 	u32 item_size;
 	u32 sub_item_len;
-	int ret;
+	int ret, search_ext_refs = 0;
 	int del_len = name_len + sizeof(*ref);
 
 	key.objectid = inode_objectid;
@@ -105,12 +280,14 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
 	if (ret > 0) {
 		ret = -ENOENT;
+		search_ext_refs = 1;
 		goto out;
 	} else if (ret < 0) {
 		goto out;
 	}
 	if (!find_name_in_backref(path, name, name_len, &ref)) {
 		ret = -ENOENT;
+		search_ext_refs = 1;
 		goto out;
 	}
 	leaf = path->nodes[0];
@@ -132,6 +309,75 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
 				  item_size - sub_item_len, 1);
 out:
 	btrfs_free_path(path);
+
+	if (search_ext_refs) {
+		/*
+		 * No refs were found, or we could not find the
+		 * name in our ref array. Find and remove the extended
+		 * inode ref then.
+		 */
+		return btrfs_del_inode_extref(trans, root, name, name_len,
+					      inode_objectid, ref_objectid, index);
+	}
+
+	return ret;
+}
+
+/*
+ * btrfs_insert_inode_extref() - Inserts an extended inode ref into a tree.
+ *
+ * The caller must have checked against BTRFS_LINK_MAX already.
+ */
+static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans,
+				     struct btrfs_root *root,
+				     const char *name, int name_len,
+				     u64 inode_objectid, u64 ref_objectid, u64 index)
+{
+	struct btrfs_inode_extref *extref;
+	int ret;
+	int ins_len = name_len + sizeof(*extref);
+	unsigned long ptr;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct extent_buffer *leaf;
+	struct btrfs_item *item;
+
+	key.objectid = inode_objectid;
+	key.type = BTRFS_INODE_EXTREF_KEY;
+	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->leave_spinning = 1;
+	ret = btrfs_insert_empty_item(trans, root, path, &key,
+				      ins_len);
+	if (ret == -EEXIST) {
+		if (find_name_in_ext_backref(path, name, name_len, NULL))
+			goto out;
+
+		ret = btrfs_extend_item(trans, root, path, ins_len);
+	}
+	if (ret < 0)
+		goto out;
+
+	leaf = path->nodes[0];
+	item = btrfs_item_nr(leaf, path->slots[0]);
+	ptr = (unsigned long)btrfs_item_ptr(leaf, path->slots[0], char);
+	ptr += btrfs_item_size(leaf, item) - ins_len;
+	extref = (struct btrfs_inode_extref *)ptr;
+
+	btrfs_set_inode_extref_name_len(path->nodes[0], extref, name_len);
+	btrfs_set_inode_extref_index(path->nodes[0], extref, index);
+	btrfs_set_inode_extref_parent(path->nodes[0], extref, ref_objectid);
+
+	ptr = (unsigned long)&extref->name;
+	write_extent_buffer(path->nodes[0], name, ptr, name_len);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+
+out:
+	btrfs_free_path(path);
 	return ret;
 }
 
@@ -189,6 +435,19 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
 
 out:
 	btrfs_free_path(path);
+
+	if (ret == -EMLINK) {
+		struct btrfs_super_block *disk_super = root->fs_info->super_copy;
+		/* We ran out of space in the ref array. Need to
+		 * add an extended ref. */
+		if (btrfs_super_incompat_flags(disk_super)
+		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
+			ret = btrfs_insert_inode_extref(trans, root, name,
+							name_len,
+							inode_objectid,
+							ref_objectid, index);
+	}
+
 	return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 892b347..5ce89e4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct btrfs_path *path;
-	struct btrfs_inode_ref *ref;
 	struct btrfs_dir_item *di;
 	struct inode *inode = dentry->d_inode;
 	u64 index;
@@ -2803,17 +2802,17 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 	}
 	btrfs_release_path(path);
 
-	ref = btrfs_lookup_inode_ref(trans, root, path,
-				dentry->d_name.name, dentry->d_name.len,
-				ino, dir_ino, 0);
-	if (IS_ERR(ref)) {
-		err = PTR_ERR(ref);
+	ret = btrfs_get_inode_ref_index(trans, root, path, dentry->d_name.name,
+					dentry->d_name.len, ino, dir_ino, 0,
+					&index);
+	if (ret) {
+		err = ret;
 		goto out;
 	}
-	BUG_ON(!ref);
+
 	if (check_path_shared(root, path))
 		goto out;
-	index = btrfs_inode_ref_index(path->nodes[0], ref);
+
 	btrfs_release_path(path);
 
 	/*
@@ -4484,6 +4483,12 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
 	key[0].offset = 0;
 
+	/*
+	 * Start new inodes with an inode_ref. This is slightly more
+	 * efficient for small numbers of hard links since they will
+	 * be packed into one item. Extended refs will kick in if we
+	 * add more hard links than can fit in the ref item.
+	 */
 	key[1].objectid = objectid;
 	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
 	key[1].offset = ref_objectid;
@@ -4777,7 +4782,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	if (root->objectid != BTRFS_I(inode)->root->objectid)
 		return -EXDEV;
 
-	if (inode->i_nlink == ~0U)
+	if (inode->i_nlink >= BTRFS_LINK_MAX)
 		return -EMLINK;
 
 	err = btrfs_set_inode_index(dir, &index);
-- 
1.7.7



* [PATCH 2/3] btrfs: extended inode refs
  2012-05-21 21:46 [PATCH 0/3] btrfs: extended inode refs Mark Fasheh
  2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
@ 2012-05-21 21:46 ` Mark Fasheh
  2012-07-06 14:57   ` Jan Schmidt
  2012-05-21 21:46 ` [PATCH 3/3] " Mark Fasheh
  2 siblings, 1 reply; 21+ messages in thread
From: Mark Fasheh @ 2012-05-21 21:46 UTC
  To: linux-btrfs; +Cc: Chris Mason, Jan Schmidt, Mark Fasheh, Mark Fasheh

From: Mark Fasheh <mfasheh@suse.com>

Teach tree-log.c about extended inode refs. In particular, we have to adjust
the behavior of inode ref replay as well as log tree recovery to account for
the existence of extended refs.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/backref.c  |   68 +++++++++++
 fs/btrfs/backref.h  |    5 +
 fs/btrfs/tree-log.c |  308 +++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/tree-log.h |    1 +
 4 files changed, 326 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 0436c12..c97240a 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -857,6 +857,74 @@ static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
 				found_key);
 }
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_extref,
+			  u64 *found_off)
+{
+	int ret, slot;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
+	unsigned long ptr;
+
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = start_off;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(leaf)) {
+			/*
+			 * If the item at offset is not found,
+			 * btrfs_search_slot will point us to the slot
+			 * where it should be inserted. In our case
+			 * that will be the slot directly before the
+			 * next INODE_EXTREF_KEY item. In the case
+			 * that we're pointing to the last slot in a
+			 * leaf, we must move one leaf over.
+			 */
+			ret = btrfs_next_leaf(root, path);
+			if (ret) {
+				if (ret >= 1)
+					ret = -ENOENT;
+				break;
+			}
+			continue;
+		}
+
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		/*
+		 * Check that we're still looking at an extended ref key for
+		 * this particular objectid. If we have different
+		 * objectid or type then there are no more to be found
+		 * in the tree and we can exit.
+		 */
+		ret = -ENOENT;
+		if (found_key.objectid != inode_objectid)
+			break;
+		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = 0;
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+		extref = (struct btrfs_inode_extref *)ptr;
+		*ret_extref = extref;
+		if (found_off)
+			*found_off = found_key.offset;
+		break;
+	}
+
+	return ret;
+}
+
 /*
  * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
  * of the path are separated by '/' and the path is guaranteed to be
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index d00dfa9..8586d1b 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -64,4 +64,9 @@ struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
 					struct btrfs_path *path);
 void free_ipath(struct inode_fs_paths *ipath);
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_extref,
+			  u64 *found_off);
+
 #endif
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 966cc74..1e812dd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -23,8 +23,10 @@
 #include "disk-io.h"
 #include "locking.h"
 #include "print-tree.h"
+#include "backref.h"
 #include "compat.h"
 #include "tree-log.h"
+#include "hash.h"
 
 /* magic values for the inode_only field in btrfs_log_inode:
  *
@@ -751,10 +753,10 @@ static noinline int backref_in_log(struct btrfs_root *log,
 	unsigned long ptr;
 	unsigned long ptr_end;
 	unsigned long name_ptr;
-	int found_name_len;
 	int item_size;
 	int ret;
 	int match = 0;
+	int found_name_len;
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -764,8 +766,16 @@ static noinline int backref_in_log(struct btrfs_root *log,
 	if (ret != 0)
 		goto out;
 
-	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		if (find_name_in_ext_backref(path, name, namelen, NULL))
+			match = 1;
+
+		goto out;
+	}
+
+	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr_end = ptr + item_size;
 	while (ptr < ptr_end) {
 		ref = (struct btrfs_inode_ref *)ptr;
@@ -786,6 +796,50 @@ out:
 	return match;
 }
 
+static int extref_get_fields(struct extent_buffer *eb, int slot,
+			     u32 *namelen, char **name, u64 *index,
+			     u64 *parent_objectid)
+{
+	struct btrfs_inode_extref *extref;
+
+	extref = (struct btrfs_inode_extref *)btrfs_item_ptr_offset(eb, slot);
+
+	*namelen = btrfs_inode_extref_name_len(eb, extref);
+	*name = kmalloc(*namelen, GFP_NOFS);
+	if (*name == NULL)
+		return -ENOMEM;
+
+	read_extent_buffer(eb, *name, (unsigned long)&extref->name,
+			   *namelen);
+
+	*index = btrfs_inode_extref_index(eb, extref);
+	if (parent_objectid)
+		*parent_objectid = btrfs_inode_extref_parent(eb, extref);
+
+	return 0;
+}
+
+static int ref_get_fields(struct btrfs_key *key, struct extent_buffer *eb,
+			  int slot, u32 *namelen, char **name, u64 *index,
+			  u64 *parent_objectid)
+{
+	struct btrfs_inode_ref *ref;
+
+	ref = (struct btrfs_inode_ref *)btrfs_item_ptr_offset(eb, slot);
+
+	*namelen = btrfs_inode_ref_name_len(eb, ref);
+	*name = kmalloc(*namelen, GFP_NOFS);
+	if (*name == NULL)
+		return -ENOMEM;
+
+	read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
+
+	*index = btrfs_inode_ref_index(eb, ref);
+	if (parent_objectid)
+		*parent_objectid = key->offset;
+
+	return 0;
+}
 
 /*
  * replay one inode back reference item found in the log tree.
@@ -801,15 +855,25 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 				  struct btrfs_key *key)
 {
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	struct btrfs_dir_item *di;
+	struct btrfs_key search_key;
 	struct inode *dir;
 	struct inode *inode;
 	unsigned long ref_ptr;
 	unsigned long ref_end;
 	char *name;
+	char *victim_name;
 	int namelen;
+	int victim_name_len;
 	int ret;
 	int search_done = 0;
+	int log_ref_ver = 0;
+	u64 parent_objectid;
+	u64 inode_objectid;
+	u64 ref_index;
+	struct extent_buffer *leaf;
+	int ref_struct_size;
 
 	/*
 	 * it is possible that we didn't log all the parent directories
@@ -817,32 +881,44 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 	 * copy the back ref in.  The link count fixup code will take
 	 * care of the rest
 	 */
-	dir = read_one_inode(root, key->offset);
+
+
+	ref_ptr = btrfs_item_ptr_offset(eb, slot);
+	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		ref_struct_size = sizeof(*extref);
+		log_ref_ver = 1;
+
+		ret = extref_get_fields(eb, slot, &namelen, &name, &ref_index,
+					&parent_objectid);
+		if (ret)
+			return ret;
+	} else {
+		ref_struct_size = sizeof(*ref);
+
+		ret = ref_get_fields(key, eb, slot, &namelen, &name,
+				     &ref_index, &parent_objectid);
+		if (ret)
+			return -ENOMEM;
+	}
+
+	inode_objectid = key->objectid;
+
+	dir = read_one_inode(root, parent_objectid);
 	if (!dir)
 		return -ENOENT;
 
-	inode = read_one_inode(root, key->objectid);
+	inode = read_one_inode(root, inode_objectid);
 	if (!inode) {
 		iput(dir);
 		return -EIO;
 	}
 
-	ref_ptr = btrfs_item_ptr_offset(eb, slot);
-	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
-
 again:
-	ref = (struct btrfs_inode_ref *)ref_ptr;
-
-	namelen = btrfs_inode_ref_name_len(eb, ref);
-	name = kmalloc(namelen, GFP_NOFS);
-	BUG_ON(!name);
-
-	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
-
 	/* if we already have a perfect match, we're done */
 	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
-			 btrfs_inode_ref_index(eb, ref),
-			 name, namelen)) {
+			 ref_index, name, namelen)) {
 		goto out;
 	}
 
@@ -857,19 +933,23 @@ again:
 	if (search_done)
 		goto insert;
 
-	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+	/* Search old style refs */
+	search_key.objectid = inode_objectid;
+	search_key.type = BTRFS_INODE_REF_KEY;
+	search_key.offset = parent_objectid;
+
+	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
 	if (ret == 0) {
-		char *victim_name;
-		int victim_name_len;
 		struct btrfs_inode_ref *victim_ref;
 		unsigned long ptr;
 		unsigned long ptr_end;
-		struct extent_buffer *leaf = path->nodes[0];
+
+		leaf = path->nodes[0];
 
 		/* are we trying to overwrite a back ref for the root directory
 		 * if so, just jump out, we're done
 		 */
-		if (key->objectid == key->offset)
+		if (search_key.objectid == search_key.offset)
 			goto out_nowrite;
 
 		/* check all the names in this back reference to see
@@ -889,7 +969,7 @@ again:
 					   (unsigned long)(victim_ref + 1),
 					   victim_name_len);
 
-			if (!backref_in_log(log, key, victim_name,
+			if (!backref_in_log(log, &search_key, victim_name,
 					    victim_name_len)) {
 				btrfs_inc_nlink(inode);
 				btrfs_release_path(path);
@@ -902,19 +982,60 @@ again:
 			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
 		}
 		BUG_ON(ret);
-
-		/*
-		 * NOTE: we have searched root tree and checked the
-		 * coresponding ref, it does not need to check again.
-		 */
-		search_done = 1;
 	}
 	btrfs_release_path(path);
 
+	/* Same search but for extended refs */
+	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
+					   inode_objectid, parent_objectid, 0,
+					   0);
+	if (!IS_ERR_OR_NULL(extref)) {
+		u32 item_size;
+		u32 cur_offset = 0;
+		unsigned long base;
+
+		leaf = path->nodes[0];
+
+		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+		base = btrfs_item_ptr_offset(leaf, path->slots[0]);
+
+		while (cur_offset < item_size) {
+			extref = (struct btrfs_inode_extref *)(base + cur_offset);
+
+			victim_name_len = btrfs_inode_extref_name_len(leaf, extref);
+			victim_name = kmalloc(victim_name_len, GFP_NOFS);
+			leaf = path->nodes[0];
+			read_extent_buffer(leaf, victim_name, (unsigned long)&extref->name, victim_name_len);
+
+			search_key.objectid = inode_objectid;
+			search_key.type = BTRFS_INODE_EXTREF_KEY;
+			search_key.offset = btrfs_extref_hash(parent_objectid,
+							      name, namelen);
+			if (!backref_in_log(log, &search_key, victim_name,
+					    victim_name_len)) {
+				btrfs_inc_nlink(inode);
+				btrfs_release_path(path);
+
+				ret = btrfs_unlink_inode(trans, root, dir,
+							 inode, victim_name,
+							 victim_name_len);
+			}
+			kfree(victim_name);
+			BUG_ON(ret);
+
+			cur_offset += victim_name_len + sizeof(*extref);
+		}
+	}
+
+	/*
+	 * NOTE: we have searched root tree and checked the
+	 * corresponding refs, it does not need to be checked again.
+	 */
+	search_done = 1;
+
 	/* look for a conflicting sequence number */
 	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
-					 btrfs_inode_ref_index(eb, ref),
-					 name, namelen, 0);
+					 ref_index, name, namelen, 0);
 	if (di && !IS_ERR(di)) {
 		ret = drop_one_dir_item(trans, root, path, dir, di);
 		BUG_ON(ret);
@@ -932,17 +1053,25 @@ again:
 
 insert:
 	/* insert our name */
-	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
-			     btrfs_inode_ref_index(eb, ref));
+	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
 	BUG_ON(ret);
 
 	btrfs_update_inode(trans, root, inode);
 
 out:
-	ref_ptr = (unsigned long)(ref + 1) + namelen;
+	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
 	kfree(name);
-	if (ref_ptr < ref_end)
+	if (ref_ptr < ref_end) {
+		if (log_ref_ver) {
+			ret = extref_get_fields(eb, slot, &namelen, &name,
+						&ref_index, NULL);
+		} else {
+			ret = ref_get_fields(key, eb, slot, &namelen, &name,
+					     &ref_index, NULL);
+		}
+		BUG_ON(ret);
 		goto again;
+	}
 
 	/* finally write the back reference in the inode */
 	ret = overwrite_item(trans, root, path, eb, slot, key);
@@ -965,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int count_inode_extrefs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
+{
+	int ret;
+	int name_len;
+	unsigned int nlink = 0;
+	u32 item_size;
+	u32 cur_offset = 0;
+	u64 inode_objectid = btrfs_ino(inode);
+	u64 offset = 0;
+	unsigned long ptr;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
 
-/*
- * There are a few corners where the link count of the file can't
- * be properly maintained during replay.  So, instead of adding
- * lots of complexity to the log code, we just scan the backrefs
- * for any file that has been through replay.
- *
- * The scan will update the link count on the inode to reflect the
- * number of back refs found.  If it goes down to zero, the iput
- * will free the inode.
- */
-static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
-					   struct btrfs_root *root,
-					   struct inode *inode)
+	while (1) {
+		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
+					    &extref, &offset);
+		if (ret)
+			break;
+
+		leaf = path->nodes[0];
+		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+		cur_offset = 0;
+		while (cur_offset < item_size) {
+			extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
+			name_len = btrfs_inode_extref_name_len(leaf, extref);
+
+			nlink++;
+
+			cur_offset += name_len + sizeof(*extref);
+		}
+
+		offset++;
+	}
+	btrfs_release_path(path);
+
+	return nlink;
+}
+
+static int count_inode_refs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
 {
-	struct btrfs_path *path;
 	int ret;
 	struct btrfs_key key;
-	u64 nlink = 0;
+	unsigned int nlink = 0;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	int name_len;
@@ -993,10 +1149,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_INODE_REF_KEY;
 	key.offset = (u64)-1;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	while (1) {
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0)
@@ -1030,6 +1182,45 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+
+	return nlink;
+}
+
+/*
+ * There are a few corners where the link count of the file can't
+ * be properly maintained during replay.  So, instead of adding
+ * lots of complexity to the log code, we just scan the backrefs
+ * for any file that has been through replay.
+ *
+ * The scan will update the link count on the inode to reflect the
+ * number of back refs found.  If it goes down to zero, the iput
+ * will free the inode.
+ */
+static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *root,
+					   struct inode *inode)
+{
+	struct btrfs_path *path;
+	int ret;
+	u64 nlink = 0;
+	u64 ino = btrfs_ino(inode);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = count_inode_refs(root, inode, path);
+	if (ret < 0)
+		goto out;
+
+	nlink = ret;
+
+	ret = count_inode_extrefs(root, inode, path);
+	if (ret < 0)
+		goto out;
+
+	nlink += ret;
+
 	if (nlink != inode->i_nlink) {
 		set_nlink(inode, nlink);
 		btrfs_update_inode(trans, root, inode);
@@ -1045,9 +1236,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		ret = insert_orphan_item(trans, root, ino);
 		BUG_ON(ret);
 	}
-	btrfs_free_path(path);
 
-	return 0;
+out:
+	btrfs_free_path(path);
+	return ret;
 }
 
 static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
@@ -1689,6 +1881,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
 			ret = add_inode_ref(wc->trans, root, log, path,
 					    eb, i, &key);
 			BUG_ON(ret && ret != -ENOENT);
+		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
+			ret = add_inode_ref(wc->trans, root, log, path,
+					    eb, i, &key);
+			BUG_ON(ret && ret != -ENOENT);
 		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
 			ret = replay_one_extent(wc->trans, root, path,
 						eb, i, &key);
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 2270ac5..a935a8c 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -49,4 +49,5 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
 int btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct inode *inode, struct inode *old_dir,
 			struct dentry *parent);
+
 #endif
-- 
1.7.7



* [PATCH 3/3] btrfs: extended inode refs
  2012-05-21 21:46 [PATCH 0/3] btrfs: extended inode refs Mark Fasheh
  2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
  2012-05-21 21:46 ` [PATCH 2/3] " Mark Fasheh
@ 2012-05-21 21:46 ` Mark Fasheh
  2012-07-06 14:57   ` Jan Schmidt
  2 siblings, 1 reply; 21+ messages in thread
From: Mark Fasheh @ 2012-05-21 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, Jan Schmidt, Mark Fasheh, Mark Fasheh

From: Mark Fasheh <mfasheh@suse.com>

The iterate_irefs in backref.c is used to build path components from inode
refs. This patch adds code to iterate extended refs as well.

I had to modify the callback function signature to abstract out some of the
differences between ref structures. iref_to_path() also needed similar
changes.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/backref.c |  144 +++++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/backref.h |    2 -
 2 files changed, 119 insertions(+), 27 deletions(-)
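
The signature change described in the commit message can be sketched in
user space as below. This is only an illustration under stated assumptions:
struct mock_eb, mock_iterate_t, collect_name() and demo_collect() are
hypothetical stand-ins (a flat byte array replaces struct extent_buffer),
not code from this series. The point is that once the callback receives a
(name_len, name_off) pair instead of a struct btrfs_inode_ref pointer, the
same callback can serve both ref layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct extent_buffer: a flat byte array. */
struct mock_eb {
	const unsigned char *data;
	size_t len;
};

/* Same shape as the new iterate_irefs_t: the name is given by offset + length. */
typedef int (mock_iterate_t)(uint64_t parent, uint32_t name_len,
			     unsigned long name_off, struct mock_eb *eb,
			     void *ctx);

struct collect_ctx {
	char buf[64];
	size_t used;
};

/* Example callback: appends "<name>/" to ctx->buf, whatever the ref type. */
static int collect_name(uint64_t parent, uint32_t name_len,
			unsigned long name_off, struct mock_eb *eb, void *ctx)
{
	struct collect_ctx *c = ctx;

	(void)parent;
	if (name_off + name_len > eb->len)
		return -1;
	if (c->used + name_len + 2 > sizeof(c->buf))
		return -1;
	memcpy(c->buf + c->used, eb->data + name_off, name_len);
	c->used += name_len;
	c->buf[c->used++] = '/';
	c->buf[c->used] = '\0';
	return 0;
}

/* Feed one old-style and one extended-style name through the same callback. */
static int demo_collect(struct collect_ctx *c)
{
	/*
	 * bytes 0..9:   pretend inode_ref header (8 + 2 bytes), then "foo"
	 * bytes 13..30: pretend inode_extref header (8 + 8 + 2), then "bar"
	 */
	unsigned char raw[34] = {0};
	struct mock_eb eb = { raw, sizeof(raw) };
	mock_iterate_t *iterate = collect_name;
	int ret;

	assert(c != NULL);
	memcpy(raw + 10, "foo", 3);
	memcpy(raw + 13 + 18, "bar", 3);

	ret = iterate(256, 3, 10, &eb, c);
	if (ret)
		return ret;
	return iterate(257, 3, 13 + 18, &eb, c);
}
```

The callback never needs to know which header precedes the name; only the
caller that walks the item does.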

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index c97240a..d88fa49 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -22,6 +22,7 @@
 #include "ulist.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "locking.h"
 
 /*
  * this structure records all encountered refs on the way up to the root
@@ -940,34 +941,35 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
  * value will be smaller than dest. callers must check this!
  */
 static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
-				struct btrfs_inode_ref *iref,
-				struct extent_buffer *eb_in, u64 parent,
-				char *dest, u32 size)
+			  u32 name_len, unsigned long name_off,
+			  struct extent_buffer *eb_in, u64 parent,
+			  char *dest, u32 size)
 {
-	u32 len;
 	int slot;
 	u64 next_inum;
 	int ret;
 	s64 bytes_left = size - 1;
 	struct extent_buffer *eb = eb_in;
 	struct btrfs_key found_key;
+	struct btrfs_inode_ref *iref;
 
 	if (bytes_left >= 0)
 		dest[bytes_left] = '\0';
 
 	while (1) {
-		len = btrfs_inode_ref_name_len(eb, iref);
-		bytes_left -= len;
+		bytes_left -= name_len;
 		if (bytes_left >= 0)
 			read_extent_buffer(eb, dest + bytes_left,
-						(unsigned long)(iref + 1), len);
+					   name_off, name_len);
 		if (eb != eb_in)
 			free_extent_buffer(eb);
+
 		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
 		if (ret > 0)
 			ret = -ENOENT;
 		if (ret)
 			break;
+
 		next_inum = found_key.offset;
 
 		/* regular exit ahead */
@@ -980,8 +982,11 @@ static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
 		if (eb != eb_in)
 			atomic_inc(&eb->refs);
 		btrfs_release_path(path);
-
 		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
+
+		name_len = btrfs_inode_ref_name_len(eb, iref);
+		name_off = (unsigned long)(iref + 1);
+
 		parent = next_inum;
 		--bytes_left;
 		if (bytes_left >= 0)
@@ -1294,9 +1299,12 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
-static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
-				struct btrfs_path *path,
-				iterate_irefs_t *iterate, void *ctx)
+typedef int (iterate_irefs_t)(u64 parent, u32 name_len, unsigned long name_off,
+			      struct extent_buffer *eb, void *ctx);
+
+static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
+			      struct btrfs_path *path,
+			      iterate_irefs_t *iterate, void *ctx)
 {
 	int ret;
 	int slot;
@@ -1312,7 +1320,7 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
 
 	while (1) {
 		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
-					&found_key);
+				     &found_key);
 		if (ret < 0)
 			break;
 		if (ret) {
@@ -1326,8 +1334,11 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
 		eb = path->nodes[0];
 		/* make sure we can use eb after releasing the path */
 		atomic_inc(&eb->refs);
+		btrfs_tree_read_lock(eb);
+		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 		btrfs_release_path(path);
 
+
 		item = btrfs_item_nr(eb, slot);
 		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
 
@@ -1338,15 +1349,81 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
 				 "tree %llu\n", cur,
 				 (unsigned long long)found_key.objectid,
 				 (unsigned long long)fs_root->objectid);
-			ret = iterate(parent, iref, eb, ctx);
-			if (ret) {
-				free_extent_buffer(eb);
+			ret = iterate(parent, name_len,
+				      (unsigned long)(iref + 1), eb, ctx);
+			if (ret)
 				break;
-			}
 			len = sizeof(*iref) + name_len;
 			iref = (struct btrfs_inode_ref *)((char *)iref + len);
 		}
+		btrfs_tree_read_unlock_blocking(eb);
+		free_extent_buffer(eb);
+	}
+
+	btrfs_release_path(path);
+
+	return ret;
+}
+
+static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+				 struct btrfs_path *path,
+				 iterate_irefs_t *iterate, void *ctx)
+{
+	int ret;
+	int slot;
+	u64 offset = 0;
+	u64 parent;
+	int found = 0;
+	struct extent_buffer *eb;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
+	u32 item_size;
+	u32 cur_offset;
+	unsigned long ptr;
+
+	while (1) {
+		ret = btrfs_find_one_extref(fs_root, inum, offset, path, &extref,
+					    &offset);
+		if (ret < 0)
+			break;
+		if (ret) {
+			ret = found ? 0 : -ENOENT;
+			break;
+		}
+		++found;
+
+		slot = path->slots[0];
+		eb = path->nodes[0];
+		/* make sure we can use eb after releasing the path */
+		atomic_inc(&eb->refs);
+
+		btrfs_tree_read_lock(eb);
+		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+		btrfs_release_path(path);
+
+		leaf = eb;
+		item_size = btrfs_item_size_nr(leaf, slot);
+		ptr = btrfs_item_ptr_offset(leaf, slot);
+		cur_offset = 0;
+
+		while (cur_offset < item_size) {
+			u32 name_len;
+
+			extref = (struct btrfs_inode_extref *)(ptr + cur_offset);
+			parent = btrfs_inode_extref_parent(eb, extref);
+			name_len = btrfs_inode_extref_name_len(eb, extref);
+			ret = iterate(parent, name_len,
+				      (unsigned long)&extref->name, eb, ctx);
+			if (ret)
+				break;
+
+			cur_offset += btrfs_inode_extref_name_len(leaf, extref);
+			cur_offset += sizeof(*extref);
+		}
+		btrfs_tree_read_unlock_blocking(eb);
 		free_extent_buffer(eb);
+
+		offset++;
 	}
 
 	btrfs_release_path(path);
@@ -1354,12 +1431,32 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
 	return ret;
 }
 
+static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
+			 struct btrfs_path *path, iterate_irefs_t *iterate,
+			 void *ctx)
+{
+	int ret;
+	int found_refs = 0;
+
+	ret = iterate_inode_refs(inum, fs_root, path, iterate, ctx);
+	if (!ret)
+		++found_refs;
+	else if (ret != -ENOENT)
+		return ret;
+
+	ret = iterate_inode_extrefs(inum, fs_root, path, iterate, ctx);
+	if (ret == -ENOENT && found_refs)
+		return 0;
+
+	return ret;
+}
+
 /*
  * returns 0 if the path could be dumped (probably truncated)
  * returns <0 in case of an error
  */
-static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
-				struct extent_buffer *eb, void *ctx)
+static int inode_to_path(u64 inum, u32 name_len, unsigned long name_off,
+			 struct extent_buffer *eb, void *ctx)
 {
 	struct inode_fs_paths *ipath = ctx;
 	char *fspath;
@@ -1372,20 +1469,17 @@ static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
 					ipath->fspath->bytes_left - s_ptr : 0;
 
 	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
-	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, iref, eb,
-				inum, fspath_min, bytes_left);
+	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
+			      name_off, eb, inum, fspath_min,
+			      bytes_left);
 	if (IS_ERR(fspath))
 		return PTR_ERR(fspath);
 
 	if (fspath > fspath_min) {
-		pr_debug("path resolved: %s\n", fspath);
 		ipath->fspath->val[i] = (u64)(unsigned long)fspath;
 		++ipath->fspath->elem_cnt;
 		ipath->fspath->bytes_left = fspath - fspath_min;
 	} else {
-		pr_debug("missed path, not enough space. missing bytes: %lu, "
-			 "constructed so far: %s\n",
-			 (unsigned long)(fspath_min - fspath), fspath_min);
 		++ipath->fspath->elem_missed;
 		ipath->fspath->bytes_missing += fspath_min - fspath;
 		ipath->fspath->bytes_left = 0;
@@ -1407,7 +1501,7 @@ static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
 int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
 {
 	return iterate_irefs(inum, ipath->fs_root, ipath->btrfs_path,
-				inode_to_path, ipath);
+			     inode_to_path, ipath);
 }
 
 /*
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index 8586d1b..649a220 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -30,8 +30,6 @@ struct inode_fs_paths {
 
 typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
 		void *ctx);
-typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
-				struct extent_buffer *eb, void *ctx);
 
 int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
 			struct btrfs_path *path);
-- 
1.7.7



* Re: [PATCH 1/3] btrfs: extended inode refs
  2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
@ 2012-07-06 14:56   ` Jan Schmidt
  2012-07-06 15:14     ` Stefan Behrens
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jan Schmidt @ 2012-07-06 14:56 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, Mark Fasheh

Hi Mark,

Sorry, I'm a bit late with my feedback.

On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:
> From: Mark Fasheh <mfasheh@suse.com>

Where does this From line come from? Git shouldn't add one unless it's different
from the sender's name. It shouldn't be in the patch.

> This patch adds basic support for extended inode refs. This includes support
> for link and unlink of the refs, which basically gets us support for rename
> as well.
> 
> Inode creation does not need changing - extended refs are only added after
> the ref array is full.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/ctree.h      |   52 ++++++++--
>  fs/btrfs/hash.h       |   10 ++
>  fs/btrfs/inode-item.c |  279 +++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/btrfs/inode.c      |   23 +++--
>  4 files changed, 338 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 80b6486..3882813 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -143,6 +143,13 @@ struct btrfs_ordered_sum;
>   */
>  #define BTRFS_NAME_LEN 255
>  
> +/*
> + * Theoretical limit is larger, but we keep this down to a sane
> + * value. That should limit greatly the possibility of collisions on
> + * inode ref items.
> + */
> +#define BTRFS_LINK_MAX 65535U
> +
>  /* 32 bytes in various csum fields */
>  #define BTRFS_CSUM_SIZE 32
>  
> @@ -462,13 +469,16 @@ struct btrfs_super_block {
>  #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
>  #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
>  
> +#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF	(1ULL << 6)
> +
>  #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
>  #define BTRFS_FEATURE_COMPAT_RO_SUPP		0ULL
>  #define BTRFS_FEATURE_INCOMPAT_SUPP			\
>  	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
>  	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
>  	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
> -	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
> +	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
> +	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
>  
>  /*
>   * A leaf is full of items. offset and size tell us where to find
> @@ -615,6 +625,14 @@ struct btrfs_inode_ref {
>  	/* name goes here */
>  } __attribute__ ((__packed__));
>  
> +struct btrfs_inode_extref {
> +	__le64 parent_objectid;
> +	__le64 index;
> +	__le16 name_len;
> +	__u8   name[0];
> +	/* name goes here */
> +} __attribute__ ((__packed__));
> +
>  struct btrfs_timespec {
>  	__le64 sec;
>  	__le32 nsec;
> @@ -1400,6 +1418,7 @@ struct btrfs_ioctl_defrag_range_args {
>   */
>  #define BTRFS_INODE_ITEM_KEY		1
>  #define BTRFS_INODE_REF_KEY		12
> +#define BTRFS_INODE_EXTREF_KEY		13
>  #define BTRFS_XATTR_ITEM_KEY		24
>  #define BTRFS_ORPHAN_ITEM_KEY		48
>  /* reserve 2-15 close to the inode for later flexibility */
> @@ -1701,6 +1720,13 @@ BTRFS_SETGET_STACK_FUNCS(block_group_flags,
>  BTRFS_SETGET_FUNCS(inode_ref_name_len, struct btrfs_inode_ref, name_len, 16);
>  BTRFS_SETGET_FUNCS(inode_ref_index, struct btrfs_inode_ref, index, 64);
>  
> +/* struct btrfs_inode_extref */
> +BTRFS_SETGET_FUNCS(inode_extref_parent, struct btrfs_inode_extref,
> +		   parent_objectid, 64);
> +BTRFS_SETGET_FUNCS(inode_extref_name_len, struct btrfs_inode_extref,
> +		   name_len, 16);
> +BTRFS_SETGET_FUNCS(inode_extref_index, struct btrfs_inode_extref, index, 64);
> +
>  /* struct btrfs_inode_item */
>  BTRFS_SETGET_FUNCS(inode_generation, struct btrfs_inode_item, generation, 64);
>  BTRFS_SETGET_FUNCS(inode_sequence, struct btrfs_inode_item, sequence, 64);
> @@ -2791,12 +2817,12 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   const char *name, int name_len,
>  			   u64 inode_objectid, u64 ref_objectid, u64 *index);
> -struct btrfs_inode_ref *
> -btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
> -			struct btrfs_root *root,
> -			struct btrfs_path *path,
> -			const char *name, int name_len,
> -			u64 inode_objectid, u64 ref_objectid, int mod);
> +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
> +			      struct btrfs_root *root,
> +			      struct btrfs_path *path,
> +			      const char *name, int name_len,
> +			      u64 inode_objectid, u64 ref_objectid, int mod,
> +			      u64 *ret_index);
>  int btrfs_insert_empty_inode(struct btrfs_trans_handle *trans,
>  			     struct btrfs_root *root,
>  			     struct btrfs_path *path, u64 objectid);
> @@ -2804,6 +2830,18 @@ int btrfs_lookup_inode(struct btrfs_trans_handle *trans, struct btrfs_root
>  		       *root, struct btrfs_path *path,
>  		       struct btrfs_key *location, int mod);
>  
> +struct btrfs_inode_extref *
> +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
> +			  struct btrfs_root *root,
> +			  struct btrfs_path *path,
> +			  const char *name, int name_len,
> +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
> +			  int cow);
> +
> +int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
> +			     int name_len,
> +			     struct btrfs_inode_extref **extref_ret);
> +
>  /* file-item.c */
>  int btrfs_del_csums(struct btrfs_trans_handle *trans,
>  		    struct btrfs_root *root, u64 bytenr, u64 len);
> diff --git a/fs/btrfs/hash.h b/fs/btrfs/hash.h
> index db2ff97..1d98281 100644
> --- a/fs/btrfs/hash.h
> +++ b/fs/btrfs/hash.h
> @@ -24,4 +24,14 @@ static inline u64 btrfs_name_hash(const char *name, int len)
>  {
>  	return crc32c((u32)~1, name, len);
>  }
> +
> +/*
> + * Figure the key offset of an extended inode ref
> + */
> +static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name,
> +				    int len)
> +{
> +	return (u64) crc32c(parent_objectid, name, len);
> +}
> +
>  #endif
> diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
> index baa74f3..496fb1c 100644
> --- a/fs/btrfs/inode-item.c
> +++ b/fs/btrfs/inode-item.c
> @@ -18,6 +18,7 @@
>  
>  #include "ctree.h"
>  #include "disk-io.h"
> +#include "hash.h"
>  #include "transaction.h"
>  
>  static int find_name_in_backref(struct btrfs_path *path, const char *name,
> @@ -49,18 +50,56 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name,
>  	return 0;
>  }
>  
> -struct btrfs_inode_ref *
> +int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
> +			     int name_len,
> +			     struct btrfs_inode_extref **extref_ret)

Exported functions should be prefixed "btrfs_". What about btrfs_find_extref_name?

> +{
> +	struct extent_buffer *leaf;
> +	struct btrfs_inode_extref *extref;
> +	unsigned long ptr;
> +	unsigned long name_ptr;
> +	u32 item_size;
> +	u32 cur_offset = 0;
> +	int ref_name_len;
> +
> +	leaf = path->nodes[0];
> +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +	/*
> +	 * Search all extended backrefs in this item. We're only
> +	 * looking through any collisions so most of the time this is
> +	 * just going to compare against one buffer. If all is well,
> +	 * we'll return success and the inode ref object.
> +	 */
> +	while (cur_offset < item_size) {
> +		extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
> +		name_ptr = (unsigned long)(&extref->name);
> +		ref_name_len = btrfs_inode_extref_name_len(leaf, extref);
> +
> +		if (ref_name_len == name_len
> +		    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0)) {
> +			if (extref_ret)
> +				*extref_ret = extref;
> +			return 1;
> +		}
> +
> +		cur_offset += ref_name_len + sizeof(*extref);
> +	}
> +	return 0;

For consistency, I'd like to switch return 0 and 1.
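
One possible reading of that suggestion, as a user-space sketch: walk the
packed records in an item and return 0 on a match and -ENOENT otherwise.
The names demo_extref, demo_put() and demo_find_name() are hypothetical,
the struct only mirrors the layout from the patch, and fields are stored
in host byte order here rather than little-endian. It assumes GCC or Clang
for the packed attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* User-space mirror of the extref layout; not the kernel definition. */
struct demo_extref {
	uint64_t parent_objectid;
	uint64_t index;
	uint16_t name_len;
	uint8_t  name[];
} __attribute__((__packed__));

/* Append one record at offset off, return the offset just past it. */
static size_t demo_put(unsigned char *item, size_t off, uint64_t parent,
		       uint64_t index, const char *name)
{
	struct demo_extref hdr = { parent, index, 0 };

	assert(strlen(name) <= UINT16_MAX);
	hdr.name_len = (uint16_t)strlen(name);
	memcpy(item + off, &hdr, sizeof(hdr));
	memcpy(item + off + sizeof(hdr), name, hdr.name_len);
	return off + sizeof(hdr) + hdr.name_len;
}

/*
 * Returns 0 and sets *found_off (if non-NULL) when the name is present,
 * -ENOENT when it is not, -EINVAL when a record overruns the item.
 */
static int demo_find_name(const unsigned char *item, size_t item_size,
			  const char *name, size_t name_len,
			  size_t *found_off)
{
	size_t cur = 0;

	while (cur + sizeof(struct demo_extref) <= item_size) {
		struct demo_extref hdr;

		memcpy(&hdr, item + cur, sizeof(hdr));
		if (cur + sizeof(hdr) + hdr.name_len > item_size)
			return -EINVAL;

		if (hdr.name_len == name_len &&
		    memcmp(item + cur + sizeof(hdr), name, name_len) == 0) {
			if (found_off)
				*found_off = cur;
			return 0;
		}

		/* Records are variable length: header plus name bytes. */
		cur += sizeof(hdr) + hdr.name_len;
	}
	return -ENOENT;
}
```

With this convention a caller can write "ret = find(...); if (ret) goto out;"
like most other lookups, at the cost of inverting the existing
"if (!find_name_in_ext_backref(...))" tests.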

> +}
> +
> +static struct btrfs_inode_ref *
>  btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,

static functions should not be prefixed "btrfs_".

> -			struct btrfs_root *root,
> -			struct btrfs_path *path,
> -			const char *name, int name_len,
> -			u64 inode_objectid, u64 ref_objectid, int mod)
> +		       struct btrfs_root *root,
> +		       struct btrfs_path *path,
> +		       const char *name, int name_len,
> +		       u64 inode_objectid, u64 ref_objectid, int ins_len,
> +		       int cow)
>  {
> +	int ret;
>  	struct btrfs_key key;
>  	struct btrfs_inode_ref *ref;
> -	int ins_len = mod < 0 ? -1 : 0;
> -	int cow = mod != 0;
> -	int ret;
>  
>  	key.objectid = inode_objectid;
>  	key.type = BTRFS_INODE_REF_KEY;
> @@ -76,20 +115,156 @@ btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
>  	return ref;
>  }
>  
> -int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> +struct btrfs_inode_extref *
> +btrfs_lookup_inode_extref(struct btrfs_trans_handle *trans,
> +			  struct btrfs_root *root,
> +			  struct btrfs_path *path,
> +			  const char *name, int name_len,
> +			  u64 inode_objectid, u64 ref_objectid, int ins_len,
> +			  int cow)

Please add a comment on the return value above this function. It is important to
note that it returns NULL instead of an ENOENT error pointer (which is okay and
consistent with btrfs_inode_ref_index).
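
The three-way return convention in question (error pointer on failure, NULL
when nothing matches, valid pointer on success) can be sketched in user
space as follows. The demo_* helpers only imitate the kernel's ERR_PTR(),
PTR_ERR() and IS_ERR(); demo_lookup() and demo_get_index() are hypothetical
and simply simulate the outcome of a tree search.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_ref {
	uint64_t index;
};

/*
 * Hypothetical lookup. search_ret simulates the tree search result:
 *   < 0  search failed          -> error pointer
 *   > 0  key or name not found  -> NULL (not an -ENOENT error pointer)
 *   == 0 found                  -> pointer to the ref
 */
static struct demo_ref *demo_lookup(struct demo_ref *ref, int search_ret)
{
	if (search_ret < 0)
		return demo_err_ptr(search_ret);
	if (search_ret > 0)
		return NULL;
	return ref;
}

/* Caller side: the NULL case is turned into -ENOENT only here. */
static int demo_get_index(struct demo_ref *ref, int search_ret,
			  uint64_t *ret_index)
{
	struct demo_ref *found = demo_lookup(ref, search_ret);

	assert(ret_index != NULL);
	if (demo_is_err(found))
		return (int)demo_ptr_err(found);
	if (!found)
		return -ENOENT;

	*ret_index = found->index;
	return 0;
}
```

A caller that only checks IS_ERR() would dereference NULL in the not-found
case, which is why the convention deserves a comment above the function.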

> +{
> +	int ret;
> +	struct btrfs_key key;
> +	struct btrfs_inode_extref *extref;
> +
> +	key.objectid = inode_objectid;
> +	key.type = BTRFS_INODE_EXTREF_KEY;
> +	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
> +
> +	ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
> +	if (ret < 0)
> +		return ERR_PTR(ret);
> +	if (ret > 0)
> +		return NULL;
> +	if (!find_name_in_ext_backref(path, name, name_len, &extref))

Maybe easier to read if written with "ret" in two lines. Not sure.

> +		return NULL;
> +	return extref;
> +}
> +
> +int btrfs_get_inode_ref_index(struct btrfs_trans_handle *trans,
> +			      struct btrfs_root *root,
> +			      struct btrfs_path *path,
> +			      const char *name, int name_len,
> +			      u64 inode_objectid, u64 ref_objectid, int mod,
> +			      u64 *ret_index)
> +{
> +	struct btrfs_inode_ref *ref1;

"ref1" seems to be a leftover from when "extref" was called "ref2". We should
rather just say "ref" here.

> +	struct btrfs_inode_extref *extref;
> +	int ins_len = mod < 0 ? -1 : 0;
> +	int cow = mod != 0;
> +
> +	ref1 = btrfs_lookup_inode_ref(trans, root, path, name, name_len,
> +				      inode_objectid, ref_objectid, ins_len,
> +				      cow);
> +	if (IS_ERR(ref1))
> +		return PTR_ERR(ref1);
> +
> +	if (ref1 != NULL) {
> +		*ret_index = btrfs_inode_ref_index(path->nodes[0], ref1);
> +		return 0;
> +	}
> +
> +	btrfs_release_path(path);
> +
> +	extref = btrfs_lookup_inode_extref(trans, root, path, name,
> +					   name_len, inode_objectid,
> +					   ref_objectid, ins_len, cow);
> +	if (IS_ERR(extref))
> +		return PTR_ERR(extref);
> +
> +	if (extref) {
> +		*ret_index = btrfs_inode_extref_index(path->nodes[0], extref);
> +		return 0;
> +	}
> +
> +	return -ENOENT;
> +}
> +
> +int btrfs_del_inode_extref(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   const char *name, int name_len,
>  			   u64 inode_objectid, u64 ref_objectid, u64 *index)
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_key key;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
> +	int ret;
> +	int del_len = name_len + sizeof(*extref);
> +	unsigned long ptr;
> +	unsigned long item_start;
> +	u32 item_size;
> +
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->leave_spinning = 1;
> +
> +	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +	if (ret > 0)
> +		ret = -ENOENT;
> +	if (ret < 0)
> +		goto out;
> +
> +	/*
> +	 * Sanity check - did we find the right item for this name?
> +	 * This should always succeed so error here will make the FS
> +	 * readonly.
> +	 */
> +	if (!find_name_in_ext_backref(path, name, name_len, &extref)) {
> +		btrfs_std_error(root->fs_info, -ENOENT);
> +		ret = -EROFS;
> +		goto out;
> +	}
> +
> +	leaf = path->nodes[0];
> +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +	if (index)
> +		*index = btrfs_inode_extref_index(leaf, extref);
> +
> +	if (del_len == item_size) {
> +		/*
> +		 * Common case only one ref in the item, remove the
> +		 * whole item.
> +		 */
> +		ret = btrfs_del_item(trans, root, path);
> +		goto out;
> +	}
> +
> +	ptr = (unsigned long)extref;
> +	item_start = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +	memmove_extent_buffer(leaf, ptr, ptr + del_len,
> +			      item_size - (ptr + del_len - item_start));
> +
> +	ret = btrfs_truncate_item(trans, root, path,
> +				  item_size - del_len, 1);
> +
> +out:
> +	btrfs_free_path(path);
> +
> +	return ret;
> +}
> +
> +int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
> +			struct btrfs_root *root,
> +			const char *name, int name_len,
> +			u64 inode_objectid, u64 ref_objectid, u64 *index)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
>  	struct btrfs_inode_ref *ref;
>  	struct extent_buffer *leaf;
>  	unsigned long ptr;
>  	unsigned long item_start;
>  	u32 item_size;
>  	u32 sub_item_len;
> -	int ret;
> +	int ret, search_ext_refs = 0;

No comma declarations according to style guide (as mentioned in May ;-) )

>  	int del_len = name_len + sizeof(*ref);
>  
>  	key.objectid = inode_objectid;
> @@ -105,12 +280,14 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>  	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>  	if (ret > 0) {
>  		ret = -ENOENT;
> +		search_ext_refs = 1;
>  		goto out;
>  	} else if (ret < 0) {
>  		goto out;
>  	}
>  	if (!find_name_in_backref(path, name, name_len, &ref)) {
>  		ret = -ENOENT;
> +		search_ext_refs = 1;
>  		goto out;
>  	}
>  	leaf = path->nodes[0];
> @@ -132,6 +309,75 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle *trans,
>  				  item_size - sub_item_len, 1);
>  out:
>  	btrfs_free_path(path);
> +
> +	if (search_ext_refs) {
> +		/*
> +		 * No refs were found, or we could not find the
> +		 * name in our ref array. Find and remove the extended
> +		 * inode ref then.
> +		 */
> +		return btrfs_del_inode_extref(trans, root, name, name_len,
> +					      inode_objectid, ref_objectid, index);
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * btrfs_insert_inode_extref() - Inserts an extended inode ref into a tree.
> + *
> + * The caller must have checked against BTRFS_LINK_MAX already.
> + */
> +static int btrfs_insert_inode_extref(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root,
> +				     const char *name, int name_len,
> +				     u64 inode_objectid, u64 ref_objectid, u64 index)
> +{
> +	struct btrfs_inode_extref *extref;
> +	int ret;
> +	int ins_len = name_len + sizeof(*extref);
> +	unsigned long ptr;
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +
> +	key.objectid = inode_objectid;
> +	key.type = BTRFS_INODE_EXTREF_KEY;
> +	key.offset = btrfs_extref_hash(ref_objectid, name, name_len);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->leave_spinning = 1;
> +	ret = btrfs_insert_empty_item(trans, root, path, &key,
> +				      ins_len);
> +	if (ret == -EEXIST) {
> +		if (find_name_in_ext_backref(path, name, name_len, NULL))
> +			goto out;
> +
> +		ret = btrfs_extend_item(trans, root, path, ins_len);
> +	}
> +	if (ret < 0)
> +		goto out;
> +
> +	leaf = path->nodes[0];
> +	item = btrfs_item_nr(leaf, path->slots[0]);
> +	ptr = (unsigned long)btrfs_item_ptr(leaf, path->slots[0], char);
> +	ptr += btrfs_item_size(leaf, item) - ins_len;
> +	extref = (struct btrfs_inode_extref *)ptr;
> +
> +	btrfs_set_inode_extref_name_len(path->nodes[0], extref, name_len);
> +	btrfs_set_inode_extref_index(path->nodes[0], extref, index);
> +	btrfs_set_inode_extref_parent(path->nodes[0], extref, ref_objectid);
> +
> +	ptr = (unsigned long)&extref->name;
> +	write_extent_buffer(path->nodes[0], name, ptr, name_len);
> +	btrfs_mark_buffer_dirty(path->nodes[0]);
> +
> +out:
> +	btrfs_free_path(path);
>  	return ret;
>  }
>  
> @@ -189,6 +435,19 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
>  
>  out:
>  	btrfs_free_path(path);
> +
> +	if (ret == -EMLINK) {
> +		struct btrfs_super_block *disk_super = root->fs_info->super_copy;
> +		/* We ran out of space in the ref array. Need to
> +		 * add an extended ref. */
> +		if (btrfs_super_incompat_flags(disk_super)
> +		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)

Uhm. Good place to check here. But please help me to find the place where this
is added to the super block's flags in the first place.

> +			ret = btrfs_insert_inode_extref(trans, root, name,
> +							name_len,
> +							inode_objectid,
> +							ref_objectid, index);
> +	}
> +
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 892b347..5ce89e4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2689,7 +2689,6 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_root *root = BTRFS_I(dir)->root;
>  	struct btrfs_path *path;
> -	struct btrfs_inode_ref *ref;
>  	struct btrfs_dir_item *di;
>  	struct inode *inode = dentry->d_inode;
>  	u64 index;
> @@ -2803,17 +2802,17 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
>  	}
>  	btrfs_release_path(path);
>  
> -	ref = btrfs_lookup_inode_ref(trans, root, path,
> -				dentry->d_name.name, dentry->d_name.len,
> -				ino, dir_ino, 0);
> -	if (IS_ERR(ref)) {
> -		err = PTR_ERR(ref);
> +	ret = btrfs_get_inode_ref_index(trans, root, path, dentry->d_name.name,
> +					dentry->d_name.len, ino, dir_ino, 0,
> +					&index);
> +	if (ret) {
> +		err = ret;
>  		goto out;
>  	}
> -	BUG_ON(!ref);
> +
>  	if (check_path_shared(root, path))
>  		goto out;
> -	index = btrfs_inode_ref_index(path->nodes[0], ref);
> +
>  	btrfs_release_path(path);
>  
>  	/*
> @@ -4484,6 +4483,12 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
>  	btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
>  	key[0].offset = 0;
>  
> +	/*
> +	 * Start new inodes with an inode_ref. This is slightly more
> +	 * efficient for small numbers of hard links since they will
> +	 * be packed into one item. Extended refs will kick in if we
> +	 * add more hard links than can fit in the ref item.
> +	 */
>  	key[1].objectid = objectid;
>  	btrfs_set_key_type(&key[1], BTRFS_INODE_REF_KEY);
>  	key[1].offset = ref_objectid;
> @@ -4777,7 +4782,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
>  	if (root->objectid != BTRFS_I(inode)->root->objectid)
>  		return -EXDEV;
>  
> -	if (inode->i_nlink == ~0U)
> +	if (inode->i_nlink >= BTRFS_LINK_MAX)
>  		return -EMLINK;
>  
>  	err = btrfs_set_inode_index(dir, &index);

That's all for this one,
-Jan

* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-05-21 21:46 ` [PATCH 2/3] " Mark Fasheh
@ 2012-07-06 14:57   ` Jan Schmidt
  2012-08-06 23:31     ` Mark Fasheh
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Schmidt @ 2012-07-06 14:57 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, Mark Fasheh

On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:
> From: Mark Fasheh <mfasheh@suse.com>
> 
> Teach tree-log.c about extended inode refs. In particular, we have to adjust
> the behavior of inode ref replay as well as log tree recovery to account for
> the existence of extended refs.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/backref.c  |   68 +++++++++++
>  fs/btrfs/backref.h  |    5 +
>  fs/btrfs/tree-log.c |  308 +++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/tree-log.h |    1 +
>  4 files changed, 326 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 0436c12..c97240a 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -857,6 +857,74 @@ static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
>  				found_key);
>  }
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_extref,
> +			  u64 *found_off)
> +{
> +	int ret, slot;
> +	struct btrfs_key key;
> +	struct btrfs_key found_key;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
> +	unsigned long ptr;
> +
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		return ret;
> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we're pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}

Okay, there's still no btrfs_search_slot_for_read in mainline :-/ We should
remember to replace this later on.

> +			continue;
> +		}
> +
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we're still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		extref = (struct btrfs_inode_extref *)ptr;
> +		*ret_extref = extref;
> +		if (found_off)
> +			*found_off = found_key.offset;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  /*
>   * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
>   * of the path are separated by '/' and the path is guaranteed to be
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index d00dfa9..8586d1b 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -64,4 +64,9 @@ struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
>  					struct btrfs_path *path);
>  void free_ipath(struct inode_fs_paths *ipath);
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_extref,
> +			  u64 *found_off);
> +
>  #endif
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 966cc74..1e812dd 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -23,8 +23,10 @@
>  #include "disk-io.h"
>  #include "locking.h"
>  #include "print-tree.h"
> +#include "backref.h"
>  #include "compat.h"
>  #include "tree-log.h"
> +#include "hash.h"
>  
>  /* magic values for the inode_only field in btrfs_log_inode:
>   *
> @@ -751,10 +753,10 @@ static noinline int backref_in_log(struct btrfs_root *log,
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	unsigned long name_ptr;
> -	int found_name_len;
>  	int item_size;
>  	int ret;
>  	int match = 0;
> +	int found_name_len;

Avoid such shifts, please.

>  	path = btrfs_alloc_path();
>  	if (!path)
> @@ -764,8 +766,16 @@ static noinline int backref_in_log(struct btrfs_root *log,
>  	if (ret != 0)
>  		goto out;
>  
> -	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		if (find_name_in_ext_backref(path, name, namelen, NULL))
> +			match = 1;
> +
> +		goto out;
> +	}
> +
> +	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);

Diff makes us remove and add the unmodified item_size line, here. Can be avoided
by adding the if-block two lines above. Should be functionally the same, still.

>  	ptr_end = ptr + item_size;
>  	while (ptr < ptr_end) {
>  		ref = (struct btrfs_inode_ref *)ptr;
> @@ -786,6 +796,50 @@ out:
>  	return match;
>  }
>  
> +static int extref_get_fields(struct extent_buffer *eb, int slot,
> +			     u32 *namelen, char **name, u64 *index,
> +			     u64 *parent_objectid)
> +{
> +	struct btrfs_inode_extref *extref;
> +
> +	extref = (struct btrfs_inode_extref *)btrfs_item_ptr_offset(eb, slot);
> +
> +	*namelen = btrfs_inode_extref_name_len(eb, extref);
> +	*name = kmalloc(*namelen, GFP_NOFS);
> +	if (*name == NULL)
> +		return -ENOMEM;
> +
> +	read_extent_buffer(eb, *name, (unsigned long)&extref->name,
> +			   *namelen);
> +
> +	*index = btrfs_inode_extref_index(eb, extref);
> +	if (parent_objectid)
> +		*parent_objectid = btrfs_inode_extref_parent(eb, extref);
> +
> +	return 0;
> +}
> +
> +static int ref_get_fields(struct btrfs_key *key, struct extent_buffer *eb,
> +			  int slot, u32 *namelen, char **name, u64 *index,
> +			  u64 *parent_objectid)
> +{
> +	struct btrfs_inode_ref *ref;
> +
> +	ref = (struct btrfs_inode_ref *)btrfs_item_ptr_offset(eb, slot);
> +
> +	*namelen = btrfs_inode_ref_name_len(eb, ref);
> +	*name = kmalloc(*namelen, GFP_NOFS);	
> +	if (*name == NULL)
> +		return -ENOMEM;
> +
> +	read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
> +
> +	*index = btrfs_inode_ref_index(eb, ref);
> +	if (parent_objectid)
> +		*parent_objectid = key->offset;

This doesn't make much sense here. There are only 2 callers of this helper and
only one of them is interested in that offset value. So, just drop the key
parameter and grab the offset from the key yourself in case you really need it.

> +	return 0;
> +}
>  
>  /*
>   * replay one inode back reference item found in the log tree.
> @@ -801,15 +855,25 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  				  struct btrfs_key *key)
>  {
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;
>  	struct btrfs_dir_item *di;
> +	struct btrfs_key search_key;
>  	struct inode *dir;
>  	struct inode *inode;
>  	unsigned long ref_ptr;
>  	unsigned long ref_end;
>  	char *name;
> +	char *victim_name;
>  	int namelen;
> +	int victim_name_len;
>  	int ret;
>  	int search_done = 0;
> +	int log_ref_ver = 0;
> +	u64 parent_objectid;
> +	u64 inode_objectid;
> +	u64 ref_index;
> +	struct extent_buffer *leaf;
> +	int ref_struct_size;
>  
>  	/*
>  	 * it is possible that we didn't log all the parent directories
> @@ -817,32 +881,44 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  	 * copy the back ref in.  The link count fixup code will take
>  	 * care of the rest
>  	 */
> -	dir = read_one_inode(root, key->offset);
> +
> +
> +	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {	
> +		ref_struct_size = sizeof(*extref);
> +		log_ref_ver = 1;
> +
> +		ret = extref_get_fields(eb, slot, &namelen, &name, &ref_index,
> +					&parent_objectid);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ref_struct_size = sizeof(*ref);
> +
> +		ret = ref_get_fields(key, eb, slot, &namelen, &name,
> +				     &ref_index, &parent_objectid);
> +		if (ret)
> +			return -ENOMEM;

return ret here, too?

> +	}
> +
> +	inode_objectid = key->objectid;
> +
> +	dir = read_one_inode(root, parent_objectid);
>  	if (!dir)
>  		return -ENOENT;
>  
> -	inode = read_one_inode(root, key->objectid);
> +	inode = read_one_inode(root, inode_objectid);
>  	if (!inode) {
>  		iput(dir);
>  		return -EIO;
>  	}
>  
> -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> -
>  again:
> -	ref = (struct btrfs_inode_ref *)ref_ptr;
> -
> -	namelen = btrfs_inode_ref_name_len(eb, ref);
> -	name = kmalloc(namelen, GFP_NOFS);
> -	BUG_ON(!name);
> -
> -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> -
>  	/* if we already have a perfect match, we're done */
>  	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> -			 btrfs_inode_ref_index(eb, ref),
> -			 name, namelen)) {
> +			 ref_index, name, namelen)) {
>  		goto out;
>  	}
>  
> @@ -857,19 +933,23 @@ again:
>  	if (search_done)
>  		goto insert;
>  
> -	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> +	/* Search old style refs */
> +	search_key.objectid = inode_objectid;
> +	search_key.type = BTRFS_INODE_REF_KEY;
> +	search_key.offset = parent_objectid;
> +
> +	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
>  	if (ret == 0) {
> -		char *victim_name;
> -		int victim_name_len;
>  		struct btrfs_inode_ref *victim_ref;
>  		unsigned long ptr;
>  		unsigned long ptr_end;
> -		struct extent_buffer *leaf = path->nodes[0];
> +
> +		leaf = path->nodes[0];
>  
>  		/* are we trying to overwrite a back ref for the root directory
>  		 * if so, just jump out, we're done
>  		 */
> -		if (key->objectid == key->offset)
> +		if (search_key.objectid == search_key.offset)
>  			goto out_nowrite;
>  
>  		/* check all the names in this back reference to see
> @@ -889,7 +969,7 @@ again:
>  					   (unsigned long)(victim_ref + 1),
>  					   victim_name_len);
>  
> -			if (!backref_in_log(log, key, victim_name,
> +			if (!backref_in_log(log, &search_key, victim_name,
>  					    victim_name_len)) {
>  				btrfs_inc_nlink(inode);
>  				btrfs_release_path(path);
> @@ -902,19 +982,60 @@ again:
>  			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
>  		}
>  		BUG_ON(ret);
> -
> -		/*
> -		 * NOTE: we have searched root tree and checked the
> -		 * coresponding ref, it does not need to check again.
> -		 */
> -		search_done = 1;
>  	}
>  	btrfs_release_path(path);
>  
> +	/* Same search but for extended refs */
> +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> +					   inode_objectid, parent_objectid, 0,
> +					   0);
> +	if (!IS_ERR_OR_NULL(extref)) {
> +		u32 item_size;
> +		u32 cur_offset = 0;
> +		unsigned long base;
> +
> +		leaf = path->nodes[0];
> +
> +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +		base = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +		while (cur_offset < item_size) {
> +			extref = (struct btrfs_inode_extref *)base + cur_offset;
> +
> +			victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> +			victim_name = kmalloc(namelen, GFP_NOFS);
> +			leaf = path->nodes[0];
> +			read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
> +
> +			search_key.objectid = inode_objectid;
> +			search_key.type = BTRFS_INODE_EXTREF_KEY;
> +			search_key.offset = btrfs_extref_hash(parent_objectid,
> +							      name, namelen);
> +			if (!backref_in_log(log, &search_key, victim_name,
> +					    victim_name_len)) {
> +				btrfs_inc_nlink(inode);
> +				btrfs_release_path(path);
> +
> +				ret = btrfs_unlink_inode(trans, root, dir,
> +							 inode, victim_name,
> +							 victim_name_len);
> +			}
> +			kfree(victim_name);
> +			BUG_ON(ret);
> +
> +			cur_offset += victim_name_len + sizeof(*extref);
> +		}
> +	}
> +
> +	/*
> +	 * NOTE: we have searched root tree and checked the
> +	 * coresponding refs, it does not need to be checked again.
> +	 */
> +	search_done = 1;
> +

I thought about this search_done once again, and I'd like to repeat our
conversation from May:

On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > You moved this comment and assignment out of the "if (ret == 0)" case.
> > I'm not sure if this is still doing exactly the same now (?).
> > Previously, we were executing another btrfs_search_slot,
> > btrfs_lookup_dir_index_item, ... after the "goto again" case, which
> > would be skipped with this patch.
> Hmm, ok you're definitely right that the search_done line there is broken.
> Come to think of it, I'm not quite sure what the meaning of that tiny bit of
> code was. I'll come back to this one once I've looked closer.

What's the result of looking closer?

>  	/* look for a conflicting sequence number */
>  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> -					 btrfs_inode_ref_index(eb, ref),
> -					 name, namelen, 0);
> +					 ref_index, name, namelen, 0);
>  	if (di && !IS_ERR(di)) {
>  		ret = drop_one_dir_item(trans, root, path, dir, di);
>  		BUG_ON(ret);
> @@ -932,17 +1053,25 @@ again:
>  
>  insert:
>  	/* insert our name */
> -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> -			     btrfs_inode_ref_index(eb, ref));
> +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
>  	BUG_ON(ret);
>  
>  	btrfs_update_inode(trans, root, inode);
>  
>  out:
> -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> +	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
>  	kfree(name);
> -	if (ref_ptr < ref_end)
> +	if (ref_ptr < ref_end) {
> +		if (log_ref_ver) {
> +			ret = extref_get_fields(eb, slot, &namelen, &name,
> +						&ref_index, NULL);
> +		} else {
> +			ret = ref_get_fields(key, eb, slot, &namelen, &name,
> +					     &ref_index, NULL);
> +		}
> +		BUG_ON(ret);

We return ret above and BUG_ON ret, here. Is that on purpose? May make sense, I
just don't see the difference immediately.

>  		goto again;
> +	}
>  
>  	/* finally write the back reference in the inode */
>  	ret = overwrite_item(trans, root, path, eb, slot, key);
> @@ -965,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret;
> +	int name_len;
> +	unsigned int nlink = 0;
> +	u32 item_size;
> +	u32 cur_offset = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	unsigned long ptr;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
>  
> -/*
> - * There are a few corners where the link count of the file can't
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &extref, &offset);
> +		if (ret)
> +			break;

This still looks strange. We should ask harder for an answer here.

On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > Assume the first call to btrfs_find_one_extref returns -EIO. Do we
> > really want count_inode_extrefs return 0 here? I agree that the previous
> > code suffers from the same problem, but still: it's a problem.
> Yeah as you note, I'm just keeping the same behavior as before. This I think
> is probably a question for Chris...

To me it seems the best choice would be to return a negative value on error and
check for that in the caller.
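
As a rough illustration of what I mean, here is a minimal userspace sketch.
The demo_ names are hypothetical; demo_find_one() merely stands in for
btrfs_find_one_extref() and is assumed to return -ENOENT once no items remain.
The idea is that only -ENOENT ends the loop normally, while any other negative
value is handed back to the caller instead of being folded into the count:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical iterator: items[] holds the number of refs per item, and a
 * negative entry simulates a failing lookup.
 */
struct demo_iter {
	const int *items;
	size_t nr;
	size_t pos;
};

/*
 * Returns the number of refs in the next item, -ENOENT when no items
 * remain, or another negative errno on failure.
 */
static int demo_find_one(struct demo_iter *it)
{
	assert(it != NULL);
	if (it->pos >= it->nr)
		return -ENOENT;
	return it->items[it->pos++];
}

/* Returns the total number of refs, or a negative errno on failure. */
static int demo_count_refs(struct demo_iter *it)
{
	int nlink = 0;
	int ret;

	for (;;) {
		ret = demo_find_one(it);
		if (ret == -ENOENT)
			break;		/* ran out of items: normal termination */
		if (ret < 0)
			return ret;	/* real error: propagate to the caller */
		nlink += ret;
	}
	return nlink;
}
```

With that shape, a "ret < 0" check in the caller means something, because a
failed lookup can no longer be mistaken for "no more refs".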

> +		leaf = path->nodes[0];
> +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +		while (cur_offset < item_size) {
> +			extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
> +			name_len = btrfs_inode_extref_name_len(leaf, extref);
> +
> +			nlink++;
> +
> +			cur_offset += name_len + sizeof(*extref);
> +		}
> +
> +		offset++;
> +	}
> +	btrfs_release_path(path);
> +
> +	return nlink;
> +}
> +
> +static int count_inode_refs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
>  {
> -	struct btrfs_path *path;
>  	int ret;
>  	struct btrfs_key key;
> -	u64 nlink = 0;
> +	unsigned int nlink = 0;
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	int name_len;
> @@ -993,10 +1149,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  	key.type = BTRFS_INODE_REF_KEY;
>  	key.offset = (u64)-1;
>  
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
>  	while (1) {
>  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>  		if (ret < 0)
> @@ -1030,6 +1182,45 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		btrfs_release_path(path);
>  	}
>  	btrfs_release_path(path);
> +
> +	return nlink;
> +}
> +
> +/*
> + * There are a few corners where the link count of the file can't
> + * be properly maintained during replay.  So, instead of adding
> + * lots of complexity to the log code, we just scan the backrefs
> + * for any file that has been through replay.
> + *
> + * The scan will update the link count on the inode to reflect the
> + * number of back refs found.  If it goes down to zero, the iput
> + * will free the inode.
> + */
> +static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> +					   struct btrfs_root *root,
> +					   struct inode *inode)
> +{
> +	struct btrfs_path *path;
> +	int ret;
> +	u64 nlink = 0;
> +	u64 ino = btrfs_ino(inode);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = count_inode_refs(root, inode, path);
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink = ret;
> +
> +	ret = count_inode_extrefs(root, inode, path);
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink += ret;
> +
>  	if (nlink != inode->i_nlink) {
>  		set_nlink(inode, nlink);
>  		btrfs_update_inode(trans, root, inode);
> @@ -1045,9 +1236,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		ret = insert_orphan_item(trans, root, ino);
>  		BUG_ON(ret);
>  	}
> -	btrfs_free_path(path);
>  
> -	return 0;
> +out:
> +	btrfs_free_path(path);
> +	return ret;
>  }
>  
>  static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
> @@ -1689,6 +1881,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
>  			ret = add_inode_ref(wc->trans, root, log, path,
>  					    eb, i, &key);
>  			BUG_ON(ret && ret != -ENOENT);
> +		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
> +			ret = add_inode_ref(wc->trans, root, log, path,
> +					    eb, i, &key);
> +			BUG_ON(ret && ret != -ENOENT);
>  		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
>  			ret = replay_one_extent(wc->trans, root, path,
>  						eb, i, &key);
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index 2270ac5..a935a8c 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -49,4 +49,5 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
>  int btrfs_log_new_name(struct btrfs_trans_handle *trans,
>  			struct inode *inode, struct inode *old_dir,
>  			struct dentry *parent);
> +

Doesn't add much, so just drop it.

>  #endif

-Jan

* Re: [PATCH 3/3] btrfs: extended inode refs
  2012-05-21 21:46 ` [PATCH 3/3] " Mark Fasheh
@ 2012-07-06 14:57   ` Jan Schmidt
  2012-07-09 20:24     ` Mark Fasheh
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Schmidt @ 2012-07-06 14:57 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, Mark Fasheh

On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:
> From: Mark Fasheh <mfasheh@suse.com>
> 
> The iterate_irefs in backref.c is used to build path components from inode
> refs. This patch adds code to iterate extended refs as well.
> 
> I had to modify the callback function signature to abstract out some of the
> differences between ref structures. iref_to_path() also needed similar
> changes.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/backref.c |  144 +++++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/backref.h |    2 -
>  2 files changed, 119 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index c97240a..d88fa49 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -22,6 +22,7 @@
>  #include "ulist.h"
>  #include "transaction.h"
>  #include "delayed-ref.h"
> +#include "locking.h"

This + line tells me it's not based on top of linux-3.4 or newer. I see that the
changes made in between are now included in your patch set. It might have been
better to rebase the set before sending it. Anyway, that only makes review a bit
harder; it shouldn't affect applying the patches.

>  
>  /*
>   * this structure records all encountered refs on the way up to the root
> @@ -940,34 +941,35 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
>   * value will be smaller than dest. callers must check this!
>   */
>  static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
> -				struct btrfs_inode_ref *iref,
> -				struct extent_buffer *eb_in, u64 parent,
> -				char *dest, u32 size)
> +			  u32 name_len, unsigned long name_off,
> +			  struct extent_buffer *eb_in, u64 parent,
> +			  char *dest, u32 size)
>  {
> -	u32 len;
>  	int slot;
>  	u64 next_inum;
>  	int ret;
>  	s64 bytes_left = size - 1;
>  	struct extent_buffer *eb = eb_in;
>  	struct btrfs_key found_key;
> +	struct btrfs_inode_ref *iref;
>  
>  	if (bytes_left >= 0)
>  		dest[bytes_left] = '\0';
>  
>  	while (1) {
> -		len = btrfs_inode_ref_name_len(eb, iref);
> -		bytes_left -= len;
> +		bytes_left -= name_len;
>  		if (bytes_left >= 0)
>  			read_extent_buffer(eb, dest + bytes_left,
> -						(unsigned long)(iref + 1), len);
> +					   name_off, name_len);
>  		if (eb != eb_in)
>  			free_extent_buffer(eb);
> +
>  		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
>  		if (ret > 0)
>  			ret = -ENOENT;
>  		if (ret)
>  			break;
> +
>  		next_inum = found_key.offset;
>  
>  		/* regular exit ahead */
> @@ -980,8 +982,11 @@ static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
>  		if (eb != eb_in)
>  			atomic_inc(&eb->refs);
>  		btrfs_release_path(path);
> -
>  		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> +
> +		name_len = btrfs_inode_ref_name_len(eb, iref);
> +		name_off = (unsigned long)(iref + 1);
> +
>  		parent = next_inum;
>  		--bytes_left;
>  		if (bytes_left >= 0)
> @@ -1294,9 +1299,12 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
>  	return ret;
>  }
>  
> -static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> -				struct btrfs_path *path,
> -				iterate_irefs_t *iterate, void *ctx)
> +typedef int (iterate_irefs_t)(u64 parent, u32 name_len, unsigned long name_off,
> +			      struct extent_buffer *eb, void *ctx);
> +
> +static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
> +			      struct btrfs_path *path,
> +			      iterate_irefs_t *iterate, void *ctx)
>  {
>  	int ret;
>  	int slot;
> @@ -1312,7 +1320,7 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
>  
>  	while (1) {
>  		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
> -					&found_key);
> +				     &found_key);
>  		if (ret < 0)
>  			break;
>  		if (ret) {
> @@ -1326,8 +1334,11 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
>  		eb = path->nodes[0];
>  		/* make sure we can use eb after releasing the path */
>  		atomic_inc(&eb->refs);
> +		btrfs_tree_read_lock(eb);
> +		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
>  		btrfs_release_path(path);
>  
> +

I realized you like adding new lines, but we really don't need two of them here.

>  		item = btrfs_item_nr(eb, slot);
>  		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
>  
> @@ -1338,15 +1349,81 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
>  				 "tree %llu\n", cur,
>  				 (unsigned long long)found_key.objectid,
>  				 (unsigned long long)fs_root->objectid);
> -			ret = iterate(parent, iref, eb, ctx);
> -			if (ret) {
> -				free_extent_buffer(eb);
> +			ret = iterate(parent, name_len,
> +				      (unsigned long)(iref + 1),eb, ctx);

There's a space missing before "eb".

> +			if (ret)
>  				break;
> -			}
>  			len = sizeof(*iref) + name_len;
>  			iref = (struct btrfs_inode_ref *)((char *)iref + len);
>  		}
> +		btrfs_tree_read_unlock_blocking(eb);
> +		free_extent_buffer(eb);
> +	}
> +
> +	btrfs_release_path(path);
> +
> +	return ret;
> +}
> +
> +static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
> +				 struct btrfs_path *path,
> +				 iterate_irefs_t *iterate, void *ctx)
> +{
> +	int ret;
> +	int slot;
> +	u64 offset = 0;
> +	u64 parent;
> +	int found = 0;
> +	struct extent_buffer *eb;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
> +	u32 item_size;
> +	u32 cur_offset;
> +	unsigned long ptr;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(fs_root, inum, offset, path, &extref,
> +					    &offset);
> +		if (ret < 0)
> +			break;
> +		if (ret) {
> +			ret = found ? 0 : -ENOENT;
> +			break;
> +		}
> +		++found;
> +
> +		slot = path->slots[0];
> +		eb = path->nodes[0];
> +		/* make sure we can use eb after releasing the path */
> +		atomic_inc(&eb->refs);
> +
> +		btrfs_tree_read_lock(eb);
> +		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> +		btrfs_release_path(path);
> +
> +		leaf = path->nodes[0];
> +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		cur_offset = 0;
> +
> +		while (cur_offset < item_size) {
> +			u32 name_len;
> +
> +			extref = (struct btrfs_inode_extref *)(ptr + cur_offset);
> +			parent = btrfs_inode_extref_parent(eb, extref);
> +			name_len = btrfs_inode_extref_name_len(eb, extref);
> +			ret = iterate(parent, name_len,
> +				      (unsigned long)&extref->name, eb, ctx);
> +			if (ret)
> +				break;
> +
> +			cur_offset += btrfs_inode_extref_name_len(leaf, extref);
> +			cur_offset += sizeof(*extref);
> +		}
> +		btrfs_tree_read_unlock_blocking(eb);
>  		free_extent_buffer(eb);
> +
> +		offset++;
>  	}
>  
>  	btrfs_release_path(path);
> @@ -1354,12 +1431,32 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
>  	return ret;
>  }
>  
> +static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> +			 struct btrfs_path *path, iterate_irefs_t *iterate,
> +			 void *ctx)
> +{
> +	int ret;
> +	int found_refs = 0;
> +
> +	ret = iterate_inode_refs(inum, fs_root, path, iterate, ctx);
> +	if (!ret)
> +		++found_refs;
> +	else if (ret != -ENOENT)
> +		return ret;
> +
> +	ret = iterate_inode_extrefs(inum, fs_root, path, iterate, ctx);
> +	if (ret == -ENOENT && found_refs)
> +		return 0;
> +
> +	return ret;
> +}
> +
>  /*
>   * returns 0 if the path could be dumped (probably truncated)
>   * returns <0 in case of an error
>   */
> -static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
> -				struct extent_buffer *eb, void *ctx)
> +static int inode_to_path(u64 inum, u32 name_len, unsigned long name_off,
> +			 struct extent_buffer *eb, void *ctx)
>  {
>  	struct inode_fs_paths *ipath = ctx;
>  	char *fspath;
> @@ -1372,20 +1469,17 @@ static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
>  					ipath->fspath->bytes_left - s_ptr : 0;
>  
>  	fspath_min = (char *)ipath->fspath->val + (i + 1) * s_ptr;
> -	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, iref, eb,
> -				inum, fspath_min, bytes_left);
> +	fspath = iref_to_path(ipath->fs_root, ipath->btrfs_path, name_len,
> +			      name_off, eb, inum, fspath_min,
> +			      bytes_left);
>  	if (IS_ERR(fspath))
>  		return PTR_ERR(fspath);
>  
>  	if (fspath > fspath_min) {
> -		pr_debug("path resolved: %s\n", fspath);
>  		ipath->fspath->val[i] = (u64)(unsigned long)fspath;
>  		++ipath->fspath->elem_cnt;
>  		ipath->fspath->bytes_left = fspath - fspath_min;
>  	} else {
> -		pr_debug("missed path, not enough space. missing bytes: %lu, "
> -			 "constructed so far: %s\n",
> -			 (unsigned long)(fspath_min - fspath), fspath_min);
>  		++ipath->fspath->elem_missed;
>  		ipath->fspath->bytes_missing += fspath_min - fspath;
>  		ipath->fspath->bytes_left = 0;
> @@ -1407,7 +1501,7 @@ static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
>  int paths_from_inode(u64 inum, struct inode_fs_paths *ipath)
>  {
>  	return iterate_irefs(inum, ipath->fs_root, ipath->btrfs_path,
> -				inode_to_path, ipath);
> +			     inode_to_path, ipath);
>  }
>  
>  /*
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index 8586d1b..649a220 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -30,8 +30,6 @@ struct inode_fs_paths {
>  
>  typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
>  		void *ctx);
> -typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
> -				struct extent_buffer *eb, void *ctx);
>  
>  int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
>  			struct btrfs_path *path);

Almost ready for a reviewed-by tag :-)
-Jan


* Re: [PATCH 1/3] btrfs: extended inode refs
  2012-07-06 14:56   ` Jan Schmidt
@ 2012-07-06 15:14     ` Stefan Behrens
  2012-07-09 19:05     ` Mark Fasheh
  2012-07-09 20:33     ` Mark Fasheh
  2 siblings, 0 replies; 21+ messages in thread
From: Stefan Behrens @ 2012-07-06 15:14 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: Mark Fasheh, linux-btrfs, Chris Mason, Mark Fasheh

On Fri, 06 Jul 2012 16:56:40 +0200, Jan Schmidt wrote:
> On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:
>> +int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
>> +			     int name_len,
>> +			     struct btrfs_inode_extref **extref_ret)
> 
> Exported functions should be prefixed "btrfs_". What about btrfs_find_extref_name?

>> +
>> +static struct btrfs_inode_ref *
>>  btrfs_lookup_inode_ref(struct btrfs_trans_handle *trans,
> 
> static functions should not be prefixed "btrfs_".

Prefixing _all_ functions with the module name has some advantages:
- uniqueness for ctags(1)
- unique names in stack traces, which make it possible to identify the
  module and find the source code

IMO Mark should not change it.


* Re: [PATCH 1/3] btrfs: extended inode refs
  2012-07-06 14:56   ` Jan Schmidt
  2012-07-06 15:14     ` Stefan Behrens
@ 2012-07-09 19:05     ` Mark Fasheh
  2012-07-09 20:33     ` Mark Fasheh
  2 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-07-09 19:05 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason

On Fri, Jul 06, 2012 at 04:56:40PM +0200, Jan Schmidt wrote:
> Hi Mark,
> 
> Sorry, I'm a bit late with my feedback.

No problem, thank you for taking the time to read these patches! Reply to
your most substantial comment is below. Everything else you pointed out I'll
have fixed up for an updated patch (don't think I need to reply individually
to those).


> On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:

> > @@ -189,6 +435,19 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
> >  
> >  out:
> >  	btrfs_free_path(path);
> > +
> > +	if (ret == -EMLINK) {
> > +		struct btrfs_super_block *disk_super = root->fs_info->super_copy;
> > +		/* We ran out of space in the ref array. Need to
> > +		 * add an extended ref. */
> > +		if (btrfs_super_incompat_flags(disk_super)
> > +		    & BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
> 
> Uhm. Good place to check here. But please help me to find the place where this
> is added to the super block's flags in the first place.

mkfs.btrfs  :)

The kernel should never be adding the flag as things are currently designed.
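
A rough userspace sketch of that division of labour, not taken from the
patch: mkfs sets a feature bit in the superblock once, and the kernel side
only ever tests it before falling back to an extended ref. The struct, the
helper names and the bit value below are assumptions made up for
illustration; the real definitions live in the btrfs headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value, for illustration only. */
#define DEMO_INCOMPAT_EXTENDED_IREF (1ULL << 6)

/* Stand-in for the superblock; only the field the check needs. */
struct demo_super {
	uint64_t incompat_flags;
};

/* True if the filesystem was created with extended inode refs enabled. */
static bool demo_extrefs_enabled(const struct demo_super *sb)
{
	return (sb->incompat_flags & DEMO_INCOMPAT_EXTENDED_IREF) != 0;
}

/*
 * Model of the fallback decision: when the ref array is full (-EMLINK),
 * an extended ref may only be used if the feature bit is already set.
 * Returns 0 if the extended ref path may be taken, otherwise the
 * original error is passed through unchanged.
 */
static int demo_handle_ref_overflow(const struct demo_super *sb, int ret)
{
	if (ret == -EMLINK && demo_extrefs_enabled(sb))
		return 0;
	return ret;
}
```

Note that nothing in this model ever writes incompat_flags, which mirrors
the point above: the reader of the flag never turns the feature on itself.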

Thanks again!
	--Mark

--
Mark Fasheh


* Re: [PATCH 3/3] btrfs: extended inode refs
  2012-07-06 14:57   ` Jan Schmidt
@ 2012-07-09 20:24     ` Mark Fasheh
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-07-09 20:24 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason

On Fri, Jul 06, 2012 at 04:57:29PM +0200, Jan Schmidt wrote:
> On Mon, May 21, 2012 at 23:46 (+0200), Mark Fasheh wrote:
> > From: Mark Fasheh <mfasheh@suse.com>
> > 
> > The iterate_irefs in backref.c is used to build path components from inode
> > refs. This patch adds code to iterate extended refs as well.
> > 
> > I had to modify the callback function signature to abstract out some of the
> > differences between ref structures. iref_to_path() also needed similar
> > changes.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/btrfs/backref.c |  144 +++++++++++++++++++++++++++++++++++++++++++---------
> >  fs/btrfs/backref.h |    2 -
> >  2 files changed, 119 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> > index c97240a..d88fa49 100644
> > --- a/fs/btrfs/backref.c
> > +++ b/fs/btrfs/backref.c
> > @@ -22,6 +22,7 @@
> >  #include "ulist.h"
> >  #include "transaction.h"
> >  #include "delayed-ref.h"
> > +#include "locking.h"
> 
> This + line tells me it's not based on top of linux-3.4 or newer. I see that the
> changes made in between are now included in your patch set. It might have been
> better to rebase it before sending them. Anyway, that only makes review a bit
> harder; it shouldn't affect applying the patches.

Yes, it's all based on Linux 3.3 (when I started). I can rebase of course
but have avoided it so far in order to keep a stable base upon which to test
/ fix. I can rebase however, especially if that makes life easier for Chris.


> >  /*
> >   * this structure records all encountered refs on the way up to the root
> > @@ -940,34 +941,35 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> >   * value will be smaller than dest. callers must check this!
> >   */
> >  static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
> > -				struct btrfs_inode_ref *iref,
> > -				struct extent_buffer *eb_in, u64 parent,
> > -				char *dest, u32 size)
> > +			  u32 name_len, unsigned long name_off,
> > +			  struct extent_buffer *eb_in, u64 parent,
> > +			  char *dest, u32 size)
> >  {
> > -	u32 len;
> >  	int slot;
> >  	u64 next_inum;
> >  	int ret;
> >  	s64 bytes_left = size - 1;
> >  	struct extent_buffer *eb = eb_in;
> >  	struct btrfs_key found_key;
> > +	struct btrfs_inode_ref *iref;
> >  
> >  	if (bytes_left >= 0)
> >  		dest[bytes_left] = '\0';
> >  
> >  	while (1) {
> > -		len = btrfs_inode_ref_name_len(eb, iref);
> > -		bytes_left -= len;
> > +		bytes_left -= name_len;
> >  		if (bytes_left >= 0)
> >  			read_extent_buffer(eb, dest + bytes_left,
> > -						(unsigned long)(iref + 1), len);
> > +					   name_off, name_len);
> >  		if (eb != eb_in)
> >  			free_extent_buffer(eb);
> > +
> >  		ret = inode_ref_info(parent, 0, fs_root, path, &found_key);
> >  		if (ret > 0)
> >  			ret = -ENOENT;
> >  		if (ret)
> >  			break;
> > +
> >  		next_inum = found_key.offset;
> >  
> >  		/* regular exit ahead */
> > @@ -980,8 +982,11 @@ static char *iref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
> >  		if (eb != eb_in)
> >  			atomic_inc(&eb->refs);
> >  		btrfs_release_path(path);
> > -
> >  		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> > +
> > +		name_len = btrfs_inode_ref_name_len(eb, iref);
> > +		name_off = (unsigned long)(iref + 1);
> > +
> >  		parent = next_inum;
> >  		--bytes_left;
> >  		if (bytes_left >= 0)
> > @@ -1294,9 +1299,12 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
> >  	return ret;
> >  }
> >  
> > -static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> > -				struct btrfs_path *path,
> > -				iterate_irefs_t *iterate, void *ctx)
> > +typedef int (iterate_irefs_t)(u64 parent, u32 name_len, unsigned long name_off,
> > +			      struct extent_buffer *eb, void *ctx);
> > +
> > +static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
> > +			      struct btrfs_path *path,
> > +			      iterate_irefs_t *iterate, void *ctx)
> >  {
> >  	int ret;
> >  	int slot;
> > @@ -1312,7 +1320,7 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> >  
> >  	while (1) {
> >  		ret = inode_ref_info(inum, parent ? parent+1 : 0, fs_root, path,
> > -					&found_key);
> > +				     &found_key);
> >  		if (ret < 0)
> >  			break;
> >  		if (ret) {
> > @@ -1326,8 +1334,11 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> >  		eb = path->nodes[0];
> >  		/* make sure we can use eb after releasing the path */
> >  		atomic_inc(&eb->refs);
> > +		btrfs_tree_read_lock(eb);
> > +		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> >  		btrfs_release_path(path);
> >  
> > +
> 
> I realized you like adding new lines, but we really don't need two of them here.

;)


> >  		item = btrfs_item_nr(eb, slot);
> >  		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> >  
> > @@ -1338,15 +1349,81 @@ static int iterate_irefs(u64 inum, struct btrfs_root *fs_root,
> >  				 "tree %llu\n", cur,
> >  				 (unsigned long long)found_key.objectid,
> >  				 (unsigned long long)fs_root->objectid);
> > -			ret = iterate(parent, iref, eb, ctx);
> > -			if (ret) {
> > -				free_extent_buffer(eb);
> > +			ret = iterate(parent, name_len,
> > +				      (unsigned long)(iref + 1),eb, ctx);
> 
> There's a space missing before "eb".

Ooop, will fix that.

> 
> > +			if (ret)
> >  				break;
> > -			}
> >  			len = sizeof(*iref) + name_len;
> >  			iref = (struct btrfs_inode_ref *)((char *)iref + len);
> >  		}
> > +		btrfs_tree_read_unlock_blocking(eb);
> > +		free_extent_buffer(eb);
> > +	}
> > +
> > +	btrfs_release_path(path);
> > +
> > +	return ret;
> > +}

> >  /*
> > diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> > index 8586d1b..649a220 100644
> > --- a/fs/btrfs/backref.h
> > +++ b/fs/btrfs/backref.h
> > @@ -30,8 +30,6 @@ struct inode_fs_paths {
> >  
> >  typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
> >  		void *ctx);
> > -typedef int (iterate_irefs_t)(u64 parent, struct btrfs_inode_ref *iref,
> > -				struct extent_buffer *eb, void *ctx);
> >  
> >  int inode_item_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
> >  			struct btrfs_path *path);
> 
> Almost ready for a reviewed-by tag :-)

Yes, it seems like the last issues left that aren't cosmetic / patch
guideline fixups have to do with patch 2. Let me know if you disagree :)

Thanks again Jan,
	--Mark

--
Mark Fasheh


* Re: [PATCH 1/3] btrfs: extended inode refs
  2012-07-06 14:56   ` Jan Schmidt
  2012-07-06 15:14     ` Stefan Behrens
  2012-07-09 19:05     ` Mark Fasheh
@ 2012-07-09 20:33     ` Mark Fasheh
  2 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-07-09 20:33 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason


> > diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
> > index baa74f3..496fb1c 100644
> > --- a/fs/btrfs/inode-item.c
> > +++ b/fs/btrfs/inode-item.c
> > @@ -18,6 +18,7 @@
> >  
> >  #include "ctree.h"
> >  #include "disk-io.h"
> > +#include "hash.h"
> >  #include "transaction.h"
> >  
> >  static int find_name_in_backref(struct btrfs_path *path, const char *name,
> > @@ -49,18 +50,56 @@ static int find_name_in_backref(struct btrfs_path *path, const char *name,
> >  	return 0;
> >  }
> >  
> > -struct btrfs_inode_ref *
> > +int find_name_in_ext_backref(struct btrfs_path *path, const char *name,
> > +			     int name_len,
> > +			     struct btrfs_inode_extref **extref_ret)
> 
> Exported functions should be prefixed "btrfs_". What about btrfs_find_extref_name?
> 
> > +{
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_inode_extref *extref;
> > +	unsigned long ptr;
> > +	unsigned long name_ptr;
> > +	u32 item_size;
> > +	u32 cur_offset = 0;
> > +	int ref_name_len;
> > +
> > +	leaf = path->nodes[0];
> > +	item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > +	ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +
> > +	/*
> > +	 * Search all extended backrefs in this item. We're only
> > +	 * looking through any collisions so most of the time this is
> > +	 * just going to compare against one buffer. If all is well,
> > +	 * we'll return success and the inode ref object.
> > +	 */
> > +	while (cur_offset < item_size) {
> > +		extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
> > +		name_ptr = (unsigned long)(&extref->name);
> > +		ref_name_len = btrfs_inode_extref_name_len(leaf, extref);
> > +
> > +		if (ref_name_len == name_len
> > +		    && (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0)) {
> > +			if (extref_ret)
> > +				*extref_ret = extref;
> > +			return 1;
> > +		}
> > +
> > +		cur_offset += ref_name_len + sizeof(*extref);
> > +	}
> > +	return 0;
> 
> For consistency, I'd like to switch return 0 and 1.

Ok so find_name_in_ext_backref() is designed to mirror
find_name_in_backref() for obvious reasons - it does the same thing
except for extended backrefs. So it'd actually be inconsistent if I change
this (unless I change both but I don't think we want to do that).

The name is kept without the btrfs_ prefix for the same reasons, however I
don't think prefixing it is a big deal so I'll go ahead and make _that_
change unless you feel otherwise.
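
For readers following along, the return convention under discussion
(1 = name found, 0 = not found) can be sketched in userspace as below. This
is only a model: the struct is a host-endian stand-in for
struct btrfs_inode_extref, a plain byte buffer replaces the extent buffer,
and demo_find_name() / demo_append() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, host-endian stand-in for struct btrfs_inode_extref. */
struct demo_extref {
	uint64_t parent_objectid;
	uint64_t index;
	uint16_t name_len;
	/* name bytes follow immediately */
} __attribute__((__packed__));

/*
 * Walk the variable-length records packed into one item.
 * Returns 1 and stores the record's offset in *off_ret (if non-NULL)
 * on a match, 0 if no record carries the given name.
 */
static int demo_find_name(const unsigned char *item, uint32_t item_size,
			  const char *name, int name_len, uint32_t *off_ret)
{
	uint32_t cur = 0;

	while (cur + sizeof(struct demo_extref) <= item_size) {
		struct demo_extref hdr;

		/* copy the header out; the buffer may be unaligned */
		memcpy(&hdr, item + cur, sizeof(hdr));
		if (cur + sizeof(hdr) + hdr.name_len > item_size)
			break;	/* truncated record, stop scanning */

		if (hdr.name_len == name_len &&
		    memcmp(item + cur + sizeof(hdr), name, name_len) == 0) {
			if (off_ret)
				*off_ret = cur;
			return 1;
		}
		cur += sizeof(hdr) + hdr.name_len;
	}
	return 0;
}

/* Append one record at offset off; returns the new end offset. */
static uint32_t demo_append(unsigned char *buf, uint32_t off, uint64_t parent,
			    uint64_t index, const char *name)
{
	struct demo_extref hdr = { parent, index, (uint16_t)strlen(name) };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), name, hdr.name_len);
	return off + sizeof(hdr) + hdr.name_len;
}
```

With this convention a caller can write "if (demo_find_name(...))" to mean
"the name exists", which is how the existing backref helper is used.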

Thanks,
	--Mark


--
Mark Fasheh


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-07-06 14:57   ` Jan Schmidt
@ 2012-08-06 23:31     ` Mark Fasheh
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-08-06 23:31 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason

On Fri, Jul 06, 2012 at 04:57:15PM +0200, Jan Schmidt wrote:
> Thought about this search_done once again, I'd like to repeat our May's
> conversation:
> 
> On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > > You moved this comment and assignment out of the "if (ret == 0)" case.
> > > I'm not sure if this is still doing exactly the same now (?).
> > > Previously, we were executing another btrfs_search_slot,
> > > btrfs_lookup_dir_index_item, ... after the "goto again" case, which
> > > would be skipped with this patch.
> > Hmm, ok you're definitely right that the search_done line there is broken.
> > Come to think of it, I'm not quite sure what the meaning of that tiny bit of
> > code was. I'll come back to this one once I've looked closer.
> 
> What's the result of looking closer?

Ok so to describe what, to the best of my understanding,
add_inode_ref() does before my patch (corrections are appreciated):

add_inode_ref() is iterating through a backref found in the tree log. As it
iterates through each log tree backref item it checks the following against
the subvolume tree:

1) Does a backref item with the logged key exist in the subvolume tree? If so
   each individual backref must be checked to make sure it exists in the log.
   If not, it is removed.

2) Does a directory index item for the log ref's name exist in the parent? Remove
   it.

3) Does a directory item exist for the log ref's name in the parent? Remove
   it.

The 'search_done' variable is set the first time step 1 is executed. We
never need to execute that step more than once since the key we're
processing never changes (so we'd always pull up the same ref item) and we
fully process the item we get from it the 1st time.

I'm not sure why steps 2 and 3 are also skipped, though, if we found subvolume
refs. It seems to me that we would want to do those checks on every log
ref. In the case that the subvolume has no existing refs we'd certainly
execute them once for every log ref.

Also note that the condition for step 1 either hits the first time we
execute or we'll never hit it. There's nothing in the code there that
*adds* a ref item to the subvolume tree, so if it exists we'll get it,
otherwise we'll never see it.


So in order to preserve this behavior, I'll update the patch so that
search_done is set within the two blocks which look over (extended and
traditional) refs found in the subvolume. The goto can stay in the same
place, as can all the labels.


How does that sound?
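
The "run step 1 at most once" behaviour described above can be modelled
roughly as follows. This is a userspace sketch under simplifying
assumptions, not the tree-log code: refs are plain strings, the subvolume
ref item is an array, and every demo_* name is hypothetical. Steps 2 and 3
are left as a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_stats {
	int prune_passes;	/* how often step 1 ran */
	int pruned;		/* names removed from the subvolume */
};

/* True if name appears among the logged refs. */
static bool demo_name_logged(const char *name, const char **logged,
			     size_t n_logged)
{
	for (size_t i = 0; i < n_logged; i++)
		if (strcmp(name, logged[i]) == 0)
			return true;
	return false;
}

/* Step 1: drop every subvolume ref missing from the log; returns new count. */
static size_t demo_prune(const char **subvol, size_t n_subvol,
			 const char **logged, size_t n_logged,
			 struct demo_stats *st)
{
	size_t kept = 0;

	for (size_t i = 0; i < n_subvol; i++) {
		if (demo_name_logged(subvol[i], logged, n_logged))
			subvol[kept++] = subvol[i];
		else
			st->pruned++;
	}
	return kept;
}

/* Replay loop: one iteration per log ref, step 1 guarded by search_done. */
static size_t demo_replay(const char **subvol, size_t n_subvol,
			  const char **logged, size_t n_logged,
			  struct demo_stats *st)
{
	bool search_done = false;

	for (size_t i = 0; i < n_logged; i++) {
		if (!search_done && n_subvol > 0) {
			/* the key never changes, so one full pass is enough */
			n_subvol = demo_prune(subvol, n_subvol, logged,
					      n_logged, st);
			st->prune_passes++;
			search_done = true;
		}
		/* steps 2 and 3 and the actual insert would go here */
	}
	return n_subvol;
}
```

Since nothing in the loop adds to the subvolume array, the guarded block
either runs on the first iteration or never, matching the observation above.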


> >  	/* look for a conflicting sequence number */
> >  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> > -					 btrfs_inode_ref_index(eb, ref),
> > -					 name, namelen, 0);
> > +					 ref_index, name, namelen, 0);
> >  	if (di && !IS_ERR(di)) {
> >  		ret = drop_one_dir_item(trans, root, path, dir, di);
> >  		BUG_ON(ret);
> > @@ -932,17 +1053,25 @@ again:
> >  
> >  insert:
> >  	/* insert our name */
> > -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> > -			     btrfs_inode_ref_index(eb, ref));
> > +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
> >  	BUG_ON(ret);
> >  
> >  	btrfs_update_inode(trans, root, inode);
> >  
> >  out:
> > -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> > +	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
> >  	kfree(name);
> > -	if (ref_ptr < ref_end)
> > +	if (ref_ptr < ref_end) {
> > +		if (log_ref_ver) {
> > +			ret = extref_get_fields(eb, slot, &namelen, &name,
> > +						&ref_index, NULL);
> > +		} else {
> > +			ret = ref_get_fields(key, eb, slot, &namelen, &name,
> > +					     &ref_index, NULL);
> > +		}
> > +		BUG_ON(ret);
> 
> We return ret above and BUG_ON ret, here. Is that on purpose? May make sense, I
> just don't see the difference immediately.

Ahh, the first block we can return safely from since the function has not
done work yet. At this point though we've started the operation and it can
not be unrolled. The reason I say this "can not be unrolled" is because
almost every other call in between this and the start of the function has a
"BUG_ON(ret);" after it. As that was the case it seemed to make sense to put
this there.


> 
> >  		goto again;
> > +	}
> >  
> >  	/* finally write the back reference in the inode */
> >  	ret = overwrite_item(trans, root, path, eb, slot, key);
> > @@ -965,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
> >  	return ret;
> >  }
> >  
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	int name_len;
> > +	unsigned int nlink = 0;
> > +	u32 item_size;
> > +	u32 cur_offset = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	unsigned long ptr;
> > +	struct btrfs_inode_extref *extref;
> > +	struct extent_buffer *leaf;
> >  
> > -/*
> > - * There are a few corners where the link count of the file can't
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &extref, &offset);
> > +		if (ret)
> > +			break;
> 
> Still looking strange. We should ask harder for an answer here.
> 
> On Fri, May 04, 2012 at 01:12 (+0200), Mark Fasheh wrote:
> > > Assume the first call to btrfs_find_one_extref returns -EIO. Do we
> > > really want count_inode_extrefs return 0 here? I agree that the previous
> > > code suffers from the same problem, but still: it's a problem.
> > Yeah as you note, I'm just keeping the same behavior as before. This I think
> > is probably a question for Chris...
> 
> To me it seems the best choice would be to return a negative value on error and
> check for that in the caller.

There's no good way to back out as far as I can tell. You really want this changed
so I went ahead and did it your way. What will happen now is that the
BUG_ON(ret) in fixup_inode_link_counts() will get triggered. This diverges
from how count_inode_refs() handles the error (it doesn't). I don't think changing
count_inode_refs() is in the scope of a patch to introduce extended refs so
I did not touch it.

IMHO, the problem of handling errors where it's very difficult to back out
is bigger than this one tiny function, and I don't want to try to solve it
any more than what we've done already.
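
As an illustration of the agreed behaviour (a sketch, not the patch
itself): the counting loop treats -ENOENT as "no more items" and hands any
other error back to the caller instead of silently returning a count. The
lookup is modelled by an array-backed stand-in; demo_find(), demo_source
and demo_count_extrefs() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Array-backed stand-in for the tree: sorted item offsets and ref counts. */
struct demo_source {
	const unsigned long long *offsets;
	const unsigned int *nrefs;
	size_t count;
	int fail_at;	/* index at which to report -EIO, or -1 for never */
};

/*
 * Returns 0 and fills *found_off / *nrefs for the first item at or after
 * start_off, -ENOENT when none remain, -EIO to simulate a read failure.
 */
static int demo_find(const struct demo_source *src,
		     unsigned long long start_off,
		     unsigned long long *found_off, unsigned int *nrefs)
{
	for (size_t i = 0; i < src->count; i++) {
		if (src->offsets[i] < start_off)
			continue;
		if ((int)i == src->fail_at)
			return -EIO;
		*found_off = src->offsets[i];
		*nrefs = src->nrefs[i];
		return 0;
	}
	return -ENOENT;
}

/* Returns the link count (>= 0) or a negative errno on a real failure. */
static int demo_count_extrefs(const struct demo_source *src)
{
	unsigned long long offset = 0;
	unsigned int nlink = 0;
	int ret;

	while (1) {
		unsigned int n = 0;

		ret = demo_find(src, offset, &offset, &n);
		if (ret)
			break;
		nlink += n;
		offset++;	/* continue after the item just counted */
	}

	if (ret != -ENOENT)
		return ret;	/* propagate real errors to the caller */
	return (int)nlink;
}
```

A caller can then check for a negative return and decide what to do with
it, which in the updated patch means hitting the existing BUG_ON(ret).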


Btw, the rest of your review comments have been addressed AFAICT. I should be
sending patches to the list very soon for more review :)

Thanks,
	--Mark

--
Mark Fasheh


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-08-15 15:04   ` Jan Schmidt
@ 2012-08-15 17:59     ` Mark Fasheh
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-08-15 17:59 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason

On Wed, Aug 15, 2012 at 05:04:48PM +0200, Jan Schmidt wrote:
> When applying this patch I get:
> 
> warning: 2 lines add whitespace errors.
> 

Oop, I'll fix that up.


> More comments inline.
> 
> On Wed, August 08, 2012 at 20:55 (+0200), Mark Fasheh wrote:
> > Teach tree-log.c about extended inode refs. In particular, we have to adjust
> > the behavior of inode ref replay as well as log tree recovery to account for
> > the existence of extended refs.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/btrfs/backref.c  |   68 ++++++++++++
> >  fs/btrfs/backref.h  |    5 +
> >  fs/btrfs/tree-log.c |  297 ++++++++++++++++++++++++++++++++++++++++++---------
> >  3 files changed, 319 insertions(+), 51 deletions(-)
> > 
> > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> > index a383c18..658e09c 100644
> > --- a/fs/btrfs/backref.c
> > +++ b/fs/btrfs/backref.c
> > @@ -1111,6 +1111,74 @@ static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
> >  				found_key);
> >  }
> >  
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_extref,
> > +			  u64 *found_off)
> > +{
> > +	int ret, slot;
> > +	struct btrfs_key key;
> > +	struct btrfs_key found_key;
> > +	struct btrfs_inode_extref *extref;
> > +	struct extent_buffer *leaf;
> > +	unsigned long ptr;
> > +
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = start_off;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	while (1) {
> > +		leaf = path->nodes[0];
> > +		slot = path->slots[0];
> > +		if (slot >= btrfs_header_nritems(leaf)) {
> > +			/*
> > +			 * If the item at offset is not found,
> > +			 * btrfs_search_slot will point us to the slot
> > +			 * where it should be inserted. In our case
> > +			 * that will be the slot directly before the
> > +			 * next INODE_REF_KEY_V2 item. In the case
> > +			 * that we're pointing to the last slot in a
> > +			 * leaf, we must move one leaf over.
> > +			 */
> > +			ret = btrfs_next_leaf(root, path);
> > +			if (ret) {
> > +				if (ret >= 1)
> > +					ret = -ENOENT;
> > +				break;
> > +			}
> 
> We can finally replace this with btrfs_search_slot_for_read, according to my
> first suggestion. It's merged now. Saves that long comment and the whole
> while-1-continue construct, which is quite cumbersome to read.

Alright, I'll take a look at this. I need to read
btrfs_search_slot_for_read() first to make sure it all fits correctly.
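
For context, the lookup the while loop implements is essentially "find the
first item at or after a key, then check it still belongs to this inode
and type". The sketch below models only that semantics over a sorted array;
it is not the btrfs_search_slot_for_read() API, and the key struct, the
type value and all demo_* names are assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EXTREF_TYPE 13	/* hypothetical type value */

struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Order keys by (objectid, type, offset). */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Find the first extref key for objectid with offset >= start_off.
 * Returns 0 and sets *found_off, or -ENOENT if no such key exists.
 */
static int demo_find_one_extref(const struct demo_key *keys, size_t n,
				uint64_t objectid, uint64_t start_off,
				uint64_t *found_off)
{
	struct demo_key want = { objectid, DEMO_EXTREF_TYPE, start_off };
	size_t lo = 0, hi = n;

	/* lower bound: first key that is not smaller than want */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_key_cmp(&keys[mid], &want) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	if (lo == n)
		return -ENOENT;	/* ran off the end of the tree */
	if (keys[lo].objectid != objectid ||
	    keys[lo].type != DEMO_EXTREF_TYPE)
		return -ENOENT;	/* next item belongs to something else */

	*found_off = keys[lo].offset;
	return 0;
}
```

The "ran off the end" branch is where the real code has to step to the next
leaf, which is the part a search-for-read helper would take care of.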


> > +			continue;
> > +		}
> > +
> > +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> > +
> > +		/*
> > +		 * Check that we're still looking at an extended ref key for
> > +		 * this particular objectid. If we have different
> > +		 * objectid or type then there are no more to be found
> > +		 * in the tree and we can exit.
> > +		 */
> > +		ret = -ENOENT;
> > +		if (found_key.objectid != inode_objectid)
> > +			break;
> > +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> > +			break;
> > +
> > +		ret = 0;
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +		extref = (struct btrfs_inode_extref *)ptr;
> > +		*ret_extref = extref;
> > +		if (found_off)
> > +			*found_off = found_key.offset;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  /*
> >   * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
> >   * of the path are separated by '/' and the path is guaranteed to be
> > diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> > index c18d8ac..9f3e251 100644
> > --- a/fs/btrfs/backref.h
> > +++ b/fs/btrfs/backref.h
> > @@ -66,4 +66,9 @@ struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
> >  					struct btrfs_path *path);
> >  void free_ipath(struct inode_fs_paths *ipath);
> >  
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_extref,
> > +			  u64 *found_off);
> > +
> >  #endif
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 8abeae4..e5ba0a4 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -23,8 +23,10 @@
> >  #include "disk-io.h"
> >  #include "locking.h"
> >  #include "print-tree.h"
> > +#include "backref.h"
> >  #include "compat.h"
> >  #include "tree-log.h"
> > +#include "hash.h"
> >  
> >  /* magic values for the inode_only field in btrfs_log_inode:
> >   *
> > @@ -764,8 +766,16 @@ static noinline int backref_in_log(struct btrfs_root *log,
> >  	if (ret != 0)
> >  		goto out;
> >  
> > -	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
> >  	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> > +
> > +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> > +		if (btrfs_find_name_in_ext_backref(path, name, namelen, NULL))
> > +			match = 1;
> > +
> > +		goto out;
> > +	}
> > +
> > +	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
> >  	ptr_end = ptr + item_size;
> >  	while (ptr < ptr_end) {
> >  		ref = (struct btrfs_inode_ref *)ptr;
> > @@ -786,6 +796,47 @@ out:
> >  	return match;
> >  }
> >  
> > +static int extref_get_fields(struct extent_buffer *eb, int slot,
> > +			     u32 *namelen, char **name, u64 *index,
> > +			     u64 *parent_objectid)
> > +{
> > +	struct btrfs_inode_extref *extref;
> > +
> > +	extref = (struct btrfs_inode_extref *)btrfs_item_ptr_offset(eb, slot);
> > +
> > +	*namelen = btrfs_inode_extref_name_len(eb, extref);
> > +	*name = kmalloc(*namelen, GFP_NOFS);
> > +	if (*name == NULL)
> > +		return -ENOMEM;
> > +
> > +	read_extent_buffer(eb, *name, (unsigned long)&extref->name,
> > +			   *namelen);
> > +
> > +	*index = btrfs_inode_extref_index(eb, extref);
> > +	if (parent_objectid)
> > +		*parent_objectid = btrfs_inode_extref_parent(eb, extref);
> > +
> > +	return 0;
> > +}
> > +
> > +static int ref_get_fields(struct extent_buffer *eb, int slot, u32 *namelen,
> > +			  char **name, u64 *index)
> > +{
> > +	struct btrfs_inode_ref *ref;
> > +
> > +	ref = (struct btrfs_inode_ref *)btrfs_item_ptr_offset(eb, slot);
> > +
> > +	*namelen = btrfs_inode_ref_name_len(eb, ref);
> > +	*name = kmalloc(*namelen, GFP_NOFS);	
> > +	if (*name == NULL)
> > +		return -ENOMEM;
> > +
> > +	read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
> > +
> > +	*index = btrfs_inode_ref_index(eb, ref);
> > +
> > +	return 0;
> > +}
> >  
> >  /*
> >   * replay one inode back reference item found in the log tree.
> > @@ -801,15 +852,50 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> >  				  struct btrfs_key *key)
> 
> Control flow in this function is horrible. I finally failed to understand it.
> That's why I made a patch that replaces most gotos by if blocks and uses a sub
> function (see the end of this email). I hope that it's getting easier to extend
> that patched version. Cannot comment on changes to code I don't understand, sorry.

I *think* I understand most of it, after much much reading and re-reading. I
really am not confident enough though to say anything on the matter until
Chris weighs in.


> If you find my patch helps understanding that function and makes patching
> easier, you can include that patch in your patch set as patch 2/4 and then do
> your extension afterwards.

Thanks, I'll take a look at your patch. I think any attempt to clean it up
will be nicely received.


> >  {
> >  	struct btrfs_inode_ref *ref;
> > +	struct btrfs_inode_extref *extref;
> >  	struct btrfs_dir_item *di;
> > +	struct btrfs_key search_key;
> >  	struct inode *dir;
> >  	struct inode *inode;
> >  	unsigned long ref_ptr;
> >  	unsigned long ref_end;
> >  	char *name;
> > +	char *victim_name;
> >  	int namelen;
> > +	int victim_name_len;
> >  	int ret;
> >  	int search_done = 0;
> > +	int log_ref_ver = 0;
> > +	u64 parent_objectid;
> > +	u64 inode_objectid;
> > +	u64 ref_index;
> > +	struct extent_buffer *leaf;
> > +	int ref_struct_size;
> > +
> > +
> > +	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > +	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> > +
> > +	if (key->type == BTRFS_INODE_EXTREF_KEY) {	
> > +		ref_struct_size = sizeof(*extref);
> > +		log_ref_ver = 1;
> > +
> > +		ret = extref_get_fields(eb, slot, &namelen, &name, &ref_index,
> > +					&parent_objectid);
> > +		if (ret)
> > +			return ret;
> > +	} else {
> > +		ref_struct_size = sizeof(*ref);
> > +
> > +		ret = ref_get_fields(eb, slot, &namelen, &name, &ref_index);
> > +		if (ret)
> > +			return ret;
> > +
> > +
> > +		parent_objectid = key->offset;
> > +	}
> > +
> > +	inode_objectid = key->objectid;
> >  
> >  	/*
> >  	 * it is possible that we didn't log all the parent directories
> > @@ -817,32 +903,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> >  	 * copy the back ref in.  The link count fixup code will take
> >  	 * care of the rest
> >  	 */
> > -	dir = read_one_inode(root, key->offset);
> > +	dir = read_one_inode(root, parent_objectid);
> >  	if (!dir)
> >  		return -ENOENT;
> >  
> > -	inode = read_one_inode(root, key->objectid);
> > +	inode = read_one_inode(root, inode_objectid);
> >  	if (!inode) {
> >  		iput(dir);
> >  		return -EIO;
> >  	}
> >  
> > -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> > -
> >  again:
> > -	ref = (struct btrfs_inode_ref *)ref_ptr;
> > -
> > -	namelen = btrfs_inode_ref_name_len(eb, ref);
> > -	name = kmalloc(namelen, GFP_NOFS);
> > -	BUG_ON(!name);
> > -
> > -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> > -
> >  	/* if we already have a perfect match, we're done */
> >  	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> > -			 btrfs_inode_ref_index(eb, ref),
> > -			 name, namelen)) {
> > +			 ref_index, name, namelen)) {
> >  		goto out;
> >  	}
> >  
> > @@ -857,19 +931,23 @@ again:
> >  	if (search_done)
> >  		goto insert;
> >  
> > -	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> > +	/* Search old style refs */
> > +	search_key.objectid = inode_objectid;
> > +	search_key.type = BTRFS_INODE_REF_KEY;
> > +	search_key.offset = parent_objectid;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
> >  	if (ret == 0) {
> > -		char *victim_name;
> > -		int victim_name_len;
> >  		struct btrfs_inode_ref *victim_ref;
> >  		unsigned long ptr;
> >  		unsigned long ptr_end;
> > -		struct extent_buffer *leaf = path->nodes[0];
> > +
> > +		leaf = path->nodes[0];
> >  
> >  		/* are we trying to overwrite a back ref for the root directory
> >  		 * if so, just jump out, we're done
> >  		 */
> > -		if (key->objectid == key->offset)
> > +		if (search_key.objectid == search_key.offset)
> >  			goto out_nowrite;
> >  
> >  		/* check all the names in this back reference to see
> > @@ -889,7 +967,7 @@ again:
> >  					   (unsigned long)(victim_ref + 1),
> >  					   victim_name_len);
> >  
> > -			if (!backref_in_log(log, key, victim_name,
> > +			if (!backref_in_log(log, &search_key, victim_name,
> >  					    victim_name_len)) {
> >  				btrfs_inc_nlink(inode);
> >  				btrfs_release_path(path);
> > @@ -903,19 +981,61 @@ again:
> >  			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
> >  		}
> >  		BUG_ON(ret);
> > -
> >  		/*
> >  		 * NOTE: we have searched root tree and checked the
> > -		 * coresponding ref, it does not need to check again.
> > +		 * coresponding refs, it does not need to be checked again.
> >  		 */
> >  		search_done = 1;
> >  	}
> >  	btrfs_release_path(path);
> >  
> > +	/* Same search but for extended refs */
> > +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> > +					   inode_objectid, parent_objectid, 0,
> > +					   0);
> > +	if (!IS_ERR_OR_NULL(extref)) {
> 
> Okay, one obvious comment at least: You don't execute this block on
> IS_ERR(extref), but you don't act on that situation either. Is that on purpose?

This is just mirroring the broken (or maybe misunderstood?) error handling
that this function has (unfortunately, all over the place). Basically, we
ignore errors from the lookups for either ref type.
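
For illustration only, here is a minimal userspace sketch of what acting on
the three possible lookup outcomes (error pointer, NULL, valid pointer) could
look like. The err_ptr/is_err/ptr_err helpers are simplified stand-ins I wrote
for this example, and lookup()/check_lookup() are hypothetical; none of this
is the code in tree-log.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's error-pointer helpers.
 * Assumption: errors are encoded as the top 4095 values of the address space.
 */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical lookup: mode < 0 yields an error pointer, mode == 0 yields
 * NULL (nothing found), anything else yields a valid pointer.
 */
static int dummy_item;

static void *lookup(int mode)
{
	if (mode < 0)
		return err_ptr(mode);
	if (mode == 0)
		return NULL;
	return &dummy_item;
}

/* Returns 1 if found, 0 if not found, negative errno on failure. */
static int check_lookup(int mode)
{
	void *p = lookup(mode);

	if (is_err(p))
		return (int)ptr_err(p);	/* propagate instead of ignoring */
	if (!p)
		return 0;		/* not found is not an error */

	assert(p == &dummy_item);
	return 1;
}
```

The point of the sketch is only that the error case gets its own branch,
rather than being folded into "not found" the way IS_ERR_OR_NULL does.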


> > +		u32 item_size;
> > +		u32 cur_offset = 0;
> > +		unsigned long base;
> > +
> > +		leaf = path->nodes[0];
> > +
> > +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > +		base = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +
> > +		while (cur_offset < item_size) {
> > +			extref = (struct btrfs_inode_extref *)base + cur_offset;
> > +
> > +			victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> > +			victim_name = kmalloc(namelen, GFP_NOFS);
> > +			leaf = path->nodes[0];
> > +			read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
> > +
> > +			search_key.objectid = inode_objectid;
> > +			search_key.type = BTRFS_INODE_EXTREF_KEY;
> > +			search_key.offset = btrfs_extref_hash(parent_objectid,
> > +							      name, namelen);
> > +			if (!backref_in_log(log, &search_key, victim_name,
> > +					    victim_name_len)) {
> > +				btrfs_inc_nlink(inode);
> > +				btrfs_release_path(path);
> > +
> > +				ret = btrfs_unlink_inode(trans, root, dir,
> > +							 inode, victim_name,
> > +							 victim_name_len);
> > +			}
> > +			kfree(victim_name);
> > +			BUG_ON(ret);
> > +
> > +			cur_offset += victim_name_len + sizeof(*extref);
> > +		}
> > +		search_done = 1;
> > +	}
> > +
> > +
> >  	/* look for a conflicting sequence number */
> >  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> > -					 btrfs_inode_ref_index(eb, ref),
> > -					 name, namelen, 0);
> > +					 ref_index, name, namelen, 0);
> >  	if (di && !IS_ERR(di)) {
> >  		ret = drop_one_dir_item(trans, root, path, dir, di);
> >  		BUG_ON(ret);
> > @@ -933,17 +1053,25 @@ again:
> >  
> >  insert:
> >  	/* insert our name */
> > -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> > -			     btrfs_inode_ref_index(eb, ref));
> > +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
> >  	BUG_ON(ret);
> >  
> >  	btrfs_update_inode(trans, root, inode);
> >  
> >  out:
> > -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> > +	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
> >  	kfree(name);
> > -	if (ref_ptr < ref_end)
> > +	if (ref_ptr < ref_end) {
> > +		if (log_ref_ver) {
> > +			ret = extref_get_fields(eb, slot, &namelen, &name,
> > +						&ref_index, NULL);
> > +		} else {
> > +			ret = ref_get_fields(eb, slot, &namelen, &name,
> > +					     &ref_index);
> > +		}
> > +		BUG_ON(ret);
> >  		goto again;
> > +	}
> >  
> >  	/* finally write the back reference in the inode */
> >  	ret = overwrite_item(trans, root, path, eb, slot, key);
> > @@ -966,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
> >  	return ret;
> >  }
> >  
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret = 0;
> > +	int name_len;
> > +	unsigned int nlink = 0;
> > +	u32 item_size;
> > +	u32 cur_offset = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	unsigned long ptr;
> > +	struct btrfs_inode_extref *extref;
> > +	struct extent_buffer *leaf;
> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &extref, &offset);
> > +		if (ret)
> > +			break;
> >  
> > -/*
> > - * There are a few corners where the link count of the file can't
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> > +		leaf = path->nodes[0];
> > +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +
> > +		while (cur_offset < item_size) {
> > +			extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
> > +			name_len = btrfs_inode_extref_name_len(leaf, extref);
> > +
> > +			nlink++;
> > +
> > +			cur_offset += name_len + sizeof(*extref);
> > +		}
> > +
> > +		offset++;
> > +	}
> > +	btrfs_release_path(path);
> > +
> > +	return (ret == 0) ? nlink : ret;
> 
> You depend on btrfs_find_one_extref never returning >0 here. I'd make that ...
> 
>  if (ret < 0)
>  	return ret;
>  return nlink;
> 
> This ignores ret > 0 explicitly, which is in my opinion better than a spurious
> link count.

Sounds good.
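
To make sure I read the suggestion right, a small standalone sketch of that
return convention follows. The fake_iter type and find_one() are hypothetical
stand-ins for the real lookup, and I am assuming a finder that may return a
negative errno, 0 for "found one", or a positive value for "nothing more".

```c
#include <errno.h>

/* Hypothetical iterator: 'remaining' hits, then 'final_ret' is returned. */
struct fake_iter {
	int remaining;
	int final_ret;
};

/* Stand-in finder: 0 = found one, <0 = error, >0 = nothing more to find. */
static int find_one(struct fake_iter *it)
{
	if (it->remaining > 0) {
		it->remaining--;
		return 0;
	}
	return it->final_ret;
}

/*
 * Count hits. A negative return from the finder is passed up as an error;
 * a positive return is explicitly ignored and the count is returned.
 */
static int count_links(struct fake_iter *it)
{
	int ret;
	unsigned int nlink = 0;

	while ((ret = find_one(it)) == 0)
		nlink++;

	if (ret < 0)
		return ret;
	return (int)nlink;
}
```

With this shape a positive return value can never leak out as a bogus link
count, which is the case the original ternary did not guard against.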


> > +}
> > +
> > +static int count_inode_refs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> >  {
> > -	struct btrfs_path *path;
> >  	int ret;
> >  	struct btrfs_key key;
> > -	u64 nlink = 0;
> > +	unsigned int nlink = 0;
> >  	unsigned long ptr;
> >  	unsigned long ptr_end;
> >  	int name_len;
> > @@ -994,10 +1149,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  	key.type = BTRFS_INODE_REF_KEY;
> >  	key.offset = (u64)-1;
> >  
> > -	path = btrfs_alloc_path();
> > -	if (!path)
> > -		return -ENOMEM;
> > -
> >  	while (1) {
> >  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> >  		if (ret < 0)
> > @@ -1031,6 +1182,45 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  		btrfs_release_path(path);
> >  	}
> >  	btrfs_release_path(path);
> > +
> > +	return nlink;
> > +}
> > +
> > +/*
> > + * There are a few corners where the link count of the file can't
> > + * be properly maintained during replay.  So, instead of adding
> > + * lots of complexity to the log code, we just scan the backrefs
> > + * for any file that has been through replay.
> > + *
> > + * The scan will update the link count on the inode to reflect the
> > + * number of back refs found.  If it goes down to zero, the iput
> > + * will free the inode.
> > + */
> > +static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > +					   struct btrfs_root *root,
> > +					   struct inode *inode)
> > +{
> > +	struct btrfs_path *path;
> > +	int ret;
> > +	u64 nlink = 0;
> > +	u64 ino = btrfs_ino(inode);
> > +
> > +	path = btrfs_alloc_path();
> > +	if (!path)
> > +		return -ENOMEM;
> > +
> > +	ret = count_inode_refs(root, inode, path);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	nlink = ret;
> > +
> > +	ret = count_inode_extrefs(root, inode, path);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	nlink += ret;
> > +
> >  	if (nlink != inode->i_nlink) {
> >  		set_nlink(inode, nlink);
> >  		btrfs_update_inode(trans, root, inode);
> > @@ -1046,9 +1236,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  		ret = insert_orphan_item(trans, root, ino);
> >  		BUG_ON(ret);
> >  	}
> > -	btrfs_free_path(path);
> >  
> > -	return 0;
> > +out:
> > +	btrfs_free_path(path);
> > +	return ret;
> >  }
> >  
> >  static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
> > @@ -1695,6 +1886,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
> >  			ret = add_inode_ref(wc->trans, root, log, path,
> >  					    eb, i, &key);
> >  			BUG_ON(ret && ret != -ENOENT);
> > +		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
> > +			ret = add_inode_ref(wc->trans, root, log, path,
> > +					    eb, i, &key);
> > +			BUG_ON(ret && ret != -ENOENT);
> >  		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
> >  			ret = replay_one_extent(wc->trans, root, path,
> >  						eb, i, &key);
> 
> That's it for this round, here comes the promised patch. Should apply with "git
> am --scissors".

Fantastic, thanks.

> Can you please add a version to the subject before sending the revised patches?
> Like "git format-patch --subject-prefix v4".

Sure, no problem. I'll start the next round at v4 unless you'd like some other starting
value :)


Thanks again for such a valuable review, Jan!
	--Mark

> 
> Thanks,
> -Jan
> 
> -- >8 --
> Subject: [PATCH] Btrfs: improved readablity for add_inode_ref
> 
> Moved part of the code into a sub function and replaced most of the gotos
> by ifs, hoping that it will be easier to read now.
> 
> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
> ---
>  fs/btrfs/tree-log.c |  178 ++++++++++++++++++++++++++++-----------------------
>  1 files changed, 97 insertions(+), 81 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index c86670f..59f3071 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -786,76 +786,18 @@ out:
>  	return match;
>  }
> 
> -
> -/*
> - * replay one inode back reference item found in the log tree.
> - * eb, slot and key refer to the buffer and key found in the log tree.
> - * root is the destination we are replaying into, and path is for temp
> - * use by this function.  (it should be released on return).
> - */
> -static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> +static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
>  				  struct btrfs_root *root,
> -				  struct btrfs_root *log,
>  				  struct btrfs_path *path,
> -				  struct extent_buffer *eb, int slot,
> -				  struct btrfs_key *key)
> +				  struct btrfs_root *log_root,
> +				  struct inode *dir, struct inode *inode,
> +				  struct btrfs_key *key,
> +				  struct extent_buffer *eb,
> +				  struct btrfs_inode_ref *ref,
> +				  char *name, int namelen, int *search_done)
>  {
> -	struct btrfs_inode_ref *ref;
> -	struct btrfs_dir_item *di;
> -	struct inode *dir;
> -	struct inode *inode;
> -	unsigned long ref_ptr;
> -	unsigned long ref_end;
> -	char *name;
> -	int namelen;
>  	int ret;
> -	int search_done = 0;
> -
> -	/*
> -	 * it is possible that we didn't log all the parent directories
> -	 * for a given inode.  If we don't find the dir, just don't
> -	 * copy the back ref in.  The link count fixup code will take
> -	 * care of the rest
> -	 */
> -	dir = read_one_inode(root, key->offset);
> -	if (!dir)
> -		return -ENOENT;
> -
> -	inode = read_one_inode(root, key->objectid);
> -	if (!inode) {
> -		iput(dir);
> -		return -EIO;
> -	}
> -
> -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> -
> -again:
> -	ref = (struct btrfs_inode_ref *)ref_ptr;
> -
> -	namelen = btrfs_inode_ref_name_len(eb, ref);
> -	name = kmalloc(namelen, GFP_NOFS);
> -	BUG_ON(!name);
> -
> -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> -
> -	/* if we already have a perfect match, we're done */
> -	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> -			 btrfs_inode_ref_index(eb, ref),
> -			 name, namelen)) {
> -		goto out;
> -	}
> -
> -	/*
> -	 * look for a conflicting back reference in the metadata.
> -	 * if we find one we have to unlink that name of the file
> -	 * before we add our new link.  Later on, we overwrite any
> -	 * existing back reference, and we don't want to create
> -	 * dangling pointers in the directory.
> -	 */
> -
> -	if (search_done)
> -		goto insert;
> +	struct btrfs_dir_item *di;
> 
>  	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
>  	if (ret == 0) {
> @@ -870,7 +812,7 @@ again:
>  		 * if so, just jump out, we're done
>  		 */
>  		if (key->objectid == key->offset)
> -			goto out_nowrite;
> +			return 1;
> 
>  		/* check all the names in this back reference to see
>  		 * if they are in the log.  if so, we allow them to stay
> @@ -889,7 +831,7 @@ again:
>  					   (unsigned long)(victim_ref + 1),
>  					   victim_name_len);
> 
> -			if (!backref_in_log(log, key, victim_name,
> +			if (!backref_in_log(log_root, key, victim_name,
>  					    victim_name_len)) {
>  				btrfs_inc_nlink(inode);
>  				btrfs_release_path(path);
> @@ -908,7 +850,7 @@ again:
>  		 * NOTE: we have searched root tree and checked the
>  		 * coresponding ref, it does not need to check again.
>  		 */
> -		search_done = 1;
> +		*search_done = 1;
>  	}
>  	btrfs_release_path(path);
> 
> @@ -931,25 +873,99 @@ again:
>  	}
>  	btrfs_release_path(path);
> 
> -insert:
> -	/* insert our name */
> -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> -			     btrfs_inode_ref_index(eb, ref));
> -	BUG_ON(ret);
> +	return 0;
> +}
> +
> +/*
> + * replay one inode back reference item found in the log tree.
> + * eb, slot and key refer to the buffer and key found in the log tree.
> + * root is the destination we are replaying into, and path is for temp
> + * use by this function.  (it should be released on return).
> + */
> +static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> +				  struct btrfs_root *root,
> +				  struct btrfs_root *log,
> +				  struct btrfs_path *path,
> +				  struct extent_buffer *eb, int slot,
> +				  struct btrfs_key *key)
> +{
> +	struct btrfs_inode_ref *ref;
> +	struct inode *dir;
> +	struct inode *inode;
> +	unsigned long ref_ptr;
> +	unsigned long ref_end;
> +	char *name;
> +	int namelen;
> +	int ret;
> +	int search_done = 0;
> +
> +	/*
> +	 * it is possible that we didn't log all the parent directories
> +	 * for a given inode.  If we don't find the dir, just don't
> +	 * copy the back ref in.  The link count fixup code will take
> +	 * care of the rest
> +	 */
> +	dir = read_one_inode(root, key->offset);
> +	if (!dir)
> +		return -ENOENT;
> 
> -	btrfs_update_inode(trans, root, inode);
> +	inode = read_one_inode(root, key->objectid);
> +	if (!inode) {
> +		iput(dir);
> +		return -EIO;
> +	}
> 
> -out:
> -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> -	kfree(name);
> -	if (ref_ptr < ref_end)
> -		goto again;
> +	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> +
> +	while (ref_ptr < ref_end) {
> +		ref = (struct btrfs_inode_ref *)ref_ptr;
> +
> +		namelen = btrfs_inode_ref_name_len(eb, ref);
> +		name = kmalloc(namelen, GFP_NOFS);
> +		BUG_ON(!name);
> +
> +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> +
> +		/* if we already have a perfect match, we're done */
> +		if (!inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> +				  btrfs_inode_ref_index(eb, ref),
> +				  name, namelen)) {
> +			/*
> +			 * look for a conflicting back reference in the
> +			 * metadata. if we find one we have to unlink that name
> +			 * of the file before we add our new link.  Later on, we
> +			 * overwrite any existing back reference, and we don't
> +			 * want to create dangling pointers in the directory.
> +			 */
> +
> +			if (!search_done) {
> +				ret = __add_inode_ref(trans, root, path, log,
> +						      dir, inode, key, eb, ref,
> +						      name, namelen,
> +						      &search_done);
> +				if (ret == 1)
> +					goto out;
> +				BUG_ON(ret);
> +			}
> +
> +			/* insert our name */
> +			ret = btrfs_add_link(trans, dir, inode, name, namelen,
> +					     0, btrfs_inode_ref_index(eb, ref));
> +			BUG_ON(ret);
> +
> +			btrfs_update_inode(trans, root, inode);
> +		}
> +
> +		ref_ptr = (unsigned long)(ref + 1) + namelen;
> +		kfree(name);
> +	}
> 
>  	/* finally write the back reference in the inode */
>  	ret = overwrite_item(trans, root, path, eb, slot, key);
>  	BUG_ON(ret);
> 
> -out_nowrite:
> +out:
>  	btrfs_release_path(path);
>  	iput(dir);
>  	iput(inode);
> -- 
> 1.7.1
--
Mark Fasheh

* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-08-08 18:55 ` [PATCH 2/3] " Mark Fasheh
@ 2012-08-15 15:04   ` Jan Schmidt
  2012-08-15 17:59     ` Mark Fasheh
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Schmidt @ 2012-08-15 15:04 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason

When applying this patch I get:

warning: 2 lines add whitespace errors.

More comments inline.

On Wed, August 08, 2012 at 20:55 (+0200), Mark Fasheh wrote:
> Teach tree-log.c about extended inode refs. In particular, we have to adjust
> the behavior of inode ref replay as well as log tree recovery to account for
> the existence of extended refs.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/backref.c  |   68 ++++++++++++
>  fs/btrfs/backref.h  |    5 +
>  fs/btrfs/tree-log.c |  297 ++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 319 insertions(+), 51 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index a383c18..658e09c 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -1111,6 +1111,74 @@ static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
>  				found_key);
>  }
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_extref,
> +			  u64 *found_off)
> +{
> +	int ret, slot;
> +	struct btrfs_key key;
> +	struct btrfs_key found_key;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
> +	unsigned long ptr;
> +
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		return ret;
> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we're pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}

We can finally replace this with btrfs_search_slot_for_read, according to my
first suggestion. It's merged now. Saves that long comment and the whole
while-1-continue construct, which is quite cumbersome to read.

> +			continue;
> +		}
> +
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we're still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		extref = (struct btrfs_inode_extref *)ptr;
> +		*ret_extref = extref;
> +		if (found_off)
> +			*found_off = found_key.offset;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  /*
>   * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
>   * of the path are separated by '/' and the path is guaranteed to be
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index c18d8ac..9f3e251 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -66,4 +66,9 @@ struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
>  					struct btrfs_path *path);
>  void free_ipath(struct inode_fs_paths *ipath);
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_extref,
> +			  u64 *found_off);
> +
>  #endif
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 8abeae4..e5ba0a4 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -23,8 +23,10 @@
>  #include "disk-io.h"
>  #include "locking.h"
>  #include "print-tree.h"
> +#include "backref.h"
>  #include "compat.h"
>  #include "tree-log.h"
> +#include "hash.h"
>  
>  /* magic values for the inode_only field in btrfs_log_inode:
>   *
> @@ -764,8 +766,16 @@ static noinline int backref_in_log(struct btrfs_root *log,
>  	if (ret != 0)
>  		goto out;
>  
> -	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		if (btrfs_find_name_in_ext_backref(path, name, namelen, NULL))
> +			match = 1;
> +
> +		goto out;
> +	}
> +
> +	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr_end = ptr + item_size;
>  	while (ptr < ptr_end) {
>  		ref = (struct btrfs_inode_ref *)ptr;
> @@ -786,6 +796,47 @@ out:
>  	return match;
>  }
>  
> +static int extref_get_fields(struct extent_buffer *eb, int slot,
> +			     u32 *namelen, char **name, u64 *index,
> +			     u64 *parent_objectid)
> +{
> +	struct btrfs_inode_extref *extref;
> +
> +	extref = (struct btrfs_inode_extref *)btrfs_item_ptr_offset(eb, slot);
> +
> +	*namelen = btrfs_inode_extref_name_len(eb, extref);
> +	*name = kmalloc(*namelen, GFP_NOFS);
> +	if (*name == NULL)
> +		return -ENOMEM;
> +
> +	read_extent_buffer(eb, *name, (unsigned long)&extref->name,
> +			   *namelen);
> +
> +	*index = btrfs_inode_extref_index(eb, extref);
> +	if (parent_objectid)
> +		*parent_objectid = btrfs_inode_extref_parent(eb, extref);
> +
> +	return 0;
> +}
> +
> +static int ref_get_fields(struct extent_buffer *eb, int slot, u32 *namelen,
> +			  char **name, u64 *index)
> +{
> +	struct btrfs_inode_ref *ref;
> +
> +	ref = (struct btrfs_inode_ref *)btrfs_item_ptr_offset(eb, slot);
> +
> +	*namelen = btrfs_inode_ref_name_len(eb, ref);
> +	*name = kmalloc(*namelen, GFP_NOFS);	
> +	if (*name == NULL)
> +		return -ENOMEM;
> +
> +	read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
> +
> +	*index = btrfs_inode_ref_index(eb, ref);
> +
> +	return 0;
> +}
>  
>  /*
>   * replay one inode back reference item found in the log tree.
> @@ -801,15 +852,50 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  				  struct btrfs_key *key)

Control flow in this function is horrible. I finally failed to understand it.
That's why I made a patch that replaces most gotos by if blocks and uses a sub
function (see the end of this email). I hope that it's getting easier to extend
that patched version. Cannot comment on changes to code I don't understand, sorry.

If you find my patch helps understanding that function and makes patching
easier, you can include that patch in your patch set as patch 2/4 and then do
your extension afterwards.

>  {
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;
>  	struct btrfs_dir_item *di;
> +	struct btrfs_key search_key;
>  	struct inode *dir;
>  	struct inode *inode;
>  	unsigned long ref_ptr;
>  	unsigned long ref_end;
>  	char *name;
> +	char *victim_name;
>  	int namelen;
> +	int victim_name_len;
>  	int ret;
>  	int search_done = 0;
> +	int log_ref_ver = 0;
> +	u64 parent_objectid;
> +	u64 inode_objectid;
> +	u64 ref_index;
> +	struct extent_buffer *leaf;
> +	int ref_struct_size;
> +
> +
> +	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {	
> +		ref_struct_size = sizeof(*extref);
> +		log_ref_ver = 1;
> +
> +		ret = extref_get_fields(eb, slot, &namelen, &name, &ref_index,
> +					&parent_objectid);
> +		if (ret)
> +			return ret;
> +	} else {
> +		ref_struct_size = sizeof(*ref);
> +
> +		ret = ref_get_fields(eb, slot, &namelen, &name, &ref_index);
> +		if (ret)
> +			return ret;
> +
> +
> +		parent_objectid = key->offset;
> +	}
> +
> +	inode_objectid = key->objectid;
>  
>  	/*
>  	 * it is possible that we didn't log all the parent directories
> @@ -817,32 +903,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  	 * copy the back ref in.  The link count fixup code will take
>  	 * care of the rest
>  	 */
> -	dir = read_one_inode(root, key->offset);
> +	dir = read_one_inode(root, parent_objectid);
>  	if (!dir)
>  		return -ENOENT;
>  
> -	inode = read_one_inode(root, key->objectid);
> +	inode = read_one_inode(root, inode_objectid);
>  	if (!inode) {
>  		iput(dir);
>  		return -EIO;
>  	}
>  
> -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> -
>  again:
> -	ref = (struct btrfs_inode_ref *)ref_ptr;
> -
> -	namelen = btrfs_inode_ref_name_len(eb, ref);
> -	name = kmalloc(namelen, GFP_NOFS);
> -	BUG_ON(!name);
> -
> -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> -
>  	/* if we already have a perfect match, we're done */
>  	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> -			 btrfs_inode_ref_index(eb, ref),
> -			 name, namelen)) {
> +			 ref_index, name, namelen)) {
>  		goto out;
>  	}
>  
> @@ -857,19 +931,23 @@ again:
>  	if (search_done)
>  		goto insert;
>  
> -	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> +	/* Search old style refs */
> +	search_key.objectid = inode_objectid;
> +	search_key.type = BTRFS_INODE_REF_KEY;
> +	search_key.offset = parent_objectid;
> +
> +	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
>  	if (ret == 0) {
> -		char *victim_name;
> -		int victim_name_len;
>  		struct btrfs_inode_ref *victim_ref;
>  		unsigned long ptr;
>  		unsigned long ptr_end;
> -		struct extent_buffer *leaf = path->nodes[0];
> +
> +		leaf = path->nodes[0];
>  
>  		/* are we trying to overwrite a back ref for the root directory
>  		 * if so, just jump out, we're done
>  		 */
> -		if (key->objectid == key->offset)
> +		if (search_key.objectid == search_key.offset)
>  			goto out_nowrite;
>  
>  		/* check all the names in this back reference to see
> @@ -889,7 +967,7 @@ again:
>  					   (unsigned long)(victim_ref + 1),
>  					   victim_name_len);
>  
> -			if (!backref_in_log(log, key, victim_name,
> +			if (!backref_in_log(log, &search_key, victim_name,
>  					    victim_name_len)) {
>  				btrfs_inc_nlink(inode);
>  				btrfs_release_path(path);
> @@ -903,19 +981,61 @@ again:
>  			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
>  		}
>  		BUG_ON(ret);
> -
>  		/*
>  		 * NOTE: we have searched root tree and checked the
> -		 * coresponding ref, it does not need to check again.
> +		 * corresponding refs, it does not need to be checked again.
>  		 */
>  		search_done = 1;
>  	}
>  	btrfs_release_path(path);
>  
> +	/* Same search but for extended refs */
> +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> +					   inode_objectid, parent_objectid, 0,
> +					   0);
> +	if (!IS_ERR_OR_NULL(extref)) {

Okay, one obvious comment at least: You don't execute this block on
IS_ERR(extref), but you don't act on that situation either. Is that on purpose?
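
To illustrate the distinction: IS_ERR_OR_NULL() folds "the lookup failed" and
"nothing was found" into a single branch. The following is a minimal userspace
sketch, not kernel code. ERR_PTR, PTR_ERR and IS_ERR are simplified stand-ins
for the kernel helpers, and classify_lookup() is a hypothetical name used only
to show the three outcomes a caller could act on.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: classify a lookup result three ways.
 * Returns a negative errno if the lookup failed, 0 if nothing was
 * found, and 1 if the caller has a valid item to walk.
 */
static int classify_lookup(const void *item)
{
	if (IS_ERR(item))
		return (int)PTR_ERR(item);
	if (!item)
		return 0;
	return 1;
}
```

With a split like this, the error case could be returned to the caller while
the "not found" case simply skips the block, instead of both being skipped
silently.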

> +		u32 item_size;
> +		u32 cur_offset = 0;
> +		unsigned long base;
> +
> +		leaf = path->nodes[0];
> +
> +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +		base = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +		while (cur_offset < item_size) {
> +			extref = (struct btrfs_inode_extref *)(base + cur_offset);
> +
> +			victim_name_len = btrfs_inode_extref_name_len(leaf, extref);
> +			victim_name = kmalloc(victim_name_len, GFP_NOFS);
> +			read_extent_buffer(leaf, victim_name,
> +					   (unsigned long)&extref->name, victim_name_len);
> +
> +			search_key.objectid = inode_objectid;
> +			search_key.type = BTRFS_INODE_EXTREF_KEY;
> +			search_key.offset = btrfs_extref_hash(parent_objectid,
> +							      victim_name, victim_name_len);
> +			if (!backref_in_log(log, &search_key, victim_name,
> +					    victim_name_len)) {
> +				btrfs_inc_nlink(inode);
> +				btrfs_release_path(path);
> +
> +				ret = btrfs_unlink_inode(trans, root, dir,
> +							 inode, victim_name,
> +							 victim_name_len);
> +			}
> +			kfree(victim_name);
> +			BUG_ON(ret);
> +
> +			cur_offset += victim_name_len + sizeof(*extref);
> +		}
> +		search_done = 1;
> +	}
> +
> +
>  	/* look for a conflicting sequence number */
>  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> -					 btrfs_inode_ref_index(eb, ref),
> -					 name, namelen, 0);
> +					 ref_index, name, namelen, 0);
>  	if (di && !IS_ERR(di)) {
>  		ret = drop_one_dir_item(trans, root, path, dir, di);
>  		BUG_ON(ret);
> @@ -933,17 +1053,25 @@ again:
>  
>  insert:
>  	/* insert our name */
> -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> -			     btrfs_inode_ref_index(eb, ref));
> +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
>  	BUG_ON(ret);
>  
>  	btrfs_update_inode(trans, root, inode);
>  
>  out:
> -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> +	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
>  	kfree(name);
> -	if (ref_ptr < ref_end)
> +	if (ref_ptr < ref_end) {
> +		if (log_ref_ver) {
> +			ret = extref_get_fields(eb, slot, &namelen, &name,
> +						&ref_index, NULL);
> +		} else {
> +			ret = ref_get_fields(eb, slot, &namelen, &name,
> +					     &ref_index);
> +		}
> +		BUG_ON(ret);
>  		goto again;
> +	}
>  
>  	/* finally write the back reference in the inode */
>  	ret = overwrite_item(trans, root, path, eb, slot, key);
> @@ -966,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret = 0;
> +	int name_len;
> +	unsigned int nlink = 0;
> +	u32 item_size;
> +	u32 cur_offset = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	unsigned long ptr;
> +	struct btrfs_inode_extref *extref;
> +	struct extent_buffer *leaf;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &extref, &offset);
> +		if (ret)
> +			break;
>  
> -/*
> - * There are a few corners where the link count of the file can't
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)
> +		leaf = path->nodes[0];
> +		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +
> +		while (cur_offset < item_size) {
> +			extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
> +			name_len = btrfs_inode_extref_name_len(leaf, extref);
> +
> +			nlink++;
> +
> +			cur_offset += name_len + sizeof(*extref);
> +		}
> +
> +		offset++;
> +	}
> +	btrfs_release_path(path);
> +
> +	return (ret == 0) ? nlink : ret;

You depend on btrfs_find_one_extref never returning >0 here. I'd make that ...

 if (ret < 0)
 	return ret;
 return nlink;

This ignores ret > 0 explicitly, which is in my opinion better than a spurious
link count.
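
The counting loop and the return convention can be modeled in userspace. This
is only a sketch under stated assumptions: struct demo_extref is a simplified
stand-in for the on-disk extref header (host byte order, no extent buffer
accessors), and all demo_* names are hypothetical. It shows the byte offset
being added to the base before the header is read, and only a negative value
being treated as an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the on-disk extref header. */
struct demo_extref {
	uint64_t parent_objectid;
	uint64_t index;
	uint16_t name_len;
	/* name bytes follow the header directly */
} __attribute__((__packed__));

/*
 * Count the entries packed back to back in one item.  cur_offset is a
 * byte offset, so it is added to the base address before the header is
 * read.  Returns the entry count, or -EIO if an entry would run past
 * the end of the item.
 */
static int demo_count_entries(const unsigned char *item, size_t item_size)
{
	size_t cur_offset = 0;
	int nlink = 0;

	while (cur_offset < item_size) {
		struct demo_extref hdr;

		if (item_size - cur_offset < sizeof(hdr))
			return -EIO;
		memcpy(&hdr, item + cur_offset, sizeof(hdr));
		if (item_size - cur_offset - sizeof(hdr) < hdr.name_len)
			return -EIO;

		nlink++;
		cur_offset += sizeof(hdr) + hdr.name_len;
	}
	return nlink;
}

/* Hypothetical helper: append one entry and return the new end offset. */
static size_t demo_append(unsigned char *item, size_t off, uint64_t parent,
			  uint64_t index, const char *name)
{
	struct demo_extref hdr = {
		.parent_objectid = parent,
		.index = index,
		.name_len = (uint16_t)strlen(name),
	};

	memcpy(item + off, &hdr, sizeof(hdr));
	memcpy(item + off + sizeof(hdr), name, hdr.name_len);
	return off + sizeof(hdr) + hdr.name_len;
}

/* Only a negative ret is an error; ret > 0 is ignored explicitly. */
static int demo_count_or_error(int ret, int nlink)
{
	if (ret < 0)
		return ret;
	return nlink;
}
```

The last helper mirrors the suggested return statement: a positive ret from
the search never leaks out as if it were a link count.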

> +}
> +
> +static int count_inode_refs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
>  {
> -	struct btrfs_path *path;
>  	int ret;
>  	struct btrfs_key key;
> -	u64 nlink = 0;
> +	unsigned int nlink = 0;
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	int name_len;
> @@ -994,10 +1149,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  	key.type = BTRFS_INODE_REF_KEY;
>  	key.offset = (u64)-1;
>  
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
>  	while (1) {
>  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>  		if (ret < 0)
> @@ -1031,6 +1182,45 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		btrfs_release_path(path);
>  	}
>  	btrfs_release_path(path);
> +
> +	return nlink;
> +}
> +
> +/*
> + * There are a few corners where the link count of the file can't
> + * be properly maintained during replay.  So, instead of adding
> + * lots of complexity to the log code, we just scan the backrefs
> + * for any file that has been through replay.
> + *
> + * The scan will update the link count on the inode to reflect the
> + * number of back refs found.  If it goes down to zero, the iput
> + * will free the inode.
> + */
> +static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> +					   struct btrfs_root *root,
> +					   struct inode *inode)
> +{
> +	struct btrfs_path *path;
> +	int ret;
> +	u64 nlink = 0;
> +	u64 ino = btrfs_ino(inode);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = count_inode_refs(root, inode, path);
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink = ret;
> +
> +	ret = count_inode_extrefs(root, inode, path);
> +	if (ret < 0)
> +		goto out;
> +
> +	nlink += ret;
> +
>  	if (nlink != inode->i_nlink) {
>  		set_nlink(inode, nlink);
>  		btrfs_update_inode(trans, root, inode);
> @@ -1046,9 +1236,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		ret = insert_orphan_item(trans, root, ino);
>  		BUG_ON(ret);
>  	}
> -	btrfs_free_path(path);
>  
> -	return 0;
> +out:
> +	btrfs_free_path(path);
> +	return ret;
>  }
>  
>  static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
> @@ -1695,6 +1886,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
>  			ret = add_inode_ref(wc->trans, root, log, path,
>  					    eb, i, &key);
>  			BUG_ON(ret && ret != -ENOENT);
> +		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
> +			ret = add_inode_ref(wc->trans, root, log, path,
> +					    eb, i, &key);
> +			BUG_ON(ret && ret != -ENOENT);
>  		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
>  			ret = replay_one_extent(wc->trans, root, path,
>  						eb, i, &key);

That's it for this round, here comes the promised patch. Should apply with "git
am --scissors".

Can you please add a version to the subject before sending the revised patches?
Like "git format-patch --subject-prefix='PATCH v4'".

Thanks,
-Jan

-- >8 --
Subject: [PATCH] Btrfs: improved readability for add_inode_ref

Moved part of the code into a sub function and replaced most of the gotos
by ifs, hoping that it will be easier to read now.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
---
 fs/btrfs/tree-log.c |  178 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 97 insertions(+), 81 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c86670f..59f3071 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -786,76 +786,18 @@ out:
 	return match;
 }

-
-/*
- * replay one inode back reference item found in the log tree.
- * eb, slot and key refer to the buffer and key found in the log tree.
- * root is the destination we are replaying into, and path is for temp
- * use by this function.  (it should be released on return).
- */
-static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
+static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
 				  struct btrfs_root *root,
-				  struct btrfs_root *log,
 				  struct btrfs_path *path,
-				  struct extent_buffer *eb, int slot,
-				  struct btrfs_key *key)
+				  struct btrfs_root *log_root,
+				  struct inode *dir, struct inode *inode,
+				  struct btrfs_key *key,
+				  struct extent_buffer *eb,
+				  struct btrfs_inode_ref *ref,
+				  char *name, int namelen, int *search_done)
 {
-	struct btrfs_inode_ref *ref;
-	struct btrfs_dir_item *di;
-	struct inode *dir;
-	struct inode *inode;
-	unsigned long ref_ptr;
-	unsigned long ref_end;
-	char *name;
-	int namelen;
 	int ret;
-	int search_done = 0;
-
-	/*
-	 * it is possible that we didn't log all the parent directories
-	 * for a given inode.  If we don't find the dir, just don't
-	 * copy the back ref in.  The link count fixup code will take
-	 * care of the rest
-	 */
-	dir = read_one_inode(root, key->offset);
-	if (!dir)
-		return -ENOENT;
-
-	inode = read_one_inode(root, key->objectid);
-	if (!inode) {
-		iput(dir);
-		return -EIO;
-	}
-
-	ref_ptr = btrfs_item_ptr_offset(eb, slot);
-	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
-
-again:
-	ref = (struct btrfs_inode_ref *)ref_ptr;
-
-	namelen = btrfs_inode_ref_name_len(eb, ref);
-	name = kmalloc(namelen, GFP_NOFS);
-	BUG_ON(!name);
-
-	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
-
-	/* if we already have a perfect match, we're done */
-	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
-			 btrfs_inode_ref_index(eb, ref),
-			 name, namelen)) {
-		goto out;
-	}
-
-	/*
-	 * look for a conflicting back reference in the metadata.
-	 * if we find one we have to unlink that name of the file
-	 * before we add our new link.  Later on, we overwrite any
-	 * existing back reference, and we don't want to create
-	 * dangling pointers in the directory.
-	 */
-
-	if (search_done)
-		goto insert;
+	struct btrfs_dir_item *di;

 	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
 	if (ret == 0) {
@@ -870,7 +812,7 @@ again:
 		 * if so, just jump out, we're done
 		 */
 		if (key->objectid == key->offset)
-			goto out_nowrite;
+			return 1;

 		/* check all the names in this back reference to see
 		 * if they are in the log.  if so, we allow them to stay
@@ -889,7 +831,7 @@ again:
 					   (unsigned long)(victim_ref + 1),
 					   victim_name_len);

-			if (!backref_in_log(log, key, victim_name,
+			if (!backref_in_log(log_root, key, victim_name,
 					    victim_name_len)) {
 				btrfs_inc_nlink(inode);
 				btrfs_release_path(path);
@@ -908,7 +850,7 @@ again:
 		 * NOTE: we have searched root tree and checked the
 		 * coresponding ref, it does not need to check again.
 		 */
-		search_done = 1;
+		*search_done = 1;
 	}
 	btrfs_release_path(path);

@@ -931,25 +873,99 @@ again:
 	}
 	btrfs_release_path(path);

-insert:
-	/* insert our name */
-	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
-			     btrfs_inode_ref_index(eb, ref));
-	BUG_ON(ret);
+	return 0;
+}
+
+/*
+ * replay one inode back reference item found in the log tree.
+ * eb, slot and key refer to the buffer and key found in the log tree.
+ * root is the destination we are replaying into, and path is for temp
+ * use by this function.  (it should be released on return).
+ */
+static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  struct btrfs_root *log,
+				  struct btrfs_path *path,
+				  struct extent_buffer *eb, int slot,
+				  struct btrfs_key *key)
+{
+	struct btrfs_inode_ref *ref;
+	struct inode *dir;
+	struct inode *inode;
+	unsigned long ref_ptr;
+	unsigned long ref_end;
+	char *name;
+	int namelen;
+	int ret;
+	int search_done = 0;
+
+	/*
+	 * it is possible that we didn't log all the parent directories
+	 * for a given inode.  If we don't find the dir, just don't
+	 * copy the back ref in.  The link count fixup code will take
+	 * care of the rest
+	 */
+	dir = read_one_inode(root, key->offset);
+	if (!dir)
+		return -ENOENT;

-	btrfs_update_inode(trans, root, inode);
+	inode = read_one_inode(root, key->objectid);
+	if (!inode) {
+		iput(dir);
+		return -EIO;
+	}

-out:
-	ref_ptr = (unsigned long)(ref + 1) + namelen;
-	kfree(name);
-	if (ref_ptr < ref_end)
-		goto again;
+	ref_ptr = btrfs_item_ptr_offset(eb, slot);
+	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
+
+	while (ref_ptr < ref_end) {
+		ref = (struct btrfs_inode_ref *)ref_ptr;
+
+		namelen = btrfs_inode_ref_name_len(eb, ref);
+		name = kmalloc(namelen, GFP_NOFS);
+		BUG_ON(!name);
+
+		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
+
+		/* if we already have a perfect match, we're done */
+		if (!inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
+				  btrfs_inode_ref_index(eb, ref),
+				  name, namelen)) {
+			/*
+			 * look for a conflicting back reference in the
+			 * metadata. if we find one we have to unlink that name
+			 * of the file before we add our new link.  Later on, we
+			 * overwrite any existing back reference, and we don't
+			 * want to create dangling pointers in the directory.
+			 */
+
+			if (!search_done) {
+				ret = __add_inode_ref(trans, root, path, log,
+						      dir, inode, key, eb, ref,
+						      name, namelen,
+						      &search_done);
+				if (ret == 1)
+					goto out;
+				BUG_ON(ret);
+			}
+
+			/* insert our name */
+			ret = btrfs_add_link(trans, dir, inode, name, namelen,
+					     0, btrfs_inode_ref_index(eb, ref));
+			BUG_ON(ret);
+
+			btrfs_update_inode(trans, root, inode);
+		}
+
+		ref_ptr = (unsigned long)(ref + 1) + namelen;
+		kfree(name);
+	}

 	/* finally write the back reference in the inode */
 	ret = overwrite_item(trans, root, path, eb, slot, key);
 	BUG_ON(ret);

-out_nowrite:
+out:
 	btrfs_release_path(path);
 	iput(dir);
 	iput(inode);
-- 
1.7.1


* [PATCH 2/3] btrfs: extended inode refs
  2012-08-08 18:55 [PATCH 0/3] " Mark Fasheh
@ 2012-08-08 18:55 ` Mark Fasheh
  2012-08-15 15:04   ` Jan Schmidt
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Fasheh @ 2012-08-08 18:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, Jan Schmidt, Mark Fasheh

Teach tree-log.c about extended inode refs. In particular, we have to adjust
the behavior of inode ref replay as well as log tree recovery to account for
the existence of extended refs.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/backref.c  |   68 ++++++++++++
 fs/btrfs/backref.h  |    5 +
 fs/btrfs/tree-log.c |  297 ++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 319 insertions(+), 51 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index a383c18..658e09c 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1111,6 +1111,74 @@ static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
 				found_key);
 }
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_extref,
+			  u64 *found_off)
+{
+	int ret, slot;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
+	unsigned long ptr;
+
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = start_off;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		return ret;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(leaf)) {
+			/*
+			 * If the item at offset is not found,
+			 * btrfs_search_slot will point us to the slot
+			 * where it should be inserted. In our case
+			 * that will be the slot directly before the
+			 * next BTRFS_INODE_EXTREF_KEY item. In the case
+			 * that we're pointing to the last slot in a
+			 * leaf, we must move one leaf over.
+			 */
+			ret = btrfs_next_leaf(root, path);
+			if (ret) {
+				if (ret >= 1)
+					ret = -ENOENT;
+				break;
+			}
+			continue;
+		}
+
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		/*
+		 * Check that we're still looking at an extended ref key for
+		 * this particular objectid. If we have different
+		 * objectid or type then there are no more to be found
+		 * in the tree and we can exit.
+		 */
+		ret = -ENOENT;
+		if (found_key.objectid != inode_objectid)
+			break;
+		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = 0;
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+		extref = (struct btrfs_inode_extref *)ptr;
+		*ret_extref = extref;
+		if (found_off)
+			*found_off = found_key.offset;
+		break;
+	}
+
+	return ret;
+}
+
 /*
  * this iterates to turn a btrfs_inode_ref into a full filesystem path. elements
  * of the path are separated by '/' and the path is guaranteed to be
diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index c18d8ac..9f3e251 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -66,4 +66,9 @@ struct inode_fs_paths *init_ipath(s32 total_bytes, struct btrfs_root *fs_root,
 					struct btrfs_path *path);
 void free_ipath(struct inode_fs_paths *ipath);
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_extref,
+			  u64 *found_off);
+
 #endif
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8abeae4..e5ba0a4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -23,8 +23,10 @@
 #include "disk-io.h"
 #include "locking.h"
 #include "print-tree.h"
+#include "backref.h"
 #include "compat.h"
 #include "tree-log.h"
+#include "hash.h"
 
 /* magic values for the inode_only field in btrfs_log_inode:
  *
@@ -764,8 +766,16 @@ static noinline int backref_in_log(struct btrfs_root *log,
 	if (ret != 0)
 		goto out;
 
-	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		if (btrfs_find_name_in_ext_backref(path, name, namelen, NULL))
+			match = 1;
+
+		goto out;
+	}
+
+	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr_end = ptr + item_size;
 	while (ptr < ptr_end) {
 		ref = (struct btrfs_inode_ref *)ptr;
@@ -786,6 +796,47 @@ out:
 	return match;
 }
 
+static int extref_get_fields(struct extent_buffer *eb, int slot,
+			     u32 *namelen, char **name, u64 *index,
+			     u64 *parent_objectid)
+{
+	struct btrfs_inode_extref *extref;
+
+	extref = (struct btrfs_inode_extref *)btrfs_item_ptr_offset(eb, slot);
+
+	*namelen = btrfs_inode_extref_name_len(eb, extref);
+	*name = kmalloc(*namelen, GFP_NOFS);
+	if (*name == NULL)
+		return -ENOMEM;
+
+	read_extent_buffer(eb, *name, (unsigned long)&extref->name,
+			   *namelen);
+
+	*index = btrfs_inode_extref_index(eb, extref);
+	if (parent_objectid)
+		*parent_objectid = btrfs_inode_extref_parent(eb, extref);
+
+	return 0;
+}
+
+static int ref_get_fields(struct extent_buffer *eb, int slot, u32 *namelen,
+			  char **name, u64 *index)
+{
+	struct btrfs_inode_ref *ref;
+
+	ref = (struct btrfs_inode_ref *)btrfs_item_ptr_offset(eb, slot);
+
+	*namelen = btrfs_inode_ref_name_len(eb, ref);
+	*name = kmalloc(*namelen, GFP_NOFS);
+	if (*name == NULL)
+		return -ENOMEM;
+
+	read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
+
+	*index = btrfs_inode_ref_index(eb, ref);
+
+	return 0;
+}
 
 /*
  * replay one inode back reference item found in the log tree.
@@ -801,15 +852,50 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 				  struct btrfs_key *key)
 {
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	struct btrfs_dir_item *di;
+	struct btrfs_key search_key;
 	struct inode *dir;
 	struct inode *inode;
 	unsigned long ref_ptr;
 	unsigned long ref_end;
 	char *name;
+	char *victim_name;
 	int namelen;
+	int victim_name_len;
 	int ret;
 	int search_done = 0;
+	int log_ref_ver = 0;
+	u64 parent_objectid;
+	u64 inode_objectid;
+	u64 ref_index;
+	struct extent_buffer *leaf;
+	int ref_struct_size;
+
+
+	ref_ptr = btrfs_item_ptr_offset(eb, slot);
+	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		ref_struct_size = sizeof(*extref);
+		log_ref_ver = 1;
+
+		ret = extref_get_fields(eb, slot, &namelen, &name, &ref_index,
+					&parent_objectid);
+		if (ret)
+			return ret;
+	} else {
+		ref_struct_size = sizeof(*ref);
+
+		ret = ref_get_fields(eb, slot, &namelen, &name, &ref_index);
+		if (ret)
+			return ret;
+
+
+		parent_objectid = key->offset;
+	}
+
+	inode_objectid = key->objectid;
 
 	/*
 	 * it is possible that we didn't log all the parent directories
@@ -817,32 +903,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 	 * copy the back ref in.  The link count fixup code will take
 	 * care of the rest
 	 */
-	dir = read_one_inode(root, key->offset);
+	dir = read_one_inode(root, parent_objectid);
 	if (!dir)
 		return -ENOENT;
 
-	inode = read_one_inode(root, key->objectid);
+	inode = read_one_inode(root, inode_objectid);
 	if (!inode) {
 		iput(dir);
 		return -EIO;
 	}
 
-	ref_ptr = btrfs_item_ptr_offset(eb, slot);
-	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
-
 again:
-	ref = (struct btrfs_inode_ref *)ref_ptr;
-
-	namelen = btrfs_inode_ref_name_len(eb, ref);
-	name = kmalloc(namelen, GFP_NOFS);
-	BUG_ON(!name);
-
-	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
-
 	/* if we already have a perfect match, we're done */
 	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
-			 btrfs_inode_ref_index(eb, ref),
-			 name, namelen)) {
+			 ref_index, name, namelen)) {
 		goto out;
 	}
 
@@ -857,19 +931,23 @@ again:
 	if (search_done)
 		goto insert;
 
-	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+	/* Search old style refs */
+	search_key.objectid = inode_objectid;
+	search_key.type = BTRFS_INODE_REF_KEY;
+	search_key.offset = parent_objectid;
+
+	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
 	if (ret == 0) {
-		char *victim_name;
-		int victim_name_len;
 		struct btrfs_inode_ref *victim_ref;
 		unsigned long ptr;
 		unsigned long ptr_end;
-		struct extent_buffer *leaf = path->nodes[0];
+
+		leaf = path->nodes[0];
 
 		/* are we trying to overwrite a back ref for the root directory
 		 * if so, just jump out, we're done
 		 */
-		if (key->objectid == key->offset)
+		if (search_key.objectid == search_key.offset)
 			goto out_nowrite;
 
 		/* check all the names in this back reference to see
@@ -889,7 +967,7 @@ again:
 					   (unsigned long)(victim_ref + 1),
 					   victim_name_len);
 
-			if (!backref_in_log(log, key, victim_name,
+			if (!backref_in_log(log, &search_key, victim_name,
 					    victim_name_len)) {
 				btrfs_inc_nlink(inode);
 				btrfs_release_path(path);
@@ -903,19 +981,61 @@ again:
 			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
 		}
 		BUG_ON(ret);
-
 		/*
 		 * NOTE: we have searched root tree and checked the
-		 * coresponding ref, it does not need to check again.
+		 * corresponding refs, it does not need to be checked again.
 		 */
 		search_done = 1;
 	}
 	btrfs_release_path(path);
 
+	/* Same search but for extended refs */
+	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
+					   inode_objectid, parent_objectid, 0,
+					   0);
+	if (!IS_ERR_OR_NULL(extref)) {
+		u32 item_size;
+		u32 cur_offset = 0;
+		unsigned long base;
+
+		leaf = path->nodes[0];
+
+		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+		base = btrfs_item_ptr_offset(leaf, path->slots[0]);
+
+		while (cur_offset < item_size) {
+			extref = (struct btrfs_inode_extref *)(base + cur_offset);
+
+			victim_name_len = btrfs_inode_extref_name_len(leaf, extref);
+			victim_name = kmalloc(victim_name_len, GFP_NOFS);
+			read_extent_buffer(leaf, victim_name,
+					   (unsigned long)&extref->name, victim_name_len);
+
+			search_key.objectid = inode_objectid;
+			search_key.type = BTRFS_INODE_EXTREF_KEY;
+			search_key.offset = btrfs_extref_hash(parent_objectid,
+							      victim_name, victim_name_len);
+			if (!backref_in_log(log, &search_key, victim_name,
+					    victim_name_len)) {
+				btrfs_inc_nlink(inode);
+				btrfs_release_path(path);
+
+				ret = btrfs_unlink_inode(trans, root, dir,
+							 inode, victim_name,
+							 victim_name_len);
+			}
+			kfree(victim_name);
+			BUG_ON(ret);
+
+			cur_offset += victim_name_len + sizeof(*extref);
+		}
+		search_done = 1;
+	}
+
+
 	/* look for a conflicting sequence number */
 	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
-					 btrfs_inode_ref_index(eb, ref),
-					 name, namelen, 0);
+					 ref_index, name, namelen, 0);
 	if (di && !IS_ERR(di)) {
 		ret = drop_one_dir_item(trans, root, path, dir, di);
 		BUG_ON(ret);
@@ -933,17 +1053,25 @@ again:
 
 insert:
 	/* insert our name */
-	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
-			     btrfs_inode_ref_index(eb, ref));
+	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
 	BUG_ON(ret);
 
 	btrfs_update_inode(trans, root, inode);
 
 out:
-	ref_ptr = (unsigned long)(ref + 1) + namelen;
+	ref_ptr = (unsigned long)(ref_ptr + ref_struct_size) + namelen;
 	kfree(name);
-	if (ref_ptr < ref_end)
+	if (ref_ptr < ref_end) {
+		if (log_ref_ver) {
+			ret = extref_get_fields(eb, slot, &namelen, &name,
+						&ref_index, NULL);
+		} else {
+			ret = ref_get_fields(eb, slot, &namelen, &name,
+					     &ref_index);
+		}
+		BUG_ON(ret);
 		goto again;
+	}
 
 	/* finally write the back reference in the inode */
 	ret = overwrite_item(trans, root, path, eb, slot, key);
@@ -966,25 +1094,52 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int count_inode_extrefs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
+{
+	int ret = 0;
+	int name_len;
+	unsigned int nlink = 0;
+	u32 item_size;
+	u32 cur_offset = 0;
+	u64 inode_objectid = btrfs_ino(inode);
+	u64 offset = 0;
+	unsigned long ptr;
+	struct btrfs_inode_extref *extref;
+	struct extent_buffer *leaf;
+
+	while (1) {
+		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
+					    &extref, &offset);
+		if (ret)
+			break;
 
-/*
- * There are a few corners where the link count of the file can't
- * be properly maintained during replay.  So, instead of adding
- * lots of complexity to the log code, we just scan the backrefs
- * for any file that has been through replay.
- *
- * The scan will update the link count on the inode to reflect the
- * number of back refs found.  If it goes down to zero, the iput
- * will free the inode.
- */
-static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
-					   struct btrfs_root *root,
-					   struct inode *inode)
+		leaf = path->nodes[0];
+		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+
+		while (cur_offset < item_size) {
+			extref = (struct btrfs_inode_extref *) (ptr + cur_offset);
+			name_len = btrfs_inode_extref_name_len(leaf, extref);
+
+			nlink++;
+
+			cur_offset += name_len + sizeof(*extref);
+		}
+
+		offset++;
+	}
+	btrfs_release_path(path);
+
+	return (ret == 0) ? nlink : ret;
+}
+
+static int count_inode_refs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
 {
-	struct btrfs_path *path;
 	int ret;
 	struct btrfs_key key;
-	u64 nlink = 0;
+	unsigned int nlink = 0;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	int name_len;
@@ -994,10 +1149,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_INODE_REF_KEY;
 	key.offset = (u64)-1;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	while (1) {
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0)
@@ -1031,6 +1182,45 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+
+	return nlink;
+}
+
+/*
+ * There are a few corners where the link count of the file can't
+ * be properly maintained during replay.  So, instead of adding
+ * lots of complexity to the log code, we just scan the backrefs
+ * for any file that has been through replay.
+ *
+ * The scan will update the link count on the inode to reflect the
+ * number of back refs found.  If it goes down to zero, the iput
+ * will free the inode.
+ */
+static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *root,
+					   struct inode *inode)
+{
+	struct btrfs_path *path;
+	int ret;
+	u64 nlink = 0;
+	u64 ino = btrfs_ino(inode);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = count_inode_refs(root, inode, path);
+	if (ret < 0)
+		goto out;
+
+	nlink = ret;
+
+	ret = count_inode_extrefs(root, inode, path);
+	if (ret < 0)
+		goto out;
+
+	nlink += ret;
+
 	if (nlink != inode->i_nlink) {
 		set_nlink(inode, nlink);
 		btrfs_update_inode(trans, root, inode);
@@ -1046,9 +1236,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		ret = insert_orphan_item(trans, root, ino);
 		BUG_ON(ret);
 	}
-	btrfs_free_path(path);
 
-	return 0;
+out:
+	btrfs_free_path(path);
+	return ret;
 }
 
 static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
@@ -1695,6 +1886,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
 			ret = add_inode_ref(wc->trans, root, log, path,
 					    eb, i, &key);
 			BUG_ON(ret && ret != -ENOENT);
+		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
+			ret = add_inode_ref(wc->trans, root, log, path,
+					    eb, i, &key);
+			BUG_ON(ret && ret != -ENOENT);
 		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
 			ret = replay_one_extent(wc->trans, root, path,
 						eb, i, &key);
-- 
1.7.7



* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-05-03 23:12     ` Mark Fasheh
@ 2012-05-04 11:39       ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2012-05-04 11:39 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Jan Schmidt, linux-btrfs, Chris Mason, Josef Bacik

On Thu, May 03, 2012 at 04:12:21PM -0700, Mark Fasheh wrote:
> > > +
> > > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > > +
> > > +		/* So that we don't loop back looking for old style log refs. */
> > > +		ref_end = ref_ptr;
> > > +
> > > +		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
> > > +		namelen = btrfs_inode_extref_name_len(eb, extref);
> > > +		name = kmalloc(namelen, GFP_NOFS);
> > 
> > kmalloc may fail.
> 
> Fixed both instances of this. I'm just testing for null return from kmalloc
> and bubbling the -ENOMEM back up. The callers of add_inode_ref() will wind
> up BUGing on us anyway but that's beyond the scope of this patch.

Yes, this is consistent with the rest of the no-mem handling. Fixing all
caller paths is not always trivial, and one does not want to do it during
patch development.


david


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-04-12 13:08   ` Jan Schmidt
@ 2012-05-03 23:12     ` Mark Fasheh
  2012-05-04 11:39       ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Fasheh @ 2012-05-03 23:12 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason, Josef Bacik

On Thu, Apr 12, 2012 at 03:08:35PM +0200, Jan Schmidt wrote:
> On 05.04.2012 22:09, Mark Fasheh wrote:
> > Teach tree-log.c about extended inode refs. In particular, we have to adjust
> > the behavior of inode ref replay as well as log tree recovery to account for
> > the existence of extended refs.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/btrfs/tree-log.c |  320 +++++++++++++++++++++++++++++++++++++++++---------
> >  fs/btrfs/tree-log.h |    4 +
> >  2 files changed, 266 insertions(+), 58 deletions(-)
> > 
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 966cc74..d69b07a 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -23,6 +23,7 @@
> >  #include "disk-io.h"
> >  #include "locking.h"
> >  #include "print-tree.h"
> > +#include "backref.h"
> 
> So now tree-log.c includes backref.h and backref.c includes tree-log.h,
> this is not a problem by itself, but I'd try to avoid such dependencies.
> I'd put find_one_extref over to backref.c to solve this, also because
> there it would be closer to inode_ref_info (which does the same for
> INODE_REFs).

Yeah good idea. I went ahead and moved find_one_extref to backref.c as you
suggest.


> > @@ -786,7 +804,6 @@ out:
> >  	return match;
> >  }
> >  
> > -
> >  /*
> >   * replay one inode back reference item found in the log tree.
> >   * eb, slot and key refer to the buffer and key found in the log tree.
> > @@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> >  				  struct btrfs_key *key)
> >  {
> >  	struct btrfs_inode_ref *ref;
> > +	struct btrfs_inode_extref *extref;
> >  	struct btrfs_dir_item *di;
> > +	struct btrfs_key search_key;
> 
> You don't need search_key, just use key from the parameter list as the
> code did previously.

I force the value of search_key (to look for refs), however, while the key
value from the parameter list could be for any kind of inode ref.



> >  	struct inode *dir;
> >  	struct inode *inode;
> >  	unsigned long ref_ptr;
> >  	unsigned long ref_end;
> > -	char *name;
> > -	int namelen;
> > +	char *name, *victim_name;
> > +	int namelen, victim_name_len;
> 
> split
> 
> >  	int ret;
> >  	int search_done = 0;
> > +	int log_ref_ver = 0;
> > +	u64 parent_objectid, inode_objectid, ref_index;
> 
> split

done, thanks.


> > +	struct extent_buffer *leaf;
> >  
> >  	/*
> >  	 * it is possible that we didn't log all the parent directories
> > @@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
> >  	 * copy the back ref in.  The link count fixup code will take
> >  	 * care of the rest
> >  	 */
> > -	dir = read_one_inode(root, key->offset);
> > +
> > +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> > +		log_ref_ver = 1;
>                 ^^^^^^^^^^^^^^^
> Assigned but never used.

Almost got rid of this but it turns out after some changes, I'm using it
now :)


> > +
> > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > +
> > +		/* So that we don't loop back looking for old style log refs. */
> > +		ref_end = ref_ptr;
> > +
> > +		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
> > +		namelen = btrfs_inode_extref_name_len(eb, extref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> 
> kmalloc may fail.

Fixed both instances of this. I'm just testing for null return from kmalloc
and bubbling the -ENOMEM back up. The callers of add_inode_ref() will wind
up BUGing on us anyway but that's beyond the scope of this patch.


> > +
> > +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
> > +				   namelen);
> > +
> > +		ref_index = btrfs_inode_extref_index(eb, extref);
> > +		parent_objectid = btrfs_inode_extref_parent(eb, extref);
> > +	} else {
> > +		parent_objectid = key->offset;
> > +
> > +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> > +		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> > +
> > +		ref = (struct btrfs_inode_ref *)ref_ptr;
> > +		namelen = btrfs_inode_ref_name_len(eb, ref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> > +		BUG_ON(!name);
> > +
> > +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> > +
> > +		ref_index = btrfs_inode_ref_index(eb, ref);
> 
> The way you put it, you had to copy all the code from the else block
> down before the "goto again" case. Code duplication is error prone.
>
> Maybe you can manage to put everything between "again:" and "goto again"
> into a separate function that can be called multiple times. For the old
> INODE_REF items that could be done in a loop then. This would also split
> an overly long function into more readable units.

Hmm, ok I pushed the variable setting code into a function for each type of
ref. That cleans things up and keeps the code centralized.
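
A rough userspace sketch of what such a per-type helper could look like for
old-style refs, including the -ENOMEM return discussed above. Everything
here is an assumption for illustration: the names (sketch_ref_get_fields,
sketch_get_le) are hypothetical, malloc() stands in for kmalloc(), and a
plain byte buffer stands in for the extent buffer. Only the field layout
(__le64 index, __le16 name_len, then the name) comes from the cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Packed header of an old-style ref: __le64 index + __le16 name_len. */
#define SKETCH_REF_HDR_LEN 10

/* Decode an nbytes-wide little-endian integer starting at p. */
static uint64_t sketch_get_le(const unsigned char *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: decode the ref that starts at buf[off].
 * On success *name is a freshly allocated copy that the caller must free().
 * Returns 0, -EINVAL for a truncated item, or -ENOMEM if allocation fails.
 */
static int sketch_ref_get_fields(const unsigned char *buf, size_t buf_len,
				 size_t off, uint16_t *namelen, char **name,
				 uint64_t *index)
{
	if (off > buf_len || buf_len - off < SKETCH_REF_HDR_LEN)
		return -EINVAL;

	*index = sketch_get_le(buf + off, 8);
	*namelen = (uint16_t)sketch_get_le(buf + off + 8, 2);
	if (buf_len - off - SKETCH_REF_HDR_LEN < *namelen)
		return -EINVAL;

	/* Allocate at least one byte so a zero-length name is not NULL. */
	*name = malloc(*namelen ? *namelen : 1);
	if (!*name)
		return -ENOMEM;	/* bubble the error up instead of BUG_ON() */

	memcpy(*name, buf + off + SKETCH_REF_HDR_LEN, *namelen);
	return 0;
}
```

With the decoding in one place, the initial setup and the "goto again" path
can both call the same function instead of carrying two copies of it.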

> > +	/* Same search but for extended refs */
> > +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> > +					   inode_objectid, parent_objectid, 0,
> > +					   0);
> > +	if (extref && !IS_ERR(extref)) {
> 
> if (!IS_ERR_OR_NULL(extref))

Nice, thanks for the tip.
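
For readers unfamiliar with the helper, a small userspace sketch of why the
two tests are interchangeable. The sketch_* functions are simplified
stand-ins written for this illustration (assumption: they mimic the idea of
the kernel's err.h helpers, where the top 4095 address values encode errno
codes); they are not the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}

/* The open-coded test from the patch. */
static bool usable_open_coded(const void *extref)
{
	return extref && !sketch_is_err(extref);
}

/* The suggested replacement: same truth table, one call. */
static bool usable_helper(const void *extref)
{
	return !sketch_is_err_or_null(extref);
}
```

Both variants accept a valid pointer and reject NULL as well as an encoded
error such as sketch_err_ptr(-ENOENT).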



> > +		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> > +		victim_name = kmalloc(namelen, GFP_NOFS);
> > +		leaf = path->nodes[0];
> > +		read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
> > +
> > +		search_key.objectid = inode_objectid;
> > +		search_key.type = BTRFS_INODE_EXTREF_KEY;
> > +		search_key.offset = btrfs_extref_key_off(parent_objectid, name, namelen);
> > +		if (!backref_in_log(log, &search_key, victim_name,
> > +				    victim_name_len)) {
> > +			btrfs_inc_nlink(inode);
> > +			btrfs_release_path(path);
> > +
> > +			ret = btrfs_unlink_inode(trans, root, dir,
> > +						 inode, victim_name,
> > +						 victim_name_len);
> > +		}
> > +		kfree(victim_name);
> > +		BUG_ON(ret);
> > +	}
> > +
> > +	/*
> > +	 * NOTE: we have searched root tree and checked the
> > +	 * coresponding refs, it does not need to be checked again.
> > +	 */
> > +	search_done = 1;
> 
> You moved this comment and assignment out of the "if (ret == 0)" case.
> I'm not sure if this is still doing exactly the same now (?).
> Previously, we were executing another btrfs_search_slot,
> btrfs_lookup_dir_index_item, ... after the "goto again" case, which
> would be skipped with this patch.

Hmm, ok you're definitely right that the search_done line there is broken.
Come to think of it, I'm not quite sure what the meaning of that tiny bit of
code was. I'll come back to this one once I've looked closer.


> > +
> >  	/* look for a conflicting sequence number */
> >  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> > -					 btrfs_inode_ref_index(eb, ref),
> > -					 name, namelen, 0);
> > +					 ref_index, name, namelen, 0);
> >  	if (di && !IS_ERR(di)) {
> >  		ret = drop_one_dir_item(trans, root, path, dir, di);
> >  		BUG_ON(ret);
> > @@ -932,17 +1007,25 @@ again:
> >  
> >  insert:
> >  	/* insert our name */
> > -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> > -			     btrfs_inode_ref_index(eb, ref));
> > +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
> >  	BUG_ON(ret);
> >  
> >  	btrfs_update_inode(trans, root, inode);
> >  
> >  out:
> > -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> > +	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref)) + namelen;
> >  	kfree(name);
> > -	if (ref_ptr < ref_end)
> > +	if (ref_ptr < ref_end) {
> > +		ref = (struct btrfs_inode_ref *)ref_ptr;
> > +		namelen = btrfs_inode_ref_name_len(eb, ref);
> > +		name = kmalloc(namelen, GFP_NOFS);
> > +		BUG_ON(!name);
> > +
> > +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> > +
> > +		ref_index = btrfs_inode_ref_index(eb, ref);
> 
> This is the duplicated code mentioned above.
> 
> >  		goto again;
> > +	}
> >  
> >  	/* finally write the back reference in the inode */
> >  	ret = overwrite_item(trans, root, path, eb, slot, key);
> > @@ -965,25 +1048,103 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
> >  	return ret;
> >  }
> >  
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> 
> ret_extref

fixed.

> 
> > +{
> > +	int ret, slot;
> > +	struct btrfs_key key, found_key;
> 
> split

fixed.


> > +	struct btrfs_inode_extref *ref;
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_item *item;
> > +	unsigned long ptr;
> >  
> > -/*
> > - * There are a few corners where the link count of the file can't
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> 
> [side note: this is really strange patch alignment. my git is doing it
> the same way.]
> 
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = start_off;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > +	if (ret < 0)
> > +		goto out;
> 
> Just return here and drop the out label.

Done.

> 
> > +
> > +	while (1) {
> > +		leaf = path->nodes[0];
> > +		slot = path->slots[0];
> > +		if (slot >= btrfs_header_nritems(leaf)) {
> > +			/*
> > +			 * If the item at offset is not found,
> > +			 * btrfs_search_slot will point us to the slot
> > +			 * where it should be inserted. In our case
> > +			 * that will be the slot directly before the
> > +			 * next INODE_REF_KEY_V2 item. In the case
> > +			 * that we're pointing to the last slot in a
> > +			 * leaf, we must move one leaf over.
> > +			 */
> > +			ret = btrfs_next_leaf(root, path);
> > +			if (ret) {
> > +				if (ret >= 1)
> > +					ret = -ENOENT;
> > +				break;
> > +			}
> 
> This large block (and its comment) can be replaced by
> btrfs_search_slot_for_read with find_higher = 1, return_any = 0. This
> also avoids the "continue" and we even get rid of the whole while (1)
> loop here.

Ok. I don't have that in my tree but presumably it's a very new function I
just need to pick up with an update...



> > +			continue;
> > +		}
> > +
> > +		item = btrfs_item_nr(leaf, slot);
> > +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> > +
> > +		/*
> > +		 * Check that we're still looking at an extended ref key for
> > +		 * this particular objectid. If we have different
> > +		 * objectid or type then there are no more to be found
> > +		 * in the tree and we can exit.
> > +		 */
> > +		ret = -ENOENT;
> > +		if (found_key.objectid != inode_objectid)
> > +			break;
> > +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> > +			break;
> > +
> > +		ret = 0;
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +		ref = (struct btrfs_inode_extref *)ptr;
> > +		*ret_ref = ref;
> > +		if (found_off)
> > +			*found_off = found_key.offset + 1;
> > +		break;
> > +	}
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	unsigned int nlink = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	struct btrfs_inode_extref *ref;
> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &ref, &offset);
> > +		if (ret)
> > +			break;
> 
> Assume the first call to btrfs_find_one_extref returns -EIO. Do we
> really want count_inode_extrefs return 0 here? I agree that the previous
> code suffers from the same problem, but still: it's a problem.

Yeah as you note, I'm just keeping the same behavior as before. This I think
is probably a question for Chris...


> > +
> > +		nlink++;
> > +		offset++;
> > +	}
> > +
> > +	return nlink;
> > +}
> > +
> > +static int count_inode_refs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> >  {
> > -	struct btrfs_path *path;
> >  	int ret;
> >  	struct btrfs_key key;
> > -	u64 nlink = 0;
> > +	unsigned int nlink = 0;
> >  	unsigned long ptr;
> >  	unsigned long ptr_end;
> >  	int name_len;
> > @@ -993,10 +1154,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  	key.type = BTRFS_INODE_REF_KEY;
> >  	key.offset = (u64)-1;
> >  
> > -	path = btrfs_alloc_path();
> > -	if (!path)
> > -		return -ENOMEM;
> > -
> >  	while (1) {
> >  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> >  		if (ret < 0)
> > @@ -1030,6 +1187,48 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> >  		btrfs_release_path(path);
> >  	}
> >  	btrfs_release_path(path);
> > +	btrfs_free_path(path);
> 
> In general, you must not free a path when you don't allocate it.

Ok yeah that one must've gotten past me. Fixed.


> 
> > +
> > +	return nlink;
> > +}
> > +
> > +/*
> > + * There are a few corners where the link count of the file can't
> > + * be properly maintained during replay.  So, instead of adding
> > + * lots of complexity to the log code, we just scan the backrefs
> > + * for any file that has been through replay.
> > + *
> > + * The scan will update the link count on the inode to reflect the
> > + * number of back refs found.  If it goes down to zero, the iput
> > + * will free the inode.
> > + */
> > +static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > +					   struct btrfs_root *root,
> > +					   struct inode *inode)
> > +{
> > +	struct btrfs_path *path;
> > +	int ret;
> > +	u64 nlink = 0;
> > +	u64 ino = btrfs_ino(inode);
> > +
> > +	path = btrfs_alloc_path();
> > +	if (!path)
> > +		return -ENOMEM;
> > +
> > +	ret = count_inode_refs(root, inode, path);
> > +	btrfs_release_path(path);
> 
> Either count_inode_refs should alloc its private path, or it should
> return the path in a released state. The caller should not be
> responsible to pass a clean path and release it afterwards, as
> count_inode_refs does not return anything through the path.
> 
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	nlink = ret;
> > +
> > +	ret = count_inode_extrefs(root, inode, path);
> > +	btrfs_release_path(path);
> 
> Same here. I'd just put the btrfs_release_path into count_inode_(ext)refs.

Yeah, fixed those up.
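
The ownership rule being agreed on here can be sketched in userspace as
follows. This is only an illustration under stated assumptions: struct
sketch_path and all sketch_* functions are hypothetical toys (a single flag
replaces the node references a real btrfs_path holds), not the btrfs API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_path {
	int held;	/* non-zero while the path references tree nodes */
};

static struct sketch_path *sketch_alloc_path(void)
{
	return calloc(1, sizeof(struct sketch_path));
}

static void sketch_release_path(struct sketch_path *p)
{
	p->held = 0;
}

static void sketch_free_path(struct sketch_path *p)
{
	if (!p)
		return;
	sketch_release_path(p);
	free(p);
}

/* Callee: borrows the path, hands it back released, never frees it. */
static int sketch_count_refs(struct sketch_path *path, int refs_in_tree)
{
	int nlink;

	path->held = 1;		/* pretend a search positioned the path */
	nlink = refs_in_tree;
	sketch_release_path(path);
	return nlink;
}

/* Caller: allocates the path once and is the only one to free it. */
static int sketch_fixup_link_count(int refs, int extrefs)
{
	struct sketch_path *path = sketch_alloc_path();
	int nlink;
	int ret;

	if (!path)
		return -ENOMEM;

	ret = sketch_count_refs(path, refs);
	if (ret < 0)
		goto out;
	nlink = ret;

	ret = sketch_count_refs(path, extrefs);
	if (ret < 0)
		goto out;
	ret += nlink;
out:
	sketch_free_path(path);
	return ret;
}
```

The single "out" label mirrors the shape of the reworked
fixup_inode_link_count(): every exit after the allocation goes through one
free.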

Thanks again Jan!
	--Mark

--
Mark Fasheh


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-04-12 15:53   ` Jan Schmidt
@ 2012-05-01 18:39     ` Mark Fasheh
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-05-01 18:39 UTC (permalink / raw)
  To: Jan Schmidt; +Cc: linux-btrfs, Chris Mason, Josef Bacik

On Thu, Apr 12, 2012 at 05:53:15PM +0200, Jan Schmidt wrote:
> Hi Mark,
> 
> While reading 3/3 I stumbled across one more thing in this one:
> 
> On 05.04.2012 22:09, Mark Fasheh wrote:
> > +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> > +			  u64 start_off, struct btrfs_path *path,
> > +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> > +{
> > +	int ret, slot;
> > +	struct btrfs_key key, found_key;
> > +	struct btrfs_inode_extref *ref;
> > +	struct extent_buffer *leaf;
> > +	struct btrfs_item *item;
> > +	unsigned long ptr;
> >  
> > -/*
> > - * There are a few corners where the link count of the file can't
> > - * be properly maintained during replay.  So, instead of adding
> > - * lots of complexity to the log code, we just scan the backrefs
> > - * for any file that has been through replay.
> > - *
> > - * The scan will update the link count on the inode to reflect the
> > - * number of back refs found.  If it goes down to zero, the iput
> > - * will free the inode.
> > - */
> > -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> > -					   struct btrfs_root *root,
> > -					   struct inode *inode)
> > +	key.objectid = inode_objectid;
> > +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> > +	key.offset = start_off;
> > +
> > +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	while (1) {
> > +		leaf = path->nodes[0];
> > +		slot = path->slots[0];
> > +		if (slot >= btrfs_header_nritems(leaf)) {
> > +			/*
> > +			 * If the item at offset is not found,
> > +			 * btrfs_search_slot will point us to the slot
> > +			 * where it should be inserted. In our case
> > +			 * that will be the slot directly before the
> > +			 * next INODE_REF_KEY_V2 item. In the case
> > +			 * that we're pointing to the last slot in a
> > +			 * leaf, we must move one leaf over.
> > +			 */
> > +			ret = btrfs_next_leaf(root, path);
> > +			if (ret) {
> > +				if (ret >= 1)
> > +					ret = -ENOENT;
> > +				break;
> > +			}
> > +			continue;
> > +		}
> > +
> > +		item = btrfs_item_nr(leaf, slot);
> > +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> > +
> > +		/*
> > +		 * Check that we're still looking at an extended ref key for
> > +		 * this particular objectid. If we have different
> > +		 * objectid or type then there are no more to be found
> > +		 * in the tree and we can exit.
> > +		 */
> > +		ret = -ENOENT;
> > +		if (found_key.objectid != inode_objectid)
> > +			break;
> > +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> > +			break;
> > +
> > +		ret = 0;
> > +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +		ref = (struct btrfs_inode_extref *)ptr;
> > +		*ret_ref = ref;
> > +		if (found_off)
> > +			*found_off = found_key.offset + 1;
>                                                       ^^^
> It's evil to call it "found offset" and then return one larger than the
> offset found. No caller would ever expect this.
> 
> > +		break;
> > +	}
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int count_inode_extrefs(struct btrfs_root *root,
> > +			       struct inode *inode, struct btrfs_path *path)
> > +{
> > +	int ret;
> > +	unsigned int nlink = 0;
> > +	u64 inode_objectid = btrfs_ino(inode);
> > +	u64 offset = 0;
> > +	struct btrfs_inode_extref *ref;
> > +
> > +	while (1) {
> > +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> > +					    &ref, &offset);
> > +		if (ret)
> > +			break;
> > +
> > +		nlink++;
> > +		offset++;
> 		^^^^^^^^
> Huh. See? The caller expected to get the offset found from
> btrfs_find_one_extref. As it stands you might be missing the very next key.

Oh yeah that was totally broken btw. Fixed now, but forgot to mention that
:)

btrfs_find_one_extref() now returns the actual found offset (and the callers
can just increment if they're doing a search).
	--Mark
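
A toy userspace sketch of the two iteration contracts, to make the skipped
key concrete. Assumptions: a sorted array of key offsets stands in for the
extref items of one inode, and the sketch_* names are hypothetical; none of
this is the btrfs code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first offset >= start and report the offset actually found. */
static int sketch_find_one(const uint64_t *offs, size_t n, uint64_t start,
			   uint64_t *found_off)
{
	for (size_t i = 0; i < n; i++) {
		if (offs[i] >= start) {
			*found_off = offs[i];	/* the offset itself, not +1 */
			return 0;
		}
	}
	return -ENOENT;
}

/* Fixed contract: only the caller advances, and only once. */
static int sketch_count(const uint64_t *offs, size_t n)
{
	uint64_t start = 0;
	uint64_t found;
	int count = 0;

	while (sketch_find_one(offs, n, start, &found) == 0) {
		count++;
		if (found == UINT64_MAX)
			break;		/* nothing can sort after this key */
		start = found + 1;
	}
	return count;
}

/* First posting: callee returned found + 1 and the caller added 1 again. */
static int sketch_count_buggy(const uint64_t *offs, size_t n)
{
	uint64_t start = 0;
	uint64_t found;
	int count = 0;

	while (sketch_find_one(offs, n, start, &found) == 0) {
		count++;
		if (found >= UINT64_MAX - 1)
			break;
		start = found + 1 + 1;	/* skips a key at found + 1 */
	}
	return count;
}
```

With adjacent offsets 10, 11 and 12, the fixed loop visits all three keys,
while the double increment jumps from 10 straight to 12 and never sees 11.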


--
Mark Fasheh


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh
  2012-04-12 13:08   ` Jan Schmidt
@ 2012-04-12 15:53   ` Jan Schmidt
  2012-05-01 18:39     ` Mark Fasheh
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Schmidt @ 2012-04-12 15:53 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, Josef Bacik

Hi Mark,

While reading 3/3 I stumbled across one more thing in this one:

On 05.04.2012 22:09, Mark Fasheh wrote:
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
> +{
> +	int ret, slot;
> +	struct btrfs_key key, found_key;
> +	struct btrfs_inode_extref *ref;
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +	unsigned long ptr;
>  
> -/*
> - * There are a few corners where the link count of the file can't
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)
> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we're pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}
> +			continue;
> +		}
> +
> +		item = btrfs_item_nr(leaf, slot);
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we're still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		ref = (struct btrfs_inode_extref *)ptr;
> +		*ret_ref = ref;
> +		if (found_off)
> +			*found_off = found_key.offset + 1;
                                                      ^^^
It's evil to call it "found offset" and then return one larger than the
offset found. No caller would ever expect this.

> +		break;
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret;
> +	unsigned int nlink = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	struct btrfs_inode_extref *ref;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &ref, &offset);
> +		if (ret)
> +			break;
> +
> +		nlink++;
> +		offset++;
		^^^^^^^^
Huh. See? The caller expected to get the offset found from
btrfs_find_one_extref. As it stands you might be missing the very next key.

> +	}
> +
> +	return nlink;
> +}

-Jan


* Re: [PATCH 2/3] btrfs: extended inode refs
  2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh
@ 2012-04-12 13:08   ` Jan Schmidt
  2012-05-03 23:12     ` Mark Fasheh
  2012-04-12 15:53   ` Jan Schmidt
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Schmidt @ 2012-04-12 13:08 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, Josef Bacik

On 05.04.2012 22:09, Mark Fasheh wrote:
> Teach tree-log.c about extended inode refs. In particular, we have to adjust
> the behavior of inode ref replay as well as log tree recovery to account for
> the existence of extended refs.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/tree-log.c |  320 +++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/tree-log.h |    4 +
>  2 files changed, 266 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 966cc74..d69b07a 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -23,6 +23,7 @@
>  #include "disk-io.h"
>  #include "locking.h"
>  #include "print-tree.h"
> +#include "backref.h"

So now tree-log.c includes backref.h and backref.c includes tree-log.h,
this is not a problem by itself, but I'd try to avoid such dependencies.
I'd put find_one_extref over to backref.c to solve this, also because
there it would be closer to inode_ref_info (which does the same for
INODE_REFs).

>  #include "compat.h"
>  #include "tree-log.h"
>  
> @@ -748,6 +749,7 @@ static noinline int backref_in_log(struct btrfs_root *log,
>  {
>  	struct btrfs_path *path;
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;
                                   ^^^^^^
:-)

>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	unsigned long name_ptr;
> @@ -764,8 +766,24 @@ static noinline int backref_in_log(struct btrfs_root *log,
>  	if (ret != 0)
>  		goto out;
>  
> -	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		extref = (struct btrfs_inode_extref *)ptr;
> +
> +		found_name_len = btrfs_inode_extref_name_len(path->nodes[0],
> +							     extref);
> +		if (found_name_len == namelen) {
> +			name_ptr = (unsigned long)&extref->name;
> +			ret = memcmp_extent_buffer(path->nodes[0], name,
> +						   name_ptr, namelen);
> +			if (ret == 0)
> +				match = 1;
> +		}
> +		goto out;
> +	}
> +
> +	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
>  	ptr_end = ptr + item_size;
>  	while (ptr < ptr_end) {
>  		ref = (struct btrfs_inode_ref *)ptr;
> @@ -786,7 +804,6 @@ out:
>  	return match;
>  }
>  
> -
>  /*
>   * replay one inode back reference item found in the log tree.
>   * eb, slot and key refer to the buffer and key found in the log tree.
> @@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  				  struct btrfs_key *key)
>  {
>  	struct btrfs_inode_ref *ref;
> +	struct btrfs_inode_extref *extref;
>  	struct btrfs_dir_item *di;
> +	struct btrfs_key search_key;

You don't need search_key, just use key from the parameter list as the
code did previously.

>  	struct inode *dir;
>  	struct inode *inode;
>  	unsigned long ref_ptr;
>  	unsigned long ref_end;
> -	char *name;
> -	int namelen;
> +	char *name, *victim_name;
> +	int namelen, victim_name_len;

split

>  	int ret;
>  	int search_done = 0;
> +	int log_ref_ver = 0;
> +	u64 parent_objectid, inode_objectid, ref_index;

split

> +	struct extent_buffer *leaf;
>  
>  	/*
>  	 * it is possible that we didn't log all the parent directories
> @@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
>  	 * copy the back ref in.  The link count fixup code will take
>  	 * care of the rest
>  	 */
> -	dir = read_one_inode(root, key->offset);
> +
> +	if (key->type == BTRFS_INODE_EXTREF_KEY) {
> +		log_ref_ver = 1;
                ^^^^^^^^^^^^^^^
Assigned but never used.

> +
> +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +
> +		/* So that we don't loop back looking for old style log refs. */
> +		ref_end = ref_ptr;
> +
> +		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
> +		namelen = btrfs_inode_extref_name_len(eb, extref);
> +		name = kmalloc(namelen, GFP_NOFS);

kmalloc may fail.

> +
> +		read_extent_buffer(eb, name, (unsigned long)&extref->name,
> +				   namelen);
> +
> +		ref_index = btrfs_inode_extref_index(eb, extref);
> +		parent_objectid = btrfs_inode_extref_parent(eb, extref);
> +	} else {
> +		parent_objectid = key->offset;
> +
> +		ref_ptr = btrfs_item_ptr_offset(eb, slot);
> +		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> +
> +		ref = (struct btrfs_inode_ref *)ref_ptr;
> +		namelen = btrfs_inode_ref_name_len(eb, ref);
> +		name = kmalloc(namelen, GFP_NOFS);
> +		BUG_ON(!name);
> +
> +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> +
> +		ref_index = btrfs_inode_ref_index(eb, ref);

The way you put it, you had to copy all the code from the else block
down before the "goto again" case. Code duplication is error prone.

Maybe you can manage to put everything between "again:" and "goto again"
into a separate function that can be called multiple times. For the old
INODE_REF items that could be done in a loop then. This would also split
an overly long function into more readable units.

> +	}
> +
> +	inode_objectid = key->objectid;
> +
> +	dir = read_one_inode(root, parent_objectid);
>  	if (!dir)
>  		return -ENOENT;
>  
> -	inode = read_one_inode(root, key->objectid);
> +	inode = read_one_inode(root, inode_objectid);
>  	if (!inode) {
>  		iput(dir);
>  		return -EIO;
>  	}
>  
> -	ref_ptr = btrfs_item_ptr_offset(eb, slot);
> -	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
> -
>  again:
> -	ref = (struct btrfs_inode_ref *)ref_ptr;
> -
> -	namelen = btrfs_inode_ref_name_len(eb, ref);
> -	name = kmalloc(namelen, GFP_NOFS);
> -	BUG_ON(!name);
> -
> -	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> -
>  	/* if we already have a perfect match, we're done */
>  	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
> -			 btrfs_inode_ref_index(eb, ref),
> -			 name, namelen)) {
> +			 ref_index, name, namelen)) {
>  		goto out;
>  	}
>  
> @@ -857,19 +903,23 @@ again:
>  	if (search_done)
>  		goto insert;
>  
> -	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> +	/* Search old style refs */
> +	search_key.objectid = inode_objectid;
> +	search_key.type = BTRFS_INODE_REF_KEY;
> +	search_key.offset = parent_objectid;
> +
> +	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
>  	if (ret == 0) {
> -		char *victim_name;
> -		int victim_name_len;
>  		struct btrfs_inode_ref *victim_ref;
>  		unsigned long ptr;
>  		unsigned long ptr_end;
> -		struct extent_buffer *leaf = path->nodes[0];
> +
> +		leaf = path->nodes[0];
>  
>  		/* are we trying to overwrite a back ref for the root directory
>  		 * if so, just jump out, we're done
>  		 */
> -		if (key->objectid == key->offset)
> +		if (search_key.objectid == search_key.offset)
>  			goto out_nowrite;
>  
>  		/* check all the names in this back reference to see
> @@ -889,7 +939,7 @@ again:
>  					   (unsigned long)(victim_ref + 1),
>  					   victim_name_len);
>  
> -			if (!backref_in_log(log, key, victim_name,
> +			if (!backref_in_log(log, &search_key, victim_name,
>  					    victim_name_len)) {
>  				btrfs_inc_nlink(inode);
>  				btrfs_release_path(path);
> @@ -902,19 +952,44 @@ again:
>  			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
>  		}
>  		BUG_ON(ret);
> -
> -		/*
> -		 * NOTE: we have searched root tree and checked the
> -		 * coresponding ref, it does not need to check again.
> -		 */
> -		search_done = 1;
>  	}
>  	btrfs_release_path(path);
>  
> +	/* Same search but for extended refs */
> +	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
> +					   inode_objectid, parent_objectid, 0,
> +					   0);
> +	if (extref && !IS_ERR(extref)) {

if (!IS_ERR_OR_NULL(extref))
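
For readers less familiar with the idiom, here is a minimal userspace sketch of the error-pointer convention, showing that the two spellings of the check agree. It is modelled on include/linux/err.h but is not the kernel code: every sketch_* and usable_* name is made up for illustration, and the 4095 limit is assumed to match the kernel's MAX_ERRNO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MAX_ERRNO; this is a userspace model. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (hypothetical helper). */
static inline void *sketch_err_ptr(long error)
{
	assert(error < 0 && error >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the top SKETCH_MAX_ERRNO bytes of the address space. */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-(intptr_t)SKETCH_MAX_ERRNO);
}

static inline int sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}

/* The form used in the patch: "extref && !IS_ERR(extref)". */
static inline int usable_long_form(const void *ptr)
{
	return ptr && !sketch_is_err(ptr);
}

/* The form suggested in the review: "!IS_ERR_OR_NULL(extref)". */
static inline int usable_short_form(const void *ptr)
{
	return !sketch_is_err_or_null(ptr);
}
```

Both helpers accept a valid pointer and reject NULL and an encoded errno, so the change is purely about readability.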

> +		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> +		victim_name = kmalloc(namelen, GFP_NOFS);
> +		leaf = path->nodes[0];
> +		read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
> +
> +		search_key.objectid = inode_objectid;
> +		search_key.type = BTRFS_INODE_EXTREF_KEY;
> +		search_key.offset = btrfs_extref_key_off(parent_objectid, name, namelen);
> +		if (!backref_in_log(log, &search_key, victim_name,
> +				    victim_name_len)) {
> +			btrfs_inc_nlink(inode);
> +			btrfs_release_path(path);
> +
> +			ret = btrfs_unlink_inode(trans, root, dir,
> +						 inode, victim_name,
> +						 victim_name_len);
> +		}
> +		kfree(victim_name);
> +		BUG_ON(ret);
> +	}
> +
> +	/*
> +	 * NOTE: we have searched root tree and checked the
> +	 * coresponding refs, it does not need to be checked again.
> +	 */
> +	search_done = 1;

You moved this comment and assignment out of the "if (ret == 0)" case.
I'm not sure this still does exactly the same thing (?). Previously,
when the search did not find an exact match, search_done stayed 0, so
after the "goto again" we executed another btrfs_search_slot,
btrfs_lookup_dir_index_item, ... With this patch those are skipped.

> +
>  	/* look for a conflicting sequence number */
>  	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
> -					 btrfs_inode_ref_index(eb, ref),
> -					 name, namelen, 0);
> +					 ref_index, name, namelen, 0);
>  	if (di && !IS_ERR(di)) {
>  		ret = drop_one_dir_item(trans, root, path, dir, di);
>  		BUG_ON(ret);
> @@ -932,17 +1007,25 @@ again:
>  
>  insert:
>  	/* insert our name */
> -	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
> -			     btrfs_inode_ref_index(eb, ref));
> +	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
>  	BUG_ON(ret);
>  
>  	btrfs_update_inode(trans, root, inode);
>  
>  out:
> -	ref_ptr = (unsigned long)(ref + 1) + namelen;
> +	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref)) + namelen;
>  	kfree(name);
> -	if (ref_ptr < ref_end)
> +	if (ref_ptr < ref_end) {
> +		ref = (struct btrfs_inode_ref *)ref_ptr;
> +		namelen = btrfs_inode_ref_name_len(eb, ref);
> +		name = kmalloc(namelen, GFP_NOFS);
> +		BUG_ON(!name);
> +
> +		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
> +
> +		ref_index = btrfs_inode_ref_index(eb, ref);

This is the duplicated code mentioned above.

>  		goto again;
> +	}
>  
>  	/* finally write the back reference in the inode */
>  	ret = overwrite_item(trans, root, path, eb, slot, key);
> @@ -965,25 +1048,103 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off)

Call this ret_extref.

> +{
> +	int ret, slot;
> +	struct btrfs_key key, found_key;

Split this into two declarations.

> +	struct btrfs_inode_extref *ref;
> +	struct extent_buffer *leaf;
> +	struct btrfs_item *item;
> +	unsigned long ptr;
>  
> -/*
> - * There are a few corners where the link count of the file can't
> - * be properly maintained during replay.  So, instead of adding
> - * lots of complexity to the log code, we just scan the backrefs
> - * for any file that has been through replay.
> - *
> - * The scan will update the link count on the inode to reflect the
> - * number of back refs found.  If it goes down to zero, the iput
> - * will free the inode.
> - */
> -static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> -					   struct btrfs_root *root,
> -					   struct inode *inode)

[side note: this is really strange patch alignment. my git is doing it
the same way.]

> +	key.objectid = inode_objectid;
> +	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
> +	key.offset = start_off;
> +
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;

Just return here and drop the out label.

> +
> +	while (1) {
> +		leaf = path->nodes[0];
> +		slot = path->slots[0];
> +		if (slot >= btrfs_header_nritems(leaf)) {
> +			/*
> +			 * If the item at offset is not found,
> +			 * btrfs_search_slot will point us to the slot
> +			 * where it should be inserted. In our case
> +			 * that will be the slot directly before the
> +			 * next INODE_REF_KEY_V2 item. In the case
> +			 * that we're pointing to the last slot in a
> +			 * leaf, we must move one leaf over.
> +			 */
> +			ret = btrfs_next_leaf(root, path);
> +			if (ret) {
> +				if (ret >= 1)
> +					ret = -ENOENT;
> +				break;
> +			}

This large block (and its comment) can be replaced by
btrfs_search_slot_for_read with find_higher = 1, return_any = 0. This
also avoids the "continue", and we even get rid of the whole while (1)
loop here.
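
To illustrate the "find higher" semantics without any kernel context, here is a toy model over a sorted array of keys. It is only a sketch: the toy_* names and the type value 13 are invented, and the array stands in for the tree that btrfs_search_slot_for_read actually walks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a btrfs key; not the kernel's struct btrfs_key. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

#define TOY_EXTREF_TYPE 13 /* arbitrary value chosen for this sketch */

/* Order keys by (objectid, type, offset), as btrfs keys are ordered. */
static int toy_key_cmp(const struct toy_key *a, const struct toy_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Index of the first key >= want, or -1 if every key is smaller. */
static int toy_search_higher(const struct toy_key *keys, size_t n,
			     const struct toy_key *want)
{
	size_t lo = 0, hi = n;

	assert(keys != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (toy_key_cmp(&keys[mid], want) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? (int)lo : -1;
}

/*
 * Loop-free lookup: one "find higher" search, then check that the key
 * found still belongs to this inode and is an extended ref.
 * Returns the slot index, or -1 if there is no further extref.
 */
static int toy_find_one_extref(const struct toy_key *keys, size_t n,
			       uint64_t ino, uint64_t start_off,
			       uint64_t *found_off)
{
	struct toy_key want = { ino, TOY_EXTREF_TYPE, start_off };
	int slot = toy_search_higher(keys, n, &want);

	if (slot < 0)
		return -1;
	if (keys[slot].objectid != ino || keys[slot].type != TOY_EXTREF_TYPE)
		return -1;
	if (found_off)
		*found_off = keys[slot].offset;
	return slot;
}
```

A caller iterating over all extrefs of one inode would call toy_find_one_extref repeatedly, restarting each time at the returned offset plus one.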

> +			continue;
> +		}
> +
> +		item = btrfs_item_nr(leaf, slot);
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		/*
> +		 * Check that we're still looking at an extended ref key for
> +		 * this particular objectid. If we have different
> +		 * objectid or type then there are no more to be found
> +		 * in the tree and we can exit.
> +		 */
> +		ret = -ENOENT;
> +		if (found_key.objectid != inode_objectid)
> +			break;
> +		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
> +			break;
> +
> +		ret = 0;
> +		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +		ref = (struct btrfs_inode_extref *)ptr;
> +		*ret_ref = ref;
> +		if (found_off)
> +			*found_off = found_key.offset + 1;
> +		break;
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int count_inode_extrefs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
> +{
> +	int ret;
> +	unsigned int nlink = 0;
> +	u64 inode_objectid = btrfs_ino(inode);
> +	u64 offset = 0;
> +	struct btrfs_inode_extref *ref;
> +
> +	while (1) {
> +		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
> +					    &ref, &offset);
> +		if (ret)
> +			break;

Assume the first call to btrfs_find_one_extref returns -EIO. Do we
really want count_inode_extrefs to return 0 here? I agree that the
previous code suffers from the same problem, but still: it's a problem.
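
One possible shape for such a loop, sketched in userspace C: treat -ENOENT as the normal end of iteration and pass every other negative value up to the caller. The toy_* names and the callback interface are assumptions made for this example, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical lookup callback: returns 0 and advances *offset when an
 * item is found, -ENOENT when there are no more items, or another
 * negative errno on a real failure.
 */
typedef int (*toy_find_fn)(void *ctx, unsigned long *offset);

/* Fake item source used to exercise the loop. */
struct toy_source {
	unsigned long nr_items;
	int fail_with; /* 0, or a negative errno to inject */
};

static int toy_find(void *ctx, unsigned long *offset)
{
	const struct toy_source *src = ctx;

	if (src->fail_with)
		return src->fail_with;
	if (*offset >= src->nr_items)
		return -ENOENT;
	(*offset)++;
	return 0;
}

/* Count items; only -ENOENT ends the loop successfully. */
static int toy_count_items(toy_find_fn find, void *ctx)
{
	unsigned int count = 0;
	unsigned long offset = 0;
	int ret;

	assert(find != NULL);
	for (;;) {
		ret = find(ctx, &offset);
		if (ret == -ENOENT)
			return (int)count; /* normal termination */
		if (ret < 0)
			return ret; /* propagate e.g. -EIO */
		count++;
	}
}
```

With a source of three items the function returns 3; with -EIO injected it returns -EIO instead of a silent 0.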

> +
> +		nlink++;
> +		offset++;
> +	}
> +
> +	return nlink;
> +}
> +
> +static int count_inode_refs(struct btrfs_root *root,
> +			       struct inode *inode, struct btrfs_path *path)
>  {
> -	struct btrfs_path *path;
>  	int ret;
>  	struct btrfs_key key;
> -	u64 nlink = 0;
> +	unsigned int nlink = 0;
>  	unsigned long ptr;
>  	unsigned long ptr_end;
>  	int name_len;
> @@ -993,10 +1154,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  	key.type = BTRFS_INODE_REF_KEY;
>  	key.offset = (u64)-1;
>  
> -	path = btrfs_alloc_path();
> -	if (!path)
> -		return -ENOMEM;
> -
>  	while (1) {
>  		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>  		if (ret < 0)
> @@ -1030,6 +1187,48 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		btrfs_release_path(path);
>  	}
>  	btrfs_release_path(path);
> +	btrfs_free_path(path);

In general, you must not free a path when you don't allocate it.

> +
> +	return nlink;
> +}
> +
> +/*
> + * There are a few corners where the link count of the file can't
> + * be properly maintained during replay.  So, instead of adding
> + * lots of complexity to the log code, we just scan the backrefs
> + * for any file that has been through replay.
> + *
> + * The scan will update the link count on the inode to reflect the
> + * number of back refs found.  If it goes down to zero, the iput
> + * will free the inode.
> + */
> +static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
> +					   struct btrfs_root *root,
> +					   struct inode *inode)
> +{
> +	struct btrfs_path *path;
> +	int ret;
> +	u64 nlink = 0;
> +	u64 ino = btrfs_ino(inode);
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = count_inode_refs(root, inode, path);
> +	btrfs_release_path(path);

Either count_inode_refs should allocate its own private path, or it
should return the path in a released state. The caller should not be
responsible for passing a clean path and releasing it afterwards, as
count_inode_refs does not return anything through the path.
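
A rough userspace sketch of that ownership rule: the caller allocates and frees the path, and each helper hands it back released. Everything named toy_* is hypothetical, and held_refs merely stands in for whatever a real btrfs_path pins during a search.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy path: held_refs models leaves pinned by a search. */
struct toy_path {
	int held_refs;
};

static struct toy_path *toy_alloc_path(void)
{
	return calloc(1, sizeof(struct toy_path));
}

static void toy_release_path(struct toy_path *path)
{
	path->held_refs = 0;
}

static void toy_free_path(struct toy_path *path)
{
	free(path);
}

/*
 * Helper that borrows the path. It expects a clean path on entry and
 * releases it before returning; it never frees what it did not allocate.
 */
static int toy_count_refs(struct toy_path *path, int nr_refs)
{
	int count = 0;
	int i;

	assert(path->held_refs == 0);
	for (i = 0; i < nr_refs; i++) {
		path->held_refs++; /* pretend a search pinned a leaf */
		count++;
	}
	toy_release_path(path);
	return count;
}

/* Caller owns the path: allocate once, reuse it, free once. */
static int toy_fixup_link_count(int nr_refs, int nr_extrefs)
{
	struct toy_path *path = toy_alloc_path();
	int total;

	if (!path)
		return -ENOMEM;
	total = toy_count_refs(path, nr_refs);
	total += toy_count_refs(path, nr_extrefs);
	toy_free_path(path);
	return total;
}
```

Because each helper releases the path itself, the caller needs no release calls between the two counts.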

> +	if (ret < 0)
> +		goto out;
> +
> +	nlink = ret;
> +
> +	ret = count_inode_extrefs(root, inode, path);
> +	btrfs_release_path(path);

Same here. I'd just put the btrfs_release_path into count_inode_(ext)refs.

> +	if (ret < 0)
> +		goto out;
> +
> +	nlink += ret;
> +
>  	if (nlink != inode->i_nlink) {
>  		set_nlink(inode, nlink);
>  		btrfs_update_inode(trans, root, inode);
> @@ -1045,9 +1244,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
>  		ret = insert_orphan_item(trans, root, ino);
>  		BUG_ON(ret);
>  	}
> -	btrfs_free_path(path);
>  
> -	return 0;
> +out:
> +	btrfs_free_path(path);
> +	return ret;
>  }
>  
>  static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
> @@ -1689,6 +1889,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
>  			ret = add_inode_ref(wc->trans, root, log, path,
>  					    eb, i, &key);
>  			BUG_ON(ret && ret != -ENOENT);
> +		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
> +			ret = add_inode_ref(wc->trans, root, log, path,
> +					    eb, i, &key);
> +			BUG_ON(ret && ret != -ENOENT);
>  		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
>  			ret = replay_one_extent(wc->trans, root, path,
>  						eb, i, &key);
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index 2270ac5..fd40ad5 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -49,4 +49,8 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
>  int btrfs_log_new_name(struct btrfs_trans_handle *trans,
>  			struct inode *inode, struct inode *old_dir,
>  			struct dentry *parent);
> +int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
> +			  u64 start_off, struct btrfs_path *path,
> +			  struct btrfs_inode_extref **ret_ref, u64 *found_off);
> +
>  #endif

-Jan


* [PATCH 2/3] btrfs: extended inode refs
  2012-04-05 20:09 [PATCH 0/3] " Mark Fasheh
@ 2012-04-05 20:09 ` Mark Fasheh
  2012-04-12 13:08   ` Jan Schmidt
  2012-04-12 15:53   ` Jan Schmidt
  0 siblings, 2 replies; 21+ messages in thread
From: Mark Fasheh @ 2012-04-05 20:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh

Teach tree-log.c about extended inode refs. In particular, we have to adjust
the behavior of inode ref replay as well as log tree recovery to account for
the existence of extended refs.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/tree-log.c |  320 +++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/tree-log.h |    4 +
 2 files changed, 266 insertions(+), 58 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 966cc74..d69b07a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -23,6 +23,7 @@
 #include "disk-io.h"
 #include "locking.h"
 #include "print-tree.h"
+#include "backref.h"
 #include "compat.h"
 #include "tree-log.h"
 
@@ -748,6 +749,7 @@ static noinline int backref_in_log(struct btrfs_root *log,
 {
 	struct btrfs_path *path;
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	unsigned long name_ptr;
@@ -764,8 +766,24 @@ static noinline int backref_in_log(struct btrfs_root *log,
 	if (ret != 0)
 		goto out;
 
-	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		extref = (struct btrfs_inode_extref *)ptr;
+
+		found_name_len = btrfs_inode_extref_name_len(path->nodes[0],
+							     extref);
+		if (found_name_len == namelen) {
+			name_ptr = (unsigned long)&extref->name;
+			ret = memcmp_extent_buffer(path->nodes[0], name,
+						   name_ptr, namelen);
+			if (ret == 0)
+				match = 1;
+		}
+		goto out;
+	}
+
+	item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
 	ptr_end = ptr + item_size;
 	while (ptr < ptr_end) {
 		ref = (struct btrfs_inode_ref *)ptr;
@@ -786,7 +804,6 @@ out:
 	return match;
 }
 
-
 /*
  * replay one inode back reference item found in the log tree.
  * eb, slot and key refer to the buffer and key found in the log tree.
@@ -801,15 +818,20 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 				  struct btrfs_key *key)
 {
 	struct btrfs_inode_ref *ref;
+	struct btrfs_inode_extref *extref;
 	struct btrfs_dir_item *di;
+	struct btrfs_key search_key;
 	struct inode *dir;
 	struct inode *inode;
 	unsigned long ref_ptr;
 	unsigned long ref_end;
-	char *name;
-	int namelen;
+	char *name, *victim_name;
+	int namelen, victim_name_len;
 	int ret;
 	int search_done = 0;
+	int log_ref_ver = 0;
+	u64 parent_objectid, inode_objectid, ref_index;
+	struct extent_buffer *leaf;
 
 	/*
 	 * it is possible that we didn't log all the parent directories
@@ -817,32 +839,56 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
 	 * copy the back ref in.  The link count fixup code will take
 	 * care of the rest
 	 */
-	dir = read_one_inode(root, key->offset);
+
+	if (key->type == BTRFS_INODE_EXTREF_KEY) {
+		log_ref_ver = 1;
+
+		ref_ptr = btrfs_item_ptr_offset(eb, slot);
+
+		/* So that we don't loop back looking for old style log refs. */
+		ref_end = ref_ptr;
+
+		extref = (struct btrfs_inode_extref *) btrfs_item_ptr_offset(eb, slot);
+		namelen = btrfs_inode_extref_name_len(eb, extref);
+		name = kmalloc(namelen, GFP_NOFS);
+
+		read_extent_buffer(eb, name, (unsigned long)&extref->name,
+				   namelen);
+
+		ref_index = btrfs_inode_extref_index(eb, extref);
+		parent_objectid = btrfs_inode_extref_parent(eb, extref);
+	} else {
+		parent_objectid = key->offset;
+
+		ref_ptr = btrfs_item_ptr_offset(eb, slot);
+		ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
+
+		ref = (struct btrfs_inode_ref *)ref_ptr;
+		namelen = btrfs_inode_ref_name_len(eb, ref);
+		name = kmalloc(namelen, GFP_NOFS);
+		BUG_ON(!name);
+
+		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
+
+		ref_index = btrfs_inode_ref_index(eb, ref);
+	}
+
+	inode_objectid = key->objectid;
+
+	dir = read_one_inode(root, parent_objectid);
 	if (!dir)
 		return -ENOENT;
 
-	inode = read_one_inode(root, key->objectid);
+	inode = read_one_inode(root, inode_objectid);
 	if (!inode) {
 		iput(dir);
 		return -EIO;
 	}
 
-	ref_ptr = btrfs_item_ptr_offset(eb, slot);
-	ref_end = ref_ptr + btrfs_item_size_nr(eb, slot);
-
 again:
-	ref = (struct btrfs_inode_ref *)ref_ptr;
-
-	namelen = btrfs_inode_ref_name_len(eb, ref);
-	name = kmalloc(namelen, GFP_NOFS);
-	BUG_ON(!name);
-
-	read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
-
 	/* if we already have a perfect match, we're done */
 	if (inode_in_dir(root, path, btrfs_ino(dir), btrfs_ino(inode),
-			 btrfs_inode_ref_index(eb, ref),
-			 name, namelen)) {
+			 ref_index, name, namelen)) {
 		goto out;
 	}
 
@@ -857,19 +903,23 @@ again:
 	if (search_done)
 		goto insert;
 
-	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+	/* Search old style refs */
+	search_key.objectid = inode_objectid;
+	search_key.type = BTRFS_INODE_REF_KEY;
+	search_key.offset = parent_objectid;
+
+	ret = btrfs_search_slot(NULL, root, &search_key, path, 0, 0);
 	if (ret == 0) {
-		char *victim_name;
-		int victim_name_len;
 		struct btrfs_inode_ref *victim_ref;
 		unsigned long ptr;
 		unsigned long ptr_end;
-		struct extent_buffer *leaf = path->nodes[0];
+
+		leaf = path->nodes[0];
 
 		/* are we trying to overwrite a back ref for the root directory
 		 * if so, just jump out, we're done
 		 */
-		if (key->objectid == key->offset)
+		if (search_key.objectid == search_key.offset)
 			goto out_nowrite;
 
 		/* check all the names in this back reference to see
@@ -889,7 +939,7 @@ again:
 					   (unsigned long)(victim_ref + 1),
 					   victim_name_len);
 
-			if (!backref_in_log(log, key, victim_name,
+			if (!backref_in_log(log, &search_key, victim_name,
 					    victim_name_len)) {
 				btrfs_inc_nlink(inode);
 				btrfs_release_path(path);
@@ -902,19 +952,44 @@ again:
 			ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
 		}
 		BUG_ON(ret);
-
-		/*
-		 * NOTE: we have searched root tree and checked the
-		 * coresponding ref, it does not need to check again.
-		 */
-		search_done = 1;
 	}
 	btrfs_release_path(path);
 
+	/* Same search but for extended refs */
+	extref = btrfs_lookup_inode_extref(NULL, root, path, name, namelen,
+					   inode_objectid, parent_objectid, 0,
+					   0);
+	if (extref && !IS_ERR(extref)) {
+		victim_name_len = btrfs_inode_extref_name_len(eb, extref);
+		victim_name = kmalloc(namelen, GFP_NOFS);
+		leaf = path->nodes[0];
+		read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
+
+		search_key.objectid = inode_objectid;
+		search_key.type = BTRFS_INODE_EXTREF_KEY;
+		search_key.offset = btrfs_extref_key_off(parent_objectid, name, namelen);
+		if (!backref_in_log(log, &search_key, victim_name,
+				    victim_name_len)) {
+			btrfs_inc_nlink(inode);
+			btrfs_release_path(path);
+
+			ret = btrfs_unlink_inode(trans, root, dir,
+						 inode, victim_name,
+						 victim_name_len);
+		}
+		kfree(victim_name);
+		BUG_ON(ret);
+	}
+
+	/*
+	 * NOTE: we have searched root tree and checked the
+	 * coresponding refs, it does not need to be checked again.
+	 */
+	search_done = 1;
+
 	/* look for a conflicting sequence number */
 	di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
-					 btrfs_inode_ref_index(eb, ref),
-					 name, namelen, 0);
+					 ref_index, name, namelen, 0);
 	if (di && !IS_ERR(di)) {
 		ret = drop_one_dir_item(trans, root, path, dir, di);
 		BUG_ON(ret);
@@ -932,17 +1007,25 @@ again:
 
 insert:
 	/* insert our name */
-	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
-			     btrfs_inode_ref_index(eb, ref));
+	ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, ref_index);
 	BUG_ON(ret);
 
 	btrfs_update_inode(trans, root, inode);
 
 out:
-	ref_ptr = (unsigned long)(ref + 1) + namelen;
+	ref_ptr = (unsigned long)(ref_ptr + sizeof(struct btrfs_inode_ref)) + namelen;
 	kfree(name);
-	if (ref_ptr < ref_end)
+	if (ref_ptr < ref_end) {
+		ref = (struct btrfs_inode_ref *)ref_ptr;
+		namelen = btrfs_inode_ref_name_len(eb, ref);
+		name = kmalloc(namelen, GFP_NOFS);
+		BUG_ON(!name);
+
+		read_extent_buffer(eb, name, (unsigned long)(ref + 1), namelen);
+
+		ref_index = btrfs_inode_ref_index(eb, ref);
 		goto again;
+	}
 
 	/* finally write the back reference in the inode */
 	ret = overwrite_item(trans, root, path, eb, slot, key);
@@ -965,25 +1048,103 @@ static int insert_orphan_item(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_ref, u64 *found_off)
+{
+	int ret, slot;
+	struct btrfs_key key, found_key;
+	struct btrfs_inode_extref *ref;
+	struct extent_buffer *leaf;
+	struct btrfs_item *item;
+	unsigned long ptr;
 
-/*
- * There are a few corners where the link count of the file can't
- * be properly maintained during replay.  So, instead of adding
- * lots of complexity to the log code, we just scan the backrefs
- * for any file that has been through replay.
- *
- * The scan will update the link count on the inode to reflect the
- * number of back refs found.  If it goes down to zero, the iput
- * will free the inode.
- */
-static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
-					   struct btrfs_root *root,
-					   struct inode *inode)
+	key.objectid = inode_objectid;
+	btrfs_set_key_type(&key, BTRFS_INODE_EXTREF_KEY);
+	key.offset = start_off;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		if (slot >= btrfs_header_nritems(leaf)) {
+			/*
+			 * If the item at offset is not found,
+			 * btrfs_search_slot will point us to the slot
+			 * where it should be inserted. In our case
+			 * that will be the slot directly before the
+			 * next INODE_REF_KEY_V2 item. In the case
+			 * that we're pointing to the last slot in a
+			 * leaf, we must move one leaf over.
+			 */
+			ret = btrfs_next_leaf(root, path);
+			if (ret) {
+				if (ret >= 1)
+					ret = -ENOENT;
+				break;
+			}
+			continue;
+		}
+
+		item = btrfs_item_nr(leaf, slot);
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		/*
+		 * Check that we're still looking at an extended ref key for
+		 * this particular objectid. If we have different
+		 * objectid or type then there are no more to be found
+		 * in the tree and we can exit.
+		 */
+		ret = -ENOENT;
+		if (found_key.objectid != inode_objectid)
+			break;
+		if (btrfs_key_type(&found_key) != BTRFS_INODE_EXTREF_KEY)
+			break;
+
+		ret = 0;
+		ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+		ref = (struct btrfs_inode_extref *)ptr;
+		*ret_ref = ref;
+		if (found_off)
+			*found_off = found_key.offset + 1;
+		break;
+	}
+
+out:
+	return ret;
+}
+
+static int count_inode_extrefs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
+{
+	int ret;
+	unsigned int nlink = 0;
+	u64 inode_objectid = btrfs_ino(inode);
+	u64 offset = 0;
+	struct btrfs_inode_extref *ref;
+
+	while (1) {
+		ret = btrfs_find_one_extref(root, inode_objectid, offset, path,
+					    &ref, &offset);
+		if (ret)
+			break;
+
+		nlink++;
+		offset++;
+	}
+
+	return nlink;
+}
+
+static int count_inode_refs(struct btrfs_root *root,
+			       struct inode *inode, struct btrfs_path *path)
 {
-	struct btrfs_path *path;
 	int ret;
 	struct btrfs_key key;
-	u64 nlink = 0;
+	unsigned int nlink = 0;
 	unsigned long ptr;
 	unsigned long ptr_end;
 	int name_len;
@@ -993,10 +1154,6 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_INODE_REF_KEY;
 	key.offset = (u64)-1;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
 	while (1) {
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0)
@@ -1030,6 +1187,48 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}
 	btrfs_release_path(path);
+	btrfs_free_path(path);
+
+	return nlink;
+}
+
+/*
+ * There are a few corners where the link count of the file can't
+ * be properly maintained during replay.  So, instead of adding
+ * lots of complexity to the log code, we just scan the backrefs
+ * for any file that has been through replay.
+ *
+ * The scan will update the link count on the inode to reflect the
+ * number of back refs found.  If it goes down to zero, the iput
+ * will free the inode.
+ */
+static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
+					   struct btrfs_root *root,
+					   struct inode *inode)
+{
+	struct btrfs_path *path;
+	int ret;
+	u64 nlink = 0;
+	u64 ino = btrfs_ino(inode);
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = count_inode_refs(root, inode, path);
+	btrfs_release_path(path);
+	if (ret < 0)
+		goto out;
+
+	nlink = ret;
+
+	ret = count_inode_extrefs(root, inode, path);
+	btrfs_release_path(path);
+	if (ret < 0)
+		goto out;
+
+	nlink += ret;
+
 	if (nlink != inode->i_nlink) {
 		set_nlink(inode, nlink);
 		btrfs_update_inode(trans, root, inode);
@@ -1045,9 +1244,10 @@ static noinline int fixup_inode_link_count(struct btrfs_trans_handle *trans,
 		ret = insert_orphan_item(trans, root, ino);
 		BUG_ON(ret);
 	}
-	btrfs_free_path(path);
 
-	return 0;
+out:
+	btrfs_free_path(path);
+	return ret;
 }
 
 static noinline int fixup_inode_link_counts(struct btrfs_trans_handle *trans,
@@ -1689,6 +1889,10 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
 			ret = add_inode_ref(wc->trans, root, log, path,
 					    eb, i, &key);
 			BUG_ON(ret && ret != -ENOENT);
+		} else if (key.type == BTRFS_INODE_EXTREF_KEY) {
+			ret = add_inode_ref(wc->trans, root, log, path,
+					    eb, i, &key);
+			BUG_ON(ret && ret != -ENOENT);
 		} else if (key.type == BTRFS_EXTENT_DATA_KEY) {
 			ret = replay_one_extent(wc->trans, root, path,
 						eb, i, &key);
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 2270ac5..fd40ad5 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -49,4 +49,8 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
 int btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct inode *inode, struct inode *old_dir,
 			struct dentry *parent);
+int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
+			  u64 start_off, struct btrfs_path *path,
+			  struct btrfs_inode_extref **ret_ref, u64 *found_off);
+
 #endif
-- 
1.7.7



end of thread, other threads:[~2012-08-15 17:59 UTC | newest]

Thread overview: 21+ messages
2012-05-21 21:46 [PATCH 0/3] btrfs: extended inode refs Mark Fasheh
2012-05-21 21:46 ` [PATCH 1/3] " Mark Fasheh
2012-07-06 14:56   ` Jan Schmidt
2012-07-06 15:14     ` Stefan Behrens
2012-07-09 19:05     ` Mark Fasheh
2012-07-09 20:33     ` Mark Fasheh
2012-05-21 21:46 ` [PATCH 2/3] " Mark Fasheh
2012-07-06 14:57   ` Jan Schmidt
2012-08-06 23:31     ` Mark Fasheh
2012-05-21 21:46 ` [PATCH 3/3] " Mark Fasheh
2012-07-06 14:57   ` Jan Schmidt
2012-07-09 20:24     ` Mark Fasheh
  -- strict thread matches above, loose matches on Subject: below --
2012-08-08 18:55 [PATCH 0/3] " Mark Fasheh
2012-08-08 18:55 ` [PATCH 2/3] " Mark Fasheh
2012-08-15 15:04   ` Jan Schmidt
2012-08-15 17:59     ` Mark Fasheh
2012-04-05 20:09 [PATCH 0/3] " Mark Fasheh
2012-04-05 20:09 ` [PATCH 2/3] " Mark Fasheh
2012-04-12 13:08   ` Jan Schmidt
2012-05-03 23:12     ` Mark Fasheh
2012-05-04 11:39       ` David Sterba
2012-04-12 15:53   ` Jan Schmidt
2012-05-01 18:39     ` Mark Fasheh
