* [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1)
@ 2012-07-22  7:59 Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 01/10 v1][RESEND] ext4: add two structures supporting extent status tree Zheng Liu
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

[Sorry, my mistake: the patch titles were out of order, so I am re-sending
this patch set.]

Hi all,

As described and discussed in these links [1-3], this is my first attempt at
implementing the extent status tree in ext4.  It may not be perfect yet, but I
think it should be sent out as soon as possible so that people can review these
patches early and give me feedback.  I would really appreciate it if you could
review and comment on this patch set.  Thanks.

This patch set is based on Yongqiang's and Allison's work.  Thank you for your
excellent work.  For now I have not changed the name from extent status tree to
io tree.  If most developers think that we should change it, I will do so in the
next version.  As previously discussed, the extent status tree will bring a lot
of benefits, including bug fixes, improvements and range locking.  As a first
step, this patch set aims at improving the fiemap implementation and quota
reservation in bigalloc, simplifying the punch hole implementation, and
introducing SEEK_DATA/SEEK_HOLE support.
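
For readers unfamiliar with the interface, here is a minimal user-space sketch
of how SEEK_DATA/SEEK_HOLE are typically used through lseek(2).  It is not part
of this patch set; the helper name print_data_segments is made up for
illustration, and what gets reported depends on the underlying filesystem.

```c
#define _GNU_SOURCE		/* exposes SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Print each data segment of an open file as [start, end) byte offsets.
 * Returns the number of segments found, or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
static int print_data_segments(int fd)
{
	off_t data, hole = 0;
	int count = 0;

	assert(fd >= 0);

	for (;;) {
		/* Next offset >= hole that holds data; ENXIO means none left. */
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0)
			return errno == ENXIO ? count : -1;

		/* Next hole at or after that data; EOF is an implicit hole. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		printf("data [%lld, %lld)\n", (long long)data, (long long)hole);
		count++;
	}
}
```

A caller would open a file and pass the descriptor in.  On a filesystem without
native support the kernel is expected to treat the whole file as one data
segment, so the loop still terminates.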

The extent status tree is used to identify delayed extents in ext4.  It aims to
provide an efficient way to look up whether a given range of blocks belongs to
a delayed extent.  Without the extent status tree, ext4 has to look up the page
cache, which is complicated and inefficient.  For details, please see the
second patch: the comment at the top of that patch describes the extent status
tree in detail.
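
To illustrate the kind of query the tree has to answer, here is a small
user-space model (not code from this series).  It assumes the delayed extents
of one inode are kept in a sorted, non-overlapping array rather than an
rb-tree; struct demo_extent, demo_find_extent and demo_is_delayed are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a delayed extent: blocks [start, start + len). */
struct demo_extent {
	uint32_t start;
	uint32_t len;
};

/*
 * Return the extent covering @blk, else the first extent starting after
 * @blk, else NULL.  @ext must be sorted by start and non-overlapping.
 */
static const struct demo_extent *
demo_find_extent(const struct demo_extent *ext, size_t n, uint32_t blk)
{
	size_t lo = 0, hi = n;

	assert(ext != NULL || n == 0);

	/* Binary search for the first extent whose end lies beyond @blk. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		/* 64-bit sum so that start + len cannot wrap around. */
		if ((uint64_t)ext[mid].start + ext[mid].len <= blk)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? &ext[lo] : NULL;
}

/* Nonzero if @blk belongs to one of the delayed extents in @ext. */
static int demo_is_delayed(const struct demo_extent *ext, size_t n,
			   uint32_t blk)
{
	const struct demo_extent *e = demo_find_extent(ext, n, blk);

	return e != NULL && e->start <= blk;
}
```

The point of the model is that the answer comes from extent metadata alone,
with no need to walk pages.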

Patch [1-5]:
  The extent status tree is added to ext4 in order to identify delayed extents.
Patch [6]:
  Fiemap is re-implemented.  Now we only need to look up the extent status tree
  to find a delayed extent rather than look up the page cache.
Patch [7]:
  ext4_find_delay_alloc_range is reworked to use the extent status tree, which
  avoids looking up the page cache.
Patch [8]:
  Introduce lseek SEEK_DATA/SEEK_HOLE support.
Patch [9-10]:
  Avoid writing out all dirty pages when punching a hole, and add two
  tracepoints in ext4_ext_punch_hole.

1. http://www.spinics.net/lists/linux-ext4/msg31742.html
2. http://www.spinics.net/lists/linux-ext4/msg32661.html
3. http://www.spinics.net/lists/linux-ext4/msg30281.html

Regards,
Zheng

Zheng Liu (10):
      ext4: add two structures supporting extent status tree
      ext4: add operations on extent status tree
      ext4: initialize extent status tree
      ext4: let ext4 maintain extent status tree
      ext4: add some tracepoints in extent status tree
      ext4: reimplement fiemap on extent status tree
      ext4: reimplement ext4_find_delay_alloc_range on extent status tree
      ext4: introduce lseek SEEK_DATA/SEEK_HOLE support
      ext4: don't need to writeout all dirty pages in punch hole
      ext4: add two tracepoints in punching hole

 fs/ext4/Makefile            |    2 +-
 fs/ext4/ext4.h              |   10 +-
 fs/ext4/ext4_extents.h      |    3 +-
 fs/ext4/extents.c           |  331 ++++++----------------------------
 fs/ext4/extents_status.c    |  426 +++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/extents_status.h    |   39 ++++
 fs/ext4/file.c              |  152 +++++++++++++++-
 fs/ext4/indirect.c          |    4 +-
 fs/ext4/inode.c             |   83 +++------
 fs/ext4/super.c             |   13 ++-
 include/trace/events/ext4.h |  154 ++++++++++++++++
 11 files changed, 874 insertions(+), 343 deletions(-)


* [RFC][PATCH 01/10 v1][RESEND] ext4: add two structures supporting extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on " Zheng Liu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch adds two structures that support the extent status tree, and adds
an ext4_es_tree to ext4_inode_info.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/ext4.h           |    5 +++++
 fs/ext4/extents_status.h |   22 ++++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)
 create mode 100644 fs/ext4/extents_status.h
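
As a side note on the layout, the rb_node is embedded in struct extent_status,
and tree code gets back to the containing extent with rb_entry(), which is
container_of() underneath.  The user-space sketch below imitates that with
made-up names (demo_node, demo_extent_status, demo_container_of, demo_entry);
it is only an illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct rb_node; only its presence matters. */
struct demo_node {
	struct demo_node *left, *right;
};

/* User-space mirror of struct extent_status with an embedded node. */
struct demo_extent_status {
	struct demo_node node;
	uint32_t start;		/* first block extent covers */
	uint32_t len;		/* length of extent in blocks */
};

/* Simplified container_of(): step back from the member to its container. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* What rb_entry() does for the tree code: node pointer -> extent pointer. */
static struct demo_extent_status *demo_entry(struct demo_node *n)
{
	assert(n != NULL);
	return demo_container_of(n, struct demo_extent_status, node);
}
```

The same arithmetic works wherever the node sits inside the structure, because
offsetof() accounts for its position.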

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cfc4e01..c0fe23e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -810,6 +810,8 @@ struct ext4_ext_cache {
 	__u32		ec_len; /* must be 32bit to return holes */
 };
 
+#include "extents_status.h"
+
 /*
  * fourth extended file system inode data in memory
  */
@@ -887,6 +889,9 @@ struct ext4_inode_info {
 	struct list_head i_prealloc_list;
 	spinlock_t i_prealloc_lock;
 
+	/* extents status tree */
+	struct ext4_es_tree i_es_tree;
+
 	/* ialloc */
 	ext4_group_t	i_last_alloc_group;
 
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
new file mode 100644
index 0000000..d87fd71
--- /dev/null
+++ b/fs/ext4/extents_status.h
@@ -0,0 +1,22 @@
+/*
+ *  fs/ext4/extents_status.h
+ *
+ * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
+ *
+ */
+
+#ifndef _EXT4_EXTENTS_STATUS_H
+#define _EXT4_EXTENTS_STATUS_H
+
+struct extent_status {
+	struct rb_node rb_node;
+	ext4_lblk_t start;	/* first block extent covers */
+	ext4_lblk_t len;	/* length of extent in block */
+};
+
+struct ext4_es_tree {
+	struct rb_root root;
+	struct extent_status *cache_es;	/* recently accessed extent */
+};
+
+#endif /* _EXT4_EXTENTS_STATUS_H */
-- 
1.7.4.1



* [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 01/10 v1][RESEND] ext4: add two structures supporting extent status tree Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-31 11:55   ` Lukáš Czerner
  2012-07-22  7:59 ` [RFC][PATCH 03/10 v1][RESEND] ext4: initialize " Zheng Liu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch adds operations on an extent status tree.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/Makefile         |    2 +-
 fs/ext4/extents_status.c |  418 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/extents_status.h |   17 ++
 3 files changed, 436 insertions(+), 1 deletions(-)
 create mode 100644 fs/ext4/extents_status.c
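
The remove path has to work out what is left of an extent once a range has been
cut out of it (the len1/len2 computation).  The following user-space sketch
models just that arithmetic, including the clamped end calculation.
demo_lblk_t, demo_es, demo_es_end and demo_split are hypothetical names, and
demo_lblk_t is assumed to be 32 bits wide like ext4_lblk_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_lblk_t;	/* assumed stand-in for ext4_lblk_t */

struct demo_es {
	demo_lblk_t start;
	demo_lblk_t len;
};

/* Exclusive end of the extent, clamped if start + len wraps around. */
static demo_lblk_t demo_es_end(const struct demo_es *es)
{
	if (es->start + es->len < es->start)
		return (demo_lblk_t)-1;
	return es->start + es->len;
}

/*
 * Cut [offset, offset + len) out of @es, which is assumed to overlap it.
 * @left receives the piece before the range and @right the piece after;
 * a piece whose len is 0 does not exist.
 */
static void demo_split(const struct demo_es *es, demo_lblk_t offset,
		       demo_lblk_t len, struct demo_es *left,
		       struct demo_es *right)
{
	demo_lblk_t end = offset + len;
	demo_lblk_t es_end = demo_es_end(es);

	assert(end > offset);	/* same condition the patch guards with BUG_ON */

	left->start = es->start;
	left->len = offset > es->start ? offset - es->start : 0;	/* len1 */
	right->start = end;
	right->len = es_end > end ? es_end - end : 0;			/* len2 */
}
```

When both pieces are non-empty the extent is split in two, which is the case
where the remove path needs a new node and can therefore fail with -ENOMEM.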

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 56fd8f86..41f22be 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
 ext4-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
 		ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
 		ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
-		mmp.o indirect.o
+		mmp.o indirect.o extents_status.o
 
 ext4-$(CONFIG_EXT4_FS_XATTR)		+= xattr.o xattr_user.o xattr_trusted.o
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
new file mode 100644
index 0000000..bd4e589
--- /dev/null
+++ b/fs/ext4/extents_status.c
@@ -0,0 +1,418 @@
+/*
+ *  fs/ext4/extents_status.c
+ *
+ * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
+ *
+ * Ext4 extents status tree core functions.
+ */
+#include <linux/rbtree.h>
+#include "ext4.h"
+#include "extents_status.h"
+#include "ext4_extents.h"
+
+/*
+ * extents status tree implementation for ext4.
+ *
+ *
+ * ==========================================================================
+ * Extent status encompasses delayed extents and extent locks
+ *
+ * 1. Why a delayed extent implementation?
+ *
+ * Without a delayed extent tree, ext4 identifies a delayed extent by looking
+ * up the page cache.  This has several deficiencies: the code is complicated,
+ * buggy, and inefficient.
+ *
+ * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know
+ * whether a block or a range of blocks belongs to a delayed extent.
+ *
+ * Let us look at how they work without the delayed extents implementation.
+ *   --	FIEMAP
+ *	FIEMAP looks up the page cache to tell delayed allocations from holes.
+ *
+ *   --	SEEK_HOLE/DATA
+ *	SEEK_HOLE/DATA has the same problem as FIEMAP.
+ *
+ *   --	bigalloc
+ *	bigalloc looks up the page cache to figure out whether a block is
+ *	already under delayed allocation, in order to determine whether quota
+ *	reservation is needed for the cluster.
+ *
+ *   --	punch hole
+ *	punch hole looks up the page cache to identify a delayed extent.
+ *
+ *   --	writeout
+ *	Writeout looks up the whole page cache to see if a buffer is mapped.  If
+ *	there are not very many delayed buffers, this is time consuming.
+ *
+ * With the delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc
+ * and writeout can figure out whether a block or a range of blocks is under
+ * delayed allocation (i.e. belongs to a delayed extent) by searching the
+ * delayed extent tree.
+ *
+ *
+ * ==========================================================================
+ * 2. ext4 delayed extents implementation
+ *
+ *   --	delayed extent
+ *	A delayed extent is a range of blocks that are logically contiguous and
+ *	under delayed allocation.  Unlike a regular extent in ext4, a delayed
+ *	extent is an in-memory struct with no corresponding on-disk data.  There
+ *	is no limit on the length of a delayed extent, so it can contain as many
+ *	blocks as are logically contiguous.
+ *
+ *   --	delayed extent tree
+ *	Every inode has a delayed extent tree, and all blocks under delayed
+ *	allocation are added to the tree as delayed extents.  Delayed extents in
+ *	the tree are ordered by logical block number.
+ *
+ *   --	operations on a delayed extent tree
+ *	There are three operations on a delayed extent tree: finding the next
+ *	delayed extent, adding a space (a range of blocks) and removing a space.
+ *
+ *   --	race on a delayed extent tree
+ *	Like the extent tree, it is protected by inode->i_data_sem.
+ *
+ *
+ * ==========================================================================
+ * 3. performance analysis
+ *   --	overhead
+ *	1. Apart from operations on a delayed extent tree, we need to
+ *	down_write(inode->i_data_sem) in the delayed write path to maintain the
+ *	delayed extent tree.  This can impact parallel read-write and write-write.
+ *
+ *	2. There is a cache extent for write access, so if writes are not very
+ *	random, add-space operations take O(1) time.
+ *
+ *   --	gain
+ *	3. Code is much simpler, more readable, more maintainable and
+ *      more efficient.
+ */
+
+static struct kmem_cache *ext4_es_cachep;
+
+int __init ext4_init_es(void)
+{
+	ext4_es_cachep = KMEM_CACHE(extent_status, SLAB_RECLAIM_ACCOUNT);
+	if (ext4_es_cachep == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
+void ext4_exit_es(void)
+{
+	if (ext4_es_cachep)
+		kmem_cache_destroy(ext4_es_cachep);
+}
+
+void ext4_es_init_tree(struct ext4_es_tree *tree)
+{
+	tree->root = RB_ROOT;
+	tree->cache_es = NULL;
+}
+
+#ifdef ES_DEBUG
+static void ext4_es_print_tree(struct inode *inode)
+{
+	struct ext4_es_tree *tree;
+	struct rb_node *node;
+
+	printk(KERN_DEBUG "status extents for inode %lu:", inode->i_ino);
+	tree = &EXT4_I(inode)->i_es_tree;
+	node = rb_first(&tree->root);
+	while (node) {
+		struct extent_status *es;
+		es = rb_entry(node, struct extent_status, rb_node);
+		printk(KERN_DEBUG " [%u/%u)", es->start, es->len);
+		node = rb_next(node);
+	}
+	printk(KERN_DEBUG "\n");
+}
+#else
+#define ext4_es_print_tree(inode)
+#endif
+
+static inline ext4_lblk_t extent_status_end(struct extent_status *es)
+{
+	if (es->start + es->len < es->start)
+		return (ext4_lblk_t)-1;
+	return es->start + es->len;
+}
+
+/*
+ * Search through the tree for a delayed extent covering a given offset.  If
+ * none is found, return the next extent after that offset.
+ */
+static struct extent_status *__es_tree_search(struct rb_root *root,
+						ext4_lblk_t offset)
+{
+	struct rb_node *node = root->rb_node;
+	struct extent_status *es = NULL;
+
+	while (node) {
+		es = rb_entry(node, struct extent_status, rb_node);
+		if (offset < es->start)
+			node = node->rb_left;
+		else if (offset >= extent_status_end(es))
+			node = node->rb_right;
+		else
+			return es;
+	}
+
+	if (es && offset < es->start)
+		return es;
+
+	if (es && offset >= extent_status_end(es)) {
+		node = rb_next(&es->rb_node);
+		return node ? rb_entry(node, struct extent_status, rb_node) :
+			      NULL;
+	}
+
+	return NULL;
+}
+
+/*
+ * ext4_es_find_extent: find the 1st delayed extent covering @es->start
+ * if it exists, otherwise, the next extent after @es->start.
+ *
+ * @inode: the inode which owns delayed extents
+ * @es: delayed extent that we found
+ *
+ * Returns the start of the following extent, or EXT_MAX_BLOCKS if none.
+ * Delayed extent is returned via @es.
+ */
+ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
+{
+	struct ext4_es_tree *tree;
+	struct extent_status *es1;
+	struct rb_node *node;
+	ext4_lblk_t ret = EXT_MAX_BLOCKS;
+
+	es->len = 0;
+	tree = &EXT4_I(inode)->i_es_tree;
+	es1 = __es_tree_search(&tree->root, es->start);
+	if (es1) {
+		tree->cache_es = es1;
+		es->start = es1->start;
+		es->len = es1->len;
+		node = rb_next(&es1->rb_node);
+		if (node) {
+			es1 = rb_entry(node, struct extent_status, rb_node);
+			ret = es1->start;
+		}
+	}
+
+	return ret;
+}
+
+static struct extent_status *
+ext4_es_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
+{
+	struct extent_status *es;
+	es = kmem_cache_alloc(ext4_es_cachep, GFP_NOFS);
+	if (es == NULL)
+		return NULL;
+	es->start = start;
+	es->len = len;
+	return es;
+}
+
+static void ext4_es_free_extent(struct extent_status *es)
+{
+	kmem_cache_free(ext4_es_cachep, es);
+}
+
+static void ext4_es_try_to_merge_left(struct ext4_es_tree *tree,
+				      struct extent_status *es)
+{
+	struct extent_status *es1;
+	struct rb_node *node;
+
+	node = rb_prev(&es->rb_node);
+	if (!node)
+		return;
+
+	es1 = rb_entry(node, struct extent_status, rb_node);
+	if (extent_status_end(es1) == es->start) {
+		es1->len += es->len;
+		rb_erase(&es->rb_node, &tree->root);
+		if (es == tree->cache_es)
+			tree->cache_es = es1;
+		ext4_es_free_extent(es);
+	}
+}
+
+static void ext4_es_try_to_merge_right(struct ext4_es_tree *tree,
+				       struct extent_status *es)
+{
+	struct extent_status *es1;
+	struct rb_node *node;
+
+	node = rb_next(&es->rb_node);
+	if (!node)
+		return;
+
+	es1 = rb_entry(node, struct extent_status, rb_node);
+	if (es1->start == extent_status_end(es)) {
+		es->len += es1->len;
+		rb_erase(node, &tree->root);
+		if (es1 == tree->cache_es)
+			tree->cache_es = es;
+		ext4_es_free_extent(es1);
+	}
+}
+
+/*
+ * ext4_es_add_space: adds a space to a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * ext4_es_add_space is called from the delayed write path
+ * (ext4_da_map_blocks) and by ext4_es_remove_space.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_es_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)
+{
+	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
+	struct rb_node **p = &tree->root.rb_node;
+	struct rb_node *parent = NULL;
+	struct extent_status *es;
+	ext4_lblk_t end = offset + len;
+
+	BUG_ON(end <= offset);
+
+	es = tree->cache_es;
+	es_debug("add [%u/%u) to extent status tree of inode %lu\n",
+		 offset, len, inode->i_ino);
+
+	if (es && extent_status_end(es) == offset) {
+		es_debug("cached by [%u/%u)\n", es->start, es->len);
+		es->len += len;
+		ext4_es_try_to_merge_right(tree, es);
+		goto out;
+	} else if (es && es->start == end) {
+		es_debug("cached by [%u/%u)\n", es->start, es->len);
+		es->start = offset;
+		es->len += len;
+		ext4_es_try_to_merge_left(tree, es);
+		goto out;
+	} else if (es && es->start <= offset &&
+		   extent_status_end(es) >= end) {
+		es_debug("cached by [%u/%u)\n", es->start, es->len);
+		goto out;
+	}
+
+	while (*p) {
+		parent = *p;
+		es = rb_entry(parent, struct extent_status, rb_node);
+
+		if (offset < es->start) {
+			if (end == es->start) {
+				es->len += len;
+				es->start = offset;
+				goto out;
+			}
+			p = &(*p)->rb_left;
+		} else if (offset >= extent_status_end(es)) {
+			if (extent_status_end(es) == offset) {
+				es->len += len;
+				goto out;
+			}
+			p = &(*p)->rb_right;
+		} else
+			goto out;
+	}
+
+	es = ext4_es_alloc_extent(offset, len);
+	if (!es)
+		return -ENOMEM;
+	rb_link_node(&es->rb_node, parent, p);
+	rb_insert_color(&es->rb_node, &tree->root);
+
+out:
+	tree->cache_es = es;
+	ext4_es_print_tree(inode);
+
+	return 0;
+}
+
+/*
+ * ext4_es_remove_space() removes a space from a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_es_remove_space(struct inode *inode, ext4_lblk_t offset,
+			 ext4_lblk_t len)
+{
+	struct rb_node *node;
+	struct ext4_es_tree *tree;
+	struct extent_status *es;
+	struct extent_status orig_es;
+	ext4_lblk_t len1, len2, end;
+
+	es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
+		 offset, len, inode->i_ino);
+
+	end = offset + len;
+	BUG_ON(end <= offset);
+	tree = &EXT4_I(inode)->i_es_tree;
+	es = __es_tree_search(&tree->root, offset);
+	if (!es)
+		goto out;
+
+	/* Simply invalidate cache_es. */
+	tree->cache_es = NULL;
+
+	orig_es.start = es->start;
+	orig_es.len = es->len;
+	len1 = offset > es->start ? offset - es->start : 0;
+	len2 = extent_status_end(es) > end ?
+	       extent_status_end(es) - end : 0;
+	if (len1 > 0)
+		es->len = len1;
+	if (len2 > 0) {
+		if (len1 > 0) {
+			int err;
+			err = ext4_es_add_space(inode, end, len2);
+			if (err) {
+				es->start = orig_es.start;
+				es->len = orig_es.len;
+				return err;
+			}
+		} else {
+			es->start = end;
+			es->len = len2;
+		}
+		goto out;
+	}
+
+	if (len1 > 0) {
+		node = rb_next(&es->rb_node);
+		if (node)
+			es = rb_entry(node, struct extent_status, rb_node);
+		else
+			es = NULL;
+	}
+
+	while (es && extent_status_end(es) <= end) {
+		node = rb_next(&es->rb_node);
+		rb_erase(&es->rb_node, &tree->root);
+		ext4_es_free_extent(es);
+		if (!node) {
+			es = NULL;
+			break;
+		}
+		es = rb_entry(node, struct extent_status, rb_node);
+	}
+
+	if (es && es->start < end) {
+		len1 = extent_status_end(es) - end;
+		es->start = end;
+		es->len = len1;
+	}
+
+out:
+	ext4_es_print_tree(inode);
+	return 0;
+}
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index d87fd71..8fe8084 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -8,6 +8,12 @@
 #ifndef _EXT4_EXTENTS_STATUS_H
 #define _EXT4_EXTENTS_STATUS_H
 
+#ifdef ES_DEBUG
+#define es_debug(a...)         printk(a)
+#else
+#define es_debug(a...)
+#endif
+
 struct extent_status {
 	struct rb_node rb_node;
 	ext4_lblk_t start;	/* first block extent covers */
@@ -19,4 +25,15 @@ struct ext4_es_tree {
 	struct extent_status *cache_es;	/* recently accessed extent */
 };
 
+extern int __init ext4_init_es(void);
+extern void ext4_exit_es(void);
+extern void ext4_es_init_tree(struct ext4_es_tree *tree);
+
+extern int ext4_es_add_space(struct inode *inode, ext4_lblk_t start,
+				ext4_lblk_t len);
+extern int ext4_es_remove_space(struct inode *inode, ext4_lblk_t start,
+				ext4_lblk_t len);
+extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
+				struct extent_status *es);
+
 #endif /* _EXT4_EXTENTS_STATUS_H */
-- 
1.7.4.1



* [RFC][PATCH 03/10 v1][RESEND] ext4: initialize extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 01/10 v1][RESEND] ext4: add two structures supporting extent status tree Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 04/10 v1][RESEND] ext4: let ext4 maintain " Zheng Liu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

Let ext4 initialize the extent status tree of an inode.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb7aa3e..7cd5e4f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -972,6 +972,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
 	INIT_LIST_HEAD(&ei->i_prealloc_list);
 	spin_lock_init(&ei->i_prealloc_lock);
+	ext4_es_init_tree(&ei->i_es_tree);
 	ei->i_reserved_data_blocks = 0;
 	ei->i_reserved_meta_blocks = 0;
 	ei->i_allocated_meta_blocks = 0;
-- 
1.7.4.1



* [RFC][PATCH 04/10 v1][RESEND] ext4: let ext4 maintain extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (2 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 03/10 v1][RESEND] ext4: initialize " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 05/10 v1][RESEND] ext4: add some tracepoints in " Zheng Liu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch lets ext4 maintain the extent status tree.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/extents.c  |    2 ++
 fs/ext4/indirect.c |    3 +++
 fs/ext4/inode.c    |   30 ++++++++++++++++++++++++++++--
 fs/ext4/super.c    |   12 +++++++++++-
 4 files changed, 44 insertions(+), 3 deletions(-)
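
For the truncate hunks, the range dropped from the tree starts at the first
block past the new i_size and runs to the end of the logical block space.  A
small worked sketch of that arithmetic follows; demo_last_block and
demo_remove_len are hypothetical helpers, and DEMO_MAX_BLOCKS is assumed to
equal EXT_MAX_BLOCKS (0xffffffff).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_BLOCKS	0xffffffffu	/* assumed value of EXT_MAX_BLOCKS */

/* First logical block lying wholly beyond @i_size, for 2^blkbits blocks. */
static uint32_t demo_last_block(uint64_t i_size, unsigned int blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;

	assert(blkbits >= 10 && blkbits <= 16);	/* 1 KiB .. 64 KiB blocks */

	/* Round i_size up to a block boundary, then convert to blocks. */
	return (uint32_t)((i_size + blocksize - 1) >> blkbits);
}

/* Length handed to the remove call: everything from @last_block upward. */
static uint32_t demo_remove_len(uint32_t last_block)
{
	return DEMO_MAX_BLOCKS - last_block;
}
```

With 4 KiB blocks, a file of 4097 bytes keeps blocks 0 and 1, so the range
removed from the tree starts at block 2.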

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 91341ec..0747917 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4270,6 +4270,8 @@ void ext4_ext_truncate(struct inode *inode)
 
 	last_block = (inode->i_size + sb->s_blocksize - 1)
 			>> EXT4_BLOCK_SIZE_BITS(sb);
+	err = ext4_es_remove_space(inode, last_block,
+				   EXT_MAX_BLOCKS - last_block);
 	err = ext4_ext_remove_space(inode, last_block, EXT_MAX_BLOCKS - 1);
 
 	/* In a multi-transaction truncate, we only make the final
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 830e1b2..e190427 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -21,6 +21,7 @@
  */
 
 #include "ext4_jbd2.h"
+#include "ext4_extents.h"
 #include "truncate.h"
 
 #include <trace/events/ext4.h>
@@ -1398,6 +1399,8 @@ void ext4_ind_truncate(struct inode *inode)
 	down_write(&ei->i_data_sem);
 
 	ext4_discard_preallocations(inode);
+	ext4_es_remove_space(inode, last_block,
+			     EXT_MAX_BLOCKS - last_block);
 
 	/*
 	 * The orphan list entry will now protect us from any crash which
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02bc8cb..2f82630 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -555,7 +555,16 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	up_read((&EXT4_I(inode)->i_data_sem));
 
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
-		int ret = check_block_validity(inode, map);
+		int ret;
+		if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
+			/* delayed alloc may be allocated by fallocate and
+			 * converted to initialized by direct I/O.
+			 * We need to handle the delayed extent here.
+			 */
+			down_write((&EXT4_I(inode)->i_data_sem));
+			goto delayed_mapped;
+		}
+		ret = check_block_validity(inode, map);
 		if (ret != 0)
 			return ret;
 	}
@@ -637,8 +646,16 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		 * set the BH_Da_Mapped bit on them. Its important to do this
 		 * under the protection of i_data_sem.
 		 */
-		if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED)
+		if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
+			int ret;
 			set_buffers_da_mapped(inode, map);
+delayed_mapped:
+			/* delayed allocation blocks have been allocated */
+			ret = ext4_es_remove_space(inode, map->m_lblk,
+						   map->m_len);
+			if (ret < 0)
+				retval = ret;
+		}
 	}
 
 	up_write((&EXT4_I(inode)->i_data_sem));
@@ -1750,6 +1767,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 			      struct buffer_head *bh)
 {
 	int retval;
+	int ret;
 	sector_t invalid_block = ~((sector_t) 0xffff);
 
 	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
@@ -1796,6 +1814,14 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 out_unlock:
 	up_read((&EXT4_I(inode)->i_data_sem));
 
+	if (retval == 0) {
+		down_write((&EXT4_I(inode)->i_data_sem));
+		ret = ext4_es_add_space(inode, map->m_lblk, map->m_len);
+		up_write((&EXT4_I(inode)->i_data_sem));
+		if (ret)
+			return ret;
+	}
+
 	return retval;
 }
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7cd5e4f..6f97c47 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -50,6 +50,7 @@
 #include "xattr.h"
 #include "acl.h"
 #include "mballoc.h"
+#include "ext4_extents.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ext4.h>
@@ -1056,6 +1057,7 @@ void ext4_clear_inode(struct inode *inode)
 	clear_inode(inode);
 	dquot_drop(inode);
 	ext4_discard_preallocations(inode);
+	ext4_es_remove_space(inode, 0, EXT_MAX_BLOCKS);
 	if (EXT4_I(inode)->jinode) {
 		jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
 					       EXT4_I(inode)->jinode);
@@ -5080,9 +5082,14 @@ static int __init ext4_init_fs(void)
 		init_waitqueue_head(&ext4__ioend_wq[i]);
 	}
 
-	err = ext4_init_pageio();
+	err = ext4_init_es();
 	if (err)
 		return err;
+
+	err = ext4_init_pageio();
+	if (err)
+		goto out7;
+
 	err = ext4_init_system_zone();
 	if (err)
 		goto out6;
@@ -5130,6 +5137,9 @@ out5:
 	ext4_exit_system_zone();
 out6:
 	ext4_exit_pageio();
+out7:
+	ext4_exit_es();
+
 	return err;
 }
 
-- 
1.7.4.1



* [RFC][PATCH 05/10 v1][RESEND] ext4: add some tracepoints in extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (3 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 04/10 v1][RESEND] ext4: let ext4 maintain " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 06/10 v1][RESEND] ext4: reimplement fiemap on " Zheng Liu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch adds some tracepoints to the extent status tree.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/extents_status.c    |    8 +++
 include/trace/events/ext4.h |  101 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+), 0 deletions(-)
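
When reading the resulting trace lines, note that the bracketed pair is
start/len, not start/end.  The user-space sketch below reproduces the shape of
the add/remove event line so the fields are easy to pick out.
demo_format_es_event and the DEMO_* macros are made up for illustration, and
the 20-bit minor split is an assumption about the kernel's internal dev_t
encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the kernel-internal dev_t layout (20 minor bits). */
#define DEMO_MINORBITS	20
#define DEMO_MAJOR(dev)	((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev)	((unsigned int)((dev) & ((1u << DEMO_MINORBITS) - 1)))

/*
 * Format one event the way the add/remove tracepoints print it:
 * "dev MAJOR,MINOR ino INO es [START/LEN)".
 * Returns what snprintf() returns.
 */
static int demo_format_es_event(char *buf, size_t size, uint32_t dev,
				unsigned long ino, uint32_t start,
				uint32_t len)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "dev %u,%u ino %lu es [%u/%u)",
			DEMO_MAJOR(dev), DEMO_MINOR(dev), ino, start, len);
}
```

So a line ending in "es [100/8)" describes the eight blocks 100 through 107.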

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index bd4e589..1c82340 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -10,6 +10,8 @@
 #include "extents_status.h"
 #include "ext4_extents.h"
 
+#include <trace/events/ext4.h>
+
 /*
  * extents status tree implementation for ext4.
  *
@@ -188,6 +190,8 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
 	struct rb_node *node;
 	ext4_lblk_t ret = EXT_MAX_BLOCKS;
 
+	trace_ext4_es_find_extent_enter(inode, es->start);
+
 	es->len = 0;
 	tree = &EXT4_I(inode)->i_es_tree;
 	es1 = __es_tree_search(&tree->root, es->start);
@@ -202,6 +206,8 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
 		}
 	}
 
+	trace_ext4_es_find_extent_exit(inode, es, ret);
+
 	return ret;
 }
 
@@ -282,6 +288,7 @@ int ext4_es_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)
 	BUG_ON(end <= offset);
 
 	es = tree->cache_es;
+	trace_ext4_es_add_space(inode, offset, len);
 	es_debug("add [%u/%u) to extent status tree of inode %lu\n",
 		 offset, len, inode->i_ino);
 
@@ -351,6 +358,7 @@ int ext4_es_remove_space(struct inode *inode, ext4_lblk_t offset,
 	struct extent_status orig_es;
 	ext4_lblk_t len1, len2, end;
 
+	trace_ext4_es_remove_space(inode, offset, len);
 	es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
 		 offset, len, inode->i_ino);
 
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 69d8a69..5c17592 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -15,6 +15,7 @@ struct ext4_inode_info;
 struct mpage_da_data;
 struct ext4_map_blocks;
 struct ext4_extent;
+struct extent_status;
 
 #define EXT4_I(inode) (container_of(inode, struct ext4_inode_info, vfs_inode))
 
@@ -2055,6 +2056,106 @@ TRACE_EVENT(ext4_ext_remove_space_done,
 		  (unsigned short) __entry->eh_entries)
 );
 
+TRACE_EVENT(ext4_es_add_space,
+	TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+
+	TP_ARGS(inode, start, len),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	loff_t,	start			)
+		__field(	loff_t, len			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->start	= start;
+		__entry->len	= len;
+	),
+
+	TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->start, __entry->len)
+);
+
+TRACE_EVENT(ext4_es_remove_space,
+	TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+
+	TP_ARGS(inode, start, len),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	loff_t,	start			)
+		__field(	loff_t,	len			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->start	= start;
+		__entry->len	= len;
+	),
+
+	TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->start, __entry->len)
+);
+
+TRACE_EVENT(ext4_es_find_extent_enter,
+	TP_PROTO(struct inode *inode, ext4_lblk_t start),
+
+	TP_ARGS(inode, start),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,		dev		)
+		__field(	ino_t,		ino		)
+		__field(	ext4_lblk_t,	start		)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->start	= start;
+	),
+
+	TP_printk("dev %d,%d ino %lu start %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino, __entry->start)
+);
+
+TRACE_EVENT(ext4_es_find_extent_exit,
+	TP_PROTO(struct inode *inode, struct extent_status *es,
+		 ext4_lblk_t ret),
+
+	TP_ARGS(inode, es, ret),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,		dev		)
+		__field(	ino_t,		ino		)
+		__field(	ext4_lblk_t,	start		)
+		__field(	ext4_lblk_t,	len		)
+		__field(	ext4_lblk_t,	ret		)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->start	= es->start;
+		__entry->len	= es->len;
+		__entry->ret	= ret;
+	),
+
+	TP_printk("dev %d,%d ino %lu es [%u/%u) ret %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->start, __entry->len, __entry->ret)
+);
+
 #endif /* _TRACE_EXT4_H */
 
 /* This part must be outside protection */
-- 
1.7.4.1


* [RFC][PATCH 06/10 v1][RESEND] ext4: reimplement fiemap on extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (4 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 05/10 v1][RESEND] ext4: add some tracepoints in " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 07/10 v1][RESEND] ext4: reimplement ext4_find_delay_alloc_range " Zheng Liu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch reimplements fiemap on top of the extent status tree, replacing
the page cache walk that was used to find delayed extents.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
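Note for reviewers (not part of the commit message): the hole/delayed-extent
decision that ext4_ext_fiemap_cb makes after this patch can be modelled in
userspace as below.  This is only a sketch.  struct es_model, enum range_kind
and classify() are hypothetical names, and the model assumes that
ext4_es_find_extent() fills in the first delayed extent that contains or
follows the requested block and leaves len == 0 when there is none.

```c
#include <stdint.h>

/* Hypothetical userspace model of struct extent_status (start/len only). */
struct es_model {
	uint32_t start;	/* first logical block of the delayed extent */
	uint32_t len;	/* length in blocks; 0 means "nothing found" */
};

enum range_kind { RANGE_HOLE, RANGE_DELAYED };

/*
 * Classify logical block @lblk given the lookup result @es, mirroring the
 * three branches in the patch.  For a hole, *@run is the number of blocks
 * until the delayed extent begins (0 if no delayed extent follows).  For a
 * delayed range, *@run is the number of blocks left in the extent.
 */
static enum range_kind classify(uint32_t lblk, const struct es_model *es,
				uint32_t *run)
{
	if (es->len == 0) {
		*run = 0;
		return RANGE_HOLE;
	}
	if (es->start > lblk) {
		*run = es->start - lblk;
		return RANGE_HOLE;
	}
	*run = es->start + es->len - lblk;
	return RANGE_DELAYED;
}
```

Unlike the patch, the model does not clamp the hole length to the cached
extent length (newex->ec_len); it only shows how the three cases are told
apart.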
 fs/ext4/extents.c |  186 +++++++----------------------------------------------
 1 files changed, 23 insertions(+), 163 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0747917..06ca2a3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4501,193 +4501,53 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
 		       struct ext4_ext_cache *newex, struct ext4_extent *ex,
 		       void *data)
 {
+	struct extent_status es;
 	__u64	logical;
 	__u64	physical;
 	__u64	length;
 	__u32	flags = 0;
+	ext4_lblk_t next_del;
 	int		ret = 0;
 	struct fiemap_extent_info *fieinfo = data;
 	unsigned char blksize_bits;
 
-	blksize_bits = inode->i_sb->s_blocksize_bits;
-	logical = (__u64)newex->ec_block << blksize_bits;
+	es.start = newex->ec_block;
+	down_read(&EXT4_I(inode)->i_data_sem);
+	next_del = ext4_es_find_extent(inode, &es);
+	up_read(&EXT4_I(inode)->i_data_sem);
 
+	next = min(next_del, next);
 	if (newex->ec_start == 0) {
 		/*
 		 * No extent in extent-tree contains block @newex->ec_start,
 		 * then the block may stay in 1)a hole or 2)delayed-extent.
-		 *
-		 * Holes or delayed-extents are processed as follows.
-		 * 1. lookup dirty pages with specified range in pagecache.
-		 *    If no page is got, then there is no delayed-extent and
-		 *    return with EXT_CONTINUE.
-		 * 2. find the 1st mapped buffer,
-		 * 3. check if the mapped buffer is both in the request range
-		 *    and a delayed buffer. If not, there is no delayed-extent,
-		 *    then return.
-		 * 4. a delayed-extent is found, the extent will be collected.
 		 */
-		ext4_lblk_t	end = 0;
-		pgoff_t		last_offset;
-		pgoff_t		offset;
-		pgoff_t		index;
-		pgoff_t		start_index = 0;
-		struct page	**pages = NULL;
-		struct buffer_head *bh = NULL;
-		struct buffer_head *head = NULL;
-		unsigned int nr_pages = PAGE_SIZE / sizeof(struct page *);
-
-		pages = kmalloc(PAGE_SIZE, GFP_KERNEL);
-		if (pages == NULL)
-			return -ENOMEM;
-
-		offset = logical >> PAGE_SHIFT;
-repeat:
-		last_offset = offset;
-		head = NULL;
-		ret = find_get_pages_tag(inode->i_mapping, &offset,
-					PAGECACHE_TAG_DIRTY, nr_pages, pages);
-
-		if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
-			/* First time, try to find a mapped buffer. */
-			if (ret == 0) {
-out:
-				for (index = 0; index < ret; index++)
-					page_cache_release(pages[index]);
-				/* just a hole. */
-				kfree(pages);
-				return EXT_CONTINUE;
-			}
-			index = 0;
-
-next_page:
-			/* Try to find the 1st mapped buffer. */
-			end = ((__u64)pages[index]->index << PAGE_SHIFT) >>
-				  blksize_bits;
-			if (!page_has_buffers(pages[index]))
-				goto out;
-			head = page_buffers(pages[index]);
-			if (!head)
-				goto out;
-
-			index++;
-			bh = head;
-			do {
-				if (end >= newex->ec_block +
-					newex->ec_len)
-					/* The buffer is out of
-					 * the request range.
-					 */
-					goto out;
-
-				if (buffer_mapped(bh) &&
-				    end >= newex->ec_block) {
-					start_index = index - 1;
-					/* get the 1st mapped buffer. */
-					goto found_mapped_buffer;
-				}
-
-				bh = bh->b_this_page;
-				end++;
-			} while (bh != head);
-
-			/* No mapped buffer in the range found in this page,
-			 * We need to look up next page.
-			 */
-			if (index >= ret) {
-				/* There is no page left, but we need to limit
-				 * newex->ec_len.
-				 */
-				newex->ec_len = end - newex->ec_block;
-				goto out;
-			}
-			goto next_page;
-		} else {
-			/*Find contiguous delayed buffers. */
-			if (ret > 0 && pages[0]->index == last_offset)
-				head = page_buffers(pages[0]);
-			bh = head;
-			index = 1;
-			start_index = 0;
-		}
-
-found_mapped_buffer:
-		if (bh != NULL && buffer_delay(bh)) {
-			/* 1st or contiguous delayed buffer found. */
-			if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
-				/*
-				 * 1st delayed buffer found, record
-				 * the start of extent.
-				 */
-				flags |= FIEMAP_EXTENT_DELALLOC;
-				newex->ec_block = end;
-				logical = (__u64)end << blksize_bits;
-			}
-			/* Find contiguous delayed buffers. */
-			do {
-				if (!buffer_delay(bh))
-					goto found_delayed_extent;
-				bh = bh->b_this_page;
-				end++;
-			} while (bh != head);
-
-			for (; index < ret; index++) {
-				if (!page_has_buffers(pages[index])) {
-					bh = NULL;
-					break;
-				}
-				head = page_buffers(pages[index]);
-				if (!head) {
-					bh = NULL;
-					break;
-				}
-
-				if (pages[index]->index !=
-				    pages[start_index]->index + index
-				    - start_index) {
-					/* Blocks are not contiguous. */
-					bh = NULL;
-					break;
-				}
-				bh = head;
-				do {
-					if (!buffer_delay(bh))
-						/* Delayed-extent ends. */
-						goto found_delayed_extent;
-					bh = bh->b_this_page;
-					end++;
-				} while (bh != head);
-			}
-		} else if (!(flags & FIEMAP_EXTENT_DELALLOC))
-			/* a hole found. */
-			goto out;
-
-found_delayed_extent:
-		newex->ec_len = min(end - newex->ec_block,
-						(ext4_lblk_t)EXT_INIT_MAX_LEN);
-		if (ret == nr_pages && bh != NULL &&
-			newex->ec_len < EXT_INIT_MAX_LEN &&
-			buffer_delay(bh)) {
-			/* Have not collected an extent and continue. */
-			for (index = 0; index < ret; index++)
-				page_cache_release(pages[index]);
-			goto repeat;
+		if (es.len == 0)
+			/* A hole found. */
+			return EXT_CONTINUE;
+
+		if (es.start > newex->ec_block) {
+			/* A hole found. */
+			newex->ec_len = min(es.start - newex->ec_block,
+					    newex->ec_len);
+			return EXT_CONTINUE;
 		}
 
-		for (index = 0; index < ret; index++)
-			page_cache_release(pages[index]);
-		kfree(pages);
+		flags |= FIEMAP_EXTENT_DELALLOC;
+		newex->ec_len = es.start + es.len - newex->ec_block;
 	}
 
-	physical = (__u64)newex->ec_start << blksize_bits;
-	length =   (__u64)newex->ec_len << blksize_bits;

* [RFC][PATCH 07/10 v1][RESEND] ext4: reimplement ext4_find_delay_alloc_range on extent status tree
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (5 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 06/10 v1][RESEND] ext4: reimplement fiemap on " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 08/10 v1][RESEND] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support Zheng Liu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

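ext4_find_delalloc_range is reimplemented on top of the extent status tree
instead of walking buffer heads in the page cache.  As a result, the
search_hint_reverse argument and the BH_Da_Mapped flag are no longer needed
and are removed.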

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
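Note for reviewers (not part of the commit message): the new comment says
the function returns 1 "if there is a delalloc block in the range", which
reads as an overlap test, while the return expression looks closer to a
containment test.  The userspace sketch below shows both so that they can be
compared.  All names are hypothetical, the cluster ratio is assumed to be a
power of two, and the extent is modelled as the half-open range
[start, start + len).

```c
#include <stdint.h>

/* Hypothetical model: a delayed extent covering [start, start + len). */
struct es_model {
	uint32_t start;
	uint32_t len;
};

/* First block of the cluster containing @lblk; @ratio is a power of two. */
static uint32_t cluster_first(uint32_t lblk, uint32_t ratio)
{
	return lblk & ~(ratio - 1);
}

/* Last block (inclusive) of the cluster containing @lblk. */
static uint32_t cluster_last(uint32_t lblk, uint32_t ratio)
{
	return cluster_first(lblk, ratio) + ratio - 1;
}

/* Return 1 if @es shares at least one block with [first, last]. */
static int es_overlaps(const struct es_model *es, uint32_t first,
		       uint32_t last)
{
	uint64_t end = (uint64_t)es->start + es->len;	/* exclusive */

	if (es->len == 0)
		return 0;
	return es->start <= last && first < end;
}

/* Return 1 if @es covers every block of [first, last]. */
static int es_contains(const struct es_model *es, uint32_t first,
		       uint32_t last)
{
	uint64_t end = (uint64_t)es->start + es->len;	/* exclusive */

	if (es->len == 0)
		return 0;
	return es->start <= first && last < end;
}
```

With a cluster ratio of 16, block 21 lies in cluster [16, 31].  A delayed
extent covering blocks 20..23 overlaps that cluster but does not contain it,
so the two helpers give different answers for it.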
 fs/ext4/ext4.h         |    4 --
 fs/ext4/ext4_extents.h |    3 +-
 fs/ext4/extents.c      |  112 +++++------------------------------------------
 fs/ext4/inode.c        |   53 +----------------------
 4 files changed, 14 insertions(+), 158 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c0fe23e..f64b471 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2433,14 +2433,10 @@ enum ext4_state_bits {
 				 * never, ever appear in a buffer_head's state
 				 * flag. See EXT4_MAP_FROM_CLUSTER to see where
 				 * this is used. */
-	BH_Da_Mapped,	/* Delayed allocated block that now has a mapping. This
-			 * flag is set when ext4_map_blocks is called on a
-			 * delayed allocated block to get its real mapping. */
 };
 
 BUFFER_FNS(Uninit, uninit)
 TAS_BUFFER_FNS(Uninit, uninit)
-BUFFER_FNS(Da_Mapped, da_mapped)
 
 /*
  * Add new method to test wether block and inode bitmaps are properly
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index cb1b2c9..603bb11 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -314,7 +314,6 @@ extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t,
 							struct ext4_ext_path *);
 extern void ext4_ext_drop_refs(struct ext4_ext_path *);
 extern int ext4_ext_check_inode(struct inode *inode);
-extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk,
-				      int search_hint_reverse);
+extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk);
 #endif /* _EXT4_EXTENTS */
 
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 06ca2a3..f2c5294 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3394,115 +3394,27 @@ out:
 /**
  * ext4_find_delalloc_range: find delayed allocated block in the given range.
  *
- * Goes through the buffer heads in the range [lblk_start, lblk_end] and returns
- * whether there are any buffers marked for delayed allocation. It returns '1'
- * on the first delalloc'ed buffer head found. If no buffer head in the given
- * range is marked for delalloc, it returns 0.
- * lblk_start should always be <= lblk_end.
- * search_hint_reverse is to indicate that searching in reverse from lblk_end to
- * lblk_start might be more efficient (i.e., we will likely hit the delalloc'ed
- * block sooner). This is useful when blocks are truncated sequentially from
- * lblk_start towards lblk_end.
+ * Return 1 if there is a delalloc block in the range, otherwise 0.
  */
 static int ext4_find_delalloc_range(struct inode *inode,
 				    ext4_lblk_t lblk_start,
-				    ext4_lblk_t lblk_end,
-				    int search_hint_reverse)
+				    ext4_lblk_t lblk_end)
 {
-	struct address_space *mapping = inode->i_mapping;
-	struct buffer_head *head, *bh = NULL;
-	struct page *page;
-	ext4_lblk_t i, pg_lblk;
-	pgoff_t index;
-
-	if (!test_opt(inode->i_sb, DELALLOC))
-		return 0;
-
-	/* reverse search wont work if fs block size is less than page size */
-	if (inode->i_blkbits < PAGE_CACHE_SHIFT)
-		search_hint_reverse = 0;
-
-	if (search_hint_reverse)
-		i = lblk_end;
-	else
-		i = lblk_start;
-
-	index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
-
-	while ((i >= lblk_start) && (i <= lblk_end)) {
-		page = find_get_page(mapping, index);
-		if (!page)
-			goto nextpage;
-
-		if (!page_has_buffers(page))
-			goto nextpage;
-
-		head = page_buffers(page);
-		if (!head)
-			goto nextpage;
-
-		bh = head;
-		pg_lblk = index << (PAGE_CACHE_SHIFT -
-						inode->i_blkbits);
-		do {
-			if (unlikely(pg_lblk < lblk_start)) {
-				/*
-				 * This is possible when fs block size is less
-				 * than page size and our cluster starts/ends in
-				 * middle of the page. So we need to skip the
-				 * initial few blocks till we reach the 'lblk'
-				 */
-				pg_lblk++;
-				continue;
-			}
-
-			/* Check if the buffer is delayed allocated and that it
-			 * is not yet mapped. (when da-buffers are mapped during
-			 * their writeout, their da_mapped bit is set.)
-			 */
-			if (buffer_delay(bh) && !buffer_da_mapped(bh)) {
-				page_cache_release(page);
-				trace_ext4_find_delalloc_range(inode,
-						lblk_start, lblk_end,
-						search_hint_reverse,
-						1, i);
-				return 1;
-			}
-			if (search_hint_reverse)
-				i--;
-			else
-				i++;
-		} while ((i >= lblk_start) && (i <= lblk_end) &&
-				((bh = bh->b_this_page) != head));
-nextpage:
-		if (page)
-			page_cache_release(page);
-		/*
-		 * Move to next page. 'i' will be the first lblk in the next
-		 * page.
-		 */
-		if (search_hint_reverse)
-			index--;
-		else
-			index++;
-		i = index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
-	}
+	struct extent_status es;
 
-	trace_ext4_find_delalloc_range(inode, lblk_start, lblk_end,
-					search_hint_reverse, 0, 0);
-	return 0;
+	es.start = lblk_start;
+	ext4_es_find_extent(inode, &es);
+	return (es.start + es.len) >= lblk_end && es.start <= lblk_start;
 }
 
-int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk,
-			       int search_hint_reverse)
+int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	ext4_lblk_t lblk_start, lblk_end;
 	lblk_start = lblk & (~(sbi->s_cluster_ratio - 1));
 	lblk_end = lblk_start + sbi->s_cluster_ratio - 1;
 
-	return ext4_find_delalloc_range(inode, lblk_start, lblk_end,
-					search_hint_reverse);
+	return ext4_find_delalloc_range(inode, lblk_start, lblk_end);
 }
 
 /**
@@ -3563,7 +3475,7 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start,
 		lblk_from = lblk_start & (~(sbi->s_cluster_ratio - 1));
 		lblk_to = lblk_from + c_offset - 1;
 
-		if (ext4_find_delalloc_range(inode, lblk_from, lblk_to, 0))
+		if (ext4_find_delalloc_range(inode, lblk_from, lblk_to))
 			allocated_clusters--;
 	}
 
@@ -3573,7 +3485,7 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start,
 		lblk_from = lblk_start + num_blks;
 		lblk_to = lblk_from + (sbi->s_cluster_ratio - c_offset) - 1;
 
-		if (ext4_find_delalloc_range(inode, lblk_from, lblk_to, 0))
+		if (ext4_find_delalloc_range(inode, lblk_from, lblk_to))
 			allocated_clusters--;
 	}
 
@@ -3857,7 +3769,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	if (ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
 		if (!newex.ee_start_lo && !newex.ee_start_hi) {
 			if ((sbi->s_cluster_ratio > 1) &&
-			    ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
+			    ext4_find_delalloc_cluster(inode, map->m_lblk))
 				map->m_flags |= EXT4_MAP_FROM_CLUSTER;
 
 			if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
@@ -3945,7 +3857,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	}
 
 	if ((sbi->s_cluster_ratio > 1) &&
-	    ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
+	    ext4_find_delalloc_cluster(inode, map->m_lblk))
 		map->m_flags |= EXT4_MAP_FROM_CLUSTER;
 
 	/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2f82630..2d6d93e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -467,49 +467,6 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
 }
 
 /*
- * Sets the BH_Da_Mapped bit on the buffer heads corresponding to the given map.
- */
-static void set_buffers_da_mapped(struct inode *inode,
-				   struct ext4_map_blocks *map)
-{
-	struct address_space *mapping = inode->i_mapping;
-	struct pagevec pvec;
-	int i, nr_pages;
-	pgoff_t index, end;
-
-	index = map->m_lblk >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
-	end = (map->m_lblk + map->m_len - 1) >>
-		(PAGE_CACHE_SHIFT - inode->i_blkbits);
-
-	pagevec_init(&pvec, 0);
-	while (index <= end) {
-		nr_pages = pagevec_lookup(&pvec, mapping, index,
-					  min(end - index + 1,
-					      (pgoff_t)PAGEVEC_SIZE));
-		if (nr_pages == 0)
-			break;
-		for (i = 0; i < nr_pages; i++) {
-			struct page *page = pvec.pages[i];
-			struct buffer_head *bh, *head;
-
-			if (unlikely(page->mapping != mapping) ||
-			    !PageDirty(page))
-				break;
-
-			if (page_has_buffers(page)) {
-				bh = head = page_buffers(page);
-				do {
-					set_buffer_da_mapped(bh);
-					bh = bh->b_this_page;
-				} while (bh != head);
-			}
-			index++;
-		}
-		pagevec_release(&pvec);
-	}
-}
-
-/*
  * The ext4_map_blocks() function tries to look up the requested blocks,
  * and returns if the blocks are already mapped.
  *
@@ -642,13 +599,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
 		ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);
 
-		/* If we have successfully mapped the delayed allocated blocks,
-		 * set the BH_Da_Mapped bit on them. Its important to do this
-		 * under the protection of i_data_sem.
-		 */
 		if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
 			int ret;
-			set_buffers_da_mapped(inode, map);
 delayed_mapped:
 			/* delayed allocation blocks has been allocated */
 			ret = ext4_es_remove_space(inode, map->m_lblk,
@@ -1296,7 +1248,6 @@ static void ext4_da_page_release_reservation(struct page *page,
 		if ((offset <= curr_off) && (buffer_delay(bh))) {
 			to_release++;
 			clear_buffer_delay(bh);
-			clear_buffer_da_mapped(bh);
 		}
 		curr_off = next_off;
 	} while ((bh = bh->b_this_page) != head);
@@ -1309,7 +1260,7 @@ static void ext4_da_page_release_reservation(struct page *page,
 		lblk = (page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits)) +
 			((num_clusters - 1) << sbi->s_cluster_bits);
 		if (sbi->s_cluster_ratio == 1 ||
-		    !ext4_find_delalloc_cluster(inode, lblk, 1))
+		    !ext4_find_delalloc_cluster(inode, lblk))
 			ext4_da_release_space(inode, 1);
 
 		num_clusters--;
@@ -1415,8 +1366,6 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
 						clear_buffer_delay(bh);
 						bh->b_blocknr = pblock;
 					}
-					if (buffer_da_mapped(bh))
-						clear_buffer_da_mapped(bh);
 					if (buffer_unwritten(bh) ||
 					    buffer_mapped(bh))
 						BUG_ON(bh->b_blocknr != pblock);
-- 
1.7.4.1


* [RFC][PATCH 08/10 v1][RESEND] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (6 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 07/10 v1][RESEND] ext4: reimplement ext4_find_delay_alloc_range " Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole Zheng Liu
  2012-07-22  7:59 ` [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole Zheng Liu
  9 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch adds real support for the SEEK_DATA and SEEK_HOLE origins of
lseek to ext4.  Extent-based and block-based files share one implementation
because ext4_map_blocks hides the difference between them.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/ext4.h     |    1 +
 fs/ext4/file.c     |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/ext4/indirect.c |    1 -
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f64b471..0957309 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2400,6 +2400,7 @@ extern int ext4_map_blocks(handle_t *handle, struct inode *inode,
 			   struct ext4_map_blocks *map, int flags);
 extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			__u64 start, __u64 len);
+
 /* move_extent.c */
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8c7642a..38f4d97 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -210,6 +210,145 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
 	return dquot_file_open(inode, filp);
 }
 
+static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
+{
+	struct inode *inode = file->f_mapping->host;
+	struct ext4_map_blocks map;
+	struct extent_status es;
+	ext4_lblk_t start_block, end_block, len, delblk;
+	loff_t dataoff, isize;
+	int blkbits;
+	int ret = 0;
+
+	mutex_lock(&inode->i_mutex);
+
+	blkbits = inode->i_sb->s_blocksize_bits;
+	isize = i_size_read(inode);
+	start_block = offset >> blkbits;
+	end_block = isize >> blkbits;
+	len = isize - offset + 1;
+
+	do {
+		map.m_lblk = start_block;
+		map.m_len = len >> blkbits;
+
+		ret = ext4_map_blocks(NULL, inode, &map, 0);
+
+		if (ret > 0) {
+			dataoff = (loff_t)start_block << blkbits;
+			break;
+		} else {
+			/* search in extent status */
+			es.start = start_block;
+			down_read(&EXT4_I(inode)->i_data_sem);
+			delblk = ext4_es_find_extent(inode, &es);
+			up_read(&EXT4_I(inode)->i_data_sem);
+
+			if (start_block >= es.start &&
+			    start_block < es.start + es.len) {
+				dataoff = (loff_t)start_block << blkbits;
+				break;
+			}
+
+			start_block++;
+
+			/*
+			 * Currently hole punching doesn't change the file
+			 * size, so after punching there may be a hole at
+			 * the end of the file.
+			 */
+			if (start_block > end_block) {
+				dataoff = -ENXIO;
+				break;
+			}
+		}
+	} while (1);
+
+	mutex_unlock(&inode->i_mutex);
+	return dataoff;
+}
+
+static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
+{
+	struct inode *inode = file->f_mapping->host;
+	struct ext4_map_blocks map;
+	struct extent_status es;
+	ext4_lblk_t start_block, end_block, len, delblk;
+	loff_t holeoff, isize;
+	int blkbits;
+	int ret = 0;
+
+	mutex_lock(&inode->i_mutex);
+
+	blkbits = inode->i_sb->s_blocksize_bits;
+	isize = i_size_read(inode);
+	start_block = offset >> blkbits;
+	end_block = isize >> blkbits;
+	len = isize - offset + 1;
+
+	do {
+		map.m_lblk = start_block;
+		map.m_len = len >> blkbits;
+
+		ret = ext4_map_blocks(NULL, inode, &map, 0);
+
+		if (ret > 0) {
+			/* skip this extent */
+			start_block += ret;
+			if (start_block > end_block) {
+				holeoff = isize;
+				break;
+			}
+		} else {
+			/* search in extent status */
+			es.start = start_block;
+			down_read(&EXT4_I(inode)->i_data_sem);
+			delblk = ext4_es_find_extent(inode, &es);
+			up_read(&EXT4_I(inode)->i_data_sem);
+
+			if (start_block >= es.start &&
+			    start_block < es.start + es.len) {
+				/* skip this delay extent */
+				start_block = es.start + es.len;
+				continue;
+			}
+
+			holeoff = (loff_t)start_block << blkbits;
+			break;
+		}
+	} while (1);
+
+	mutex_unlock(&inode->i_mutex);
+	return holeoff;
+}
+
+static loff_t ext4_seek_data_hole(struct file *file, loff_t offset,
+				  int origin, loff_t maxsize)
+{
+	struct inode *inode = file->f_mapping->host;
+
+	BUG_ON((origin != SEEK_DATA) && (origin != SEEK_HOLE));
+
+	if (offset >= i_size_read(inode))
+		return -ENXIO;
+
+	if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET))
+		return -EINVAL;
+	if (offset > maxsize)
+		return -EINVAL;
+
+	switch (origin) {
+	case SEEK_DATA:
+		return ext4_seek_data(file, offset, maxsize);
+	case SEEK_HOLE:
+		return ext4_seek_hole(file, offset, maxsize);
+	default:
+		ext4_error(inode->i_sb, "Unknown origin");
+	}
+
+	return -EINVAL;
+}
+
 /*
  * ext4_llseek() copied from generic_file_llseek() to handle both
  * block-mapped and extent-mapped maxbytes values. This should
@@ -225,7 +364,18 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
 	else
 		maxbytes = inode->i_sb->s_maxbytes;
 
-	return generic_file_llseek_size(file, offset, origin, maxbytes);
+	switch (origin) {
+	case SEEK_END:
+	case SEEK_CUR:
+	case SEEK_SET:
+		return generic_file_llseek_size(file, offset,
+						origin, maxbytes);
+	case SEEK_DATA:
+	case SEEK_HOLE:
+		return ext4_seek_data_hole(file, offset, origin, maxbytes);
+	}
+
+	return -EINVAL;
 }
 
 const struct file_operations ext4_file_operations = {
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index e190427..b22b86b 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1502,4 +1502,3 @@ out_stop:
 	ext4_journal_stop(handle);
 	trace_ext4_truncate_exit(inode);
 }

* [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (7 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 08/10 v1][RESEND] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-23 11:01   ` Lukáš Czerner
  2012-07-22  7:59 ` [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole Zheng Liu
  9 siblings, 1 reply; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

We no longer need to write out all dirty pages when punching a hole.  i_mutex
is taken to avoid concurrent writes.  truncate_pagecache_range releases all
pages in the hole, and ext4_es_remove_space is called to update the extent
status tree.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
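Note for reviewers (not part of the commit message): only whole pages inside
the hole are released from the page cache; the partial pages at either edge
are zeroed later in the function.  The sketch below shows how the
page-aligned interior of a byte range can be computed.  It assumes that
first_page rounds the start of the hole up and last_page rounds its end down
to a page boundary, and it hardcodes 4 KiB pages.  whole_page_range() and
the SKETCH_* macros are hypothetical names.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/*
 * Compute the page-aligned interior [*first, *last) of the byte range
 * [offset, offset + length).  Return 1 if it holds at least one whole
 * page, 0 if the range touches partial pages only.
 */
static int whole_page_range(uint64_t offset, uint64_t length,
			    uint64_t *first, uint64_t *last)
{
	uint64_t first_page = (offset + SKETCH_PAGE_SIZE - 1) >>
			      SKETCH_PAGE_SHIFT;
	uint64_t last_page = (offset + length) >> SKETCH_PAGE_SHIFT;

	*first = first_page << SKETCH_PAGE_SHIFT;
	*last = last_page << SKETCH_PAGE_SHIFT;
	return *last > *first;
}
```

For a hole at offset 1000 with length 10000, the interior is [4096, 8192),
so one whole page is dropped and the bytes in [1000, 4096) and [8192, 11000)
are left to the zeroing step.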
 fs/ext4/extents.c |   28 ++++++++++++----------------
 1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2c5294..2a526b4 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4529,9 +4529,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	loff_t first_page_offset, last_page_offset;
 	int credits, err = 0;
 
+	mutex_lock(&inode->i_mutex);
+
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size)
-		return 0;
+		goto error;
 
 	/*
 	 * If the hole extends beyond i_size, set the hole
@@ -4549,18 +4551,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	first_page_offset = first_page << PAGE_CACHE_SHIFT;
 	last_page_offset = last_page << PAGE_CACHE_SHIFT;
 
-	/*
-	 * Write out all dirty pages to avoid race conditions
-	 * Then release them.
-	 */
-	if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		err = filemap_write_and_wait_range(mapping,
-			offset, offset + length - 1);
-
-		if (err)
-			return err;
-	}
-
 	/* Now release the pages */
 	if (last_page_offset > first_page_offset) {
 		truncate_pagecache_range(inode, first_page_offset,
@@ -4572,12 +4562,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, credits);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		goto error;
+	}
 
 	err = ext4_orphan_add(handle, inode);
 	if (err)
-		goto out;
+		goto error;
 
 	/*
 	 * Now we need to zero out the non-page-aligned data in the
@@ -4652,6 +4644,8 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	ext4_ext_invalidate_cache(inode);
 	ext4_discard_preallocations(inode);
 
+	err = ext4_es_remove_space(inode, first_block,
+				   stop_block - first_block);
 	err = ext4_ext_remove_space(inode, first_block, stop_block - 1);
 
 	ext4_ext_invalidate_cache(inode);
@@ -4667,6 +4661,8 @@ out:
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
+error:
+	mutex_unlock(&inode->i_mutex);
 	return err;
 }
 int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-- 
1.7.4.1


* [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole
  2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
                   ` (8 preceding siblings ...)
  2012-07-22  7:59 ` [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole Zheng Liu
@ 2012-07-22  7:59 ` Zheng Liu
  2012-07-27 13:43   ` Lukáš Czerner
  9 siblings, 1 reply; 19+ messages in thread
From: Zheng Liu @ 2012-07-22  7:59 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: xiaoqiangnk, achender, wenqing.lz

From: Zheng Liu <wenqing.lz@taobao.com>

This patch adds two tracepoints in ext4_ext_punch_hole.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/extents.c           |    3 ++
 include/trace/events/ext4.h |   53 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2a526b4..0fb4ff5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4529,6 +4529,8 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
 	loff_t first_page_offset, last_page_offset;
 	int credits, err = 0;
 
+	trace_ext4_ext_punch_hole_enter(inode, offset, length);
+
 	mutex_lock(&inode->i_mutex);
 
 	/* No need to punch hole beyond i_size */
@@ -4663,6 +4665,7 @@ out:
 	ext4_journal_stop(handle);
 error:
 	mutex_unlock(&inode->i_mutex);
+	trace_ext4_ext_punch_hole_exit(inode, offset, length, err);
 	return err;
 }
 int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 5c17592..583f066 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1312,6 +1312,59 @@ TRACE_EVENT(ext4_fallocate_exit,
 		  __entry->ret)
 );
 
+TRACE_EVENT(ext4_ext_punch_hole_enter,
+	TP_PROTO(struct inode *inode, loff_t offset, loff_t len),
+
+	TP_ARGS(inode, offset, len),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	loff_t,	offset			)
+		__field(	loff_t, len			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->offset	= offset;
+		__entry->len	= len;
+	),
+
+	TP_printk("dev %d,%d ino %lu offset %lld len %lld",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->offset, __entry->len)
+);
+
+TRACE_EVENT(ext4_ext_punch_hole_exit,
+	TP_PROTO(struct inode *inode, loff_t offset,
+		 loff_t len, int err),
+
+	TP_ARGS(inode, offset, len, err),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	loff_t,	offset			)
+		__field(	loff_t,	len			)
+		__field(	int,	err			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->offset	= offset;
+		__entry->len	= len;
+		__entry->err	= err;
+	),
+
+	TP_printk("dev %d,%d ino %lu offset %lld len %lld err %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->offset, __entry->len, __entry->err)
+);
+
 TRACE_EVENT(ext4_unlink_enter,
 	TP_PROTO(struct inode *parent, struct dentry *dentry),
 
-- 
1.7.4.1


* Re: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole
  2012-07-22  7:59 ` [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole Zheng Liu
@ 2012-07-23 11:01   ` Lukáš Czerner
  2012-07-23 11:57     ` Zheng Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Lukáš Czerner @ 2012-07-23 11:01 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

On Sun, 22 Jul 2012, Zheng Liu wrote:

> Date: Sun, 22 Jul 2012 15:59:45 +0800
> From: Zheng Liu <gnehzuil.liu@gmail.com>
> To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> Subject: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty
>     pages in punch hole
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> Now we don't need to write out all dirty pages when punching a hole.  i_mutex
> is taken to avoid concurrent writes.  In truncate_pagecache_range, all
> pages in this hole are released, and ext4_es_remove_space is called to update
> the extent status tree.

Hi Zheng,

I am currently in the middle of reworking punch hole for ext4 so
there will be some changes in this area. See the patch set

http://www.spinics.net/lists/linux-ext4/msg33014.html

Moreover I think that we should avoid taking i_mutex if we can and I
believe that we can in this case, because we only need to prevent
allocation. So I just want to let you know that this part is
probably going to change anyway.

Thanks!
-Lukas

> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/extents.c |   28 ++++++++++++----------------
>  1 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f2c5294..2a526b4 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4529,9 +4529,11 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	loff_t first_page_offset, last_page_offset;
>  	int credits, err = 0;
>  
> +	mutex_lock(&inode->i_mutex);
> +
>  	/* No need to punch hole beyond i_size */
>  	if (offset >= inode->i_size)
> -		return 0;
> +		goto error;
>  
>  	/*
>  	 * If the hole extends beyond i_size, set the hole
> @@ -4549,18 +4551,6 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	first_page_offset = first_page << PAGE_CACHE_SHIFT;
>  	last_page_offset = last_page << PAGE_CACHE_SHIFT;
>  
> -	/*
> -	 * Write out all dirty pages to avoid race conditions
> -	 * Then release them.
> -	 */
> -	if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> -		err = filemap_write_and_wait_range(mapping,
> -			offset, offset + length - 1);
> -
> -		if (err)
> -			return err;
> -	}
> -
>  	/* Now release the pages */
>  	if (last_page_offset > first_page_offset) {
>  		truncate_pagecache_range(inode, first_page_offset,
> @@ -4572,12 +4562,14 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  
>  	credits = ext4_writepage_trans_blocks(inode);
>  	handle = ext4_journal_start(inode, credits);
> -	if (IS_ERR(handle))
> -		return PTR_ERR(handle);
> +	if (IS_ERR(handle)) {
> +		err = PTR_ERR(handle);
> +		goto error;
> +	}
>  
>  	err = ext4_orphan_add(handle, inode);
>  	if (err)
> -		goto out;
> +		goto error;
>  
>  	/*
>  	 * Now we need to zero out the non-page-aligned data in the
> @@ -4652,6 +4644,8 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	ext4_ext_invalidate_cache(inode);
>  	ext4_discard_preallocations(inode);
>  
> +	err = ext4_es_remove_space(inode, first_block,
> +				   stop_block - first_block);
>  	err = ext4_ext_remove_space(inode, first_block, stop_block - 1);
>  
>  	ext4_ext_invalidate_cache(inode);
> @@ -4667,6 +4661,8 @@ out:
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  	ext4_journal_stop(handle);
> +error:
> +	mutex_unlock(&inode->i_mutex);
>  	return err;
>  }
>  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> 

* Re: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole
  2012-07-23 11:01   ` Lukáš Czerner
@ 2012-07-23 11:57     ` Zheng Liu
  2012-07-23 12:20       ` Lukáš Czerner
  0 siblings, 1 reply; 19+ messages in thread
From: Zheng Liu @ 2012-07-23 11:57 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

Hi Lukas,

On Mon, Jul 23, 2012 at 01:01:53PM +0200, Lukáš Czerner wrote:
> On Sun, 22 Jul 2012, Zheng Liu wrote:
> 
> > Date: Sun, 22 Jul 2012 15:59:45 +0800
> > From: Zheng Liu <gnehzuil.liu@gmail.com>
> > To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> > Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> > Subject: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty
> >     pages in punch hole
> > 
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > Now we don't need to write out all dirty pages when punching a hole.  i_mutex
> > is taken to avoid concurrent writes.  In truncate_pagecache_range, all
> > pages in this hole are released, and ext4_es_remove_space is called to update
> > the extent status tree.
> 
> Hi Zheng,
> 
> I am currently in the middle of reworking punch hole for ext4 so
> there will be some changes in this area. See the patch set
> 
> http://www.spinics.net/lists/linux-ext4/msg33014.html

Thank you for the reminder.  I have seen your patch set and it is
pretty cool.  IMHO your patch set should be merged into the upstream
kernel before the io tree is applied, because the io tree still has a
lot of work left to do.  So I will adapt this patch to your patches. :-)

> Moreover I think that we should avoid taking i_mutex if we can and I
> believe that we can in this case, because we only need to prevent
> allocation. So I just want to let you know that this part is
> probably going to change anyway.

It seems that we need to take i_mutex to prevent buffered writes after
the page cache has been truncated by truncate_pagecache_range().  If a
buffered write without delalloc occurs after the page cache is truncated
and before i_data_sem is taken, the block allocated for that write will
be removed by ext4_ext_remove_space() when its offset falls within the
hole.  Am I missing something?
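
To make the window concrete, the interleaving can be replayed with a toy
user-space model.  This is only a sketch under stated assumptions: nothing
below is ext4 code, block_mapped[] merely stands in for the extent tree and
page_cached[] for the page cache, and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 8

static bool block_mapped[TOY_NBLOCKS];	/* stand-in for the extent tree */
static bool page_cached[TOY_NBLOCKS];	/* stand-in for the page cache */

/* First half of punch hole: drop cached pages in [first, stop). */
static void toy_truncate_pagecache_range(int first, int stop)
{
	for (int i = first; i < stop; i++)
		page_cached[i] = false;
}

/* A buffered write without delalloc: the block is allocated right away. */
static void toy_buffered_write(int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	page_cached[blk] = true;
	block_mapped[blk] = true;
}

/* Second half of punch hole: remove the mapping for [first, stop). */
static void toy_remove_space(int first, int stop)
{
	for (int i = first; i < stop; i++)
		block_mapped[i] = false;
}

/*
 * Replay the interleaving with no lock held across the two halves.
 * Returns true if the write to blk ends up cached but unmapped, i.e.
 * its freshly allocated block was removed by the hole punch.
 */
static bool toy_replay_race(int first, int stop, int blk)
{
	toy_truncate_pagecache_range(first, stop);
	toy_buffered_write(blk);	/* slips in between the two halves */
	toy_remove_space(first, stop);
	return page_cached[blk] && !block_mapped[blk];
}
```

Calling toy_replay_race(2, 6, 3) models a write that lands inside the hole
and loses its block, while toy_replay_race(2, 6, 7) models a write outside
the hole, which is unaffected.  Holding one lock across both halves closes
the window in this model, which is the role i_mutex plays in the patch.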

Regards,
Zheng

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole
  2012-07-23 11:57     ` Zheng Liu
@ 2012-07-23 12:20       ` Lukáš Czerner
  2012-07-23 13:18         ` Zheng Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Lukáš Czerner @ 2012-07-23 12:20 UTC (permalink / raw)
  To: Zheng Liu
  Cc: Lukáš Czerner, linux-ext4, linux-fsdevel, xiaoqiangnk,
	achender, wenqing.lz


On Mon, 23 Jul 2012, Zheng Liu wrote:

> Date: Mon, 23 Jul 2012 19:57:22 +0800
> From: Zheng Liu <gnehzuil.liu@gmail.com>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>     xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> Subject: Re: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all
>     dirty pages in punch hole
> 
> Hi Lukas,
> 
> On Mon, Jul 23, 2012 at 01:01:53PM +0200, Lukáš Czerner wrote:
> > On Sun, 22 Jul 2012, Zheng Liu wrote:
> > 
> > > Date: Sun, 22 Jul 2012 15:59:45 +0800
> > > From: Zheng Liu <gnehzuil.liu@gmail.com>
> > > To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> > > Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> > > Subject: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty
> > >     pages in punch hole
> > > 
> > > From: Zheng Liu <wenqing.lz@taobao.com>
> > > 
> > > Now we don't need to write out all dirty pages when punching a hole.  i_mutex
> > > is taken to avoid concurrent writes.  In truncate_pagecache_range, all
> > > pages in this hole are released, and ext4_es_remove_space is called to update
> > > the extent status tree.
> > 
> > Hi Zheng,
> > 
> > I am currently in the middle of reworking punch hole for ext4 so
> > there will be some changes in this area. See the patch set
> > 
> > http://www.spinics.net/lists/linux-ext4/msg33014.html
> 
> Thank you for the reminder.  I have seen your patch set and it is
> pretty cool.  IMHO your patch set should be merged into the upstream
> kernel before the io tree is applied, because the io tree still has a
> lot of work left to do.  So I will adapt this patch to your patches. :-)

Great, thanks!

> 
> > Moreover I think that we should avoid taking i_mutex if we can and I
> > believe that we can in this case, because we only need to prevent
> > allocation. So I just want to let you know that this part is
> > probably going to change anyway.
> 
> It seems that we need to take i_mutex to prevent buffered writes after
> the page cache has been truncated by truncate_pagecache_range().  If a
> buffered write without delalloc occurs after the page cache is truncated
> and before i_data_sem is taken, the block allocated for that write will
> be removed by ext4_ext_remove_space() when its offset falls within the
> hole.  Am I missing something?

You're absolutely right, currently this is possible. But I think that we
can take i_data_sem before truncating the pagecache, hence preventing
anyone from mapping new blocks. However, this is not yet implemented in
my patch set.

...
Hmm, looking at ext4_write_begin(), it seems like it might not be such
a good idea after all. It seems to take the page lock before i_data_sem,
so we might get a deadlock. Moreover, if the punch hole happened in the
middle of ext4_write_begin(), we might have only part of the data
written, and this does not have to be hole aligned, which is bad.
I need to revise that.
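
The ordering worry above can be sketched in user space.  This is a toy
order tracker, only loosely in the spirit of what lockdep checks; it is not
kernel code, all toy_ names are hypothetical, and the two orders fed to it
are simply the ones described in the paragraph above (write path: page lock,
then i_data_sem; the proposed punch hole: i_data_sem, then page locks while
truncating).

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock { TOY_PAGE_LOCK, TOY_I_DATA_SEM, TOY_NR_LOCKS };

/* taken_before[a][b] is true once some path has taken a and then b. */
static bool taken_before[TOY_NR_LOCKS][TOY_NR_LOCKS];

/*
 * Record that a code path takes 'first' and then 'second' while still
 * holding 'first'.  Returns true if the opposite order was already
 * recorded, i.e. two paths can block each other (ABBA deadlock).
 */
static bool toy_record_order(enum toy_lock first, enum toy_lock second)
{
	assert(first != second);
	taken_before[first][second] = true;
	return taken_before[second][first];
}
```

Recording (TOY_PAGE_LOCK, TOY_I_DATA_SEM) for the write path reports no
problem; recording (TOY_I_DATA_SEM, TOY_PAGE_LOCK) afterwards for the punch
hole path reports the inversion.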

Thanks!
-Lukas



* Re: [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole
  2012-07-23 12:20       ` Lukáš Czerner
@ 2012-07-23 13:18         ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-23 13:18 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

On Mon, Jul 23, 2012 at 02:20:57PM +0200, Lukáš Czerner wrote:
[cut...]
> > > Moreover I think that we should avoid taking i_mutex if we can and I
> > > believe that we can in this case, because we only need to prevent
> > > allocation. So I just want to let you know that this part is
> > > probably going to change anyway.
> > 
> > It seems that we need to take i_mutex to prevent buffered writes after
> > the page cache has been truncated by truncate_pagecache_range().  If a
> > buffered write without delalloc occurs after the page cache is truncated
> > and before i_data_sem is taken, the block allocated for that write will
> > be removed by ext4_ext_remove_space() when its offset falls within the
> > hole.  Am I missing something?
> 
> You're absolutely right, currently this is possible. But I think that we
> can take i_data_sem before truncating the pagecache, hence preventing
> anyone from mapping new blocks. However, this is not yet implemented in
> my patch set.
> 
> ...
> Hmm, looking at ext4_write_begin(), it seems like it might not be such
> a good idea after all. It seems to take the page lock before i_data_sem,
> so we might get a deadlock. Moreover, if the punch hole happened in the
> middle of ext4_write_begin(), we might have only part of the data
> written, and this does not have to be hole aligned, which is bad.
> I need to revise that.

Yes, this is why I think that i_mutex should be taken.  At least we are
safe when we hold i_mutex. :-)

Regards,
Zheng

* Re: [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole
  2012-07-22  7:59 ` [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole Zheng Liu
@ 2012-07-27 13:43   ` Lukáš Czerner
  2012-07-30  2:15     ` Zheng Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Lukáš Czerner @ 2012-07-27 13:43 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

On Sun, 22 Jul 2012, Zheng Liu wrote:

> Date: Sun, 22 Jul 2012 15:59:46 +0800
> From: Zheng Liu <gnehzuil.liu@gmail.com>
> To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> Subject: [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching
>     hole
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> This patch adds two tracepoints in ext4_ext_punch_hole.

Hi,

The trace_ext4_ext_punch_hole_enter() tracepoint looks good, but I am
not so sure about the trace_ext4_ext_punch_hole_exit() tracepoint. What
is the point of having it? The only thing it adds to the information we
already have is the return value of the function, and that's something
we'll know anyway, right?

Is there any special reason for having it? If not, I think it can
be removed and trace_ext4_ext_punch_hole_enter() can be renamed to
trace_ext4_ext_punch_hole() to match the naming conventions of other
ext4 tracepoints.
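
For reference, a line emitted by the single remaining tracepoint would look
roughly like the output of the sketch below.  It is a user-space mock that
only reuses the TP_printk() format string from the patch; the toy_ names are
made up here, and the major/minor split assumes the kernel's 20-bit minor
numbers.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to mirror the kernel's MAJOR()/MINOR() with 20 minor bits. */
#define TOY_MINORBITS	20
#define TOY_MINORMASK	((1U << TOY_MINORBITS) - 1)
#define TOY_MAJOR(dev)	((int)((dev) >> TOY_MINORBITS))
#define TOY_MINOR(dev)	((int)((dev) & TOY_MINORMASK))

/*
 * Format one trace line the way the enter tracepoint's TP_printk() does.
 * Returns the number of characters snprintf() would have written.
 */
static int toy_format_punch_hole(char *buf, size_t size, unsigned int dev,
				 unsigned long ino, long long offset,
				 long long len)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "dev %d,%d ino %lu offset %lld len %lld",
			TOY_MAJOR(dev), TOY_MINOR(dev), ino, offset, len);
}
```

With dev = (8 << 20) | 1, ino = 12, offset = 4096 and len = 8192 this gives
"dev 8,1 ino 12 offset 4096 len 8192".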

Thanks!
-Lukas

> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/extents.c           |    3 ++
>  include/trace/events/ext4.h |   53 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 2a526b4..0fb4ff5 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4529,6 +4529,8 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	loff_t first_page_offset, last_page_offset;
>  	int credits, err = 0;
>  
> +	trace_ext4_ext_punch_hole_enter(inode, offset, length);
> +
>  	mutex_lock(&inode->i_mutex);
>  
>  	/* No need to punch hole beyond i_size */
> @@ -4663,6 +4665,7 @@ out:
>  	ext4_journal_stop(handle);
>  error:
>  	mutex_unlock(&inode->i_mutex);
> +	trace_ext4_ext_punch_hole_exit(inode, offset, length, err);
>  	return err;
>  }
>  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 5c17592..583f066 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -1312,6 +1312,59 @@ TRACE_EVENT(ext4_fallocate_exit,
>  		  __entry->ret)
>  );
>  
> +TRACE_EVENT(ext4_ext_punch_hole_enter,
> +	TP_PROTO(struct inode *inode, loff_t offset, loff_t len),
> +
> +	TP_ARGS(inode, offset, len),
> +
> +	TP_STRUCT__entry(
> +		__field(	dev_t,	dev			)
> +		__field(	ino_t,	ino			)
> +		__field(	loff_t,	offset			)
> +		__field(	loff_t, len			)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->dev	= inode->i_sb->s_dev;
> +		__entry->ino	= inode->i_ino;
> +		__entry->offset	= offset;
> +		__entry->len	= len;
> +	),
> +
> +	TP_printk("dev %d,%d ino %lu offset %lld len %lld",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  (unsigned long) __entry->ino,
> +		  __entry->offset, __entry->len)
> +);
> +
> +TRACE_EVENT(ext4_ext_punch_hole_exit,
> +	TP_PROTO(struct inode *inode, loff_t offset,
> +		 loff_t len, int err),
> +
> +	TP_ARGS(inode, offset, len, err),
> +
> +	TP_STRUCT__entry(
> +		__field(	dev_t,	dev			)
> +		__field(	ino_t,	ino			)
> +		__field(	loff_t,	offset			)
> +		__field(	loff_t,	len			)
> +		__field(	int,	err			)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->dev	= inode->i_sb->s_dev;
> +		__entry->ino	= inode->i_ino;
> +		__entry->offset	= offset;
> +		__entry->len	= len;
> +		__entry->err	= err;
> +	),
> +
> +	TP_printk("dev %d,%d ino %lu offset %lld len %lld err %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  (unsigned long) __entry->ino,
> +		  __entry->offset, __entry->len, __entry->err)
> +);
> +
>  TRACE_EVENT(ext4_unlink_enter,
>  	TP_PROTO(struct inode *parent, struct dentry *dentry),
>  
> 

* Re: [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole
  2012-07-27 13:43   ` Lukáš Czerner
@ 2012-07-30  2:15     ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-30  2:15 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

On Fri, Jul 27, 2012 at 03:43:46PM +0200, Lukáš Czerner wrote:
> On Sun, 22 Jul 2012, Zheng Liu wrote:
> 
> > Date: Sun, 22 Jul 2012 15:59:46 +0800
> > From: Zheng Liu <gnehzuil.liu@gmail.com>
> > To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> > Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> > Subject: [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching
> >     hole
> > 
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > This patch adds two tracepoints in ext4_ext_punch_hole.
> 
> Hi,
> 
> The trace_ext4_ext_punch_hole_enter() tracepoint looks good, but I am
> not so sure about the trace_ext4_ext_punch_hole_exit() tracepoint. What
> is the point of having it? The only thing it adds to the information we
> already have is the return value of the function, and that's something
> we'll know anyway, right?
> 
> Is there any special reason for having it? If not, I think it can
> be removed and trace_ext4_ext_punch_hole_enter() can be renamed to
> trace_ext4_ext_punch_hole() to match the naming conventions of other
> ext4 tracepoints.

Hi Lukas,

trace_ext4_ext_punch_hole_exit() was added because we do the same
thing in ext4_truncate(), which calls *_truncate_enter() and
*_truncate_exit().  There is no other special reason for adding it,
so I will change it as you suggest in the next version.

Regards,
Zheng


* Re: [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on extent status tree
  2012-07-22  7:59 ` [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on " Zheng Liu
@ 2012-07-31 11:55   ` Lukáš Czerner
  2012-07-31 13:18     ` Zheng Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Lukáš Czerner @ 2012-07-31 11:55 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

On Sun, 22 Jul 2012, Zheng Liu wrote:

> Date: Sun, 22 Jul 2012 15:59:38 +0800
> From: Zheng Liu <gnehzuil.liu@gmail.com>
> To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
> Cc: xiaoqiangnk@gmail.com, achender@linux.vnet.ibm.com, wenqing.lz@taobao.com
> Subject: [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on extent status
>     tree
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> This patch adds operations on an extent status tree.
> 
> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/Makefile         |    2 +-
>  fs/ext4/extents_status.c |  418 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/extents_status.h |   17 ++
>  3 files changed, 436 insertions(+), 1 deletions(-)
>  create mode 100644 fs/ext4/extents_status.c
> 
> diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> index 56fd8f86..41f22be 100644
> --- a/fs/ext4/Makefile
> +++ b/fs/ext4/Makefile
> @@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
>  ext4-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
>  		ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
>  		ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
> -		mmp.o indirect.o
> +		mmp.o indirect.o extents_status.o
>  
>  ext4-$(CONFIG_EXT4_FS_XATTR)		+= xattr.o xattr_user.o xattr_trusted.o
>  ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> new file mode 100644
> index 0000000..bd4e589
> --- /dev/null
> +++ b/fs/ext4/extents_status.c
> @@ -0,0 +1,418 @@
> +/*
> + *  fs/ext4/extents_status.c
> + *
> + * Written by Yongqiang Yang <xiaoqiangnk@gmail.com>
> + *
> + * Ext4 extents status tree core functions.
> + */
> +#include <linux/rbtree.h>
> +#include "ext4.h"
> +#include "extents_status.h"
> +#include "ext4_extents.h"
> +
> +/*
> + * extents status tree implementation for ext4.
> + *
> + *
> + * ==========================================================================
> + * Extents status encompass delayed extents and extent locks
> + *
> + * 1. Why delayed extent implementation ?
> + *
> + * Without delayed extents, ext4 identifies a delayed extent by looking up
> + * the page cache.  This has several deficiencies: complicated, buggy, and
> + * inefficient code.
> + *
> + * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know if
> + * a block or a range of blocks belongs to a delayed extent.
> + *
> + * Let us have a look at how they do without delayed extents implementation.
> + *   --	FIEMAP
> + *	FIEMAP looks up page cache to identify delayed allocations from holes.
> + *
> + *   --	SEEK_HOLE/DATA
> + *	SEEK_HOLE/DATA has the same problem as FIEMAP.
> + *
> + *   --	bigalloc
> + *	bigalloc looks up page cache to figure out if a block is already
> + *	under delayed allocation or not to determine whether quota reserving
> + *	is needed for the cluster.
> + *
> + *   -- punch hole
> + *	punch hole looks up page cache to identify a delayed extent.
> + *
> + *   --	writeout
> + *	Writeout looks up the whole page cache to see if a buffer is mapped.  If
> + *	there are not very many delayed buffers, then it is time consuming.
> + *
> + * With the delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc
> + * and writeout can figure out if a block or a range of blocks is under delayed
> + * allocation (belongs to a delayed extent) or not by searching the delayed
> + * extent tree.
> + *
> + *
> + * ==========================================================================
> + * 2. ext4 delayed extents implementation
> + *
> + *   --	delayed extent
> + *	A delayed extent is a range of blocks which are logically contiguous and
> + *	under delayed allocation.  Unlike an extent in ext4, a delayed extent is
> + *	an in-memory struct; there is no corresponding on-disk data.  There is
> + *	no limit on the length of a delayed extent, so it can contain as many
> + *	blocks as are logically contiguous.
> + *
> + *   --	delayed extent tree
> + *	Every inode has a delayed extent tree, and all blocks under delayed
> + *	allocation are added to the tree as delayed extents.  Delayed extents
> + *	in the tree are ordered by logical block number.
> + *
> + *   --	operations on a delayed extent tree
> + *	There are three operations on a delayed extent tree: finding the next
> + *	delayed extent, adding a space (a range of blocks) and removing a space.
> + *
> + *   --	race on a delayed extent tree
> + *	The delayed extent tree is protected by inode->i_data_sem, like the extent tree.
> + *
> + *
> + * ==========================================================================
> + * 3. performance analysis
> + *   --	overhead
> + *	1. Apart from operations on a delayed extent tree, we need to
> + *	down_write(inode->i_data_sem) in the delayed write path to maintain the
> + *	delayed extent tree.  This can have an impact on parallel read-write and
> + *	write-write.
> + *
> + *	2. There is a cache extent for write access, so if writes are not very
> + *	random, add-space operations take O(1) time.
> + *
> + *   --	gain
> + *	3. Code is much simpler, more readable, more maintainable and
> + *      more efficient.
> + */
> +
> +static struct kmem_cache *ext4_es_cachep;
> +
> +int __init ext4_init_es(void)
> +{
> +	ext4_es_cachep = KMEM_CACHE(extent_status, SLAB_RECLAIM_ACCOUNT);
> +	if (ext4_es_cachep == NULL)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +void ext4_exit_es(void)
> +{
> +	if (ext4_es_cachep)
> +		kmem_cache_destroy(ext4_es_cachep);
> +}
> +
> +void ext4_es_init_tree(struct ext4_es_tree *tree)
> +{
> +	tree->root = RB_ROOT;
> +	tree->cache_es = NULL;
> +}
> +
> +#ifdef ES_DEBUG
> +static void ext4_es_print_tree(struct inode *inode)
> +{
> +	struct ext4_es_tree *tree;
> +	struct rb_node *node;
> +
> +	printk(KERN_DEBUG "status extents for inode %lu:", inode->i_ino);
> +	tree = &EXT4_I(inode)->i_es_tree;
> +	node = rb_first(&tree->root);
> +	while (node) {
> +		struct extent_status *es;
> +		es = rb_entry(node, struct extent_status, rb_node);
> +		printk(KERN_DEBUG " [%u/%u)", es->start, es->len);
> +		node = rb_next(node);
> +	}
> +	printk(KERN_DEBUG "\n");
> +}
> +#else
> +#define ext4_es_print_tree(inode)
> +#endif
> +
> +static inline ext4_lblk_t extent_status_end(struct extent_status *es)
> +{
> +	if (es->start + es->len < es->start)
> +		return (ext4_lblk_t)-1;

Can this actually happen? It seems to me that if it can, the extent is
corrupted and we should do something about it rather than hide the
problem by returning -1 (the callers do not actually check for it). At
the very least a warning should be in place.

> +	return es->start + es->len;

This should rather be es->start + es->len - 1, since the function
name suggests that we get the last block of the extent. As written,
it seems prone to off-by-one errors.
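
As a rough illustration only (a user-space sketch, not kernel code; the
typedef, the cut-down struct and the name es_last_block() are
placeholders made up for this example), an inclusive helper could look
like this:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ext4_lblk_t;	/* stand-in for the kernel typedef */

struct extent_status {
	ext4_lblk_t start;	/* first block extent covers */
	ext4_lblk_t len;	/* length of extent in blocks */
};

/*
 * Hypothetical helper: last block covered by the extent (inclusive).
 * For a valid extent, start + len - 1 must not wrap around.  The
 * assertions stand in for the warning that should fire in the kernel.
 */
static inline ext4_lblk_t es_last_block(const struct extent_status *es)
{
	assert(es->len > 0);
	assert(es->start + es->len - 1 >= es->start);
	return es->start + es->len - 1;
}
```

With an inclusive end, the comparison in __es_tree_search() would
presumably become "offset > end" instead of "offset >= end".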

> +}
> +
> +/*
> + * search through the tree for a delayed_extent with a given offset.  If
> + * it can't be found, try to find next extent.

This helper could be used by any other extent tree in ext4, not just
for delayed extents, right (if we are going to add some in the
future)? So we should change the comment, because the function is
generic enough.

> + */
> +static struct extent_status *__es_tree_search(struct rb_root *root,
> +						ext4_lblk_t offset)
> +{
> +	struct rb_node *node = root->rb_node;
> +	struct extent_status *es = NULL;
> +
> +	while (node) {
> +		es = rb_entry(node, struct extent_status, rb_node);
> +		if (offset < es->start)
> +			node = node->rb_left;
> +		else if (offset >= extent_status_end(es))
> +			node = node->rb_right;
> +		else
> +			return es;
> +	}
> +
> +	if (es && offset < es->start)
> +		return es;
> +
> +	if (es && offset >= extent_status_end(es)) {
> +		node = rb_next(&es->rb_node);
> +		return node ? rb_entry(node, struct extent_status, rb_node) :
> +			      NULL;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * ext4_es_find_extent: find the 1st delayed extent covering @es->start
> + * if it exists, otherwise, the next extent after @es->start.
> + *
> + * @inode: the inode which owns delayed extents
> + * @es: delayed extent that we found
> + *
> + * Returns next block beyond the found extent.

This is not exactly right, is it? From what I can see, it returns
the first block of the next extent after the one we're returning in
es, or EXT_MAX_BLOCKS if no extent is found.

> + * Delayed extent is returned via @es.
> + */
> +ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
> +{
> +	struct ext4_es_tree *tree;
> +	struct extent_status *es1;
> +	struct rb_node *node;
> +	ext4_lblk_t ret = EXT_MAX_BLOCKS;

Wouldn't it make sense to try the cache first? Maybe we do not
need to search the tree at all.

> +
> +	es->len = 0;
> +	tree = &EXT4_I(inode)->i_es_tree;
> +	es1 = __es_tree_search(&tree->root, es->start);
> +	if (es1) {
> +		tree->cache_es = es1;
> +		es->start = es1->start;
> +		es->len = es1->len;
> +		node = rb_next(&es1->rb_node);
> +		if (node) {
> +			es1 = rb_entry(node, struct extent_status, rb_node);
> +			ret = es1->start;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static struct extent_status *
> +ext4_es_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
> +{
> +	struct extent_status *es;
> +	es = kmem_cache_alloc(ext4_es_cachep, GFP_NOFS);
> +	if (es == NULL)
> +		return NULL;
> +	es->start = start;
> +	es->len = len;
> +	return es;
> +}
> +
> +static void ext4_es_free_extent(struct extent_status *es)
> +{
> +	kmem_cache_free(ext4_es_cachep, es);
> +}
> +
> +static void ext4_es_try_to_merge_left(struct ext4_es_tree *tree,
> +				      struct extent_status *es)
> +{
> +	struct extent_status *es1;
> +	struct rb_node *node;
> +
> +	node = rb_prev(&es->rb_node);
> +	if (!node)
> +		return;
> +
> +	es1 = rb_entry(node, struct extent_status, rb_node);
> +	if (extent_status_end(es1) == es->start) {
> +		es1->len += es->len;
> +		rb_erase(&es->rb_node, &tree->root);
> +		if (es == tree->cache_es)
> +			tree->cache_es = es1;
> +		ext4_es_free_extent(es);
> +	}
> +}
> +
> +static void ext4_es_try_to_merge_right(struct ext4_es_tree *tree,
> +				       struct extent_status *es)
> +{
> +	struct extent_status *es1;
> +	struct rb_node *node;
> +
> +	node = rb_next(&es->rb_node);
> +	if (!node)
> +		return;
> +
> +	es1 = rb_entry(node, struct extent_status, rb_node);
> +	if (es1->start == extent_status_end(es)) {
> +		es->len += es1->len;
> +		rb_erase(node, &tree->root);
> +		if (es1 == tree->cache_es)
> +			tree->cache_es = es;
> +		ext4_es_free_extent(es1);
> +	}
> +}
> +
> +/*
> + * ext4_es_add_space: adds a space to a delayed extent tree.
> + * Caller holds inode->i_data_sem.
> + *
> + * ext4_es_add_space is called by ext4_dealyed_write_begin and
> + * ext4_es_remove_space.
> + *
> + * Return 0 on success, error code on failure.
> + */
> +int ext4_es_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)

I would rather call the function ext4_es_add_extent(), but that may
be subjective.

> +{
> +	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> +	struct rb_node **p = &tree->root.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct extent_status *es;
> +	ext4_lblk_t end = offset + len;
> +
> +	BUG_ON(end <= offset);
> +
> +	es = tree->cache_es;
> +	es_debug("add [%u/%u) to extent status tree of inode %lu\n",
> +		 offset, len, inode->i_ino);
> +
> +	if (es && extent_status_end(es) == offset) {
> +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> +		es->len += len;
> +		ext4_es_try_to_merge_right(tree, es);
> +		goto out;
> +	} else if (es && es->start == end) {
> +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> +		es->start = offset;
> +		es->len += len;
> +		ext4_es_try_to_merge_left(tree, es);
> +		goto out;
> +	} else if (es && es->start <= offset &&
> +		   extent_status_end(es) >= end) {
> +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> +		goto out;
> +	}
> +
> +	while (*p) {
> +		parent = *p;
> +		es = rb_entry(parent, struct extent_status, rb_node);
> +
> +		if (offset < es->start) {
> +			if (end == es->start) {
> +				es->len += len;
> +				es->start = offset;
> +				goto out;
> +			}
> +			p = &(*p)->rb_left;
> +		} else if (offset >= extent_status_end(es)) {
> +			if (extent_status_end(es) == offset) {
> +				es->len += len;
> +				goto out;
> +			}
> +			p = &(*p)->rb_right;
> +		} else
> +			goto out;
> +	}
> +
> +	es = ext4_es_alloc_extent(offset, len);
> +	if (!es)
> +		return -ENOMEM;
> +	rb_link_node(&es->rb_node, parent, p);
> +	rb_insert_color(&es->rb_node, &tree->root);
> +
> +out:
> +	tree->cache_es = es;
> +	ext4_es_print_tree(inode);
> +
> +	return 0;
> +}
> +
> +/*
> + * ext4_es_remove_space() removes a space from a delayed extent tree.
> + * Caller holds inode->i_data_sem.
> + *
> + * Return 0 on success, error code on failure.
> + */
> +int ext4_es_remove_space(struct inode *inode, ext4_lblk_t offset,
> +			 ext4_lblk_t len)

Again, my subjective opinion is that it could rather be called
ext4_es_remove_extent().

> +{
> +	struct rb_node *node;
> +	struct ext4_es_tree *tree;
> +	struct extent_status *es;
> +	struct extent_status orig_es;
> +	ext4_lblk_t len1, len2, end;
> +
> +	es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
> +		 offset, len, inode->i_ino);

Since you're adding tracepoints later on, are those debug messages
necessary ?

> +
> +	end = offset + len;
> +	BUG_ON(end <= offset);
> +	tree = &EXT4_I(inode)->i_es_tree;
> +	es = __es_tree_search(&tree->root, offset);
> +	if (!es)
> +		goto out;
> +
> +	/* Simply invalidate cache_es. */
> +	tree->cache_es = NULL;
> +
> +	orig_es.start = es->start;
> +	orig_es.len = es->len;
> +	len1 = offset > es->start ? offset - es->start : 0;
> +	len2 = extent_status_end(es) > end ?
> +	       extent_status_end(es) - end : 0;
> +	if (len1 > 0)
> +		es->len = len1;
> +	if (len2 > 0) {
> +		if (len1 > 0) {
> +			int err;
> +			err = ext4_es_add_space(inode, end, len2);
> +			if (err) {
> +				es->start = orig_es.start;
> +				es->len = orig_es.len;
> +				return err;
> +			}
> +		} else {
> +			es->start = end;
> +			es->len = len2;
> +		}
> +		goto out;
> +	}
> +
> +	if (len1 > 0) {
> +		node = rb_next(&es->rb_node);
> +		if (node)
> +			es = rb_entry(node, struct extent_status, rb_node);
> +		else
> +			es = NULL;
> +	}
> +
> +	while (es && extent_status_end(es) <= end) {
> +		node = rb_next(&es->rb_node);
> +		rb_erase(&es->rb_node, &tree->root);
> +		ext4_es_free_extent(es);
> +		if (!node) {
> +			es = NULL;
> +			break;
> +		}
> +		es = rb_entry(node, struct extent_status, rb_node);
> +	}
> +
> +	if (es && es->start < end) {
> +		len1 = extent_status_end(es) - end;
> +		es->start = end;
> +		es->len = len1;
> +	}
> +
> +out:
> +	ext4_es_print_tree(inode);
> +	return 0;
> +}
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index d87fd71..8fe8084 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -8,6 +8,12 @@
>  #ifndef _EXT4_EXTENTS_STATUS_H
>  #define _EXT4_EXTENTS_STATUS_H
>  
> +#ifdef ES_DEBUG
> +#define es_debug(a...)         printk(a)

Please take a look at ext4_debug() or ext_debug(); maybe you can
use one of them, or at least reuse the code.

> +#else
> +#define es_debug(a...)
> +#endif
> +
>  struct extent_status {
>  	struct rb_node rb_node;
>  	ext4_lblk_t start;	/* first block extent covers */
> @@ -19,4 +25,15 @@ struct ext4_es_tree {
>  	struct extent_status *cache_es;	/* recently accessed extent */
>  };
>  
> +extern int __init ext4_init_es(void);
> +extern void ext4_exit_es(void);
> +extern void ext4_es_init_tree(struct ext4_es_tree *tree);
> +
> +extern int ext4_es_add_space(struct inode *inode, ext4_lblk_t start,
> +				ext4_lblk_t len);
> +extern int ext4_es_remove_space(struct inode *inode, ext4_lblk_t start,
> +				ext4_lblk_t len);
> +extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
> +				struct extent_status *es);
> +
>  #endif /* _EXT4_EXTENTS_STATUS_H */
> 


* Re: [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on extent status tree
  2012-07-31 11:55   ` Lukáš Czerner
@ 2012-07-31 13:18     ` Zheng Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Zheng Liu @ 2012-07-31 13:18 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: linux-ext4, linux-fsdevel, xiaoqiangnk, achender, wenqing.lz

Hi Lukas,

Thanks for the comprehensive review. :-)

On Tue, Jul 31, 2012 at 01:55:14PM +0200, Lukáš Czerner wrote:
[cut...]
> > +static inline ext4_lblk_t extent_status_end(struct extent_status *es)
> > +{
> > +	if (es->start + es->len < es->start)
> > +		return (ext4_lblk_t)-1;
> 
> Can this actually happen? It seems to me that if it can, the extent is
> corrupted and we should do something about it rather than hide the
> problem by returning -1 (the callers do not actually check for it). At
> the very least a warning should be in place.

Yes, I will add an ext4_warning() to tell the user that we hit a problem.

> 
> > +	return es->start + es->len;
> 
> This should rather be es->start + es->len - 1, since the function
> name suggests that we get the last block of the extent. As written,
> it seems prone to off-by-one errors.

Will fix it.

> 
> > +}
> > +
> > +/*
> > + * search through the tree for a delayed_extent with a given offset.  If
> > + * it can't be found, try to find next extent.
> 
> This helper could be used by any other extent tree in ext4, not just
> for delayed extents, right (if we are going to add some in the
> future)? So we should change the comment, because the function is
> generic enough.

So far this function is only used by the io tree itself.  As Ted
described before, there is an implementation of a big extent tree at
Google, and this function might be used by it.  But first we will
implement these two features separately.  I will certainly change the
comment if necessary.

> 
> > + */
> > +static struct extent_status *__es_tree_search(struct rb_root *root,
> > +						ext4_lblk_t offset)
> > +{
> > +	struct rb_node *node = root->rb_node;
> > +	struct extent_status *es = NULL;
> > +
> > +	while (node) {
> > +		es = rb_entry(node, struct extent_status, rb_node);
> > +		if (offset < es->start)
> > +			node = node->rb_left;
> > +		else if (offset >= extent_status_end(es))
> > +			node = node->rb_right;
> > +		else
> > +			return es;
> > +	}
> > +
> > +	if (es && offset < es->start)
> > +		return es;
> > +
> > +	if (es && offset >= extent_status_end(es)) {
> > +		node = rb_next(&es->rb_node);
> > +		return node ? rb_entry(node, struct extent_status, rb_node) :
> > +			      NULL;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * ext4_es_find_extent: find the 1st delayed extent covering @es->start
> > + * if it exists, otherwise, the next extent after @es->start.
> > + *
> > + * @inode: the inode which owns delayed extents
> > + * @es: delayed extent that we found
> > + *
> > + * Returns next block beyond the found extent.
> 
> This is not exactly right, is it? From what I can see, it returns
> the first block of the next extent after the one we're returning in
> es, or EXT_MAX_BLOCKS if no extent is found.

Will fix it.
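
To make the intended return value concrete, here is a small user-space
model.  It is only a sketch: a sorted array replaces the rb-tree, the
struct is cut down, and es_find_extent_model() is a made-up name, not
something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ext4_lblk_t;		/* stand-in for the kernel typedef */
#define EXT_MAX_BLOCKS	0xffffffff

struct extent_status {
	ext4_lblk_t start;
	ext4_lblk_t len;
};

/*
 * Model of ext4_es_find_extent() over a sorted, non-overlapping array
 * (the array replaces the rb-tree purely for illustration).
 *
 * On entry es->start is the block to look up.  On return es holds the
 * extent covering that block or, failing that, the next extent after
 * it; es->len == 0 means nothing was found.  The return value is the
 * start of the extent following the one stored in es, or
 * EXT_MAX_BLOCKS if there is no such extent.
 */
static ext4_lblk_t es_find_extent_model(const struct extent_status *tbl,
					size_t n, struct extent_status *es)
{
	ext4_lblk_t lblk = es->start;
	size_t i;

	assert(tbl != NULL || n == 0);
	es->len = 0;
	for (i = 0; i < n; i++) {
		/* skip extents that end at or before lblk */
		if (lblk >= tbl[i].start && lblk - tbl[i].start >= tbl[i].len)
			continue;
		*es = tbl[i];
		return i + 1 < n ? tbl[i + 1].start : EXT_MAX_BLOCKS;
	}
	return EXT_MAX_BLOCKS;
}
```

So for extents [10/5) and [20/4), a lookup of block 12 hands back
[10/5) in es and returns 20, the start of the following extent.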

> 
> > + * Delayed extent is returned via @es.
> > + */
> > +ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
> > +{
> > +	struct ext4_es_tree *tree;
> > +	struct extent_status *es1;
> > +	struct rb_node *node;
> > +	ext4_lblk_t ret = EXT_MAX_BLOCKS;
> 
> Wouldn't it make sense to try the cache first? Maybe we do not
> need to search the tree at all.

Yes, you are right.  I will add this in the next version.
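
Something along these lines, as a rough user-space sketch only: the
struct layouts are cut down, es_cache_lookup() is a made-up name, and
the real version would fall back to __es_tree_search() on a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ext4_lblk_t;	/* stand-in for the kernel typedef */

struct extent_status {
	ext4_lblk_t start;
	ext4_lblk_t len;
};

struct ext4_es_tree {
	struct extent_status *cache_es;	/* recently accessed extent */
};

/*
 * Hypothetical helper: return the cached extent if it covers lblk,
 * otherwise NULL so that the caller goes on to search the rb-tree.
 */
static struct extent_status *
es_cache_lookup(struct ext4_es_tree *tree, ext4_lblk_t lblk)
{
	struct extent_status *es;

	assert(tree != NULL);
	es = tree->cache_es;

	/* "lblk - start < len" avoids computing start + len, which may wrap */
	if (es && lblk >= es->start && lblk - es->start < es->len)
		return es;
	return NULL;
}
```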

> 
> > +
> > +	es->len = 0;
> > +	tree = &EXT4_I(inode)->i_es_tree;
> > +	es1 = __es_tree_search(&tree->root, es->start);
> > +	if (es1) {
> > +		tree->cache_es = es1;
> > +		es->start = es1->start;
> > +		es->len = es1->len;
> > +		node = rb_next(&es1->rb_node);
> > +		if (node) {
> > +			es1 = rb_entry(node, struct extent_status, rb_node);
> > +			ret = es1->start;
> > +		}
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static struct extent_status *
> > +ext4_es_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
> > +{
> > +	struct extent_status *es;
> > +	es = kmem_cache_alloc(ext4_es_cachep, GFP_NOFS);
> > +	if (es == NULL)
> > +		return NULL;
> > +	es->start = start;
> > +	es->len = len;
> > +	return es;
> > +}
> > +
> > +static void ext4_es_free_extent(struct extent_status *es)
> > +{
> > +	kmem_cache_free(ext4_es_cachep, es);
> > +}
> > +
> > +static void ext4_es_try_to_merge_left(struct ext4_es_tree *tree,
> > +				      struct extent_status *es)
> > +{
> > +	struct extent_status *es1;
> > +	struct rb_node *node;
> > +
> > +	node = rb_prev(&es->rb_node);
> > +	if (!node)
> > +		return;
> > +
> > +	es1 = rb_entry(node, struct extent_status, rb_node);
> > +	if (extent_status_end(es1) == es->start) {
> > +		es1->len += es->len;
> > +		rb_erase(&es->rb_node, &tree->root);
> > +		if (es == tree->cache_es)
> > +			tree->cache_es = es1;
> > +		ext4_es_free_extent(es);
> > +	}
> > +}
> > +
> > +static void ext4_es_try_to_merge_right(struct ext4_es_tree *tree,
> > +				       struct extent_status *es)
> > +{
> > +	struct extent_status *es1;
> > +	struct rb_node *node;
> > +
> > +	node = rb_next(&es->rb_node);
> > +	if (!node)
> > +		return;
> > +
> > +	es1 = rb_entry(node, struct extent_status, rb_node);
> > +	if (es1->start == extent_status_end(es)) {
> > +		es->len += es1->len;
> > +		rb_erase(node, &tree->root);
> > +		if (es1 == tree->cache_es)
> > +			tree->cache_es = es;
> > +		ext4_es_free_extent(es1);
> > +	}
> > +}
> > +
> > +/*
> > + * ext4_es_add_space: adds a space to a delayed extent tree.
> > + * Caller holds inode->i_data_sem.
> > + *
> > + * ext4_es_add_space is called by ext4_dealyed_write_begin and
> > + * ext4_es_remove_space.
> > + *
> > + * Return 0 on success, error code on failure.
> > + */
> > +int ext4_es_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)
> 
> I would rather call the function ext4_es_add_extent(), but that may
> be subjective.

IMHO ext4_es_insert_extent() might be a better name because it would be
consistent with ext4_ext_insert_extent().

> 
> > +{
> > +	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> > +	struct rb_node **p = &tree->root.rb_node;
> > +	struct rb_node *parent = NULL;
> > +	struct extent_status *es;
> > +	ext4_lblk_t end = offset + len;
> > +
> > +	BUG_ON(end <= offset);
> > +
> > +	es = tree->cache_es;
> > +	es_debug("add [%u/%u) to extent status tree of inode %lu\n",
> > +		 offset, len, inode->i_ino);
> > +
> > +	if (es && extent_status_end(es) == offset) {
> > +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> > +		es->len += len;
> > +		ext4_es_try_to_merge_right(tree, es);
> > +		goto out;
> > +	} else if (es && es->start == end) {
> > +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> > +		es->start = offset;
> > +		es->len += len;
> > +		ext4_es_try_to_merge_left(tree, es);
> > +		goto out;
> > +	} else if (es && es->start <= offset &&
> > +		   extent_status_end(es) >= end) {
> > +		es_debug("cached by [%u/%u)\n", es->start, es->len);
> > +		goto out;
> > +	}
> > +
> > +	while (*p) {
> > +		parent = *p;
> > +		es = rb_entry(parent, struct extent_status, rb_node);
> > +
> > +		if (offset < es->start) {
> > +			if (end == es->start) {
> > +				es->len += len;
> > +				es->start = offset;
> > +				goto out;
> > +			}
> > +			p = &(*p)->rb_left;
> > +		} else if (offset >= extent_status_end(es)) {
> > +			if (extent_status_end(es) == offset) {
> > +				es->len += len;
> > +				goto out;
> > +			}
> > +			p = &(*p)->rb_right;
> > +		} else
> > +			goto out;
> > +	}
> > +
> > +	es = ext4_es_alloc_extent(offset, len);
> > +	if (!es)
> > +		return -ENOMEM;
> > +	rb_link_node(&es->rb_node, parent, p);
> > +	rb_insert_color(&es->rb_node, &tree->root);
> > +
> > +out:
> > +	tree->cache_es = es;
> > +	ext4_es_print_tree(inode);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * ext4_es_remove_space() removes a space from a delayed extent tree.
> > + * Caller holds inode->i_data_sem.
> > + *
> > + * Return 0 on success, error code on failure.
> > + */
> > +int ext4_es_remove_space(struct inode *inode, ext4_lblk_t offset,
> > +			 ext4_lblk_t len)
> 
> Again, my subjective opinion is that it could rather be called
> ext4_es_remove_extent().

Will fix it.

> 
> > +{
> > +	struct rb_node *node;
> > +	struct ext4_es_tree *tree;
> > +	struct extent_status *es;
> > +	struct extent_status orig_es;
> > +	ext4_lblk_t len1, len2, end;
> > +
> > +	es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
> > +		 offset, len, inode->i_ino);
> 
> Since you're adding tracepoints later on, are those debug messages
> necessary ?

Agree, I will remove this debug message.

> 
> > +
> > +	end = offset + len;
> > +	BUG_ON(end <= offset);
> > +	tree = &EXT4_I(inode)->i_es_tree;
> > +	es = __es_tree_search(&tree->root, offset);
> > +	if (!es)
> > +		goto out;
> > +
> > +	/* Simply invalidate cache_es. */
> > +	tree->cache_es = NULL;
> > +
> > +	orig_es.start = es->start;
> > +	orig_es.len = es->len;
> > +	len1 = offset > es->start ? offset - es->start : 0;
> > +	len2 = extent_status_end(es) > end ?
> > +	       extent_status_end(es) - end : 0;
> > +	if (len1 > 0)
> > +		es->len = len1;
> > +	if (len2 > 0) {
> > +		if (len1 > 0) {
> > +			int err;
> > +			err = ext4_es_add_space(inode, end, len2);
> > +			if (err) {
> > +				es->start = orig_es.start;
> > +				es->len = orig_es.len;
> > +				return err;
> > +			}
> > +		} else {
> > +			es->start = end;
> > +			es->len = len2;
> > +		}
> > +		goto out;
> > +	}
> > +
> > +	if (len1 > 0) {
> > +		node = rb_next(&es->rb_node);
> > +		if (node)
> > +			es = rb_entry(node, struct extent_status, rb_node);
> > +		else
> > +			es = NULL;
> > +	}
> > +
> > +	while (es && extent_status_end(es) <= end) {
> > +		node = rb_next(&es->rb_node);
> > +		rb_erase(&es->rb_node, &tree->root);
> > +		ext4_es_free_extent(es);
> > +		if (!node) {
> > +			es = NULL;
> > +			break;
> > +		}
> > +		es = rb_entry(node, struct extent_status, rb_node);
> > +	}
> > +
> > +	if (es && es->start < end) {
> > +		len1 = extent_status_end(es) - end;
> > +		es->start = end;
> > +		es->len = len1;
> > +	}
> > +
> > +out:
> > +	ext4_es_print_tree(inode);
> > +	return 0;
> > +}
> > diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> > index d87fd71..8fe8084 100644
> > --- a/fs/ext4/extents_status.h
> > +++ b/fs/ext4/extents_status.h
> > @@ -8,6 +8,12 @@
> >  #ifndef _EXT4_EXTENTS_STATUS_H
> >  #define _EXT4_EXTENTS_STATUS_H
> >  
> > +#ifdef ES_DEBUG
> > +#define es_debug(a...)         printk(a)
> 
> Please take a look at ext4_debug() or ext_debug(); maybe you can
> use one of them, or at least reuse the code.

Yes, I will reuse the code from ext4_debug() as much as possible.
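
For example, es_debug() could take roughly the following shape.  This
is a user-space approximation only: fprintf() stands in for printk(),
the prefix format is my assumption rather than a copy of ext4_debug(),
and es_debug_demo() is a made-up caller.  The ##__VA_ARGS__ form is a
GNU extension, as used elsewhere in the kernel.

```c
#include <assert.h>
#include <stdio.h>

#ifdef ES_DEBUG
/* Prefix every message with the calling function and line number. */
#define es_debug(fmt, ...)						\
	do {								\
		fprintf(stderr, "ES DEBUG (%s, %d): " fmt,		\
			__func__, __LINE__, ##__VA_ARGS__);		\
	} while (0)
#else
/* Prints nothing, but the arguments are still type-checked. */
#define es_debug(fmt, ...)						\
	do {								\
		if (0)							\
			fprintf(stderr, fmt, ##__VA_ARGS__);		\
	} while (0)
#endif

/* Example caller mirroring the BUG_ON(end <= offset) check in the patch. */
static unsigned int es_debug_demo(unsigned int offset, unsigned int len)
{
	unsigned int end = offset + len;

	assert(end > offset);
	es_debug("add [%u/%u)\n", offset, len);
	return end;
}
```

Keeping the disabled variant as a dead fprintf() call is similar in
spirit to no_printk(): format mistakes still show up at compile time
even when ES_DEBUG is off.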

Regards,
Zheng

> 
> > +#else
> > +#define es_debug(a...)
> > +#endif
> > +
> >  struct extent_status {
> >  	struct rb_node rb_node;
> >  	ext4_lblk_t start;	/* first block extent covers */
> > @@ -19,4 +25,15 @@ struct ext4_es_tree {
> >  	struct extent_status *cache_es;	/* recently accessed extent */
> >  };
> >  
> > +extern int __init ext4_init_es(void);
> > +extern void ext4_exit_es(void);
> > +extern void ext4_es_init_tree(struct ext4_es_tree *tree);
> > +
> > +extern int ext4_es_add_space(struct inode *inode, ext4_lblk_t start,
> > +				ext4_lblk_t len);
> > +extern int ext4_es_remove_space(struct inode *inode, ext4_lblk_t start,
> > +				ext4_lblk_t len);
> > +extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
> > +				struct extent_status *es);
> > +
> >  #endif /* _EXT4_EXTENTS_STATUS_H */
> > 

Thread overview: 19+ messages
2012-07-22  7:59 [RFC][PATCH 00/10 v1][RESEND] ext4: extent status tree (step 1) Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 01/10 v1][RESEND] ext4: add two structures supporting extent status tree Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 02/10 v1][RESEND] ext4: add operations on " Zheng Liu
2012-07-31 11:55   ` Lukáš Czerner
2012-07-31 13:18     ` Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 03/10 v1][RESEND] ext4: initialize " Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 04/10 v1][RESEND] ext4: let ext4 maintain " Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 05/10 v1][RESEND] ext4: add some tracepoints in " Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 06/10 v1][RESEND] ext4: reimplement fiemap on " Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 07/10 v1][RESEND] ext4: reimplement ext4_find_delay_alloc_range " Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 08/10 v1][RESEND] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 09/10 v1][RESEND] ext4: don't need to writeout all dirty pages in punch hole Zheng Liu
2012-07-23 11:01   ` Lukáš Czerner
2012-07-23 11:57     ` Zheng Liu
2012-07-23 12:20       ` Lukáš Czerner
2012-07-23 13:18         ` Zheng Liu
2012-07-22  7:59 ` [RFC][PATCH 10/10 v1][RESEND] ext4: add two tracepoints in punching hole Zheng Liu
2012-07-27 13:43   ` Lukáš Czerner
2012-07-30  2:15     ` Zheng Liu