* [PATCHv6 0/5] reiser4: discard support: initial implementation.
@ 2014-06-22 10:48 Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 1/5] reiser4: make space_allocator's check_blocks() reusable Ivan Shapovalov
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

v1: - initial implementation (patches 1, 2)

v2: - cleanup, fixes discovered in debug mode
    - saner logging
    - assertions
    - enablement of discard through mount option

v3: - fixed the extent merge loop in discard_atom()

v4: - squashed fix-ups into the main patch (with the exception of reiser4_debug())
    - fixed a bug in the use of division ops, discovered while building on ARM

v5: - squashed mount option into the main patch
    - refactored based on discussion (see commit msg)
      - split off the blocknr_list code
      - replaced ->discard_set with ->delete_set and ->aux_delete_set

v6: - updated in-code comments
    - removed uber-verbose debug statements
    - fixed code-to-patch mapping
    - moved blocknrlist and blocknrset to kmem_cache instead of kmalloc/kfree
      (in a separate commit for ease of reviewing)
    - dropped the RFC label

Ivan Shapovalov (5):
  reiser4: make space_allocator's check_blocks() reusable.
  reiser4: add an implementation of "block lists", split off from the discard code.
  reiser4: discard support: initial implementation using linked lists.
  reiser4: blocknr_list: use kmem_cache instead of kmalloc for allocating entries.
  reiser4: blocknr_set: use kmem_cache instead of kmalloc for allocating entries.

 fs/reiser4/Makefile                       |   2 +
 fs/reiser4/block_alloc.c                  |  48 +++--
 fs/reiser4/block_alloc.h                  |  14 +-
 fs/reiser4/blocknrlist.c                  | 336 ++++++++++++++++++++++++++++++
 fs/reiser4/blocknrset.c                   |  34 ++-
 fs/reiser4/dformat.h                      |   2 +
 fs/reiser4/discard.c                      | 216 +++++++++++++++++++
 fs/reiser4/discard.h                      |  31 +++
 fs/reiser4/forward.h                      |   1 +
 fs/reiser4/init_super.c                   |   2 +
 fs/reiser4/plugin/space/bitmap.c          |  84 +++++---
 fs/reiser4/plugin/space/bitmap.h          |   2 +-
 fs/reiser4/plugin/space/space_allocator.h |   4 +-
 fs/reiser4/super.h                        |   4 +-
 fs/reiser4/super_ops.c                    |  14 ++
 fs/reiser4/txnmgr.c                       | 125 ++++++++++-
 fs/reiser4/txnmgr.h                       |  67 +++++-
 fs/reiser4/znode.c                        |   9 +-
 18 files changed, 918 insertions(+), 77 deletions(-)
 create mode 100644 fs/reiser4/blocknrlist.c
 create mode 100644 fs/reiser4/discard.c
 create mode 100644 fs/reiser4/discard.h

-- 
2.0.0



* [PATCHv6 1/5] reiser4: make space_allocator's check_blocks() reusable.
  2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
@ 2014-06-22 10:48 ` Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 2/5] reiser4: add an implementation of "block lists", split off from the discard code Ivan Shapovalov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

Make check_blocks() return a boolean value (whether the extent's state
matched our expectations) instead of asserting success and crashing the
system otherwise.
Also make it possible to check extents spanning multiple bitmap blocks.

The only user of reiser4_check_block() in its previous form has been
updated to assert on a true return value.

Thus check_blocks() can now be reused by various parts of reiser4, e.g.
by the discard subsystem which will be added in subsequent commits.
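
For illustration, a minimal sketch of the new calling convention
(reiser4_check_blocks() is from this patch; the surrounding caller is
hypothetical):

	/* Hypothetical caller: returns non-zero iff every block in
	 * [*start; *start + *len) is marked allocated in the working
	 * bitmap, even if the range spans several bitmap blocks. */
	static int node_blocks_are_allocated(const reiser4_block_nr *start,
					     const reiser4_block_nr *len)
	{
		return reiser4_check_blocks(start, len, 1 /* desired */);
	}

The result composes with assertions, as the znode.c hunk below does via
ergo().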

Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
---
 fs/reiser4/block_alloc.c                  | 16 +-----
 fs/reiser4/block_alloc.h                  | 14 +++---
 fs/reiser4/plugin/space/bitmap.c          | 81 +++++++++++++++++++++----------
 fs/reiser4/plugin/space/bitmap.h          |  2 +-
 fs/reiser4/plugin/space/space_allocator.h |  4 +-
 fs/reiser4/znode.c                        |  9 ++--
 6 files changed, 72 insertions(+), 54 deletions(-)

diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
index 81ed96f..57b0836 100644
--- a/fs/reiser4/block_alloc.c
+++ b/fs/reiser4/block_alloc.c
@@ -962,26 +962,14 @@ static void used2free(reiser4_super_info_data * sbinfo, __u64 count)
 	spin_unlock_reiser4_super(sbinfo);
 }
 
-#if REISER4_DEBUG
-
 /* check "allocated" state of given block range */
-static void
+int
 reiser4_check_blocks(const reiser4_block_nr * start,
 		     const reiser4_block_nr * len, int desired)
 {
-	sa_check_blocks(start, len, desired);
+	return sa_check_blocks(start, len, desired);
 }
 
-/* check "allocated" state of given block */
-void reiser4_check_block(const reiser4_block_nr * block, int desired)
-{
-	const reiser4_block_nr one = 1;
-
-	reiser4_check_blocks(block, &one, desired);
-}
-
-#endif
-
 /* Blocks deallocation function may do an actual deallocation through space
    plugin allocation or store deleted block numbers in atom's delete_set data
    structure depend on @defer parameter. */
diff --git a/fs/reiser4/block_alloc.h b/fs/reiser4/block_alloc.h
index 689efc1..a4e98af 100644
--- a/fs/reiser4/block_alloc.h
+++ b/fs/reiser4/block_alloc.h
@@ -150,15 +150,15 @@ extern void cluster_reserved2free(int count);
 
 extern int reiser4_check_block_counters(const struct super_block *);
 
-#if REISER4_DEBUG
 
-extern void reiser4_check_block(const reiser4_block_nr *, int);
+extern int reiser4_check_blocks(const reiser4_block_nr *start,
+                                const reiser4_block_nr *len, int desired);
 
-#else
-
-#  define reiser4_check_block(beg, val)        noop
-
-#endif
+static inline int reiser4_check_block(const reiser4_block_nr *start,
+                                      int desired)
+{
+	return reiser4_check_blocks(start, NULL, desired);
+}
 
 extern int reiser4_pre_commit_hook(void);
 extern void reiser4_post_commit_hook(void);
diff --git a/fs/reiser4/plugin/space/bitmap.c b/fs/reiser4/plugin/space/bitmap.c
index 1d0fabf..d8ff542 100644
--- a/fs/reiser4/plugin/space/bitmap.c
+++ b/fs/reiser4/plugin/space/bitmap.c
@@ -1222,29 +1222,13 @@ void reiser4_dealloc_blocks_bitmap(reiser4_space_allocator * allocator,
 	release_and_unlock_bnode(bnode);
 }
 
-/* plugin->u.space_allocator.check_blocks(). */
-void reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
-				 const reiser4_block_nr * len, int desired)
+static int check_blocks_one_bitmap(bmap_nr_t bmap, bmap_off_t start_offset,
+                                    bmap_off_t end_offset, int desired)
 {
-#if REISER4_DEBUG
 	struct super_block *super = reiser4_get_current_sb();
-
-	bmap_nr_t bmap;
-	bmap_off_t start_offset;
-	bmap_off_t end_offset;
-
-	struct bitmap_node *bnode;
+	struct bitmap_node *bnode = get_bnode(super, bmap);
 	int ret;
 
-	assert("zam-622", len != NULL);
-	check_block_range(start, len);
-	parse_blocknr(start, &bmap, &start_offset);
-
-	end_offset = start_offset + *len;
-	assert("nikita-2214", end_offset <= bmap_bit_count(super->s_blocksize));
-
-	bnode = get_bnode(super, bmap);
-
 	assert("nikita-2215", bnode != NULL);
 
 	ret = load_and_lock_bnode(bnode);
@@ -1253,19 +1237,64 @@ void reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
 	assert("nikita-2216", jnode_is_loaded(bnode->wjnode));
 
 	if (desired) {
-		assert("zam-623",
-		       reiser4_find_next_zero_bit(bnode_working_data(bnode),
+		ret = reiser4_find_next_zero_bit(bnode_working_data(bnode),
 						  end_offset, start_offset)
-		       >= end_offset);
+		      >= end_offset;
 	} else {
-		assert("zam-624",
-		       reiser4_find_next_set_bit(bnode_working_data(bnode),
+		ret = reiser4_find_next_set_bit(bnode_working_data(bnode),
 						 end_offset, start_offset)
-		       >= end_offset);
+		      >= end_offset;
 	}
 
 	release_and_unlock_bnode(bnode);
-#endif
+
+	return ret;
+}
+
+/* plugin->u.space_allocator.check_blocks(). */
+int reiser4_check_blocks_bitmap(const reiser4_block_nr * start,
+				 const reiser4_block_nr * len, int desired)
+{
+	struct super_block *super = reiser4_get_current_sb();
+
+	reiser4_block_nr end;
+	bmap_nr_t bmap, end_bmap;
+	bmap_off_t offset;
+	bmap_off_t end_offset;
+	const bmap_off_t max_offset = bmap_bit_count(super->s_blocksize);
+
+	assert("intelfx-9", start != NULL);
+	assert("intelfx-10", ergo(len != NULL, *len > 0));
+
+	if (len != NULL) {
+		check_block_range(start, len);
+		end = *start + *len - 1;
+	} else {
+		/* end is used as temporary len here */
+		end = 1;
+		check_block_range(start, &end);
+		end = *start;
+	}
+
+	parse_blocknr(start, &bmap, &offset);
+
+	if (end == *start) {
+		end_bmap = bmap;
+		end_offset = offset;
+	} else {
+		parse_blocknr(&end, &end_bmap, &end_offset);
+	}
+	++end_offset;
+
+	assert("intelfx-4", end_bmap >= bmap);
+	assert("intelfx-5", ergo(end_bmap == bmap, end_offset >= offset));
+
+	for (; bmap < end_bmap; bmap++, offset = 0) {
+		if (!check_blocks_one_bitmap(bmap, offset, max_offset, desired)) {
+			return 0;
+		}
+	}
+	return check_blocks_one_bitmap(bmap, offset, end_offset, desired);
 }
 
 /* conditional insertion of @node into atom's overwrite set  if it was not there */
diff --git a/fs/reiser4/plugin/space/bitmap.h b/fs/reiser4/plugin/space/bitmap.h
index be867f1..4590498 100644
--- a/fs/reiser4/plugin/space/bitmap.h
+++ b/fs/reiser4/plugin/space/bitmap.h
@@ -19,7 +19,7 @@ extern int reiser4_alloc_blocks_bitmap(reiser4_space_allocator *,
 				       reiser4_blocknr_hint *, int needed,
 				       reiser4_block_nr * start,
 				       reiser4_block_nr * len);
-extern void reiser4_check_blocks_bitmap(const reiser4_block_nr *,
+extern int reiser4_check_blocks_bitmap(const reiser4_block_nr *,
 					const reiser4_block_nr *, int);
 extern void reiser4_dealloc_blocks_bitmap(reiser4_space_allocator *,
 					  reiser4_block_nr,
diff --git a/fs/reiser4/plugin/space/space_allocator.h b/fs/reiser4/plugin/space/space_allocator.h
index 5bfa9a3..71bfd11 100644
--- a/fs/reiser4/plugin/space/space_allocator.h
+++ b/fs/reiser4/plugin/space/space_allocator.h
@@ -29,9 +29,9 @@ static inline void sa_dealloc_blocks (reiser4_space_allocator * al, reiser4_bloc
 	reiser4_dealloc_blocks_##allocator (al, start, len);								\
 }															\
 															\
-static inline void sa_check_blocks (const reiser4_block_nr * start, const reiser4_block_nr * end, int desired) 		\
+static inline int sa_check_blocks (const reiser4_block_nr * start, const reiser4_block_nr * end, int desired) 		\
 {															\
-	reiser4_check_blocks_##allocator (start, end, desired);							        \
+	return reiser4_check_blocks_##allocator (start, end, desired);							        \
 }															\
 															\
 static inline void sa_pre_commit_hook (void)										\
diff --git a/fs/reiser4/znode.c b/fs/reiser4/znode.c
index 4ff9714..08eab3d 100644
--- a/fs/reiser4/znode.c
+++ b/fs/reiser4/znode.c
@@ -534,10 +534,11 @@ znode *zget(reiser4_tree * tree,
 
 		write_unlock_tree(tree);
 	}
-#if REISER4_DEBUG
-	if (!reiser4_blocknr_is_fake(blocknr) && *blocknr != 0)
-		reiser4_check_block(blocknr, 1);
-#endif
+
+	assert("intelfx-6",
+	       ergo(!reiser4_blocknr_is_fake(blocknr) && *blocknr != 0,
+	            reiser4_check_block(blocknr, 1)));
+
 	/* Check for invalid tree level, return -EIO */
 	if (unlikely(znode_get_level(result) != level)) {
 		warning("jmacd-504",
-- 
2.0.0



* [PATCHv6 2/5] reiser4: add an implementation of "block lists", split off from the discard code.
  2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 1/5] reiser4: make space_allocator's check_blocks() reusable Ivan Shapovalov
@ 2014-06-22 10:48 ` Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Ivan Shapovalov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

The block list is a less memory-efficient, but ordered (and thus
sortable), implementation of the same concept as the blocknr_set.
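
To make the trade-off concrete, a sketch of the two entry layouts (the
list entry is from this patch; the set entry is abridged from
blocknrset.c, see also patch 5):

	/* blocknr_list: one allocation per extent; the list_head keeps
	 * the extents linkable in sorted order. */
	struct blocknr_list_entry {
		reiser4_block_nr start, len;
		struct list_head link;
	};

	/* blocknr_set: one allocation packs many block numbers
	 * (BLOCKNR_SET_ENTRIES of them); denser, but the stored
	 * extents are inherently unordered. */
	struct blocknr_set_entry {
		unsigned nr_singles;
		unsigned nr_pairs;
		struct list_head link;
		reiser4_block_nr entries[BLOCKNR_SET_ENTRIES];
	};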

Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
---
 fs/reiser4/Makefile      |   1 +
 fs/reiser4/blocknrlist.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/reiser4/forward.h     |   1 +
 fs/reiser4/txnmgr.h      |  19 +++
 4 files changed, 332 insertions(+)
 create mode 100644 fs/reiser4/blocknrlist.c

diff --git a/fs/reiser4/Makefile b/fs/reiser4/Makefile
index ff73d43..9f07194 100644
--- a/fs/reiser4/Makefile
+++ b/fs/reiser4/Makefile
@@ -46,6 +46,7 @@ reiser4-y := \
 		   status_flags.o \
 		   init_super.o \
 		   safe_link.o \
+		   blocknrlist.o \
            \
 		   plugin/plugin.o \
 		   plugin/plugin_set.o \
diff --git a/fs/reiser4/blocknrlist.c b/fs/reiser4/blocknrlist.c
new file mode 100644
index 0000000..2868771
--- /dev/null
+++ b/fs/reiser4/blocknrlist.c
@@ -0,0 +1,311 @@
+/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
+ * reiser4/README */
+
+/* This is a block list implementation, used to create ordered block sets
+   (at the cost of being less memory efficient than blocknr_set).
+   It is used by discard code. */
+
+#include "debug.h"
+#include "dformat.h"
+#include "txnmgr.h"
+#include "context.h"
+
+#include <linux/slab.h>
+#include <linux/list_sort.h>
+
+/**
+ * Represents an extent range [@start; @start + @len).
+ */
+struct blocknr_list_entry {
+	reiser4_block_nr start, len;
+	struct list_head link;
+};
+
+#define blocknr_list_entry(ptr) list_entry(ptr, blocknr_list_entry, link)
+
+static void blocknr_list_entry_init(blocknr_list_entry *entry)
+{
+	assert("intelfx-11", entry != NULL);
+
+	entry->start = 0;
+	entry->len = 0;
+	INIT_LIST_HEAD(&entry->link);
+}
+
+static blocknr_list_entry *blocknr_list_entry_alloc(void)
+{
+	blocknr_list_entry *entry;
+
+	entry = (blocknr_list_entry *)kmalloc(sizeof(blocknr_list_entry),
+	                                      reiser4_ctx_gfp_mask_get());
+	if (entry == NULL) {
+		return NULL;
+	}
+
+	blocknr_list_entry_init(entry);
+
+	return entry;
+}
+
+static void blocknr_list_entry_free(blocknr_list_entry *entry)
+{
+	assert("intelfx-12", entry != NULL);
+
+	kfree(entry);
+}
+
+/**
+ * Given the range @to and the range [@start; @start + @len), if they
+ * overlap, their union is calculated and saved in @to.
+ */
+static int blocknr_list_entry_merge(blocknr_list_entry *to,
+                                    reiser4_block_nr start,
+                                    reiser4_block_nr len)
+{
+	reiser4_block_nr end, to_end;
+
+	assert("intelfx-13", to != NULL);
+
+	assert("intelfx-16", to->len > 0);
+	assert("intelfx-17", len > 0);
+
+	end = start + len;
+	to_end = to->start + to->len;
+
+	if ((to->start <= end) && (start <= to_end)) {
+		if (start < to->start) {
+			to->start = start;
+		}
+
+		if (end > to_end) {
+			to_end = end;
+		}
+
+		to->len = to_end - to->start;
+
+		return 0;
+	}
+
+	return -1;
+}
+
+static int blocknr_list_entry_merge_entry(blocknr_list_entry *to,
+                                          blocknr_list_entry *from)
+{
+	assert("intelfx-18", from != NULL);
+
+	return blocknr_list_entry_merge(to, from->start, from->len);
+}
+
+/**
+ * A comparison function for list_sort().
+ *
+ * "The comparison function @cmp must return a negative value if @a
+ * should sort before @b, and a positive value if @a should sort after
+ * @b. If @a and @b are equivalent, and their original relative
+ * ordering is to be preserved, @cmp must return 0."
+ */
+static int blocknr_list_entry_compare(void* priv UNUSED_ARG,
+                                      struct list_head *a, struct list_head *b)
+{
+	blocknr_list_entry *entry_a, *entry_b;
+	reiser4_block_nr entry_a_end, entry_b_end;
+
+	assert("intelfx-19", a != NULL);
+	assert("intelfx-20", b != NULL);
+
+	entry_a = blocknr_list_entry(a);
+	entry_b = blocknr_list_entry(b);
+
+	entry_a_end = entry_a->start + entry_a->len;
+	entry_b_end = entry_b->start + entry_b->len;
+
+	/* First sort by starting block numbers... */
+	if (entry_a->start < entry_b->start) {
+		return -1;
+	}
+
+	if (entry_a->start > entry_b->start) {
+		return 1;
+	}
+
+	/** Then by ending block numbers.
+	 * If @a contains @b, it will be sorted before. */
+	if (entry_a_end > entry_b_end) {
+		return -1;
+	}
+
+	if (entry_a_end < entry_b_end) {
+		return 1;
+	}
+
+	return 0;
+}
+
+void blocknr_list_init(struct list_head* blist)
+{
+	assert("intelfx-24", blist != NULL);
+
+	INIT_LIST_HEAD(blist);
+}
+
+void blocknr_list_destroy(struct list_head* blist)
+{
+	struct list_head *pos, *tmp;
+	blocknr_list_entry *entry;
+
+	assert("intelfx-25", blist != NULL);
+
+	list_for_each_safe(pos, tmp, blist) {
+		entry = blocknr_list_entry(pos);
+		list_del_init(pos);
+		blocknr_list_entry_free(entry);
+	}
+
+	assert("intelfx-48", list_empty(blist));
+}
+
+void blocknr_list_merge(struct list_head *from, struct list_head *to)
+{
+	assert("intelfx-26", from != NULL);
+	assert("intelfx-27", to != NULL);
+
+	list_splice_tail_init(from, to);
+
+	assert("intelfx-49", list_empty(from));
+}
+
+void blocknr_list_sort_and_join(struct list_head *blist)
+{
+	struct list_head *pos, *next;
+	struct blocknr_list_entry *entry, *next_entry;
+
+	assert("intelfx-50", blist != NULL);
+
+	/* Step 1. Sort the extent list. */
+	list_sort(NULL, blist, blocknr_list_entry_compare);
+
+	/* Step 2. Join adjacent extents in the list. */
+	pos = blist->next;
+	next = pos->next;
+	entry = blocknr_list_entry(pos);
+
+	for (; next != blist; next = pos->next) {
+		/** @next is a valid node at this point */
+		next_entry = blocknr_list_entry(next);
+
+		/** try to merge @next into @pos */
+		if (!blocknr_list_entry_merge_entry(entry, next_entry)) {
+			/** successful; delete the @next node.
+			 * next merge will be attempted into the same node. */
+			list_del_init(next);
+			blocknr_list_entry_free(next_entry);
+		} else {
+			/** otherwise advance @pos. */
+			pos = next;
+			entry = next_entry;
+		}
+	}
+}
+
+int blocknr_list_add_extent(txn_atom *atom,
+                            struct list_head *blist,
+                            blocknr_list_entry **new_entry,
+                            const reiser4_block_nr *start,
+                            const reiser4_block_nr *len)
+{
+	assert("intelfx-29", atom != NULL);
+	assert("intelfx-42", atom_is_protected(atom));
+	assert("intelfx-43", blist != NULL);
+	assert("intelfx-30", new_entry != NULL);
+	assert("intelfx-31", start != NULL);
+	assert("intelfx-32", len != NULL && *len > 0);
+
+	if (*new_entry == NULL) {
+		/*
+		 * Optimization: try to merge new extent into the last one.
+		 */
+		if (!list_empty(blist)) {
+			blocknr_list_entry *last_entry;
+			last_entry = blocknr_list_entry(blist->prev);
+			if (!blocknr_list_entry_merge(last_entry, *start, *len)) {
+				return 0;
+			}
+		}
+
+		/*
+		 * Otherwise, allocate a new entry and return -E_REPEAT.
+		 * On the next attempt we'll take the branch below.
+		 */
+		spin_unlock_atom(atom);
+		*new_entry = blocknr_list_entry_alloc();
+		return (*new_entry != NULL) ? -E_REPEAT : RETERR(-ENOMEM);
+	}
+
+	/*
+	 * The entry has been allocated beforehand, fill it and link to the list.
+	 */
+	(*new_entry)->start = *start;
+	(*new_entry)->len = *len;
+	list_add_tail(&(*new_entry)->link, blist);
+
+	return 0;
+}
+
+int blocknr_list_iterator(txn_atom *atom,
+                          struct list_head *blist,
+                          blocknr_set_actor_f actor,
+                          void *data,
+                          int delete)
+{
+	struct list_head *pos;
+	blocknr_list_entry *entry;
+	int ret = 0;
+
+	assert("intelfx-46", blist != NULL);
+	assert("intelfx-47", actor != NULL);
+
+	if (delete) {
+		struct list_head *tmp;
+
+		list_for_each_safe(pos, tmp, blist) {
+			entry = blocknr_list_entry(pos);
+
+			/*
+			 * Do not exit, delete flag is set. Instead, on the first error we
+			 * downgrade from iterating to just deleting.
+			 */
+			if (ret == 0) {
+				ret = actor(atom, &entry->start, &entry->len, data);
+			}
+
+			list_del_init(pos);
+			blocknr_list_entry_free(entry);
+		}
+
+		assert("intelfx-44", list_empty(blist));
+	} else {
+		list_for_each(pos, blist) {
+			entry = blocknr_list_entry(pos);
+
+			ret = actor(atom, &entry->start, &entry->len, data);
+
+			if (ret != 0) {
+				return ret;
+			}
+		}
+	}
+
+	return ret;
+}
+
+/* Make Linus happy.
+   Local variables:
+   c-indentation-style: "K&R"
+   mode-name: "LC"
+   c-basic-offset: 8
+   tab-width: 8
+   fill-column: 120
+   scroll-step: 1
+   End:
+*/
diff --git a/fs/reiser4/forward.h b/fs/reiser4/forward.h
index 15dbfdc..9170c2b 100644
--- a/fs/reiser4/forward.h
+++ b/fs/reiser4/forward.h
@@ -38,6 +38,7 @@ typedef struct reiser4_dir_entry_desc reiser4_dir_entry_desc;
 typedef struct reiser4_context reiser4_context;
 typedef struct carry_level carry_level;
 typedef struct blocknr_set_entry blocknr_set_entry;
+typedef struct blocknr_list_entry blocknr_list_entry;
 /* super_block->s_fs_info points to this */
 typedef struct reiser4_super_info_data reiser4_super_info_data;
 /* next two objects are fields of reiser4_super_info_data */
diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
index 034a3fe..18ca23d 100644
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -485,6 +485,25 @@ extern int blocknr_set_iterator(txn_atom * atom, struct list_head * bset,
 				blocknr_set_actor_f actor, void *data,
 				int delete);
 
+/* This is the block list interface (see blocknrlist.c) */
+extern void blocknr_list_init(struct list_head *blist);
+extern void blocknr_list_destroy(struct list_head *blist);
+extern void blocknr_list_merge(struct list_head *from, struct list_head *to);
+extern void blocknr_list_sort_and_join(struct list_head *blist);
+/**
+ * The @atom should be locked.
+ */
+extern int blocknr_list_add_extent(txn_atom *atom,
+                                   struct list_head *blist,
+                                   blocknr_list_entry **new_entry,
+                                   const reiser4_block_nr *start,
+                                   const reiser4_block_nr *len);
+extern int blocknr_list_iterator(txn_atom *atom,
+                                 struct list_head *blist,
+                                 blocknr_set_actor_f actor,
+                                 void *data,
+                                 int delete);
+
 /* flush code takes care about how to fuse flush queues */
 extern void flush_init_atom(txn_atom * atom);
 extern void flush_fuse_queues(txn_atom * large, txn_atom * small);
-- 
2.0.0



* [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 1/5] reiser4: make space_allocator's check_blocks() reusable Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 2/5] reiser4: add an implementation of "block lists", split off from the discard code Ivan Shapovalov
@ 2014-06-22 10:48 ` Ivan Shapovalov
  2014-07-06 23:47   ` Edward Shishkin
  2014-06-22 10:48 ` [PATCHv6 4/5] reiser4: blocknr_list: use kmem_cache instead of kmalloc for allocating entries Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 5/5] reiser4: blocknr_set: " Ivan Shapovalov
  4 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

Implementation details:

- candidate extents are stored in a linked list during the transaction
- if an extent being added is adjacent to the last one, they are merged
  to avoid allocating another entry
- at commit time, the list is sorted and adjacent extents are merged
- extents are then processed to align discard requests to erase unit
  boundaries, extending them to neighbouring blocks if needed (see the
  sketch below)
- each erase unit is checked to be fully deallocated before it is
  submitted for discard
- processing stops at the first failure (this does not fail the atom
  commit)

For now (shortcomings):

- the kernel-reported erase unit granularity and alignment offset are
  used as-is, without any override (an override may make sense to
  mitigate bogus values sometimes reported by the kernel/hardware)
- each erase unit is checked with its own bitmap query; this is
  suboptimal when the granularity is smaller than the block size
  (which should not matter in practice, as the granularity is almost
  never that small)
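
As a rough userspace model of the erase-unit alignment mentioned above
(plain C with % in place of sector_div(); all names are illustrative,
not reiser4 API):

	#include <stdio.h>

	typedef unsigned long long sector_t;

	/* Round extent [start; start + len) outwards to erase unit
	 * boundaries. granularity is the erase unit size, alignment
	 * the offset of the first erase unit, both in sectors; assumes
	 * start >= alignment and granularity > 0. */
	static void cover_extent(sector_t start, sector_t len,
				 sector_t granularity, sector_t alignment,
				 sector_t *unit_start, sector_t *unit_len)
	{
		sector_t end = start + len;

		*unit_start = start - ((start - alignment) % granularity);
		end += granularity - 1 - ((end - 1 - alignment) % granularity);
		*unit_len = end - *unit_start;
	}

	int main(void)
	{
		sector_t s, l;

		/* Extent [11; 14) with 8-sector erase units starting at
		 * sector 2 is covered by the single erase unit [10; 18). */
		cover_extent(11, 3, 8, 2, &s, &l);
		printf("[%llu; +%llu)\n", s, l);	/* prints [10; +8) */
		return 0;
	}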

Note on candidate block collection:

Another per-atom block set, ->aux_delete_set, has been added. It contains
extents deallocated without BA_DEFER (i.e. blocks of the wandered
journal). When discard is enabled, both delete sets are maintained as
blocknr_lists, then spliced and sorted before discarding, so that all
blocks deallocated during the transaction are considered for discarding.

Otherwise, only ->delete_set is maintained, and it is stored as a
blocknr_set, which is more memory-efficient but inherently unordered (and
so cannot be used by the discard algorithm).

The only semantically significant change to existing code is that
reiser4_post_commit_hook() no longer clears ->delete_set (instead, it is
cleared either by discard_atom() or as part of the atom's destruction).
This is OK because ->delete_set is not accessed after
reiser4_post_commit_hook().
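
For orientation, the commit-time sequence described above boils down to
the following shape (condensed from discard_atom() in this patch;
locking and error handling elided):

	blocknr_list_init(&discard_set);
	blocknr_list_merge(&atom->discard.delete_set, &discard_set);
	blocknr_list_merge(&atom->discard.aux_delete_set, &discard_set);
	blocknr_list_sort_and_join(&discard_set);
	blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);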

Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
---
 fs/reiser4/Makefile              |   1 +
 fs/reiser4/block_alloc.c         |  32 ++++--
 fs/reiser4/dformat.h             |   2 +
 fs/reiser4/discard.c             | 216 +++++++++++++++++++++++++++++++++++++++
 fs/reiser4/discard.h             |  31 ++++++
 fs/reiser4/init_super.c          |   2 +
 fs/reiser4/plugin/space/bitmap.c |   3 +-
 fs/reiser4/super.h               |   4 +-
 fs/reiser4/txnmgr.c              | 125 ++++++++++++++++++++--
 fs/reiser4/txnmgr.h              |  44 +++++++-
 10 files changed, 440 insertions(+), 20 deletions(-)
 create mode 100644 fs/reiser4/discard.c
 create mode 100644 fs/reiser4/discard.h

diff --git a/fs/reiser4/Makefile b/fs/reiser4/Makefile
index 9f07194..f50bb96 100644
--- a/fs/reiser4/Makefile
+++ b/fs/reiser4/Makefile
@@ -47,6 +47,7 @@ reiser4-y := \
 		   init_super.o \
 		   safe_link.o \
 		   blocknrlist.o \
+		   discard.o \
            \
 		   plugin/plugin.o \
 		   plugin/plugin_set.o \
diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
index 57b0836..e5ea7a4 100644
--- a/fs/reiser4/block_alloc.c
+++ b/fs/reiser4/block_alloc.c
@@ -992,6 +992,7 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
 	int ret;
 	reiser4_context *ctx;
 	reiser4_super_info_data *sbinfo;
+	void *new_entry = NULL;
 
 	ctx = get_current_context();
 	sbinfo = get_super_private(ctx->super);
@@ -1007,17 +1008,13 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
 	}
 
 	if (flags & BA_DEFER) {
-		blocknr_set_entry *bsep = NULL;
-
-		/* storing deleted block numbers in a blocknr set
-		   datastructure for further actual deletion */
+		/* store deleted block numbers in the atom's deferred delete set
+		   for further actual deletion */
 		do {
 			atom = get_current_atom_locked();
 			assert("zam-430", atom != NULL);
 
-			ret =
-			    blocknr_set_add_extent(atom, &atom->delete_set,
-						   &bsep, start, len);
+			ret = atom_dset_deferred_add_extent(atom, &new_entry, start, len);
 
 			if (ret == -ENOMEM)
 				return ret;
@@ -1031,6 +1028,25 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
 		spin_unlock_atom(atom);
 
 	} else {
+		/* store deleted block numbers in the atom's immediate delete set
+		   for further processing */
+		do {
+			atom = get_current_atom_locked();
+			assert("intelfx-51", atom != NULL);
+
+			ret = atom_dset_immediate_add_extent(atom, &new_entry, start, len);
+
+			if (ret == -ENOMEM)
+				return ret;
+
+			/* This loop might spin at most two times */
+		} while (ret == -E_REPEAT);
+
+		assert("intelfx-52", ret == 0);
+		assert("intelfx-53", atom != NULL);
+
+		spin_unlock_atom(atom);
+
 		assert("zam-425", get_current_super_private() != NULL);
 		sa_dealloc_blocks(reiser4_get_space_allocator(ctx->super),
 				  *start, *len);
@@ -1128,7 +1144,7 @@ void reiser4_post_commit_hook(void)
 
 	/* do the block deallocation which was deferred
 	   until commit is done */
-	blocknr_set_iterator(atom, &atom->delete_set, apply_dset, NULL, 1);
+	atom_dset_deferred_apply(atom, apply_dset, NULL, 0);
 
 	assert("zam-504", get_current_super_private() != NULL);
 	sa_post_commit_hook();
diff --git a/fs/reiser4/dformat.h b/fs/reiser4/dformat.h
index 7943762..7316754 100644
--- a/fs/reiser4/dformat.h
+++ b/fs/reiser4/dformat.h
@@ -14,6 +14,8 @@
 #if !defined(__FS_REISER4_DFORMAT_H__)
 #define __FS_REISER4_DFORMAT_H__
 
+#include "debug.h"
+
 #include <asm/byteorder.h>
 #include <asm/unaligned.h>
 #include <linux/types.h>
diff --git a/fs/reiser4/discard.c b/fs/reiser4/discard.c
new file mode 100644
index 0000000..3c8ee89
--- /dev/null
+++ b/fs/reiser4/discard.c
@@ -0,0 +1,216 @@
+/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
+ * reiser4/README */
+
+/* TRIM/discard interoperation subsystem for reiser4. */
+
+/*
+ * This subsystem is responsible for populating an atom's ->discard_set and
+ * (later) converting it into a series of discard calls to the kernel.
+ *
+ * The discard is an in-kernel interface for notifying the storage
+ * hardware about blocks that are being logically freed by the filesystem.
+ * This is done via calling the blkdev_issue_discard() function. There are
+ * restrictions on block ranges: they should constitute at least one erase unit
+ * in length and be correspondingly aligned. Otherwise a discard request will
+ * be ignored.
+ *
+ * The erase unit size is kept in struct queue_limits as discard_granularity.
+ * The offset from the partition start to the first erase unit is kept in
+ * struct queue_limits as discard_alignment.
+ *
+ * At atom level, we record numbers of all blocks that happen to be deallocated
+ * during the transaction. Then we read the generated set, filter out any blocks
+ * that have since been allocated again and issue discards for everything still
+ * valid. This is what discard.[ch] is here for.
+ *
+ * However, simply iterating through the recorded extents is not enough:
+ * - if a single extent is smaller than the erase unit, then this particular
+ *   extent won't be discarded even if it is surrounded by enough free blocks
+ *   to constitute a whole erase unit;
+ * - we won't be able to merge small adjacent extents forming an extent long
+ *   enough to be discarded.
+ *
+ * MECHANISM:
+ *
+ * During the transaction deallocated extents are recorded in atom's delete
+ * sets. There are two delete sets, because data in one of them (delete_set) is
+ * also used by other parts of reiser4. The second delete set (aux_delete_set)
+ * complements the first one and is maintained only when discard is enabled.
+ *
+ * Together these sets constitute "the discard set" -- blocks that have to be
+ * considered for discarding. On atom commit we will generate a minimal
+ * superset of the discard set, comprised of whole erase units.
+ *
+ * So, at commit time the following actions take place:
+ * - delete sets are merged to form the discard set;
+ * - elements of the discard set are sorted;
+ * - the discard set is iterated, joining any adjacent extents;
+ * - each of resulting extents is "covered" by erase units:
+ *   - its start is rounded down to the closest erase unit boundary;
+ *   - starting from this block, extents of erase unit length are created
+ *     until the original extent is fully covered;
+ * - the calculated erase units are checked to be fully deallocated;
+ * - remaining (valid) erase units are then passed to blkdev_issue_discard().
+ */
+
+#include "discard.h"
+#include "context.h"
+#include "debug.h"
+#include "txnmgr.h"
+#include "super.h"
+
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/blkdev.h>
+
+static int __discard_extent(struct block_device *bdev, sector_t start,
+                            sector_t len)
+{
+	assert("intelfx-21", bdev != NULL);
+
+	return blkdev_issue_discard(bdev, start, len, reiser4_ctx_gfp_mask_get(),
+	                            0);
+}
+
+static int discard_extent(txn_atom *atom UNUSED_ARG,
+                          const reiser4_block_nr* start,
+                          const reiser4_block_nr* len,
+                          void *data UNUSED_ARG)
+{
+	struct super_block *sb = reiser4_get_current_sb();
+	struct block_device *bdev = sb->s_bdev;
+	struct queue_limits *limits = &bdev_get_queue(bdev)->limits;
+
+	sector_t extent_start_sec, extent_end_sec,
+	         unit_sec, request_start_sec = 0, request_len_sec = 0;
+	reiser4_block_nr unit_start_blk, unit_len_blk;
+	int ret, erase_unit_counter = 0;
+
+	const int sec_per_blk = sb->s_blocksize >> 9;
+
+	/* from blkdev_issue_discard():
+	 * Zero-sector (unknown) and one-sector granularities are the same.  */
+	const int granularity = max(limits->discard_granularity >> 9, 1U);
+	const int alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
+	/* we assume block = N * sector */
+	assert("intelfx-7", sec_per_blk > 0);
+
+	/* convert extent to sectors */
+	extent_start_sec = *start * sec_per_blk;
+	extent_end_sec = (*start + *len) * sec_per_blk;
+
+	/* round down extent start sector to an erase unit boundary */
+	unit_sec = extent_start_sec;
+	if (granularity > 1) {
+		sector_t tmp = extent_start_sec - alignment;
+		unit_sec -= sector_div(tmp, granularity);
+	}
+
+	/* iterate over erase units in the extent */
+	do {
+		/* considering erase unit:
+		 * [unit_sec; unit_sec + granularity) */
+
+		/* calculate block range for erase unit:
+		 * [unit_start_blk; unit_start_blk+unit_len_blk) */
+		unit_start_blk = unit_sec;
+		do_div(unit_start_blk, sec_per_blk);
+
+		if (granularity > 1) {
+			unit_len_blk = unit_sec + granularity - 1;
+			do_div(unit_len_blk, sec_per_blk);
+			++unit_len_blk;
+
+			assert("intelfx-22", unit_len_blk > unit_start_blk);
+
+			unit_len_blk -= unit_start_blk;
+		} else {
+			unit_len_blk = 1;
+		}
+
+		if (reiser4_check_blocks(&unit_start_blk, &unit_len_blk, 0)) {
+			/* OK. Add this unit to the accumulator.
+			 * We accumulate discard units to call blkdev_issue_discard()
+			 * not too frequently. */
+
+			if (request_len_sec > 0) {
+				request_len_sec += granularity;
+			} else {
+				request_start_sec = unit_sec;
+				request_len_sec = granularity;
+			}
+		} else {
+			/* This unit can't be discarded. Discard what's been accumulated
+			 * so far. */
+			if (request_len_sec > 0) {
+				ret = __discard_extent(bdev, request_start_sec, request_len_sec);
+				if (ret != 0) {
+					return ret;
+				}
+				request_len_sec = 0;
+			}
+		}
+
+		unit_sec += granularity;
+		++erase_unit_counter;
+	} while (unit_sec < extent_end_sec);
+
+	/* Discard the last accumulated request. */
+	if (request_len_sec > 0) {
+		ret = __discard_extent(bdev, request_start_sec, request_len_sec);
+		if (ret != 0) {
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+int discard_atom(txn_atom *atom)
+{
+	int ret;
+	struct list_head discard_set;
+
+	assert("intelfx-28", atom != NULL);
+
+	if (!reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		spin_unlock_atom(atom);
+		return 0;
+	}
+
+	if (list_empty(&atom->discard.delete_set) &&
+	    list_empty(&atom->discard.aux_delete_set)) {
+		spin_unlock_atom(atom);
+		return 0;
+	}
+
+	/* Take the delete sets from the atom in order to release atom spinlock. */
+	blocknr_list_init(&discard_set);
+	blocknr_list_merge(&atom->discard.delete_set, &discard_set);
+	blocknr_list_merge(&atom->discard.aux_delete_set, &discard_set);
+	spin_unlock_atom(atom);
+
+	/* Sort the discard list, joining adjacent and overlapping extents. */
+	blocknr_list_sort_and_join(&discard_set);
+
+	/* Perform actual dirty work. */
+	ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);
+	if (ret != 0) {
+		return ret;
+	}
+
+	/* Let's do this again for any new extents in the atom's discard set. */
+	return -E_REPEAT;
+}
+
+/* Make Linus happy.
+   Local variables:
+   c-indentation-style: "K&R"
+   mode-name: "LC"
+   c-basic-offset: 8
+   tab-width: 8
+   fill-column: 120
+   scroll-step: 1
+   End:
+*/
diff --git a/fs/reiser4/discard.h b/fs/reiser4/discard.h
new file mode 100644
index 0000000..ea46334
--- /dev/null
+++ b/fs/reiser4/discard.h
@@ -0,0 +1,31 @@
+/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
+ * reiser4/README */
+
+/* TRIM/discard interoperation subsystem for reiser4. */
+
+#if !defined(__FS_REISER4_DISCARD_H__)
+#define __FS_REISER4_DISCARD_H__
+
+#include "forward.h"
+#include "dformat.h"
+
+/**
+ * Issue discard requests for all block extents recorded in @atom's delete sets,
+ * if discard is enabled. In this case the delete sets are cleared.
+ *
+ * @atom should be locked on entry and is unlocked on exit.
+ */
+extern int discard_atom(txn_atom *atom);
+
+/* __FS_REISER4_DISCARD_H__ */
+#endif
+
+/* Make Linus happy.
+   Local variables:
+   c-indentation-style: "K&R"
+   mode-name: "LC"
+   c-basic-offset: 8
+   tab-width: 8
+   fill-column: 120
+   End:
+*/
diff --git a/fs/reiser4/init_super.c b/fs/reiser4/init_super.c
index 620a0f5..1ff8dad 100644
--- a/fs/reiser4/init_super.c
+++ b/fs/reiser4/init_super.c
@@ -494,6 +494,8 @@ int reiser4_init_super_data(struct super_block *super, char *opt_string)
 	PUSH_BIT_OPT("atomic_write", REISER4_ATOMIC_WRITE);
 	/* disable use of write barriers in the reiser4 log writer. */
 	PUSH_BIT_OPT("no_write_barrier", REISER4_NO_WRITE_BARRIER);
+	/* enable issuing of discard requests */
+	PUSH_BIT_OPT("discard", REISER4_DISCARD);
 
 	PUSH_OPT(p, opts,
 	{
diff --git a/fs/reiser4/plugin/space/bitmap.c b/fs/reiser4/plugin/space/bitmap.c
index d8ff542..03bc5e7 100644
--- a/fs/reiser4/plugin/space/bitmap.c
+++ b/fs/reiser4/plugin/space/bitmap.c
@@ -1460,8 +1460,7 @@ int reiser4_pre_commit_hook_bitmap(void)
 		}
 	}
 
-	blocknr_set_iterator(atom, &atom->delete_set, apply_dset_to_commit_bmap,
-			     &blocks_freed, 0);
+	atom_dset_deferred_apply(atom, apply_dset_to_commit_bmap, &blocks_freed, 0);
 
 	blocks_freed -= atom->nr_blocks_allocated;
 
diff --git a/fs/reiser4/super.h b/fs/reiser4/super.h
index 0c73845..895c3f3 100644
--- a/fs/reiser4/super.h
+++ b/fs/reiser4/super.h
@@ -51,7 +51,9 @@ typedef enum {
 	/* enforce atomicity during write(2) */
 	REISER4_ATOMIC_WRITE = 6,
 	/* don't use write barriers in the log writer code. */
-	REISER4_NO_WRITE_BARRIER = 7
+	REISER4_NO_WRITE_BARRIER = 7,
+	/* enable issuing of discard requests */
+	REISER4_DISCARD = 8
 } reiser4_fs_flag;
 
 /*
diff --git a/fs/reiser4/txnmgr.c b/fs/reiser4/txnmgr.c
index 4950179..f27d1dc 100644
--- a/fs/reiser4/txnmgr.c
+++ b/fs/reiser4/txnmgr.c
@@ -233,6 +233,7 @@ year old --- define all technical terms used.
 #include "vfs_ops.h"
 #include "inode.h"
 #include "flush.h"
+#include "discard.h"
 
 #include <asm/atomic.h>
 #include <linux/types.h>
@@ -404,9 +405,10 @@ static void atom_init(txn_atom * atom)
 	INIT_LIST_HEAD(&atom->atom_link);
 	INIT_LIST_HEAD(&atom->fwaitfor_list);
 	INIT_LIST_HEAD(&atom->fwaiting_list);
-	blocknr_set_init(&atom->delete_set);
 	blocknr_set_init(&atom->wandered_map);
 
+	atom_dset_init(atom);
+
 	init_atom_fq_parts(atom);
 }
 
@@ -798,9 +800,10 @@ static void atom_free(txn_atom * atom)
 	       (atom->stage == ASTAGE_INVALID || atom->stage == ASTAGE_DONE));
 	atom->stage = ASTAGE_FREE;
 
-	blocknr_set_destroy(&atom->delete_set);
 	blocknr_set_destroy(&atom->wandered_map);
 
+	atom_dset_destroy(atom);
+
 	assert("jmacd-16", atom_isclean(atom));
 
 	spin_unlock_atom(atom);
@@ -1086,6 +1089,17 @@ static int commit_current_atom(long *nr_submitted, txn_atom ** atom)
 	if (ret < 0)
 		reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
 
+	/* process and issue discard requests */
+	do {
+		spin_lock_atom(*atom);
+		ret = discard_atom(*atom);
+	} while (ret == -E_REPEAT);
+
+	if (ret) {
+		warning("intelfx-8", "discard atom failed (%ld)", ret);
+		ret = 0; /* the discard is optional, don't fail the commit */
+	}
+
 	/* The atom->ovrwr_nodes list is processed under commit mutex held
 	   because of bitmap nodes which are captured by special way in
 	   reiser4_pre_commit_hook_bitmap(), that way does not include
@@ -2938,9 +2952,11 @@ static void capture_fuse_into(txn_atom * small, txn_atom * large)
 	large->flags |= small->flags;
 
 	/* Merge blocknr sets. */
-	blocknr_set_merge(&small->delete_set, &large->delete_set);
 	blocknr_set_merge(&small->wandered_map, &large->wandered_map);
 
+	/* Merge delete sets. */
+	atom_dset_merge(small, large);
+
 	/* Merge allocated/deleted file counts */
 	large->nr_objects_deleted += small->nr_objects_deleted;
 	large->nr_objects_created += small->nr_objects_created;
@@ -3064,9 +3080,7 @@ reiser4_block_nr txnmgr_count_deleted_blocks(void)
 	list_for_each_entry(atom, &tmgr->atoms_list, atom_link) {
 		spin_lock_atom(atom);
 		if (atom_isopen(atom))
-			blocknr_set_iterator(
-				atom, &atom->delete_set,
-				count_deleted_blocks_actor, &result, 0);
+			atom_dset_deferred_apply(atom, count_deleted_blocks_actor, &result, 0);
 		spin_unlock_atom(atom);
 	}
 	spin_unlock_txnmgr(tmgr);
@@ -3074,6 +3088,105 @@ reiser4_block_nr txnmgr_count_deleted_blocks(void)
 	return result;
 }
 
+void atom_dset_init(txn_atom *atom)
+{
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		blocknr_list_init(&atom->discard.delete_set);
+		blocknr_list_init(&atom->discard.aux_delete_set);
+	} else {
+		blocknr_set_init(&atom->nodiscard.delete_set);
+	}
+}
+
+void atom_dset_destroy(txn_atom *atom)
+{
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		blocknr_list_destroy(&atom->discard.delete_set);
+		blocknr_list_destroy(&atom->discard.aux_delete_set);
+	} else {
+		blocknr_set_destroy(&atom->nodiscard.delete_set);
+	}
+}
+
+void atom_dset_merge(txn_atom *from, txn_atom *to)
+{
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		blocknr_list_merge(&from->discard.delete_set, &to->discard.delete_set);
+		blocknr_list_merge(&from->discard.aux_delete_set, &to->discard.aux_delete_set);
+	} else {
+		blocknr_set_merge(&from->nodiscard.delete_set, &to->nodiscard.delete_set);
+	}
+}
+
+int atom_dset_deferred_apply(txn_atom* atom,
+                             blocknr_set_actor_f actor,
+                             void *data,
+                             int delete)
+{
+	int ret;
+
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		ret = blocknr_list_iterator(atom,
+		                            &atom->discard.delete_set,
+		                            actor,
+		                            data,
+		                            delete);
+	} else {
+		ret = blocknr_set_iterator(atom,
+		                           &atom->nodiscard.delete_set,
+		                           actor,
+		                           data,
+		                           delete);
+	}
+
+	return ret;
+}
+
+extern int atom_dset_deferred_add_extent(txn_atom *atom,
+                                         void **new_entry,
+                                         const reiser4_block_nr *start,
+                                         const reiser4_block_nr *len)
+{
+	int ret;
+
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		ret = blocknr_list_add_extent(atom,
+		                              &atom->discard.delete_set,
+		                              (blocknr_list_entry**)new_entry,
+		                              start,
+		                              len);
+	} else {
+		ret = blocknr_set_add_extent(atom,
+		                             &atom->nodiscard.delete_set,
+		                             (blocknr_set_entry**)new_entry,
+		                             start,
+		                             len);
+	}
+
+	return ret;
+}
+
+extern int atom_dset_immediate_add_extent(txn_atom *atom,
+                                          void **new_entry,
+                                          const reiser4_block_nr *start,
+                                          const reiser4_block_nr *len)
+{
+	int ret;
+
+	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
+		ret = blocknr_list_add_extent(atom,
+		                             &atom->discard.aux_delete_set,
+		                             (blocknr_list_entry**)new_entry,
+		                             start,
+		                             len);
+	} else {
+		/* no-op */
+		ret = 0;
+	}
+
+	return ret;
+}
+
 /*
  * Local variables:
  * c-indentation-style: "K&R"
diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
index 18ca23d..02fc938 100644
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -245,9 +245,26 @@ struct txn_atom {
 	/* Start time. */
 	unsigned long start_time;
 
-	/* The atom's delete set. It collects block numbers of the nodes
-	   which were deleted during the transaction. */
-	struct list_head delete_set;
+	/* The atom's delete sets.
+	   "nodiscard" is a blocknr_set instance, used when discard is disabled.
+	   "discard" holds blocknr_list instances, used when discard is enabled. */
+	union {
+		struct {
+			/* The atom's delete set. It collects block numbers of the
+			   nodes which were deleted during the transaction. */
+			struct list_head delete_set;
+		} nodiscard;
+
+		struct {
+			/* The atom's delete set. It collects block numbers which were
+			   deallocated with BA_DEFER, i. e. of ordinary nodes. */
+			struct list_head delete_set;
+
+			/* The atom's auxiliary delete set. It collects block numbers
+			   which were deallocated without BA_DEFER, i. e. immediately. */
+			struct list_head aux_delete_set;
+		} discard;
+	};
 
 	/* The atom's wandered_block mapping. */
 	struct list_head wandered_map;
@@ -504,6 +521,27 @@ extern int blocknr_list_iterator(txn_atom *atom,
                                  void *data,
                                  int delete);
 
+/* These are wrappers for accessing and modifying atom's delete lists,
+   depending on whether discard is enabled or not.
+   If it is enabled, both deferred and immediate delete lists are maintained,
+   using (less memory-efficient) blocknr_lists for storage. Otherwise, only the
+   deferred delete list is maintained, stored in a blocknr_set. */
+extern void atom_dset_init(txn_atom *atom);
+extern void atom_dset_destroy(txn_atom *atom);
+extern void atom_dset_merge(txn_atom *from, txn_atom *to);
+extern int atom_dset_deferred_apply(txn_atom* atom,
+                                    blocknr_set_actor_f actor,
+                                    void *data,
+                                    int delete);
+extern int atom_dset_deferred_add_extent(txn_atom *atom,
+                                         void **new_entry,
+                                         const reiser4_block_nr *start,
+                                         const reiser4_block_nr *len);
+extern int atom_dset_immediate_add_extent(txn_atom *atom,
+                                          void **new_entry,
+                                          const reiser4_block_nr *start,
+                                          const reiser4_block_nr *len);
+
 /* flush code takes care about how to fuse flush queues */
 extern void flush_init_atom(txn_atom * atom);
 extern void flush_fuse_queues(txn_atom * large, txn_atom * small);
-- 
2.0.0



* [PATCHv6 4/5] reiser4: blocknr_list: use kmem_cache instead of kmalloc for allocating entries.
  2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
                   ` (2 preceding siblings ...)
  2014-06-22 10:48 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Ivan Shapovalov
@ 2014-06-22 10:48 ` Ivan Shapovalov
  2014-06-22 10:48 ` [PATCHv6 5/5] reiser4: blocknr_set: " Ivan Shapovalov
  4 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

---
 fs/reiser4/blocknrlist.c | 31 ++++++++++++++++++++++++++++---
 fs/reiser4/super_ops.c   |  7 +++++++
 fs/reiser4/txnmgr.h      |  2 ++
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/fs/reiser4/blocknrlist.c b/fs/reiser4/blocknrlist.c
index 2868771..39a4a9b 100644
--- a/fs/reiser4/blocknrlist.c
+++ b/fs/reiser4/blocknrlist.c
@@ -9,10 +9,13 @@
 #include "dformat.h"
 #include "txnmgr.h"
 #include "context.h"
+#include "super.h"
 
 #include <linux/slab.h>
 #include <linux/list_sort.h>
 
+static struct kmem_cache *blocknr_list_slab = NULL;
+
 /**
  * Represents an extent range [@start; @start + @len).
  */
@@ -36,8 +39,8 @@ static blocknr_list_entry *blocknr_list_entry_alloc(void)
 {
 	blocknr_list_entry *entry;
 
-	entry = (blocknr_list_entry *)kmalloc(sizeof(blocknr_list_entry),
-	                                      reiser4_ctx_gfp_mask_get());
+	entry = (blocknr_list_entry *)kmem_cache_alloc(blocknr_list_slab,
+	                                               reiser4_ctx_gfp_mask_get());
 	if (entry == NULL) {
 		return NULL;
 	}
@@ -51,7 +54,7 @@ static void blocknr_list_entry_free(blocknr_list_entry *entry)
 {
 	assert("intelfx-12", entry != NULL);
 
-	kfree(entry);
+	kmem_cache_free(blocknr_list_slab, entry);
 }
 
 /**
@@ -142,6 +145,28 @@ static int blocknr_list_entry_compare(void* priv UNUSED_ARG,
 	return 0;
 }
 
+int blocknr_list_init_static(void)
+{
+	assert("intelfx-54", blocknr_list_slab == NULL);
+
+	blocknr_list_slab = kmem_cache_create("blocknr_list_entry",
+	                                      sizeof(blocknr_list_entry),
+	                                      0,
+	                                      SLAB_HWCACHE_ALIGN |
+	                                      SLAB_RECLAIM_ACCOUNT,
+	                                      NULL);
+	if (blocknr_list_slab == NULL) {
+		return RETERR(-ENOMEM);
+	}
+
+	return 0;
+}
+
+void blocknr_list_done_static(void)
+{
+	destroy_reiser4_cache(&blocknr_list_slab);
+}
+
 void blocknr_list_init(struct list_head* blist)
 {
 	assert("intelfx-24", blist != NULL);
diff --git a/fs/reiser4/super_ops.c b/fs/reiser4/super_ops.c
index 81773b3..a63ceb5 100644
--- a/fs/reiser4/super_ops.c
+++ b/fs/reiser4/super_ops.c
@@ -678,11 +678,17 @@ static int __init init_reiser4(void)
 	if ((result = reiser4_init_d_cursor()) != 0)
 		goto failed_init_d_cursor;
 
+	/* initialize cache of blocknr list entries */
+	if ((result = blocknr_list_init_static()) != 0)
+		goto failed_init_blocknr_list;
+
 	if ((result = register_filesystem(&reiser4_fs_type)) == 0) {
 		reiser4_debugfs_root = debugfs_create_dir("reiser4", NULL);
 		return 0;
 	}
 
+	blocknr_list_done_static();
+ failed_init_blocknr_list:
 	reiser4_done_d_cursor();
  failed_init_d_cursor:
 	reiser4_done_file_fsdata();
@@ -718,6 +724,7 @@ static void __exit done_reiser4(void)
 	debugfs_remove(reiser4_debugfs_root);
 	result = unregister_filesystem(&reiser4_fs_type);
 	BUG_ON(result != 0);
+	blocknr_list_done_static();
 	reiser4_done_d_cursor();
 	reiser4_done_file_fsdata();
 	reiser4_done_dentry_fsdata();
diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
index 02fc938..be5d552 100644
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -503,6 +503,8 @@ extern int blocknr_set_iterator(txn_atom * atom, struct list_head * bset,
 				int delete);
 
 /* This is the block list interface (see blocknrlist.c) */
+extern int blocknr_list_init_static(void);
+extern void blocknr_list_done_static(void);
 extern void blocknr_list_init(struct list_head *blist);
 extern void blocknr_list_destroy(struct list_head *blist);
 extern void blocknr_list_merge(struct list_head *from, struct list_head *to);
-- 
2.0.0



* [PATCHv6 5/5] reiser4: blocknr_set: use kmem_cache instead of kmalloc for allocating entries.
  2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
                   ` (3 preceding siblings ...)
  2014-06-22 10:48 ` [PATCHv6 4/5] reiser4: blocknr_list: use kmem_cache instead of kmalloc for allocating entries Ivan Shapovalov
@ 2014-06-22 10:48 ` Ivan Shapovalov
  4 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-06-22 10:48 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin, Ivan Shapovalov

---
 fs/reiser4/blocknrset.c | 34 +++++++++++++++++++++++++++++++---
 fs/reiser4/super_ops.c  |  7 +++++++
 fs/reiser4/txnmgr.h     |  2 ++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/reiser4/blocknrset.c b/fs/reiser4/blocknrset.c
index bf57c17..2f18cbc 100644
--- a/fs/reiser4/blocknrset.c
+++ b/fs/reiser4/blocknrset.c
@@ -8,6 +8,7 @@ reiser4/README */
 #include "dformat.h"
 #include "txnmgr.h"
 #include "context.h"
+#include "super.h"
 
 #include <linux/slab.h>
 
@@ -42,6 +43,8 @@ reiser4/README */
 	sizeof(struct list_head)) /		\
 	sizeof(reiser4_block_nr))
 
+static struct kmem_cache *blocknr_set_slab = NULL;
+
 /* An entry of the blocknr_set */
 struct blocknr_set_entry {
 	unsigned nr_singles;
@@ -82,8 +85,8 @@ static blocknr_set_entry *bse_alloc(void)
 {
 	blocknr_set_entry *e;
 
-	if ((e = (blocknr_set_entry *) kmalloc(sizeof(blocknr_set_entry),
-					   reiser4_ctx_gfp_mask_get())) == NULL)
+	if ((e = (blocknr_set_entry *) kmem_cache_alloc(blocknr_set_slab,
+							reiser4_ctx_gfp_mask_get())) == NULL)
 		return NULL;
 
 	bse_init(e);
@@ -95,7 +98,7 @@ static blocknr_set_entry *bse_alloc(void)
 /* Audited by: green(2002.06.11) */
 static void bse_free(blocknr_set_entry * bse)
 {
-	kfree(bse);
+	kmem_cache_free(blocknr_set_slab, bse);
 }
 
 /* Add a block number to a blocknr_set_entry */
@@ -225,6 +228,31 @@ blocknr_set_add_pair(txn_atom * atom,
 	return blocknr_set_add(atom, bset, new_bsep, a, b);
 }
 
+/* Initialize slab cache of blocknr_set_entry objects. */
+int blocknr_set_init_static(void)
+{
+	assert("intelfx-55", blocknr_set_slab == NULL);
+
+	blocknr_set_slab = kmem_cache_create("blocknr_set_entry",
+					     sizeof(blocknr_set_entry),
+					     0,
+					     SLAB_HWCACHE_ALIGN |
+					     SLAB_RECLAIM_ACCOUNT,
+					     NULL);
+
+	if (blocknr_set_slab == NULL) {
+		return RETERR(-ENOMEM);
+	}
+
+	return 0;
+}
+
+/* Destroy slab cache of blocknr_set_entry objects. */
+void blocknr_set_done_static(void)
+{
+	destroy_reiser4_cache(&blocknr_set_slab);
+}
+
 /* Initialize a blocknr_set. */
 void blocknr_set_init(struct list_head *bset)
 {
diff --git a/fs/reiser4/super_ops.c b/fs/reiser4/super_ops.c
index a63ceb5..bcd7fd6 100644
--- a/fs/reiser4/super_ops.c
+++ b/fs/reiser4/super_ops.c
@@ -678,6 +678,10 @@ static int __init init_reiser4(void)
 	if ((result = reiser4_init_d_cursor()) != 0)
 		goto failed_init_d_cursor;
 
+	/* initialize cache of blocknr set entries */
+	if ((result = blocknr_set_init_static()) != 0)
+		goto failed_init_blocknr_set;
+
 	/* initialize cache of blocknr list entries */
 	if ((result = blocknr_list_init_static()) != 0)
 		goto failed_init_blocknr_list;
@@ -689,6 +693,8 @@ static int __init init_reiser4(void)
 
 	blocknr_list_done_static();
  failed_init_blocknr_list:
+	blocknr_set_done_static();
+ failed_init_blocknr_set:
 	reiser4_done_d_cursor();
  failed_init_d_cursor:
 	reiser4_done_file_fsdata();
@@ -725,6 +731,7 @@ static void __exit done_reiser4(void)
 	result = unregister_filesystem(&reiser4_fs_type);
 	BUG_ON(result != 0);
 	blocknr_list_done_static();
+	blocknr_set_done_static();
 	reiser4_done_d_cursor();
 	reiser4_done_file_fsdata();
 	reiser4_done_dentry_fsdata();
diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
index be5d552..02757a9 100644
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -482,6 +482,8 @@ int capture_bulk(jnode **, int count);
 
 /* See the comment on the function blocknrset.c:blocknr_set_add for the
    calling convention of these three routines. */
+extern int blocknr_set_init_static(void);
+extern void blocknr_set_done_static(void);
 extern void blocknr_set_init(struct list_head * bset);
 extern void blocknr_set_destroy(struct list_head * bset);
 extern void blocknr_set_merge(struct list_head * from, struct list_head * into);
-- 
2.0.0



* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-06-22 10:48 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Ivan Shapovalov
@ 2014-07-06 23:47   ` Edward Shishkin
  2014-07-09 12:40     ` Ivan Shapovalov
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-06 23:47 UTC (permalink / raw)
  To: Ivan Shapovalov, reiserfs-devel


On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
[...]
> +++ b/fs/reiser4/discard.c
> @@ -0,0 +1,216 @@
> +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
> + * reiser4/README */
> +
> +/* TRIM/discard interoperation subsystem for reiser4. */
> +
> +/*
> + * This subsystem is responsible for populating an atom's ->discard_set and
> + * (later) converting it into a series of discard calls to the kernel.
> + *
> + * The discard is an in-kernel interface for notifying the storage
> + * hardware about blocks that are being logically freed by the filesystem.
> + * This is done via calling the blkdev_issue_discard() function. There are
> + * restrictions on block ranges: they should constitute at least one erase unit
> + * in length and be correspondingly aligned. Otherwise a discard request will
> + * be ignored.
> + *
> + * The erase unit size is kept in struct queue_limits as discard_granularity.
> + * The offset from the partition start to the first erase unit is kept in
> + * struct queue_limits as discard_alignment.
> + *
> + * At atom level, we record numbers of all blocks that happen to be deallocated
> + * during the transaction. Then we read the generated set, filter out any blocks
> + * that have since been allocated again and issue discards for everything still
> + * valid. This is what discard.[ch] is here for.
> + *
> + * However, simply iterating through the recorded extents is not enough:


I still don't understand this explanation..


> + * - if a single extent is smaller than the erase unit, then this particular
> + *   extent won't be discarded even if it is surrounded by enough free blocks
> + *   to constitute a whole erase unit;


Why not to discard the aligned and padded extent, which coincides
with the whole erase unit?


> + * - we won't be able to merge small adjacent extents forming an extent long
> + *   enough to be discarded.


At this point we have already sorted and merged everything.
So may be it makes sense just to check the head and tail of every resulted
extent and discard the aligned and padded one?

Please, consider such possibility. Iterating over erase units in 
discard_extent()
looks suboptimal.

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-06 23:47   ` Edward Shishkin
@ 2014-07-09 12:40     ` Ivan Shapovalov
  2014-07-09 16:35       ` Edward Shishkin
  2014-07-13  1:33       ` Edward Shishkin
  0 siblings, 2 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-09 12:40 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 3043 bytes --]

On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
> 
> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
> [...]
> > +++ b/fs/reiser4/discard.c
> > @@ -0,0 +1,216 @@
> > +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
> > + * reiser4/README */
> > +
> > +/* TRIM/discard interoperation subsystem for reiser4. */
> > +
> > +/*
> > + * This subsystem is responsible for populating an atom's ->discard_set and
> > + * (later) converting it into a series of discard calls to the kernel.
> > + *
> > + * The discard is an in-kernel interface for notifying the storage
> > + * hardware about blocks that are being logically freed by the filesystem.
> > + * This is done via calling the blkdev_issue_discard() function. There are
> > + * restrictions on block ranges: they should constitute at least one erase unit
> > + * in length and be correspondingly aligned. Otherwise a discard request will
> > + * be ignored.
> > + *
> > + * The erase unit size is kept in struct queue_limits as discard_granularity.
> > + * The offset from the partition start to the first erase unit is kept in
> > + * struct queue_limits as discard_alignment.
> > + *
> > + * At atom level, we record numbers of all blocks that happen to be deallocated
> > + * during the transaction. Then we read the generated set, filter out any blocks
> > + * that have since been allocated again and issue discards for everything still
> > + * valid. This is what discard.[ch] is here for.
> > + *
> > + * However, simply iterating through the recorded extents is not enough:
> 
> 
> I still don't understand this explanation..

In short: we need to "screen" those extents to check whether they are still
free.
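
Roughly, per recorded extent (an illustrative sketch only, not the
actual discard_extent() code: screen_and_discard() and sec_per_blk,
i.e. sb->s_blocksize >> 9, are made-up names, and alignment to the
erase unit lattice is omitted):

static int screen_and_discard(struct block_device *bdev,
			      reiser4_block_nr start,
			      reiser4_block_nr len,
			      reiser4_block_nr d_uni /* erase unit */,
			      int sec_per_blk)
{
	reiser4_block_nr unit;
	int ret;

	/* walk the extent erase unit by erase unit; discard only the
	 * units whose blocks are all still free in the working bitmap */
	for (unit = start; unit + d_uni <= start + len; unit += d_uni) {
		if (!reiser4_check_blocks(&unit, &d_uni, 0))
			continue;	/* a block was reallocated */
		ret = blkdev_issue_discard(bdev, unit * sec_per_blk,
					   d_uni * sec_per_blk,
					   reiser4_ctx_gfp_mask_get(), 0);
		if (ret)
			return ret;
	}
	return 0;
}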

> 
> 
> > + * - if a single extent is smaller than the erase unit, then this particular
> > + *   extent won't be discarded even if it is surrounded by enough free blocks
> > + *   to constitute a whole erase unit;
> 
> 
> Why not to discard the aligned and padded extent, which coincides
> with the whole erase unit?

With a number of whole erase units.

> 
> 
> > + * - we won't be able to merge small adjacent extents forming an extent long
> > + *   enough to be discarded.
> 
> 
> At this point we have already sorted and merged everything.
> So may be it makes sense just to check the head and tail of every resulted
> extent and discard the aligned and padded one?

"Head and tail" is not sufficient. We may check the whole extent with a single
bitmap request, but such algorithm will be very inefficient: it will miss many
possibilities for discarding.

Consider many-block extent, from which one block has been allocated again.
In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
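
For illustration (made-up numbers): take a freed extent of blocks
[0; 1000) from which only block 500 has been allocated again. A single
bitmap check over the whole range fails, so all 999 still-free blocks
stay undiscarded, while per-unit checks would discard [0; 500) and
[501; 1000).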

> 
> Please, consider such possibility. Iterating over erase units in 
> discard_extent()
> looks suboptimal.

Yes, it's costly... but I don't see any other ways to do the task efficiently.

-- 
Ivan Shapovalov / intelfx /

> 
> Thanks,
> Edward.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-09 12:40     ` Ivan Shapovalov
@ 2014-07-09 16:35       ` Edward Shishkin
  2014-07-13  1:33       ` Edward Shishkin
  1 sibling, 0 replies; 34+ messages in thread
From: Edward Shishkin @ 2014-07-09 16:35 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
>> [...]
>>> +++ b/fs/reiser4/discard.c
>>> @@ -0,0 +1,216 @@
>>> +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
>>> + * reiser4/README */
>>> +
>>> +/* TRIM/discard interoperation subsystem for reiser4. */
>>> +
>>> +/*
>>> + * This subsystem is responsible for populating an atom's ->discard_set and
>>> + * (later) converting it into a series of discard calls to the kernel.
>>> + *
>>> + * The discard is an in-kernel interface for notifying the storage
>>> + * hardware about blocks that are being logically freed by the filesystem.
>>> + * This is done via calling the blkdev_issue_discard() function. There are
>>> + * restrictions on block ranges: they should constitute at least one erase unit
>>> + * in length and be correspondingly aligned. Otherwise a discard request will
>>> + * be ignored.
>>> + *
>>> + * The erase unit size is kept in struct queue_limits as discard_granularity.
>>> + * The offset from the partition start to the first erase unit is kept in
>>> + * struct queue_limits as discard_alignment.
>>> + *
>>> + * At atom level, we record numbers of all blocks that happen to be deallocated
>>> + * during the transaction. Then we read the generated set, filter out any blocks
>>> + * that have since been allocated again and issue discards for everything still
>>> + * valid. This is what discard.[ch] is here for.
>>> + *
>>> + * However, simply iterating through the recorded extents is not enough:
>>
>> I still don't understand this explanation..
> In short: we need to "screen" those extents to check whether they are still
> free.
>
>>
>>> + * - if a single extent is smaller than the erase unit, then this particular
>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
>>> + *   to constitute a whole erase unit;
>>
>> Why not to discard the aligned and padded extent, which coincides
>> with the whole erase unit?
> With a number of whole erase units.
>
>>
>>> + * - we won't be able to merge small adjacent extents forming an extent long
>>> + *   enough to be discarded.
>> "I don't believe" (c) K. Stanislavski
>>
>> At this point we have already sorted and merged everything.
>> So may be it makes sense just to check the head and tail of every resulted
>> extent and discard the aligned and padded one?
> "Head and tail" is not sufficient. We may check the whole extent with a single
> bitmap request, but such algorithm will be very inefficient: it will miss many
> possibilities for discarding.
>
> Consider many-block extent, from which one block has been allocated again.
> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
>
>> Please, consider such possibility. Iterating over erase units in
>> discard_extent()
>> looks suboptimal.
> Yes, it's costly... but I don't see any other ways to do the task efficiently.


"I don't believe!" (c) K. Stanislavski.

OK, I'll show my version of the discard-list-extents a bit later..
Please, consider implementation of the FITRIM ioctl based on the
current discard functionality, if possible.

Thanks!
Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-09 12:40     ` Ivan Shapovalov
  2014-07-09 16:35       ` Edward Shishkin
@ 2014-07-13  1:33       ` Edward Shishkin
  2014-07-13 12:47         ` Ivan Shapovalov
  1 sibling, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-13  1:33 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]

On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
>> [...]
[...]
>>
>>> + * - if a single extent is smaller than the erase unit, then this particular
>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
>>> + *   to constitute a whole erase unit;
>>
>> Why not to discard the aligned and padded extent, which coincides
>> with the whole erase unit?
> With a number of whole erase units.
>
>>
>>> + * - we won't be able to merge small adjacent extents forming an extent long
>>> + *   enough to be discarded.
>>
>> At this point we have already sorted and merged everything.
>> So may be it makes sense just to check the head and tail of every resulted
>> extent and discard the aligned and padded one?
> "Head and tail" is not sufficient. We may check the whole extent with a single
> bitmap request, but such algorithm will be very inefficient: it will miss many
> possibilities for discarding.
>
> Consider many-block extent, from which one block has been allocated again.
> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
>
>> Please, consider such possibility. Iterating over erase units in
>> discard_extent()
>> looks suboptimal.
> Yes, it's costly... but I don't see any other ways to do the task efficiently.


How about this function?  (the attached patch is against v6-series).
Total number of bitmap checks is in [N+1, 2N], where N is the number of
extents in the list. At the same time we don't leave any non-discarded
"garbage"...

Edward.
P.S. I tested it, but not enough.

[-- Attachment #2: reiser4-iterate-extents-discard.patch --]
[-- Type: text/x-patch, Size: 8197 bytes --]

---
 fs/reiser4/blocknrlist.c |   10 +
 fs/reiser4/discard.c     |  249 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/reiser4/txnmgr.h      |    2 
 3 files changed, 259 insertions(+), 2 deletions(-)

--- a/fs/reiser4/discard.c
+++ b/fs/reiser4/discard.c
@@ -167,6 +167,249 @@ static int discard_extent(txn_atom *atom
 	return 0;
 }
 
+/*
+ * Return size of head padding of an extent on a lattice
+ * with step @ulen and offset @uoff.
+ * @start - the start offset of the extent.
+ */
+static int extent_get_headp(reiser4_block_nr start, int uoff, int ulen)
+{
+	__u64 headp;
+	headp = ulen + start - uoff;
+	headp = do_div(headp, ulen);
+	return headp;
+}
+
+/*
+ * Return size of tail padding of an extent on a lattice
+ * with step @ulen and offset @uoff.
+ * @end - the end offset of the extent.
+ */
+static int extent_get_tailp(reiser4_block_nr end, int uoff, int ulen)
+{
+	__u64 tailp;
+	tailp = ulen + end - uoff;
+	tailp = do_div(tailp, ulen);
+	if (tailp)
+		tailp = ulen - tailp;
+	return tailp;
+}
+
+static struct list_head *get_next_at(struct list_head *pos,
+				     struct list_head *head)
+{
+	assert("edward-1631", pos != NULL);
+	assert("edward-1632", head != NULL);
+	assert("edward-1633", pos != head);
+
+	return pos->next == head ? NULL : pos->next;
+}
+
+static inline int check_free_blocks(const reiser4_block_nr start,
+				    const reiser4_block_nr len)
+{
+	return reiser4_check_blocks(&start, &len, 0);
+}
+
+/* Make sure that extents are sorted and merged */
+#if REISER4_DEBUG
+static inline void check_blocknr_list_at(struct list_head *pos,
+					 struct list_head *head)
+{
+	struct list_head *next;
+
+	next = get_next_at(pos, head);
+	if (next == NULL)
+		return;
+	if (blocknr_list_entry_start(next) <=
+	    blocknr_list_entry_start(pos) + blocknr_list_entry_len(pos))
+		warning("edward-1634",
+			"discard bad pair of extents: (%llu,%llu), (%llu,%llu)",
+			blocknr_list_entry_start(pos),
+			blocknr_list_entry_len(pos),
+			blocknr_list_entry_start(next),
+			blocknr_list_entry_len(next));
+}
+#else
+#define check_blocknr_list_at(pos, head) noop
+#endif
+
+/*
+ * discard_sorted_extents() - iterate through the list of
+ * sorted and merged extents and check head and tail paddings
+ * of each extent in the working space map. Try to "glue" the
+ * nearby extents. Discard the (glued) extents with padded (or
+ * cut) head and tail.
+ *
+ * Pre-conditions: @head points to the list of sorted and
+ * merged extents.
+ *
+ * Local variables:
+ *
+ * d_uni - discard unit size (in blocks);
+ * d_off - discard alignment (in blocks);
+ *
+ * start - offset of the first block of the extent;
+ * len - length of the extent;
+ * end - offset of the first block beyond extent;
+ *
+ * headp - size of head padding of the extent;
+ * tailp - size of tail padding of the extent;
+ *
+ * astart - actual start to discard (offset of the head padding);
+ * alen - actual length to discard (length of glued aligned and padded extents).
+ *
+ * Terminology in the comments:
+ *
+ * head - a part of extent at the beginning;
+ * tail - a part of extent at the end.
+ */
+
+static int discard_sorted_extents(struct list_head *head)
+{
+	int ret;
+	struct super_block *sb = reiser4_get_current_sb();
+	struct queue_limits *limits = &bdev_get_queue(sb->s_bdev)->limits;
+
+	int d_uni;
+	int d_off;
+	struct list_head *pos;
+
+	d_uni = limits->discard_granularity / sb->s_blocksize;
+	if (d_uni == 0)
+		d_uni = 1;
+	d_off = (bdev_discard_alignment(sb->s_bdev) / sb->s_blocksize) % d_uni;
+
+	for (pos = head->next; pos != head; pos = pos->next) {
+		int headp;
+		int tailp;
+		int headp_is_known_dirty = 0;
+		reiser4_block_nr start;
+		reiser4_block_nr len;
+		reiser4_block_nr end;
+		reiser4_block_nr astart; __s64 alen;
+
+		check_blocknr_list_at(pos, head);
+
+		start = blocknr_list_entry_start(pos);
+		len = blocknr_list_entry_len(pos);
+		/*
+		 * Step I. Cut or pad the head of extent
+		 *
+		 * This extent wasn't glued
+		 */
+		headp = extent_get_headp(start, d_off, d_uni);
+
+		if (headp && !headp_is_known_dirty &&
+		    check_free_blocks(start - headp, headp)) {
+			/*
+			 * head padding is clean,
+			 * pad the head
+			 */
+			astart = start - headp;
+			alen = len + headp;
+		}
+		else {
+			/*
+			 * head padding is empty or dirty,
+			 * cut the head
+			 */
+			headp_is_known_dirty = 0;
+			astart = start + (d_uni - headp);
+			alen = len - (d_uni - headp);
+		}
+		/*
+		 * Step II. Try to glue all nearby extents to the tail
+		 * Cut or pad the tail of the last extent.
+		 */
+		end = start + len;
+		tailp = extent_get_tailp(end, d_off, d_uni);
+
+		/*
+		 * This "gluing" loop updates @pos, @end, @tailp, @alen
+		 */
+		while (1) {
+			struct list_head *next;
+
+			next = get_next_at(pos, head);
+			check_blocknr_list_at(next, head);
+
+			if (next && (end + tailp >= blocknr_list_entry_start(next))) {
+				/*
+				 * try to glue the next extent
+				 */
+				reiser4_block_nr next_start;
+				reiser4_block_nr next_len;
+
+				next_start = blocknr_list_entry_start(next);
+				next_len = blocknr_list_entry_len(next);
+
+				if (check_free_blocks(end, next_start - end)) {
+					/*
+					 * jump to the glued extent
+					 */
+					if (end + tailp < next_start + next_len) {
+						/*
+						 * the glued extent doesn't
+						 * fit into the tail padding,
+						 * so update the last one
+						 */
+						tailp = extent_get_tailp(next_start + next_len,
+									 d_off, d_uni);
+						alen += (next_start + next_len - end);
+					}
+					pos = next;
+					end = next_start + next_len;
+					/*
+					 * try to glue more extents
+					 */
+					continue;
+				} else {
+					/*
+					 * gluing failed, cut the tail
+					 */
+					if (end + tailp > next_start)
+						headp_is_known_dirty = 1;
+
+					alen -= (d_uni - tailp);
+					break;
+				}
+			} else {
+				/*
+				 * nothing to glue:
+				 * this is the last extent, or the next
+				 * extent is too far. So just check the
+				 * rest of tail padding
+				 */
+				if (tailp && check_free_blocks(end, tailp))
+					/*
+					 * tail padding is clean,
+					 * pad the tail
+					 */
+					alen += tailp;
+				else
+					/*
+					 * empty or dirty tail padding,
+					 * cut the tail
+					 */
+					alen -= (d_uni - tailp);
+				break;
+			}
+		}
+		/*
+		 * Step III. Discard the result
+		 */
+		if (alen > 0) {
+			ret = __discard_extent(sb->s_bdev,
+					       astart * (sb->s_blocksize >> 9),
+					       alen * (sb->s_blocksize >> 9));
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 int discard_atom(txn_atom *atom)
 {
 	int ret;
@@ -195,11 +438,13 @@ int discard_atom(txn_atom *atom)
 	blocknr_list_sort_and_join(&discard_set);
 
 	/* Perform actual dirty work. */
-	ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);
+
+	//ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);
+
+	ret = discard_sorted_extents(&discard_set);
 	if (ret != 0) {
 		return ret;
 	}
-
 	/* Let's do this again for any new extents in the atom's discard set. */
 	return -E_REPEAT;
 }
--- a/fs/reiser4/blocknrlist.c
+++ b/fs/reiser4/blocknrlist.c
@@ -26,6 +26,16 @@ struct blocknr_list_entry {
 
 #define blocknr_list_entry(ptr) list_entry(ptr, blocknr_list_entry, link)
 
+reiser4_block_nr blocknr_list_entry_start(struct list_head *ptr)
+{
+	return blocknr_list_entry(ptr)->start;
+}
+
+reiser4_block_nr blocknr_list_entry_len(struct list_head *ptr)
+{
+	return blocknr_list_entry(ptr)->len;
+}
+
 static void blocknr_list_entry_init(blocknr_list_entry *entry)
 {
 	assert("intelfx-11", entry != NULL);
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -507,6 +507,8 @@ extern int blocknr_set_iterator(txn_atom
 /* This is the block list interface (see blocknrlist.c) */
 extern int blocknr_list_init_static(void);
 extern void blocknr_list_done_static(void);
+extern reiser4_block_nr blocknr_list_entry_start(struct list_head *blist);
+extern reiser4_block_nr blocknr_list_entry_len(struct list_head *blist);
 extern void blocknr_list_init(struct list_head *blist);
 extern void blocknr_list_destroy(struct list_head *blist);
 extern void blocknr_list_merge(struct list_head *from, struct list_head *to);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-13  1:33       ` Edward Shishkin
@ 2014-07-13 12:47         ` Ivan Shapovalov
  2014-07-13 19:04           ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-13 12:47 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2249 bytes --]

On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
> > On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
> >> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
> >> [...]
> [...]
> >>
> >>> + * - if a single extent is smaller than the erase unit, then this particular
> >>> + *   extent won't be discarded even if it is surrounded by enough free blocks
> >>> + *   to constitute a whole erase unit;
> >>
> >> Why not to discard the aligned and padded extent, which coincides
> >> with the whole erase unit?
> > With a number of whole erase units.
> >
> >>
> >>> + * - we won't be able to merge small adjacent extents forming an extent long
> >>> + *   enough to be discarded.
> >>
> >> At this point we have already sorted and merged everything.
> >> So may be it makes sense just to check the head and tail of every resulted
> >> extent and discard the aligned and padded one?
> > "Head and tail" is not sufficient. We may check the whole extent with a single
> > bitmap request, but such algorithm will be very inefficient: it will miss many
> > possibilities for discarding.
> >
> > Consider many-block extent, from which one block has been allocated again.
> > In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
> >
> >> Please, consider such possibility. Iterating over erase units in
> >> discard_extent()
> >> looks suboptimal.
> > Yes, it's costly... but I don't see any other ways to do the task efficiently.
> 
> 
> How about this function?  (the attached patch is against v6-series).
> Total number of bitmap checks is in [N+1, 2N], where N is number of
> extents  in the list. At the same time we don't leave any non-discarded
> "garbage"...
> 
> Edward.
> P.S. I tested it, but not enough.

Hm. I'm probably a dumbass, but...

I don't see where the [start; start+len) region is checked for being free.

Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp
is empty? It seems that it would reduce extent by one whole erase unit
without any justification.
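
For example (made-up numbers): with d_uni = 8 and a 32-block extent
starting exactly on an erase unit boundary, headp == 0, so the else
branch sets astart = start + 8 and alen = 32 - 8, silently dropping the
first, perfectly aligned erase unit.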

But I like the idea of gluing padded extents together if possible...

Thanks,
-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-13 12:47         ` Ivan Shapovalov
@ 2014-07-13 19:04           ` Edward Shishkin
  2014-07-13 19:18             ` Ivan Shapovalov
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-13 19:04 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]


On 07/13/2014 02:47 PM, Ivan Shapovalov wrote:
> On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
>> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
>>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
>>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
>>>> [...]
>> [...]
>>>>> + * - if a single extent is smaller than the erase unit, then this particular
>>>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
>>>>> + *   to constitute a whole erase unit;
>>>> Why not to discard the aligned and padded extent, which coincides
>>>> with the whole erase unit?
>>> With a number of whole erase units.
>>>
>>>>> + * - we won't be able to merge small adjacent extents forming an extent long
>>>>> + *   enough to be discarded.
>>>> At this point we have already sorted and merged everything.
>>>> So may be it makes sense just to check the head and tail of every resulted
>>>> extent and discard the aligned and padded one?
>>> "Head and tail" is not sufficient. We may check the whole extent with a single
>>> bitmap request, but such algorithm will be very inefficient: it will miss many
>>> possibilities for discarding.
>>>
>>> Consider many-block extent, from which one block has been allocated again.
>>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
>>>
>>>> Please, consider such possibility. Iterating over erase units in
>>>> discard_extent()
>>>> looks suboptimal.
>>> Yes, it's costly... but I don't see any other ways to do the task efficiently.
>>
>> How about this function?  (the attached patch is against v6-series).
>> Total number of bitmap checks is in [N+1, 2N], where N is number of
>> extents  in the list. At the same time we don't leave any non-discarded
>> "garbage"...
>>
>> Edward.
>> P.S. I tested it, but not enough.
> Hm. I'm probably a dumbass, but...
>
> I don't see where the [start; start+len) region is checked for being free.


check_free_blocks() is called for this purpose.

Firstly when checking head padding. Secondly in the gluing loop
(to glue 2 extents in particular means to make sure that region
between them is free). Finally we check if the tail padding is free.


>
> Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp
> is empty? It seems that it would reduce extent by one whole erase unit
> without any justification.


Yes, this is definitely a leak of not discarded erase units.
Is the attached version better?

Edward.

[-- Attachment #2: discard.c --]
[-- Type: text/x-csrc, Size: 9575 bytes --]

/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
 * reiser4/README */

/* TRIM/discard interoperation subsystem for reiser4. */

/*
 * This subsystem is responsible for populating an atom's "discard set" and
 * (later) converting it into a series of discard calls to the kernel.
 *
 * The discard is an in-kernel interface for notifying the storage
 * hardware about blocks that are being logically freed by the filesystem.
 * This is done via calling the blkdev_issue_discard() function. There are
 * restrictions on block ranges: they should constitute at least one erase unit
 * in length and be correspondingly aligned. Otherwise a discard request will
 * be ignored.
 *
 * The erase unit size is kept in struct queue_limits as discard_granularity.
 * The offset from the partition start to the first erase unit is kept in
 * struct queue_limits as discard_alignment.
 *
 * At atom level, we record numbers of all blocks that happen to be deallocated
 * during the transaction. Then we read the generated set, filter out any blocks
 * that have since been allocated again and issue discards for everything still
 * valid. This is what discard.[ch] is here for.
 *
 * MECHANISM:
 *
 * During the transaction deallocated extents are recorded in atom's delete
 * sets. There are two delete sets, because data in one of them (delete_set) is
 * also used by other parts of reiser4. The second delete set (aux_delete_set)
 * complements the first one and is maintained only when discard is enabled.
 *
 * Together these sets constitute "the discard set" -- blocks that have to be
 * considered for discarding. On atom commit we will generate a minimal
 * superset of the discard set, comprised of whole erase units.
 *
 * So, at commit time we:
 * - merge the delete sets;
 * - sort extents of the resulted discard set;
 * - join adjacent extents of the discard set;
 * - discard extents of the discard set.
 */

#include "discard.h"
#include "context.h"
#include "debug.h"
#include "txnmgr.h"
#include "super.h"

#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/blkdev.h>

static int discard_extent(struct block_device *bdev, sector_t start,
			  sector_t len)
{
	assert("intelfx-21", bdev != NULL);

	return blkdev_issue_discard(bdev, start,
				    len, reiser4_ctx_gfp_mask_get(), 0);
}

/*
 * Return size of head padding of an extent on a lattice
 * with step @ulen and offset @uoff.
 * @start - the start offset of the extent.
 */
static int extent_get_headp(reiser4_block_nr start, int uoff, int ulen)
{
	__u64 headp;
	headp = ulen + start - uoff;
	headp = do_div(headp, ulen);
	return headp;
}

/*
 * Return size of tail padding of an extent on a lattice
 * with step @ulen and offset @uoff.
 * @end - the end offset of the extent.
 */
static int extent_get_tailp(reiser4_block_nr end, int uoff, int ulen)
{
	__u64 tailp;
	tailp = ulen + end - uoff;
	tailp = do_div(tailp, ulen);
	if (tailp)
		tailp = ulen - tailp;
	return tailp;
}

static inline struct list_head *get_next_at(struct list_head *pos,
					    struct list_head *head)
{
	assert("edward-1631", pos != NULL);
	assert("edward-1632", head != NULL);
	assert("edward-1633", pos != head);

	return pos->next == head ? NULL : pos->next;
}

static inline int check_free_blocks(const reiser4_block_nr start,
				    const reiser4_block_nr len)
{
	return reiser4_check_blocks(&start, &len, 0);
}

/* Make sure that extents are sorted and merged */
#if REISER4_DEBUG
static inline void check_blocknr_list_at(struct list_head *pos,
					 struct list_head *head)
{
	struct list_head *next;

	next = get_next_at(pos, head);
	if (next == NULL)
		return;
	if (blocknr_list_entry_start(next) <=
	    blocknr_list_entry_start(pos) + blocknr_list_entry_len(pos))
		warning("edward-1634",
			"discard bad pair of extents: (%llu,%llu), (%llu,%llu)",
			blocknr_list_entry_start(pos),
			blocknr_list_entry_len(pos),
			blocknr_list_entry_start(next),
			blocknr_list_entry_len(next));
}
#else
#define check_blocknr_list_at(pos, head) noop
#endif

/*
 * discard_sorted_merged_extents() - scan the list of sorted and
 * merged extents and check head and tail paddings of each
 * extent in the working space map. Try to "glue" the nearby
 * extents. Discard the (glued) extents with padded (or cut)
 * head and tail.
 *
 * Pre-conditions: @head points to the list of sorted and
 * merged extents.
 *
 * Local variables:
 *
 * d_uni - discard unit size (in blocks);
 * d_off - discard alignment (in blocks);
 *
 * start - offset of the first block of the extent;
 * len - length of the extent;
 * end - offset of the first block beyond extent;
 *
 * headp - size of head padding of the extent;
 * tailp - size of tail padding of the extent;
 *
 * astart - actual start to discard (offset of the head padding);
 * alen - actual length to discard (length of glued aligned and padded extents).
 *
 * Terminology in the comments:
 *
 * head - a part of extent at the beginning;
 * tail - a part of extent at the end.
 */

static int discard_sorted_merged_extents(struct list_head *head)
{
	int ret;
	struct super_block *sb = reiser4_get_current_sb();
	struct queue_limits *limits = &bdev_get_queue(sb->s_bdev)->limits;

	int d_uni;
	int d_off;
	struct list_head *pos;

	d_uni = limits->discard_granularity / sb->s_blocksize;
	if (d_uni == 0)
		d_uni = 1;
	d_off = (bdev_discard_alignment(sb->s_bdev) / sb->s_blocksize) % d_uni;

	for (pos = head->next; pos != head; pos = pos->next) {
		int headp;
		int tailp;
		int headp_is_known_dirty = 0;
		reiser4_block_nr start;
		reiser4_block_nr len;
		reiser4_block_nr end;
		reiser4_block_nr astart; __s64 alen;

		check_blocknr_list_at(pos, head);

		start = blocknr_list_entry_start(pos);
		len = blocknr_list_entry_len(pos);
		/*
		 * Step I. Cut or pad the head of extent
		 *
		 * This extent wasn't glued
		 */
		headp = extent_get_headp(start, d_off, d_uni);

		if (headp && !headp_is_known_dirty &&
		    check_free_blocks(start - headp, headp)) {
			/*
			 * head padding is clean,
			 * pad the head
			 */
			astart = start - headp;
			alen = len + headp;
		}
		else if (headp == 0) {
			/*
			 * empty head padding
			 */
			assert("edward-1635", headp_is_known_dirty == 0);
			astart = start;
			alen = len;
		} else {
			/*
			 * head padding is not empty and dirty,
			 * cut the head
			 */
			headp_is_known_dirty = 0;
			astart = start + (d_uni - headp);
			alen = len - (d_uni - headp);
		}
		/*
		 * Step II. Try to glue all nearby extents to the tail
		 *          Cut or pad the tail of the last extent.
		 */
		end = start + len;
		tailp = extent_get_tailp(end, d_off, d_uni);

		/*
		 * This "gluing" loop updates @pos, @end, @tailp, @alen
		 */
		while (1) {
			struct list_head *next;

			next = get_next_at(pos, head);
			check_blocknr_list_at(next, head);

			if (next && (end + tailp >= blocknr_list_entry_start(next))) {
				/*
				 * try to glue the next extent
				 */
				reiser4_block_nr next_start;
				reiser4_block_nr next_len;

				next_start = blocknr_list_entry_start(next);
				next_len = blocknr_list_entry_len(next);

				if (check_free_blocks(end, next_start - end)) {
					/*
					 * jump to the glued extent
					 */
					if (end + tailp < next_start + next_len) {
						/*
						 * the glued extent doesn't
						 * fit into the tail padding,
						 * so update the last one
						 */
						tailp = extent_get_tailp(next_start + next_len,
									 d_off, d_uni);
						alen += (next_start + next_len - end);
					}
					pos = next;
					end = next_start + next_len;
					/*
					 * try to glue more extents
					 */
					continue;
				} else {
					/*
					 * gluing failed, cut the tail
					 */
					if (end + tailp > next_start)
						headp_is_known_dirty = 1;

					alen -= (d_uni - tailp);
					break;
				}
			} else {
				/*
				 * nothing to glue:
				 * this is the last extent, or the next
				 * extent is too far. So just check the
				 * rest of tail padding
				 */
				if (tailp && check_free_blocks(end, tailp))
					/*
					 * tail padding is clean,
					 * pad the tail
					 */
					alen += tailp;
				else
					/*
					 * empty or dirty tail padding,
					 * cut the tail
					 */
					alen -= (d_uni - tailp);
				break;
			}
		}
		/*
		 * Step III. Discard the result
		 */
		if (alen > 0) {
			ret = discard_extent(sb->s_bdev,
					     astart * (sb->s_blocksize >> 9),
					     alen * (sb->s_blocksize >> 9));
			if (ret)
				return ret;
		}
	}
	return 0;
}

int discard_atom(txn_atom *atom)
{
	int ret;
	struct list_head discard_set;

	if (!reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
		spin_unlock_atom(atom);
		return 0;
	}

	assert("intelfx-28", atom != NULL);

	if (list_empty(&atom->discard.delete_set) &&
	    list_empty(&atom->discard.aux_delete_set)) {
		spin_unlock_atom(atom);
		return 0;
	}

	/* Take the delete sets from the atom in order to release atom spinlock. */
	blocknr_list_init(&discard_set);
	blocknr_list_merge(&atom->discard.delete_set, &discard_set);
	blocknr_list_merge(&atom->discard.aux_delete_set, &discard_set);
	spin_unlock_atom(atom);

	/* Sort the discard list, joining adjacent extents. */
	blocknr_list_sort_and_join(&discard_set);

	/*
	 * Perform actual dirty work
	 */
	ret = discard_sorted_merged_extents(&discard_set);
	if (ret != 0)
		return ret;
	/*
	 * Let's do this again for any new extents in the atom's discard set
	 */
	return -E_REPEAT;
}

/* Make Linus happy.
   Local variables:
   c-indentation-style: "K&R"
   mode-name: "LC"
   c-basic-offset: 8
   tab-width: 8
   fill-column: 120
   scroll-step: 1
   End:
*/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-13 19:04           ` Edward Shishkin
@ 2014-07-13 19:18             ` Ivan Shapovalov
  2014-07-14  1:56               ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-13 19:18 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 3625 bytes --]

On Sunday 13 July 2014 at 21:04:11, Edward Shishkin wrote:	
> 
> On 07/13/2014 02:47 PM, Ivan Shapovalov wrote:
> > On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
> >> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
> >>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
> >>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
> >>>> [...]
> >> [...]
> >>>>> + * - if a single extent is smaller than the erase unit, then this particular
> >>>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
> >>>>> + *   to constitute a whole erase unit;
> >>>> Why not to discard the aligned and padded extent, which coincides
> >>>> with the whole erase unit?
> >>> With a number of whole erase units.
> >>>
> >>>>> + * - we won't be able to merge small adjacent extents forming an extent long
> >>>>> + *   enough to be discarded.
> >>>> At this point we have already sorted and merged everything.
> >>>> So may be it makes sense just to check the head and tail of every resulted
> >>>> extent and discard the aligned and padded one?
> >>> "Head and tail" is not sufficient. We may check the whole extent with a single
> >>> bitmap request, but such algorithm will be very inefficient: it will miss many
> >>> possibilities for discarding.
> >>>
> >>> Consider many-block extent, from which one block has been allocated again.
> >>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
> >>>
> >>>> Please, consider such possibility. Iterating over erase units in
> >>>> discard_extent()
> >>>> looks suboptimal.
> >>> Yes, it's costly... but I don't see any other ways to do the task efficiently.
> >>
> >> How about this function?  (the attached patch is against v6-series).
> >> Total number of bitmap checks is in [N+1, 2N], where N is number of
> >> extents  in the list. At the same time we don't leave any non-discarded
> >> "garbage"...
> >>
> >> Edward.
> >> P.S. I tested it, but not enough.
> > Hm. I'm probably a dumbass, but...
> >
> > I don't see where the [start; start+len) region is checked for being free.
> 
> 
> check_free_blocks() is called for this purpose.
> 
> Firstly when checking head padding. Secondly in the gluing loop
> (to glue 2 extents in particular means to make sure that region
> between them is free). Finally we check if the tail padding is free.

There are three calls to check_free_blocks():

line 197, check_free_blocks(start - headp, headp)
This checks first extent's head padding.

line 247, check_free_blocks(end, next_start - end)
This checks blocks between end of first extent and start of second extent
(including possible tail padding of first extent and possible head padding
of second extent).

line 284, check_free_blocks(end, tailp)
This checks first extent's tail padding.

Nothing seems to call at least check_free_blocks(begin, end)...

BTW, what's the purpose of headp_is_known_dirty? All uses of this variable
can be compile-time resolved to 0. It is never read after assigned 1.
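
(It is declared, and zeroed, inside the body of the per-extent for
loop; its only read is in the Step I condition, which runs before the
gluing loop where the assignment to 1 sits, and that assignment is
immediately followed by a break, so the next iteration starts from 0
again.)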

> 
> 
> >
> > Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp
> > is empty? It seems that it would reduce extent by one whole erase unit
> > without any justification.
> 
> 
> Yes, this is definitely a leak of not discarded erase units.
> Is the attached version better?

Yes... and lines 290-295: it seems that tail padding handling has the same
problem. If tailp == 0 (i.e. the division remainder is 0, so that end is
already aligned), alen is reduced by d_uni.

> 
> Edward.

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-13 19:18             ` Ivan Shapovalov
@ 2014-07-14  1:56               ` Edward Shishkin
  2014-07-15 11:42                 ` Ivan Shapovalov
  2014-07-19 21:20                 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Edward Shishkin
  0 siblings, 2 replies; 34+ messages in thread
From: Edward Shishkin @ 2014-07-14  1:56 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
> On Sunday 13 July 2014 at 21:04:11, Edward Shishkin wrote:	
>> On 07/13/2014 02:47 PM, Ivan Shapovalov wrote:
>>> On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
>>>> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
>>>>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
>>>>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
>>>>>> [...]
>>>> [...]
>>>>>>> + * - if a single extent is smaller than the erase unit, then this particular
>>>>>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
>>>>>>> + *   to constitute a whole erase unit;
>>>>>> Why not to discard the aligned and padded extent, which coincides
>>>>>> with the whole erase unit?
>>>>> With a number of whole erase units.
>>>>>
>>>>>>> + * - we won't be able to merge small adjacent extents forming an extent long
>>>>>>> + *   enough to be discarded.
>>>>>> At this point we have already sorted and merged everything.
>>>>>> So may be it makes sense just to check the head and tail of every resulted
>>>>>> extent and discard the aligned and padded one?
>>>>> "Head and tail" is not sufficient. We may check the whole extent with a single
>>>>> bitmap request, but such algorithm will be very inefficient: it will miss many
>>>>> possibilities for discarding.
>>>>>
>>>>> Consider many-block extent, from which one block has been allocated again.
>>>>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
>>>>>
>>>>>> Please, consider such possibility. Iterating over erase units in
>>>>>> discard_extent()
>>>>>> looks suboptimal.
>>>>> Yes, it's costly... but I don't see any other ways to do the task efficiently.
>>>> How about this function?  (the attached patch is against v6-series).
>>>> Total number of bitmap checks is in [N+1, 2N], where N is number of
>>>> extents  in the list. At the same time we don't leave any non-discarded
>>>> "garbage"...
>>>>
>>>> Edward.
>>>> P.S. I tested it, but not enough.
>>> Hm. I'm probably a dumbass, but...
>>>
>>> I don't see where the [start; start+len) region is checked for being free.
>>
>> check_free_blocks() is called for this purpose.
>>
>> Firstly when checking head padding. Secondly in the gluing loop
>> (to glue 2 extents in particular means to make sure that region
>> between them is free). Finally we check if the tail padding is free.
> There are three calls to check_free_blocks():
>
> line 197, check_free_blocks(start - headp, headp)
> This checks first extent's head padding.
>
> line 247, check_free_blocks(end, next_start - end)
> This checks blocks between end of first extent and start of second extent
> (including possible tail padding of first extent and possible head padding
> of second extent).
>
> line 284, check_free_blocks(end, tailp)
> This checks first extent's tail padding.
>
> Nothing seems to call at least check_free_blocks(begin, end)...


Oh, bad... I thought all this time that extents of the delete sets are
still dirty in the working bitmap at the moment of discard.
Hmm, I definitely don't want to check the whole extents for discard...

Why not to delay the actual deallocation (in the working bitmap)?
Anyway, in the situations of disk space pressure (on the first "soft
ENOSPC") everyone waits for commit-everything completion. Let's think
in this direction...


>
> BTW, what's the purpose of headp_is_known_dirty?

To avoid unneeded checks.

(end + tailp > next_start) means that the head padding of the next extent
includes the region [end, next_start), which was found dirty (when trying
to glue this next extent).
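
Schematically:

	... pos ...)[ dirty gap ](... next ...
	            ^end         ^next_start
	            |------ tailp ------|

If end + tailp > next_start, the next lattice boundary (end + tailp)
lies beyond next_start, so the region [end, next_start), which we have
just found dirty, falls inside the head padding of the next extent;
re-checking that padding in the next iteration would be wasted work.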


>   All uses of this variable
> can be compile-time resolved to 0. It is never read after assigned 1.


Yup. Its declaration and the first assignment are in the wrong place;
they should be above (near @pos). I definitely needed to work more
on this function.


>
>>
>>> Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp
>>> is empty? It seems that it would reduce extent by one whole erase unit
>>> without any justification.
>>
>> Yes, this is definitely a leak of not discarded erase units.
>> Is the attached version better?
> Yes.. and lines 290-295, seems that tail padding handling has the same problem.
> If tailp == 0 (i. e. division remainder is 0 so that end is already aligned),

OK, will be fixed in the same way.

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-14  1:56               ` Edward Shishkin
@ 2014-07-15 11:42                 ` Ivan Shapovalov
  2014-07-16 10:23                   ` Edward Shishkin
  2014-07-19 21:20                 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Edward Shishkin
  1 sibling, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-15 11:42 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 6101 bytes --]

On Monday 14 July 2014 at 03:56:43, Edward Shishkin wrote:	
> 
> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
> > On Sunday 13 July 2014 at 21:04:11, Edward Shishkin wrote:	
> >> On 07/13/2014 02:47 PM, Ivan Shapovalov wrote:
> >>> On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
> >>>> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
> >>>>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
> >>>>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
> >>>>>> [...]
> >>>> [...]
> >>>>>>> + * - if a single extent is smaller than the erase unit, then this particular
> >>>>>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
> >>>>>>> + *   to constitute a whole erase unit;
> >>>>>> Why not to discard the aligned and padded extent, which coincides
> >>>>>> with the whole erase unit?
> >>>>> With a number of whole erase units.
> >>>>>
> >>>>>>> + * - we won't be able to merge small adjacent extents forming an extent long
> >>>>>>> + *   enough to be discarded.
> >>>>>> At this point we have already sorted and merged everything.
> >>>>>> So may be it makes sense just to check the head and tail of every resulted
> >>>>>> extent and discard the aligned and padded one?
> >>>>> "Head and tail" is not sufficient. We may check the whole extent with a single
> >>>>> bitmap request, but such algorithm will be very inefficient: it will miss many
> >>>>> possibilities for discarding.
> >>>>>
> >>>>> Consider many-block extent, from which one block has been allocated again.
> >>>>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
> >>>>>
> >>>>>> Please, consider such possibility. Iterating over erase units in
> >>>>>> discard_extent()
> >>>>>> looks suboptimal.
> >>>>> Yes, it's costly... but I don't see any other ways to do the task efficiently.
> >>>> How about this function?  (the attached patch is against v6-series).
> >>>> Total number of bitmap checks is in [N+1, 2N], where N is number of
> >>>> extents  in the list. At the same time we don't leave any non-discarded
> >>>> "garbage"...
> >>>>
> >>>> Edward.
> >>>> P.S. I tested it, but not enough.
> >>> Hm. I'm probably a dumbass, but...
> >>>
> >>> I don't see where the [start; start+len) region is checked for being free.
> >>
> >> check_free_blocks() is called for this purpose.
> >>
> >> Firstly when checking head padding. Secondly in the gluing loop
> >> (to glue 2 extents in particular means to make sure that region
> >> between them is free). Finally we check if the tail padding is free.
> > There are three calls to check_free_blocks():
> >
> > line 197, check_free_blocks(start - headp, headp)
> > This checks first extent's head padding.
> >
> > line 247, check_free_blocks(end, next_start - end)
> > This checks blocks between end of first extent and start of second extent
> > (including possible tail padding of first extent and possible head padding
> > of second extent).
> >
> > line 284, check_free_blocks(end, tailp)
> > This checks first extent's tail padding.
> >
> > Nothing seems to call at least check_free_blocks(begin, end)...
> 
> 
> Oh, bad... I thought all this time that extents of the delete sets are 
> still dirty
> in the working bitmap at the moment of discard.
> Hmm, I definitely don't want to check the whole extents for discard...
> 
> Why not to delay the actual deallocation (in the working bitmap)?
> Anyway, in the situations of disk space pressure (on the first "soft 
> ENOSPC")
> everyone waits for commit-everything completion. Let's think in this 
> direction...

That's a good idea, actually. Let's outline what is needed for this:

- move reiser4_post_commit_hook() after the call to discard;

This will also move it outside of reiser4_write_logs(), after
reiser4_post_write_back_hook() and various immediate deallocations done by the
journal code. I suppose this is OK for we're under commit mutex anyway.

- defer journal's immediate deallocations until discard.

This is more interesting. Inside of the journal code, blocks are deallocated
in four places:
* dealloc_tx_list()
* dealloc_wmap() -> dealloc_wmap_actor()
* add_region_to_wmap() /* in error path */
* alloc_tx() /* seems like in error path, len == 0 in case of normal exit */

That is, blocks are deallocated after all meaningful work and so relative
order of these deallocations seems to not matter.

Given that point 1 is done, i.e. delete_set is applied to the WORKING
BITMAP after reiser4_write_logs(), we can simply make these deallocations
deferred (BA_DEFER flag). This way, we also get rid of aux_delete_set.
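
A minimal sketch of the change at those four call sites, assuming
reiser4_dealloc_blocks() keeps its current argument list (the local
names here are illustrative):

	/* sketch: defer the deallocation, so the blocks stay dirty in
	 * the working bitmap until the discard has been issued */
	ret = reiser4_dealloc_blocks(&start, &len, BLOCK_NOT_COUNTED,
				     BA_DEFER);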

The only thing to check is deallocation attributes. All journal's deallocations
are done with target_stage == BLOCK_NOT_COUNTED. This happily coincides with
what's done in apply_dset(), so no problems here.

Is this correct?

Thanks,
-- 
Ivan Shapovalov / intelfx /

> 
> 
> >
> > BTW, what's the purpose of headp_is_known_dirty?
> 
> To avoid unneeded checks.
> 
> (end + tailp > next_start) means that the head padding of the next extent
> includes the region [end, next_start), which was found dirty (when trying
> to glue this next extent).
> 
> 
> >   All uses of this variable
> > can be compile-time resolved to 0. It is never read after assigned 1.
> 
> 
> Yup. Its declaration with the first assignment are at wrong place.
> It should be above (near the @pos). I definitely needed to work more
> on this function.
> 
> 
> >
> >>
> >>> Also, btw, do we need to cut the head (lines 155-163 of the patch) if headp
> >>> is empty? It seems that it would reduce extent by one whole erase unit
> >>> without any justification.
> >>
> >> Yes, this is definitely a leak of not discarded erase units.
> >> Is the attached version better?
> > Yes.. and lines 290-295, seems that tail padding handling has the same problem.
> > If tailp == 0 (i. e. division remainder is 0 so that end is already aligned),
> 
> OK, will be fixed in the same way.
> 
> Thanks,
> Edward.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-15 11:42                 ` Ivan Shapovalov
@ 2014-07-16 10:23                   ` Edward Shishkin
  2014-07-16 10:26                     ` Edward Shishkin
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
  0 siblings, 2 replies; 34+ messages in thread
From: Edward Shishkin @ 2014-07-16 10:23 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/15/2014 01:42 PM, Ivan Shapovalov wrote:
> On Monday 14 July 2014 at 03:56:43, Edward Shishkin wrote:	
>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
>>> On Sunday 13 July 2014 at 21:04:11, Edward Shishkin wrote:	
>>>> On 07/13/2014 02:47 PM, Ivan Shapovalov wrote:
>>>>> On Sunday 13 July 2014 at 03:33:57, Edward Shishkin wrote:	
>>>>>> On 07/09/2014 02:40 PM, Ivan Shapovalov wrote:
>>>>>>> On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
>>>>>>>> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
>>>>>>>> [...]
>>>>>> [...]
>>>>>>>>> + * - if a single extent is smaller than the erase unit, then this particular
>>>>>>>>> + *   extent won't be discarded even if it is surrounded by enough free blocks
>>>>>>>>> + *   to constitute a whole erase unit;
>>>>>>>> Why not to discard the aligned and padded extent, which coincides
>>>>>>>> with the whole erase unit?
>>>>>>> With a number of whole erase units.
>>>>>>>
>>>>>>>>> + * - we won't be able to merge small adjacent extents forming an extent long
>>>>>>>>> + *   enough to be discarded.
>>>>>>>> At this point we have already sorted and merged everything.
>>>>>>>> So may be it makes sense just to check the head and tail of every resulted
>>>>>>>> extent and discard the aligned and padded one?
>>>>>>> "Head and tail" is not sufficient. We may check the whole extent with a single
>>>>>>> bitmap request, but such algorithm will be very inefficient: it will miss many
>>>>>>> possibilities for discarding.
>>>>>>>
>>>>>>> Consider many-block extent, from which one block has been allocated again.
>>>>>>> In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).
>>>>>>>
>>>>>>>> Please, consider such possibility. Iterating over erase units in
>>>>>>>> discard_extent()
>>>>>>>> looks suboptimal.
>>>>>>> Yes, it's costly... but I don't see any other ways to do the task efficiently.
>>>>>> How about this function?  (the attached patch is against v6-series).
>>>>>> Total number of bitmap checks is in [N+1, 2N], where N is number of
>>>>>> extents  in the list. At the same time we don't leave any non-discarded
>>>>>> "garbage"...
>>>>>>
>>>>>> Edward.
>>>>>> P.S. I tested it, but not enough.
>>>>> Hm. I'm probably a dumbass, but...
>>>>>
>>>>> I don't see where the [start; start+len) region is checked for being free.
>>>> check_free_blocks() is called for this purpose.
>>>>
>>>> Firstly when checking head padding. Secondly in the gluing loop
>>>> (to glue 2 extents in particular means to make sure that region
>>>> between them is free). Finally we check if the tail padding is free.
>>> There are three calls to check_free_blocks():
>>>
>>> line 197, check_free_blocks(start - headp, headp)
>>> This checks first extent's head padding.
>>>
>>> line 247, check_free_blocks(end, next_start - end)
>>> This checks blocks between end of first extent and start of second extent
>>> (including possible tail padding of first extent and possible head padding
>>> of second extent).
>>>
>>> line 284, check_free_blocks(end, tailp)
>>> This checks first extent's tail padding.
>>>
>>> Nothing seems to call at least check_free_blocks(begin, end)...
>> Oh, bad... I thought all this time that extents of the delete sets are
>> still dirty
>> in the working bitmap at the moment of discard.
>> Hmm, I definitely don't want to check the whole extents for discard...
>>
>> Why not to delay the actual deallocation (in the working bitmap)?
>> Anyway, in the situations of disk space pressure (on the first "soft
>> ENOSPC")
>> everyone waits for commit-everything completion. Let's think in this
>> direction...
> That's a good idea, actually. Let's outline what is needed for this:
>
> - move reiser4_post_commit_hook() after the call to discard;


To be precise, not the post_commit_hook itself, only its current content.

Those hooks are undocumented, but I think that
- pre_commit_hook is something that should be called before journal
   writes;
- post_commit_hook is something that should be called after journal
   writes completion and before overwrites;
- post_write_back_hook is something that should be called after issuing
   the overwrites.

I suggest using the post_write_back_hook for discard, followed by
cleaning of the working bitmap. Move this hook to a suitable place (make
sure it is after write_tx_back() and the journal's "immediate"
deallocations).
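
Roughly, the commit-time ordering would then be (a sketch of the
proposal, not of the current code):

	pre_commit_hook();        /* before journal writes */
	/* write the transaction to the journal */
	post_commit_hook();       /* journal done, before overwrites */
	write_tx_back();          /* issue the overwrites */
	/* journal's (now deferred) deallocations */
	post_write_back_hook();   /* discard, then clean the working
				   * bitmap */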


> This will also move it outside of reiser4_write_logs(), after
> reiser4_post_write_back_hook() and various immediate deallocations done by the
> journal code. I suppose this is OK for we're under commit mutex anyway.
>
> - defer journal's immediate deallocations until discard.
>
> This is more interesting. Inside of the journal code, blocks are deallocated
> in four places:
> * dealloc_tx_list()
> * dealloc_wmap() -> dealloc_wmap_actor()
> * add_region_to_wmap() /* in error path */
> * alloc_tx() /* seems like in error path, len == 0 in case of normal exit */
>
> That is, blocks are deallocated after all meaningful work


We want those 4 deallocations to happen after pre_commit_hook. In that
case they won't affect the commit bitmap if we make them deferred.


>   and so relative
> order of these deallocations seems to not matter.
>
> Given that point 1 is done, i. e. delete_set is applied to WORKING BITMAP
> after reiser4_write_logs(), we can simply make these deallocations deferred
> (BA_DEFER flag). This way, we also get rid of aux_delete_set.


Yes, make reiser4_block_alloc() jump to the "defer" branch if discard mode
is on.


> The only thing to check is deallocation attributes. All journal's deallocations
> are done with target_stage == BLOCK_NOT_COUNTED. This happily coincides with
> what's done in apply_dset(), so no problems here.
>
> Is this correct?

Looks OK.

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-16 10:23                   ` Edward Shishkin
@ 2014-07-16 10:26                     ` Edward Shishkin
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
  1 sibling, 0 replies; 34+ messages in thread
From: Edward Shishkin @ 2014-07-16 10:26 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/16/2014 12:23 PM, Edward Shishkin wrote:
>
> On 07/15/2014 01:42 PM, Ivan Shapovalov wrote:
>> On Monday 14 July 2014 at 03:56:43, Edward Shishkin wrote:
[...]
>>>
>>> Why not delay the actual deallocation (in the working bitmap)?
>>> Anyway, in situations of disk space pressure (on the first "soft
>>> ENOSPC") everyone waits for commit-everything completion. Let's think
>>> in this direction...
>> That's a good idea, actually. Let's outline what is needed for this:
>>
>> - move reiser4_post_commit_hook() after the call to discard;
>
>
> To be precise, not the post_commit_hook itself, only its current content.
>
> Those hooks are undocumented, but I think that
> - pre_commit_hook is something that should be called before journal
>   writes;
> - post_commit_hook is something that should be called after journal
>   writes completion and before overwrites;
> - post_write_back_hook is something that should be called after issuing
>   the overwrites.
>
> I suggest using the post_write_back_hook for discard, followed by cleaning
> the working bitmap. Move this hook to a suitable place (make sure it is
> after write_tx_back() and the journal's "immediate" deallocations).
>
>
>> This will also move it outside of reiser4_write_logs(), after
>> reiser4_post_write_back_hook() and various immediate deallocations 
>> done by the
>> journal code. I suppose this is OK for we're under commit mutex anyway.
>>
>> - defer journal's immediate deallocations until discard.
>>
>> This is more interesting. Inside of the journal code, blocks are 
>> deallocated
>> in four places:
>> * dealloc_tx_list()
>> * dealloc_wmap() -> dealloc_wmap_actor()
>> * add_region_to_wmap() /* in error path */
>> * alloc_tx() /* seems like in error path, len == 0 in case of normal 
>> exit */
>>
>> That is, blocks are deallocated after all meaningful work
>
>
> We want those 4 deallocations to happen after pre_commit_hook. In that
> case they won't affect the commit bitmap if we make them deferred.
>
>
>>   and so relative
>> order of these deallocations seems to not matter.
>>
>> Given that point 1 is done, i. e. delete_set is applied to WORKING 
>> BITMAP
>> after reiser4_write_logs(), we can simply make these deallocations 
>> deferred
>> (BA_DEFER flag). This way, we also get rid of aux_delete_set.
>
>
> Yes, make reiser4_block_alloc() jump to the "defer" branch if discard mode

^reiser4_block_alloc^reiser4_dealloc_blocks

> is on.
>
>
>> The only thing to check is deallocation attributes. All journal's 
>> deallocations
>> are done with target_stage == BLOCK_NOT_COUNTED. This happily 
>> coincides with
>> what's done in apply_dset(), so no problems here.
>>
>> Is this correct?
>
> Looks OK.
>
> Thanks,
> Edward.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-16 10:23                   ` Edward Shishkin
  2014-07-16 10:26                     ` Edward Shishkin
@ 2014-07-16 11:24                     ` Ivan Shapovalov
  2014-07-16 11:24                       ` [veryRFC] [PATCH 1/2] reiser4: discard support: perform discards and deallocations after writing logs Ivan Shapovalov
                                         ` (3 more replies)
  1 sibling, 4 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-16 11:24 UTC (permalink / raw)
  To: edward.shishkin; +Cc: reiserfs-devel, Ivan Shapovalov

This needs your version of discard algorithm, because blocks inside extents of
discard set are now marked as allocated.

BTW, have you dropped the case with erase unit size < block size? This is
probably OK, but configurations with nonzero alignments are then broken. Not
worth handling these, I think, but nevertheless a warning could be useful ;)

Ivan Shapovalov (2):
  reiser4: discard support: perform discards and deallocations after writing logs.
  reiser4: discard support: proof-of-concept for "discard before dealloc".

 fs/reiser4/block_alloc.c | 48 +++++++++++++++++++-------------------
 fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
 fs/reiser4/discard.h     |  4 +++-
 fs/reiser4/txnmgr.c      | 35 ----------------------------
 fs/reiser4/txnmgr.h      | 13 ++---------
 fs/reiser4/wander.c      |  3 ++-
 6 files changed, 73 insertions(+), 90 deletions(-)

-- 
2.0.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [veryRFC] [PATCH 1/2] reiser4: discard support: perform discards and deallocations after writing logs.
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
@ 2014-07-16 11:24                       ` Ivan Shapovalov
  2014-07-16 11:24                       ` [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc" Ivan Shapovalov
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-16 11:24 UTC (permalink / raw)
  To: edward.shishkin; +Cc: reiserfs-devel, Ivan Shapovalov

reiser4_post_write_back_hook() is now used for both of these things.
The call is moved to after the journalling code is done with its deallocations.

Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
---
 fs/reiser4/block_alloc.c | 24 +++++++++++++++++++-----
 fs/reiser4/txnmgr.c      | 11 -----------
 fs/reiser4/wander.c      |  3 ++-
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
index 7b5000b..4ce2a16 100644
--- a/fs/reiser4/block_alloc.c
+++ b/fs/reiser4/block_alloc.c
@@ -1136,15 +1136,13 @@ apply_dset(txn_atom * atom UNUSED_ARG, const reiser4_block_nr * a,
 
 void reiser4_post_commit_hook(void)
 {
+#ifdef REISER4_DEBUG
 	txn_atom *atom;
 
 	atom = get_current_atom_locked();
 	assert("zam-452", atom->stage == ASTAGE_POST_COMMIT);
 	spin_unlock_atom(atom);
-
-	/* do the block deallocation which was deferred
-	   until commit is done */
-	atom_dset_deferred_apply(atom, apply_dset, NULL, 0);
+#endif
 
 	assert("zam-504", get_current_super_private() != NULL);
 	sa_post_commit_hook();
@@ -1152,8 +1150,24 @@ void reiser4_post_commit_hook(void)
 
 void reiser4_post_write_back_hook(void)
 {
-	assert("zam-504", get_current_super_private() != NULL);
+	txn_atom *atom;
+	int ret;
 
+	/* process and issue discard requests */
+	do {
+		atom = get_current_atom_locked();
+		ret = discard_atom(atom);
+	} while (ret == -E_REPEAT);
+
+	if (ret) {
+		warning("intelfx-8", "discard atom failed (%ld)", ret);
+	}
+
+	/* do the block deallocation which was deferred
+	   until commit is done */
+	atom_dset_deferred_apply(atom, apply_dset, NULL, 0);
+
+	assert("zam-504", get_current_super_private() != NULL);
 	sa_post_write_back_hook();
 }
 
diff --git a/fs/reiser4/txnmgr.c b/fs/reiser4/txnmgr.c
index f27d1dc..317bc4f 100644
--- a/fs/reiser4/txnmgr.c
+++ b/fs/reiser4/txnmgr.c
@@ -1089,17 +1089,6 @@ static int commit_current_atom(long *nr_submitted, txn_atom ** atom)
 	if (ret < 0)
 		reiser4_panic("zam-597", "write log failed (%ld)\n", ret);
 
-	/* process and issue discard requests */
-	do {
-		spin_lock_atom(*atom);
-		ret = discard_atom(*atom);
-	} while (ret == -E_REPEAT);
-
-	if (ret) {
-		warning("intelfx-8", "discard atom failed (%ld)", ret);
-		ret = 0; /* the discard is optional, don't fail the commit */
-	}
-
 	/* The atom->ovrwr_nodes list is processed under commit mutex held
 	   because of bitmap nodes which are captured by special way in
 	   reiser4_pre_commit_hook_bitmap(), that way does not include
diff --git a/fs/reiser4/wander.c b/fs/reiser4/wander.c
index 4e29de8..04ddec6 100644
--- a/fs/reiser4/wander.c
+++ b/fs/reiser4/wander.c
@@ -1252,7 +1252,6 @@ int reiser4_write_logs(long *nr_submitted)
 	reiser4_post_commit_hook();
 
 	ret = write_tx_back(&ch);
-	reiser4_post_write_back_hook();
 
       up_and_ret:
 	if (ret) {
@@ -1265,6 +1264,8 @@ int reiser4_write_logs(long *nr_submitted)
 	dealloc_tx_list(&ch);
 	dealloc_wmap(&ch);
 
+	reiser4_post_write_back_hook();
+
 	put_overwrite_set(&ch);
 
 	done_commit_handle(&ch);
-- 
2.0.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc".
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
  2014-07-16 11:24                       ` [veryRFC] [PATCH 1/2] reiser4: discard support: perform discards and deallocations after writing logs Ivan Shapovalov
@ 2014-07-16 11:24                       ` Ivan Shapovalov
  2014-07-20  1:11                         ` Edward Shishkin
  2014-07-16 14:19                       ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
  2014-07-16 23:35                       ` Edward Shishkin
  3 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-16 11:24 UTC (permalink / raw)
  To: edward.shishkin; +Cc: reiserfs-devel, Ivan Shapovalov

This is INCOMPLETE and needs a modified discard_extent() because blocks inside
of original extents are always allocated but allowed for discarding.

Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
---
 fs/reiser4/block_alloc.c | 28 +++++-----------------
 fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
 fs/reiser4/discard.h     |  4 +++-
 fs/reiser4/txnmgr.c      | 24 -------------------
 fs/reiser4/txnmgr.h      | 13 ++---------
 5 files changed, 54 insertions(+), 75 deletions(-)

diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
index 4ce2a16..f85cfb8 100644
--- a/fs/reiser4/block_alloc.c
+++ b/fs/reiser4/block_alloc.c
@@ -1007,7 +1007,8 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
 		spin_unlock_reiser4_super(sbinfo);
 	}
 
-	if (flags & BA_DEFER) {
+	if ((flags & BA_DEFER) ||
+	    reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
 		/* store deleted block numbers in the atom's deferred delete set
 		   for further actual deletion */
 		do {
@@ -1028,25 +1029,6 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
 		spin_unlock_atom(atom);
 
 	} else {
-		/* store deleted block numbers in the atom's immediate delete set
-		   for further processing */
-		do {
-			atom = get_current_atom_locked();
-			assert("intelfx-51", atom != NULL);
-
-			ret = atom_dset_immediate_add_extent(atom, &new_entry, start, len);
-
-			if (ret == -ENOMEM)
-				return ret;
-
-			/* This loop might spin at most two times */
-		} while (ret == -E_REPEAT);
-
-		assert("intelfx-52", ret == 0);
-		assert("intelfx-53", atom != NULL);
-
-		spin_unlock_atom(atom);
-
 		assert("zam-425", get_current_super_private() != NULL);
 		sa_dealloc_blocks(reiser4_get_space_allocator(ctx->super),
 				  *start, *len);
@@ -1150,13 +1132,15 @@ void reiser4_post_commit_hook(void)
 
 void reiser4_post_write_back_hook(void)
 {
+	struct list_head discard_tmp;
 	txn_atom *atom;
 	int ret;
 
 	/* process and issue discard requests */
+	blocknr_list_init(&discard_tmp);
 	do {
 		atom = get_current_atom_locked();
-		ret = discard_atom(atom);
+		ret = discard_atom(atom, &discard_tmp);
 	} while (ret == -E_REPEAT);
 
 	if (ret) {
@@ -1165,7 +1149,7 @@ void reiser4_post_write_back_hook(void)
 
 	/* do the block deallocation which was deferred
 	   until commit is done */
-	atom_dset_deferred_apply(atom, apply_dset, NULL, 0);
+	atom_dset_deferred_apply(atom, apply_dset, NULL, 1);
 
 	assert("zam-504", get_current_super_private() != NULL);
 	sa_post_write_back_hook();
diff --git a/fs/reiser4/discard.c b/fs/reiser4/discard.c
index 3c8ee89..0596938 100644
--- a/fs/reiser4/discard.c
+++ b/fs/reiser4/discard.c
@@ -33,24 +33,42 @@
  * MECHANISM:
  *
  * During the transaction deallocated extents are recorded in atom's delete
- * sets. There are two delete sets, because data in one of them (delete_set) is
- * also used by other parts of reiser4. The second delete set (aux_delete_set)
- * complements the first one and is maintained only when discard is enabled.
+ * set. In reiser4, there are two methods to deallocate a block:
+ * 1. deferred deallocation, enabled by BA_DEFER flag to reiser4_dealloc_block().
+ *    In this mode, blocks are stored to delete set instead of being marked free
+ *    immediately. After committing the transaction, the delete set is "applied"
+ *    by the block allocator and all these blocks are marked free in memory
+ *    (see reiser4_post_commit_hook()).
+ *    Space management plugins also read the delete set to update on-disk
+ *    allocation records (see reiser4_pre_commit_hook()).
+ * 2. immediate deallocation (the opposite).
+ *    In this mode, blocks are marked free immediately. This is used by the
+ *    journal subsystem to manage space used by the journal records, so these
+ *    allocations are not visible to the space management plugins and never hit
+ *    the disk.
  *
- * Together these sets constitute "the discard set" -- blocks that have to be
- * considered for discarding. On atom commit we will generate a minimal
- * superset of the discard set, comprised of whole erase units.
+ * When discard is enabled, all immediate deallocations become deferred. This
+ * is OK because journal's allocations happen after reiser4_pre_commit_hook()
+ * where the on-disk space allocation records are updated. So, in this mode
+ * the atom's delete set becomes "the discard set" -- list of blocks that have
+ * to be considered for discarding.
+ *
+ * On atom commit we will generate a minimal superset of the discard set,
+ * comprised of whole erase units.
+ *
+ * Discarding is performed before reiser4_post_commit_hook(), hence all extents
+ * in the discard set are still marked as allocated and cannot contain any data.
+ * Thus we can avoid any checks for blocks directly present in the discard set.
+ *
+ * However, we pad each extent from both sides to erase unit boundaries, and
+ * these paddings still have to be checked if they fall outside of the initial
+ * extent (which cannot happen if block size >= erase unit size).
  *
  * So, at commit time the following actions take place:
  * - delete sets are merged to form the discard set;
  * - elements of the discard set are sorted;
  * - the discard set is iterated, joining any adjacent extents;
- * - each of resulting extents is "covered" by erase units:
- *   - its start is rounded down to the closest erase unit boundary;
- *   - starting from this block, extents of erase unit length are created
- *     until the original extent is fully covered;
- * - the calculated erase units are checked to be fully deallocated;
- * - remaining (valid) erase units are then passed to blkdev_issue_discard().
+ * - <TODO>
  */
 
 #include "discard.h"
@@ -167,7 +185,7 @@ static int discard_extent(txn_atom *atom UNUSED_ARG,
 	return 0;
 }
 
-int discard_atom(txn_atom *atom)
+int discard_atom(txn_atom *atom, struct list_head *processed_set)
 {
 	int ret;
 	struct list_head discard_set;
@@ -178,9 +196,14 @@ int discard_atom(txn_atom *atom)
 	}
 
 	assert("intelfx-28", atom != NULL);
+	assert("intelfx-59", processed_entries != NULL);
 
-	if (list_empty(&atom->discard.delete_set) &&
-	    list_empty(&atom->discard.aux_delete_set)) {
+	if (list_empty(&atom->discard.delete_set)) {
+		/*
+		 * Nothing left to discard. Move all processed entries back to atom
+		 * for reiser4_post_commit_hook().
+		 */
+		blocknr_list_merge(processed_set, &atom->discard.delete_set);
 		spin_unlock_atom(atom);
 		return 0;
 	}
@@ -188,14 +211,17 @@ int discard_atom(txn_atom *atom)
 	/* Take the delete sets from the atom in order to release atom spinlock. */
 	blocknr_list_init(&discard_set);
 	blocknr_list_merge(&atom->discard.delete_set, &discard_set);
-	blocknr_list_merge(&atom->discard.aux_delete_set, &discard_set);
 	spin_unlock_atom(atom);
 
 	/* Sort the discard list, joining adjacent and overlapping extents. */
 	blocknr_list_sort_and_join(&discard_set);
 
 	/* Perform actual dirty work. */
-	ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);
+	ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 0);
+
+	/* Add processed extents to the temporary list. */
+	blocknr_list_merge(&discard_set, processed_set);
+
 	if (ret != 0) {
 		return ret;
 	}
diff --git a/fs/reiser4/discard.h b/fs/reiser4/discard.h
index ea46334..95bbe8b 100644
--- a/fs/reiser4/discard.h
+++ b/fs/reiser4/discard.h
@@ -14,8 +14,10 @@
  * if discard is enabled. In this case the delete sets are cleared.
  *
  * @atom should be locked on entry and is unlocked on exit.
+ * @processed_set keeps state between invocations in -E_REPEAT pattern;
+ *                must be initialized with blocknr_list_init().
  */
-extern int discard_atom(txn_atom *atom);
+extern int discard_atom(txn_atom *atom, struct list_head *processed_set);
 
 /* __FS_REISER4_DISCARD_H__ */
 #endif
diff --git a/fs/reiser4/txnmgr.c b/fs/reiser4/txnmgr.c
index 317bc4f..d73ecb9 100644
--- a/fs/reiser4/txnmgr.c
+++ b/fs/reiser4/txnmgr.c
@@ -3081,7 +3081,6 @@ void atom_dset_init(txn_atom *atom)
 {
 	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
 		blocknr_list_init(&atom->discard.delete_set);
-		blocknr_list_init(&atom->discard.aux_delete_set);
 	} else {
 		blocknr_set_init(&atom->nodiscard.delete_set);
 	}
@@ -3091,7 +3090,6 @@ void atom_dset_destroy(txn_atom *atom)
 {
 	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
 		blocknr_list_destroy(&atom->discard.delete_set);
-		blocknr_list_destroy(&atom->discard.aux_delete_set);
 	} else {
 		blocknr_set_destroy(&atom->nodiscard.delete_set);
 	}
@@ -3101,7 +3099,6 @@ void atom_dset_merge(txn_atom *from, txn_atom *to)
 {
 	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
 		blocknr_list_merge(&from->discard.delete_set, &to->discard.delete_set);
-		blocknr_list_merge(&from->discard.aux_delete_set, &to->discard.aux_delete_set);
 	} else {
 		blocknr_set_merge(&from->nodiscard.delete_set, &to->nodiscard.delete_set);
 	}
@@ -3155,27 +3152,6 @@ extern int atom_dset_deferred_add_extent(txn_atom *atom,
 	return ret;
 }
 
-extern int atom_dset_immediate_add_extent(txn_atom *atom,
-                                          void **new_entry,
-                                          const reiser4_block_nr *start,
-                                          const reiser4_block_nr *len)
-{
-	int ret;
-
-	if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
-		ret = blocknr_list_add_extent(atom,
-		                             &atom->discard.aux_delete_set,
-		                             (blocknr_list_entry**)new_entry,
-		                             start,
-		                             len);
-	} else {
-		/* no-op */
-		ret = 0;
-	}
-
-	return ret;
-}
-
 /*
  * Local variables:
  * c-indentation-style: "K&R"
diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
index 02757a9..05990d8 100644
--- a/fs/reiser4/txnmgr.h
+++ b/fs/reiser4/txnmgr.h
@@ -259,10 +259,6 @@ struct txn_atom {
 			/* The atom's delete set. It collects block numbers which were
 			   deallocated with BA_DEFER, i. e. of ordinary nodes. */
 			struct list_head delete_set;
-
-			/* The atom's auxiliary delete set. It collects block numbers
-			   which were deallocated without BA_DEFER, i. e. immediately. */
-			struct list_head aux_delete_set;
 		} discard;
 	};
 
@@ -527,9 +523,8 @@ extern int blocknr_list_iterator(txn_atom *atom,
 
 /* These are wrappers for accessing and modifying atom's delete lists,
    depending on whether discard is enabled or not.
-   If it is enabled. both deferred and immediate delete lists are maintained,
-   and (less memory efficient) blocknr_lists are used for storage. Otherwise, only
-   deferred delete list is maintained and blocknr_set is used for its storage. */
+   If it is enabled, (less memory efficient) blocknr_list is used for delete
+   list storage. Otherwise, blocknr_set is used for this purpose. */
 extern void atom_dset_init(txn_atom *atom);
 extern void atom_dset_destroy(txn_atom *atom);
 extern void atom_dset_merge(txn_atom *from, txn_atom *to);
@@ -541,10 +536,6 @@ extern int atom_dset_deferred_add_extent(txn_atom *atom,
                                          void **new_entry,
                                          const reiser4_block_nr *start,
                                          const reiser4_block_nr *len);
-extern int atom_dset_immediate_add_extent(txn_atom *atom,
-                                          void **new_entry,
-                                          const reiser4_block_nr *start,
-                                          const reiser4_block_nr *len);
 
 /* flush code takes care about how to fuse flush queues */
 extern void flush_init_atom(txn_atom * atom);
-- 
2.0.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
  2014-07-16 11:24                       ` [veryRFC] [PATCH 1/2] reiser4: discard support: perform discards and deallocations after writing logs Ivan Shapovalov
  2014-07-16 11:24                       ` [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc" Ivan Shapovalov
@ 2014-07-16 14:19                       ` Ivan Shapovalov
  2014-07-16 23:35                       ` Edward Shishkin
  3 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-16 14:19 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: edward.shishkin

[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]

On Wednesday 16 July 2014 at 15:24:02, Ivan Shapovalov wrote:	
> This needs your version of discard algorithm, because blocks inside extents of
> discard set are now marked as allocated.
> 
> BTW, have you dropped the case with erase unit size < block size? This is
> probably OK, but configurations with nonzero alignments are then broken. Not
> worth handling these, I think, but nevertheless a warning could be useful ;)
> 
> Ivan Shapovalov (2):
>   reiser4: discard support: perform discards and deallocations after writing logs.
>   reiser4: discard support: proof-of-concept for "discard before dealloc".
> 
>  fs/reiser4/block_alloc.c | 48 +++++++++++++++++++-------------------
>  fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
>  fs/reiser4/discard.h     |  4 +++-
>  fs/reiser4/txnmgr.c      | 35 ----------------------------
>  fs/reiser4/txnmgr.h      | 13 ++---------
>  fs/reiser4/wander.c      |  3 ++-
>  6 files changed, 73 insertions(+), 90 deletions(-)

...and yes, these patches should be considered nothing but a payload for testing
whatever you use for systematic full-system backups :)

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
                                         ` (2 preceding siblings ...)
  2014-07-16 14:19                       ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
@ 2014-07-16 23:35                       ` Edward Shishkin
  2014-07-17  9:46                         ` Ivan Shapovalov
  3 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-16 23:35 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
> This needs your version of discard algorithm, because blocks inside extents of
> discard set are now marked as allocated.


OK, I'll try to finish it over the weekend..


> BTW, have you dropped the case with erase unit size < block size?


In my version all erase units are supported except ones which are
not a power of 2. We can support all erase units without exceptions,
if needed.


>   This is
> probably OK, but configurations with nonzero alignments are then broken.


I support alignments with (bdev_discard_alignment % sb->s_blocksize == 0).
We can support all alignments without exceptions, but it will lead to
complications in discard_sorted_extents().
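
A sketch of the supported-configuration check this implies (a hypothetical
helper, not the actual mount-time code; the block-layer accessors are the
stock ones):

#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/log2.h>

static int discard_params_supported(struct super_block *sb)
{
	struct request_queue *q = bdev_get_queue(sb->s_bdev);
	unsigned int eu = q->limits.discard_granularity; /* erase unit, bytes */
	int align = bdev_discard_alignment(sb->s_bdev);

	return eu != 0 &&
	       is_power_of_2(eu) &&
	       align % sb->s_blocksize == 0;
}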


>   Not
> worth handling these, I think, but nevertheless a warning could be useful ;)

Agreed. It is important.

Thanks!
Edward.


>
> Ivan Shapovalov (2):
>    reiser4: discard support: perform discards and deallocations after writing logs.
>    reiser4: discard support: proof-of-concept for "discard before dealloc".
>
>   fs/reiser4/block_alloc.c | 48 +++++++++++++++++++-------------------
>   fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
>   fs/reiser4/discard.h     |  4 +++-
>   fs/reiser4/txnmgr.c      | 35 ----------------------------
>   fs/reiser4/txnmgr.h      | 13 ++---------
>   fs/reiser4/wander.c      |  3 ++-
>   6 files changed, 73 insertions(+), 90 deletions(-)
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-16 23:35                       ` Edward Shishkin
@ 2014-07-17  9:46                         ` Ivan Shapovalov
  2014-07-17 11:14                           ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-17  9:46 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 2032 bytes --]

On Thursday 17 July 2014 at 01:35:20, Edward Shishkin wrote:	
> On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
> > This needs your version of discard algorithm, because blocks inside extents of
> > discard set are now marked as allocated.
> 
> 
> OK, I'll try to finish it over the weekend..
> 
> 
> > BTW, have you dropped the case with erase unit size < block size?
> 
> 
> In my version all erase units are supported except ones which are
> not a power of 2. We can support all erase units without exceptions,
> if needed.

I mean, when erase unit size < block size, the erase unit size is forced to
be exactly one block. Again, this is OK, but in this case any alignment
(which is then by definition less than one block) will be dropped and
represented as zero, and _this_ deserves a warning :)
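
Something like this, perhaps (a hypothetical helper; the warning ID is
just a placeholder in reiser4's usual style):

static void clamp_erase_unit(struct super_block *sb,
			     unsigned int *erase_unit, unsigned int *alignment)
{
	if (*erase_unit < sb->s_blocksize) {
		if (*alignment != 0)
			warning("intelfx-xx", "discard alignment %u is below "
				"block size and will be dropped", *alignment);
		/* force one-block erase units with zero alignment */
		*erase_unit = sb->s_blocksize;
		*alignment = 0;
	}
}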

// btw, as you probably guess, these two patches are completely untested.

Thanks,
-- 
Ivan Shapovalov / intelfx /

> 
> 
> >   This is
> > probably OK, but configurations with nonzero alignments are then broken.
> 
> 
> I support alignments with (bdev_discard_alignment % sb->s_blocksize == 0)
> We can support all alignments without exceptions, but it will lead to 
> complications
> in discard_sorted_extents().
> 
> 
> >   Not
> > worth handling these, I think, but nevertheless a warning could be useful ;)
> 
> Agreed. It is important.
> 
> Thanks!
> Edward.
> 
> 
> >
> > Ivan Shapovalov (2):
> >    reiser4: discard support: perform discards and deallocations after writing logs.
> >    reiser4: discard support: proof-of-concept for "discard before dealloc".
> >
> >   fs/reiser4/block_alloc.c | 48 +++++++++++++++++++-------------------
> >   fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
> >   fs/reiser4/discard.h     |  4 +++-
> >   fs/reiser4/txnmgr.c      | 35 ----------------------------
> >   fs/reiser4/txnmgr.h      | 13 ++---------
> >   fs/reiser4/wander.c      |  3 ++-
> >   6 files changed, 73 insertions(+), 90 deletions(-)
> >
> 

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-17  9:46                         ` Ivan Shapovalov
@ 2014-07-17 11:14                           ` Edward Shishkin
  2014-07-20 11:33                             ` Ivan Shapovalov
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-17 11:14 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On 07/17/2014 11:46 AM, Ivan Shapovalov wrote:
> On Thursday 17 July 2014 at 01:35:20, Edward Shishkin wrote:	
>> On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
>>> This needs your version of discard algorithm, because blocks inside extents of
>>> discard set are now marked as allocated.
>>
>> OK, I'll try to finish it in weekends..
>>
>>
>>> BTW, have you dropped the case with erase unit size < block size?
>>
>> In my version all erase units are supported except ones which are
>> not a power of 2. We can support all erase units without exceptions,
>> if needed.
> I mean, when erase unit size < block size, erase unit size is forced to be
> exactly one block. Again, this is OK, but in this case any alignment (by
> definition less than one block then) will be dropped and represented as zero,
> and _this_ deserves a warning :)


The case of alignment % block_size != 0 warrants disabling discard
support, because in that case all this science with heads and tails
simply doesn't work.
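
A minimal sketch of that policy (hypothetical; assumes the mount path can
simply clear the REISER4_DISCARD bit it has just set, and the warning ID
is a placeholder):

	if (bdev_discard_alignment(sb->s_bdev) % sb->s_blocksize != 0) {
		warning("edward-xx", "discard alignment is not block-aligned, "
			"disabling discard support");
		clear_bit(REISER4_DISCARD, &get_super_private(sb)->fs_flags);
	}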


>
> // btw, as you probably guess, these two patches are completely untested.
>
> Thanks,


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-14  1:56               ` Edward Shishkin
  2014-07-15 11:42                 ` Ivan Shapovalov
@ 2014-07-19 21:20                 ` Edward Shishkin
  2014-07-20 10:06                   ` Ivan Shapovalov
  1 sibling, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-19 21:20 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/14/2014 03:56 AM, Edward Shishkin wrote:
>
> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
>>
[...]
>>
>> Nothing seems to call at least check_free_blocks(begin, end)...
>
>
> Oh, bad... I thought all this time that extents of the delete sets are 
> still dirty
> in the working bitmap at the moment of discard.
> Hmm, I definitely don't want to check the whole extents for discard...


False alarm, actually..

Nothing is wrong if someone makes them dirty before we issue our discard
requests: those changes will be committed in a later session of holding
the commit_mutex, _after_ our discard. Everything is isolated.. For the
same reason we don't care if someone allocates blocks in the head and
tail paddings before we issue our padded discard extents. We only need
to make sure that they were clean during _our_ commit session.
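
A sketch of the per-extent check this implies, assuming [start, end) is
already known-deallocated (though still marked busy in the working bitmap)
so that only the erase-unit paddings need a bitmap check. Units are
blocks, the alignment is taken as zero, eu is a power of 2;
check_free_blocks(start, len) is assumed to return non-zero iff the whole
region is free, and issue_discard() is a hypothetical wrapper around
blkdev_issue_discard():

static int try_discard_extent(reiser4_block_nr start, reiser4_block_nr end,
			      reiser4_block_nr eu)
{
	reiser4_block_nr eu_start = start & ~(eu - 1);        /* round down */
	reiser4_block_nr eu_end = (end + eu - 1) & ~(eu - 1); /* round up */

	if (!check_free_blocks(eu_start, start - eu_start) ||
	    !check_free_blocks(end, eu_end - end))
		return 0; /* a padding is busy at _our_ session: skip */

	return issue_discard(eu_start, eu_end - eu_start);
}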

Anyway, let's use delayed deallocation. I like it better than the
aux_delete_set. After all, we can roll everything back at any moment
if any problems come up..

Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc".
  2014-07-16 11:24                       ` [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc" Ivan Shapovalov
@ 2014-07-20  1:11                         ` Edward Shishkin
  2014-07-20 10:09                           ` Ivan Shapovalov
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-20  1:11 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
> This is INCOMPLETE and needs a modified discard_extent() because blocks inside
> of original extents are always allocated but allowed for discarding.
>
> Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
> ---
>   fs/reiser4/block_alloc.c | 28 +++++-----------------
>   fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
>   fs/reiser4/discard.h     |  4 +++-
>   fs/reiser4/txnmgr.c      | 24 -------------------
>   fs/reiser4/txnmgr.h      | 13 ++---------
>   5 files changed, 54 insertions(+), 75 deletions(-)
>
> diff --git a/fs/reiser4/block_alloc.c b/fs/reiser4/block_alloc.c
> index 4ce2a16..f85cfb8 100644
> --- a/fs/reiser4/block_alloc.c
> +++ b/fs/reiser4/block_alloc.c
> @@ -1007,7 +1007,8 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
>   		spin_unlock_reiser4_super(sbinfo);
>   	}
>   
> -	if (flags & BA_DEFER) {
> +	if ((flags & BA_DEFER) ||
> +	    reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
>   		/* store deleted block numbers in the atom's deferred delete set
>   		   for further actual deletion */
>   		do {
> @@ -1028,25 +1029,6 @@ reiser4_dealloc_blocks(const reiser4_block_nr * start,
>   		spin_unlock_atom(atom);
>   
>   	} else {
> -		/* store deleted block numbers in the atom's immediate delete set
> -		   for further processing */
> -		do {
> -			atom = get_current_atom_locked();
> -			assert("intelfx-51", atom != NULL);
> -
> -			ret = atom_dset_immediate_add_extent(atom, &new_entry, start, len);
> -
> -			if (ret == -ENOMEM)
> -				return ret;
> -
> -			/* This loop might spin at most two times */
> -		} while (ret == -E_REPEAT);
> -
> -		assert("intelfx-52", ret == 0);
> -		assert("intelfx-53", atom != NULL);
> -
> -		spin_unlock_atom(atom);
> -
>   		assert("zam-425", get_current_super_private() != NULL);
>   		sa_dealloc_blocks(reiser4_get_space_allocator(ctx->super),
>   				  *start, *len);
> @@ -1150,13 +1132,15 @@ void reiser4_post_commit_hook(void)
>   
>   void reiser4_post_write_back_hook(void)
>   {
> +	struct list_head discard_tmp;
>   	txn_atom *atom;
>   	int ret;
>   
>   	/* process and issue discard requests */
> +	blocknr_list_init(&discard_tmp);
>   	do {
>   		atom = get_current_atom_locked();
> -		ret = discard_atom(atom);
> +		ret = discard_atom(atom, &discard_tmp);
>   	} while (ret == -E_REPEAT);


This is a leak of disk space on the error paths:
if (ret != 0 && ret != -E_REPEAT), then the delete_set is empty
or incomplete at this point.


> [...]


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-19 21:20                 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Edward Shishkin
@ 2014-07-20 10:06                   ` Ivan Shapovalov
  2014-07-20 12:33                     ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-20 10:06 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

> On 20 July 2014, at 1:20, Edward Shishkin <edward.shishkin@gmail.com> wrote:
> 
> 
>> On 07/14/2014 03:56 AM, Edward Shishkin wrote:
>> 
>>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
> [...]
>>> 
>>> Nothing seems to call at least check_free_blocks(begin, end)...
>> 
>> 
>> Oh, bad... I thought all this time that extents of the delete sets are still dirty
>> in the working bitmap at the moment of discard.
>> Hmm, I definitely don't want to check the whole extents for discard...
> 
> 
> False alarm, actually..
> 
> Nothing is wrong if someone makes them dirty before we issue our discard
> requests: those changes will be committed in a later session of holding
> the commit_mutex, _after_ our discard. Everything is isolated.. For the
> same reason we don't care if someone allocates blocks in the head and
> tail paddings before we issue our padded discard extents.

I was just thinking about the very same possible race...

But wait, isn't the call to current_atom_complete_writes() placed before the acquisition of commit_mutex? I was under the impression that relocate sets could be written while another atom is being committed under the mutex.

--
Ivan Shapovalov / intelfx /

(Sent from a phone. Havoc may be wreaked on the formatting.)

> We only need to make sure that
> they were clean during _our_ commit session.
> 
> Anyway, let's use delayed deallocation. I like it better than aux_delete_set.
> After all, we can roll everything back at every moment, if any problems..
> 
> Edward.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc".
  2014-07-20  1:11                         ` Edward Shishkin
@ 2014-07-20 10:09                           ` Ivan Shapovalov
  0 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-20 10:09 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

> On 20 July 2014, at 5:11, Edward Shishkin <edward.shishkin@gmail.com> wrote:
> 
> 
>> On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
>> This is INCOMPLETE and needs a modified discard_extent() because blocks inside
>> of original extents are always allocated but allowed for discarding.
>> 
>> Signed-off-by: Ivan Shapovalov <intelfx100@gmail.com>
>> ---
>>  fs/reiser4/block_alloc.c | 28 +++++-----------------
>>  fs/reiser4/discard.c     | 60 ++++++++++++++++++++++++++++++++++--------------
>>  fs/reiser4/discard.h     |  4 +++-
>>  fs/reiser4/txnmgr.c      | 24 -------------------
>>  fs/reiser4/txnmgr.h      | 13 ++---------
>>  5 files changed, 54 insertions(+), 75 deletions(-)
>> 
>> [...]
>> @@ -1150,13 +1132,15 @@ void reiser4_post_commit_hook(void)
>>  
>>  void reiser4_post_write_back_hook(void)
>>  {
>> +	struct list_head discard_tmp;
>>  	txn_atom *atom;
>>  	int ret;
>>  
>>  	/* process and issue discard requests */
>> +	blocknr_list_init(&discard_tmp);
>>  	do {
>>  		atom = get_current_atom_locked();
>> -		ret = discard_atom(atom);
>> +		ret = discard_atom(atom, &discard_tmp);
>>  	} while (ret == -E_REPEAT);
> 
> 
> This is a leak of disk space on the error paths:
> if (ret != 0 && ret != -E_REPEAT), then the delete_set is empty
> or incomplete at this point.

Yeah, I've missed this. I'll add a splice to discard_atom()'s error path.
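
A sketch of that splice (hypothetical), replacing the tail of
discard_atom():

	/* Perform actual dirty work. */
	ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 0);

	/* Add processed extents to the temporary list. */
	blocknr_list_merge(&discard_set, processed_set);

	if (ret != 0) {
		/* Splice everything back to the atom so that the deferred
		 * delete set can still be applied despite the error. */
		atom = get_current_atom_locked();
		blocknr_list_merge(processed_set, &atom->discard.delete_set);
		spin_unlock_atom(atom);
		return ret;
	}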

Thanks,
--
Ivan Shapovalov / intelfx /

(Sent from a phone. Havoc may be wreaked on the formatting.)

>>        if (ret) {
>> @@ -1165,7 +1149,7 @@ void reiser4_post_write_back_hook(void)
>>        /* do the block deallocation which was deferred
>>         until commit is done */
>> -    atom_dset_deferred_apply(atom, apply_dset, NULL, 0);
>> +    atom_dset_deferred_apply(atom, apply_dset, NULL, 1);
>>        assert("zam-504", get_current_super_private() != NULL);
>>      sa_post_write_back_hook();
>> diff --git a/fs/reiser4/discard.c b/fs/reiser4/discard.c
>> index 3c8ee89..0596938 100644
>> --- a/fs/reiser4/discard.c
>> +++ b/fs/reiser4/discard.c
>> @@ -33,24 +33,42 @@
>>   * MECHANISM:
>>   *
>>   * During the transaction deallocated extents are recorded in atom's delete
>> - * sets. There are two delete sets, because data in one of them (delete_set) is
>> - * also used by other parts of reiser4. The second delete set (aux_delete_set)
>> - * complements the first one and is maintained only when discard is enabled.
>> + * set. In reiser4, there are two methods to deallocate a block:
>> + * 1. deferred deallocation, enabled by BA_DEFER flag to reiser4_dealloc_block().
>> + *    In this mode, blocks are stored to delete set instead of being marked free
>> + *    immediately. After committing the transaction, the delete set is "applied"
>> + *    by the block allocator and all these blocks are marked free in memory
>> + *    (see reiser4_post_commit_hook()).
>> + *    Space management plugins also read the delete set to update on-disk
>> + *    allocation records (see reiser4_pre_commit_hook()).
>> + * 2. immediate deallocation (the opposite).
>> + *    In this mode, blocks are marked free immediately. This is used by the
>> + *    journal subsystem to manage space used by the journal records, so these
>> + *    allocations are not visible to the space management plugins and never hit
>> + *    the disk.
>>   *
>> - * Together these sets constitute "the discard set" -- blocks that have to be
>> - * considered for discarding. On atom commit we will generate a minimal
>> - * superset of the discard set, comprised of whole erase units.
>> + * When discard is enabled, all immediate deallocations become deferred. This
>> + * is OK because journal's allocations happen after reiser4_pre_commit_hook()
>> + * where the on-disk space allocation records are updated. So, in this mode
>> + * the atom's delete set becomes "the discard set" -- list of blocks that have
>> + * to be considered for discarding.
>> + *
>> + * On atom commit we will generate a minimal superset of the discard set,
>> + * comprised of whole erase units.
>> + *
>> + * Discarding is performed before reiser4_post_commit_hook(), hence all extents
>> + * in the discard set are still marked as allocated and cannot contain any data.
>> + * Thus we can avoid any checks for blocks directly present in the discard set.
>> + *
>> + * However, we pad each extent from both sides to erase unit boundaries, and
>> + * these paddings still have to be checked if they fall outside of initial
>> + * extent (may not happen if block size > erase unit size).
>>   *
>>   * So, at commit time the following actions take place:
>>   * - delete sets are merged to form the discard set;
>>   * - elements of the discard set are sorted;
>>   * - the discard set is iterated, joining any adjacent extents;
>> - * - each of resulting extents is "covered" by erase units:
>> - *   - its start is rounded down to the closest erase unit boundary;
>> - *   - starting from this block, extents of erase unit length are created
>> - *     until the original extent is fully covered;
>> - * - the calculated erase units are checked to be fully deallocated;
>> - * - remaining (valid) erase units are then passed to blkdev_issue_discard().
>> + * - <TODO>
>>   */
>>    #include "discard.h"
>> @@ -167,7 +185,7 @@ static int discard_extent(txn_atom *atom UNUSED_ARG,
>>      return 0;
>>  }
>>  -int discard_atom(txn_atom *atom)
>> +int discard_atom(txn_atom *atom, struct list_head *processed_set)
>>  {
>>      int ret;
>>      struct list_head discard_set;
>> @@ -178,9 +196,14 @@ int discard_atom(txn_atom *atom)
>>      }
>>        assert("intelfx-28", atom != NULL);
>> +    assert("intelfx-59", processed_entries != NULL);
>>  -    if (list_empty(&atom->discard.delete_set) &&
>> -        list_empty(&atom->discard.aux_delete_set)) {
>> +    if (list_empty(&atom->discard.delete_set)) {
>> +        /*
>> +         * Nothing left to discard. Move all processed entries back to atom
>> +         * for reiser4_post_commit_hook().
>> +         */
>> +        blocknr_list_merge(processed_set, &atom->discard.delete_set);
>>          spin_unlock_atom(atom);
>>          return 0;
>>      }
>> @@ -188,14 +211,17 @@ int discard_atom(txn_atom *atom)
>>      /* Take the delete sets from the atom in order to release atom spinlock. */
>>      blocknr_list_init(&discard_set);
>>      blocknr_list_merge(&atom->discard.delete_set, &discard_set);
>> -    blocknr_list_merge(&atom->discard.aux_delete_set, &discard_set);
>>      spin_unlock_atom(atom);
>>        /* Sort the discard list, joining adjacent and overlapping extents. */
>>      blocknr_list_sort_and_join(&discard_set);
>>        /* Perform actual dirty work. */
>> -    ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 1);
>> +    ret = blocknr_list_iterator(NULL, &discard_set, &discard_extent, NULL, 0);
>> +
>> +    /* Add processed extents to the temporary list. */
>> +    blocknr_list_merge(&discard_set, processed_set);
>> +
>>      if (ret != 0) {
>>          return ret;
>>      }
>> diff --git a/fs/reiser4/discard.h b/fs/reiser4/discard.h
>> index ea46334..95bbe8b 100644
>> --- a/fs/reiser4/discard.h
>> +++ b/fs/reiser4/discard.h
>> @@ -14,8 +14,10 @@
>>   * if discard is enabled. In this case the delete sets are cleared.
>>   *
>>   * @atom should be locked on entry and is unlocked on exit.
>> + * @processed_set keeps state between invocations in the -E_REPEAT pattern;
>> + *                must be initialized with blocknr_list_init().
>>   */
>> -extern int discard_atom(txn_atom *atom);
>> +extern int discard_atom(txn_atom *atom, struct list_head *processed_set);
>>    /* __FS_REISER4_DISCARD_H__ */
>>  #endif
>> diff --git a/fs/reiser4/txnmgr.c b/fs/reiser4/txnmgr.c
>> index 317bc4f..d73ecb9 100644
>> --- a/fs/reiser4/txnmgr.c
>> +++ b/fs/reiser4/txnmgr.c
>> @@ -3081,7 +3081,6 @@ void atom_dset_init(txn_atom *atom)
>>  {
>>      if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
>>          blocknr_list_init(&atom->discard.delete_set);
>> -        blocknr_list_init(&atom->discard.aux_delete_set);
>>      } else {
>>          blocknr_set_init(&atom->nodiscard.delete_set);
>>      }
>> @@ -3091,7 +3090,6 @@ void atom_dset_destroy(txn_atom *atom)
>>  {
>>      if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
>>          blocknr_list_destroy(&atom->discard.delete_set);
>> -        blocknr_list_destroy(&atom->discard.aux_delete_set);
>>      } else {
>>          blocknr_set_destroy(&atom->nodiscard.delete_set);
>>      }
>> @@ -3101,7 +3099,6 @@ void atom_dset_merge(txn_atom *from, txn_atom *to)
>>  {
>>      if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
>>          blocknr_list_merge(&from->discard.delete_set, &to->discard.delete_set);
>> -        blocknr_list_merge(&from->discard.aux_delete_set, &to->discard.aux_delete_set);
>>      } else {
>>          blocknr_set_merge(&from->nodiscard.delete_set, &to->nodiscard.delete_set);
>>      }
>> @@ -3155,27 +3152,6 @@ extern int atom_dset_deferred_add_extent(txn_atom *atom,
>>      return ret;
>>  }
>>  -extern int atom_dset_immediate_add_extent(txn_atom *atom,
>> -                                          void **new_entry,
>> -                                          const reiser4_block_nr *start,
>> -                                          const reiser4_block_nr *len)
>> -{
>> -    int ret;
>> -
>> -    if (reiser4_is_set(reiser4_get_current_sb(), REISER4_DISCARD)) {
>> -        ret = blocknr_list_add_extent(atom,
>> -                                     &atom->discard.aux_delete_set,
>> -                                     (blocknr_list_entry**)new_entry,
>> -                                     start,
>> -                                     len);
>> -    } else {
>> -        /* no-op */
>> -        ret = 0;
>> -    }
>> -
>> -    return ret;
>> -}
>> -
>>  /*
>>   * Local variables:
>>   * c-indentation-style: "K&R"
>> diff --git a/fs/reiser4/txnmgr.h b/fs/reiser4/txnmgr.h
>> index 02757a9..05990d8 100644
>> --- a/fs/reiser4/txnmgr.h
>> +++ b/fs/reiser4/txnmgr.h
>> @@ -259,10 +259,6 @@ struct txn_atom {
>>              /* The atom's delete set. It collects block numbers which were
>>                 deallocated with BA_DEFER, i. e. of ordinary nodes. */
>>              struct list_head delete_set;
>> -
>> -            /* The atom's auxiliary delete set. It collects block numbers
>> -               which were deallocated without BA_DEFER, i. e. immediately. */
>> -            struct list_head aux_delete_set;
>>          } discard;
>>      };
>>  @@ -527,9 +523,8 @@ extern int blocknr_list_iterator(txn_atom *atom,
>>    /* These are wrappers for accessing and modifying atom's delete lists,
>>     depending on whether discard is enabled or not.
>> -   If it is enabled. both deferred and immediate delete lists are maintained,
>> -   and (less memory efficient) blocknr_lists are used for storage. Otherwise, only
>> -   deferred delete list is maintained and blocknr_set is used for its storage. */
>> +   If it is enabled, a (less memory-efficient) blocknr_list is used for delete
>> +   list storage. Otherwise, a blocknr_set is used for this purpose. */
>>  extern void atom_dset_init(txn_atom *atom);
>>  extern void atom_dset_destroy(txn_atom *atom);
>>  extern void atom_dset_merge(txn_atom *from, txn_atom *to);
>> @@ -541,10 +536,6 @@ extern int atom_dset_deferred_add_extent(txn_atom *atom,
>>                                           void **new_entry,
>>                                           const reiser4_block_nr *start,
>>                                           const reiser4_block_nr *len);
>> -extern int atom_dset_immediate_add_extent(txn_atom *atom,
>> -                                          void **new_entry,
>> -                                          const reiser4_block_nr *start,
>> -                                          const reiser4_block_nr *len);
>>    /* flush code takes care about how to fuse flush queues */
>>  extern void flush_init_atom(txn_atom * atom);
> 

* Re: [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation.
  2014-07-17 11:14                           ` Edward Shishkin
@ 2014-07-20 11:33                             ` Ivan Shapovalov
  0 siblings, 0 replies; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-20 11:33 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel

> On 17 Jul 2014, at 15:14, Edward Shishkin <edward.shishkin@gmail.com> wrote:
> 
>> On 07/17/2014 11:46 AM, Ivan Shapovalov wrote:
>>> On Thursday 17 July 2014 at 01:35:20, Edward Shishkin wrote:    
>>>> On 07/16/2014 01:24 PM, Ivan Shapovalov wrote:
>>>> This needs your version of the discard algorithm, because blocks inside
>>>> extents of the discard set are now marked as allocated.
>>> 
>>> OK, I'll try to finish it over the weekend..
>>> 
>>> 
>>>> BTW, have you dropped the case with erase unit size < block size?
>>> 
>>> In my version all erase units are supported except those that are
>>> not a power of 2. We can support all erase units without exceptions,
>>> if needed.
>> I mean, when erase unit size < block size, erase unit size is forced to be
>> exactly one block. Again, this is OK, but in this case any alignment (then by
>> definition less than one block) will be dropped and represented as zero,
>> and _this_ deserves a warning :)
> 
> 
> The case of alignment % block_size != 0 warrants disabling discard
> support, because in this case all this science with heads and tails
> simply doesn't work.

<bikeshed>
If erase unit size == block size (or less), then it's OK. In this case we could unconditionally use a tail/head padding of 1 block instead of cutting the tail/head when the padding is unavailable.
But then again, </bikeshed>
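
For concreteness, the padding arithmetic I have in mind looks roughly like
this (a sketch; the helper name is made up, and erase_unit is the erase unit
size in blocks, assumed to be a power of 2 as discussed above):

/* Sketch: extend the extent [start, start + len) to erase unit
 * boundaries. Since erase_unit is a power of 2, masks replace 64-bit
 * division. If erase_unit == 1, i.e. erase unit size <= block size,
 * both paddings degenerate to zero -- exactly the case described
 * above. */
static void pad_extent_to_erase_units(reiser4_block_nr *start,
				      reiser4_block_nr *len,
				      reiser4_block_nr erase_unit)
{
	reiser4_block_nr mask = erase_unit - 1;
	reiser4_block_nr head_pad = *start & mask; /* blocks added in front */

	*start -= head_pad;
	*len = (head_pad + *len + mask) & ~mask;
}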

--
Ivan Shapovalov / intelfx /

(Sent from a phone. Havoc may be wreaked on the formatting.)

>> // btw, as you probably guess, these two patches are completely untested.
>> 
>> Thanks,
> 

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-20 10:06                   ` Ivan Shapovalov
@ 2014-07-20 12:33                     ` Edward Shishkin
  2014-07-20 21:04                       ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-20 12:33 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/20/2014 12:06 PM, Ivan Shapovalov wrote:
>> On 20 Jul 2014, at 1:20, Edward Shishkin <edward.shishkin@gmail.com> wrote:
>>
>>
>>> On 07/14/2014 03:56 AM, Edward Shishkin wrote:
>>>
>>>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
>> [...]
>>>> Nothing seems to call at least check_free_blocks(begin, end)...
>>>
>>> Oh, bad... I thought all this time that extents of the delete sets are still dirty
>>> in the working bitmap at the moment of discard.
>>> Hmm, I definitely don't want to check the whole extents for discard...
>>
>> False alarm, actually..
>>
>> Nothing is wrong if someone makes them dirty before issuing our discard
>> requests: his changes will be committed in later sessions of holding
>> the commit_mutex, _after_ our discard. Everything is isolated.. For the same
>> reason we don't care if someone allocates blocks in head and tail paddings
>> before issuing our padded discard extents.
> I was just thinking about the very same possible race...
>
> But well, isn't the call to current_atom_complete_writes() placed before the acquisition of the commit_mutex? I was under the impression that relocate sets could be written while another atom is being committed under the mutex.

take the commit_mutex before flush if discard_enabled && should_check_heads.

>
> --
> Ivan Shapovalov / intelfx /
>
> (Sent from a phone. Havoc may be wreaked on the formatting.)
>
>> We only need to make sure that
>> they were clean at _our_ commit session.
>>
>> Anyway, let's use delayed deallocation. I like it better than aux_delete_set.
>> After all, we can roll everything back at every moment, if any problems..
>>
>> Edward.


* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-20 12:33                     ` Edward Shishkin
@ 2014-07-20 21:04                       ` Edward Shishkin
  2014-07-20 22:49                         ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-20 21:04 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/20/2014 02:33 PM, Edward Shishkin wrote:
>
> On 07/20/2014 12:06 PM, Ivan Shapovalov wrote:
>>> On 20 Jul 2014, at 1:20, Edward Shishkin <edward.shishkin@gmail.com> wrote:
>>>
>>>
>>>> On 07/14/2014 03:56 AM, Edward Shishkin wrote:
>>>>
>>>>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
>>> [...]
>>>>> Nothing seems to call at least check_free_blocks(begin, end)...
>>>>
>>>> Oh, bad... I thought all this time that extents of the delete sets 
>>>> are still dirty
>>>> in the working bitmap at the moment of discard.
>>>> Hmm, I definitely don't want to check the whole extents for discard...
>>>
>>> False alarm, actually..
>>>
>>> Nothing is wrong if someone makes them dirty before issuing our discard
>>> requests: his changes will be committed in later sessions of holding
>>> the commit_mutex, _after_ our discard. Everything is isolated.. For the same
>>> reason we don't care if someone allocates blocks in head and tail paddings
>>> before issuing our padded discard extents.
>> I was just thinking about the very same possible race...
>>
>> But well, isn't the call to current_atom_complete_writes() placed
>> before the acquisition of the commit_mutex? I was under the impression
>> that relocate sets could be written while another atom is being committed
>> under the mutex.
>
> take the commit_mutex before flush if discard_enabled && 
> should_check_heads.


Actually, this is not so simple. E.g. I got a deadlock..
Another option is to make the head/tail paddings dirty and then
update the corresponding extent of the delete_set.
One more option is to not support such discard params.

Edward.


>
>>
>> -- 
>> Ivan Shapovalov / intelfx /
>>
>> (Sent from a phone. Havoc may be wreaked on the formatting.)
>>
>>> We only need to make sure that
>>> they were clean at _our_ commit session.
>>>
>>> Anyway, let's use delayed deallocation. I like it better than 
>>> aux_delete_set.
>>> After all, we can roll everything back at every moment, if any 
>>> problems..
>>>
>>> Edward.
>


* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-20 21:04                       ` Edward Shishkin
@ 2014-07-20 22:49                         ` Edward Shishkin
  2014-07-20 23:14                           ` Ivan Shapovalov
  0 siblings, 1 reply; 34+ messages in thread
From: Edward Shishkin @ 2014-07-20 22:49 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/20/2014 11:04 PM, Edward Shishkin wrote:
>
> On 07/20/2014 02:33 PM, Edward Shishkin wrote:
>>
>> On 07/20/2014 12:06 PM, Ivan Shapovalov wrote:
>>>> On 20 Jul 2014, at 1:20, Edward Shishkin <edward.shishkin@gmail.com> wrote:
>>>>
>>>>
>>>>> On 07/14/2014 03:56 AM, Edward Shishkin wrote:
>>>>>
>>>>>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
>>>> [...]
>>>>>> Nothing seems to call at least check_free_blocks(begin, end)...
>>>>>
>>>>> Oh, bad... I thought all this time that extents of the delete sets 
>>>>> are still dirty
>>>>> in the working bitmap at the moment of discard.
>>>>> Hmm, I definitely don't want to check the whole extents for 
>>>>> discard...
>>>>
>>>> False alarm, actually..
>>>>
>>>> Nothing is wrong if someone makes them dirty before issuing our discard
>>>> requests: his changes will be committed in later sessions of holding
>>>> the commit_mutex, _after_ our discard. Everything is isolated.. For the same
>>>> reason we don't care if someone allocates blocks in head and tail paddings
>>>> before issuing our padded discard extents.
>>> I was just thinking about the very same possible race...
>>>
>>> But well, isn't the call to current_atom_complete_writes() placed
>>> before the acquisition of the commit_mutex? I was under the impression
>>> that relocate sets could be written while another atom is being committed
>>> under the mutex.
>>
>> take the commit_mutex before flush if discard_enabled && 
>> should_check_heads.
>
>
> Actually, this is not so simple. E.g. I got a deadlock..
> Another option is to make the head/tail paddings dirty and then
> update the corresponding extent of the delete_set.
> One more option is to not support such discard params.


Actually what we are trying to implement is "precise discard". This is
a rather complicated task. I looked at other file systems: nobody bothers
with paddings; they just pass a freed extent to blkdev_issue_discard(),
which cuts off the head and the tail. I suggest we do the same. We have done
our best. Some file systems can't even afford to accumulate the extents.
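
I.e., something as simple as this (a sketch only; the helper name is made
up, and the usual conversion from file system blocks to 512-byte sectors
is assumed):

/* Sketch of the "no paddings" approach: hand the freed extent to the
 * block layer as-is. Only the whole erase units that fit inside it
 * end up trimmed, which effectively cuts the unaligned head and tail
 * for us. */
static int discard_extent_simple(struct super_block *sb,
				 reiser4_block_nr start,
				 reiser4_block_nr len)
{
	sector_t sectors_per_block = sb->s_blocksize >> 9;

	return blkdev_issue_discard(sb->s_bdev,
				    start * sectors_per_block,
				    len * sectors_per_block,
				    GFP_NOFS, 0);
}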

Edward.

* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-20 22:49                         ` Edward Shishkin
@ 2014-07-20 23:14                           ` Ivan Shapovalov
  2014-07-22  8:57                             ` Edward Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Ivan Shapovalov @ 2014-07-20 23:14 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel


On Monday 21 July 2014 at 00:49:14, Edward Shishkin wrote:	
> 
> On 07/20/2014 11:04 PM, Edward Shishkin wrote:
> >
> > On 07/20/2014 02:33 PM, Edward Shishkin wrote:
> >>
> >> On 07/20/2014 12:06 PM, Ivan Shapovalov wrote:
> >>>> On 20 Jul 2014, at 1:20, Edward Shishkin <edward.shishkin@gmail.com> wrote:
> >>>>
> >>>>
> >>>>> On 07/14/2014 03:56 AM, Edward Shishkin wrote:
> >>>>>
> >>>>>> On 07/13/2014 09:18 PM, Ivan Shapovalov wrote:
> >>>> [...]
> >>>>>> Nothing seems to call at least check_free_blocks(begin, end)...
> >>>>>
> >>>>> Oh, bad... I thought all this time that extents of the delete sets 
> >>>>> are still dirty
> >>>>> in the working bitmap at the moment of discard.
> >>>>> Hmm, I definitely don't want to check the whole extents for 
> >>>>> discard...
> >>>>
> >>>> False alarm, actually..
> >>>>
> >>>> Nothing is wrong if someone makes them dirty before issuing our discard
> >>>> requests: his changes will be committed in later sessions of holding
> >>>> the commit_mutex, _after_ our discard. Everything is isolated.. For the same
> >>>> reason we don't care if someone allocates blocks in head and tail paddings
> >>>> before issuing our padded discard extents.
> >>> I was just thinking about the very same possible race...
> >>>
> >>> But well, isn't the call to current_atom_complete_writes() placed
> >>> before the acquisition of the commit_mutex? I was under the impression
> >>> that relocate sets could be written while another atom is being committed
> >>> under the mutex.
> >>
> >> take the commit_mutex before flush if discard_enabled && 
> >> should_check_heads.
> >
> >
> > Actually, this is not so simple. E.g. I got a deadlock..
> > Another option is to make the head/tail paddings dirty and then
> > update the corresponding extent of the delete_set.
> > One more option is to not support such discard params.
> 
> 
> Actually what we are trying to implement is "precise discard". This is
> a rather complicated task. I looked at other file systems: nobody bothers
> with paddings; they just pass a freed extent to blkdev_issue_discard(),
> which cuts off the head and the tail. I suggest we do the same. We have done
> our best. Some file systems can't even afford to accumulate the extents.
> 
> Edward.

Let's do it the ideal way; after all, this is reiser4 ;) I also think that we can
just mark the paddings as allocated and add them to the extent in question (so that
they will be freed by reiser4_post_commit_hook). This looks simple enough.

The block allocator interface, IIRC, allows us to tell it "allocate right here
or return failure".

/* I had no time to look into any code during this weekend. Tomorrow I'll
hopefully repost the discard-before-dealloc patches and look into this.. */

-- 
Ivan Shapovalov / intelfx /


* Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.
  2014-07-20 23:14                           ` Ivan Shapovalov
@ 2014-07-22  8:57                             ` Edward Shishkin
  0 siblings, 0 replies; 34+ messages in thread
From: Edward Shishkin @ 2014-07-22  8:57 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel


On 07/21/2014 01:14 AM, Ivan Shapovalov wrote:
> On Monday 21 July 2014 at 00:49:14, Edward Shishkin wrote:	
[...]
>>
>> Actually, this is not so simple. E.g. I got a deadlock..
>> Another option is to make the head/tail paddings dirty and then
>> update the corresponding extent of the delete_set.
>> One more option is to not support such discard params.
>>
>> Actually what we are trying to implement is "precise discard". This is
>> a rather complicated task. I looked at other file systems: nobody bothers
>> with paddings; they just pass a freed extent to blkdev_issue_discard(),
>> which cuts off the head and the tail. I suggest we do the same. We have done
>> our best. Some file systems can't even afford to accumulate the extents.
>>
>> Edward.
> Let's do it the ideal way; after all, this is reiser4 ;) I also think that we can
> just mark the paddings as allocated and add them to the extent in question (so that
> they will be freed by reiser4_post_commit_hook). This looks simple enough.
>
> The block allocator interface, IIRC, allows us to tell it "allocate right here
> or return failure".
>
> /* I had no time to look into any code during this weekend. Tomorrow I'll
> hopefully repost the discard-before-dealloc patches and look into this.. */

I think we'll also need a "leftward" version of check_blocks() to check
head paddings. The most unpleasant case is when the check fails at the
next bitmap node. It seems that the most reasonable way to handle it
would be just to return 0 (dirty) and clear everything when applying
the "updated" delete set.

Edward.


Thread overview: 34+ messages
2014-06-22 10:48 [PATCHv6 0/5] reiser4: discard support: initial implementation Ivan Shapovalov
2014-06-22 10:48 ` [PATCHv6 1/5] reiser4: make space_allocator's check_blocks() reusable Ivan Shapovalov
2014-06-22 10:48 ` [PATCHv6 2/5] reiser4: add an implementation of "block lists", splitted off the discard code Ivan Shapovalov
2014-06-22 10:48 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Ivan Shapovalov
2014-07-06 23:47   ` Edward Shishkin
2014-07-09 12:40     ` Ivan Shapovalov
2014-07-09 16:35       ` Edward Shishkin
2014-07-13  1:33       ` Edward Shishkin
2014-07-13 12:47         ` Ivan Shapovalov
2014-07-13 19:04           ` Edward Shishkin
2014-07-13 19:18             ` Ivan Shapovalov
2014-07-14  1:56               ` Edward Shishkin
2014-07-15 11:42                 ` Ivan Shapovalov
2014-07-16 10:23                   ` Edward Shishkin
2014-07-16 10:26                     ` Edward Shishkin
2014-07-16 11:24                     ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
2014-07-16 11:24                       ` [veryRFC] [PATCH 1/2] reiser4: discard support: perform discards and deallocations after writing logs Ivan Shapovalov
2014-07-16 11:24                       ` [veryRFC] [PATCH 2/2] reiser4: discard support: proof-of-concept for "discard before dealloc" Ivan Shapovalov
2014-07-20  1:11                         ` Edward Shishkin
2014-07-20 10:09                           ` Ivan Shapovalov
2014-07-16 14:19                       ` [veryRFC] [PATCH 0/2] reiser4: discard before dealloc: first approximation Ivan Shapovalov
2014-07-16 23:35                       ` Edward Shishkin
2014-07-17  9:46                         ` Ivan Shapovalov
2014-07-17 11:14                           ` Edward Shishkin
2014-07-20 11:33                             ` Ivan Shapovalov
2014-07-19 21:20                 ` [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists Edward Shishkin
2014-07-20 10:06                   ` Ivan Shapovalov
2014-07-20 12:33                     ` Edward Shishkin
2014-07-20 21:04                       ` Edward Shishkin
2014-07-20 22:49                         ` Edward Shishkin
2014-07-20 23:14                           ` Ivan Shapovalov
2014-07-22  8:57                             ` Edward Shishkin
2014-06-22 10:48 ` [PATCHv6 4/5] reiser4: blocknr_list: use kmem_cache instead of kmalloc for allocating entries Ivan Shapovalov
2014-06-22 10:48 ` [PATCHv6 5/5] reiser4: blocknr_set: " Ivan Shapovalov
