All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators
@ 2017-04-02 20:03 Daniel Ferreira
  2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

This is the seventh version of a patch series that implements the GSoC
microproject of converting a recursive call to readdir() to use dir_iterator.

v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t
v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t
v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t
v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t

I screwed up in v6 because I had introduced a bug that in case git tried to open
a directory that did not exist using dir_iterator, the program would segfault. This
was amended and all commits are passing the tests. Sorry for not having tested
my changes properly.

CI build: https://travis-ci.org/theiostream/git

Since the changes in v6 were not reviewed, I'll just copy what was sent
back there.

> Back in v5, Michael had a number of suggestions, all of which were applied
> to this version (including a slightly modified version of his "biggish rewrite"
> project to make dir_iterator's state machine simpler). The only suggestion that
> did not make it into this series was that of not traversing into subdirectories,
> since I believe it would be better off in another series that actually required
> that feature (that is, I do not want a series to implement a feature it will
> not need). The same goes for Junio's thought on a flag to list *only* directories
> and no files on the v4 discussion.

> Junio and Peff's comments about how to write to files in the tests were also
> considered, and the tests were adjusted.

> I chose to squash both the state machine refactor and the addition of the
> new flags in a single commit. I do not know whether you will feel this is
> the right choice but it seemed natural, since most of the state machine's
> new logic would not even make sense without encompassing the new features.
> I am, of course, open for feedback on this decision.

Daniel Ferreira (5):
  dir_iterator: add tests for dir_iterator API
  remove_subtree(): test removing nested directories
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: refactor state machine model
  remove_subtree(): reimplement using iterators

 Makefile                        |   1 +
 dir-iterator.c                  | 196 ++++++++++++++++++++++++++--------------
 dir-iterator.h                  |  28 ++++--
 entry.c                         |  38 +++-----
 refs/files-backend.c            |   2 +-
 t/helper/.gitignore             |   1 +
 t/helper/test-dir-iterator.c    |  32 +++++++
 t/t0065-dir-iterator.sh         | 109 ++++++++++++++++++++++
 t/t2000-checkout-cache-clash.sh |  11 +++
 9 files changed, 316 insertions(+), 102 deletions(-)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

--
2.7.4 (Apple Git-66)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-02 20:03 ` Daniel Ferreira
  2017-04-03 22:31   ` Stefan Beller
  2017-04-02 20:03 ` [PATCH v7 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create t/helper/test-dir-iterator.c, which prints relevant information
about a directory tree iterated over with dir_iterator.

Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
iterate through a whole directory tree.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 Makefile                     |  1 +
 t/helper/.gitignore          |  1 +
 t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++++
 t/t0065-dir-iterator.sh      | 54 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 84 insertions(+)
 create mode 100644 t/helper/test-dir-iterator.c
 create mode 100755 t/t0065-dir-iterator.sh

diff --git a/Makefile b/Makefile
index 9b36068..41ce9ab 100644
--- a/Makefile
+++ b/Makefile
@@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-dir-iterator
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 758ed2e..a7d74d3 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-dir-iterator
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
new file mode 100644
index 0000000..06f03fc
--- /dev/null
+++ b/t/helper/test-dir-iterator.c
@@ -0,0 +1,28 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+int cmd_main(int argc, const char **argv) {
+	struct strbuf path = STRBUF_INIT;
+	struct dir_iterator *diter;
+
+	if (argc < 2) {
+		return 1;
+	}
+
+	strbuf_add(&path, argv[1], strlen(argv[1]));
+
+	diter = dir_iterator_begin(path.buf);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode))
+			printf("[d] ");
+		else
+			printf("[f] ");
+
+		printf("(%s) %s\n", diter->relative_path, diter->path.buf);
+	}
+
+	return 0;
+}
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
new file mode 100755
index 0000000..b857c07
--- /dev/null
+++ b/t/t0065-dir-iterator.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+
+test_description='Test directory iteration.'
+
+. ./test-lib.sh
+
+cat >expect-sorted-output <<-\EOF &&
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[d] (d) ./dir/d
+[d] (d/e) ./dir/d/e
+[d] (d/e/d) ./dir/d/e/d
+[f] (a/b/c/d) ./dir/a/b/c/d
+[f] (a/e) ./dir/a/e
+[f] (b) ./dir/b
+[f] (c) ./dir/c
+[f] (d/e/d/a) ./dir/d/e/d/a
+EOF
+
+test_expect_success 'dir-iterator should iterate through all files' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&
+	rm -rf dir &&
+
+	test_cmp expect-sorted-output actual-pre-order-sorted-output
+'
+
+cat >expect-pre-order-output <<-\EOF &&
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[f] (a/b/c/d) ./dir/a/b/c/d
+EOF
+
+test_expect_success 'dir-iterator should list files in the correct order' '
+	mkdir -p dir/a/b/c/ &&
+	>dir/a/b/c/d &&
+
+	test-dir-iterator ./dir >actual-pre-order-output &&
+	rm -rf dir &&
+
+	test_cmp expect-pre-order-output actual-pre-order-output
+'
+
+test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 2/5] remove_subtree(): test removing nested directories
  2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-02 20:03 ` Daniel Ferreira
  2017-04-03 22:35   ` Stefan Beller
  2017-04-02 20:03 ` [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 t/t2000-checkout-cache-clash.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
 	git checkout-index -a -f --prefix=there/
 '
 
+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+	echo content >path &&
+	git update-index --add path &&
+	rm path &&
+	mkdir -p path/with/nested/paths &&
+	echo content >path/file1 &&
+	echo content >path/with/nested/paths/file2 &&
+	git checkout-index -f -a &&
+	test ! -d path
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance
  2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
  2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
  2017-04-02 20:03 ` [PATCH v7 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-02 20:03 ` Daniel Ferreira
  2017-04-03 22:48   ` Stefan Beller
  2017-04-02 20:03 ` [PATCH v7 4/5] dir_iterator: refactor state machine model Daniel Ferreira
  2017-04-02 20:03 ` [PATCH v7 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  4 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Create inline helpers to dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..ce8bf81 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
 	struct dir_iterator_level *levels;
 };
 
+static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	level->dir_state = DIR_STATE_RECURSE;
+	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+		   iter->levels_alloc);
+	level = &iter->levels[iter->levels_nr++];
+	level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter)
+{
+	return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level)
+{
+	if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
+		if (errno != ENOENT)
+			warning("error reading path '%s': %s",
+				iter->base.path.buf,
+				strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * We have to set these each time because
+	 * the path strbuf might have been realloc()ed.
+	 */
+	iter->base.relative_path =
+		iter->base.path.buf + iter->levels[0].prefix_len;
+	iter->base.basename =
+		iter->base.path.buf + level->prefix_len;
+	level->dir_state = DIR_STATE_ITER;
+
+	return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				 * over; now prepare to iterate into
 				 * it.
 				 */
-				level->dir_state = DIR_STATE_RECURSE;
-				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-					   iter->levels_alloc);
-				level = &iter->levels[iter->levels_nr++];
-				level->initialized = 0;
+				push_dir_level(iter, level);
 				continue;
 			} else {
 				/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			 * This level is exhausted (or wasn't opened
 			 * successfully); pop up a level.
 			 */
-			if (--iter->levels_nr == 0)
+			if (pop_dir_level(iter) == 0)
 				return dir_iterator_abort(dir_iterator);
 
 			continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 						iter->base.path.buf, strerror(errno));
 
 				level->dir = NULL;
-				if (--iter->levels_nr == 0)
+				if (pop_dir_level(iter) == 0)
 					return dir_iterator_abort(dir_iterator);
 				break;
 			}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 				continue;
 
 			strbuf_addstr(&iter->base.path, de->d_name);
-			if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
-				if (errno != ENOENT)
-					warning("error reading path '%s': %s",
-						iter->base.path.buf,
-						strerror(errno));
-				continue;
-			}
 
-			/*
-			 * We have to set these each time because
-			 * the path strbuf might have been realloc()ed.
-			 */
-			iter->base.relative_path =
-				iter->base.path.buf + iter->levels[0].prefix_len;
-			iter->base.basename =
-				iter->base.path.buf + level->prefix_len;
-			level->dir_state = DIR_STATE_ITER;
+			if (set_iterator_data(iter, level))
+				continue;
 
 			return ITER_OK;
 		}
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 4/5] dir_iterator: refactor state machine model
  2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (2 preceding siblings ...)
  2017-04-02 20:03 ` [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-04-02 20:03 ` Daniel Ferreira
  2017-04-03  3:36   ` Michael Haggerty
  2017-04-02 20:03 ` [PATCH v7 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
  4 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira

Perform major refactor of dir_iterator_advance(). dir_iterator has
ceased to rely on a convoluted state machine mechanism of two loops and
two state variables (level.initialized and level.dir_state). This serves
to ease comprehension of the iterator mechanism and ease addition of new
features to the iterator.

Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.

Add an option for the dir_iterator API to iterate over the root
directory (the one it was initialized with) as well.

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the root directory. These flags do not conflict with each
other and may be used simultaneously.

Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced.

Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 dir-iterator.c               | 155 +++++++++++++++++++++++++++----------------
 dir-iterator.h               |  28 ++++++--
 refs/files-backend.c         |   2 +-
 t/helper/test-dir-iterator.c |   6 +-
 t/t0065-dir-iterator.sh      |  61 ++++++++++++++++-
 5 files changed, 183 insertions(+), 69 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index ce8bf81..18b7e68 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
 #include "dir-iterator.h"
 
 struct dir_iterator_level {
-	int initialized;
-
 	DIR *dir;
 
 	/*
@@ -20,8 +18,11 @@ struct dir_iterator_level {
 	 * iteration and also iterated into):
 	 */
 	enum {
-		DIR_STATE_ITER,
-		DIR_STATE_RECURSE
+		DIR_STATE_PUSHED,
+		DIR_STATE_PRE_ITERATION,
+		DIR_STATE_ITERATING,
+		DIR_STATE_POST_ITERATION,
+		DIR_STATE_EXHAUSTED
 	} dir_state;
 };
 
@@ -48,15 +49,20 @@ struct dir_iterator_int {
 	 * that will be included in this iteration.
 	 */
 	struct dir_iterator_level *levels;
+
+	/* Holds the flags passed to dir_iterator_begin(). */
+	unsigned flags;
 };
 
 static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
 {
-	level->dir_state = DIR_STATE_RECURSE;
 	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
 		   iter->levels_alloc);
+
+	/* Push a new level */
 	level = &iter->levels[iter->levels_nr++];
-	level->initialized = 0;
+	level->dir = NULL;
+	level->dir_state = DIR_STATE_PUSHED;
 }
 
 static inline int pop_dir_level(struct dir_iterator_int *iter)
@@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_it
 	}
 
 	/*
-	 * We have to set these each time because
-	 * the path strbuf might have been realloc()ed.
+	 * Check if we are dealing with the root directory as an
+	 * item that's being iterated through.
 	 */
-	iter->base.relative_path =
-		iter->base.path.buf + iter->levels[0].prefix_len;
+	if (level->dir_state != DIR_STATE_ITERATING &&
+		iter->levels_nr == 1) {
+		iter->base.relative_path = ".";
+	}
+	else {
+		iter->base.relative_path =
+			iter->base.path.buf + iter->levels[0].prefix_len;
+	}
 	iter->base.basename =
 		iter->base.path.buf + level->prefix_len;
-	level->dir_state = DIR_STATE_ITER;
 
 	return 0;
 }
 
+/*
+ * This function uses a state machine with the following states:
+ * -> DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
+ * dirpath has already been returned if pre-order traversal is set.
+ * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
 	struct dir_iterator_int *iter =
@@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			&iter->levels[iter->levels_nr - 1];
 		struct dirent *de;
 
-		if (!level->initialized) {
+		if (level->dir_state == DIR_STATE_PUSHED) {
+			level->dir_state = DIR_STATE_PRE_ITERATION;
+
+			if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
+				/* We may not want the root directory to be iterated over */
+				if (iter->levels_nr != 1 ||
+					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+					set_iterator_data(iter, level);
+					return ITER_OK;
+				}
+			}
+		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
 			/*
 			 * Note: dir_iterator_begin() ensures that
 			 * path is not the empty string.
@@ -107,64 +141,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			level->prefix_len = iter->base.path.len;
 
 			level->dir = opendir(iter->base.path.buf);
-			if (!level->dir && errno != ENOENT) {
-				warning("error opening directory %s: %s",
-					iter->base.path.buf, strerror(errno));
-				/* Popping the level is handled below */
-			}
-
-			level->initialized = 1;
-		} else if (S_ISDIR(iter->base.st.st_mode)) {
-			if (level->dir_state == DIR_STATE_ITER) {
+			if (!level->dir) {
 				/*
-				 * The directory was just iterated
-				 * over; now prepare to iterate into
-				 * it.
+				 * This level wasn't opened sucessfully; pretend we
+				 * iterated through it already.
 				 */
-				push_dir_level(iter, level);
+				if (errno != ENOENT) {
+					warning("error opening directory %s: %s",
+						iter->base.path.buf, strerror(errno));
+				}
+
+				level->dir_state = DIR_STATE_POST_ITERATION;
 				continue;
-			} else {
-				/*
-				 * The directory has already been
-				 * iterated over and iterated into;
-				 * we're done with it.
-				 */
 			}
-		}
 
-		if (!level->dir) {
-			/*
-			 * This level is exhausted (or wasn't opened
-			 * successfully); pop up a level.
-			 */
-			if (pop_dir_level(iter) == 0)
-				return dir_iterator_abort(dir_iterator);
-
-			continue;
-		}
-
-		/*
-		 * Loop until we find an entry that we can give back
-		 * to the caller:
-		 */
-		while (1) {
+			level->dir_state = DIR_STATE_ITERATING;
+		} else if (level->dir_state == DIR_STATE_ITERATING) {
 			strbuf_setlen(&iter->base.path, level->prefix_len);
 			errno = 0;
 			de = readdir(level->dir);
 
 			if (!de) {
-				/* This level is exhausted; pop up a level. */
+				/* In case of readdir() error */
 				if (errno) {
 					warning("error reading directory %s: %s",
 						iter->base.path.buf, strerror(errno));
-				} else if (closedir(level->dir))
-					warning("error closing directory %s: %s",
-						iter->base.path.buf, strerror(errno));
+				}
 
-				level->dir = NULL;
-				if (pop_dir_level(iter) == 0)
-					return dir_iterator_abort(dir_iterator);
-				break;
+				level->dir_state = DIR_STATE_POST_ITERATION;
+				continue;
 			}
 
 			if (is_dot_or_dotdot(de->d_name))
@@ -175,7 +180,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 			if (set_iterator_data(iter, level))
 				continue;
 
+			if (S_ISDIR(iter->base.st.st_mode)) {
+				push_dir_level(iter, level);
+				continue;
+			}
+
 			return ITER_OK;
+		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+			if (level->dir != NULL && closedir(level->dir)) {
+				warning("error closing directory %s: %s",
+					iter->base.path.buf, strerror(errno));
+			}
+			level->dir_state = DIR_STATE_EXHAUSTED;
+
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			/*
+			 * Since we are iterating through the dirpath
+			 * after we have gone through it, we still need
+			 * to get rid of the trailing slash we appended.
+			 */
+			strbuf_strip_suffix(&iter->base.path, "/");
+
+			if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) {
+				/* We may not want the root directory to be iterated over */
+				if (iter->levels_nr != 1 ||
+					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+					set_iterator_data(iter, level);
+					return ITER_OK;
+				}
+			}
+		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+			if (pop_dir_level(iter) == 0)
+				return dir_iterator_abort(dir_iterator);
 		}
 	}
 }
@@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
 	return ITER_DONE;
 }
 
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
 	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
 	struct dir_iterator *dir_iterator = &iter->base;
@@ -209,13 +245,16 @@ struct dir_iterator *dir_iterator_begin(const char *path)
 	if (!path || !*path)
 		die("BUG: empty path passed to dir_iterator_begin()");
 
+	iter->flags = flags;
+
 	strbuf_init(&iter->base.path, PATH_MAX);
 	strbuf_addstr(&iter->base.path, path);
 
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
-	iter->levels[0].initialized = 0;
+	iter->levels[0].dir = NULL;
+	iter->levels[0].dir_state = DIR_STATE_PUSHED;
 
 	return dir_iterator;
 }
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..0e82f36 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -11,13 +11,12 @@
  * Every time dir_iterator_advance() is called, update the members of
  * the dir_iterator structure to reflect the next path in the
  * iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
  *
  * A typical iteration looks like this:
  *
  *     int ok;
- *     struct iterator *iter = dir_iterator_begin(path);
+ *     struct iterator *iter = dir_iterator_begin(path, flags);
  *
  *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
  *             if (want_to_stop_iteration()) {
@@ -38,6 +37,22 @@
  * dir_iterator_advance() again.
  */
 
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found before iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found after iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ * of the root directory it is iterating through.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
 struct dir_iterator {
 	/* The current path: */
 	struct strbuf path;
@@ -57,15 +72,16 @@ struct dir_iterator {
 };
 
 /*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
  *
  * The iteration includes all paths under path, not including path
  * itself and not including "." or ".." entries.
  *
  * path is the starting directory. An internal copy will be made.
  */
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
 
 /*
  * Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..c29dc68 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	files_downcast(ref_store, 0, "reflog_iterator_begin");
 
 	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
-	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);
 	return ref_iterator;
 }
 
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 06f03fc..a1b8c78 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -6,6 +6,7 @@
 int cmd_main(int argc, const char **argv) {
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
+	unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
 
 	if (argc < 2) {
 		return 1;
@@ -13,7 +14,10 @@ int cmd_main(int argc, const char **argv) {
 
 	strbuf_add(&path, argv[1], strlen(argv[1]));
 
-	diter = dir_iterator_begin(path.buf);
+	if (argc == 3)
+		flag = atoi(argv[2]);
+
+	diter = dir_iterator_begin(path.buf, flag);
 
 	while (dir_iterator_advance(diter) == ITER_OK) {
 		if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index b857c07..ade3ee0 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -28,12 +28,28 @@ test_expect_success 'dir-iterator should iterate through all files' '
 	>dir/a/e &&
 	>dir/d/e/d/a &&
 
-	test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&
+	test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output &&
 	rm -rf dir &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
 '
 
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+	mkdir -p dir &&
+	mkdir -p dir/a/b/c/ &&
+	>dir/b &&
+	>dir/c &&
+	mkdir -p dir/d/e/d/ &&
+	>dir/a/b/c/d &&
+	>dir/a/e &&
+	>dir/d/e/d/a &&
+
+	test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output &&
+	rm -rf dir &&
+
+	test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
 cat >expect-pre-order-output <<-\EOF &&
 [d] (a) ./dir/a
 [d] (a/b) ./dir/a/b
@@ -41,14 +57,53 @@ cat >expect-pre-order-output <<-\EOF &&
 [f] (a/b/c/d) ./dir/a/b/c/d
 EOF
 
-test_expect_success 'dir-iterator should list files in the correct order' '
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
 	mkdir -p dir/a/b/c/ &&
 	>dir/a/b/c/d &&
 
-	test-dir-iterator ./dir >actual-pre-order-output &&
+	test-dir-iterator ./dir 1 >actual-pre-order-output &&
 	rm -rf dir &&
 
 	test_cmp expect-pre-order-output actual-pre-order-output
 '
 
+cat >expect-post-order-output <<-\EOF &&
+[f] (a/b/c/d) ./dir/a/b/c/d
+[d] (a/b/c) ./dir/a/b/c
+[d] (a/b) ./dir/a/b
+[d] (a) ./dir/a
+EOF
+
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+	mkdir -p dir/a/b/c/ &&
+	>dir/a/b/c/d &&
+
+	test-dir-iterator ./dir 2 >actual-post-order-output &&
+	rm -rf dir &&
+
+	test_cmp expect-post-order-output actual-post-order-output
+'
+
+cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+[d] (.) ./dir
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[f] (a/b/c/d) ./dir/a/b/c/d
+[d] (a/b/c) ./dir/a/b/c
+[d] (a/b) ./dir/a/b
+[d] (a) ./dir/a
+[d] (.) ./dir
+EOF
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+	mkdir -p dir/a/b/c/ &&
+	>dir/a/b/c/d &&
+
+	test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output &&
+	rm -rf dir &&
+
+	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
 test_done
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v7 5/5] remove_subtree(): reimplement using iterators
  2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
                   ` (3 preceding siblings ...)
  2017-04-02 20:03 ` [PATCH v7 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-02 20:03 ` Daniel Ferreira
  2017-04-03  3:42   ` Michael Haggerty
  4 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira @ 2017-04-02 20:03 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira, Daniel Ferreira

From: Daniel Ferreira <daniel.calibeta@gmail.com>

Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
 entry.c | 38 ++++++++++++--------------------------
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512d..bd06f41 100644
--- a/entry.c
+++ b/entry.c
@@ -3,6 +3,8 @@
 #include "dir.h"
 #include "streaming.h"
 #include "submodule.h"
+#include "iterator.h"
+#include "dir-iterator.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -45,33 +47,17 @@ static void create_directories(const char *path, int path_len,
 	free(buf);
 }
 
-static void remove_subtree(struct strbuf *path)
+static void remove_subtree(const char *path)
 {
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-	int origlen = path->len;
-
-	if (!dir)
-		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
-		strbuf_addch(path, '/');
-		strbuf_addstr(path, de->d_name);
-		if (lstat(path->buf, &st))
-			die_errno("cannot lstat '%s'", path->buf);
-		if (S_ISDIR(st.st_mode))
-			remove_subtree(path);
-		else if (unlink(path->buf))
-			die_errno("cannot unlink '%s'", path->buf);
-		strbuf_setlen(path, origlen);
+	struct dir_iterator *diter = dir_iterator_begin(path, DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);
+
+	while (dir_iterator_advance(diter) == ITER_OK) {
+		if (S_ISDIR(diter->st.st_mode)) {
+			if (rmdir(diter->path.buf))
+				die_errno("cannot rmdir '%s'", diter->path.buf);
+		} else if (unlink(diter->path.buf))
+			die_errno("cannot unlink '%s'", diter->path.buf);
 	}
-	closedir(dir);
-	if (rmdir(path->buf))
-		die_errno("cannot rmdir '%s'", path->buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
@@ -312,7 +298,7 @@ int checkout_entry(struct cache_entry *ce,
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path.buf);
-			remove_subtree(&path);
+			remove_subtree(path.buf);
 		} else if (unlink(path.buf))
 			return error_errno("unable to unlink old '%s'", path.buf);
 	} else if (state->not_new)
-- 
2.7.4 (Apple Git-66)


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 4/5] dir_iterator: refactor state machine model
  2017-04-02 20:03 ` [PATCH v7 4/5] dir_iterator: refactor state machine model Daniel Ferreira
@ 2017-04-03  3:36   ` Michael Haggerty
  2017-04-03 18:03     ` Daniel Ferreira (theiostream)
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2017-04-03  3:36 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds

On 04/02/2017 10:03 PM, Daniel Ferreira wrote:
> Perform major refactor of dir_iterator_advance(). dir_iterator has
> ceased to rely on a convoluted state machine mechanism of two loops and
> two state variables (level.initialized and level.dir_state). This serves
> to ease comprehension of the iterator mechanism and ease addition of new
> features to the iterator.
> 
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18). This is useful for
> recursively removing a directory and calling rmdir() on a directory only
> after all of its contents have been wiped.
> 
> Add an option for the dir_iterator API to iterate over the root
> directory (the one it was initialized with) as well.
> 
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned new features to be enabled. The new default behavior
> (i.e. flags set to 0) does not iterate over directories. Flag
> DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
> so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
> directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
> iterates over the root directory. These flags do not conflict with each
> other and may be used simultaneously.
> 
> Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
> the flags parameter introduced.
> 
> Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
> test "post-order" and "iterate-over-root" modes.

Nice. I think this version is a lot more understandable than the old
code. I also think it's getting very close to done.

> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  dir-iterator.c               | 155 +++++++++++++++++++++++++++----------------
>  dir-iterator.h               |  28 ++++++--
>  refs/files-backend.c         |   2 +-
>  t/helper/test-dir-iterator.c |   6 +-
>  t/t0065-dir-iterator.sh      |  61 ++++++++++++++++-
>  5 files changed, 183 insertions(+), 69 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index ce8bf81..18b7e68 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -4,8 +4,6 @@
>  #include "dir-iterator.h"
>  
>  struct dir_iterator_level {
> -	int initialized;
> -
>  	DIR *dir;
>  
>  	/*
> @@ -20,8 +18,11 @@ struct dir_iterator_level {
>  	 * iteration and also iterated into):
>  	 */
>  	enum {
> -		DIR_STATE_ITER,
> -		DIR_STATE_RECURSE
> +		DIR_STATE_PUSHED,
> +		DIR_STATE_PRE_ITERATION,
> +		DIR_STATE_ITERATING,
> +		DIR_STATE_POST_ITERATION,
> +		DIR_STATE_EXHAUSTED
>  	} dir_state;
>  };
>  
> @@ -48,15 +49,20 @@ struct dir_iterator_int {
>  	 * that will be included in this iteration.
>  	 */
>  	struct dir_iterator_level *levels;
> +
> +	/* Holds the flags passed to dir_iterator_begin(). */
> +	unsigned flags;
>  };
>  
>  static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
>  {
> -	level->dir_state = DIR_STATE_RECURSE;
>  	ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>  		   iter->levels_alloc);
> +
> +	/* Push a new level */
>  	level = &iter->levels[iter->levels_nr++];
> -	level->initialized = 0;
> +	level->dir = NULL;
> +	level->dir_state = DIR_STATE_PUSHED;
>  }
>  
>  static inline int pop_dir_level(struct dir_iterator_int *iter)
> @@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_it
>  	}
>  
>  	/*
> -	 * We have to set these each time because
> -	 * the path strbuf might have been realloc()ed.
> +	 * Check if we are dealing with the root directory as an
> +	 * item that's being iterated through.
>  	 */
> -	iter->base.relative_path =
> -		iter->base.path.buf + iter->levels[0].prefix_len;
> +	if (level->dir_state != DIR_STATE_ITERATING &&
> +		iter->levels_nr == 1) {
> +		iter->base.relative_path = ".";

Doesn't `iter->base.basename` also need special handling in this case?

> +	}
> +	else {

The Git project standard is to including the `else` on the same line as
the curly brace ending the `if` block:

	} else {

> +		iter->base.relative_path =
> +			iter->base.path.buf + iter->levels[0].prefix_len;
> +	}
>  	iter->base.basename =
>  		iter->base.path.buf + level->prefix_len;
> -	level->dir_state = DIR_STATE_ITER;
>  
>  	return 0;
>  }
>  
> +/*
> + * This function uses a state machine with the following states:
> + * -> DIR_STATE_PUSHED: the directory has been pushed to the
> + * iterator traversal tree.
> + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
> + * dirpath has already been returned if pre-order traversal is set.
> + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
> + * through it.
> + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
> + * We are ready to close it.
> + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
> + */
>  int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  {
>  	struct dir_iterator_int *iter =
> @@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			&iter->levels[iter->levels_nr - 1];
>  		struct dirent *de;
>  
> -		if (!level->initialized) {
> +		if (level->dir_state == DIR_STATE_PUSHED) {
> +			level->dir_state = DIR_STATE_PRE_ITERATION;
> +
> +			if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
> +				/* We may not want the root directory to be iterated over */
> +				if (iter->levels_nr != 1 ||
> +					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {

It would be a tiny improvement in readability to change the check
`iter->levels_nr != 1` to `iters->levels_nr > 1`, to help make it
clearer that this code block is not about `iters->level_nr == 0`.

> +					set_iterator_data(iter, level);

set_iterator_data() reports errors by returning -1, but here (and in the
DIR_ITERATOR_POST_ORDER_TRAVERSAL code block below) you fail to check
the return value.

If you are thinking to yourself "that can't happen because I already
checked `lstat()` in a previous iteration of the loop", remember that
another process might be changing things on the filesystem at the same
time as the iteration is running ("TOCTOU").

But of course that would make for slightly strange behavior if, during
the PRE_ORDER_TRAVERSAL, a directory is readable, but by the time of the
POST_ORDER_TRAVERSAL, it has disappeared. The iteration might list the
contents of a directory but not the POST_ORDER_TRAVERSAL of the
directory itself. Or, if both PRE and POST are being used, there would
be an "entering directory" event but no matching "leaving directory"
event, which could be confusing to a hypothetical caller who expects the
"entering directory" and "leaving directory" events to match up in pairs.

Furthermore, you are calling `lstat()` more often than necessary. This
function is quite expensive on Windows.

You could solve all three problems by running `lstat()` on the directory
once the first time you see it, storing its value in the
`dir_iterator_level`, and copying that value to `iter->base.st` when it
is needed. (This would fix the second problem by removing the need for a
failing `set_iterator_data()` call during the POST_ORDER_TRAVERSAL phase
of processing a directory.)

This would mean that we might return stale `lstat()` data to the caller,
but note that `lstat()` data is *always* stale, because it is always
possible that another process changes it the nanosecond after our
process reads it. So it's a possibility that the caller has to deal with
in any case.

> +					return ITER_OK;
> +				}
> +			}
> +		} else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
>  			/*
>  			 * Note: dir_iterator_begin() ensures that
>  			 * path is not the empty string.
> @@ -107,64 +141,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			level->prefix_len = iter->base.path.len;
>  
>  			level->dir = opendir(iter->base.path.buf);
> -			if (!level->dir && errno != ENOENT) {
> -				warning("error opening directory %s: %s",
> -					iter->base.path.buf, strerror(errno));
> -				/* Popping the level is handled below */
> -			}
> -
> -			level->initialized = 1;
> -		} else if (S_ISDIR(iter->base.st.st_mode)) {
> -			if (level->dir_state == DIR_STATE_ITER) {
> +			if (!level->dir) {
>  				/*
> -				 * The directory was just iterated
> -				 * over; now prepare to iterate into
> -				 * it.
> +				 * This level wasn't opened sucessfully; pretend we
> +				 * iterated through it already.
>  				 */
> -				push_dir_level(iter, level);
> +				if (errno != ENOENT) {
> +					warning("error opening directory %s: %s",
> +						iter->base.path.buf, strerror(errno));
> +				}
> +
> +				level->dir_state = DIR_STATE_POST_ITERATION;
>  				continue;
> -			} else {
> -				/*
> -				 * The directory has already been
> -				 * iterated over and iterated into;
> -				 * we're done with it.
> -				 */
>  			}
> -		}
>  
> -		if (!level->dir) {
> -			/*
> -			 * This level is exhausted (or wasn't opened
> -			 * successfully); pop up a level.
> -			 */
> -			if (pop_dir_level(iter) == 0)
> -				return dir_iterator_abort(dir_iterator);
> -
> -			continue;
> -		}
> -
> -		/*
> -		 * Loop until we find an entry that we can give back
> -		 * to the caller:
> -		 */
> -		while (1) {
> +			level->dir_state = DIR_STATE_ITERATING;
> +		} else if (level->dir_state == DIR_STATE_ITERATING) {
>  			strbuf_setlen(&iter->base.path, level->prefix_len);
>  			errno = 0;
>  			de = readdir(level->dir);
>  
>  			if (!de) {
> -				/* This level is exhausted; pop up a level. */
> +				/* In case of readdir() error */
>  				if (errno) {
>  					warning("error reading directory %s: %s",
>  						iter->base.path.buf, strerror(errno));
> -				} else if (closedir(level->dir))
> -					warning("error closing directory %s: %s",
> -						iter->base.path.buf, strerror(errno));
> +				}
>  
> -				level->dir = NULL;
> -				if (pop_dir_level(iter) == 0)
> -					return dir_iterator_abort(dir_iterator);
> -				break;
> +				level->dir_state = DIR_STATE_POST_ITERATION;
> +				continue;
>  			}
>  
>  			if (is_dot_or_dotdot(de->d_name))
> @@ -175,7 +180,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
>  			if (set_iterator_data(iter, level))
>  				continue;
>  
> +			if (S_ISDIR(iter->base.st.st_mode)) {
> +				push_dir_level(iter, level);
> +				continue;
> +			}
> +
>  			return ITER_OK;
> +		} else if (level->dir_state == DIR_STATE_POST_ITERATION) {
> +			if (level->dir != NULL && closedir(level->dir)) {
> +				warning("error closing directory %s: %s",
> +					iter->base.path.buf, strerror(errno));
> +			}
> +			level->dir_state = DIR_STATE_EXHAUSTED;
> +
> +			strbuf_setlen(&iter->base.path, level->prefix_len);
> +			/*
> +			 * Since we are iterating through the dirpath
> +			 * after we have gone through it, we still need
> +			 * to get rid of the trailing slash we appended.
> +			 */
> +			strbuf_strip_suffix(&iter->base.path, "/");
> +
> +			if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) {
> +				/* We may not want the root directory to be iterated over */
> +				if (iter->levels_nr != 1 ||
> +					(iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
> +					set_iterator_data(iter, level);
> +					return ITER_OK;
> +				}
> +			}
> +		} else if (level->dir_state == DIR_STATE_EXHAUSTED) {
> +			if (pop_dir_level(iter) == 0)
> +				return dir_iterator_abort(dir_iterator);
>  		}
>  	}
>  }

As far as I can tell, you got the logic in this complicated big loop
correct on the first try (well, if we ignore v6 :-) ), even as you added
new features. I think that's good evidence that the new structure is
more comprehensible than the old, plus the new tests probably helped.
That's a big win!

> @@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>  	return ITER_DONE;
>  }
>  
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
>  {
>  	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
>  	struct dir_iterator *dir_iterator = &iter->base;
> @@ -209,13 +245,16 @@ struct dir_iterator *dir_iterator_begin(const char *path)
>  	if (!path || !*path)
>  		die("BUG: empty path passed to dir_iterator_begin()");
>  
> +	iter->flags = flags;
> +
>  	strbuf_init(&iter->base.path, PATH_MAX);
>  	strbuf_addstr(&iter->base.path, path);
>  
>  	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
>  
>  	iter->levels_nr = 1;
> -	iter->levels[0].initialized = 0;
> +	iter->levels[0].dir = NULL;
> +	iter->levels[0].dir_state = DIR_STATE_PUSHED;
>  
>  	return dir_iterator;
>  }
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..0e82f36 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -11,13 +11,12 @@
>   * Every time dir_iterator_advance() is called, update the members of
>   * the dir_iterator structure to reflect the next path in the
>   * iteration. The order that paths are iterated over within a
> - * directory is undefined, but directory paths are always iterated
> - * over before the subdirectory contents.
> + * directory is undefined.
>   *
>   * A typical iteration looks like this:
>   *
>   *     int ok;
> - *     struct iterator *iter = dir_iterator_begin(path);
> + *     struct iterator *iter = dir_iterator_begin(path, flags);
>   *
>   *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
>   *             if (want_to_stop_iteration()) {
> @@ -38,6 +37,22 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/*
> + * Possible flags for dir_iterator_begin().
> + *
> + * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
> + * a dirpath it has found before iterating through that directory's
> + * contents.
> + * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
> + * a dirpath it has found after iterating through that directory's
> + * contents.
> + * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
> + * of the root directory it is iterating through.

I think it would be useful to mention here that the three options can be
used in any combination.

The documentation for `DIR_ITERATOR_LIST_ROOT_DIR` isn't precisely
accurate. If neither of the `*_ORDER_TRAVERSAL` options is specified,
then the root directory is not included in the iteration even if this
option is specified. I think this is enough of a boundary case that it
doesn't really matter if you change the behavior or change the
documentation, but one way or another please make them agree.

> + */
> +#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
> +#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
> +#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
> +
>  struct dir_iterator {
>  	/* The current path: */
>  	struct strbuf path;
> @@ -57,15 +72,16 @@ struct dir_iterator {
>  };
>  
>  /*
> - * Start a directory iteration over path. Return a dir_iterator that
> - * holds the internal state of the iteration.
> + * Start a directory iteration over path, with options specified in
> + * 'flags'. Return a dir_iterator that holds the internal state of
> + * the iteration.
>   *
>   * The iteration includes all paths under path, not including path
>   * itself and not including "." or ".." entries.

The sentence above should be updated now that you have implemented
`DIR_ITERATOR_LIST_ROOT_DIR`.

>   * path is the starting directory. An internal copy will be made.
>   */
> -struct dir_iterator *dir_iterator_begin(const char *path);
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
>  
>  /*
>   * Advance the iterator to the first or next item and return ITER_OK.
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 50188e9..c29dc68 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>  	files_downcast(ref_store, 0, "reflog_iterator_begin");
>  
>  	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
> -	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
> +	iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);
>  	return ref_iterator;
>  }
>  
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> index 06f03fc..a1b8c78 100644
> --- a/t/helper/test-dir-iterator.c
> +++ b/t/helper/test-dir-iterator.c
> @@ -6,6 +6,7 @@
>  int cmd_main(int argc, const char **argv) {
>  	struct strbuf path = STRBUF_INIT;
>  	struct dir_iterator *diter;
> +	unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
>  
>  	if (argc < 2) {
>  		return 1;
> @@ -13,7 +14,10 @@ int cmd_main(int argc, const char **argv) {
>  
>  	strbuf_add(&path, argv[1], strlen(argv[1]));
>  
> -	diter = dir_iterator_begin(path.buf);
> +	if (argc == 3)
> +		flag = atoi(argv[2]);

It's not ideal that the test code depends on the numerical values of the
flag constants, and it makes the tests harder to understand. It would be
better if this program were to accept options like `--pre-order`,
`--post-order`, etc., as I suggested in an earlier round of review.

> +	diter = dir_iterator_begin(path.buf, flag);
>  
>  	while (dir_iterator_advance(diter) == ITER_OK) {
>  		if (S_ISDIR(diter->st.st_mode))
> diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
> index b857c07..ade3ee0 100755
> --- a/t/t0065-dir-iterator.sh
> +++ b/t/t0065-dir-iterator.sh
> @@ -28,12 +28,28 @@ test_expect_success 'dir-iterator should iterate through all files' '
>  	>dir/a/e &&
>  	>dir/d/e/d/a &&
>  
> -	test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&
> +	test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output &&
>  	rm -rf dir &&
>  
>  	test_cmp expect-sorted-output actual-pre-order-sorted-output
>  '
>  
> +test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
> +	mkdir -p dir &&
> +	mkdir -p dir/a/b/c/ &&
> +	>dir/b &&
> +	>dir/c &&
> +	mkdir -p dir/d/e/d/ &&
> +	>dir/a/b/c/d &&
> +	>dir/a/e &&
> +	>dir/d/e/d/a &&
> +
> +	test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output &&
> +	rm -rf dir &&
> +
> +	test_cmp expect-sorted-output actual-post-order-sorted-output
> +'

The setup done in the test above is identical to the setup for the test
before it. Similarly, the last three tests also all use the same setup.
Please put all of the setup code in an extra "pseudo-test" at the top of
the file, to avoid duplication:

test_expect_success 'setup' '
	mkdir -p dir1 &&
	mkdir -p dir1/a/b/c/ &&
	>dir1/b &&
	>dir1/c &&
	mkdir -p dir1/d/e/d/ &&
	>dir1/a/b/c/d &&
	>dir1/a/e &&
	>dir1/d/e/d/a &&

	mkdir -p dir2/a/b/c/ &&
	>dir2/a/b/c/d
'

Note that at least one of the test directories has to be renamed so that
they don't conflict with each other.

There is no need for the individual tests to delete their test
directories; the test harness will take care of that. But if you *did*
need to clean up after a test, you should do it like this:

	mkdir foo &&
	test_when_finished "rm -rf foo" &&
	...tests involving foo...

The advantage of `test_when_finished` is that it ensures that the
cleanup code is run even if some part of the test fails.

>  cat >expect-pre-order-output <<-\EOF &&
>  [d] (a) ./dir/a
>  [d] (a/b) ./dir/a/b
> @@ -41,14 +57,53 @@ cat >expect-pre-order-output <<-\EOF &&
>  [f] (a/b/c/d) ./dir/a/b/c/d
>  EOF
>  
> -test_expect_success 'dir-iterator should list files in the correct order' '
> +test_expect_success 'dir-iterator should list files properly on pre-order mode' '
>  	mkdir -p dir/a/b/c/ &&
>  	>dir/a/b/c/d &&
>  
> -	test-dir-iterator ./dir >actual-pre-order-output &&
> +	test-dir-iterator ./dir 1 >actual-pre-order-output &&
>  	rm -rf dir &&
>  
>  	test_cmp expect-pre-order-output actual-pre-order-output
>  '
>  
> +cat >expect-post-order-output <<-\EOF &&
> +[f] (a/b/c/d) ./dir/a/b/c/d
> +[d] (a/b/c) ./dir/a/b/c
> +[d] (a/b) ./dir/a/b
> +[d] (a) ./dir/a
> +EOF
> +
> +test_expect_success 'dir-iterator should list files properly on post-order mode' '
> +	mkdir -p dir/a/b/c/ &&
> +	>dir/a/b/c/d &&
> +
> +	test-dir-iterator ./dir 2 >actual-post-order-output &&
> +	rm -rf dir &&
> +
> +	test_cmp expect-post-order-output actual-post-order-output
> +'
> +
> +cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
> +[d] (.) ./dir
> +[d] (a) ./dir/a
> +[d] (a/b) ./dir/a/b
> +[d] (a/b/c) ./dir/a/b/c
> +[f] (a/b/c/d) ./dir/a/b/c/d
> +[d] (a/b/c) ./dir/a/b/c
> +[d] (a/b) ./dir/a/b
> +[d] (a) ./dir/a
> +[d] (.) ./dir
> +EOF
> +
> +test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
> +	mkdir -p dir/a/b/c/ &&
> +	>dir/a/b/c/d &&
> +
> +	test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output &&
> +	rm -rf dir &&
> +
> +	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
> +'
> +
>  test_done

Michael


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 5/5] remove_subtree(): reimplement using iterators
  2017-04-02 20:03 ` [PATCH v7 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
@ 2017-04-03  3:42   ` Michael Haggerty
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2017-04-03  3:42 UTC (permalink / raw)
  To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds, Daniel Ferreira

On 04/02/2017 10:03 PM, Daniel Ferreira wrote:
> From: Daniel Ferreira <daniel.calibeta@gmail.com>
> 
> Use dir_iterator to traverse through remove_subtree()'s directory tree,
> avoiding the need for recursive calls to readdir(). Simplify
> remove_subtree()'s code.
> 
> A conversion similar in purpose was previously done at 46d092a
> ("for_each_reflog(): reimplement using iterators", 2016-05-21).
> 
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>

This patch has a different author email than the others. If possible,
please choose one email address that you will use for your commits, and
try to be consistent.

> ---
>  entry.c | 38 ++++++++++++--------------------------
>  1 file changed, 12 insertions(+), 26 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index d2b512d..bd06f41 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -3,6 +3,8 @@
>  #include "dir.h"
>  #include "streaming.h"
>  #include "submodule.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
>  
>  static void create_directories(const char *path, int path_len,
>  			       const struct checkout *state)
> @@ -45,33 +47,17 @@ static void create_directories(const char *path, int path_len,
>  	free(buf);
>  }
>  
> -static void remove_subtree(struct strbuf *path)
> +static void remove_subtree(const char *path)
>  {
> -	DIR *dir = opendir(path->buf);
> -	struct dirent *de;
> -	int origlen = path->len;
> -
> -	if (!dir)
> -		die_errno("cannot opendir '%s'", path->buf);
> -	while ((de = readdir(dir)) != NULL) {
> -		struct stat st;
> -
> -		if (is_dot_or_dotdot(de->d_name))
> -			continue;
> -
> -		strbuf_addch(path, '/');
> -		strbuf_addstr(path, de->d_name);
> -		if (lstat(path->buf, &st))
> -			die_errno("cannot lstat '%s'", path->buf);
> -		if (S_ISDIR(st.st_mode))
> -			remove_subtree(path);
> -		else if (unlink(path->buf))
> -			die_errno("cannot unlink '%s'", path->buf);
> -		strbuf_setlen(path, origlen);
> +	struct dir_iterator *diter = dir_iterator_begin(path, DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR);

The line above is way too long. Try to limit lines to 80 characters max.
(This is documented in `Documentation/CodingGuidelines`.)

> +	while (dir_iterator_advance(diter) == ITER_OK) {
> +		if (S_ISDIR(diter->st.st_mode)) {
> +			if (rmdir(diter->path.buf))
> +				die_errno("cannot rmdir '%s'", diter->path.buf);
> +		} else if (unlink(diter->path.buf))
> +			die_errno("cannot unlink '%s'", diter->path.buf);
>  	}
> -	closedir(dir);
> -	if (rmdir(path->buf))
> -		die_errno("cannot rmdir '%s'", path->buf);
>  }
>  
>  static int create_file(const char *path, unsigned int mode)
> @@ -312,7 +298,7 @@ int checkout_entry(struct cache_entry *ce,
>  				return 0;
>  			if (!state->force)
>  				return error("%s is a directory", path.buf);
> -			remove_subtree(&path);
> +			remove_subtree(path.buf);
>  		} else if (unlink(path.buf))
>  			return error_errno("unable to unlink old '%s'", path.buf);
>  	} else if (state->not_new)
> 

That's a gratifying decrease in code size :-)

Michael


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 4/5] dir_iterator: refactor state machine model
  2017-04-03  3:36   ` Michael Haggerty
@ 2017-04-03 18:03     ` Daniel Ferreira (theiostream)
  2017-04-04  6:36       ` Michael Haggerty
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-04-03 18:03 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Stefan Beller, Duy Nguyen

On Mon, Apr 3, 2017 at 12:36 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> As far as I can tell, you got the logic in this complicated big loop
> correct on the first try (well, if we ignore v6 :-) ), even as you added
> new features. I think that's good evidence that the new structure is
> more comprehensible than the old, plus the new tests probably helped.
> That's a big win!

Thanks for ignoring v6 ;)

Another gain is that the other proposed features (only iterate over
directories, do not recurse into subdirectories) are also quite easy
to add with this new structure.

> It's not ideal that the test code depends on the numerical values of the
> flag constants, and it makes the tests harder to understand. It would be
> better if this program were to accept options like `--pre-order`,
> `--post-order`, etc., as I suggested in an earlier round of review.

Although it does make tests harder to understand, if we were to
specify how to iterate with human-readable flags we'd add the getopt()
+ flag configuration overhead to this helper program to be able to
handle all cases properly. Additionally, new flags added to
dir_iterator would have to edit the test program as well, generating
extra work.

I personally think that the string describing the test in the script
is enough to explain what the flag-as-argument is doing for the sake
of readability. The only gain I see in your suggestion is that we
avoid the programmer committing errors calculating the flag by hand
when writing the test.

That said, I'd appreciate some more thought on this.

> Michael
>

Thanks for the review. I agree with all other points and I'll address
them in a next series.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API
  2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
@ 2017-04-03 22:31   ` Stefan Beller
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2017-04-03 22:31 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> Create t/helper/test-dir-iterator.c, which prints relevant information
> about a directory tree iterated over with dir_iterator.
>
> Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
> iterate through a whole directory tree.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  Makefile                     |  1 +
>  t/helper/.gitignore          |  1 +
>  t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++++
>  t/t0065-dir-iterator.sh      | 54 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 84 insertions(+)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0065-dir-iterator.sh
>
> diff --git a/Makefile b/Makefile
> index 9b36068..41ce9ab 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
>  TEST_PROGRAMS_NEED_X += test-config
>  TEST_PROGRAMS_NEED_X += test-date
>  TEST_PROGRAMS_NEED_X += test-delta
> +TEST_PROGRAMS_NEED_X += test-dir-iterator
>  TEST_PROGRAMS_NEED_X += test-dump-cache-tree
>  TEST_PROGRAMS_NEED_X += test-dump-split-index
>  TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
> diff --git a/t/helper/.gitignore b/t/helper/.gitignore
> index 758ed2e..a7d74d3 100644
> --- a/t/helper/.gitignore
> +++ b/t/helper/.gitignore
> @@ -3,6 +3,7 @@
>  /test-config
>  /test-date
>  /test-delta
> +/test-dir-iterator
>  /test-dump-cache-tree
>  /test-dump-split-index
>  /test-dump-untracked-cache
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> new file mode 100644
> index 0000000..06f03fc
> --- /dev/null
> +++ b/t/helper/test-dir-iterator.c
> @@ -0,0 +1,28 @@
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
> +
> +int cmd_main(int argc, const char **argv) {
> +       struct strbuf path = STRBUF_INIT;
> +       struct dir_iterator *diter;
> +
> +       if (argc < 2) {

Coding style: please use no braces for single statements after
control structures (if, for, while).

Instead of just returning 1 (which is harder to diagnose in an
interactive shell for users who do not know the details of this
program), we could

    die("BUG: test-dir-iterator takes exactly one argument");

> +               return 1;
> +       }
> +
> +       strbuf_add(&path, argv[1], strlen(argv[1]));
> +
> +       diter = dir_iterator_begin(path.buf);
> +
> +       while (dir_iterator_advance(diter) == ITER_OK) {
> +               if (S_ISDIR(diter->st.st_mode))
> +                       printf("[d] ");
> +               else
> +                       printf("[f] ");

In the operating system there are more 'st_mode's than
just directory or file, e.g. symbolic link.

Maybe

    else if (S_ISREG(diter->st.st_mode))
        printf("[f] ");
    else
        printf("[?]");

to catch the unknown things better instead of
defaulting them to regular files?

> +#!/bin/sh
> +
> +test_description='Test directory iteration.'
> +
> +. ./test-lib.sh
> +
> +cat >expect-sorted-output <<-\EOF &&
> +[d] (a) ./dir/a
> +[d] (a/b) ./dir/a/b
> +[d] (a/b/c) ./dir/a/b/c
> +[d] (d) ./dir/d
> +[d] (d/e) ./dir/d/e
> +[d] (d/e/d) ./dir/d/e/d
> +[f] (a/b/c/d) ./dir/a/b/c/d
> +[f] (a/e) ./dir/a/e
> +[f] (b) ./dir/b
> +[f] (c) ./dir/c
> +[f] (d/e/d/a) ./dir/d/e/d/a
> +EOF
> +
> +test_expect_success 'dir-iterator should iterate through all files' '
> +       mkdir -p dir &&
> +       mkdir -p dir/a/b/c/ &&
> +       >dir/b &&
> +       >dir/c &&
> +       mkdir -p dir/d/e/d/ &&
> +       >dir/a/b/c/d &&
> +       >dir/a/e &&
> +       >dir/d/e/d/a &&
> +
> +       test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&

We just had some micro projects that were trying to get rid of git
commands on the upstream of a pipe, to better observe
the exit codes of the programs that we write here. I think it is
not as critical for test helpers, but the it would be better to diagnose
an error when we had

    test-dir-iterator ./dir >out &&
    sort <out >actual &&
    test_cmp expect actual

as in case of an error we'd stop early in the && chain at test-dir-iterator
(and if that would use the die() function we'd even get some error message
on stderr).

> +       rm -rf dir &&

I'd put that cleanup first via
test_expect_success 'title' '
    test_when_finished "rm -rf dir" &&
    mkdir ... &&
    ...
'

By doing so, an inspection is easier afterwards.
When this test breaks, you'd want to debug it via e.g.

  cd t
  ./t0065-dir-iterator.sh -d -i -v -x # see README for all the nice flags)

which in case of error would stop at e.g. test_cmp failing. But at that
point of the test we already removed the files so it is impossible to inspect
what was on the file system, without adding a test_pause before the rm.

> +
> +       test_cmp expect-sorted-output actual-pre-order-sorted-output

Usually the file names are shorter (expect, actual), but I do not
mind encoding some expectation in them.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 2/5] remove_subtree(): test removing nested directories
  2017-04-02 20:03 ` [PATCH v7 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
@ 2017-04-03 22:35   ` Stefan Beller
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2017-04-03 22:35 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> Test removing a nested directory when an attempt is made to restore the
> index to a state where it does not exist. A similar test could be found
> previously in t/t2000-checkout-cache-clash.sh, but it did not check for
> nested directories, which could allow a faulty implementation of
> remove_subtree() pass the tests.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
>  t/t2000-checkout-cache-clash.sh | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
> index de3edb5..ac10ba3 100755
> --- a/t/t2000-checkout-cache-clash.sh
> +++ b/t/t2000-checkout-cache-clash.sh
> @@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
>         git checkout-index -a -f --prefix=there/
>  '
>
> +test_expect_success 'git checkout-index -f should remove nested subtrees' '
> +       echo content >path &&
> +       git update-index --add path &&
> +       rm path &&
> +       mkdir -p path/with/nested/paths &&
> +       echo content >path/file1 &&
> +       echo content >path/with/nested/paths/file2 &&
> +       git checkout-index -f -a &&
> +       test ! -d path
> +'
> +

Looks good to me.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance
  2017-04-02 20:03 ` [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
@ 2017-04-03 22:48   ` Stefan Beller
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Beller @ 2017-04-03 22:48 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: git, Junio C Hamano, Duy Nguyen, Michael Haggerty

On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> Create inline helpers to dir_iterator_advance(). Make
> dir_iterator_advance()'s code more legible and allow some behavior to
> be reusable.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---


> +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)

Sometimes we use inline, sometimes we do not:
$ git grep "static inline" -- *.c |wc -l
84
$ git grep "static inline" -- *.h |wc -l
130
# approximate a function definition:
$ git grep -e "static" -- *.c  |grep "("|wc -l
2971

So I think we'd want to have the inline keyword in
the header only (when the code is actually to be inlined).
As a C file is one translation unit, the compiler can figure
out best whether to inline a particular static function on
its own.

The rest looks good to me,

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v7 4/5] dir_iterator: refactor state machine model
  2017-04-03 18:03     ` Daniel Ferreira (theiostream)
@ 2017-04-04  6:36       ` Michael Haggerty
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Haggerty @ 2017-04-04  6:36 UTC (permalink / raw)
  To: Daniel Ferreira; +Cc: Stefan Beller, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Daniel,

> Although it does make tests harder to understand, if we were to
> specify how to iterate with human-readable flags we'd add the getopt()
> + flag configuration overhead to this helper program to be able to
> handle all cases properly. Additionally, new flags added to
> dir_iterator would have to edit the test program as well, generating
> extra work.

I think you're exaggerating the amount of extra work. I think all you
need to do is squash the attached patch into your patch 4/5, for the
gain of only 14 lines of simple code, four of which could be deleted
if you don't care about supporting "--". Supporting hypothetical
future new options would cost exactly two more lines for each option.
Plus, this also fixes the handling of more than two args.

It might be even shorter if you use `parse_options()`, but that seems
like overkill here.

On the plus side, anybody can now change the `DIR_ITERATOR_*`
constants without breaking things, or grep for them to find all of the
places that they are used. Plus the test code is more readable.

In my opinion that is a win.

Michael

 t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++-------
 t/t0065-dir-iterator.sh      | 10 +++++-----
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index a1b8c78434..c2eb2ca1f9 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -4,18 +4,32 @@
 #include "dir-iterator.h"
 
 int cmd_main(int argc, const char **argv) {
+	const char **myargv = argv;
+	int myargc = argc;
 	struct strbuf path = STRBUF_INIT;
 	struct dir_iterator *diter;
-	unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
-
-	if (argc < 2) {
-		return 1;
+	unsigned flag = 0;
+
+	while (--myargc && starts_with(*++myargv, "--")) {
+		if (!strcmp(*myargv, "--pre-order")) {
+			flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
+		} else if (!strcmp(*myargv, "--post-order")) {
+			flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL;
+		} else if (!strcmp(*myargv, "--list-root-dir")) {
+			flag |= DIR_ITERATOR_LIST_ROOT_DIR;
+		} else if (!strcmp(*myargv, "--")) {
+			myargc--;
+			myargv++;
+			break;
+		} else {
+			die("Unrecognized option: %s", *myargv);
+		}
 	}
 
-	strbuf_add(&path, argv[1], strlen(argv[1]));
+	if (myargc != 1)
+		die("expected exactly one non-option argument");
 
-	if (argc == 3)
-		flag = atoi(argv[2]);
+	strbuf_addstr(&path, *myargv);
 
 	diter = dir_iterator_begin(path.buf, flag);
 
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index ade3ee0e7e..4819e6181d 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -28,7 +28,7 @@ test_expect_success 'dir-iterator should iterate through all files' '
 	>dir/a/e &&
 	>dir/d/e/d/a &&
 
-	test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output &&
+	test-dir-iterator --pre-order ./dir | sort >./actual-pre-order-sorted-output &&
 	rm -rf dir &&
 
 	test_cmp expect-sorted-output actual-pre-order-sorted-output
@@ -44,7 +44,7 @@ test_expect_success 'dir-iterator should iterate through all files on post-order
 	>dir/a/e &&
 	>dir/d/e/d/a &&
 
-	test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output &&
+	test-dir-iterator --post-order ./dir | sort >actual-post-order-sorted-output &&
 	rm -rf dir &&
 
 	test_cmp expect-sorted-output actual-post-order-sorted-output
@@ -61,7 +61,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order mode'
 	mkdir -p dir/a/b/c/ &&
 	>dir/a/b/c/d &&
 
-	test-dir-iterator ./dir 1 >actual-pre-order-output &&
+	test-dir-iterator --pre-order ./dir >actual-pre-order-output &&
 	rm -rf dir &&
 
 	test_cmp expect-pre-order-output actual-pre-order-output
@@ -78,7 +78,7 @@ test_expect_success 'dir-iterator should list files properly on post-order mode'
 	mkdir -p dir/a/b/c/ &&
 	>dir/a/b/c/d &&
 
-	test-dir-iterator ./dir 2 >actual-post-order-output &&
+	test-dir-iterator --post-order ./dir >actual-post-order-output &&
 	rm -rf dir &&
 
 	test_cmp expect-post-order-output actual-post-order-output
@@ -100,7 +100,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order + post
 	mkdir -p dir/a/b/c/ &&
 	>dir/a/b/c/d &&
 
-	test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output &&
+	test-dir-iterator --pre-order --post-order --list-root-dir ./dir >actual-pre-order-post-order-root-dir-output &&
 	rm -rf dir &&
 
 	test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-04-04  6:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-03 22:31   ` Stefan Beller
2017-04-02 20:03 ` [PATCH v7 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-03 22:35   ` Stefan Beller
2017-04-02 20:03 ` [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
2017-04-03 22:48   ` Stefan Beller
2017-04-02 20:03 ` [PATCH v7 4/5] dir_iterator: refactor state machine model Daniel Ferreira
2017-04-03  3:36   ` Michael Haggerty
2017-04-03 18:03     ` Daniel Ferreira (theiostream)
2017-04-04  6:36       ` Michael Haggerty
2017-04-02 20:03 ` [PATCH v7 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-03  3:42   ` Michael Haggerty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.