This is the seventh version of a patch series that implements the GSoC microproject of converting a recursive call to readdir() to use dir_iterator. v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#t v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3JBFjKhWCdv_LPhKCd71ZRwMovA@mail.gmail.com/T/#t v4: https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnmvco@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a v5: https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnmvco@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453 v6: https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnmvco@gmail.com/T/#t I screwed up in v6 because I had introduced a bug that in case git tried to open a directory that did not exist using dir_iterator, the program would segfault. This was amended and all commits are passing the tests. Sorry for not having tested my changes properly. CI build: https://travis-ci.org/theiostream/git Since the changes in v6 were not reviewed, I'll just copy what was sent back there. > Back in v5, Michael had a number of suggestions, all of which were applied > to this version (including a slightly modified version of his "biggish rewrite" > project to make dir_iterator's state machine simpler). The only suggestion that > did not make it into this series was that of not traversing into subdirectories, > since I believe it would be better off in another series that actually required > that feature (that is, I do not want a series to implement a feature it will > not need). The same goes for Junio's thought on a flag to list *only* directories > and no files on the v4 discussion. > Junio and Peff's comments about how to write to files in the tests were also > considered, and the tests were adjusted. > I chose to squash both the state machine refactor and the addition of the > new flags in a single commit. I do not know whether you will feel this is > the right choice but it seemed natural, since most of the state machine's > new logic would not even make sense without encompassing the new features. > I am, of course, open for feedback on this decision. Daniel Ferreira (5): dir_iterator: add tests for dir_iterator API remove_subtree(): test removing nested directories dir_iterator: add helpers to dir_iterator_advance dir_iterator: refactor state machine model remove_subtree(): reimplement using iterators Makefile | 1 + dir-iterator.c | 196 ++++++++++++++++++++++++++-------------- dir-iterator.h | 28 ++++-- entry.c | 38 +++----- refs/files-backend.c | 2 +- t/helper/.gitignore | 1 + t/helper/test-dir-iterator.c | 32 +++++++ t/t0065-dir-iterator.sh | 109 ++++++++++++++++++++++ t/t2000-checkout-cache-clash.sh | 11 +++ 9 files changed, 316 insertions(+), 102 deletions(-) create mode 100644 t/helper/test-dir-iterator.c create mode 100755 t/t0065-dir-iterator.sh -- 2.7.4 (Apple Git-66)
Create t/helper/test-dir-iterator.c, which prints relevant information about a directory tree iterated over with dir_iterator. Create t/t0065-dir-iterator.sh, which tests that dir_iterator does iterate through a whole directory tree. Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- Makefile | 1 + t/helper/.gitignore | 1 + t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++++ t/t0065-dir-iterator.sh | 54 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 84 insertions(+) create mode 100644 t/helper/test-dir-iterator.c create mode 100755 t/t0065-dir-iterator.sh diff --git a/Makefile b/Makefile index 9b36068..41ce9ab 100644 --- a/Makefile +++ b/Makefile @@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype TEST_PROGRAMS_NEED_X += test-config TEST_PROGRAMS_NEED_X += test-date TEST_PROGRAMS_NEED_X += test-delta +TEST_PROGRAMS_NEED_X += test-dir-iterator TEST_PROGRAMS_NEED_X += test-dump-cache-tree TEST_PROGRAMS_NEED_X += test-dump-split-index TEST_PROGRAMS_NEED_X += test-dump-untracked-cache diff --git a/t/helper/.gitignore b/t/helper/.gitignore index 758ed2e..a7d74d3 100644 --- a/t/helper/.gitignore +++ b/t/helper/.gitignore @@ -3,6 +3,7 @@ /test-config /test-date /test-delta +/test-dir-iterator /test-dump-cache-tree /test-dump-split-index /test-dump-untracked-cache diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c new file mode 100644 index 0000000..06f03fc --- /dev/null +++ b/t/helper/test-dir-iterator.c @@ -0,0 +1,28 @@ +#include "git-compat-util.h" +#include "strbuf.h" +#include "iterator.h" +#include "dir-iterator.h" + +int cmd_main(int argc, const char **argv) { + struct strbuf path = STRBUF_INIT; + struct dir_iterator *diter; + + if (argc < 2) { + return 1; + } + + strbuf_add(&path, argv[1], strlen(argv[1])); + + diter = dir_iterator_begin(path.buf); + + while (dir_iterator_advance(diter) == ITER_OK) { + if (S_ISDIR(diter->st.st_mode)) + printf("[d] "); + else + printf("[f] "); + + printf("(%s) %s\n", diter->relative_path, diter->path.buf); + } + + return 0; +} diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh new file mode 100755 index 0000000..b857c07 --- /dev/null +++ b/t/t0065-dir-iterator.sh @@ -0,0 +1,54 @@ +#!/bin/sh + +test_description='Test directory iteration.' + +. ./test-lib.sh + +cat >expect-sorted-output <<-\EOF && +[d] (a) ./dir/a +[d] (a/b) ./dir/a/b +[d] (a/b/c) ./dir/a/b/c +[d] (d) ./dir/d +[d] (d/e) ./dir/d/e +[d] (d/e/d) ./dir/d/e/d +[f] (a/b/c/d) ./dir/a/b/c/d +[f] (a/e) ./dir/a/e +[f] (b) ./dir/b +[f] (c) ./dir/c +[f] (d/e/d/a) ./dir/d/e/d/a +EOF + +test_expect_success 'dir-iterator should iterate through all files' ' + mkdir -p dir && + mkdir -p dir/a/b/c/ && + >dir/b && + >dir/c && + mkdir -p dir/d/e/d/ && + >dir/a/b/c/d && + >dir/a/e && + >dir/d/e/d/a && + + test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output && + rm -rf dir && + + test_cmp expect-sorted-output actual-pre-order-sorted-output +' + +cat >expect-pre-order-output <<-\EOF && +[d] (a) ./dir/a +[d] (a/b) ./dir/a/b +[d] (a/b/c) ./dir/a/b/c +[f] (a/b/c/d) ./dir/a/b/c/d +EOF + +test_expect_success 'dir-iterator should list files in the correct order' ' + mkdir -p dir/a/b/c/ && + >dir/a/b/c/d && + + test-dir-iterator ./dir >actual-pre-order-output && + rm -rf dir && + + test_cmp expect-pre-order-output actual-pre-order-output +' + +test_done -- 2.7.4 (Apple Git-66)
Test removing a nested directory when an attempt is made to restore the index to a state where it does not exist. A similar test could be found previously in t/t2000-checkout-cache-clash.sh, but it did not check for nested directories, which could allow a faulty implementation of remove_subtree() pass the tests. Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- t/t2000-checkout-cache-clash.sh | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh index de3edb5..ac10ba3 100755 --- a/t/t2000-checkout-cache-clash.sh +++ b/t/t2000-checkout-cache-clash.sh @@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' ' git checkout-index -a -f --prefix=there/ ' +test_expect_success 'git checkout-index -f should remove nested subtrees' ' + echo content >path && + git update-index --add path && + rm path && + mkdir -p path/with/nested/paths && + echo content >path/file1 && + echo content >path/with/nested/paths/file2 && + git checkout-index -f -a && + test ! -d path +' + test_done -- 2.7.4 (Apple Git-66)
Create inline helpers to dir_iterator_advance(). Make dir_iterator_advance()'s code more legible and allow some behavior to be reusable. Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- dir-iterator.c | 65 +++++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/dir-iterator.c b/dir-iterator.c index 34182a9..ce8bf81 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -50,6 +50,43 @@ struct dir_iterator_int { struct dir_iterator_level *levels; }; +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + level->dir_state = DIR_STATE_RECURSE; + ALLOC_GROW(iter->levels, iter->levels_nr + 1, + iter->levels_alloc); + level = &iter->levels[iter->levels_nr++]; + level->initialized = 0; +} + +static inline int pop_dir_level(struct dir_iterator_int *iter) +{ + return --iter->levels_nr; +} + +static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + if (lstat(iter->base.path.buf, &iter->base.st) < 0) { + if (errno != ENOENT) + warning("error reading path '%s': %s", + iter->base.path.buf, + strerror(errno)); + return -1; + } + + /* + * We have to set these each time because + * the path strbuf might have been realloc()ed. + */ + iter->base.relative_path = + iter->base.path.buf + iter->levels[0].prefix_len; + iter->base.basename = + iter->base.path.buf + level->prefix_len; + level->dir_state = DIR_STATE_ITER; + + return 0; +} + int dir_iterator_advance(struct dir_iterator *dir_iterator) { struct dir_iterator_int *iter = @@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) * over; now prepare to iterate into * it. */ - level->dir_state = DIR_STATE_RECURSE; - ALLOC_GROW(iter->levels, iter->levels_nr + 1, - iter->levels_alloc); - level = &iter->levels[iter->levels_nr++]; - level->initialized = 0; + push_dir_level(iter, level); continue; } else { /* @@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) * This level is exhausted (or wasn't opened * successfully); pop up a level. */ - if (--iter->levels_nr == 0) + if (pop_dir_level(iter) == 0) return dir_iterator_abort(dir_iterator); continue; @@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) iter->base.path.buf, strerror(errno)); level->dir = NULL; - if (--iter->levels_nr == 0) + if (pop_dir_level(iter) == 0) return dir_iterator_abort(dir_iterator); break; } @@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) continue; strbuf_addstr(&iter->base.path, de->d_name); - if (lstat(iter->base.path.buf, &iter->base.st) < 0) { - if (errno != ENOENT) - warning("error reading path '%s': %s", - iter->base.path.buf, - strerror(errno)); - continue; - } - /* - * We have to set these each time because - * the path strbuf might have been realloc()ed. - */ - iter->base.relative_path = - iter->base.path.buf + iter->levels[0].prefix_len; - iter->base.basename = - iter->base.path.buf + level->prefix_len; - level->dir_state = DIR_STATE_ITER; + if (set_iterator_data(iter, level)) + continue; return ITER_OK; } -- 2.7.4 (Apple Git-66)
Perform major refactor of dir_iterator_advance(). dir_iterator has ceased to rely on a convoluted state machine mechanism of two loops and two state variables (level.initialized and level.dir_state). This serves to ease comprehension of the iterator mechanism and ease addition of new features to the iterator. Create an option for the dir_iterator API to iterate over subdirectories only after having iterated through their contents. This feature was predicted, although not implemented by 0fe5043 ("dir_iterator: new API for iterating over a directory tree", 2016-06-18). This is useful for recursively removing a directory and calling rmdir() on a directory only after all of its contents have been wiped. Add an option for the dir_iterator API to iterate over the root directory (the one it was initialized with) as well. Add the "flags" parameter to dir_iterator_create, allowing for the aforementioned new features to be enabled. The new default behavior (i.e. flags set to 0) does not iterate over directories. Flag DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR iterates over the root directory. These flags do not conflict with each other and may be used simultaneously. Amend a call to dir_iterator_begin() in refs/files-backend.c to pass the flags parameter introduced. Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to test "post-order" and "iterate-over-root" modes. Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- dir-iterator.c | 155 +++++++++++++++++++++++++++---------------- dir-iterator.h | 28 ++++++-- refs/files-backend.c | 2 +- t/helper/test-dir-iterator.c | 6 +- t/t0065-dir-iterator.sh | 61 ++++++++++++++++- 5 files changed, 183 insertions(+), 69 deletions(-) diff --git a/dir-iterator.c b/dir-iterator.c index ce8bf81..18b7e68 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -4,8 +4,6 @@ #include "dir-iterator.h" struct dir_iterator_level { - int initialized; - DIR *dir; /* @@ -20,8 +18,11 @@ struct dir_iterator_level { * iteration and also iterated into): */ enum { - DIR_STATE_ITER, - DIR_STATE_RECURSE + DIR_STATE_PUSHED, + DIR_STATE_PRE_ITERATION, + DIR_STATE_ITERATING, + DIR_STATE_POST_ITERATION, + DIR_STATE_EXHAUSTED } dir_state; }; @@ -48,15 +49,20 @@ struct dir_iterator_int { * that will be included in this iteration. */ struct dir_iterator_level *levels; + + /* Holds the flags passed to dir_iterator_begin(). */ + unsigned flags; }; static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) { - level->dir_state = DIR_STATE_RECURSE; ALLOC_GROW(iter->levels, iter->levels_nr + 1, iter->levels_alloc); + + /* Push a new level */ level = &iter->levels[iter->levels_nr++]; - level->initialized = 0; + level->dir = NULL; + level->dir_state = DIR_STATE_PUSHED; } static inline int pop_dir_level(struct dir_iterator_int *iter) @@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_it } /* - * We have to set these each time because - * the path strbuf might have been realloc()ed. + * Check if we are dealing with the root directory as an + * item that's being iterated through. */ - iter->base.relative_path = - iter->base.path.buf + iter->levels[0].prefix_len; + if (level->dir_state != DIR_STATE_ITERATING && + iter->levels_nr == 1) { + iter->base.relative_path = "."; + } + else { + iter->base.relative_path = + iter->base.path.buf + iter->levels[0].prefix_len; + } iter->base.basename = iter->base.path.buf + level->prefix_len; - level->dir_state = DIR_STATE_ITER; return 0; } +/* + * This function uses a state machine with the following states: + * -> DIR_STATE_PUSHED: the directory has been pushed to the + * iterator traversal tree. + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The + * dirpath has already been returned if pre-order traversal is set. + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing + * through it. + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through. + * We are ready to close it. + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped. + */ int dir_iterator_advance(struct dir_iterator *dir_iterator) { struct dir_iterator_int *iter = @@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) &iter->levels[iter->levels_nr - 1]; struct dirent *de; - if (!level->initialized) { + if (level->dir_state == DIR_STATE_PUSHED) { + level->dir_state = DIR_STATE_PRE_ITERATION; + + if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) { + /* We may not want the root directory to be iterated over */ + if (iter->levels_nr != 1 || + (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) { + set_iterator_data(iter, level); + return ITER_OK; + } + } + } else if (level->dir_state == DIR_STATE_PRE_ITERATION) { /* * Note: dir_iterator_begin() ensures that * path is not the empty string. @@ -107,64 +141,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) level->prefix_len = iter->base.path.len; level->dir = opendir(iter->base.path.buf); - if (!level->dir && errno != ENOENT) { - warning("error opening directory %s: %s", - iter->base.path.buf, strerror(errno)); - /* Popping the level is handled below */ - } - - level->initialized = 1; - } else if (S_ISDIR(iter->base.st.st_mode)) { - if (level->dir_state == DIR_STATE_ITER) { + if (!level->dir) { /* - * The directory was just iterated - * over; now prepare to iterate into - * it. + * This level wasn't opened sucessfully; pretend we + * iterated through it already. */ - push_dir_level(iter, level); + if (errno != ENOENT) { + warning("error opening directory %s: %s", + iter->base.path.buf, strerror(errno)); + } + + level->dir_state = DIR_STATE_POST_ITERATION; continue; - } else { - /* - * The directory has already been - * iterated over and iterated into; - * we're done with it. - */ } - } - if (!level->dir) { - /* - * This level is exhausted (or wasn't opened - * successfully); pop up a level. - */ - if (pop_dir_level(iter) == 0) - return dir_iterator_abort(dir_iterator); - - continue; - } - - /* - * Loop until we find an entry that we can give back - * to the caller: - */ - while (1) { + level->dir_state = DIR_STATE_ITERATING; + } else if (level->dir_state == DIR_STATE_ITERATING) { strbuf_setlen(&iter->base.path, level->prefix_len); errno = 0; de = readdir(level->dir); if (!de) { - /* This level is exhausted; pop up a level. */ + /* In case of readdir() error */ if (errno) { warning("error reading directory %s: %s", iter->base.path.buf, strerror(errno)); - } else if (closedir(level->dir)) - warning("error closing directory %s: %s", - iter->base.path.buf, strerror(errno)); + } - level->dir = NULL; - if (pop_dir_level(iter) == 0) - return dir_iterator_abort(dir_iterator); - break; + level->dir_state = DIR_STATE_POST_ITERATION; + continue; } if (is_dot_or_dotdot(de->d_name)) @@ -175,7 +180,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) if (set_iterator_data(iter, level)) continue; + if (S_ISDIR(iter->base.st.st_mode)) { + push_dir_level(iter, level); + continue; + } + return ITER_OK; + } else if (level->dir_state == DIR_STATE_POST_ITERATION) { + if (level->dir != NULL && closedir(level->dir)) { + warning("error closing directory %s: %s", + iter->base.path.buf, strerror(errno)); + } + level->dir_state = DIR_STATE_EXHAUSTED; + + strbuf_setlen(&iter->base.path, level->prefix_len); + /* + * Since we are iterating through the dirpath + * after we have gone through it, we still need + * to get rid of the trailing slash we appended. + */ + strbuf_strip_suffix(&iter->base.path, "/"); + + if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) { + /* We may not want the root directory to be iterated over */ + if (iter->levels_nr != 1 || + (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) { + set_iterator_data(iter, level); + return ITER_OK; + } + } + } else if (level->dir_state == DIR_STATE_EXHAUSTED) { + if (pop_dir_level(iter) == 0) + return dir_iterator_abort(dir_iterator); } } } @@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) return ITER_DONE; } -struct dir_iterator *dir_iterator_begin(const char *path) +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags) { struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter)); struct dir_iterator *dir_iterator = &iter->base; @@ -209,13 +245,16 @@ struct dir_iterator *dir_iterator_begin(const char *path) if (!path || !*path) die("BUG: empty path passed to dir_iterator_begin()"); + iter->flags = flags; + strbuf_init(&iter->base.path, PATH_MAX); strbuf_addstr(&iter->base.path, path); ALLOC_GROW(iter->levels, 10, iter->levels_alloc); iter->levels_nr = 1; - iter->levels[0].initialized = 0; + iter->levels[0].dir = NULL; + iter->levels[0].dir_state = DIR_STATE_PUSHED; return dir_iterator; } diff --git a/dir-iterator.h b/dir-iterator.h index 27739e6..0e82f36 100644 --- a/dir-iterator.h +++ b/dir-iterator.h @@ -11,13 +11,12 @@ * Every time dir_iterator_advance() is called, update the members of * the dir_iterator structure to reflect the next path in the * iteration. The order that paths are iterated over within a - * directory is undefined, but directory paths are always iterated - * over before the subdirectory contents. + * directory is undefined. * * A typical iteration looks like this: * * int ok; - * struct iterator *iter = dir_iterator_begin(path); + * struct iterator *iter = dir_iterator_begin(path, flags); * * while ((ok = dir_iterator_advance(iter)) == ITER_OK) { * if (want_to_stop_iteration()) { @@ -38,6 +37,22 @@ * dir_iterator_advance() again. */ +/* + * Possible flags for dir_iterator_begin(). + * + * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return + * a dirpath it has found before iterating through that directory's + * contents. + * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return + * a dirpath it has found after iterating through that directory's + * contents. + * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath + * of the root directory it is iterating through. + */ +#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0) +#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1) +#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2) + struct dir_iterator { /* The current path: */ struct strbuf path; @@ -57,15 +72,16 @@ struct dir_iterator { }; /* - * Start a directory iteration over path. Return a dir_iterator that - * holds the internal state of the iteration. + * Start a directory iteration over path, with options specified in + * 'flags'. Return a dir_iterator that holds the internal state of + * the iteration. * * The iteration includes all paths under path, not including path * itself and not including "." or ".." entries. * * path is the starting directory. An internal copy will be made. */ -struct dir_iterator *dir_iterator_begin(const char *path); +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags); /* * Advance the iterator to the first or next item and return ITER_OK. diff --git a/refs/files-backend.c b/refs/files-backend.c index 50188e9..c29dc68 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st files_downcast(ref_store, 0, "reflog_iterator_begin"); base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable); - iter->dir_iterator = dir_iterator_begin(git_path("logs")); + iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL); return ref_iterator; } diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c index 06f03fc..a1b8c78 100644 --- a/t/helper/test-dir-iterator.c +++ b/t/helper/test-dir-iterator.c @@ -6,6 +6,7 @@ int cmd_main(int argc, const char **argv) { struct strbuf path = STRBUF_INIT; struct dir_iterator *diter; + unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL; if (argc < 2) { return 1; @@ -13,7 +14,10 @@ int cmd_main(int argc, const char **argv) { strbuf_add(&path, argv[1], strlen(argv[1])); - diter = dir_iterator_begin(path.buf); + if (argc == 3) + flag = atoi(argv[2]); + + diter = dir_iterator_begin(path.buf, flag); while (dir_iterator_advance(diter) == ITER_OK) { if (S_ISDIR(diter->st.st_mode)) diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh index b857c07..ade3ee0 100755 --- a/t/t0065-dir-iterator.sh +++ b/t/t0065-dir-iterator.sh @@ -28,12 +28,28 @@ test_expect_success 'dir-iterator should iterate through all files' ' >dir/a/e && >dir/d/e/d/a && - test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output && + test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output && rm -rf dir && test_cmp expect-sorted-output actual-pre-order-sorted-output ' +test_expect_success 'dir-iterator should iterate through all files on post-order mode' ' + mkdir -p dir && + mkdir -p dir/a/b/c/ && + >dir/b && + >dir/c && + mkdir -p dir/d/e/d/ && + >dir/a/b/c/d && + >dir/a/e && + >dir/d/e/d/a && + + test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output && + rm -rf dir && + + test_cmp expect-sorted-output actual-post-order-sorted-output +' + cat >expect-pre-order-output <<-\EOF && [d] (a) ./dir/a [d] (a/b) ./dir/a/b @@ -41,14 +57,53 @@ cat >expect-pre-order-output <<-\EOF && [f] (a/b/c/d) ./dir/a/b/c/d EOF -test_expect_success 'dir-iterator should list files in the correct order' ' +test_expect_success 'dir-iterator should list files properly on pre-order mode' ' mkdir -p dir/a/b/c/ && >dir/a/b/c/d && - test-dir-iterator ./dir >actual-pre-order-output && + test-dir-iterator ./dir 1 >actual-pre-order-output && rm -rf dir && test_cmp expect-pre-order-output actual-pre-order-output ' +cat >expect-post-order-output <<-\EOF && +[f] (a/b/c/d) ./dir/a/b/c/d +[d] (a/b/c) ./dir/a/b/c +[d] (a/b) ./dir/a/b +[d] (a) ./dir/a +EOF + +test_expect_success 'dir-iterator should list files properly on post-order mode' ' + mkdir -p dir/a/b/c/ && + >dir/a/b/c/d && + + test-dir-iterator ./dir 2 >actual-post-order-output && + rm -rf dir && + + test_cmp expect-post-order-output actual-post-order-output +' + +cat >expect-pre-order-post-order-root-dir-output <<-\EOF && +[d] (.) ./dir +[d] (a) ./dir/a +[d] (a/b) ./dir/a/b +[d] (a/b/c) ./dir/a/b/c +[f] (a/b/c/d) ./dir/a/b/c/d +[d] (a/b/c) ./dir/a/b/c +[d] (a/b) ./dir/a/b +[d] (a) ./dir/a +[d] (.) ./dir +EOF + +test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' ' + mkdir -p dir/a/b/c/ && + >dir/a/b/c/d && + + test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output && + rm -rf dir && + + test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output +' + test_done -- 2.7.4 (Apple Git-66)
From: Daniel Ferreira <daniel.calibeta@gmail.com> Use dir_iterator to traverse through remove_subtree()'s directory tree, avoiding the need for recursive calls to readdir(). Simplify remove_subtree()'s code. A conversion similar in purpose was previously done at 46d092a ("for_each_reflog(): reimplement using iterators", 2016-05-21). Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- entry.c | 38 ++++++++++++-------------------------- 1 file changed, 12 insertions(+), 26 deletions(-) diff --git a/entry.c b/entry.c index d2b512d..bd06f41 100644 --- a/entry.c +++ b/entry.c @@ -3,6 +3,8 @@ #include "dir.h" #include "streaming.h" #include "submodule.h" +#include "iterator.h" +#include "dir-iterator.h" static void create_directories(const char *path, int path_len, const struct checkout *state) @@ -45,33 +47,17 @@ static void create_directories(const char *path, int path_len, free(buf); } -static void remove_subtree(struct strbuf *path) +static void remove_subtree(const char *path) { - DIR *dir = opendir(path->buf); - struct dirent *de; - int origlen = path->len; - - if (!dir) - die_errno("cannot opendir '%s'", path->buf); - while ((de = readdir(dir)) != NULL) { - struct stat st; - - if (is_dot_or_dotdot(de->d_name)) - continue; - - strbuf_addch(path, '/'); - strbuf_addstr(path, de->d_name); - if (lstat(path->buf, &st)) - die_errno("cannot lstat '%s'", path->buf); - if (S_ISDIR(st.st_mode)) - remove_subtree(path); - else if (unlink(path->buf)) - die_errno("cannot unlink '%s'", path->buf); - strbuf_setlen(path, origlen); + struct dir_iterator *diter = dir_iterator_begin(path, DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR); + + while (dir_iterator_advance(diter) == ITER_OK) { + if (S_ISDIR(diter->st.st_mode)) { + if (rmdir(diter->path.buf)) + die_errno("cannot rmdir '%s'", diter->path.buf); + } else if (unlink(diter->path.buf)) + die_errno("cannot unlink '%s'", diter->path.buf); } - closedir(dir); - if (rmdir(path->buf)) - die_errno("cannot rmdir '%s'", path->buf); } static int create_file(const char *path, unsigned int mode) @@ -312,7 +298,7 @@ int checkout_entry(struct cache_entry *ce, return 0; if (!state->force) return error("%s is a directory", path.buf); - remove_subtree(&path); + remove_subtree(path.buf); } else if (unlink(path.buf)) return error_errno("unable to unlink old '%s'", path.buf); } else if (state->not_new) -- 2.7.4 (Apple Git-66)
On 04/02/2017 10:03 PM, Daniel Ferreira wrote: > Perform major refactor of dir_iterator_advance(). dir_iterator has > ceased to rely on a convoluted state machine mechanism of two loops and > two state variables (level.initialized and level.dir_state). This serves > to ease comprehension of the iterator mechanism and ease addition of new > features to the iterator. > > Create an option for the dir_iterator API to iterate over subdirectories > only after having iterated through their contents. This feature was > predicted, although not implemented by 0fe5043 ("dir_iterator: new API > for iterating over a directory tree", 2016-06-18). This is useful for > recursively removing a directory and calling rmdir() on a directory only > after all of its contents have been wiped. > > Add an option for the dir_iterator API to iterate over the root > directory (the one it was initialized with) as well. > > Add the "flags" parameter to dir_iterator_create, allowing for the > aforementioned new features to be enabled. The new default behavior > (i.e. flags set to 0) does not iterate over directories. Flag > DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing > so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a > directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR > iterates over the root directory. These flags do not conflict with each > other and may be used simultaneously. > > Amend a call to dir_iterator_begin() in refs/files-backend.c to pass > the flags parameter introduced. > > Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to > test "post-order" and "iterate-over-root" modes. Nice. I think this version is a lot more understandable than the old code. I also think it's getting very close to done. > Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> > --- > dir-iterator.c | 155 +++++++++++++++++++++++++++---------------- > dir-iterator.h | 28 ++++++-- > refs/files-backend.c | 2 +- > t/helper/test-dir-iterator.c | 6 +- > t/t0065-dir-iterator.sh | 61 ++++++++++++++++- > 5 files changed, 183 insertions(+), 69 deletions(-) > > diff --git a/dir-iterator.c b/dir-iterator.c > index ce8bf81..18b7e68 100644 > --- a/dir-iterator.c > +++ b/dir-iterator.c > @@ -4,8 +4,6 @@ > #include "dir-iterator.h" > > struct dir_iterator_level { > - int initialized; > - > DIR *dir; > > /* > @@ -20,8 +18,11 @@ struct dir_iterator_level { > * iteration and also iterated into): > */ > enum { > - DIR_STATE_ITER, > - DIR_STATE_RECURSE > + DIR_STATE_PUSHED, > + DIR_STATE_PRE_ITERATION, > + DIR_STATE_ITERATING, > + DIR_STATE_POST_ITERATION, > + DIR_STATE_EXHAUSTED > } dir_state; > }; > > @@ -48,15 +49,20 @@ struct dir_iterator_int { > * that will be included in this iteration. > */ > struct dir_iterator_level *levels; > + > + /* Holds the flags passed to dir_iterator_begin(). */ > + unsigned flags; > }; > > static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) > { > - level->dir_state = DIR_STATE_RECURSE; > ALLOC_GROW(iter->levels, iter->levels_nr + 1, > iter->levels_alloc); > + > + /* Push a new level */ > level = &iter->levels[iter->levels_nr++]; > - level->initialized = 0; > + level->dir = NULL; > + level->dir_state = DIR_STATE_PUSHED; > } > > static inline int pop_dir_level(struct dir_iterator_int *iter) > @@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_it > } > > /* > - * We have to set these each time because > - * the path strbuf might have been realloc()ed. > + * Check if we are dealing with the root directory as an > + * item that's being iterated through. > */ > - iter->base.relative_path = > - iter->base.path.buf + iter->levels[0].prefix_len; > + if (level->dir_state != DIR_STATE_ITERATING && > + iter->levels_nr == 1) { > + iter->base.relative_path = "."; Doesn't `iter->base.basename` also need special handling in this case? > + } > + else { The Git project standard is to including the `else` on the same line as the curly brace ending the `if` block: } else { > + iter->base.relative_path = > + iter->base.path.buf + iter->levels[0].prefix_len; > + } > iter->base.basename = > iter->base.path.buf + level->prefix_len; > - level->dir_state = DIR_STATE_ITER; > > return 0; > } > > +/* > + * This function uses a state machine with the following states: > + * -> DIR_STATE_PUSHED: the directory has been pushed to the > + * iterator traversal tree. > + * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The > + * dirpath has already been returned if pre-order traversal is set. > + * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing > + * through it. > + * -> DIR_STATE_POST_ITERATION: the directory has been iterated through. > + * We are ready to close it. > + * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped. > + */ > int dir_iterator_advance(struct dir_iterator *dir_iterator) > { > struct dir_iterator_int *iter = > @@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > &iter->levels[iter->levels_nr - 1]; > struct dirent *de; > > - if (!level->initialized) { > + if (level->dir_state == DIR_STATE_PUSHED) { > + level->dir_state = DIR_STATE_PRE_ITERATION; > + > + if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) { > + /* We may not want the root directory to be iterated over */ > + if (iter->levels_nr != 1 || > + (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) { It would be a tiny improvement in readability to change the check `iter->levels_nr != 1` to `iters->levels_nr > 1`, to help make it clearer that this code block is not about `iters->level_nr == 0`. > + set_iterator_data(iter, level); set_iterator_data() reports errors by returning -1, but here (and in the DIR_ITERATOR_POST_ORDER_TRAVERSAL code block below) you fail to check the return value. If you are thinking to yourself "that can't happen because I already checked `lstat()` in a previous iteration of the loop", remember that another process might be changing things on the filesystem at the same time as the iteration is running ("TOCTOU"). But of course that would make for slightly strange behavior if, during the PRE_ORDER_TRAVERSAL, a directory is readable, but by the time of the POST_ORDER_TRAVERSAL, it has disappeared. The iteration might list the contents of a directory but not the POST_ORDER_TRAVERSAL of the directory itself. Or, if both PRE and POST are being used, there would be an "entering directory" event but no matching "leaving directory" event, which could be confusing to a hypothetical caller who expects the "entering directory" and "leaving directory" events to match up in pairs. Furthermore, you are calling `lstat()` more often than necessary. This function is quite expensive on Windows. You could solve all three problems by running `lstat()` on the directory once the first time you see it, storing its value in the `dir_iterator_level`, and copying that value to `iter->base.st` when it is needed. (This would fix the second problem by removing the need for a failing `set_iterator_data()` call during the POST_ORDER_TRAVERSAL phase of processing a directory.) This would mean that we might return stale `lstat()` data to the caller, but note that `lstat()` data is *always* stale, because it is always possible that another process changes it the nanosecond after our process reads it. So it's a possibility that the caller has to deal with in any case. > + return ITER_OK; > + } > + } > + } else if (level->dir_state == DIR_STATE_PRE_ITERATION) { > /* > * Note: dir_iterator_begin() ensures that > * path is not the empty string. > @@ -107,64 +141,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > level->prefix_len = iter->base.path.len; > > level->dir = opendir(iter->base.path.buf); > - if (!level->dir && errno != ENOENT) { > - warning("error opening directory %s: %s", > - iter->base.path.buf, strerror(errno)); > - /* Popping the level is handled below */ > - } > - > - level->initialized = 1; > - } else if (S_ISDIR(iter->base.st.st_mode)) { > - if (level->dir_state == DIR_STATE_ITER) { > + if (!level->dir) { > /* > - * The directory was just iterated > - * over; now prepare to iterate into > - * it. > + * This level wasn't opened sucessfully; pretend we > + * iterated through it already. > */ > - push_dir_level(iter, level); > + if (errno != ENOENT) { > + warning("error opening directory %s: %s", > + iter->base.path.buf, strerror(errno)); > + } > + > + level->dir_state = DIR_STATE_POST_ITERATION; > continue; > - } else { > - /* > - * The directory has already been > - * iterated over and iterated into; > - * we're done with it. > - */ > } > - } > > - if (!level->dir) { > - /* > - * This level is exhausted (or wasn't opened > - * successfully); pop up a level. > - */ > - if (pop_dir_level(iter) == 0) > - return dir_iterator_abort(dir_iterator); > - > - continue; > - } > - > - /* > - * Loop until we find an entry that we can give back > - * to the caller: > - */ > - while (1) { > + level->dir_state = DIR_STATE_ITERATING; > + } else if (level->dir_state == DIR_STATE_ITERATING) { > strbuf_setlen(&iter->base.path, level->prefix_len); > errno = 0; > de = readdir(level->dir); > > if (!de) { > - /* This level is exhausted; pop up a level. */ > + /* In case of readdir() error */ > if (errno) { > warning("error reading directory %s: %s", > iter->base.path.buf, strerror(errno)); > - } else if (closedir(level->dir)) > - warning("error closing directory %s: %s", > - iter->base.path.buf, strerror(errno)); > + } > > - level->dir = NULL; > - if (pop_dir_level(iter) == 0) > - return dir_iterator_abort(dir_iterator); > - break; > + level->dir_state = DIR_STATE_POST_ITERATION; > + continue; > } > > if (is_dot_or_dotdot(de->d_name)) > @@ -175,7 +180,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) > if (set_iterator_data(iter, level)) > continue; > > + if (S_ISDIR(iter->base.st.st_mode)) { > + push_dir_level(iter, level); > + continue; > + } > + > return ITER_OK; > + } else if (level->dir_state == DIR_STATE_POST_ITERATION) { > + if (level->dir != NULL && closedir(level->dir)) { > + warning("error closing directory %s: %s", > + iter->base.path.buf, strerror(errno)); > + } > + level->dir_state = DIR_STATE_EXHAUSTED; > + > + strbuf_setlen(&iter->base.path, level->prefix_len); > + /* > + * Since we are iterating through the dirpath > + * after we have gone through it, we still need > + * to get rid of the trailing slash we appended. > + */ > + strbuf_strip_suffix(&iter->base.path, "/"); > + > + if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) { > + /* We may not want the root directory to be iterated over */ > + if (iter->levels_nr != 1 || > + (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) { > + set_iterator_data(iter, level); > + return ITER_OK; > + } > + } > + } else if (level->dir_state == DIR_STATE_EXHAUSTED) { > + if (pop_dir_level(iter) == 0) > + return dir_iterator_abort(dir_iterator); > } > } > } As far as I can tell, you got the logic in this complicated big loop correct on the first try (well, if we ignore v6 :-) ), even as you added new features. I think that's good evidence that the new structure is more comprehensible than the old, plus the new tests probably helped. That's a big win! > @@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) > return ITER_DONE; > } > > -struct dir_iterator *dir_iterator_begin(const char *path) > +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags) > { > struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter)); > struct dir_iterator *dir_iterator = &iter->base; > @@ -209,13 +245,16 @@ struct dir_iterator *dir_iterator_begin(const char *path) > if (!path || !*path) > die("BUG: empty path passed to dir_iterator_begin()"); > > + iter->flags = flags; > + > strbuf_init(&iter->base.path, PATH_MAX); > strbuf_addstr(&iter->base.path, path); > > ALLOC_GROW(iter->levels, 10, iter->levels_alloc); > > iter->levels_nr = 1; > - iter->levels[0].initialized = 0; > + iter->levels[0].dir = NULL; > + iter->levels[0].dir_state = DIR_STATE_PUSHED; > > return dir_iterator; > } > diff --git a/dir-iterator.h b/dir-iterator.h > index 27739e6..0e82f36 100644 > --- a/dir-iterator.h > +++ b/dir-iterator.h > @@ -11,13 +11,12 @@ > * Every time dir_iterator_advance() is called, update the members of > * the dir_iterator structure to reflect the next path in the > * iteration. The order that paths are iterated over within a > - * directory is undefined, but directory paths are always iterated > - * over before the subdirectory contents. > + * directory is undefined. > * > * A typical iteration looks like this: > * > * int ok; > - * struct iterator *iter = dir_iterator_begin(path); > + * struct iterator *iter = dir_iterator_begin(path, flags); > * > * while ((ok = dir_iterator_advance(iter)) == ITER_OK) { > * if (want_to_stop_iteration()) { > @@ -38,6 +37,22 @@ > * dir_iterator_advance() again. > */ > > +/* > + * Possible flags for dir_iterator_begin(). > + * > + * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return > + * a dirpath it has found before iterating through that directory's > + * contents. > + * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return > + * a dirpath it has found after iterating through that directory's > + * contents. > + * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath > + * of the root directory it is iterating through. I think it would be useful to mention here that the three options can be used in any combination. The documentation for `DIR_ITERATOR_LIST_ROOT_DIR` isn't precisely accurate. If neither of the `*_ORDER_TRAVERSAL` options is specified, then the root directory is not included in the iteration even if this option is specified. I think this is enough of a boundary case that it doesn't really matter if you change the behavior or change the documentation, but one way or another please make them agree. > + */ > +#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0) > +#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1) > +#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2) > + > struct dir_iterator { > /* The current path: */ > struct strbuf path; > @@ -57,15 +72,16 @@ struct dir_iterator { > }; > > /* > - * Start a directory iteration over path. Return a dir_iterator that > - * holds the internal state of the iteration. > + * Start a directory iteration over path, with options specified in > + * 'flags'. Return a dir_iterator that holds the internal state of > + * the iteration. > * > * The iteration includes all paths under path, not including path > * itself and not including "." or ".." entries. The sentence above should be updated now that you have implemented `DIR_ITERATOR_LIST_ROOT_DIR`. > * path is the starting directory. An internal copy will be made. > */ > -struct dir_iterator *dir_iterator_begin(const char *path); > +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags); > > /* > * Advance the iterator to the first or next item and return ITER_OK. > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 50188e9..c29dc68 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st > files_downcast(ref_store, 0, "reflog_iterator_begin"); > > base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable); > - iter->dir_iterator = dir_iterator_begin(git_path("logs")); > + iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL); > return ref_iterator; > } > > diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c > index 06f03fc..a1b8c78 100644 > --- a/t/helper/test-dir-iterator.c > +++ b/t/helper/test-dir-iterator.c > @@ -6,6 +6,7 @@ > int cmd_main(int argc, const char **argv) { > struct strbuf path = STRBUF_INIT; > struct dir_iterator *diter; > + unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL; > > if (argc < 2) { > return 1; > @@ -13,7 +14,10 @@ int cmd_main(int argc, const char **argv) { > > strbuf_add(&path, argv[1], strlen(argv[1])); > > - diter = dir_iterator_begin(path.buf); > + if (argc == 3) > + flag = atoi(argv[2]); It's not ideal that the test code depends on the numerical values of the flag constants, and it makes the tests harder to understand. It would be better if this program were to accept options like `--pre-order`, `--post-order`, etc., as I suggested in an earlier round of review. > + diter = dir_iterator_begin(path.buf, flag); > > while (dir_iterator_advance(diter) == ITER_OK) { > if (S_ISDIR(diter->st.st_mode)) > diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh > index b857c07..ade3ee0 100755 > --- a/t/t0065-dir-iterator.sh > +++ b/t/t0065-dir-iterator.sh > @@ -28,12 +28,28 @@ test_expect_success 'dir-iterator should iterate through all files' ' > >dir/a/e && > >dir/d/e/d/a && > > - test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output && > + test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output && > rm -rf dir && > > test_cmp expect-sorted-output actual-pre-order-sorted-output > ' > > +test_expect_success 'dir-iterator should iterate through all files on post-order mode' ' > + mkdir -p dir && > + mkdir -p dir/a/b/c/ && > + >dir/b && > + >dir/c && > + mkdir -p dir/d/e/d/ && > + >dir/a/b/c/d && > + >dir/a/e && > + >dir/d/e/d/a && > + > + test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output && > + rm -rf dir && > + > + test_cmp expect-sorted-output actual-post-order-sorted-output > +' The setup done in the test above is identical to the setup for the test before it. Similarly, the last three tests also all use the same setup. Please put all of the setup code in an extra "pseudo-test" at the top of the file, to avoid duplication: test_expect_success 'setup' ' mkdir -p dir1 && mkdir -p dir1/a/b/c/ && >dir1/b && >dir1/c && mkdir -p dir1/d/e/d/ && >dir1/a/b/c/d && >dir1/a/e && >dir1/d/e/d/a && mkdir -p dir2/a/b/c/ && >dir2/a/b/c/d ' Note that at least one of the test directories has to be renamed so that they don't conflict with each other. There is no need for the individual tests to delete their test directories; the test harness will take care of that. But if you *did* need to clean up after a test, you should do it like this: mkdir foo && test_when_finished "rm -rf foo" && ...tests involving foo... The advantage of `test_when_finished` is that it ensures that the cleanup code is run even if some part of the test fails. > cat >expect-pre-order-output <<-\EOF && > [d] (a) ./dir/a > [d] (a/b) ./dir/a/b > @@ -41,14 +57,53 @@ cat >expect-pre-order-output <<-\EOF && > [f] (a/b/c/d) ./dir/a/b/c/d > EOF > > -test_expect_success 'dir-iterator should list files in the correct order' ' > +test_expect_success 'dir-iterator should list files properly on pre-order mode' ' > mkdir -p dir/a/b/c/ && > >dir/a/b/c/d && > > - test-dir-iterator ./dir >actual-pre-order-output && > + test-dir-iterator ./dir 1 >actual-pre-order-output && > rm -rf dir && > > test_cmp expect-pre-order-output actual-pre-order-output > ' > > +cat >expect-post-order-output <<-\EOF && > +[f] (a/b/c/d) ./dir/a/b/c/d > +[d] (a/b/c) ./dir/a/b/c > +[d] (a/b) ./dir/a/b > +[d] (a) ./dir/a > +EOF > + > +test_expect_success 'dir-iterator should list files properly on post-order mode' ' > + mkdir -p dir/a/b/c/ && > + >dir/a/b/c/d && > + > + test-dir-iterator ./dir 2 >actual-post-order-output && > + rm -rf dir && > + > + test_cmp expect-post-order-output actual-post-order-output > +' > + > +cat >expect-pre-order-post-order-root-dir-output <<-\EOF && > +[d] (.) ./dir > +[d] (a) ./dir/a > +[d] (a/b) ./dir/a/b > +[d] (a/b/c) ./dir/a/b/c > +[f] (a/b/c/d) ./dir/a/b/c/d > +[d] (a/b/c) ./dir/a/b/c > +[d] (a/b) ./dir/a/b > +[d] (a) ./dir/a > +[d] (.) ./dir > +EOF > + > +test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' ' > + mkdir -p dir/a/b/c/ && > + >dir/a/b/c/d && > + > + test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output && > + rm -rf dir && > + > + test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output > +' > + > test_done Michael
On 04/02/2017 10:03 PM, Daniel Ferreira wrote: > From: Daniel Ferreira <daniel.calibeta@gmail.com> > > Use dir_iterator to traverse through remove_subtree()'s directory tree, > avoiding the need for recursive calls to readdir(). Simplify > remove_subtree()'s code. > > A conversion similar in purpose was previously done at 46d092a > ("for_each_reflog(): reimplement using iterators", 2016-05-21). > > Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> This patch has a different author email than the others. If possible, please choose one email address that you will use for your commits, and try to be consistent. > --- > entry.c | 38 ++++++++++++-------------------------- > 1 file changed, 12 insertions(+), 26 deletions(-) > > diff --git a/entry.c b/entry.c > index d2b512d..bd06f41 100644 > --- a/entry.c > +++ b/entry.c > @@ -3,6 +3,8 @@ > #include "dir.h" > #include "streaming.h" > #include "submodule.h" > +#include "iterator.h" > +#include "dir-iterator.h" > > static void create_directories(const char *path, int path_len, > const struct checkout *state) > @@ -45,33 +47,17 @@ static void create_directories(const char *path, int path_len, > free(buf); > } > > -static void remove_subtree(struct strbuf *path) > +static void remove_subtree(const char *path) > { > - DIR *dir = opendir(path->buf); > - struct dirent *de; > - int origlen = path->len; > - > - if (!dir) > - die_errno("cannot opendir '%s'", path->buf); > - while ((de = readdir(dir)) != NULL) { > - struct stat st; > - > - if (is_dot_or_dotdot(de->d_name)) > - continue; > - > - strbuf_addch(path, '/'); > - strbuf_addstr(path, de->d_name); > - if (lstat(path->buf, &st)) > - die_errno("cannot lstat '%s'", path->buf); > - if (S_ISDIR(st.st_mode)) > - remove_subtree(path); > - else if (unlink(path->buf)) > - die_errno("cannot unlink '%s'", path->buf); > - strbuf_setlen(path, origlen); > + struct dir_iterator *diter = dir_iterator_begin(path, DIR_ITERATOR_POST_ORDER_TRAVERSAL | DIR_ITERATOR_LIST_ROOT_DIR); The line above is way too long. Try to limit lines to 80 characters max. (This is documented in `Documentation/CodingGuidelines`.) > + while (dir_iterator_advance(diter) == ITER_OK) { > + if (S_ISDIR(diter->st.st_mode)) { > + if (rmdir(diter->path.buf)) > + die_errno("cannot rmdir '%s'", diter->path.buf); > + } else if (unlink(diter->path.buf)) > + die_errno("cannot unlink '%s'", diter->path.buf); > } > - closedir(dir); > - if (rmdir(path->buf)) > - die_errno("cannot rmdir '%s'", path->buf); > } > > static int create_file(const char *path, unsigned int mode) > @@ -312,7 +298,7 @@ int checkout_entry(struct cache_entry *ce, > return 0; > if (!state->force) > return error("%s is a directory", path.buf); > - remove_subtree(&path); > + remove_subtree(path.buf); > } else if (unlink(path.buf)) > return error_errno("unable to unlink old '%s'", path.buf); > } else if (state->not_new) > That's a gratifying decrease in code size :-) Michael
On Mon, Apr 3, 2017 at 12:36 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote: > As far as I can tell, you got the logic in this complicated big loop > correct on the first try (well, if we ignore v6 :-) ), even as you added > new features. I think that's good evidence that the new structure is > more comprehensible than the old, plus the new tests probably helped. > That's a big win! Thanks for ignoring v6 ;) Another gain is that the other proposed features (only iterate over directories, do not recurse into subdirectories) are also quite easy to add with this new structure. > It's not ideal that the test code depends on the numerical values of the > flag constants, and it makes the tests harder to understand. It would be > better if this program were to accept options like `--pre-order`, > `--post-order`, etc., as I suggested in an earlier round of review. Although it does make tests harder to understand, if we were to specify how to iterate with human-readable flags we'd add the getopt() + flag configuration overhead to this helper program to be able to handle all cases properly. Additionally, new flags added to dir_iterator would have to edit the test program as well, generating extra work. I personally think that the string describing the test in the script is enough to explain what the flag-as-argument is doing for the sake of readability. The only gain I see in your suggestion is that we avoid the programmer committing errors calculating the flag by hand when writing the test. That said, I'd appreciate some more thought on this. > Michael > Thanks for the review. I agree with all other points and I'll address them in a next series.
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote: > Create t/helper/test-dir-iterator.c, which prints relevant information > about a directory tree iterated over with dir_iterator. > > Create t/t0065-dir-iterator.sh, which tests that dir_iterator does > iterate through a whole directory tree. > > Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> > --- > Makefile | 1 + > t/helper/.gitignore | 1 + > t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++++ > t/t0065-dir-iterator.sh | 54 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 84 insertions(+) > create mode 100644 t/helper/test-dir-iterator.c > create mode 100755 t/t0065-dir-iterator.sh > > diff --git a/Makefile b/Makefile > index 9b36068..41ce9ab 100644 > --- a/Makefile > +++ b/Makefile > @@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype > TEST_PROGRAMS_NEED_X += test-config > TEST_PROGRAMS_NEED_X += test-date > TEST_PROGRAMS_NEED_X += test-delta > +TEST_PROGRAMS_NEED_X += test-dir-iterator > TEST_PROGRAMS_NEED_X += test-dump-cache-tree > TEST_PROGRAMS_NEED_X += test-dump-split-index > TEST_PROGRAMS_NEED_X += test-dump-untracked-cache > diff --git a/t/helper/.gitignore b/t/helper/.gitignore > index 758ed2e..a7d74d3 100644 > --- a/t/helper/.gitignore > +++ b/t/helper/.gitignore > @@ -3,6 +3,7 @@ > /test-config > /test-date > /test-delta > +/test-dir-iterator > /test-dump-cache-tree > /test-dump-split-index > /test-dump-untracked-cache > diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c > new file mode 100644 > index 0000000..06f03fc > --- /dev/null > +++ b/t/helper/test-dir-iterator.c > @@ -0,0 +1,28 @@ > +#include "git-compat-util.h" > +#include "strbuf.h" > +#include "iterator.h" > +#include "dir-iterator.h" > + > +int cmd_main(int argc, const char **argv) { > + struct strbuf path = STRBUF_INIT; > + struct dir_iterator *diter; > + > + if (argc < 2) { Coding style: please use no braces for single statements after control structures (if, for, while). Instead of just returning 1 (which is harder to diagnose in an interactive shell for users who do not know the details of this program), we could die("BUG: test-dir-iterator takes exactly one argument"); > + return 1; > + } > + > + strbuf_add(&path, argv[1], strlen(argv[1])); > + > + diter = dir_iterator_begin(path.buf); > + > + while (dir_iterator_advance(diter) == ITER_OK) { > + if (S_ISDIR(diter->st.st_mode)) > + printf("[d] "); > + else > + printf("[f] "); In the operating system there are more 'st_mode's than just directory or file, e.g. symbolic link. Maybe else if (S_ISREG(diter->st.st_mode)) printf("[f] "); else printf("[?]"); to catch the unknown things better instead of defaulting them to regular files? > +#!/bin/sh > + > +test_description='Test directory iteration.' > + > +. ./test-lib.sh > + > +cat >expect-sorted-output <<-\EOF && > +[d] (a) ./dir/a > +[d] (a/b) ./dir/a/b > +[d] (a/b/c) ./dir/a/b/c > +[d] (d) ./dir/d > +[d] (d/e) ./dir/d/e > +[d] (d/e/d) ./dir/d/e/d > +[f] (a/b/c/d) ./dir/a/b/c/d > +[f] (a/e) ./dir/a/e > +[f] (b) ./dir/b > +[f] (c) ./dir/c > +[f] (d/e/d/a) ./dir/d/e/d/a > +EOF > + > +test_expect_success 'dir-iterator should iterate through all files' ' > + mkdir -p dir && > + mkdir -p dir/a/b/c/ && > + >dir/b && > + >dir/c && > + mkdir -p dir/d/e/d/ && > + >dir/a/b/c/d && > + >dir/a/e && > + >dir/d/e/d/a && > + > + test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output && We just had some micro projects that were trying to get rid of git commands on the upstream of a pipe, to better observe the exit codes of the programs that we write here. I think it is not as critical for test helpers, but the it would be better to diagnose an error when we had test-dir-iterator ./dir >out && sort <out >actual && test_cmp expect actual as in case of an error we'd stop early in the && chain at test-dir-iterator (and if that would use the die() function we'd even get some error message on stderr). > + rm -rf dir && I'd put that cleanup first via test_expect_success 'title' ' test_when_finished "rm -rf dir" && mkdir ... && ... ' By doing so, an inspection is easier afterwards. When this test breaks, you'd want to debug it via e.g. cd t ./t0065-dir-iterator.sh -d -i -v -x # see README for all the nice flags) which in case of error would stop at e.g. test_cmp failing. But at that point of the test we already removed the files so it is impossible to inspect what was on the file system, without adding a test_pause before the rm. > + > + test_cmp expect-sorted-output actual-pre-order-sorted-output Usually the file names are shorter (expect, actual), but I do not mind encoding some expectation in them. Thanks, Stefan
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote:
> Test removing a nested directory when an attempt is made to restore the
> index to a state where it does not exist. A similar test could be found
> previously in t/t2000-checkout-cache-clash.sh, but it did not check for
> nested directories, which could allow a faulty implementation of
> remove_subtree() pass the tests.
>
> Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
> ---
> t/t2000-checkout-cache-clash.sh | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
> index de3edb5..ac10ba3 100755
> --- a/t/t2000-checkout-cache-clash.sh
> +++ b/t/t2000-checkout-cache-clash.sh
> @@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' '
> git checkout-index -a -f --prefix=there/
> '
>
> +test_expect_success 'git checkout-index -f should remove nested subtrees' '
> + echo content >path &&
> + git update-index --add path &&
> + rm path &&
> + mkdir -p path/with/nested/paths &&
> + echo content >path/file1 &&
> + echo content >path/with/nested/paths/file2 &&
> + git checkout-index -f -a &&
> + test ! -d path
> +'
> +
Looks good to me.
Thanks,
Stefan
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira <bnmvco@gmail.com> wrote: > Create inline helpers to dir_iterator_advance(). Make > dir_iterator_advance()'s code more legible and allow some behavior to > be reusable. > > Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> > --- > +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) Sometimes we use inline, sometimes we do not: $ git grep "static inline" -- *.c |wc -l 84 $ git grep "static inline" -- *.h |wc -l 130 # approximate a function definition: $ git grep -e "static" -- *.c |grep "("|wc -l 2971 So I think we'd want to have the inline keyword in the header only (when the code is actually to be inlined). As a C file is one translation unit, the compiler can figure out best whether to inline a particular static function on its own. The rest looks good to me, Thanks, Stefan
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> --- Daniel, > Although it does make tests harder to understand, if we were to > specify how to iterate with human-readable flags we'd add the getopt() > + flag configuration overhead to this helper program to be able to > handle all cases properly. Additionally, new flags added to > dir_iterator would have to edit the test program as well, generating > extra work. I think you're exaggerating the amount of extra work. I think all you need to do is squash the attached patch into your patch 4/5, for the gain of only 14 lines of simple code, four of which could be deleted if you don't care about supporting "--". Supporting hypothetical future new options would cost exactly two more lines for each option. Plus, this also fixes the handling of more than two args. It might be even shorter if you use `parse_options()`, but that seems like overkill here. On the plus side, anybody can now change the `DIR_ITERATOR_*` constants without breaking things, or grep for them to find all of the places that they are used. Plus the test code is more readable. In my opinion that is a win. Michael t/helper/test-dir-iterator.c | 28 +++++++++++++++++++++------- t/t0065-dir-iterator.sh | 10 +++++----- 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c index a1b8c78434..c2eb2ca1f9 100644 --- a/t/helper/test-dir-iterator.c +++ b/t/helper/test-dir-iterator.c @@ -4,18 +4,32 @@ #include "dir-iterator.h" int cmd_main(int argc, const char **argv) { + const char **myargv = argv; + int myargc = argc; struct strbuf path = STRBUF_INIT; struct dir_iterator *diter; - unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL; - - if (argc < 2) { - return 1; + unsigned flag = 0; + + while (--myargc && starts_with(*++myargv, "--")) { + if (!strcmp(*myargv, "--pre-order")) { + flag |= DIR_ITERATOR_PRE_ORDER_TRAVERSAL; + } else if (!strcmp(*myargv, "--post-order")) { + flag |= DIR_ITERATOR_POST_ORDER_TRAVERSAL; + } else if (!strcmp(*myargv, "--list-root-dir")) { + flag |= DIR_ITERATOR_LIST_ROOT_DIR; + } else if (!strcmp(*myargv, "--")) { + myargc--; + myargv++; + break; + } else { + die("Unrecognized option: %s", *myargv); + } } - strbuf_add(&path, argv[1], strlen(argv[1])); + if (myargc != 1) + die("expected exactly one non-option argument"); - if (argc == 3) - flag = atoi(argv[2]); + strbuf_addstr(&path, *myargv); diter = dir_iterator_begin(path.buf, flag); diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh index ade3ee0e7e..4819e6181d 100755 --- a/t/t0065-dir-iterator.sh +++ b/t/t0065-dir-iterator.sh @@ -28,7 +28,7 @@ test_expect_success 'dir-iterator should iterate through all files' ' >dir/a/e && >dir/d/e/d/a && - test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output && + test-dir-iterator --pre-order ./dir | sort >./actual-pre-order-sorted-output && rm -rf dir && test_cmp expect-sorted-output actual-pre-order-sorted-output @@ -44,7 +44,7 @@ test_expect_success 'dir-iterator should iterate through all files on post-order >dir/a/e && >dir/d/e/d/a && - test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output && + test-dir-iterator --post-order ./dir | sort >actual-post-order-sorted-output && rm -rf dir && test_cmp expect-sorted-output actual-post-order-sorted-output @@ -61,7 +61,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order mode' mkdir -p dir/a/b/c/ && >dir/a/b/c/d && - test-dir-iterator ./dir 1 >actual-pre-order-output && + test-dir-iterator --pre-order ./dir >actual-pre-order-output && rm -rf dir && test_cmp expect-pre-order-output actual-pre-order-output @@ -78,7 +78,7 @@ test_expect_success 'dir-iterator should list files properly on post-order mode' mkdir -p dir/a/b/c/ && >dir/a/b/c/d && - test-dir-iterator ./dir 2 >actual-post-order-output && + test-dir-iterator --post-order ./dir >actual-post-order-output && rm -rf dir && test_cmp expect-post-order-output actual-post-order-output @@ -100,7 +100,7 @@ test_expect_success 'dir-iterator should list files properly on pre-order + post mkdir -p dir/a/b/c/ && >dir/a/b/c/d && - test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output && + test-dir-iterator --pre-order --post-order --list-root-dir ./dir >actual-pre-order-post-order-root-dir-output && rm -rf dir && test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output -- 2.11.0