Hi Daniel, On Fri, 5 May 2017, Daniel Ferreira wrote: > Create a builtin helper for git-add--interactive, which right now is > only able to reproduce git-add--interactive.perl's status_cmd() > function, providing a summarized diff numstat to the user. > > This is the first step in an effort to convert git-add--interactive.perl > to a C builtin, in search for better portability, expressibility and > performance (specially on non-POSIX systems like Windows). > > Additionally, an eventual complete port of git-add--interactive would > remove the last "big" Git script to have Perl as a dependency, allowing > most Git users to have a NOPERL build running without big losses. > > Signed-off-by: Daniel Ferreira Good. I would think that git-add--helper is a better name. It is an internal command only, anyway, and hopefully it will go away soon (i.e. hopefully all of add -i will be builtin soon and can then be even moved into a separate file outside builtin/, say, add-interactive.c). > diff --git a/.gitignore b/.gitignore > index 833ef3b..0d6cfe4 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -11,6 +11,7 @@ > /git > /git-add > /git-add--interactive > +/git-add-interactive--helper Personally, I would prefer a separate commit adding the helper, before it learns about any functionality. Somehow I have an easier time reviewing patches if they tease such things apart. > diff --git a/builtin/add-interactive--helper.c b/builtin/add-interactive--helper.c > new file mode 100644 > index 0000000..97ca1b3 > --- /dev/null > +++ b/builtin/add-interactive--helper.c > @@ -0,0 +1,258 @@ > +#include "builtin.h" > +#include "color.h" > +#include "diff.h" > +#include "diffcore.h" > +#include "revision.h" > + > +#define ADD_INTERACTIVE_HEADER_INDENT " " This definition is local to this helper, and we already know that this is all about add -i. So maybe not repeat it? HEADER_INDENT would be catchier and just as expressive. > +enum add_interactive_collection_mode { > + COLLECTION_MODE_NONE, > + COLLECTION_MODE_WORKTREE, > + COLLECTION_MODE_INDEX > +}; Likewise, maybe we do not want to have this COLLECTION_MODE_ prefix littering the source code... NONE, WORKTREE and INDEX should be short and sweet. > +struct add_interactive_file_status { Just struct file_status, maybe? From a cursory glance, it also would appear that the only need for this struct to be named is the memset() call when initializing a new item, right? If that is the case, it would be better to use sizeof(*s->files) and keep it unnamed, as part of the enclosing struct. > + int selected; > + > + char path[PATH_MAX]; That is a bit wasteful. `char *name` is what diffstat_t uses... > + int lines_added_index; > + int lines_deleted_index; > + int lines_added_worktree; > + int lines_deleted_worktree; Maybe for better readability: struct { uintmax_t added, deleted; } index, worktree; > +}; > > +struct add_interactive_status { > + enum add_interactive_collection_mode current_mode; I am scared of future patches where you cannot wrap the code to the required <= 80 columns/row because the data type or variable name is too long... ;-) > + > + const char *reference; > + struct pathspec pathspec; > + > + int file_count; > + struct add_interactive_file_status *files; The convention for growable arrays in Git is to have nr, alloc and the array, and use ALLOC_GROW() to avoid reallocating for every single new item. > +}; > + > +static int add_interactive_use_color = -1; > +enum color_add_interactive { > + ADD_INTERACTIVE_PROMPT, > + ADD_INTERACTIVE_HEADER, > + ADD_INTERACTIVE_HELP, > + ADD_INTERACTIVE_ERROR > +}; Again, if we were talking about a declaration in a .h file, that prefix would make sense, but here we are safely in file-local territory where I think this is just unnecessary churn. > +static char add_interactive_colors[][COLOR_MAXLEN] = { > + GIT_COLOR_BOLD_BLUE, /* Prompt */ > + GIT_COLOR_BOLD, /* Header */ > + GIT_COLOR_BOLD_RED, /* Help */ > + GIT_COLOR_BOLD_RED /* Error */ > +}; > + > +static const char *add_interactive_get_color(enum color_add_interactive ix) > +{ > + if (want_color(add_interactive_use_color)) > + return add_interactive_colors[ix]; > + return ""; > +} > + > +static int parse_add_interactive_color_slot(const char *slot) > +{ > + if (!strcasecmp(slot, "prompt")) > + return ADD_INTERACTIVE_PROMPT; > + if (!strcasecmp(slot, "header")) > + return ADD_INTERACTIVE_HEADER; > + if (!strcasecmp(slot, "help")) > + return ADD_INTERACTIVE_HELP; > + if (!strcasecmp(slot, "error")) > + return ADD_INTERACTIVE_ERROR; > + > + return -1; > +} > + > +static int git_add_interactive_config(const char *var, Not git_add_interactive__helper_config()? ;-) > + const char *value, void *cbdata) > +{ > + const char *name; > + > + if (!strcmp(var, "color.interactive")) { > + add_interactive_use_color = git_config_colorbool(var, value); > + return 0; > + } > + > + if (skip_prefix(var, "color.interactive", &name)) { > + int slot = parse_add_interactive_color_slot(name); > + if (slot < 0) > + return 0; > + if (!value) > + return config_error_nonbool(var); > + return color_parse(value, add_interactive_colors[slot]); > + } > + > + return git_default_config(var, value, cbdata); > +} > + > +static void add_interactive_status_collect_changed_cb(struct diff_queue_struct *q, I did have that fear, and now it comes true... ;-) Seriously again, this is a file-local function. We know it is about add --interactive, and we can be pretty succinct about its role, i.e. something like collect_changes_callback() would be plenty sufficient. > + struct diff_options *options, > + void *data) > +{ > + struct add_interactive_status *s = data; > + struct diffstat_t stat; > + int i, j; > + > + if (!q->nr) > + return; > + > + memset(&stat, 0, sizeof(struct diffstat_t)); I guess it makes sense to play it safe here (by zeroing the entire struct rather than assigning the individual initial values). But would struct diffstat_t stat = { 0 }; not have done the same? > + for (i = 0; i < q->nr; i++) { > + struct diff_filepair *p; > + p = q->queue[i]; > + diff_flush_stat(p, options, &stat); > + } > + > + for (i = 0; i < stat.nr; i++) { > + int file_index = s->file_count; > + for (j = 0; j < s->file_count; j++) { > + if (!strcmp(s->files[j].path, stat.files[i]->name)) { > + file_index = j; > + break; > + } > + } So basically, this is looking up in a list whether we saw the file in question already, and the reason we have to do that is that we run the entire shebang twice, once with the worktree and once with the index. I wonder whether it would not make sense to switch away s->files from a list to a hashmap. > + if (file_index == s->file_count) { > + s->file_count++; > + s->files = realloc(s->files, > + (q->nr + s->file_count) * sizeof(*s->files)); We already have ALLOC_GROW() to do this, safer. And easier to verify ;-) > + memset(&s->files[file_index], 0, > + sizeof(struct add_interactive_file_status)); > + } > + > + memcpy(s->files[file_index].path, stat.files[i]->name, > + strlen(stat.files[i]->name) + 1); Apart from using PATH_MAX bytes for most likely only short names: should this not be inside the if (file_index == s->file_count) clause? I mean, if we do not hit that conditional code, it is because we already compared the file name and figured out that s->files[file_index].path is identical to stat.files[i]->name... > + if (s->current_mode == COLLECTION_MODE_WORKTREE) { > + s->files[file_index].lines_added_worktree = stat.files[i]->added; > + s->files[file_index].lines_deleted_worktree = stat.files[i]->deleted; > + } else if (s->current_mode == COLLECTION_MODE_INDEX) { > + s->files[file_index].lines_added_index = stat.files[i]->added; > + s->files[file_index].lines_deleted_index = stat.files[i]->deleted; > + } > + } > +} > + > +static void add_interactive_status_collect_changes_worktree(struct add_interactive_status *s) I would use the name status_worktree() here instead. > +{ > + struct rev_info rev; > + > + s->current_mode = COLLECTION_MODE_WORKTREE; Now that I read this and remember that only WORKTREE and INDEX are handled in the callback function: is there actually a use for the NONE enum value? I.e. is current_mode read out in any other context than the callback function? If there is no other read, then the NONE enum value is just confusing. BTW in the first pass, we pretty much know that we only get unique names, so the entire lookup is unnecessary and will just increase the time complexity from O(n) to O(n^2). So let's avoid that. By moving to a hashmap, you can even get the second phase down to an expected O(n). > + > + init_revisions(&rev, NULL); > + setup_revisions(0, NULL, &rev, NULL); > + > + rev.max_count = 0; Where does this come from? We are not logging commits... oh wait, I see that diff-lib.c reuses that field for its diff_unmerged_stage value. Urgh. Not your fault, of course. > + rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; > + rev.diffopt.format_callback = add_interactive_status_collect_changed_cb; > + rev.diffopt.format_callback_data = s; > + > + run_diff_files(&rev, 0); > +} > + > +static void add_interactive_status_collect_changes_index(struct add_interactive_status *s) > +{ > + struct rev_info rev; > + struct setup_revision_opt opt; > + > + s->current_mode = COLLECTION_MODE_INDEX; > + > + init_revisions(&rev, NULL); > + memset(&opt, 0, sizeof(opt)); struct setup_revision_opt opt = { NULL }; > + opt.def = s->reference; > + setup_revisions(0, NULL, &rev, &opt); Oh, I see, you want to use the opt argument to pass HEAD or the emtpy tree SHA-1 if we're on an unborn branch. Since you're already calling get_sha1() on "HEAD", you may just as well keep the SHA-1 (and use the EMPTY_TREE_SHA1 if there is no HEAD), and then pass that in via the argc, argv parameters to setup_revisions() (imitating cmd_diff_index() a bit closer). > + > + rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; > + rev.diffopt.format_callback = add_interactive_status_collect_changed_cb; > + rev.diffopt.format_callback_data = s; > + > + run_diff_index(&rev, 1); > +} > + > +static void list_modified_into_status(struct add_interactive_status *s) > +{ > + add_interactive_status_collect_changes_worktree(s); > + add_interactive_status_collect_changes_index(s); > +} Why not collapse all three functions into one? It is not like they are totally unrelated nor super-long. > +static void print_modified(void) > +{ > + int i; > + struct add_interactive_status s; > + const char *modified_fmt = _("%12s %12s %s"); We cannot really translate that... > + const char *header_color = add_interactive_get_color(ADD_INTERACTIVE_HEADER); > + unsigned char sha1[20]; > + > + if (read_cache() < 0) > + return; > + > + s.current_mode = COLLECTION_MODE_NONE; > + s.reference = !get_sha1("HEAD", sha1) ? "HEAD": EMPTY_TREE_SHA1_HEX; > + s.file_count = 0; > + s.files = NULL; > + list_modified_into_status(&s); > + > + printf(ADD_INTERACTIVE_HEADER_INDENT); > + color_fprintf(stdout, header_color, modified_fmt, _("staged"), > + _("unstaged"), _("path")); I think these _() need to become N_(). > + printf("\n"); Wouldn't it be better to avoid defining modified_fmt (we do not really modify it, do we?) and at least make the '\n' part of the format string? > + for (i = 0; i < s.file_count; i++) { > + struct add_interactive_file_status f = s.files[i]; > + char selection = f.selected ? '*' : ' '; I would prefer the variable to be called "marker". > + > + char worktree_changes[50]; > + char index_changes[50]; Those "50" look a bit arbitrary... maybe use strbufs instead (so that we do not have to hope that all translations of "nothing" or "unchanged", even Welsh ones, will fit inside those buffers)? > + if (f.lines_added_worktree != 0 || f.lines_deleted_worktree != 0) In Git's source code, the convention is to just drop the `!= 0` in comparisons, unless there is a really good reason not to. > + snprintf(worktree_changes, 50, "+%d/-%d", f.lines_added_worktree, > + f.lines_deleted_worktree); > + else > + snprintf(worktree_changes, 50, "%s", _("nothing")); > + > + if (f.lines_added_index != 0 || f.lines_deleted_index != 0) > + snprintf(index_changes, 50, "+%d/-%d", f.lines_added_index, > + f.lines_deleted_index); > + else > + snprintf(index_changes, 50, "%s", _("unchanged")); > + > + printf("%c%2d: ", selection, i + 1); > + printf(modified_fmt, index_changes, worktree_changes, f.path); > + printf("\n"); It would be nicer to make this a single printf() call. The only idea I have to allow for that is to turn modified_fmt into a `#define MODIFIED_FMT "%12s %12s %s"`, though. > + } > +} > + > +static void status_cmd(void) > +{ > + print_modified(); > +} As long as this function really only calls another function with no parameters, let's just drop it. We can call print_modified() instead of status_cmd() just as easily. Ævar already commented on the parseopt stuff. Otherwise this looks pretty good to me! Thanks, Johannes