git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C
@ 2020-07-02 19:24 Shourya Shukla
  2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Shourya Shukla @ 2020-07-02 19:24 UTC (permalink / raw)
  To: git
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800,
	sbeller, pclouds, Shourya Shukla

Hello all!

This is third subcommand in line to be converted as a part of my GSoC
project. The other two were 'set-url' and 'set-branch' both of which
have been merged into Git. This port has been taken off from what
Prathamesh Chavan left as a part of his GSoC project in 2017 under
Stefan Beller and Christian Couder. As written in my proposal and
advised by my mentors Kaartic and Christian, it was decided to re-use
PC's work for multiple reasons.

Here is the link to the v1 of his patch:
https://lore.kernel.org/git/20170731205621.24305-9-pc44800@gmail.com/

Here is the link to the v2 of his patch:
https://lore.kernel.org/git/20170807211900.15001-9-pc44800@gmail.com/

Even though this patch series draws on Prathamesh' work, a lot extra
features had to be added along with some remodelling. A summary of the 4
commits in this patch are as follows: the patch consists of 3
preparatory patches preceding the main commit porting 'git submodule summary'.

Being a bit more verbose about the prepatory patches:

    1. a8dcc59a6a amends an extra linefeed between callback struct(s)
       macro(s) of subcommands 'init', 'status' and 'sync' so as to make
       the whole format of subcommands a bit more uniform and
       consistent.

    2. 5f17ca37e0 renames the helper functions
       'show_submodule_summary()', 'prepare_submodule_summary()' and
       'print_submodule_summary()' used by the 'builtin_diff()' function
       of diff.c This was done so as to avoid any ambiguity later on
       when the helper functions used by 'summary' will be defined in
       the upcoming port. The functions are renamed to
       '*_diff_submodule_summary()' so as to classify them as the one
       being used by the diff machinery.

    3. 5945ab4113 changes the scope of the 'count_lines()' function
       defined in diff.c so as to make it usable by other entities of
       the code. This was originally a patch authored by PC with a small
       change that the function instead of being defined as an 'extern
       int', it is now defined as 'int' since f758a7f8ac4 by Nguyễn Thái
       Ngọc Duy removed extern from the function declarations in diff.h

Now moving on to the main patch 4/4 that will port summary.

    17738e9df4 ports the subcommand summary from Shell to C. I will not
    be repetitive by stating how all the functions communicate since
    this is covered in the commit message. What I will focus on is the
    bits I had to tweak along with any extra functions added. Originally
    the functions were: module_summary(),
    compute_summary_submodule_list(), submodule_summary_callback(),
    prepare_submodule_summary(), generate_submodule_summary(),
    print_submodule_summary() and verify_submodule_object_name. I had to
    add another function called get_diff_cmd() which fetches a const
    char * (a diff-index or a diff-files) in response to the enum provided
    to it and bugs out in case of an unexpected enum value. I also had
    to tweak the enum declared by PC by NOT keeping it static and
    tweaking its definition just a little. These were suggested by
    Christian on PC's patch at the time.

    There were also some changes to function parameters such as those
    of index_fd() by passing the the_index parameter to it or the case
    of submodule_from_path() where it was needed to add another parameter
    of the_repository. This all must have been due to change of function
    defintions in the past 3 years.

    Even after all this, there were two problems: t7418 had been failing
    (something which I feel might have been overlooked/not tested by PC
    during his time) and the CI build failed. First let's cover the test
    failure.

    The main test for testing summary is t7401. t7418 is another test
    which involves sparse checkouts and submodules. The test creates a
    sparse checkout clone excluding the .gitmodules file to test the
    functionality of the '--for-status' option. After quite a thorough
    investigation me, Kaartic and Christian figured out that this port
    fails to handle cases where a gitfile exists instead of a .git dir
    since all the tests in t7401 did not any submodules which were
    cloned in any way but rather just exist in a nearby directory and
    are used in our tests (this is something I feel should be fixed and
    hence I am saving this for later). But anyway, we decided that it
    will be excellent to check for a non-bare repo instead of checking
    for a gitdir therefore replacing the function is_git_directory()
    with is_nonbare_repository() in the first non-nested if-check in
    generate_submodule_summary().

    Regarding the CI failure. The build was failing due to the
    function oidcmp() in the first if-check of
    generate_submodule_summary(). The CI build suggested that we use
    oideq() instead so that the build passes and on doing the change, it
    did!

    There were some other cosmetic changes such as wrapping of
    columns upto 80 chars (to follow the CodingGuidelines) and
    improving some error messages and option descriptions,

Finally, since it was originally PC's work, I have kept him as the author
for the final 2 commits and kept the date as-is (from 2017). This patch
is not perfect and hence needs the feedback from all of you :)

Here is a 'git log --oneline' of the commits:

    17738e9df4 submodule: port submodule subcommand 'summary' from shell to C
    5945ab4113 diff: change scope of the function count_lines()
    5f17ca37e0 submodule: rename helper functions to avoid ambiguity
    a8dcc59a6a submodule: amend extra line feed between callback struct and macro

Thanks,
Shourya Shukla


Prathamesh Chavan (2):
  diff: change scope of the function count_lines()
  submodule: port submodule subcommand 'summary' from shell to C

Shourya Shukla (2):
  submodule: amend extra line feed between callback struct and macro
  submodule: rename helper functions to avoid ambiguity

 builtin/submodule--helper.c | 454 +++++++++++++++++++++++++++++++++++-
 diff.c                      |   4 +-
 diff.h                      |   1 +
 git-submodule.sh            | 186 +--------------
 submodule.c                 |  10 +-
 submodule.h                 |   2 +-
 6 files changed, 461 insertions(+), 196 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
@ 2020-07-02 19:24 ` Shourya Shukla
  2020-07-03 14:57   ` Johannes Schindelin
  2020-07-02 19:24 ` [PATCH 2/4] submodule: rename helper functions to avoid ambiguity Shourya Shukla
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-02 19:24 UTC (permalink / raw)
  To: git
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800,
	sbeller, pclouds, Shourya Shukla, Christian Couder

All subcommands of 'git submodule' using a callback mechanism had
absence of an extra linefeed between their callback structs and
macros. Subcommands 'init', 'status' and 'sync' did not follow suit.
Amend the extra line feed.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 builtin/submodule--helper.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 59c1e1217c..eea3932c40 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -612,7 +612,6 @@ struct init_cb {
 	const char *prefix;
 	unsigned int flags;
 };
-
 #define INIT_CB_INIT { NULL, 0 }
 
 static void init_submodule(const char *path, const char *prefix,
@@ -742,7 +741,6 @@ struct status_cb {
 	const char *prefix;
 	unsigned int flags;
 };
-
 #define STATUS_CB_INIT { NULL, 0 }
 
 static void print_status(unsigned int flags, char state, const char *path,
@@ -933,7 +931,6 @@ struct sync_cb {
 	const char *prefix;
 	unsigned int flags;
 };
-
 #define SYNC_CB_INIT { NULL, 0 }
 
 static void sync_submodule(const char *path, const char *prefix,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] submodule: rename helper functions to avoid ambiguity
  2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
  2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
@ 2020-07-02 19:24 ` Shourya Shukla
  2020-07-02 19:24 ` [PATCH 3/4] diff: change scope of the function count_lines() Shourya Shukla
  2020-07-02 19:24 ` [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C Shourya Shukla
  3 siblings, 0 replies; 21+ messages in thread
From: Shourya Shukla @ 2020-07-02 19:24 UTC (permalink / raw)
  To: git
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800,
	sbeller, pclouds, Shourya Shukla, Christian Couder

The helper functions: show_submodule_summary(),
prepare_submodule_summary() and print_submodule_summary() are used by
the builtin_diff() function in diff.c to generate a summary of
submodules in the context of a diff. Functions with similar names are to
be introduced in the upcoming port of submodule's summary subcommand.

So, rename the helper functions to '*_diff_submodule_summary()' to avoid
ambiguity.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 diff.c      |  2 +-
 submodule.c | 10 +++++-----
 submodule.h |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index d24aaa3047..4a2c631c37 100644
--- a/diff.c
+++ b/diff.c
@@ -3429,7 +3429,7 @@ static void builtin_diff(const char *name_a,
 	if (o->submodule_format == DIFF_SUBMODULE_LOG &&
 	    (!one->mode || S_ISGITLINK(one->mode)) &&
 	    (!two->mode || S_ISGITLINK(two->mode))) {
-		show_submodule_summary(o, one->path ? one->path : two->path,
+		show_submodule_diff_summary(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule);
 		return;
diff --git a/submodule.c b/submodule.c
index e2ef5698c8..097902ee67 100644
--- a/submodule.c
+++ b/submodule.c
@@ -438,7 +438,7 @@ void handle_ignore_submodules_arg(struct diff_options *diffopt,
 	 */
 }
 
-static int prepare_submodule_summary(struct rev_info *rev, const char *path,
+static int prepare_submodule_diff_summary(struct rev_info *rev, const char *path,
 		struct commit *left, struct commit *right,
 		struct commit_list *merge_bases)
 {
@@ -459,7 +459,7 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 	return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct repository *r, struct rev_info *rev, struct diff_options *o)
+static void print_submodule_diff_summary(struct repository *r, struct rev_info *rev, struct diff_options *o)
 {
 	static const char format[] = "  %m %s";
 	struct strbuf sb = STRBUF_INIT;
@@ -610,7 +610,7 @@ static void show_submodule_header(struct diff_options *o,
 	strbuf_release(&sb);
 }
 
-void show_submodule_summary(struct diff_options *o, const char *path,
+void show_submodule_diff_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule)
 {
@@ -632,12 +632,12 @@ void show_submodule_summary(struct diff_options *o, const char *path,
 		goto out;
 
 	/* Treat revision walker failure the same as missing commits */
-	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
+	if (prepare_submodule_diff_summary(&rev, path, left, right, merge_bases)) {
 		diff_emit_submodule_error(o, "(revision walker failed)\n");
 		goto out;
 	}
 
-	print_submodule_summary(sub, &rev, o);
+	print_submodule_diff_summary(sub, &rev, o);
 
 out:
 	if (merge_bases)
diff --git a/submodule.h b/submodule.h
index 4dad649f94..22db9e1832 100644
--- a/submodule.h
+++ b/submodule.h
@@ -69,7 +69,7 @@ int parse_submodule_update_strategy(const char *value,
 				    struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
 void handle_ignore_submodules_arg(struct diff_options *, const char *);
-void show_submodule_summary(struct diff_options *o, const char *path,
+void show_submodule_diff_summary(struct diff_options *o, const char *path,
 			    struct object_id *one, struct object_id *two,
 			    unsigned dirty_submodule);
 void show_submodule_inline_diff(struct diff_options *o, const char *path,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] diff: change scope of the function count_lines()
  2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
  2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
  2020-07-02 19:24 ` [PATCH 2/4] submodule: rename helper functions to avoid ambiguity Shourya Shukla
@ 2020-07-02 19:24 ` Shourya Shukla
  2020-07-03 15:07   ` Johannes Schindelin
  2020-07-02 19:24 ` [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C Shourya Shukla
  3 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-02 19:24 UTC (permalink / raw)
  To: git
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800,
	sbeller, pclouds, Shourya Shukla

From: Prathamesh Chavan <pc44800@gmail.com>

Change the scope of function count_lines for allowing the function
to be reused in other parts of the code as well.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Stefan Beller <sbeller@google.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 diff.c | 2 +-
 diff.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 4a2c631c37..39ddbcf451 100644
--- a/diff.c
+++ b/diff.c
@@ -547,7 +547,7 @@ struct emit_callback {
 	struct strbuf *header;
 };
 
-static int count_lines(const char *data, int size)
+int count_lines(const char *data, int size)
 {
 	int count, ch, completely_empty = 1, nl_just_seen = 0;
 	count = 0;
diff --git a/diff.h b/diff.h
index 9443dc1b00..bd96c8c434 100644
--- a/diff.h
+++ b/diff.h
@@ -495,6 +495,7 @@ void free_diffstat_info(struct diffstat_t *diffstat);
 int parse_long_opt(const char *opt, const char **argv,
 		   const char **optarg);
 
+int count_lines(const char *data, int size);
 int git_diff_basic_config(const char *var, const char *value, void *cb);
 int git_diff_heuristic_config(const char *var, const char *value, void *cb);
 void init_diff_ui_defaults(void);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
                   ` (2 preceding siblings ...)
  2020-07-02 19:24 ` [PATCH 3/4] diff: change scope of the function count_lines() Shourya Shukla
@ 2020-07-02 19:24 ` Shourya Shukla
  2020-07-03 20:46   ` Johannes Schindelin
  3 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-02 19:24 UTC (permalink / raw)
  To: git
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800,
	sbeller, pclouds, Shourya Shukla

From: Prathamesh Chavan <pc44800@gmail.com>

The submodule subcommand 'summary' is ported in the process of
making git-submodule a builtin. The function cmd_summary() from
git-submodule.sh is ported to functions module_summary(),
compute_summary_module_list(), prepare_submodule_summary() and
generate_submodule_summary(), print_submodule_summary().

The first function module_summary() parses the options of submodule
subcommand and also acts as the front-end of this subcommand.
After parsing them, it calls the compute_summary_module_list()

The functions compute_summary_module_list() runs the diff_cmd,
and generates the modules list, as required by the subcommand.
The generation of this module list is done by the using the
callback function submodule_summary_callback(), and stored in the
structure module_cb.

Once the module list is generated, prepare_submodule_summary()
further goes through the list and filters the list, for
eventually calling the generate_submodule_summary() function.

The function generate_submodule_summary() takes care of generating
the summary for each submodule and then calls the function
print_submodule_summary() for printing it.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Stefan Beller <sbeller@google.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 builtin/submodule--helper.c | 451 ++++++++++++++++++++++++++++++++++++
 git-submodule.sh            | 186 +--------------
 2 files changed, 452 insertions(+), 185 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index eea3932c40..1dbdb934f1 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -927,6 +927,456 @@ static int module_name(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+struct module_cb {
+	unsigned int mod_src;
+	unsigned int mod_dst;
+	struct object_id oid_src;
+	struct object_id oid_dst;
+	char status;
+	const char *sm_path;
+};
+#define MODULE_CB_INIT { 0, 0, NULL, NULL, '\0', NULL }
+
+struct module_cb_list {
+	struct module_cb **entries;
+	int alloc, nr;
+};
+#define MODULE_CB_LIST_INIT { NULL, 0, 0 }
+
+struct summary_cb {
+	int argc;
+	const char **argv;
+	const char *prefix;
+	unsigned int cached: 1;
+	unsigned int for_status: 1;
+	unsigned int quiet: 1;
+	unsigned int files: 1;
+	int summary_limit;
+};
+#define SUMMARY_CB_INIT { 0, NULL, NULL, 0, 0, 0, 0, 0 }
+
+enum diff_cmd {
+	DIFF_INDEX,
+	DIFF_FILES
+};
+
+static int verify_submodule_object_name(const char *sm_path,
+					  const char *sha1)
+{
+	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
+
+	cp_rev_parse.git_cmd = 1;
+	cp_rev_parse.no_stdout = 1;
+	cp_rev_parse.dir = sm_path;
+	prepare_submodule_repo_env(&cp_rev_parse.env_array);
+	argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
+			 "--verify", NULL);
+	argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);
+
+	if (run_command(&cp_rev_parse))
+		return 1;
+
+	return 0;
+}
+
+static void print_submodule_summary(struct summary_cb *info, int errmsg,
+				      int total_commits, int missing_src,
+				      int missing_dst, const char *displaypath,
+				      int is_sm_git_dir, struct module_cb *p)
+{
+	if (p->status == 'T') {
+		if (S_ISGITLINK(p->mod_dst))
+			printf(_("* %s %s(blob)->%s(submodule)"),
+				 displaypath, find_unique_abbrev(&p->oid_src, 7),
+				 find_unique_abbrev(&p->oid_dst, 7));
+		else
+			printf(_("* %s %s(submodule)->%s(blob)"),
+				 displaypath, find_unique_abbrev(&p->oid_src, 7),
+				 find_unique_abbrev(&p->oid_dst, 7));
+	} else {
+		printf("* %s %s...%s",
+			displaypath, find_unique_abbrev(&p->oid_src, 7),
+			find_unique_abbrev(&p->oid_dst, 7));
+	}
+
+	if (total_commits < 0)
+		printf(":\n");
+	else
+		printf(" (%d):\n", total_commits);
+
+	if (errmsg) {
+		/*
+		 * Don't give error msg for modification whose dst is not
+		 * submodule, i.e. deleted or changed to blob
+		 */
+		if (S_ISGITLINK(p->mod_src)) {
+			if (missing_src && missing_dst) {
+				printf(_("  Warn: %s doesn't contain commits %s and %s\n"),
+				       displaypath, oid_to_hex(&p->oid_src),
+				       oid_to_hex(&p->oid_dst));
+			} else if (missing_src) {
+				printf(_("  Warn: %s doesn't contain commit %s\n"),
+				       displaypath, oid_to_hex(&p->oid_src));
+			} else {
+				printf(_("  Warn: %s doesn't contain commit %s\n"),
+				       displaypath, oid_to_hex(&p->oid_dst));
+			}
+		}
+	} else if (is_sm_git_dir) {
+		struct child_process cp_log = CHILD_PROCESS_INIT;
+
+		cp_log.git_cmd = 1;
+		cp_log.dir = p->sm_path;
+		prepare_submodule_repo_env(&cp_log.env_array);
+		argv_array_pushl(&cp_log.args, "log", NULL);
+
+		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
+			if (info->summary_limit > 0)
+				argv_array_pushf(&cp_log.args, "-%d",
+						 info->summary_limit);
+
+			argv_array_pushl(&cp_log.args, "--pretty=  %m %s",
+					 "--first-parent", NULL);
+			argv_array_pushf(&cp_log.args, "%s...%s",
+					 oid_to_hex(&p->oid_src),
+					 oid_to_hex(&p->oid_dst));
+		} else if (S_ISGITLINK(p->mod_dst)) {
+			argv_array_pushl(&cp_log.args, "--pretty=  > %s",
+					 "-1", oid_to_hex(&p->oid_dst), NULL);
+		} else {
+			argv_array_pushl(&cp_log.args, "--pretty=  < %s",
+					 "-1", oid_to_hex(&p->oid_src), NULL);
+		}
+		run_command(&cp_log);
+	}
+	printf("\n");
+}
+
+static void generate_submodule_summary(struct summary_cb *info,
+				       struct module_cb *p)
+{
+	int missing_src = 0;
+	int missing_dst = 0;
+	char *displaypath;
+	int errmsg = 0;
+	int total_commits = -1;
+	int is_sm_git_dir = 0;
+	struct strbuf sm_git_dir_sb = STRBUF_INIT;
+
+	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
+		if (S_ISGITLINK(p->mod_dst)) {
+			/*
+			 * NEEDSWORK: avoid using separate process with
+			 * the help of the function head_ref_submodule()
+			 */
+			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
+			struct strbuf sb_rev_parse = STRBUF_INIT;
+
+			cp_rev_parse.git_cmd = 1;
+			cp_rev_parse.no_stderr = 1;
+			cp_rev_parse.dir = p->sm_path;
+			prepare_submodule_repo_env(&cp_rev_parse.env_array);
+
+			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
+					 "HEAD", NULL);
+			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
+				strbuf_strip_suffix(&sb_rev_parse, "\n");
+				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
+			}
+			strbuf_release(&sb_rev_parse);
+		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
+			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
+			struct strbuf sb_hash_object = STRBUF_INIT;
+
+			cp_hash_object.git_cmd = 1;
+			argv_array_pushl(&cp_hash_object.args, "hash-object",
+					 p->sm_path, NULL);
+			if (!capture_command(&cp_hash_object,
+					     &sb_hash_object, 0)) {
+				strbuf_strip_suffix(&sb_hash_object, "\n");
+				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
+			}
+			strbuf_release(&sb_hash_object);
+		} else {
+			if (p->mod_dst)
+				die(_("unexpected mode %d\n"), p->mod_dst);
+		}
+	}
+
+	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
+	if (is_nonbare_repository_dir(&sm_git_dir_sb))
+		is_sm_git_dir = 1;
+
+	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
+		missing_src = verify_submodule_object_name(p->sm_path,
+							   oid_to_hex(&p->oid_src));
+
+	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
+		missing_dst = verify_submodule_object_name(p->sm_path,
+							   oid_to_hex(&p->oid_dst));
+
+	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
+
+	if (!missing_dst && !missing_src) {
+		if (is_sm_git_dir) {
+			struct child_process cp_rev_list = CHILD_PROCESS_INIT;
+			struct strbuf sb_rev_list = STRBUF_INIT;
+			char *range;
+
+			if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
+				range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
+						oid_to_hex(&p->oid_dst));
+			else if (S_ISGITLINK(p->mod_src))
+				range = xstrdup(oid_to_hex(&p->oid_src));
+			else
+				range = xstrdup(oid_to_hex(&p->oid_dst));
+
+			cp_rev_list.git_cmd = 1;
+			cp_rev_list.dir = p->sm_path;
+			prepare_submodule_repo_env(&cp_rev_list.env_array);
+
+			argv_array_pushl(&cp_rev_list.args, "rev-list",
+					 "--first-parent", range, "--", NULL);
+			if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
+				if (sb_rev_list.len)
+					total_commits = count_lines(sb_rev_list.buf,
+								    sb_rev_list.len);
+				else
+					total_commits = 0;
+			}
+
+			free(range);
+			strbuf_release(&sb_rev_list);
+		}
+	} else {
+		errmsg = 1;
+	}
+
+	print_submodule_summary(info, errmsg, total_commits,
+				missing_src, missing_dst,
+		      		displaypath, is_sm_git_dir, p);
+
+	free(displaypath);
+	strbuf_release(&sm_git_dir_sb);
+}
+
+static void prepare_submodule_summary(struct summary_cb *info,
+				      struct module_cb_list *list)
+{
+	int i;
+	for (i = 0; i < list->nr; i++) {
+		struct module_cb *p = list->entries[i];
+		struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
+
+		if (p->status == 'D' || p->status == 'T') {
+			generate_submodule_summary(info, p);
+			continue;
+		}
+
+		if (info->for_status) {
+			char *config_key;
+			const char *ignore_config = "none";
+			const char *value;
+			const struct submodule *sub = submodule_from_path(the_repository,
+									  &null_oid,
+									  p->sm_path);
+
+			if (sub && p->status != 'A') {
+				config_key = xstrfmt("submodule.%s.ignore",
+						     sub->name);
+				if (!git_config_get_string_const(config_key, &value))
+					ignore_config = value;
+				else if (sub->ignore)
+					ignore_config = sub->ignore;
+
+				free(config_key);
+				if (!strcmp(ignore_config, "all"))
+					continue;
+			}
+		}
+
+		/* Also show added or modified modules which are checked out */
+		cp_rev_parse.dir = p->sm_path;
+		cp_rev_parse.git_cmd = 1;
+		cp_rev_parse.no_stderr = 1;
+		cp_rev_parse.no_stdout = 1;
+
+		argv_array_pushl(&cp_rev_parse.args, "rev-parse",
+				 "--git-dir", NULL);
+
+		if (!run_command(&cp_rev_parse))
+			generate_submodule_summary(info, p);
+	}
+}
+
+static void submodule_summary_callback(struct diff_queue_struct *q,
+				       struct diff_options *options,
+				       void *data)
+{
+	int i;
+	struct module_cb_list *list = data;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		struct module_cb *temp;
+
+		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
+			continue;
+		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
+		temp->mod_src = p->one->mode;
+		temp->mod_dst = p->two->mode;
+		temp->oid_src = p->one->oid;
+		temp->oid_dst = p->two->oid;
+		temp->status = p->status;
+		temp->sm_path = xstrdup(p->one->path);
+
+		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
+		list->entries[list->nr++] = temp;
+	}
+}
+
+static const char *get_diff_cmd(enum diff_cmd diff_cmd)
+{
+	switch (diff_cmd) {
+	case DIFF_INDEX: return "diff-index";
+	case DIFF_FILES: return "diff-files";
+	default: BUG("bad diff_cmd value %d", diff_cmd);
+	}
+}
+
+static int compute_summary_module_list(char *head,
+				         struct summary_cb *info,
+				         enum diff_cmd diff_cmd)
+{
+	struct argv_array diff_args = ARGV_ARRAY_INIT;
+	struct rev_info rev;
+	struct module_cb_list list = MODULE_CB_LIST_INIT;
+
+	argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
+	if (info->cached)
+		argv_array_push(&diff_args, "--cached");
+	argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
+			 NULL);
+	if (head)
+		argv_array_push(&diff_args, head);
+	argv_array_push(&diff_args, "--");
+	if (info->argc)
+		argv_array_pushv(&diff_args, info->argv);
+
+	git_config(git_diff_basic_config, NULL);
+	init_revisions(&rev, info->prefix);
+	rev.abbrev = 0;
+	precompose_argv(diff_args.argc, diff_args.argv);
+
+	diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
+					 &rev, NULL);
+	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = submodule_summary_callback;
+	rev.diffopt.format_callback_data = &list;
+
+	if (!info->cached) {
+		if (diff_cmd ==  DIFF_INDEX)
+			setup_work_tree();
+		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
+			perror("read_cache_preload");
+			return -1;
+		}
+	} else if (read_cache() < 0) {
+		perror("read_cache");
+		return -1;
+	}
+
+	if (diff_cmd == DIFF_INDEX)
+		run_diff_index(&rev, info->cached);
+	else
+		run_diff_files(&rev, 0);
+	prepare_submodule_summary(info, &list);
+	return 0;
+}
+
+static int module_summary(int argc, const char **argv, const char *prefix)
+{
+	struct summary_cb info = SUMMARY_CB_INIT;
+	int cached = 0;
+	int for_status = 0;
+	int quiet = 0;
+	int files = 0;
+	int summary_limit = -1;
+	struct child_process cp_rev = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
+	enum diff_cmd diff_cmd = DIFF_INDEX;
+	int ret;
+
+	struct option module_summary_options[] = {
+		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
+		OPT_BOOL(0, "cached", &cached,
+			 N_("Use the commit stored in the index instead of the submodule HEAD")),
+		OPT_BOOL(0, "files", &files,
+			 N_("To compare the commit in the index with that in the submodule HEAD")),
+		OPT_BOOL(0, "for-status", &for_status,
+			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
+		OPT_INTEGER('n', "summary-limit", &summary_limit,
+			     N_("Limit the summary size")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_summary_options,
+			     git_submodule_helper_usage, 0);
+
+	if (!summary_limit)
+		return 0;
+
+	cp_rev.git_cmd = 1;
+	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
+			 argc ? argv[0] : "HEAD", NULL);
+
+	if (!capture_command(&cp_rev, &sb, 0)) {
+		strbuf_strip_suffix(&sb, "\n");
+		if (argc) {
+			argv++;
+			argc--;
+		}
+	} else if (!argc || !strcmp(argv[0], "HEAD")) {
+		/* before the first commit: compare with an empty tree */
+		struct stat st;
+		struct object_id oid;
+		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
+						  prefix, 3))
+			die("Unable to add %s to database", oid.hash);
+		strbuf_addstr(&sb, oid_to_hex(&oid));
+		if (argc) {
+			argv++;
+			argc--;
+		}
+	} else {
+		strbuf_addstr(&sb, "HEAD");
+	}
+
+	if (files) {
+		if (cached)
+			die(_("--cached and --files are mutually exclusive"));
+		diff_cmd = DIFF_FILES;
+	}
+
+	info.argc = argc;
+	info.argv = argv;
+	info.prefix = prefix;
+	info.cached = !!cached;
+	info.for_status = !!for_status;
+	info.quiet = quiet;
+	info.files = files;
+	info.summary_limit = summary_limit;
+
+	ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
+					   &info, diff_cmd);
+	strbuf_release(&sb);
+	return ret;
+}
+
 struct sync_cb {
 	const char *prefix;
 	unsigned int flags;
@@ -2341,6 +2791,7 @@ static struct cmd_struct commands[] = {
 	{"print-default-remote", print_default_remote, 0},
 	{"sync", module_sync, SUPPORT_SUPER_PREFIX},
 	{"deinit", module_deinit, 0},
+	{"summary", module_summary, SUPPORT_SUPER_PREFIX},
 	{"remote-branch", resolve_remote_submodule_branch, 0},
 	{"push-check", push_check, 0},
 	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
diff --git a/git-submodule.sh b/git-submodule.sh
index 43eb6051d2..899b8a409a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -59,31 +59,6 @@ die_if_unmatched ()
 	fi
 }
 
-#
-# Print a submodule configuration setting
-#
-# $1 = submodule name
-# $2 = option name
-# $3 = default value
-#
-# Checks in the usual git-config places first (for overrides),
-# otherwise it falls back on .gitmodules.  This allows you to
-# distribute project-wide defaults in .gitmodules, while still
-# customizing individual repositories if necessary.  If the option is
-# not in .gitmodules either, print a default value.
-#
-get_submodule_config () {
-	name="$1"
-	option="$2"
-	default="$3"
-	value=$(git config submodule."$name"."$option")
-	if test -z "$value"
-	then
-		value=$(git submodule--helper config submodule."$name"."$option")
-	fi
-	printf '%s' "${value:-$default}"
-}
-
 isnumber()
 {
 	n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
@@ -831,166 +806,7 @@ cmd_summary() {
 		shift
 	done
 
-	test $summary_limit = 0 && return
-
-	if rev=$(git rev-parse -q --verify --default HEAD ${1+"$1"})
-	then
-		head=$rev
-		test $# = 0 || shift
-	elif test -z "$1" || test "$1" = "HEAD"
-	then
-		# before the first commit: compare with an empty tree
-		head=$(git hash-object -w -t tree --stdin </dev/null)
-		test -z "$1" || shift
-	else
-		head="HEAD"
-	fi
-
-	if [ -n "$files" ]
-	then
-		test -n "$cached" &&
-		die "$(gettext "The --cached option cannot be used with the --files option")"
-		diff_cmd=diff-files
-		head=
-	fi
-
-	cd_to_toplevel
-	eval "set $(git rev-parse --sq --prefix "$wt_prefix" -- "$@")"
-	# Get modified modules cared by user
-	modules=$(git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- "$@" |
-		sane_egrep '^:([0-7]* )?160000' |
-		while read -r mod_src mod_dst sha1_src sha1_dst status sm_path
-		do
-			# Always show modules deleted or type-changed (blob<->module)
-			if test "$status" = D || test "$status" = T
-			then
-				printf '%s\n' "$sm_path"
-				continue
-			fi
-			# Respect the ignore setting for --for-status.
-			if test -n "$for_status"
-			then
-				name=$(git submodule--helper name "$sm_path")
-				ignore_config=$(get_submodule_config "$name" ignore none)
-				test $status != A && test $ignore_config = all && continue
-			fi
-			# Also show added or modified modules which are checked out
-			GIT_DIR="$sm_path/.git" git rev-parse --git-dir >/dev/null 2>&1 &&
-			printf '%s\n' "$sm_path"
-		done
-	)
-
-	test -z "$modules" && return
-
-	git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- $modules |
-	sane_egrep '^:([0-7]* )?160000' |
-	cut -c2- |
-	while read -r mod_src mod_dst sha1_src sha1_dst status name
-	do
-		if test -z "$cached" &&
-			is_zero_oid $sha1_dst
-		then
-			case "$mod_dst" in
-			160000)
-				sha1_dst=$(GIT_DIR="$name/.git" git rev-parse HEAD)
-				;;
-			100644 | 100755 | 120000)
-				sha1_dst=$(git hash-object $name)
-				;;
-			000000)
-				;; # removed
-			*)
-				# unexpected type
-				eval_gettextln "unexpected mode \$mod_dst" >&2
-				continue ;;
-			esac
-		fi
-		missing_src=
-		missing_dst=
-
-		test $mod_src = 160000 &&
-		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
-		missing_src=t
-
-		test $mod_dst = 160000 &&
-		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
-		missing_dst=t
-
-		display_name=$(git submodule--helper relative-path "$name" "$wt_prefix")
-
-		total_commits=
-		case "$missing_src,$missing_dst" in
-		t,)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
-			;;
-		,t)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
-			;;
-		t,t)
-			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$sha1_dst")"
-			;;
-		*)
-			errmsg=
-			total_commits=$(
-			if test $mod_src = 160000 && test $mod_dst = 160000
-			then
-				range="$sha1_src...$sha1_dst"
-			elif test $mod_src = 160000
-			then
-				range=$sha1_src
-			else
-				range=$sha1_dst
-			fi
-			GIT_DIR="$name/.git" \
-			git rev-list --first-parent $range -- | wc -l
-			)
-			total_commits=" ($(($total_commits + 0)))"
-			;;
-		esac
-
-		sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
-			echo $sha1_src | cut -c1-7)
-		sha1_abbr_dst=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_dst 2>/dev/null ||
-			echo $sha1_dst | cut -c1-7)
-
-		if test $status = T
-		then
-			blob="$(gettext "blob")"
-			submodule="$(gettext "submodule")"
-			if test $mod_dst = 160000
-			then
-				echo "* $display_name $sha1_abbr_src($blob)->$sha1_abbr_dst($submodule)$total_commits:"
-			else
-				echo "* $display_name $sha1_abbr_src($submodule)->$sha1_abbr_dst($blob)$total_commits:"
-			fi
-		else
-			echo "* $display_name $sha1_abbr_src...$sha1_abbr_dst$total_commits:"
-		fi
-		if test -n "$errmsg"
-		then
-			# Don't give error msg for modification whose dst is not submodule
-			# i.e. deleted or changed to blob
-			test $mod_dst = 160000 && echo "$errmsg"
-		else
-			if test $mod_src = 160000 && test $mod_dst = 160000
-			then
-				limit=
-				test $summary_limit -gt 0 && limit="-$summary_limit"
-				GIT_DIR="$name/.git" \
-				git log $limit --pretty='format:  %m %s' \
-				--first-parent $sha1_src...$sha1_dst
-			elif test $mod_dst = 160000
-			then
-				GIT_DIR="$name/.git" \
-				git log --pretty='format:  > %s' -1 $sha1_dst
-			else
-				GIT_DIR="$name/.git" \
-				git log --pretty='format:  < %s' -1 $sha1_src
-			fi
-			echo
-		fi
-		echo
-	done
+	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper summary ${GIT_QUIET:+--quiet} ${prefix:+--prefix "$prefix"} ${for_status:+--for-status} ${files:+--files} ${cached:+--cached} ${summary_limit:+-n $summary_limit} "$@"
 }
 #
 # List all submodules, prefixed with:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
@ 2020-07-03 14:57   ` Johannes Schindelin
  2020-07-03 15:37     ` Philip Oakley
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2020-07-03 14:57 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	pc44800, sbeller, pclouds, Christian Couder

Hi Shourya,

"amend" sounds a little awkward to describe what the patch does. Maybe
"submodule: remove extra empty lines"?

On Fri, 3 Jul 2020, Shourya Shukla wrote:

> All subcommands of 'git submodule' using a callback mechanism had
> absence of an extra linefeed between their callback structs and
> macros. Subcommands 'init', 'status' and 'sync' did not follow suit.
> Amend the extra line feed.

Maybe a native reader can suggest something that flows a bit easier? I am
not a native English speaker, but I'd prefer something along those lines:

	Many `submodule--helper` subcommands follow the convention a
	struct defines their callback data, and the declaration of said
	struct is followed immediately by a macro to use in static
	initializers, without any separating empty line.

	Let's align the `init`, `status` and `sync` subcommands with that
	convention.

The patch obviously does what the commit message promises.

Ciao,
Dscho

> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  builtin/submodule--helper.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 59c1e1217c..eea3932c40 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -612,7 +612,6 @@ struct init_cb {
>  	const char *prefix;
>  	unsigned int flags;
>  };
> -
>  #define INIT_CB_INIT { NULL, 0 }
>
>  static void init_submodule(const char *path, const char *prefix,
> @@ -742,7 +741,6 @@ struct status_cb {
>  	const char *prefix;
>  	unsigned int flags;
>  };
> -
>  #define STATUS_CB_INIT { NULL, 0 }
>
>  static void print_status(unsigned int flags, char state, const char *path,
> @@ -933,7 +931,6 @@ struct sync_cb {
>  	const char *prefix;
>  	unsigned int flags;
>  };
> -
>  #define SYNC_CB_INIT { NULL, 0 }
>
>  static void sync_submodule(const char *path, const char *prefix,
> --
> 2.27.0
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] diff: change scope of the function count_lines()
  2020-07-02 19:24 ` [PATCH 3/4] diff: change scope of the function count_lines() Shourya Shukla
@ 2020-07-03 15:07   ` Johannes Schindelin
  2020-07-04 15:28     ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2020-07-03 15:07 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	pc44800, sbeller, pclouds

Hi Shourya,

On Fri, 3 Jul 2020, Shourya Shukla wrote:

> From: Prathamesh Chavan <pc44800@gmail.com>
>
> Change the scope of function count_lines for allowing the function
> to be reused in other parts of the code as well.

It may be just me, but I'd rather see the word "visibility" instead of
"scope" here. I mistook the subject line to indicate that the function is
changed to serve an (at least slightly) different purpose than before,
which is not actually the case.

Another alternative to "visibility" might be to imitate existing commit
messages, such as e4cb659ebdd (diff: export diffstat interface,
2019-11-13), 22184497a36 (factor out refresh_and_write_cache function,
2019-09-11) or ef283b3699f (apply: make parse_git_diff_header public,
2019-07-11).

In addition, as with all such changes, we need to be careful to consider
whether unrelated function names coming in from system headers might
clash. In this case, I think `count_lines()` is a bit too generic, but
will probably not clash. Personally, I would probably have opted for
`count_lines_in_string()`.

Ciao,
Johannes

>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Stefan Beller <sbeller@google.com>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  diff.c | 2 +-
>  diff.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/diff.c b/diff.c
> index 4a2c631c37..39ddbcf451 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -547,7 +547,7 @@ struct emit_callback {
>  	struct strbuf *header;
>  };
>
> -static int count_lines(const char *data, int size)
> +int count_lines(const char *data, int size)
>  {
>  	int count, ch, completely_empty = 1, nl_just_seen = 0;
>  	count = 0;
> diff --git a/diff.h b/diff.h
> index 9443dc1b00..bd96c8c434 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -495,6 +495,7 @@ void free_diffstat_info(struct diffstat_t *diffstat);
>  int parse_long_opt(const char *opt, const char **argv,
>  		   const char **optarg);
>
> +int count_lines(const char *data, int size);
>  int git_diff_basic_config(const char *var, const char *value, void *cb);
>  int git_diff_heuristic_config(const char *var, const char *value, void *cb);
>  void init_diff_ui_defaults(void);
> --
> 2.27.0
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-03 14:57   ` Johannes Schindelin
@ 2020-07-03 15:37     ` Philip Oakley
  2020-07-04 15:21       ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Philip Oakley @ 2020-07-03 15:37 UTC (permalink / raw)
  To: Johannes Schindelin, Shourya Shukla
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	pc44800, sbeller, pclouds, Christian Couder


Suggestion...

On 03/07/2020 15:57, Johannes Schindelin wrote:
> Maybe a native reader can suggest something that flows a bit easier? I am
> not a native English speaker, but I'd prefer something along those lines:
>
> 	Many `submodule--helper` subcommands follow the convention a
s/convention a/convention that a/    feels nicer for me
> 	struct defines their callback data, and the declaration of said
s/said/that/   maybe.  'said' is a bit too much like 'patent speak', but
otherwise either word is OK.
> 	struct is followed immediately by a macro to use in static
> 	initializers, without any separating empty line.
>
> 	Let's align the `init`, `status` and `sync` subcommands with that
> 	convention.
Nice summary.

Philip
Native Yorkshire speaker, but not that 'literate' ;-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-02 19:24 ` [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C Shourya Shukla
@ 2020-07-03 20:46   ` Johannes Schindelin
  2020-07-05 17:34     ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2020-07-03 20:46 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	pc44800, stefanbeller, pclouds

Hi Shourya,

[exchanging Stefan Beller's dysfunct @google address for their private
one; I encourage you to do the same in the next iteration, probably
by editing the `Mentored-by:` line.]

On Fri, 3 Jul 2020, Shourya Shukla wrote:

> From: Prathamesh Chavan <pc44800@gmail.com>
>
> The submodule subcommand 'summary' is ported in the process of
> making git-submodule a builtin. The function cmd_summary() from
> git-submodule.sh is ported to functions module_summary(),
> compute_summary_module_list(), prepare_submodule_summary() and
> generate_submodule_summary(), print_submodule_summary().
>
> The first function module_summary() parses the options of submodule
> subcommand and also acts as the front-end of this subcommand.
> After parsing them, it calls the compute_summary_module_list()

Missing full-stop, and probably the sentence also wanted to say "function"
at the end.

> The functions compute_summary_module_list() runs the diff_cmd,
> and generates the modules list, as required by the subcommand.
> The generation of this module list is done by the using the

s/the using/using/

> callback function submodule_summary_callback(), and stored in the
> structure module_cb.

This explains nicely what the patch does. But the commit message should
not really repeat what can be readily deduced from the patch; It should
focus on the motivation and on interesting background information that is
_not_ readily deduced from the patch.

For example, I see that `$diff_cmd` is called twice in the shell script
version, once to "get modified modules cared by user" and then _again_,
with that list of modified modules.

I would have liked to see a reasoning in the commit message that explains
why this has to be so in the C version. I get why it is complicated in a
shell script (which lacks proper objects, after all), but I would have
expected the C version to be able to accumulate the information with a
single pass.

(Before writing the following paragraph, I actually reviewed the patch
from bottom to top, in the caller->callee direction.)

Ah. I see that this indeed is the case: there is only one pass in the C
version. That's a useful piece of metadata for the commit message, I
think, much more useful than describing the call tree of the functions.

Another thing worth mentioning in the commit message is that we use the
combination of setting a child_process' working directory to the submodule
path and then calling `prepare_submodule_repo_env()` which also sets
`GIT_DIR` to `.git`, so that we can be certain that those spawned
processes will not access the superproject's ODB by mistake.

When reading my suggestions, please keep in mind that I reviewed the
functions in caller->callee order, i.e. I started at the end of the patch
and then worked my way up.

All in all, I like the function structure, but I think there is still a
bit room for improvement in a v2.

> Once the module list is generated, prepare_submodule_summary()
> further goes through the list and filters the list, for
> eventually calling the generate_submodule_summary() function.
>
> The function generate_submodule_summary() takes care of generating
> the summary for each submodule and then calls the function
> print_submodule_summary() for printing it.
>
> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Stefan Beller <sbeller@google.com>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  builtin/submodule--helper.c | 451 ++++++++++++++++++++++++++++++++++++
>  git-submodule.sh            | 186 +--------------
>  2 files changed, 452 insertions(+), 185 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index eea3932c40..1dbdb934f1 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -927,6 +927,456 @@ static int module_name(int argc, const char **argv, const char *prefix)
>  	return 0;
>  }
>
> +struct module_cb {
> +	unsigned int mod_src;
> +	unsigned int mod_dst;
> +	struct object_id oid_src;
> +	struct object_id oid_dst;
> +	char status;
> +	const char *sm_path;
> +};
> +#define MODULE_CB_INIT { 0, 0, NULL, NULL, '\0', NULL }
> +
> +struct module_cb_list {
> +	struct module_cb **entries;
> +	int alloc, nr;
> +};
> +#define MODULE_CB_LIST_INIT { NULL, 0, 0 }
> +
> +struct summary_cb {
> +	int argc;
> +	const char **argv;
> +	const char *prefix;
> +	unsigned int cached: 1;
> +	unsigned int for_status: 1;
> +	unsigned int quiet: 1;
> +	unsigned int files: 1;
> +	int summary_limit;
> +};
> +#define SUMMARY_CB_INIT { 0, NULL, NULL, 0, 0, 0, 0, 0 }
> +
> +enum diff_cmd {
> +	DIFF_INDEX,
> +	DIFF_FILES
> +};
> +
> +static int verify_submodule_object_name(const char *sm_path,
> +					  const char *sha1)
> +{
> +	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> +
> +	cp_rev_parse.git_cmd = 1;
> +	cp_rev_parse.no_stdout = 1;
> +	cp_rev_parse.dir = sm_path;

So here we specify `sm_path` as current working directory.

> +	prepare_submodule_repo_env(&cp_rev_parse.env_array);

And this implicitly sets `GIT_DIR=.git`. Good.

> +	argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
> +			 "--verify", NULL);
> +	argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);

After this, we should also append `--` to make sure that we're not parsing
this as a file name.

Two comments about naming: `sha1` is pretty misleading here, as we do not
require it to be a SHA-1 (especially in the future in which we switch to
SHA-256). Besides, what we're really asking for (via that `^0` suffix) is
a committish. Therefore, I would propose to use `committish` both in the
parameter name as well as the function name.

> +
> +	if (run_command(&cp_rev_parse))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +static void print_submodule_summary(struct summary_cb *info, int errmsg,
> +				      int total_commits, int missing_src,
> +				      int missing_dst, const char *displaypath,
> +				      int is_sm_git_dir, struct module_cb *p)
> +{
> +	if (p->status == 'T') {
> +		if (S_ISGITLINK(p->mod_dst))
> +			printf(_("* %s %s(blob)->%s(submodule)"),
> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),

The shell script version does this:

                sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
                        echo $sha1_src | cut -c1-7)

That is not quite the same, as it looks for the abbreviation _in the
submodule_, not in the current project. So I think `find_unique_abbrev()`
is not correct here.

The funny thing is that we _already_ will have called `git rev-parse
--verify` for both `p->oid_src` and `p->oid_dst` in the submodule, in the
caller of this function! And while we throw away the result, and while we
do not pass `--short`, there is no reason why we shouldn't be able to do
precisely that.

> +				 find_unique_abbrev(&p->oid_dst, 7));
> +		else
> +			printf(_("* %s %s(submodule)->%s(blob)"),
> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
> +				 find_unique_abbrev(&p->oid_dst, 7));
> +	} else {
> +		printf("* %s %s...%s",
> +			displaypath, find_unique_abbrev(&p->oid_src, 7),
> +			find_unique_abbrev(&p->oid_dst, 7));
> +	}
> +
> +	if (total_commits < 0)
> +		printf(":\n");
> +	else
> +		printf(" (%d):\n", total_commits);
> +
> +	if (errmsg) {
> +		/*
> +		 * Don't give error msg for modification whose dst is not
> +		 * submodule, i.e. deleted or changed to blob
> +		 */
> +		if (S_ISGITLINK(p->mod_src)) {
> +			if (missing_src && missing_dst) {
> +				printf(_("  Warn: %s doesn't contain commits %s and %s\n"),
> +				       displaypath, oid_to_hex(&p->oid_src),
> +				       oid_to_hex(&p->oid_dst));
> +			} else if (missing_src) {
> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> +				       displaypath, oid_to_hex(&p->oid_src));
> +			} else {
> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> +				       displaypath, oid_to_hex(&p->oid_dst));
> +			}
> +		}
> +	} else if (is_sm_git_dir) {
> +		struct child_process cp_log = CHILD_PROCESS_INIT;
> +
> +		cp_log.git_cmd = 1;
> +		cp_log.dir = p->sm_path;
> +		prepare_submodule_repo_env(&cp_log.env_array);

Since the working directory is set to the top-level directory of the
submodule, and since `prepare_submodule_repo_env()` sets `GIT_DIR` to
`.git`, I think that the `is_sm_git_dir` condition is unnecessary. In
fact, the entire `is_sm_git_dir` parameter (and local variable in the
caller, see more on that below) can go away.

> +		argv_array_pushl(&cp_log.args, "log", NULL);
> +
> +		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
> +			if (info->summary_limit > 0)
> +				argv_array_pushf(&cp_log.args, "-%d",
> +						 info->summary_limit);
> +
> +			argv_array_pushl(&cp_log.args, "--pretty=  %m %s",
> +					 "--first-parent", NULL);
> +			argv_array_pushf(&cp_log.args, "%s...%s",
> +					 oid_to_hex(&p->oid_src),
> +					 oid_to_hex(&p->oid_dst));
> +		} else if (S_ISGITLINK(p->mod_dst)) {
> +			argv_array_pushl(&cp_log.args, "--pretty=  > %s",
> +					 "-1", oid_to_hex(&p->oid_dst), NULL);
> +		} else {
> +			argv_array_pushl(&cp_log.args, "--pretty=  < %s",
> +					 "-1", oid_to_hex(&p->oid_src), NULL);
> +		}
> +		run_command(&cp_log);
> +	}
> +	printf("\n");
> +}

It looks as if there is a whole lot of `oid_to_hex(&p->oid_src)` in that
function. Together with the realization that we need the abbreviated
version of that at least in one place, and the other realization that we
already call `rev-parse --verify` for both `oid_src` and `oid_dst` in the
caller of this function, it seems to suggest itself that we would actually
want to pass the `--short` option, too, and to capture the output, and
pass it down to `print_submodule_summary()` _instead of_ `missing_src` and
`missing_dst` (e.g. as `src_abbrev` and `dst_abbrev`).
> +
> +static void generate_submodule_summary(struct summary_cb *info,
> +				       struct module_cb *p)
> +{
> +	int missing_src = 0;
> +	int missing_dst = 0;
> +	char *displaypath;
> +	int errmsg = 0;
> +	int total_commits = -1;
> +	int is_sm_git_dir = 0;
> +	struct strbuf sm_git_dir_sb = STRBUF_INIT;
> +
> +	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
> +		if (S_ISGITLINK(p->mod_dst)) {
> +			/*
> +			 * NEEDSWORK: avoid using separate process with
> +			 * the help of the function head_ref_submodule()

I don't quite understand this comment. There is no `head_ref_submodule()`
function.

> +			 */
> +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> +			struct strbuf sb_rev_parse = STRBUF_INIT;
> +
> +			cp_rev_parse.git_cmd = 1;
> +			cp_rev_parse.no_stderr = 1;
> +			cp_rev_parse.dir = p->sm_path;
> +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
> +
> +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> +					 "HEAD", NULL);
> +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> +				strbuf_strip_suffix(&sb_rev_parse, "\n");
> +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> +			}
> +			strbuf_release(&sb_rev_parse);
> +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> +			struct strbuf sb_hash_object = STRBUF_INIT;
> +
> +			cp_hash_object.git_cmd = 1;
> +			argv_array_pushl(&cp_hash_object.args, "hash-object",
> +					 p->sm_path, NULL);
> +			if (!capture_command(&cp_hash_object,
> +					     &sb_hash_object, 0)) {
> +				strbuf_strip_suffix(&sb_hash_object, "\n");
> +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> +			}
> +			strbuf_release(&sb_hash_object);

It would probably be shorter, less error-prone, and quicker to use
`index_fd()` directly.

BTW I am not quite sure that this code does the correct thing in case of a
symlink: it hashes the contents of the symlink target (if it is a file,
otherwise it errors out). But that is hardly an issue introduced by the
conversion, that's just copied from `git-submodule.sh`.

> +		} else {
> +			if (p->mod_dst)
> +				die(_("unexpected mode %d\n"), p->mod_dst);

Hmm. This does not match what the shell script version does:

                        *)
                                # unexpected type
                                eval_gettextln "unexpected mode \$mod_dst" >&2
                                continue ;;

I think we should also just write the message to `stderr` and continue,
not `die()`.

In addition to that, I am missing the C code for this case:

                        000000)
                                ;; # removed

It is quite possible that our test suite does not cover this case (or did
the test suite fail for you?). If that is indeed the case, it would be
really good to add a test case as part of this patch series, to gain
confidence in the correctness of the conversion.

> +		}
> +	}
> +
> +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);

I have to admit that I am not loving the name `sm_git_dir_sb`. Why not
`submodule_git_dir`? I guess you copied it from elsewhere in
`submodule--helper.c`...

> +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
> +		is_sm_git_dir = 1;

So here, we verify whether there is a repository at `p->sm_path`. I don't
see that in the shell script version:

                missing_src=
                missing_dst=

                test $mod_src = 160000 &&
                ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
                missing_src=t

                test $mod_dst = 160000 &&
                ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
                missing_dst=t

Let's read a bit further.

> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> +		missing_src = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_src));

Ah, and `verify_submodule_object_name()` uses `p->sm_path` as working
directory. But that's not what the shell script version did: it specified
the `GIT_DIR` explicitly.

And by using the `prepare_submodule_repo_env()` function in
`verify_submodule_object_name()`, we specify `GIT_DIR` implicitly, as I
pointed out in my comment on that function.

So I think that `is_sm_git_dir` might be

> +
> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> +		missing_dst = verify_submodule_object_name(p->sm_path,
> +							   oid_to_hex(&p->oid_dst));
> +
> +	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
> +
> +	if (!missing_dst && !missing_src) {
> +		if (is_sm_git_dir) {
> +			struct child_process cp_rev_list = CHILD_PROCESS_INIT;
> +			struct strbuf sb_rev_list = STRBUF_INIT;
> +			char *range;
> +
> +			if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
> +				range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
> +						oid_to_hex(&p->oid_dst));
> +			else if (S_ISGITLINK(p->mod_src))
> +				range = xstrdup(oid_to_hex(&p->oid_src));
> +			else
> +				range = xstrdup(oid_to_hex(&p->oid_dst));
> +
> +			cp_rev_list.git_cmd = 1;
> +			cp_rev_list.dir = p->sm_path;
> +			prepare_submodule_repo_env(&cp_rev_list.env_array);

Again, due to setting the working directory to `p->sm_path` and
(implicitly, via `prepare_submodule_repo_env()`) `GIT_DIR` to `.git`, I do
not think that we have to guard this block beind `is_sm_git_dir`.

> +
> +			argv_array_pushl(&cp_rev_list.args, "rev-list",
> +					 "--first-parent", range, "--", NULL);

Since `argv_array_push()` duplicates the strings, anyway, we can totally
avoid the need for the `range` variable:

			if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
				argv_array_pushf(&cp_rev_list.args, "%s...%s",
						 oid_to_hex(&p->oid_src),
						 oid_to_hex(&p->oid_dst));
			else
				argv_array_push(&cp_rev_list.args, IS_GITLINK(p->mod_src) ?
						oid_to_hex(&p->oid_src) :
						oid_to_hex(&p->oid_dst));

> +			if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
> +				if (sb_rev_list.len)
> +					total_commits = count_lines(sb_rev_list.buf,
> +								    sb_rev_list.len);

That's actually not necessary. `git rev-list --count` will give you a nice
number, no need to capture a potentially large amount of memory only to
count the lines.

This may also make the patch obsolete that makes `count_lines()` public.

> +				else
> +					total_commits = 0;
> +			}

> +
> +			free(range);
> +			strbuf_release(&sb_rev_list);
> +		}
> +	} else {
> +		errmsg = 1;
> +	}

I am missing the equivalent for these lines here:

                t,)
                        errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
                        ;;
                ,t)
                        errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
                        ;;
                t,t)
                        errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$ ha1_dst")"
                        ;;

I am not quite sure whether it is a good idea to leave it to the
`print_submodule_summary()` function to generate the `errmsg`. I think I'd
rather have it a `char *` than an `int`.

> +
> +	print_submodule_summary(info, errmsg, total_commits,
> +				missing_src, missing_dst,
> +		      		displaypath, is_sm_git_dir, p);
> +
> +	free(displaypath);
> +	strbuf_release(&sm_git_dir_sb);
> +}
> +
> +static void prepare_submodule_summary(struct summary_cb *info,
> +				      struct module_cb_list *list)
> +{
> +	int i;
> +	for (i = 0; i < list->nr; i++) {
> +		struct module_cb *p = list->entries[i];
> +		struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> +
> +		if (p->status == 'D' || p->status == 'T') {
> +			generate_submodule_summary(info, p);
> +			continue;
> +		}
> +
> +		if (info->for_status) {
> +			char *config_key;

Since the `config_key` is only used within the `if()` block it would be
better to declare it within that block.

> +			const char *ignore_config = "none";

Since the only value we ever care about is "all", how about turning this
into an `int`, setting it to `0` here, and later assigning it to
`!strcmp(value, "all")` and `!strcmp(sub->ignore, "all")`, respectively?

I mean, I get it. Unix shell scripts are all about passing around text.
And it is much easier to just translate that to C faithfully. But that
does not make it good C code. C has data types, and proper C code makes
use of that.

> +			const char *value;

If you want to save on lines, you can cuddle this together with other
declarations of the same type. Even so, it could be scoped more narrowly.

> +			const struct submodule *sub = submodule_from_path(the_repository,
> +									  &null_oid,
> +									  p->sm_path);
> +
> +			if (sub && p->status != 'A') {

Good. The shell script version _always_ retrieved the `.ignore` config
value, even if the `status` is `A`. Your version is much better.

But why bother calling `submodule_from_path()` if the status is `A`?

I could actually see the `const struct submodule *sub;` declaration be
pulled out of this scope, and combining the `if (info->for_status &&
p->status != 'A'), and the moving the assignment of `sub` into the `else
if ((sub = submodule_from_path(r, &null_oid, p->sm_path)) &&
sub->ignore)`.

That would save us one entire indentation level.

> +				config_key = xstrfmt("submodule.%s.ignore",
> +						     sub->name);
> +				if (!git_config_get_string_const(config_key, &value))
> +					ignore_config = value;
> +				else if (sub->ignore)
> +					ignore_config = sub->ignore;
> +
> +				free(config_key);
> +				if (!strcmp(ignore_config, "all"))
> +					continue;
> +			}
> +		}
> +
> +		/* Also show added or modified modules which are checked out */
> +		cp_rev_parse.dir = p->sm_path;
> +		cp_rev_parse.git_cmd = 1;
> +		cp_rev_parse.no_stderr = 1;
> +		cp_rev_parse.no_stdout = 1;
> +
> +		argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> +				 "--git-dir", NULL);
> +
> +		if (!run_command(&cp_rev_parse))

I wonder whether we really need to waste an entire spawned process on
figuring out whether `p->sm_path` refers to an active repository. Wouldn't
`is_submodule_active(r, p->sm_path)` fulfill the same purpose?

> +			generate_submodule_summary(info, p);
> +	}
> +}
> +
> +static void submodule_summary_callback(struct diff_queue_struct *q,
> +				       struct diff_options *options,
> +				       void *data)
> +{
> +	int i;
> +	struct module_cb_list *list = data;
> +	for (i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p = q->queue[i];
> +		struct module_cb *temp;
> +
> +		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
> +			continue;
> +		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
> +		temp->mod_src = p->one->mode;
> +		temp->mod_dst = p->two->mode;
> +		temp->oid_src = p->one->oid;
> +		temp->oid_dst = p->two->oid;
> +		temp->status = p->status;
> +		temp->sm_path = xstrdup(p->one->path);
> +
> +		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
> +		list->entries[list->nr++] = temp;
> +	}
> +}
> +
> +static const char *get_diff_cmd(enum diff_cmd diff_cmd)
> +{
> +	switch (diff_cmd) {
> +	case DIFF_INDEX: return "diff-index";
> +	case DIFF_FILES: return "diff-files";
> +	default: BUG("bad diff_cmd value %d", diff_cmd);
> +	}
> +}
> +
> +static int compute_summary_module_list(char *head,
> +				         struct summary_cb *info,
> +				         enum diff_cmd diff_cmd)
> +{
> +	struct argv_array diff_args = ARGV_ARRAY_INIT;
> +	struct rev_info rev;
> +	struct module_cb_list list = MODULE_CB_LIST_INIT;
> +
> +	argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
> +	if (info->cached)
> +		argv_array_push(&diff_args, "--cached");
> +	argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
> +			 NULL);
> +	if (head)
> +		argv_array_push(&diff_args, head);
> +	argv_array_push(&diff_args, "--");
> +	if (info->argc)
> +		argv_array_pushv(&diff_args, info->argv);
> +
> +	git_config(git_diff_basic_config, NULL);
> +	init_revisions(&rev, info->prefix);
> +	rev.abbrev = 0;
> +	precompose_argv(diff_args.argc, diff_args.argv);
> +
> +	diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
> +					 &rev, NULL);
> +	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
> +	rev.diffopt.format_callback = submodule_summary_callback;
> +	rev.diffopt.format_callback_data = &list;
> +
> +	if (!info->cached) {
> +		if (diff_cmd ==  DIFF_INDEX)

Please substitute the double-space by a single one.

> +			setup_work_tree();
> +		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
> +			perror("read_cache_preload");
> +			return -1;
> +		}
> +	} else if (read_cache() < 0) {
> +		perror("read_cache");
> +		return -1;
> +	}
> +
> +	if (diff_cmd == DIFF_INDEX)
> +		run_diff_index(&rev, info->cached);
> +	else
> +		run_diff_files(&rev, 0);
> +	prepare_submodule_summary(info, &list);
> +	return 0;
> +}
> +
> +static int module_summary(int argc, const char **argv, const char *prefix)
> +{
> +	struct summary_cb info = SUMMARY_CB_INIT;
> +	int cached = 0;
> +	int for_status = 0;
> +	int quiet = 0;
> +	int files = 0;
> +	int summary_limit = -1;
> +	struct child_process cp_rev = CHILD_PROCESS_INIT;
> +	struct strbuf sb = STRBUF_INIT;
> +	enum diff_cmd diff_cmd = DIFF_INDEX;
> +	int ret;
> +
> +	struct option module_summary_options[] = {
> +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> +		OPT_BOOL(0, "cached", &cached,
> +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
> +		OPT_BOOL(0, "files", &files,
> +			 N_("To compare the commit in the index with that in the submodule HEAD")),
> +		OPT_BOOL(0, "for-status", &for_status,
> +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
> +		OPT_INTEGER('n', "summary-limit", &summary_limit,
> +			     N_("Limit the summary size")),
> +		OPT_END()
> +	};
> +
> +	const char *const git_submodule_helper_usage[] = {
> +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> +		NULL
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, module_summary_options,
> +			     git_submodule_helper_usage, 0);
> +
> +	if (!summary_limit)
> +		return 0;
> +
> +	cp_rev.git_cmd = 1;
> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> +			 argc ? argv[0] : "HEAD", NULL);

Oy. Why not simply call `get_oid()`? No need to spawn a new process.

> +
> +	if (!capture_command(&cp_rev, &sb, 0)) {
> +		strbuf_strip_suffix(&sb, "\n");
> +		if (argc) {
> +			argv++;
> +			argc--;
> +		}
> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> +		/* before the first commit: compare with an empty tree */
> +		struct stat st;
> +		struct object_id oid;
> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> +						  prefix, 3))
> +			die("Unable to add %s to database", oid.hash);

Umm. The original reads:

                # before the first commit: compare with an empty tree
                head=$(git hash-object -w -t tree --stdin </dev/null)

It does not actually read from `stdin`. It reads from `/dev/null`,
redirected to the input. And what it _actually_ does is to generate the
OID of the empty tree.

But we already _have_ the OID of the empty tree! It's
`the_hash_algo->empty_tree`.

I hope that this is covered by the test suite. Please check that. The test
would succeed with your version, but only because tests are run with
`stdin` redirected from `/dev/null` by default.

> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> +		if (argc) {
> +			argv++;
> +			argc--;
> +		}
> +	} else {
> +		strbuf_addstr(&sb, "HEAD");
> +	}

The conversion to C would make for a fine excuse to simplify the logic.

> +	if (files) {
> +		if (cached)
> +			die(_("--cached and --files are mutually exclusive"));
> +		diff_cmd = DIFF_FILES;
> +	}
> +
> +	info.argc = argc;
> +	info.argv = argv;
> +	info.prefix = prefix;
> +	info.cached = !!cached;
> +	info.for_status = !!for_status;
> +	info.quiet = quiet;
> +	info.files = files;
> +	info.summary_limit = summary_limit;
> +
> +	ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
> +					   &info, diff_cmd);

It would be better to pass the OID as `struct object_id *`, not as string.

Other than that, this patch nicely follows previous conversions from Unix
shell scripts to C.

Well done,
Johannes

> +	strbuf_release(&sb);
> +	return ret;
> +}
> +
>  struct sync_cb {
>  	const char *prefix;
>  	unsigned int flags;
> @@ -2341,6 +2791,7 @@ static struct cmd_struct commands[] = {
>  	{"print-default-remote", print_default_remote, 0},
>  	{"sync", module_sync, SUPPORT_SUPER_PREFIX},
>  	{"deinit", module_deinit, 0},
> +	{"summary", module_summary, SUPPORT_SUPER_PREFIX},
>  	{"remote-branch", resolve_remote_submodule_branch, 0},
>  	{"push-check", push_check, 0},
>  	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 43eb6051d2..899b8a409a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -59,31 +59,6 @@ die_if_unmatched ()
>  	fi
>  }
>
> -#
> -# Print a submodule configuration setting
> -#
> -# $1 = submodule name
> -# $2 = option name
> -# $3 = default value
> -#
> -# Checks in the usual git-config places first (for overrides),
> -# otherwise it falls back on .gitmodules.  This allows you to
> -# distribute project-wide defaults in .gitmodules, while still
> -# customizing individual repositories if necessary.  If the option is
> -# not in .gitmodules either, print a default value.
> -#
> -get_submodule_config () {
> -	name="$1"
> -	option="$2"
> -	default="$3"
> -	value=$(git config submodule."$name"."$option")
> -	if test -z "$value"
> -	then
> -		value=$(git submodule--helper config submodule."$name"."$option")
> -	fi
> -	printf '%s' "${value:-$default}"
> -}
> -
>  isnumber()
>  {
>  	n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
> @@ -831,166 +806,7 @@ cmd_summary() {
>  		shift
>  	done
>
> -	test $summary_limit = 0 && return
> -
> -	if rev=$(git rev-parse -q --verify --default HEAD ${1+"$1"})
> -	then
> -		head=$rev
> -		test $# = 0 || shift
> -	elif test -z "$1" || test "$1" = "HEAD"
> -	then
> -		# before the first commit: compare with an empty tree
> -		head=$(git hash-object -w -t tree --stdin </dev/null)
> -		test -z "$1" || shift
> -	else
> -		head="HEAD"
> -	fi
> -
> -	if [ -n "$files" ]
> -	then
> -		test -n "$cached" &&
> -		die "$(gettext "The --cached option cannot be used with the --files option")"
> -		diff_cmd=diff-files
> -		head=
> -	fi
> -
> -	cd_to_toplevel
> -	eval "set $(git rev-parse --sq --prefix "$wt_prefix" -- "$@")"
> -	# Get modified modules cared by user
> -	modules=$(git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- "$@" |
> -		sane_egrep '^:([0-7]* )?160000' |
> -		while read -r mod_src mod_dst sha1_src sha1_dst status sm_path
> -		do
> -			# Always show modules deleted or type-changed (blob<->module)
> -			if test "$status" = D || test "$status" = T
> -			then
> -				printf '%s\n' "$sm_path"
> -				continue
> -			fi
> -			# Respect the ignore setting for --for-status.
> -			if test -n "$for_status"
> -			then
> -				name=$(git submodule--helper name "$sm_path")
> -				ignore_config=$(get_submodule_config "$name" ignore none)
> -				test $status != A && test $ignore_config = all && continue
> -			fi
> -			# Also show added or modified modules which are checked out
> -			GIT_DIR="$sm_path/.git" git rev-parse --git-dir >/dev/null 2>&1 &&
> -			printf '%s\n' "$sm_path"
> -		done
> -	)
> -
> -	test -z "$modules" && return
> -
> -	git $diff_cmd $cached --ignore-submodules=dirty --raw $head -- $modules |
> -	sane_egrep '^:([0-7]* )?160000' |
> -	cut -c2- |
> -	while read -r mod_src mod_dst sha1_src sha1_dst status name
> -	do
> -		if test -z "$cached" &&
> -			is_zero_oid $sha1_dst
> -		then
> -			case "$mod_dst" in
> -			160000)
> -				sha1_dst=$(GIT_DIR="$name/.git" git rev-parse HEAD)
> -				;;
> -			100644 | 100755 | 120000)
> -				sha1_dst=$(git hash-object $name)
> -				;;
> -			000000)
> -				;; # removed
> -			*)
> -				# unexpected type
> -				eval_gettextln "unexpected mode \$mod_dst" >&2
> -				continue ;;
> -			esac
> -		fi
> -		missing_src=
> -		missing_dst=
> -
> -		test $mod_src = 160000 &&
> -		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
> -		missing_src=t
> -
> -		test $mod_dst = 160000 &&
> -		! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
> -		missing_dst=t
> -
> -		display_name=$(git submodule--helper relative-path "$name" "$wt_prefix")
> -
> -		total_commits=
> -		case "$missing_src,$missing_dst" in
> -		t,)
> -			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
> -			;;
> -		,t)
> -			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
> -			;;
> -		t,t)
> -			errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$sha1_dst")"
> -			;;
> -		*)
> -			errmsg=
> -			total_commits=$(
> -			if test $mod_src = 160000 && test $mod_dst = 160000
> -			then
> -				range="$sha1_src...$sha1_dst"
> -			elif test $mod_src = 160000
> -			then
> -				range=$sha1_src
> -			else
> -				range=$sha1_dst
> -			fi
> -			GIT_DIR="$name/.git" \
> -			git rev-list --first-parent $range -- | wc -l
> -			)
> -			total_commits=" ($(($total_commits + 0)))"
> -			;;
> -		esac
> -
> -		sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
> -			echo $sha1_src | cut -c1-7)
> -		sha1_abbr_dst=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_dst 2>/dev/null ||
> -			echo $sha1_dst | cut -c1-7)
> -
> -		if test $status = T
> -		then
> -			blob="$(gettext "blob")"
> -			submodule="$(gettext "submodule")"
> -			if test $mod_dst = 160000
> -			then
> -				echo "* $display_name $sha1_abbr_src($blob)->$sha1_abbr_dst($submodule)$total_commits:"
> -			else
> -				echo "* $display_name $sha1_abbr_src($submodule)->$sha1_abbr_dst($blob)$total_commits:"
> -			fi
> -		else
> -			echo "* $display_name $sha1_abbr_src...$sha1_abbr_dst$total_commits:"
> -		fi
> -		if test -n "$errmsg"
> -		then
> -			# Don't give error msg for modification whose dst is not submodule
> -			# i.e. deleted or changed to blob
> -			test $mod_dst = 160000 && echo "$errmsg"
> -		else
> -			if test $mod_src = 160000 && test $mod_dst = 160000
> -			then
> -				limit=
> -				test $summary_limit -gt 0 && limit="-$summary_limit"
> -				GIT_DIR="$name/.git" \
> -				git log $limit --pretty='format:  %m %s' \
> -				--first-parent $sha1_src...$sha1_dst
> -			elif test $mod_dst = 160000
> -			then
> -				GIT_DIR="$name/.git" \
> -				git log --pretty='format:  > %s' -1 $sha1_dst
> -			else
> -				GIT_DIR="$name/.git" \
> -				git log --pretty='format:  < %s' -1 $sha1_src
> -			fi
> -			echo
> -		fi
> -		echo
> -	done
> +	git ${wt_prefix:+-C "$wt_prefix"} submodule--helper summary ${GIT_QUIET:+--quiet} ${prefix:+--prefix "$prefix"} ${for_status:+--for-status} ${files:+--files} ${cached:+--cached} ${summary_limit:+-n $summary_limit} "$@"
>  }
>  #
>  # List all submodules, prefixed with:
> --
> 2.27.0
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-03 15:37     ` Philip Oakley
@ 2020-07-04 15:21       ` Shourya Shukla
  2020-07-04 15:39         ` Đoàn Trần Công Danh
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-04 15:21 UTC (permalink / raw)
  To: Philip Oakley
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	Johannes.Schindelin, pclouds

On 03/07 04:37, Philip Oakley wrote:
> 
> Suggestion...
> 
> On 03/07/2020 15:57, Johannes Schindelin wrote:
> > Maybe a native reader can suggest something that flows a bit easier? I am
> > not a native English speaker, but I'd prefer something along those lines:
> >
> > 	Many `submodule--helper` subcommands follow the convention a
> s/convention a/convention that a/    feels nicer for me

I did not get this one. Are you asking to replace "convention" with "a"
only?

> > 	struct defines their callback data, and the declaration of said
> s/said/that/   maybe.  'said' is a bit too much like 'patent speak', but
> otherwise either word is OK.

Okay will do!

> > 	struct is followed immediately by a macro to use in static
> > 	initializers, without any separating empty line.
> >
> > 	Let's align the `init`, `status` and `sync` subcommands with that
> > 	convention.
> Nice summary.
> 
> Philip
> Native Yorkshire speaker, but not that 'literate' ;-)

This made me chuckle. Nice!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] diff: change scope of the function count_lines()
  2020-07-03 15:07   ` Johannes Schindelin
@ 2020-07-04 15:28     ` Shourya Shukla
  2020-07-04 21:46       ` Christian Couder
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-04 15:28 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: christian.couder, gitster, liu.denton, kaartic.sivaraam, pclouds, git

On 03/07 05:07, Johannes Schindelin wrote:
> Hi Shourya,
> 
> On Fri, 3 Jul 2020, Shourya Shukla wrote:
> 
> > From: Prathamesh Chavan <pc44800@gmail.com>
> >
> > Change the scope of function count_lines for allowing the function
> > to be reused in other parts of the code as well.
> 
> It may be just me, but I'd rather see the word "visibility" instead of
> "scope" here. I mistook the subject line to indicate that the function is
> changed to serve an (at least slightly) different purpose than before,
> which is not actually the case.
> 
> Another alternative to "visibility" might be to imitate existing commit
> messages, such as e4cb659ebdd (diff: export diffstat interface,
> 2019-11-13), 22184497a36 (factor out refresh_and_write_cache function,
> 2019-09-11) or ef283b3699f (apply: make parse_git_diff_header public,
> 2019-07-11).

These are some excellent commit messages! I will take inspiration from
these. Thanks Dscho! :)

> In addition, as with all such changes, we need to be careful to consider
> whether unrelated function names coming in from system headers might
> clash. In this case, I think `count_lines()` is a bit too generic, but
> will probably not clash. Personally, I would probably have opted for
> `count_lines_in_string()`.

I do not see it clashing with any other functions. Do we need to change
the name still (maybe to make the functions purpose even more clear)?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-04 15:21       ` Shourya Shukla
@ 2020-07-04 15:39         ` Đoàn Trần Công Danh
  2020-07-05 10:52           ` Philip Oakley
  0 siblings, 1 reply; 21+ messages in thread
From: Đoàn Trần Công Danh @ 2020-07-04 15:39 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Philip Oakley, git, christian.couder, gitster, liu.denton,
	kaartic.sivaraam, Johannes.Schindelin, pclouds

On 2020-07-04 20:51:04+0530, Shourya Shukla <shouryashukla.oo@gmail.com> wrote:
> On 03/07 04:37, Philip Oakley wrote:
> > 
> > Suggestion...
> > 
> > On 03/07/2020 15:57, Johannes Schindelin wrote:
> > > Maybe a native reader can suggest something that flows a bit easier? I am
> > > not a native English speaker, but I'd prefer something along those lines:
> > >
> > > 	Many `submodule--helper` subcommands follow the convention a
> > s/convention a/convention that a/    feels nicer for me
> 
> I did not get this one. Are you asking to replace "convention" with "a"
> only?

He probably meant put "that" between "convention" and "a"
Pull ed/vi/sed out and that try this command on that line :)

	s/convention a/convention that a/


-- 
Danh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] diff: change scope of the function count_lines()
  2020-07-04 15:28     ` Shourya Shukla
@ 2020-07-04 21:46       ` Christian Couder
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Couder @ 2020-07-04 21:46 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Johannes Schindelin, Junio C Hamano, Denton Liu,
	Kaartic Sivaraam, Nguyen Thai Ngoc Duy, git

On Sat, Jul 4, 2020 at 5:28 PM Shourya Shukla
<shouryashukla.oo@gmail.com> wrote:
>
> On 03/07 05:07, Johannes Schindelin wrote:

> > In addition, as with all such changes, we need to be careful to consider
> > whether unrelated function names coming in from system headers might
> > clash. In this case, I think `count_lines()` is a bit too generic, but
> > will probably not clash. Personally, I would probably have opted for
> > `count_lines_in_string()`.
>
> I do not see it clashing with any other functions. Do we need to change
> the name still (maybe to make the functions purpose even more clear)?

This patch might not be needed at all according to Dscho's review of
another patch in this series.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/4] submodule: amend extra line feed between callback struct and macro
  2020-07-04 15:39         ` Đoàn Trần Công Danh
@ 2020-07-05 10:52           ` Philip Oakley
  0 siblings, 0 replies; 21+ messages in thread
From: Philip Oakley @ 2020-07-05 10:52 UTC (permalink / raw)
  To: Đoàn Trần Công Danh, Shourya Shukla
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam,
	Johannes.Schindelin, pclouds

On 04/07/2020 16:39, Đoàn Trần Công Danh wrote:
> On 2020-07-04 20:51:04+0530, Shourya Shukla <shouryashukla.oo@gmail.com> wrote:
>> On 03/07 04:37, Philip Oakley wrote:
>>> Suggestion...
>>>
>>> On 03/07/2020 15:57, Johannes Schindelin wrote:
>>>> Maybe a native reader can suggest something that flows a bit easier? I am
>>>> not a native English speaker, but I'd prefer something along those lines:
>>>>
>>>> 	Many `submodule--helper` subcommands follow the convention a
>>> s/convention a/convention that a/    feels nicer for me
>> I did not get this one. Are you asking to replace "convention" with "a"
>> only?
> He probably meant put "that" between "convention" and "a"
> Pull ed/vi/sed out and that try this command on that line :)
>
> 	s/convention a/convention that a/
>
>
Correct.

Philip

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-03 20:46   ` Johannes Schindelin
@ 2020-07-05 17:34     ` Shourya Shukla
  2020-07-06  9:16       ` Kaartic Sivaraam
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-05 17:34 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, christian.couder, gitster, liu.denton, kaartic.sivaraam, pc44800

Hello Dscho!

Before replying, one thing I want to clarify is that I ask various
things for double-checking since if I get even the slightest confusion I
start overthinking and destroying it all for myself. Please bear with me
and confirm/clarify stuff wherever possible. Would be of great help! :)

> Hi Shourya,
> 
> [exchanging Stefan Beller's dysfunct @google address for their private
> one; I encourage you to do the same in the next iteration, probably
> by editing the `Mentored-by:` line.]

I think you missed to mention it.

> On Fri, 3 Jul 2020, Shourya Shukla wrote:
> 
> > From: Prathamesh Chavan <pc44800@gmail.com>
> >
> > The submodule subcommand 'summary' is ported in the process of
> > making git-submodule a builtin. The function cmd_summary() from
> > git-submodule.sh is ported to functions module_summary(),
> > compute_summary_module_list(), prepare_submodule_summary() and
> > generate_submodule_summary(), print_submodule_summary().
> >
> > The first function module_summary() parses the options of submodule
> > subcommand and also acts as the front-end of this subcommand.
> > After parsing them, it calls the compute_summary_module_list()
> 
> Missing full-stop, and probably the sentence also wanted to say "function"
> at the end.

I will correct. Thanks for pointing out!

> > The functions compute_summary_module_list() runs the diff_cmd,
> > and generates the modules list, as required by the subcommand.
> > The generation of this module list is done by the using the
> 
> s/the using/using/

Will amend!

> > callback function submodule_summary_callback(), and stored in the
> > structure module_cb.
> 
> This explains nicely what the patch does. But the commit message should
> not really repeat what can be readily deduced from the patch; It should
> focus on the motivation and on interesting background information that is
> _not_ readily deduced from the patch.

I understand. I will follow your suggestions regarding my patch.

> For example, I see that `$diff_cmd` is called twice in the shell script
> version, once to "get modified modules cared by user" and then _again_,
> with that list of modified modules.
> 
> I would have liked to see a reasoning in the commit message that explains
> why this has to be so in the C version. I get why it is complicated in a
> shell script (which lacks proper objects, after all), but I would have
> expected the C version to be able to accumulate the information with a
> single pass.
> 
> (Before writing the following paragraph, I actually reviewed the patch
> from bottom to top, in the caller->callee direction.)
> 
> Ah. I see that this indeed is the case: there is only one pass in the C
> version. That's a useful piece of metadata for the commit message, I
> think, much more useful than describing the call tree of the functions.

Yup that it worth mentioning.

> Another thing worth mentioning in the commit message is that we use the
> combination of setting a child_process' working directory to the submodule
> path and then calling `prepare_submodule_repo_env()` which also sets
> `GIT_DIR` to `.git`, so that we can be certain that those spawned
> processes will not access the superproject's ODB by mistake.
> 
> When reading my suggestions, please keep in mind that I reviewed the
> functions in caller->callee order, i.e. I started at the end of the patch
> and then worked my way up.
> 
> All in all, I like the function structure, but I think there is still a
> bit room for improvement in a v2.

> > +static int verify_submodule_object_name(const char *sm_path,
> > +					  const char *sha1)
> > +{
> > +	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > + > > +	cp_rev_parse.git_cmd = 1;
> > +	cp_rev_parse.no_stdout = 1;
> > +	cp_rev_parse.dir = sm_path;
> 
> So here we specify `sm_path` as current working directory.
> 
> > +	prepare_submodule_repo_env(&cp_rev_parse.env_array);
> 
> And this implicitly sets `GIT_DIR=.git`. Good.
> 
> > +	argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
> > +			 "--verify", NULL);
> > +	argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);
> 
> After this, we should also append `--` to make sure that we're not parsing
> this as a file name.

Will do!

> Two comments about naming: `sha1` is pretty misleading here, as we do not
> require it to be a SHA-1 (especially in the future in which we switch to
> SHA-256). Besides, what we're really asking for (via that `^0` suffix) is
> a committish. Therefore, I would propose to use `committish` both in the
> parameter name as well as the function name.

I am not aware of this change. I will take this suggestion into account.

> > +
> > +	if (run_command(&cp_rev_parse))
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> > +static void print_submodule_summary(struct summary_cb *info, int errmsg,
> > +				      int total_commits, int missing_src,
> > +				      int missing_dst, const char *displaypath,
> > +				      int is_sm_git_dir, struct module_cb *p)
> > +{
> > +	if (p->status == 'T') {
> > +		if (S_ISGITLINK(p->mod_dst))
> > +			printf(_("* %s %s(blob)->%s(submodule)"),
> > +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
> 
> The shell script version does this:
> 
>                 sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
>                         echo $sha1_src | cut -c1-7)
> 
> That is not quite the same, as it looks for the abbreviation _in the
> submodule_, not in the current project. So I think `find_unique_abbrev()`
> is not correct here.
> 
> The funny thing is that we _already_ will have called `git rev-parse
> --verify` for both `p->oid_src` and `p->oid_dst` in the submodule, in the
> caller of this function! And while we throw away the result, and while we
> do not pass `--short`, there is no reason why we shouldn't be able to do
> precisely that.

Okay so you are saying that there is no need of a 'find_unique_abbrev()'
since we would be easily able to obtain these values from the caller of
'print_submodule_summary()' right? Maybe we can pass 'oid_src' or
'oid_dst' as an argument?

> > +				 find_unique_abbrev(&p->oid_dst, 7));
> > +		else
> > +			printf(_("* %s %s(submodule)->%s(blob)"),
> > +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
> > +				 find_unique_abbrev(&p->oid_dst, 7));
> > +	} else {
> > +		printf("* %s %s...%s",
> > +			displaypath, find_unique_abbrev(&p->oid_src, 7),
> > +			find_unique_abbrev(&p->oid_dst, 7));
> > +	}
> > +
> > +	if (total_commits < 0)
> > +		printf(":\n");
> > +	else
> > +		printf(" (%d):\n", total_commits);
> > +
> > +	if (errmsg) {
> > +		/*
> > +		 * Don't give error msg for modification whose dst is not
> > +		 * submodule, i.e. deleted or changed to blob
> > +		 */
> > +		if (S_ISGITLINK(p->mod_src)) {
> > +			if (missing_src && missing_dst) {
> > +				printf(_("  Warn: %s doesn't contain commits %s and %s\n"),
> > +				       displaypath, oid_to_hex(&p->oid_src),
> > +				       oid_to_hex(&p->oid_dst));
> > +			} else if (missing_src) {
> > +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> > +				       displaypath, oid_to_hex(&p->oid_src));
> > +			} else {
> > +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> > +				       displaypath, oid_to_hex(&p->oid_dst));
> > +			}
> > +		}
> > +	} else if (is_sm_git_dir) {
> > +		struct child_process cp_log = CHILD_PROCESS_INIT;
> > +
> > +		cp_log.git_cmd = 1;
> > +		cp_log.dir = p->sm_path;
> > +		prepare_submodule_repo_env(&cp_log.env_array);
> 
> Since the working directory is set to the top-level directory of the
> submodule, and since `prepare_submodule_repo_env()` sets `GIT_DIR` to
> `.git`, I think that the `is_sm_git_dir` condition is unnecessary. In
> fact, the entire `is_sm_git_dir` parameter (and local variable in the
> caller, see more on that below) can go away.

Because we already set the $GIT_DIR to .git/ so an extra check will not
be necessary right?

> > +		argv_array_pushl(&cp_log.args, "log", NULL);
> > +
> > +		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
> > +			if (info->summary_limit > 0)
> > +				argv_array_pushf(&cp_log.args, "-%d",
> > +						 info->summary_limit);
> > +
> > +			argv_array_pushl(&cp_log.args, "--pretty=  %m %s",
> > +					 "--first-parent", NULL);
> > +			argv_array_pushf(&cp_log.args, "%s...%s",
> > +					 oid_to_hex(&p->oid_src),
> > +					 oid_to_hex(&p->oid_dst));
> > +		} else if (S_ISGITLINK(p->mod_dst)) {
> > +			argv_array_pushl(&cp_log.args, "--pretty=  > %s",
> > +					 "-1", oid_to_hex(&p->oid_dst), NULL);
> > +		} else {
> > +			argv_array_pushl(&cp_log.args, "--pretty=  < %s",
> > +					 "-1", oid_to_hex(&p->oid_src), NULL);
> > +		}
> > +		run_command(&cp_log);
> > +	}
> > +	printf("\n");
> > +}
> 
> It looks as if there is a whole lot of `oid_to_hex(&p->oid_src)` in that
> function. Together with the realization that we need the abbreviated
> version of that at least in one place, and the other realization that we
> already call `rev-parse --verify` for both `oid_src` and `oid_dst` in the
> caller of this function, it seems to suggest itself that we would actually
> want to pass the `--short` option, too, and to capture the output, and
> pass it down to `print_submodule_summary()` _instead of_ `missing_src` and
> `missing_dst` (e.g. as `src_abbrev` and `dst_abbrev`).

Oh you have mentioned it here too. This seems quite a good approach. I
will adopt this.

> > +
> > +static void generate_submodule_summary(struct summary_cb *info,
> > +				       struct module_cb *p)
> > +{
> > +	int missing_src = 0;
> > +	int missing_dst = 0;
> > +	char *displaypath;
> > +	int errmsg = 0;
> > +	int total_commits = -1;
> > +	int is_sm_git_dir = 0;
> > +	struct strbuf sm_git_dir_sb = STRBUF_INIT;
> > +
> > +	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
> > +		if (S_ISGITLINK(p->mod_dst)) {
> > +			/*
> > +			 * NEEDSWORK: avoid using separate process with
> > +			 * the help of the function head_ref_submodule()
> 
> I don't quite understand this comment. There is no `head_ref_submodule()`
> function.
> 
> > +			 */
> > +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > +			struct strbuf sb_rev_parse = STRBUF_INIT;
> > +
> > +			cp_rev_parse.git_cmd = 1;
> > +			cp_rev_parse.no_stderr = 1;
> > +			cp_rev_parse.dir = p->sm_path;
> > +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
> > +
> > +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> > +					 "HEAD", NULL);
> > +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> > +				strbuf_strip_suffix(&sb_rev_parse, "\n");
> > +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> > +			}
> > +			strbuf_release(&sb_rev_parse);
> > +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> > +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> > +			struct strbuf sb_hash_object = STRBUF_INIT;
> > +
> > +			cp_hash_object.git_cmd = 1;
> > +			argv_array_pushl(&cp_hash_object.args, "hash-object",
> > +					 p->sm_path, NULL);
> > +			if (!capture_command(&cp_hash_object,
> > +					     &sb_hash_object, 0)) {
> > +				strbuf_strip_suffix(&sb_hash_object, "\n");
> > +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> > +			}
> > +			strbuf_release(&sb_hash_object);
> 
> It would probably be shorter, less error-prone, and quicker to use
> `index_fd()` directly.
>
> BTW I am not quite sure that this code does the correct thing in case of a
> symlink: it hashes the contents of the symlink target (if it is a file,
> otherwise it errors out). But that is hardly an issue introduced by the
> conversion, that's just copied from `git-submodule.sh`.
> 
> > +		} else {
> > +			if (p->mod_dst)
> > +				die(_("unexpected mode %d\n"), p->mod_dst);
> 
> Hmm. This does not match what the shell script version does:
> 
>                         *)
>                                 # unexpected type
>                                 eval_gettextln "unexpected mode \$mod_dst" >&2
>                                 continue ;;
> 
> I think we should also just write the message to `stderr` and continue,
> not `die()`.
> 
> In addition to that, I am missing the C code for this case:
> 
>                         000000)
>                                 ;; # removed
> 
> It is quite possible that our test suite does not cover this case (or did
> the test suite fail for you?). If that is indeed the case, it would be
> really good to add a test case as part of this patch series, to gain
> confidence in the correctness of the conversion.

The tests passed for me actually. Whether this is covered by the test
cases, I am not sure. I will have to check it.

> > +		}
> > +	}
> > +
> > +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
> 
> I have to admit that I am not loving the name `sm_git_dir_sb`. Why not
> `submodule_git_dir`? I guess you copied it from elsewhere in
> `submodule--helper.c`...
> 
> > +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
> > +		is_sm_git_dir = 1;
> 
> So here, we verify whether there is a repository at `p->sm_path`. I don't
> see that in the shell script version:
> 
>                 missing_src=
>                 missing_dst=
> 
>                 test $mod_src = 160000 &&
>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
>                 missing_src=t
> 
>                 test $mod_dst = 160000 &&
>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
>                 missing_dst=t
> 
> Let's read a bit further.
> 
> > +
> > +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> > +		missing_src = verify_submodule_object_name(p->sm_path,
> > +							   oid_to_hex(&p->oid_src));
> 
> Ah, and `verify_submodule_object_name()` uses `p->sm_path` as working
> directory. But that's not what the shell script version did: it specified
> the `GIT_DIR` explicitly.
> 
> And by using the `prepare_submodule_repo_env()` function in
> `verify_submodule_object_name()`, we specify `GIT_DIR` implicitly, as I
> pointed out in my comment on that function.

Oh so you're saying that it will be better to call
'prepare_submodule_repo_env()' on some variable since we explicitly want to
store the path to GIT_DIR?

It would be of much help if you could explain this part just a little
more (for my own sake).

> So I think that `is_sm_git_dir` might be

I think you missed something here...

> > +
> > +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> > +		missing_dst = verify_submodule_object_name(p->sm_path,
> > +							   oid_to_hex(&p->oid_dst));
> > +
> > +	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
> > +
> > +	if (!missing_dst && !missing_src) {
> > +		if (is_sm_git_dir) {
> > +			struct child_process cp_rev_list = CHILD_PROCESS_INIT;
> > +			struct strbuf sb_rev_list = STRBUF_INIT;
> > +			char *range;
> > +
> > +			if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
> > +				range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
> > +						oid_to_hex(&p->oid_dst));
> > +			else if (S_ISGITLINK(p->mod_src))
> > +				range = xstrdup(oid_to_hex(&p->oid_src));
> > +			else
> > +				range = xstrdup(oid_to_hex(&p->oid_dst));
> > +
> > +			cp_rev_list.git_cmd = 1;
> > +			cp_rev_list.dir = p->sm_path;
> > +			prepare_submodule_repo_env(&cp_rev_list.env_array);
> 
> Again, due to setting the working directory to `p->sm_path` and
> (implicitly, via `prepare_submodule_repo_env()`) `GIT_DIR` to `.git`, I do
> not think that we have to guard this block beind `is_sm_git_dir`.

> > +
> > +			argv_array_pushl(&cp_rev_list.args, "rev-list",
> > +					 "--first-parent", range, "--", NULL);
> 
> Since `argv_array_push()` duplicates the strings, anyway, we can totally
> avoid the need for the `range` variable:
> 
> 			if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
> 				argv_array_pushf(&cp_rev_list.args, "%s...%s",
> 						 oid_to_hex(&p->oid_src),
> 						 oid_to_hex(&p->oid_dst));
> 			else
> 				argv_array_push(&cp_rev_list.args, IS_GITLINK(p->mod_src) ?
> 						oid_to_hex(&p->oid_src) :
> 						oid_to_hex(&p->oid_dst));
> 
> > +			if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
> > +				if (sb_rev_list.len)
> > +					total_commits = count_lines(sb_rev_list.buf,
> > +								    sb_rev_list.len);
> 
> That's actually not necessary. `git rev-list --count` will give you a nice
> number, no need to capture a potentially large amount of memory only to
> count the lines.
> 
> This may also make the patch obsolete that makes `count_lines()` public.

Therefore we eliminate count_lines() from here and instead do a 'git
rev-list --count'?

> > +				else
> > +					total_commits = 0;
> > +			}
> 
> > +
> > +			free(range);
> > +			strbuf_release(&sb_rev_list);
> > +		}
> > +	} else {
> > +		errmsg = 1;
> > +	}
> 
> I am missing the equivalent for these lines here:
> 
>                 t,)
>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
>                         ;;
>                 ,t)
>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
>                         ;;
>                 t,t)
>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$ ha1_dst")"
>                         ;;

I will add them.

> I am not quite sure whether it is a good idea to leave it to the
> `print_submodule_summary()` function to generate the `errmsg`. I think I'd
> rather have it a `char *` than an `int`.

Would it be better to add these error messages in
'prepare_submodule_summary()'? If we have error messages as integers
then we will simply

> > +
> > +	print_submodule_summary(info, errmsg, total_commits,
> > +				missing_src, missing_dst,
> > +		      		displaypath, is_sm_git_dir, p);
> > +
> > +	free(displaypath);
> > +	strbuf_release(&sm_git_dir_sb);
> > +}
> > +
> > +static void prepare_submodule_summary(struct summary_cb *info,
> > +				      struct module_cb_list *list)
> > +{
> > +	int i;
> > +	for (i = 0; i < list->nr; i++) {
> > +		struct module_cb *p = list->entries[i];
> > +		struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > +
> > +		if (p->status == 'D' || p->status == 'T') {
> > +			generate_submodule_summary(info, p);
> > +			continue;
> > +		}
> > +
> > +		if (info->for_status) {
> > +			char *config_key;
> 
> Since the `config_key` is only used within the `if()` block it would be
> better to declare it within that block.
> 
> > +			const char *ignore_config = "none";
> 
> Since the only value we ever care about is "all", how about turning this
> into an `int`, setting it to `0` here, and later assigning it to
> `!strcmp(value, "all")` and `!strcmp(sub->ignore, "all")`, respectively?

Alright will do!

> I mean, I get it. Unix shell scripts are all about passing around text.
> And it is much easier to just translate that to C faithfully. But that
> does not make it good C code. C has data types, and proper C code makes
> use of that.
> 
> > +			const char *value;
> 
> If you want to save on lines, you can cuddle this together with other
> declarations of the same type. Even so, it could be scoped more narrowly.
> 
> > +			const struct submodule *sub = submodule_from_path(the_repository,
> > +									  &null_oid,
> > +									  p->sm_path);
> > +
> > +			if (sub && p->status != 'A') {
> 
> Good. The shell script version _always_ retrieved the `.ignore` config
> value, even if the `status` is `A`. Your version is much better.
> 
> But why bother calling `submodule_from_path()` if the status is `A`?

What exactly does a status of 'A' or 'T' mean? I mean I know what we are
doing but what exactly do these translate into?

> I could actually see the `const struct submodule *sub;` declaration be
> pulled out of this scope, and combining the `if (info->for_status &&
> p->status != 'A'), and the moving the assignment of `sub` into the `else
> if ((sub = submodule_from_path(r, &null_oid, p->sm_path)) &&
> sub->ignore)`.
> 
> That would save us one entire indentation level.

That seems a good approach! I will try this out.

> > +				config_key = xstrfmt("submodule.%s.ignore",
> > +						     sub->name);
> > +				if (!git_config_get_string_const(config_key, &value))
> > +					ignore_config = value;
> > +				else if (sub->ignore)
> > +					ignore_config = sub->ignore;
> > +
> > +				free(config_key);
> > +				if (!strcmp(ignore_config, "all"))
> > +					continue;
> > +			}
> > +		}
> > +
> > +		/* Also show added or modified modules which are checked out */
> > +		cp_rev_parse.dir = p->sm_path;
> > +		cp_rev_parse.git_cmd = 1;
> > +		cp_rev_parse.no_stderr = 1;
> > +		cp_rev_parse.no_stdout = 1;
> > +
> > +		argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> > +				 "--git-dir", NULL);
> > +
> > +		if (!run_command(&cp_rev_parse))
> 
> I wonder whether we really need to waste an entire spawned process on
> figuring out whether `p->sm_path` refers to an active repository. Wouldn't
> `is_submodule_active(r, p->sm_path)` fulfill the same purpose?

Yep! This is correct. I will change.

> > +			generate_submodule_summary(info, p);
> > +	}
> > +}
> > +
> > +static void submodule_summary_callback(struct diff_queue_struct *q,
> > +				       struct diff_options *options,
> > +				       void *data)
> > +{
> > +	int i;
> > +	struct module_cb_list *list = data;
> > +	for (i = 0; i < q->nr; i++) {
> > +		struct diff_filepair *p = q->queue[i];
> > +		struct module_cb *temp;
> > +
> > +		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
> > +			continue;
> > +		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
> > +		temp->mod_src = p->one->mode;
> > +		temp->mod_dst = p->two->mode;
> > +		temp->oid_src = p->one->oid;
> > +		temp->oid_dst = p->two->oid;
> > +		temp->status = p->status;
> > +		temp->sm_path = xstrdup(p->one->path);
> > +
> > +		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
> > +		list->entries[list->nr++] = temp;
> > +	}
> > +}
> > +
> > +static const char *get_diff_cmd(enum diff_cmd diff_cmd)
> > +{
> > +	switch (diff_cmd) {
> > +	case DIFF_INDEX: return "diff-index";
> > +	case DIFF_FILES: return "diff-files";
> > +	default: BUG("bad diff_cmd value %d", diff_cmd);
> > +	}
> > +}
> > +
> > +static int compute_summary_module_list(char *head,
> > +				         struct summary_cb *info,
> > +				         enum diff_cmd diff_cmd)
> > +{
> > +	struct argv_array diff_args = ARGV_ARRAY_INIT;
> > +	struct rev_info rev;
> > +	struct module_cb_list list = MODULE_CB_LIST_INIT;
> > +
> > +	argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
> > +	if (info->cached)
> > +		argv_array_push(&diff_args, "--cached");
> > +	argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
> > +			 NULL);
> > +	if (head)
> > +		argv_array_push(&diff_args, head);
> > +	argv_array_push(&diff_args, "--");
> > +	if (info->argc)
> > +		argv_array_pushv(&diff_args, info->argv);
> > +
> > +	git_config(git_diff_basic_config, NULL);
> > +	init_revisions(&rev, info->prefix);
> > +	rev.abbrev = 0;
> > +	precompose_argv(diff_args.argc, diff_args.argv);
> > +
> > +	diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
> > +					 &rev, NULL);
> > +	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
> > +	rev.diffopt.format_callback = submodule_summary_callback;
> > +	rev.diffopt.format_callback_data = &list;
> > +
> > +	if (!info->cached) {
> > +		if (diff_cmd ==  DIFF_INDEX)
> 
> Please substitute the double-space by a single one.

Will do!

> > +			setup_work_tree();
> > +		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
> > +			perror("read_cache_preload");
> > +			return -1;
> > +		}
> > +	} else if (read_cache() < 0) {
> > +		perror("read_cache");
> > +		return -1;
> > +	}
> > +
> > +	if (diff_cmd == DIFF_INDEX)
> > +		run_diff_index(&rev, info->cached);
> > +	else
> > +		run_diff_files(&rev, 0);
> > +	prepare_submodule_summary(info, &list);
> > +	return 0;
> > +}
> > +
> > +static int module_summary(int argc, const char **argv, const char *prefix)
> > +{
> > +	struct summary_cb info = SUMMARY_CB_INIT;
> > +	int cached = 0;
> > +	int for_status = 0;
> > +	int quiet = 0;
> > +	int files = 0;
> > +	int summary_limit = -1;
> > +	struct child_process cp_rev = CHILD_PROCESS_INIT;
> > +	struct strbuf sb = STRBUF_INIT;
> > +	enum diff_cmd diff_cmd = DIFF_INDEX;
> > +	int ret;
> > +
> > +	struct option module_summary_options[] = {
> > +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> > +		OPT_BOOL(0, "cached", &cached,
> > +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
> > +		OPT_BOOL(0, "files", &files,
> > +			 N_("To compare the commit in the index with that in the submodule HEAD")),
> > +		OPT_BOOL(0, "for-status", &for_status,
> > +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
> > +		OPT_INTEGER('n', "summary-limit", &summary_limit,
> > +			     N_("Limit the summary size")),
> > +		OPT_END()
> > +	};
> > +
> > +	const char *const git_submodule_helper_usage[] = {
> > +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> > +		NULL
> > +	};
> > +
> > +	argc = parse_options(argc, argv, prefix, module_summary_options,
> > +			     git_submodule_helper_usage, 0);
> > +
> > +	if (!summary_limit)
> > +		return 0;
> > +
> > +	cp_rev.git_cmd = 1;
> > +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > +			 argc ? argv[0] : "HEAD", NULL);
> 
> Oy. Why not simply call `get_oid()`? No need to spawn a new process.

Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
will save us a ton of processes?

But I think we do need to capture the output of 'git rev-parse --verify
....' so I think it will backfire to use 'get_oid()' or am I just being
too dumb and not catching on something?

> > +
> > +	if (!capture_command(&cp_rev, &sb, 0)) {
> > +		strbuf_strip_suffix(&sb, "\n");
> > +		if (argc) {
> > +			argv++;
> > +			argc--;
> > +		}
> > +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> > +		/* before the first commit: compare with an empty tree */
> > +		struct stat st;
> > +		struct object_id oid;
> > +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > +						  prefix, 3))
> > +			die("Unable to add %s to database", oid.hash);
> 
> Umm. The original reads:
> 
>                 # before the first commit: compare with an empty tree
>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> 
> It does not actually read from `stdin`. It reads from `/dev/null`,
> redirected to the input. And what it _actually_ does is to generate the
> OID of the empty tree.
> 
> But we already _have_ the OID of the empty tree! It's
> `the_hash_algo->empty_tree`.

I did not know this 'the_hash_algo'. I will use it. Thanks! :)

> I hope that this is covered by the test suite. Please check that. The test
> would succeed with your version, but only because tests are run with
> `stdin` redirected from `/dev/null` by default.

I guess yes. My work passed because the tests are written this way.

> > +		strbuf_addstr(&sb, oid_to_hex(&oid));
> > +		if (argc) {
> > +			argv++;
> > +			argc--;
> > +		}
> > +	} else {
> > +		strbuf_addstr(&sb, "HEAD");
> > +	}
> 
> The conversion to C would make for a fine excuse to simplify the logic.

This was kind of like the 'shift' in shell. What equivalent do you
suggest?

> > +	if (files) {
> > +		if (cached)
> > +			die(_("--cached and --files are mutually exclusive"));
> > +		diff_cmd = DIFF_FILES;
> > +	}
> > +
> > +	info.argc = argc;
> > +	info.argv = argv;
> > +	info.prefix = prefix;
> > +	info.cached = !!cached;
> > +	info.for_status = !!for_status;
> > +	info.quiet = quiet;
> > +	info.files = files;
> > +	info.summary_limit = summary_limit;
> > +
> > +	ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
> > +					   &info, diff_cmd);
> 
> It would be better to pass the OID as `struct object_id *`, not as string.

Will do!

> Other than that, this patch nicely follows previous conversions from Unix
> shell scripts to C.
> 
> Well done,
> Johannes

Thank you! It was a highly detailed review! I am still learning tons of
stuff about Git's code and such a review does help a lot! :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-05 17:34     ` Shourya Shukla
@ 2020-07-06  9:16       ` Kaartic Sivaraam
  2020-07-06 11:15         ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Kaartic Sivaraam @ 2020-07-06  9:16 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Johannes Schindelin, git, christian.couder, gitster, liu.denton,
	pc44800, stefanbeller

Note: I've added some comment but I've not been able to address all the
parts due to lack of time. I'm sending it sooner hoping it would be
useful.

On 05-07-2020 23:04, Shourya Shukla wrote:
[...]
>> [exchanging Stefan Beller's dysfunct @google address for their private
>> one; I encourage you to do the same in the next iteration, probably
>> by editing the `Mentored-by:` line.]
> 
> I think you missed to mention it.
>

If you're looking for the private e-mail. The exachange was already done
and it was right there in the Cc list of the mail sent by Dscho. I've
added it again as you seem to have removed it.

>> On Fri, 3 Jul 2020, Shourya Shukla wrote:
>>
>>> From: Prathamesh Chavan <pc44800@gmail.com>
>>>
>>> The submodule subcommand 'summary' is ported in the process of
>>> making git-submodule a builtin. The function cmd_summary() from
>>> git-submodule.sh is ported to functions module_summary(),
>>> compute_summary_module_list(), prepare_submodule_summary() and
>>> generate_submodule_summary(), print_submodule_summary().
>>>
>>> The first function module_summary() parses the options of submodule
>>> subcommand and also acts as the front-end of this subcommand.
>>> After parsing them, it calls the compute_summary_module_list()
>>
>> Missing full-stop, and probably the sentence also wanted to say "function"
>> at the end.
> 
> I will correct. Thanks for pointing out!
> 
>>> The functions compute_summary_module_list() runs the diff_cmd,
>>> and generates the modules list, as required by the subcommand.
>>> The generation of this module list is done by the using the
>>
>> s/the using/using/
> 
> Will amend!
> 
>>> callback function submodule_summary_callback(), and stored in the
>>> structure module_cb.
>>
>> This explains nicely what the patch does. But the commit message should
>> not really repeat what can be readily deduced from the patch; It should
>> focus on the motivation and on interesting background information that is
>> _not_ readily deduced from the patch.
> 
> I understand. I will follow your suggestions regarding my patch.
> 
>> For example, I see that `$diff_cmd` is called twice in the shell script
>> version, once to "get modified modules cared by user" and then _again_,
>> with that list of modified modules.
>>
>> I would have liked to see a reasoning in the commit message that explains
>> why this has to be so in the C version. I get why it is complicated in a
>> shell script (which lacks proper objects, after all), but I would have
>> expected the C version to be able to accumulate the information with a
>> single pass.
>>
>> (Before writing the following paragraph, I actually reviewed the patch
>> from bottom to top, in the caller->callee direction.)
>>
>> Ah. I see that this indeed is the case: there is only one pass in the C
>> version. That's a useful piece of metadata for the commit message, I
>> think, much more useful than describing the call tree of the functions.
> 
> Yup that it worth mentioning.
> 
>> Another thing worth mentioning in the commit message is that we use the
>> combination of setting a child_process' working directory to the submodule
>> path and then calling `prepare_submodule_repo_env()` which also sets
>> `GIT_DIR` to `.git`, so that we can be certain that those spawned
>> processes will not access the superproject's ODB by mistake.
>>
>> When reading my suggestions, please keep in mind that I reviewed the
>> functions in caller->callee order, i.e. I started at the end of the patch
>> and then worked my way up.
>>
>> All in all, I like the function structure, but I think there is still a
>> bit room for improvement in a v2.
> 
>>> +static int verify_submodule_object_name(const char *sm_path,
>>> +					  const char *sha1)
>>> +{
>>> +	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
>>> + > > +	cp_rev_parse.git_cmd = 1;
>>> +	cp_rev_parse.no_stdout = 1;
>>> +	cp_rev_parse.dir = sm_path;
>>
>> So here we specify `sm_path` as current working directory.
>>
>>> +	prepare_submodule_repo_env(&cp_rev_parse.env_array);
>>
>> And this implicitly sets `GIT_DIR=.git`. Good.
>>
>>> +	argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
>>> +			 "--verify", NULL);
>>> +	argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);
>>
>> After this, we should also append `--` to make sure that we're not parsing
>> this as a file name.
> 
> Will do!
> 
>> Two comments about naming: `sha1` is pretty misleading here, as we do not
>> require it to be a SHA-1 (especially in the future in which we switch to
>> SHA-256). Besides, what we're really asking for (via that `^0` suffix) is
>> a committish. Therefore, I would propose to use `committish` both in the
>> parameter name as well as the function name.
> 
> I am not aware of this change. I will take this suggestion into account.
> 
>>> +
>>> +	if (run_command(&cp_rev_parse))
>>> +		return 1;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void print_submodule_summary(struct summary_cb *info, int errmsg,
>>> +				      int total_commits, int missing_src,
>>> +				      int missing_dst, const char *displaypath,
>>> +				      int is_sm_git_dir, struct module_cb *p)
>>> +{
>>> +	if (p->status == 'T') {
>>> +		if (S_ISGITLINK(p->mod_dst))
>>> +			printf(_("* %s %s(blob)->%s(submodule)"),
>>> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
>>
>> The shell script version does this:
>>
>>                 sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
>>                         echo $sha1_src | cut -c1-7)
>>
>> That is not quite the same, as it looks for the abbreviation _in the
>> submodule_, not in the current project. So I think `find_unique_abbrev()`
>> is not correct here.
>>
>> The funny thing is that we _already_ will have called `git rev-parse
>> --verify` for both `p->oid_src` and `p->oid_dst` in the submodule, in the
>> caller of this function! And while we throw away the result, and while we
>> do not pass `--short`, there is no reason why we shouldn't be able to do
>> precisely that.
> 
> Okay so you are saying that there is no need of a 'find_unique_abbrev()'
> since we would be easily able to obtain these values from the caller of
> 'print_submodule_summary()' right?

Yes. 'generate_submodule_summary' already does a rev-parse on
p->oid_src and p->oid_dst via 'verify_submodule_object_name'.
We should be able to get the short version of them by passing '--short'
to rev-parse there and make it return the short SHA1. We can then use it
like how Dscho mentions below.

> Maybe we can pass 'oid_src' or 'oid_dst' as an argument?
> 
>>> +				 find_unique_abbrev(&p->oid_dst, 7));
>>> +		else
>>> +			printf(_("* %s %s(submodule)->%s(blob)"),
>>> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
>>> +				 find_unique_abbrev(&p->oid_dst, 7));
>>> +	} else {
>>> +		printf("* %s %s...%s",
>>> +			displaypath, find_unique_abbrev(&p->oid_src, 7),
>>> +			find_unique_abbrev(&p->oid_dst, 7));
>>> +	}
>>> +
>>> +	if (total_commits < 0)
>>> +		printf(":\n");
>>> +	else
>>> +		printf(" (%d):\n", total_commits);
>>> +
>>> +	if (errmsg) {
>>> +		/*
>>> +		 * Don't give error msg for modification whose dst is not
>>> +		 * submodule, i.e. deleted or changed to blob
>>> +		 */
>>> +		if (S_ISGITLINK(p->mod_src)) {
>>> +			if (missing_src && missing_dst) {
>>> +				printf(_("  Warn: %s doesn't contain commits %s and %s\n"),
>>> +				       displaypath, oid_to_hex(&p->oid_src),
>>> +				       oid_to_hex(&p->oid_dst));
>>> +			} else if (missing_src) {
>>> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
>>> +				       displaypath, oid_to_hex(&p->oid_src));
>>> +			} else {
>>> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
>>> +				       displaypath, oid_to_hex(&p->oid_dst));
>>> +			}
>>> +		}
>>> +	} else if (is_sm_git_dir) {
>>> +		struct child_process cp_log = CHILD_PROCESS_INIT;
>>> +
>>> +		cp_log.git_cmd = 1;
>>> +		cp_log.dir = p->sm_path;
>>> +		prepare_submodule_repo_env(&cp_log.env_array);
>>
>> Since the working directory is set to the top-level directory of the
>> submodule, and since `prepare_submodule_repo_env()` sets `GIT_DIR` to
>> `.git`, I think that the `is_sm_git_dir` condition is unnecessary. In
>> fact, the entire `is_sm_git_dir` parameter (and local variable in the
>> caller, see more on that below) can go away.
> 
> Because we already set the $GIT_DIR to .git/ so an extra check will not
> be necessary right?
> 

Yes. If we remove that check and we get a p->sm_path that does not point
to a submodule, I wonder what would happen if we run 'run_command'
on it. I'm also not sure if that's a possible case. Something to
explore.

>>> +		argv_array_pushl(&cp_log.args, "log", NULL);
>>> +
>>> +		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
>>> +			if (info->summary_limit > 0)
>>> +				argv_array_pushf(&cp_log.args, "-%d",
>>> +						 info->summary_limit);
>>> +
>>> +			argv_array_pushl(&cp_log.args, "--pretty=  %m %s",
>>> +					 "--first-parent", NULL);
>>> +			argv_array_pushf(&cp_log.args, "%s...%s",
>>> +					 oid_to_hex(&p->oid_src),
>>> +					 oid_to_hex(&p->oid_dst));
>>> +		} else if (S_ISGITLINK(p->mod_dst)) {
>>> +			argv_array_pushl(&cp_log.args, "--pretty=  > %s",
>>> +					 "-1", oid_to_hex(&p->oid_dst), NULL);
>>> +		} else {
>>> +			argv_array_pushl(&cp_log.args, "--pretty=  < %s",
>>> +					 "-1", oid_to_hex(&p->oid_src), NULL);
>>> +		}
>>> +		run_command(&cp_log);
>>> +	}
>>> +	printf("\n");
>>> +}
>>
>> It looks as if there is a whole lot of `oid_to_hex(&p->oid_src)` in that
>> function. Together with the realization that we need the abbreviated
>> version of that at least in one place, and the other realization that we
>> already call `rev-parse --verify` for both `oid_src` and `oid_dst` in the
>> caller of this function, it seems to suggest itself that we would actually
>> want to pass the `--short` option, too, and to capture the output, and
>> pass it down to `print_submodule_summary()` _instead of_ `missing_src` and
>> `missing_dst` (e.g. as `src_abbrev` and `dst_abbrev`).
> 
> Oh you have mentioned it here too. This seems quite a good approach. I
> will adopt this.
> 
>>> +
>>> +static void generate_submodule_summary(struct summary_cb *info,
>>> +				       struct module_cb *p)
>>> +{
>>> +	int missing_src = 0;
>>> +	int missing_dst = 0;
>>> +	char *displaypath;
>>> +	int errmsg = 0;
>>> +	int total_commits = -1;
>>> +	int is_sm_git_dir = 0;
>>> +	struct strbuf sm_git_dir_sb = STRBUF_INIT;
>>> +
>>> +	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
>>> +		if (S_ISGITLINK(p->mod_dst)) {
>>> +			/*
>>> +			 * NEEDSWORK: avoid using separate process with
>>> +			 * the help of the function head_ref_submodule()
>>
>> I don't quite understand this comment. There is no `head_ref_submodule()`
>> function.
>>

That NEEDSWORK was added based on Stefan's comment on a previous version
of Prathamesh's patch. Here it is for reference:

>> +       if (!info->cached && !oidcmp(&p->oid_dst, &null_oid)) {
>> +               if (S_ISGITLINK(p->mod_dst)) {
>> +                       struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
>> +                       struct strbuf sb_rev_parse = STRBUF_INIT;
>> +
>> +                       cp_rev_parse.git_cmd = 1;
>> +                       cp_rev_parse.no_stderr = 1;
>> +                       cp_rev_parse.dir = p->sm_path;
>> +                       prepare_submodule_repo_env(&cp_rev_parse.env_array);
>> +
>> +                       argv_array_pushl(&cp_rev_parse.args,
>> +                                        "rev-parse", "HEAD", NULL);
>> +                       if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
>> +                               strbuf_strip_suffix(&sb_rev_parse, "\n");
>> +
>> +                               get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
>> +                       }
>> +                       strbuf_release(&sb_rev_parse);
> 
> I think this could be replaced via
> head_ref_submodule(sub->path, callback function, &where_to_store)
> or is there some trickery going on, that this also works on
> non-compliant submodules?
> (Maybe add that as a NEEDSWORK/TODO)

Ref:
https://public-inbox.org/git/CAGZ79kaWn9z47Va=VW4R2Aswws1N5n2u4Kvatn73s0YnV0pVqQ@mail.gmail.com/

A quick search reveals that 'head_ref_submodule' existed during that
period. On further investigation it seems that 'refs_head_ref' was
introduced in 62f0b399e0 (refs: add refs_head_ref(), 2017-08-23) and
'head_ref_submodule' was made to use it. Later on, in 419221c106 (refs:
remove dead for_each_*_submodule(), 2017-08-23), 'head_ref-submodule'
was removed with an advice to use the 'refs_' API for accessing
submodules.

> +* Use `refs_` API for accessing submodules. The submodule ref store could
> +  be obtained with `get_submodule_ref_store()`

How it applies to our code is something to be looked into, yet.

>>> +			 */
>>> +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
>>> +			struct strbuf sb_rev_parse = STRBUF_INIT;
>>> +
>>> +			cp_rev_parse.git_cmd = 1;
>>> +			cp_rev_parse.no_stderr = 1;
>>> +			cp_rev_parse.dir = p->sm_path;
>>> +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
>>> +
>>> +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
>>> +					 "HEAD", NULL);
>>> +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
>>> +				strbuf_strip_suffix(&sb_rev_parse, "\n");
>>> +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
>>> +			}
>>> +			strbuf_release(&sb_rev_parse);
>>> +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
>>> +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
>>> +			struct strbuf sb_hash_object = STRBUF_INIT;
>>> +
>>> +			cp_hash_object.git_cmd = 1;
>>> +			argv_array_pushl(&cp_hash_object.args, "hash-object",
>>> +					 p->sm_path, NULL);
>>> +			if (!capture_command(&cp_hash_object,
>>> +					     &sb_hash_object, 0)) {
>>> +				strbuf_strip_suffix(&sb_hash_object, "\n");
>>> +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
>>> +			}
>>> +			strbuf_release(&sb_hash_object);
>>
>> It would probably be shorter, less error-prone, and quicker to use
>> `index_fd()` directly.
>>
>> BTW I am not quite sure that this code does the correct thing in case of a
>> symlink: it hashes the contents of the symlink target (if it is a file,
>> otherwise it errors out). But that is hardly an issue introduced by the
>> conversion, that's just copied from `git-submodule.sh`.
>>
>>> +		} else {
>>> +			if (p->mod_dst)
>>> +				die(_("unexpected mode %d\n"), p->mod_dst);
>>
>> Hmm. This does not match what the shell script version does:
>>
>>                         *)
>>                                 # unexpected type
>>                                 eval_gettextln "unexpected mode \$mod_dst" >&2
>>                                 continue ;;
>>
>> I think we should also just write the message to `stderr` and continue,
>> not `die()`.
>>
>> In addition to that, I am missing the C code for this case:
>>
>>                         000000)
>>                                 ;; # removed
>>
>> It is quite possible that our test suite does not cover this case (or did
>> the test suite fail for you?). If that is indeed the case, it would be
>> really good to add a test case as part of this patch series, to gain
>> confidence in the correctness of the conversion.
> 
> The tests passed for me actually. Whether this is covered by the test
> cases, I am not sure. I will have to check it.
> 
>>> +		}
>>> +	}
>>> +
>>> +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
>>
>> I have to admit that I am not loving the name `sm_git_dir_sb`. Why not
>> `submodule_git_dir`? I guess you copied it from elsewhere in
>> `submodule--helper.c`...
>>
>>> +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
>>> +		is_sm_git_dir = 1;
>>
>> So here, we verify whether there is a repository at `p->sm_path`. I don't
>> see that in the shell script version:
>>
>>                 missing_src=
>>                 missing_dst=
>>
>>                 test $mod_src = 160000 &&
>>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
>>                 missing_src=t
>>
>>                 test $mod_dst = 160000 &&
>>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
>>                 missing_dst=t
>>
>> Let's read a bit further.
>>
>>> +
>>> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
>>> +		missing_src = verify_submodule_object_name(p->sm_path,
>>> +							   oid_to_hex(&p->oid_src));
>>
>> Ah, and `verify_submodule_object_name()` uses `p->sm_path` as working
>> directory. But that's not what the shell script version did: it specified
>> the `GIT_DIR` explicitly.
>>
>> And by using the `prepare_submodule_repo_env()` function in
>> `verify_submodule_object_name()`, we specify `GIT_DIR` implicitly, as I
>> pointed out in my comment on that function.
> 
> Oh so you're saying that it will be better to call
> 'prepare_submodule_repo_env()' on some variable since we explicitly want to
> store the path to GIT_DIR?
> 

We already call 'prepare_submodule_repo_env' in
'verify_submodule_object_name'. So, he's likely saying that
'is_sm_git_dir' is unnecessary here.

> It would be of much help if you could explain this part just a little
> more (for my own sake).
> 
>> So I think that `is_sm_git_dir` might be
> 

... unnecessary.

> I think you missed something here...
> 

That's likely what he meant based on what is mentioned above.

>>> +
>>> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
>>> +		missing_dst = verify_submodule_object_name(p->sm_path,
>>> +							   oid_to_hex(&p->oid_dst));
>>> +
>>> +	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
>>> +
>>> +	if (!missing_dst && !missing_src) {
>>> +		if (is_sm_git_dir) {
>>> +			struct child_process cp_rev_list = CHILD_PROCESS_INIT;
>>> +			struct strbuf sb_rev_list = STRBUF_INIT;
>>> +			char *range;
>>> +
>>> +			if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
>>> +				range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
>>> +						oid_to_hex(&p->oid_dst));
>>> +			else if (S_ISGITLINK(p->mod_src))
>>> +				range = xstrdup(oid_to_hex(&p->oid_src));
>>> +			else
>>> +				range = xstrdup(oid_to_hex(&p->oid_dst));
>>> +
>>> +			cp_rev_list.git_cmd = 1;
>>> +			cp_rev_list.dir = p->sm_path;
>>> +			prepare_submodule_repo_env(&cp_rev_list.env_array);
>>
>> Again, due to setting the working directory to `p->sm_path` and
>> (implicitly, via `prepare_submodule_repo_env()`) `GIT_DIR` to `.git`, I do
>> not think that we have to guard this block beind `is_sm_git_dir`.
> 
>>> +
>>> +			argv_array_pushl(&cp_rev_list.args, "rev-list",
>>> +					 "--first-parent", range, "--", NULL);
>>
>> Since `argv_array_push()` duplicates the strings, anyway, we can totally
>> avoid the need for the `range` variable:
>>
>> 			if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
>> 				argv_array_pushf(&cp_rev_list.args, "%s...%s",
>> 						 oid_to_hex(&p->oid_src),
>> 						 oid_to_hex(&p->oid_dst));
>> 			else
>> 				argv_array_push(&cp_rev_list.args, IS_GITLINK(p->mod_src) ?
>> 						oid_to_hex(&p->oid_src) :
>> 						oid_to_hex(&p->oid_dst));
>>
>>> +			if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
>>> +				if (sb_rev_list.len)
>>> +					total_commits = count_lines(sb_rev_list.buf,
>>> +								    sb_rev_list.len);
>>
>> That's actually not necessary. `git rev-list --count` will give you a nice
>> number, no need to capture a potentially large amount of memory only to
>> count the lines.
>>
>> This may also make the patch obsolete that makes `count_lines()` public.
> 
> Therefore we eliminate count_lines() from here and instead do a 'git
> rev-list --count'?
> 

Yes.

>>> +				else
>>> +					total_commits = 0;
>>> +			}
>>
>>> +
>>> +			free(range);
>>> +			strbuf_release(&sb_rev_list);
>>> +		}
>>> +	} else {
>>> +		errmsg = 1;
>>> +	}
>>
>> I am missing the equivalent for these lines here:
>>
>>                 t,)
>>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
>>                         ;;
>>                 ,t)
>>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
>>                         ;;
>>                 t,t)
>>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$ ha1_dst")"
>>                         ;;
> 
> I will add them.
> 

I think they're already there in the 'print_submodule_summary' function
above.

>> I am not quite sure whether it is a good idea to leave it to the
>> `print_submodule_summary()` function to generate the `errmsg`. I think I'd
>> rather have it a `char *` than an `int`.
> 
> Would it be better to add these error messages in
> 'prepare_submodule_summary()'?

No. He's likely saying that instead of setting `errmsg` to 1 and
constructing the error message in `print_submodule_summary` we should
be having the error messages in `generate_submodule_summary` and pass
them to `print_submodule_summary` instead of passing an int.

> If we have error messages as integers
> then we will simply
> 

You missed something here. ;)

>>> +
>>> +	print_submodule_summary(info, errmsg, total_commits,
>>> +				missing_src, missing_dst,
>>> +		      		displaypath, is_sm_git_dir, p);
>>> +
>>> +	free(displaypath);
>>> +	strbuf_release(&sm_git_dir_sb);
>>> +}
>>> +
>>> +static void prepare_submodule_summary(struct summary_cb *info,
>>> +				      struct module_cb_list *list)
>>> +{
>>> +	int i;
>>> +	for (i = 0; i < list->nr; i++) {
>>> +		struct module_cb *p = list->entries[i];
>>> +		struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
>>> +
>>> +		if (p->status == 'D' || p->status == 'T') {
>>> +			generate_submodule_summary(info, p);
>>> +			continue;
>>> +		}
>>> +
>>> +		if (info->for_status) {
>>> +			char *config_key;
>>
>> Since the `config_key` is only used within the `if()` block it would be
>> better to declare it within that block.
>>
>>> +			const char *ignore_config = "none";
>>
>> Since the only value we ever care about is "all", how about turning this
>> into an `int`, setting it to `0` here, and later assigning it to
>> `!strcmp(value, "all")` and `!strcmp(sub->ignore, "all")`, respectively?
> 
> Alright will do!
> 
>> I mean, I get it. Unix shell scripts are all about passing around text.
>> And it is much easier to just translate that to C faithfully. But that
>> does not make it good C code. C has data types, and proper C code makes
>> use of that.
>>
>>> +			const char *value;
>>
>> If you want to save on lines, you can cuddle this together with other
>> declarations of the same type. Even so, it could be scoped more narrowly.
>>
>>> +			const struct submodule *sub = submodule_from_path(the_repository,
>>> +									  &null_oid,
>>> +									  p->sm_path);
>>> +
>>> +			if (sub && p->status != 'A') {
>>
>> Good. The shell script version _always_ retrieved the `.ignore` config
>> value, even if the `status` is `A`. Your version is much better.
>>
>> But why bother calling `submodule_from_path()` if the status is `A`?
> 
> What exactly does a status of 'A' or 'T' mean? I mean I know what we are
> doing but what exactly do these translate into?
> 

Its interesting you understood it without knowing what 'A' and 'T'
meant. Anyways, if you take a look at the documentation of
'diff-index'[1] which provides us the `status` you'll know that:
> 
> Possible status letters are:
> 
>     A: addition of a file
> 
>     C: copy of a file into a new one
> 
>     D: deletion of a file
> 
>     M: modification of the contents or mode of a file
> 
>     R: renaming of a file
> 
>     T: change in the type of the file
> 
>     U: file is unmerged (you must complete the merge before it can be committed)
> 
>     X: "unknown" change type (most probably a bug, please report it)
> 

[1]: https://git-scm.com/docs/git-diff-index

>> I could actually see the `const struct submodule *sub;` declaration be
>> pulled out of this scope, and combining the `if (info->for_status &&
>> p->status != 'A'), and the moving the assignment of `sub` into the `else
>> if ((sub = submodule_from_path(r, &null_oid, p->sm_path)) &&
>> sub->ignore)`.
>>
>> That would save us one entire indentation level.
> 
> That seems a good approach! I will try this out.
> 
>>> +				config_key = xstrfmt("submodule.%s.ignore",
>>> +						     sub->name);
>>> +				if (!git_config_get_string_const(config_key, &value))
>>> +					ignore_config = value;
>>> +				else if (sub->ignore)
>>> +					ignore_config = sub->ignore;
>>> +
>>> +				free(config_key);
>>> +				if (!strcmp(ignore_config, "all"))
>>> +					continue;
>>> +			}
>>> +		}
>>> +
>>> +		/* Also show added or modified modules which are checked out */
>>> +		cp_rev_parse.dir = p->sm_path;
>>> +		cp_rev_parse.git_cmd = 1;
>>> +		cp_rev_parse.no_stderr = 1;
>>> +		cp_rev_parse.no_stdout = 1;
>>> +
>>> +		argv_array_pushl(&cp_rev_parse.args, "rev-parse",
>>> +				 "--git-dir", NULL);
>>> +
>>> +		if (!run_command(&cp_rev_parse))
>>
>> I wonder whether we really need to waste an entire spawned process on
>> figuring out whether `p->sm_path` refers to an active repository. Wouldn't
>> `is_submodule_active(r, p->sm_path)` fulfill the same purpose?
> 
> Yep! This is correct. I will change.
> 
>>> +			generate_submodule_summary(info, p);
>>> +	}
>>> +}
>>> +
>>> +static void submodule_summary_callback(struct diff_queue_struct *q,
>>> +				       struct diff_options *options,
>>> +				       void *data)
>>> +{
>>> +	int i;
>>> +	struct module_cb_list *list = data;
>>> +	for (i = 0; i < q->nr; i++) {
>>> +		struct diff_filepair *p = q->queue[i];
>>> +		struct module_cb *temp;
>>> +
>>> +		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
>>> +			continue;
>>> +		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
>>> +		temp->mod_src = p->one->mode;
>>> +		temp->mod_dst = p->two->mode;
>>> +		temp->oid_src = p->one->oid;
>>> +		temp->oid_dst = p->two->oid;
>>> +		temp->status = p->status;
>>> +		temp->sm_path = xstrdup(p->one->path);
>>> +
>>> +		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
>>> +		list->entries[list->nr++] = temp;
>>> +	}
>>> +}
>>> +
>>> +static const char *get_diff_cmd(enum diff_cmd diff_cmd)
>>> +{
>>> +	switch (diff_cmd) {
>>> +	case DIFF_INDEX: return "diff-index";
>>> +	case DIFF_FILES: return "diff-files";
>>> +	default: BUG("bad diff_cmd value %d", diff_cmd);
>>> +	}
>>> +}
>>> +
>>> +static int compute_summary_module_list(char *head,
>>> +				         struct summary_cb *info,
>>> +				         enum diff_cmd diff_cmd)
>>> +{
>>> +	struct argv_array diff_args = ARGV_ARRAY_INIT;
>>> +	struct rev_info rev;
>>> +	struct module_cb_list list = MODULE_CB_LIST_INIT;
>>> +
>>> +	argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
>>> +	if (info->cached)
>>> +		argv_array_push(&diff_args, "--cached");
>>> +	argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
>>> +			 NULL);
>>> +	if (head)
>>> +		argv_array_push(&diff_args, head);
>>> +	argv_array_push(&diff_args, "--");
>>> +	if (info->argc)
>>> +		argv_array_pushv(&diff_args, info->argv);
>>> +
>>> +	git_config(git_diff_basic_config, NULL);
>>> +	init_revisions(&rev, info->prefix);
>>> +	rev.abbrev = 0;
>>> +	precompose_argv(diff_args.argc, diff_args.argv);
>>> +
>>> +	diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
>>> +					 &rev, NULL);
>>> +	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
>>> +	rev.diffopt.format_callback = submodule_summary_callback;
>>> +	rev.diffopt.format_callback_data = &list;
>>> +
>>> +	if (!info->cached) {
>>> +		if (diff_cmd ==  DIFF_INDEX)
>>
>> Please substitute the double-space by a single one.
> 
> Will do!
> 
>>> +			setup_work_tree();
>>> +		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
>>> +			perror("read_cache_preload");
>>> +			return -1;
>>> +		}
>>> +	} else if (read_cache() < 0) {
>>> +		perror("read_cache");
>>> +		return -1;
>>> +	}
>>> +
>>> +	if (diff_cmd == DIFF_INDEX)
>>> +		run_diff_index(&rev, info->cached);
>>> +	else
>>> +		run_diff_files(&rev, 0);
>>> +	prepare_submodule_summary(info, &list);
>>> +	return 0;
>>> +}
>>> +
>>> +static int module_summary(int argc, const char **argv, const char *prefix)
>>> +{
>>> +	struct summary_cb info = SUMMARY_CB_INIT;
>>> +	int cached = 0;
>>> +	int for_status = 0;
>>> +	int quiet = 0;
>>> +	int files = 0;
>>> +	int summary_limit = -1;
>>> +	struct child_process cp_rev = CHILD_PROCESS_INIT;
>>> +	struct strbuf sb = STRBUF_INIT;
>>> +	enum diff_cmd diff_cmd = DIFF_INDEX;
>>> +	int ret;
>>> +
>>> +	struct option module_summary_options[] = {
>>> +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
>>> +		OPT_BOOL(0, "cached", &cached,
>>> +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
>>> +		OPT_BOOL(0, "files", &files,
>>> +			 N_("To compare the commit in the index with that in the submodule HEAD")),
>>> +		OPT_BOOL(0, "for-status", &for_status,
>>> +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
>>> +		OPT_INTEGER('n', "summary-limit", &summary_limit,
>>> +			     N_("Limit the summary size")),
>>> +		OPT_END()
>>> +	};
>>> +
>>> +	const char *const git_submodule_helper_usage[] = {
>>> +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
>>> +		NULL
>>> +	};
>>> +
>>> +	argc = parse_options(argc, argv, prefix, module_summary_options,
>>> +			     git_submodule_helper_usage, 0);
>>> +
>>> +	if (!summary_limit)
>>> +		return 0;
>>> +
>>> +	cp_rev.git_cmd = 1;
>>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
>>> +			 argc ? argv[0] : "HEAD", NULL);
>>
>> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> 
> Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> will save us a ton of processes?
> 
> But I think we do need to capture the output of 'git rev-parse --verify
> ....' so I think it will backfire to use 'get_oid()' or am I just being
> too dumb and not catching on something?
> 

I'll leave this for others to answer.

>>> +
>>> +	if (!capture_command(&cp_rev, &sb, 0)) {
>>> +		strbuf_strip_suffix(&sb, "\n");
>>> +		if (argc) {
>>> +			argv++;
>>> +			argc--;
>>> +		}
>>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
>>> +		/* before the first commit: compare with an empty tree */
>>> +		struct stat st;
>>> +		struct object_id oid;
>>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
>>> +						  prefix, 3))
>>> +			die("Unable to add %s to database", oid.hash);
>>
>> Umm. The original reads:
>>
>>                 # before the first commit: compare with an empty tree
>>                 head=$(git hash-object -w -t tree --stdin </dev/null)
>>
>> It does not actually read from `stdin`. It reads from `/dev/null`,
>> redirected to the input. And what it _actually_ does is to generate the
>> OID of the empty tree.
>>
>> But we already _have_ the OID of the empty tree! It's
>> `the_hash_algo->empty_tree`.
> 
> I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> 
>> I hope that this is covered by the test suite. Please check that. The test
>> would succeed with your version, but only because tests are run with
>> `stdin` redirected from `/dev/null` by default.
> 
> I guess yes. My work passed because the tests are written this way.
> 
>>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
>>> +		if (argc) {
>>> +			argv++;
>>> +			argc--;
>>> +		}
>>> +	} else {
>>> +		strbuf_addstr(&sb, "HEAD");
>>> +	}
>>
>> The conversion to C would make for a fine excuse to simplify the logic.
> 
> This was kind of like the 'shift' in shell. What equivalent do you
> suggest?
> 

I think that's just a general comment after the other comments found
just above about simplifying things.

>>> +	if (files) {
>>> +		if (cached)
>>> +			die(_("--cached and --files are mutually exclusive"));
>>> +		diff_cmd = DIFF_FILES;
>>> +	}
>>> +
>>> +	info.argc = argc;
>>> +	info.argv = argv;
>>> +	info.prefix = prefix;
>>> +	info.cached = !!cached;
>>> +	info.for_status = !!for_status;
>>> +	info.quiet = quiet;
>>> +	info.files = files;
>>> +	info.summary_limit = summary_limit;
>>> +
>>> +	ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
>>> +					   &info, diff_cmd);
>>
>> It would be better to pass the OID as `struct object_id *`, not as string.
> 
> Will do!
> 
>> Other than that, this patch nicely follows previous conversions from Unix
>> shell scripts to C.
>>
>> Well done,
>> Johannes
> 
> Thank you! It was a highly detailed review! I am still learning tons of
> stuff about Git's code and such a review does help a lot! :)
> 

Hope this helps,
Sivaraam

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-06  9:16       ` Kaartic Sivaraam
@ 2020-07-06 11:15         ` Shourya Shukla
  2020-07-12  0:46           ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-06 11:15 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: Johannes.Schindelin, git, christian.couder, gitster, liu.denton,
	pc44800, stefanbeller

On 06/07 02:46, Kaartic Sivaraam wrote:
> Note: I've added some comment but I've not been able to address all the
> parts due to lack of time. I'm sending it sooner hoping it would be
> useful.

Sure no problem

> On 05-07-2020 23:04, Shourya Shukla wrote:
> [...]
> >> [exchanging Stefan Beller's dysfunct @google address for their private
> >> one; I encourage you to do the same in the next iteration, probably
> >> by editing the `Mentored-by:` line.]
> > 
> > I think you missed to mention it.
> >
> 
> If you're looking for the private e-mail. The exachange was already done
> and it was right there in the Cc list of the mail sent by Dscho. I've
> added it again as you seem to have removed it.

Oh I did not catch that actually. Thanks!

> >> On Fri, 3 Jul 2020, Shourya Shukla wrote:
> >>
> >>> From: Prathamesh Chavan <pc44800@gmail.com>
> >>>
> >>> The submodule subcommand 'summary' is ported in the process of
> >>> making git-submodule a builtin. The function cmd_summary() from
> >>> git-submodule.sh is ported to functions module_summary(),
> >>> compute_summary_module_list(), prepare_submodule_summary() and
> >>> generate_submodule_summary(), print_submodule_summary().
> >>>
> >>> The first function module_summary() parses the options of submodule
> >>> subcommand and also acts as the front-end of this subcommand.
> >>> After parsing them, it calls the compute_summary_module_list()
> >>
> >> Missing full-stop, and probably the sentence also wanted to say "function"
> >> at the end.
> > 
> > I will correct. Thanks for pointing out!
> > 
> >>> The functions compute_summary_module_list() runs the diff_cmd,
> >>> and generates the modules list, as required by the subcommand.
> >>> The generation of this module list is done by the using the
> >>
> >> s/the using/using/
> > 
> > Will amend!
> > 
> >>> callback function submodule_summary_callback(), and stored in the
> >>> structure module_cb.
> >>
> >> This explains nicely what the patch does. But the commit message should
> >> not really repeat what can be readily deduced from the patch; It should
> >> focus on the motivation and on interesting background information that is
> >> _not_ readily deduced from the patch.
> > 
> > I understand. I will follow your suggestions regarding my patch.
> > 
> >> For example, I see that `$diff_cmd` is called twice in the shell script
> >> version, once to "get modified modules cared by user" and then _again_,
> >> with that list of modified modules.
> >>
> >> I would have liked to see a reasoning in the commit message that explains
> >> why this has to be so in the C version. I get why it is complicated in a
> >> shell script (which lacks proper objects, after all), but I would have
> >> expected the C version to be able to accumulate the information with a
> >> single pass.
> >>
> >> (Before writing the following paragraph, I actually reviewed the patch
> >> from bottom to top, in the caller->callee direction.)
> >>
> >> Ah. I see that this indeed is the case: there is only one pass in the C
> >> version. That's a useful piece of metadata for the commit message, I
> >> think, much more useful than describing the call tree of the functions.
> > 
> > Yup that it worth mentioning.
> > 
> >> Another thing worth mentioning in the commit message is that we use the
> >> combination of setting a child_process' working directory to the submodule
> >> path and then calling `prepare_submodule_repo_env()` which also sets
> >> `GIT_DIR` to `.git`, so that we can be certain that those spawned
> >> processes will not access the superproject's ODB by mistake.
> >>
> >> When reading my suggestions, please keep in mind that I reviewed the
> >> functions in caller->callee order, i.e. I started at the end of the patch
> >> and then worked my way up.
> >>
> >> All in all, I like the function structure, but I think there is still a
> >> bit room for improvement in a v2.
> > 
> >>> +static int verify_submodule_object_name(const char *sm_path,
> >>> +					  const char *sha1)
> >>> +{
> >>> +	struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> >>> + > > +	cp_rev_parse.git_cmd = 1;
> >>> +	cp_rev_parse.no_stdout = 1;
> >>> +	cp_rev_parse.dir = sm_path;
> >>
> >> So here we specify `sm_path` as current working directory.
> >>
> >>> +	prepare_submodule_repo_env(&cp_rev_parse.env_array);
> >>
> >> And this implicitly sets `GIT_DIR=.git`. Good.
> >>
> >>> +	argv_array_pushl(&cp_rev_parse.args, "rev-parse", "-q",
> >>> +			 "--verify", NULL);
> >>> +	argv_array_pushf(&cp_rev_parse.args, "%s^0", sha1);
> >>
> >> After this, we should also append `--` to make sure that we're not parsing
> >> this as a file name.
> > 
> > Will do!
> > 
> >> Two comments about naming: `sha1` is pretty misleading here, as we do not
> >> require it to be a SHA-1 (especially in the future in which we switch to
> >> SHA-256). Besides, what we're really asking for (via that `^0` suffix) is
> >> a committish. Therefore, I would propose to use `committish` both in the
> >> parameter name as well as the function name.
> > 
> > I am not aware of this change. I will take this suggestion into account.
> > 
> >>> +
> >>> +	if (run_command(&cp_rev_parse))
> >>> +		return 1;
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static void print_submodule_summary(struct summary_cb *info, int errmsg,
> >>> +				      int total_commits, int missing_src,
> >>> +				      int missing_dst, const char *displaypath,
> >>> +				      int is_sm_git_dir, struct module_cb *p)
> >>> +{
> >>> +	if (p->status == 'T') {
> >>> +		if (S_ISGITLINK(p->mod_dst))
> >>> +			printf(_("* %s %s(blob)->%s(submodule)"),
> >>> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
> >>
> >> The shell script version does this:
> >>
> >>                 sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null ||
> >>                         echo $sha1_src | cut -c1-7)
> >>
> >> That is not quite the same, as it looks for the abbreviation _in the
> >> submodule_, not in the current project. So I think `find_unique_abbrev()`
> >> is not correct here.
> >>
> >> The funny thing is that we _already_ will have called `git rev-parse
> >> --verify` for both `p->oid_src` and `p->oid_dst` in the submodule, in the
> >> caller of this function! And while we throw away the result, and while we
> >> do not pass `--short`, there is no reason why we shouldn't be able to do
> >> precisely that.
> > 
> > Okay so you are saying that there is no need of a 'find_unique_abbrev()'
> > since we would be easily able to obtain these values from the caller of
> > 'print_submodule_summary()' right?
> 
> Yes. 'generate_submodule_summary' already does a rev-parse on
> p->oid_src and p->oid_dst via 'verify_submodule_object_name'.
> We should be able to get the short version of them by passing '--short'
> to rev-parse there and make it return the short SHA1. We can then use it
> like how Dscho mentions below.

Okay. That seems good. Yes, it will be cleaner to pass them as
arguments.

> > Maybe we can pass 'oid_src' or 'oid_dst' as an argument?
> > 
> >>> +				 find_unique_abbrev(&p->oid_dst, 7));
> >>> +		else
> >>> +			printf(_("* %s %s(submodule)->%s(blob)"),
> >>> +				 displaypath, find_unique_abbrev(&p->oid_src, 7),
> >>> +				 find_unique_abbrev(&p->oid_dst, 7));
> >>> +	} else {
> >>> +		printf("* %s %s...%s",
> >>> +			displaypath, find_unique_abbrev(&p->oid_src, 7),
> >>> +			find_unique_abbrev(&p->oid_dst, 7));
> >>> +	}
> >>> +
> >>> +	if (total_commits < 0)
> >>> +		printf(":\n");
> >>> +	else
> >>> +		printf(" (%d):\n", total_commits);
> >>> +
> >>> +	if (errmsg) {
> >>> +		/*
> >>> +		 * Don't give error msg for modification whose dst is not
> >>> +		 * submodule, i.e. deleted or changed to blob
> >>> +		 */
> >>> +		if (S_ISGITLINK(p->mod_src)) {
> >>> +			if (missing_src && missing_dst) {
> >>> +				printf(_("  Warn: %s doesn't contain commits %s and %s\n"),
> >>> +				       displaypath, oid_to_hex(&p->oid_src),
> >>> +				       oid_to_hex(&p->oid_dst));
> >>> +			} else if (missing_src) {
> >>> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> >>> +				       displaypath, oid_to_hex(&p->oid_src));
> >>> +			} else {
> >>> +				printf(_("  Warn: %s doesn't contain commit %s\n"),
> >>> +				       displaypath, oid_to_hex(&p->oid_dst));
> >>> +			}
> >>> +		}
> >>> +	} else if (is_sm_git_dir) {
> >>> +		struct child_process cp_log = CHILD_PROCESS_INIT;
> >>> +
> >>> +		cp_log.git_cmd = 1;
> >>> +		cp_log.dir = p->sm_path;
> >>> +		prepare_submodule_repo_env(&cp_log.env_array);
> >>
> >> Since the working directory is set to the top-level directory of the
> >> submodule, and since `prepare_submodule_repo_env()` sets `GIT_DIR` to
> >> `.git`, I think that the `is_sm_git_dir` condition is unnecessary. In
> >> fact, the entire `is_sm_git_dir` parameter (and local variable in the
> >> caller, see more on that below) can go away.
> > 
> > Because we already set the $GIT_DIR to .git/ so an extra check will not
> > be necessary right?
> > 
> 
> Yes. If we remove that check and we get a p->sm_path that does not point
> to a submodule, I wonder what would happen if we run 'run_command'
> on it. I'm also not sure if that's a possible case. Something to
> explore.

This might take some time but would be a huge change for this patch if
successful.

> >>> +		argv_array_pushl(&cp_log.args, "log", NULL);
> >>> +
> >>> +		if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst)) {
> >>> +			if (info->summary_limit > 0)
> >>> +				argv_array_pushf(&cp_log.args, "-%d",
> >>> +						 info->summary_limit);
> >>> +
> >>> +			argv_array_pushl(&cp_log.args, "--pretty=  %m %s",
> >>> +					 "--first-parent", NULL);
> >>> +			argv_array_pushf(&cp_log.args, "%s...%s",
> >>> +					 oid_to_hex(&p->oid_src),
> >>> +					 oid_to_hex(&p->oid_dst));
> >>> +		} else if (S_ISGITLINK(p->mod_dst)) {
> >>> +			argv_array_pushl(&cp_log.args, "--pretty=  > %s",
> >>> +					 "-1", oid_to_hex(&p->oid_dst), NULL);
> >>> +		} else {
> >>> +			argv_array_pushl(&cp_log.args, "--pretty=  < %s",
> >>> +					 "-1", oid_to_hex(&p->oid_src), NULL);
> >>> +		}
> >>> +		run_command(&cp_log);
> >>> +	}
> >>> +	printf("\n");
> >>> +}
> >>
> >> It looks as if there is a whole lot of `oid_to_hex(&p->oid_src)` in that
> >> function. Together with the realization that we need the abbreviated
> >> version of that at least in one place, and the other realization that we
> >> already call `rev-parse --verify` for both `oid_src` and `oid_dst` in the
> >> caller of this function, it seems to suggest itself that we would actually
> >> want to pass the `--short` option, too, and to capture the output, and
> >> pass it down to `print_submodule_summary()` _instead of_ `missing_src` and
> >> `missing_dst` (e.g. as `src_abbrev` and `dst_abbrev`).
> > 
> > Oh you have mentioned it here too. This seems quite a good approach. I
> > will adopt this.
> > 
> >>> +
> >>> +static void generate_submodule_summary(struct summary_cb *info,
> >>> +				       struct module_cb *p)
> >>> +{
> >>> +	int missing_src = 0;
> >>> +	int missing_dst = 0;
> >>> +	char *displaypath;
> >>> +	int errmsg = 0;
> >>> +	int total_commits = -1;
> >>> +	int is_sm_git_dir = 0;
> >>> +	struct strbuf sm_git_dir_sb = STRBUF_INIT;
> >>> +
> >>> +	if (!info->cached && oideq(&p->oid_dst, &null_oid)) {
> >>> +		if (S_ISGITLINK(p->mod_dst)) {
> >>> +			/*
> >>> +			 * NEEDSWORK: avoid using separate process with
> >>> +			 * the help of the function head_ref_submodule()
> >>
> >> I don't quite understand this comment. There is no `head_ref_submodule()`
> >> function.
> >>
> 
> That NEEDSWORK was added based on Stefan's comment on a previous version
> of Prathamesh's patch. Here it is for reference:
> 
> >> +       if (!info->cached && !oidcmp(&p->oid_dst, &null_oid)) {
> >> +               if (S_ISGITLINK(p->mod_dst)) {
> >> +                       struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> >> +                       struct strbuf sb_rev_parse = STRBUF_INIT;
> >> +
> >> +                       cp_rev_parse.git_cmd = 1;
> >> +                       cp_rev_parse.no_stderr = 1;
> >> +                       cp_rev_parse.dir = p->sm_path;
> >> +                       prepare_submodule_repo_env(&cp_rev_parse.env_array);
> >> +
> >> +                       argv_array_pushl(&cp_rev_parse.args,
> >> +                                        "rev-parse", "HEAD", NULL);
> >> +                       if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> >> +                               strbuf_strip_suffix(&sb_rev_parse, "\n");
> >> +
> >> +                               get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> >> +                       }
> >> +                       strbuf_release(&sb_rev_parse);
> > 
> > I think this could be replaced via
> > head_ref_submodule(sub->path, callback function, &where_to_store)
> > or is there some trickery going on, that this also works on
> > non-compliant submodules?
> > (Maybe add that as a NEEDSWORK/TODO)
> 
> Ref:
> https://public-inbox.org/git/CAGZ79kaWn9z47Va=VW4R2Aswws1N5n2u4Kvatn73s0YnV0pVqQ@mail.gmail.com/
> 
> A quick search reveals that 'head_ref_submodule' existed during that
> period. On further investigation it seems that 'refs_head_ref' was
> introduced in 62f0b399e0 (refs: add refs_head_ref(), 2017-08-23) and
> 'head_ref_submodule' was made to use it. Later on, in 419221c106 (refs:
> remove dead for_each_*_submodule(), 2017-08-23), 'head_ref-submodule'
> was removed with an advice to use the 'refs_' API for accessing
> submodules.
> 
> > +* Use `refs_` API for accessing submodules. The submodule ref store could
> > +  be obtained with `get_submodule_ref_store()`
> 
> How it applies to our code is something to be looked into, yet.
> 
> >>> +			 */
> >>> +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> >>> +			struct strbuf sb_rev_parse = STRBUF_INIT;
> >>> +
> >>> +			cp_rev_parse.git_cmd = 1;
> >>> +			cp_rev_parse.no_stderr = 1;
> >>> +			cp_rev_parse.dir = p->sm_path;
> >>> +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
> >>> +
> >>> +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> >>> +					 "HEAD", NULL);
> >>> +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> >>> +				strbuf_strip_suffix(&sb_rev_parse, "\n");
> >>> +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> >>> +			}
> >>> +			strbuf_release(&sb_rev_parse);
> >>> +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> >>> +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> >>> +			struct strbuf sb_hash_object = STRBUF_INIT;
> >>> +
> >>> +			cp_hash_object.git_cmd = 1;
> >>> +			argv_array_pushl(&cp_hash_object.args, "hash-object",
> >>> +					 p->sm_path, NULL);
> >>> +			if (!capture_command(&cp_hash_object,
> >>> +					     &sb_hash_object, 0)) {
> >>> +				strbuf_strip_suffix(&sb_hash_object, "\n");
> >>> +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> >>> +			}
> >>> +			strbuf_release(&sb_hash_object);
> >>
> >> It would probably be shorter, less error-prone, and quicker to use
> >> `index_fd()` directly.
> >>
> >> BTW I am not quite sure that this code does the correct thing in case of a
> >> symlink: it hashes the contents of the symlink target (if it is a file,
> >> otherwise it errors out). But that is hardly an issue introduced by the
> >> conversion, that's just copied from `git-submodule.sh`.
> >>
> >>> +		} else {
> >>> +			if (p->mod_dst)
> >>> +				die(_("unexpected mode %d\n"), p->mod_dst);
> >>
> >> Hmm. This does not match what the shell script version does:
> >>
> >>                         *)
> >>                                 # unexpected type
> >>                                 eval_gettextln "unexpected mode \$mod_dst" >&2
> >>                                 continue ;;
> >>
> >> I think we should also just write the message to `stderr` and continue,
> >> not `die()`.
> >>
> >> In addition to that, I am missing the C code for this case:
> >>
> >>                         000000)
> >>                                 ;; # removed
> >>
> >> It is quite possible that our test suite does not cover this case (or did
> >> the test suite fail for you?). If that is indeed the case, it would be
> >> really good to add a test case as part of this patch series, to gain
> >> confidence in the correctness of the conversion.
> > 
> > The tests passed for me actually. Whether this is covered by the test
> > cases, I am not sure. I will have to check it.
> > 
> >>> +		}
> >>> +	}
> >>> +
> >>> +	strbuf_addstr(&sm_git_dir_sb, p->sm_path);
> >>
> >> I have to admit that I am not loving the name `sm_git_dir_sb`. Why not
> >> `submodule_git_dir`? I guess you copied it from elsewhere in
> >> `submodule--helper.c`...
> >>
> >>> +	if (is_nonbare_repository_dir(&sm_git_dir_sb))
> >>> +		is_sm_git_dir = 1;
> >>
> >> So here, we verify whether there is a repository at `p->sm_path`. I don't
> >> see that in the shell script version:
> >>
> >>                 missing_src=
> >>                 missing_dst=
> >>
> >>                 test $mod_src = 160000 &&
> >>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null &&
> >>                 missing_src=t
> >>
> >>                 test $mod_dst = 160000 &&
> >>                 ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null &&
> >>                 missing_dst=t
> >>
> >> Let's read a bit further.
> >>
> >>> +
> >>> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_src))
> >>> +		missing_src = verify_submodule_object_name(p->sm_path,
> >>> +							   oid_to_hex(&p->oid_src));
> >>
> >> Ah, and `verify_submodule_object_name()` uses `p->sm_path` as working
> >> directory. But that's not what the shell script version did: it specified
> >> the `GIT_DIR` explicitly.
> >>
> >> And by using the `prepare_submodule_repo_env()` function in
> >> `verify_submodule_object_name()`, we specify `GIT_DIR` implicitly, as I
> >> pointed out in my comment on that function.
> > 
> > Oh so you're saying that it will be better to call
> > 'prepare_submodule_repo_env()' on some variable since we explicitly want to
> > store the path to GIT_DIR?
> > 
> 
> We already call 'prepare_submodule_repo_env' in
> 'verify_submodule_object_name'. So, he's likely saying that
> 'is_sm_git_dir' is unnecessary here.

Understood.

> > It would be of much help if you could explain this part just a little
> > more (for my own sake).
> > 
> >> So I think that `is_sm_git_dir` might be
> > 
> 
> ... unnecessary.
> 
> > I think you missed something here...
> > 
> 
> That's likely what he meant based on what is mentioned above.

Oh okay. Thanks!

> >>> +
> >>> +	if (is_sm_git_dir && S_ISGITLINK(p->mod_dst))
> >>> +		missing_dst = verify_submodule_object_name(p->sm_path,
> >>> +							   oid_to_hex(&p->oid_dst));
> >>> +
> >>> +	displaypath = get_submodule_displaypath(p->sm_path, info->prefix);
> >>> +
> >>> +	if (!missing_dst && !missing_src) {
> >>> +		if (is_sm_git_dir) {
> >>> +			struct child_process cp_rev_list = CHILD_PROCESS_INIT;
> >>> +			struct strbuf sb_rev_list = STRBUF_INIT;
> >>> +			char *range;
> >>> +
> >>> +			if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
> >>> +				range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
> >>> +						oid_to_hex(&p->oid_dst));
> >>> +			else if (S_ISGITLINK(p->mod_src))
> >>> +				range = xstrdup(oid_to_hex(&p->oid_src));
> >>> +			else
> >>> +				range = xstrdup(oid_to_hex(&p->oid_dst));
> >>> +
> >>> +			cp_rev_list.git_cmd = 1;
> >>> +			cp_rev_list.dir = p->sm_path;
> >>> +			prepare_submodule_repo_env(&cp_rev_list.env_array);
> >>
> >> Again, due to setting the working directory to `p->sm_path` and
> >> (implicitly, via `prepare_submodule_repo_env()`) `GIT_DIR` to `.git`, I do
> >> not think that we have to guard this block beind `is_sm_git_dir`.
> > 
> >>> +
> >>> +			argv_array_pushl(&cp_rev_list.args, "rev-list",
> >>> +					 "--first-parent", range, "--", NULL);
> >>
> >> Since `argv_array_push()` duplicates the strings, anyway, we can totally
> >> avoid the need for the `range` variable:
> >>
> >> 			if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
> >> 				argv_array_pushf(&cp_rev_list.args, "%s...%s",
> >> 						 oid_to_hex(&p->oid_src),
> >> 						 oid_to_hex(&p->oid_dst));
> >> 			else
> >> 				argv_array_push(&cp_rev_list.args, IS_GITLINK(p->mod_src) ?
> >> 						oid_to_hex(&p->oid_src) :
> >> 						oid_to_hex(&p->oid_dst));
> >>
> >>> +			if (!capture_command(&cp_rev_list, &sb_rev_list, 0)) {
> >>> +				if (sb_rev_list.len)
> >>> +					total_commits = count_lines(sb_rev_list.buf,
> >>> +								    sb_rev_list.len);
> >>
> >> That's actually not necessary. `git rev-list --count` will give you a nice
> >> number, no need to capture a potentially large amount of memory only to
> >> count the lines.
> >>
> >> This may also make the patch obsolete that makes `count_lines()` public.
> > 
> > Therefore we eliminate count_lines() from here and instead do a 'git
> > rev-list --count'?
> > 
> 
> Yes.
> 
> >>> +				else
> >>> +					total_commits = 0;
> >>> +			}
> >>
> >>> +
> >>> +			free(range);
> >>> +			strbuf_release(&sb_rev_list);
> >>> +		}
> >>> +	} else {
> >>> +		errmsg = 1;
> >>> +	}
> >>
> >> I am missing the equivalent for these lines here:
> >>
> >>                 t,)
> >>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_src")"
> >>                         ;;
> >>                 ,t)
> >>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commit \$sha1_dst")"
> >>                         ;;
> >>                 t,t)
> >>                         errmsg="$(eval_gettext "  Warn: \$display_name doesn't contain commits \$sha1_src and \$ ha1_dst")"
> >>                         ;;
> > 
> > I will add them.
> > 
> 
> I think they're already there in the 'print_submodule_summary' function
> above.
> 
> >> I am not quite sure whether it is a good idea to leave it to the
> >> `print_submodule_summary()` function to generate the `errmsg`. I think I'd
> >> rather have it a `char *` than an `int`.
> > 
> > Would it be better to add these error messages in
> > 'prepare_submodule_summary()'?
> 
> No. He's likely saying that instead of setting `errmsg` to 1 and
> constructing the error message in `print_submodule_summary` we should
> be having the error messages in `generate_submodule_summary` and pass
> them to `print_submodule_summary` instead of passing an int.

So this means that we will have to generate the above mentioned messages
in 'generate_submodule_summary()' and then pass them as char into
'print_submodule_summary()' rather than generating them in the latter.
This is what we want right?

> > If we have error messages as integers
> > then we will simply
> > 
> 
> You missed something here. ;)

I actually don't know what I was trying to write here.

> >>> +
> >>> +	print_submodule_summary(info, errmsg, total_commits,
> >>> +				missing_src, missing_dst,
> >>> +		      		displaypath, is_sm_git_dir, p);
> >>> +
> >>> +	free(displaypath);
> >>> +	strbuf_release(&sm_git_dir_sb);
> >>> +}
> >>> +
> >>> +static void prepare_submodule_summary(struct summary_cb *info,
> >>> +				      struct module_cb_list *list)
> >>> +{
> >>> +	int i;
> >>> +	for (i = 0; i < list->nr; i++) {
> >>> +		struct module_cb *p = list->entries[i];
> >>> +		struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> >>> +
> >>> +		if (p->status == 'D' || p->status == 'T') {
> >>> +			generate_submodule_summary(info, p);
> >>> +			continue;
> >>> +		}
> >>> +
> >>> +		if (info->for_status) {
> >>> +			char *config_key;
> >>
> >> Since the `config_key` is only used within the `if()` block it would be
> >> better to declare it within that block.
> >>
> >>> +			const char *ignore_config = "none";
> >>
> >> Since the only value we ever care about is "all", how about turning this
> >> into an `int`, setting it to `0` here, and later assigning it to
> >> `!strcmp(value, "all")` and `!strcmp(sub->ignore, "all")`, respectively?
> > 
> > Alright will do!
> > 
> >> I mean, I get it. Unix shell scripts are all about passing around text.
> >> And it is much easier to just translate that to C faithfully. But that
> >> does not make it good C code. C has data types, and proper C code makes
> >> use of that.
> >>
> >>> +			const char *value;
> >>
> >> If you want to save on lines, you can cuddle this together with other
> >> declarations of the same type. Even so, it could be scoped more narrowly.
> >>
> >>> +			const struct submodule *sub = submodule_from_path(the_repository,
> >>> +									  &null_oid,
> >>> +									  p->sm_path);
> >>> +
> >>> +			if (sub && p->status != 'A') {
> >>
> >> Good. The shell script version _always_ retrieved the `.ignore` config
> >> value, even if the `status` is `A`. Your version is much better.
> >>
> >> But why bother calling `submodule_from_path()` if the status is `A`?
> > 
> > What exactly does a status of 'A' or 'T' mean? I mean I know what we are
> > doing but what exactly do these translate into?
> > 
> 
> Its interesting you understood it without knowing what 'A' and 'T'
> meant. Anyways, if you take a look at the documentation of
> 'diff-index'[1] which provides us the `status` you'll know that:

I just assumed that they must mean something to something and read on
the rest of the code. I meant to ask the full-form of 'A', 'T', etc.
You have provided it below and this makes things even more clear.
Thanks!

> > Possible status letters are:
> > 
> >     A: addition of a file
> > 
> >     C: copy of a file into a new one
> > 
> >     D: deletion of a file
> > 
> >     M: modification of the contents or mode of a file
> > 
> >     R: renaming of a file
> > 
> >     T: change in the type of the file
> > 
> >     U: file is unmerged (you must complete the merge before it can be committed)
> > 
> >     X: "unknown" change type (most probably a bug, please report it)
> > 
> 
> [1]: https://git-scm.com/docs/git-diff-index
> 
> >> I could actually see the `const struct submodule *sub;` declaration be
> >> pulled out of this scope, and combining the `if (info->for_status &&
> >> p->status != 'A'), and the moving the assignment of `sub` into the `else
> >> if ((sub = submodule_from_path(r, &null_oid, p->sm_path)) &&
> >> sub->ignore)`.
> >>
> >> That would save us one entire indentation level.
> > 
> > That seems a good approach! I will try this out.
> > 
> >>> +				config_key = xstrfmt("submodule.%s.ignore",
> >>> +						     sub->name);
> >>> +				if (!git_config_get_string_const(config_key, &value))
> >>> +					ignore_config = value;
> >>> +				else if (sub->ignore)
> >>> +					ignore_config = sub->ignore;
> >>> +
> >>> +				free(config_key);
> >>> +				if (!strcmp(ignore_config, "all"))
> >>> +					continue;
> >>> +			}
> >>> +		}
> >>> +
> >>> +		/* Also show added or modified modules which are checked out */
> >>> +		cp_rev_parse.dir = p->sm_path;
> >>> +		cp_rev_parse.git_cmd = 1;
> >>> +		cp_rev_parse.no_stderr = 1;
> >>> +		cp_rev_parse.no_stdout = 1;
> >>> +
> >>> +		argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> >>> +				 "--git-dir", NULL);
> >>> +
> >>> +		if (!run_command(&cp_rev_parse))
> >>
> >> I wonder whether we really need to waste an entire spawned process on
> >> figuring out whether `p->sm_path` refers to an active repository. Wouldn't
> >> `is_submodule_active(r, p->sm_path)` fulfill the same purpose?
> > 
> > Yep! This is correct. I will change.
> > 
> >>> +			generate_submodule_summary(info, p);
> >>> +	}
> >>> +}
> >>> +
> >>> +static void submodule_summary_callback(struct diff_queue_struct *q,
> >>> +				       struct diff_options *options,
> >>> +				       void *data)
> >>> +{
> >>> +	int i;
> >>> +	struct module_cb_list *list = data;
> >>> +	for (i = 0; i < q->nr; i++) {
> >>> +		struct diff_filepair *p = q->queue[i];
> >>> +		struct module_cb *temp;
> >>> +
> >>> +		if (!S_ISGITLINK(p->one->mode) && !S_ISGITLINK(p->two->mode))
> >>> +			continue;
> >>> +		temp = (struct module_cb*)malloc(sizeof(struct module_cb));
> >>> +		temp->mod_src = p->one->mode;
> >>> +		temp->mod_dst = p->two->mode;
> >>> +		temp->oid_src = p->one->oid;
> >>> +		temp->oid_dst = p->two->oid;
> >>> +		temp->status = p->status;
> >>> +		temp->sm_path = xstrdup(p->one->path);
> >>> +
> >>> +		ALLOC_GROW(list->entries, list->nr + 1, list->alloc);
> >>> +		list->entries[list->nr++] = temp;
> >>> +	}
> >>> +}
> >>> +
> >>> +static const char *get_diff_cmd(enum diff_cmd diff_cmd)
> >>> +{
> >>> +	switch (diff_cmd) {
> >>> +	case DIFF_INDEX: return "diff-index";
> >>> +	case DIFF_FILES: return "diff-files";
> >>> +	default: BUG("bad diff_cmd value %d", diff_cmd);
> >>> +	}
> >>> +}
> >>> +
> >>> +static int compute_summary_module_list(char *head,
> >>> +				         struct summary_cb *info,
> >>> +				         enum diff_cmd diff_cmd)
> >>> +{
> >>> +	struct argv_array diff_args = ARGV_ARRAY_INIT;
> >>> +	struct rev_info rev;
> >>> +	struct module_cb_list list = MODULE_CB_LIST_INIT;
> >>> +
> >>> +	argv_array_push(&diff_args, get_diff_cmd(diff_cmd));
> >>> +	if (info->cached)
> >>> +		argv_array_push(&diff_args, "--cached");
> >>> +	argv_array_pushl(&diff_args, "--ignore-submodules=dirty", "--raw",
> >>> +			 NULL);
> >>> +	if (head)
> >>> +		argv_array_push(&diff_args, head);
> >>> +	argv_array_push(&diff_args, "--");
> >>> +	if (info->argc)
> >>> +		argv_array_pushv(&diff_args, info->argv);
> >>> +
> >>> +	git_config(git_diff_basic_config, NULL);
> >>> +	init_revisions(&rev, info->prefix);
> >>> +	rev.abbrev = 0;
> >>> +	precompose_argv(diff_args.argc, diff_args.argv);
> >>> +
> >>> +	diff_args.argc = setup_revisions(diff_args.argc, diff_args.argv,
> >>> +					 &rev, NULL);
> >>> +	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
> >>> +	rev.diffopt.format_callback = submodule_summary_callback;
> >>> +	rev.diffopt.format_callback_data = &list;
> >>> +
> >>> +	if (!info->cached) {
> >>> +		if (diff_cmd ==  DIFF_INDEX)
> >>
> >> Please substitute the double-space by a single one.
> > 
> > Will do!
> > 
> >>> +			setup_work_tree();
> >>> +		if (read_cache_preload(&rev.diffopt.pathspec) < 0) {
> >>> +			perror("read_cache_preload");
> >>> +			return -1;
> >>> +		}
> >>> +	} else if (read_cache() < 0) {
> >>> +		perror("read_cache");
> >>> +		return -1;
> >>> +	}
> >>> +
> >>> +	if (diff_cmd == DIFF_INDEX)
> >>> +		run_diff_index(&rev, info->cached);
> >>> +	else
> >>> +		run_diff_files(&rev, 0);
> >>> +	prepare_submodule_summary(info, &list);
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static int module_summary(int argc, const char **argv, const char *prefix)
> >>> +{
> >>> +	struct summary_cb info = SUMMARY_CB_INIT;
> >>> +	int cached = 0;
> >>> +	int for_status = 0;
> >>> +	int quiet = 0;
> >>> +	int files = 0;
> >>> +	int summary_limit = -1;
> >>> +	struct child_process cp_rev = CHILD_PROCESS_INIT;
> >>> +	struct strbuf sb = STRBUF_INIT;
> >>> +	enum diff_cmd diff_cmd = DIFF_INDEX;
> >>> +	int ret;
> >>> +
> >>> +	struct option module_summary_options[] = {
> >>> +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> >>> +		OPT_BOOL(0, "cached", &cached,
> >>> +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
> >>> +		OPT_BOOL(0, "files", &files,
> >>> +			 N_("To compare the commit in the index with that in the submodule HEAD")),
> >>> +		OPT_BOOL(0, "for-status", &for_status,
> >>> +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
> >>> +		OPT_INTEGER('n', "summary-limit", &summary_limit,
> >>> +			     N_("Limit the summary size")),
> >>> +		OPT_END()
> >>> +	};
> >>> +
> >>> +	const char *const git_submodule_helper_usage[] = {
> >>> +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> >>> +		NULL
> >>> +	};
> >>> +
> >>> +	argc = parse_options(argc, argv, prefix, module_summary_options,
> >>> +			     git_submodule_helper_usage, 0);
> >>> +
> >>> +	if (!summary_limit)
> >>> +		return 0;
> >>> +
> >>> +	cp_rev.git_cmd = 1;
> >>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> >>> +			 argc ? argv[0] : "HEAD", NULL);
> >>
> >> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> > 
> > Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> > will save us a ton of processes?
> > 
> > But I think we do need to capture the output of 'git rev-parse --verify
> > ....' so I think it will backfire to use 'get_oid()' or am I just being
> > too dumb and not catching on something?
> > 
> 
> I'll leave this for others to answer.

I will resolve this one after Dscho answers then.

> >>> +
> >>> +	if (!capture_command(&cp_rev, &sb, 0)) {
> >>> +		strbuf_strip_suffix(&sb, "\n");
> >>> +		if (argc) {
> >>> +			argv++;
> >>> +			argc--;
> >>> +		}
> >>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> >>> +		/* before the first commit: compare with an empty tree */
> >>> +		struct stat st;
> >>> +		struct object_id oid;
> >>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> >>> +						  prefix, 3))
> >>> +			die("Unable to add %s to database", oid.hash);
> >>
> >> Umm. The original reads:
> >>
> >>                 # before the first commit: compare with an empty tree
> >>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> >>
> >> It does not actually read from `stdin`. It reads from `/dev/null`,
> >> redirected to the input. And what it _actually_ does is to generate the
> >> OID of the empty tree.
> >>
> >> But we already _have_ the OID of the empty tree! It's
> >> `the_hash_algo->empty_tree`.
> > 
> > I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> > 
> >> I hope that this is covered by the test suite. Please check that. The test
> >> would succeed with your version, but only because tests are run with
> >> `stdin` redirected from `/dev/null` by default.
> > 
> > I guess yes. My work passed because the tests are written this way.
> > 
> >>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> >>> +		if (argc) {
> >>> +			argv++;
> >>> +			argc--;
> >>> +		}
> >>> +	} else {
> >>> +		strbuf_addstr(&sb, "HEAD");
> >>> +	}
> >>
> >> The conversion to C would make for a fine excuse to simplify the logic.
> > 
> > This was kind of like the 'shift' in shell. What equivalent do you
> > suggest?
> > 
> 
> I think that's just a general comment after the other comments found
> just above about simplifying things.

Alright. But I do have to simplify the logic right?

> >>> +	if (files) {
> >>> +		if (cached)
> >>> +			die(_("--cached and --files are mutually exclusive"));
> >>> +		diff_cmd = DIFF_FILES;
> >>> +	}
> >>> +
> >>> +	info.argc = argc;
> >>> +	info.argv = argv;
> >>> +	info.prefix = prefix;
> >>> +	info.cached = !!cached;
> >>> +	info.for_status = !!for_status;
> >>> +	info.quiet = quiet;
> >>> +	info.files = files;
> >>> +	info.summary_limit = summary_limit;
> >>> +
> >>> +	ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : sb.buf,
> >>> +					   &info, diff_cmd);
> >>
> >> It would be better to pass the OID as `struct object_id *`, not as string.
> > 
> > Will do!
> > 
> >> Other than that, this patch nicely follows previous conversions from Unix
> >> shell scripts to C.
> >>
> >> Well done,
> >> Johannes
> > 
> > Thank you! It was a highly detailed review! I am still learning tons of
> > stuff about Git's code and such a review does help a lot! :)
> > 

> Hope this helps,

It surely did! Thanks Kaartic :0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-06 11:15         ` Shourya Shukla
@ 2020-07-12  0:46           ` Johannes Schindelin
  2020-07-15 14:53             ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2020-07-12  0:46 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: Kaartic Sivaraam, git, christian.couder, gitster, liu.denton,
	pc44800, stefanbeller

Hi Shourya,

On Mon, 6 Jul 2020, Shourya Shukla wrote:

> On 06/07 02:46, Kaartic Sivaraam wrote:
> > On 05-07-2020 23:04, Shourya Shukla wrote:
> > >> On Fri, 3 Jul 2020, Shourya Shukla wrote:
> > >>
> > >>> +static int module_summary(int argc, const char **argv, const char *prefix)
> > >>> +{
> > >>> +	struct summary_cb info = SUMMARY_CB_INIT;
> > >>> +	int cached = 0;
> > >>> +	int for_status = 0;
> > >>> +	int quiet = 0;
> > >>> +	int files = 0;
> > >>> +	int summary_limit = -1;
> > >>> +	struct child_process cp_rev = CHILD_PROCESS_INIT;
> > >>> +	struct strbuf sb = STRBUF_INIT;
> > >>> +	enum diff_cmd diff_cmd = DIFF_INDEX;
> > >>> +	int ret;
> > >>> +
> > >>> +	struct option module_summary_options[] = {
> > >>> +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> > >>> +		OPT_BOOL(0, "cached", &cached,
> > >>> +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
> > >>> +		OPT_BOOL(0, "files", &files,
> > >>> +			 N_("To compare the commit in the index with that in the submodule HEAD")),
> > >>> +		OPT_BOOL(0, "for-status", &for_status,
> > >>> +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
> > >>> +		OPT_INTEGER('n', "summary-limit", &summary_limit,
> > >>> +			     N_("Limit the summary size")),
> > >>> +		OPT_END()
> > >>> +	};
> > >>> +
> > >>> +	const char *const git_submodule_helper_usage[] = {
> > >>> +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> > >>> +		NULL
> > >>> +	};
> > >>> +
> > >>> +	argc = parse_options(argc, argv, prefix, module_summary_options,
> > >>> +			     git_submodule_helper_usage, 0);
> > >>> +
> > >>> +	if (!summary_limit)
> > >>> +		return 0;
> > >>> +
> > >>> +	cp_rev.git_cmd = 1;
> > >>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > >>> +			 argc ? argv[0] : "HEAD", NULL);
> > >>
> > >> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> > >
> > > Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> > > will save us a ton of processes?
> > >
> > > But I think we do need to capture the output of 'git rev-parse --verify
> > > ....' so I think it will backfire to use 'get_oid()' or am I just being
> > > too dumb and not catching on something?
> > >
> >
> > I'll leave this for others to answer.
>
> I will resolve this one after Dscho answers then.

The `rev-parse` command _would_ essentially call `get_oid()` and print the
result (after converting it to hex, which you don't need, because the
caller actually would parse the hex string anyway).

You can verify my claim by following the code:
https://github.com/git/git/blob/v2.27.0/builtin/rev-parse.c#L956-L982 (it
is slightly harder to follow because the `--verify` option makes sure that
only one rev is shown, which means that the `get_oid_with_context()` call
is separated from the corresponding `show_rev()` call).

So there really is no need to capture the output of that `rev-parse`
command.

It is a different story, of course, when capturing the output of Git
commands _that are run in a submodule_. Those currently still need to be
spawned.

> > >>> +
> > >>> +	if (!capture_command(&cp_rev, &sb, 0)) {
> > >>> +		strbuf_strip_suffix(&sb, "\n");
> > >>> +		if (argc) {
> > >>> +			argv++;
> > >>> +			argc--;
> > >>> +		}
> > >>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> > >>> +		/* before the first commit: compare with an empty tree */
> > >>> +		struct stat st;
> > >>> +		struct object_id oid;
> > >>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > >>> +						  prefix, 3))
> > >>> +			die("Unable to add %s to database", oid.hash);
> > >>
> > >> Umm. The original reads:
> > >>
> > >>                 # before the first commit: compare with an empty tree
> > >>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> > >>
> > >> It does not actually read from `stdin`. It reads from `/dev/null`,
> > >> redirected to the input. And what it _actually_ does is to generate the
> > >> OID of the empty tree.
> > >>
> > >> But we already _have_ the OID of the empty tree! It's
> > >> `the_hash_algo->empty_tree`.
> > >
> > > I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> > >
> > >> I hope that this is covered by the test suite. Please check that. The test
> > >> would succeed with your version, but only because tests are run with
> > >> `stdin` redirected from `/dev/null` by default.
> > >
> > > I guess yes. My work passed because the tests are written this way.
> > >
> > >>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> > >>> +		if (argc) {
> > >>> +			argv++;
> > >>> +			argc--;
> > >>> +		}
> > >>> +	} else {
> > >>> +		strbuf_addstr(&sb, "HEAD");
> > >>> +	}
> > >>
> > >> The conversion to C would make for a fine excuse to simplify the logic.
> > >
> > > This was kind of like the 'shift' in shell. What equivalent do you
> > > suggest?
> > >
> >
> > I think that's just a general comment after the other comments found
> > just above about simplifying things.
>
> Alright. But I do have to simplify the logic right?

You do not _have_ to. But it would make for a good opportunity to do that,
I think, as the code is really hard to follow.

The idea here is actually not at all hard to understand, though: use
the speficied rev (falling back to `HEAD`) to compare to, unless it is a
yet-unborn `HEAD` in which case you compare to the empty tree.

It is very, very similar in spirit to the code in `do_pick_commit()` in
`sequencer.c`:
https://github.com/git/git/blob/v2.27.0/sequencer.c#L1765-L1774

The only difference is that you will also have to fall back to using
`HEAD` if the argument in question turns out not to refer to a revision
but is actually the first pathspec.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-12  0:46           ` Johannes Schindelin
@ 2020-07-15 14:53             ` Shourya Shukla
  2020-07-15 18:41               ` Shourya Shukla
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-15 14:53 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: kaartic.sivaraam, git, christian.couder, gitster, liu.denton,
	pc44800, stefanbeller

Hello Johannes!

On 12/07 02:46, Johannes Schindelin wrote:
> > > >>> +	cp_rev.git_cmd = 1;
> > > >>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > > >>> +			 argc ? argv[0] : "HEAD", NULL);
> > > >>
> > > >> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> > > >
> > > > Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> > > > will save us a ton of processes?
> > > >
> > > > But I think we do need to capture the output of 'git rev-parse --verify
> > > > ....' so I think it will backfire to use 'get_oid()' or am I just being
> > > > too dumb and not catching on something?
> > > >
> > >
> > > I'll leave this for others to answer.
> >
> > I will resolve this one after Dscho answers then.
> 
> The `rev-parse` command _would_ essentially call `get_oid()` and print the
> result (after converting it to hex, which you don't need, because the
> caller actually would parse the hex string anyway).
> 
> You can verify my claim by following the code:
> https://github.com/git/git/blob/v2.27.0/builtin/rev-parse.c#L956-L982 (it
> is slightly harder to follow because the `--verify` option makes sure that
> only one rev is shown, which means that the `get_oid_with_context()` call
> is separated from the corresponding `show_rev()` call).
> 
> So there really is no need to capture the output of that `rev-parse`
> command.

Yep.

> It is a different story, of course, when capturing the output of Git
> commands _that are run in a submodule_. Those currently still need to be
> spawned.

Oh alright, since we don't need to capture the output and rather use the
command for a particular functionality it offers to, in our just fetch
the OID of head, we use a helper function which the command actually
uses to make things happen instead of spawning a process. Is this
correct?

> > > >>> +
> > > >>> +	if (!capture_command(&cp_rev, &sb, 0)) {
> > > >>> +		strbuf_strip_suffix(&sb, "\n");
> > > >>> +		if (argc) {
> > > >>> +			argv++;
> > > >>> +			argc--;
> > > >>> +		}
> > > >>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> > > >>> +		/* before the first commit: compare with an empty tree */
> > > >>> +		struct stat st;
> > > >>> +		struct object_id oid;
> > > >>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > > >>> +						  prefix, 3))
> > > >>> +			die("Unable to add %s to database", oid.hash);
> > > >>
> > > >> Umm. The original reads:
> > > >>
> > > >>                 # before the first commit: compare with an empty tree
> > > >>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> > > >>
> > > >> It does not actually read from `stdin`. It reads from `/dev/null`,
> > > >> redirected to the input. And what it _actually_ does is to generate the
> > > >> OID of the empty tree.
> > > >>
> > > >> But we already _have_ the OID of the empty tree! It's
> > > >> `the_hash_algo->empty_tree`.
> > > >
> > > > I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> > > >
> > > >> I hope that this is covered by the test suite. Please check that. The test
> > > >> would succeed with your version, but only because tests are run with
> > > >> `stdin` redirected from `/dev/null` by default.
> > > >
> > > > I guess yes. My work passed because the tests are written this way.
> > > >
> > > >>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> > > >>> +		if (argc) {
> > > >>> +			argv++;
> > > >>> +			argc--;
> > > >>> +		}
> > > >>> +	} else {
> > > >>> +		strbuf_addstr(&sb, "HEAD");
> > > >>> +	}
> > > >>
> > > >> The conversion to C would make for a fine excuse to simplify the logic.
> > > >
> > > > This was kind of like the 'shift' in shell. What equivalent do you
> > > > suggest?
> > > >
> > >
> > > I think that's just a general comment after the other comments found
> > > just above about simplifying things.
> >
> > Alright. But I do have to simplify the logic right?
> 
> You do not _have_ to. But it would make for a good opportunity to do that,
> I think, as the code is really hard to follow.
> 
> The idea here is actually not at all hard to understand, though: use
> the speficied rev (falling back to `HEAD`) to compare to, unless it is a
> yet-unborn `HEAD` in which case you compare to the empty tree.
> 
> It is very, very similar in spirit to the code in `do_pick_commit()` in
> `sequencer.c`:
> https://github.com/git/git/blob/v2.27.0/sequencer.c#L1765-L1774
> 
> The only difference is that you will also have to fall back to using
> `HEAD` if the argument in question turns out not to refer to a revision
> but is actually the first pathspec.

Alright. So do you suggest a code segment like this?

--------8<---------------------

module_summary() {
struct object_id *head_oid = NULL;
.....
.....
if (get_oid(argc ? argv[0] : "HEAD", head_oid)) {
			if (argc) {
			argv++;
			argc--;
		}
	} else if (!argc || !strcmp(argv[0], "HEAD")) {
		/* before the first commit: compare with an empty tree */
		oidcpy(head_oid, the_hash_algo->empty_tree);
		if (argc) {
			argv++;
			argc--;
		}
	} else {
		get_oid("HEAD", head_oid);
	}
.....
.....
}

*Passing head_oid as a pointer into the fucntion below*

compute_summary_module_list(..., struct object_id *head_oid,...) {
	.....
    .....
    if (head_oid) {
		argv_array_push(&diff_args, oid_to_hex(head_oid));
	}
    .....
    .....
}

----------->8-------------

When I try this, I get a:

    BUG: diff-lib.c:526: run_diff_index must be passed exactly one tree

And this occurs when we are inside the first if-statement. I do not
understand how we are passing multiple trees or making it seem as if
there are multiple trees? Also, even when we do not enter the first if,
we still get a segmentation fault. Why so?

Is this logic correct? I took some inspiration from the links you sent
above as well as some advice from Christian and Kaartic on this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-15 14:53             ` Shourya Shukla
@ 2020-07-15 18:41               ` Shourya Shukla
  2020-08-10 13:24                 ` Johannes Schindelin
  0 siblings, 1 reply; 21+ messages in thread
From: Shourya Shukla @ 2020-07-15 18:41 UTC (permalink / raw)
  To: shouryashukla.oo
  Cc: Johannes.Schindelin, christian.couder, git, gitster,
	kaartic.sivaraam, liu.denton, pc44800, stefanbeller

Ignore the above mail I sent, I am a big dimwit. I have tried to do this
and this works

---------------8<--------------
module_summary() {
struct object_id head;
....
....
if (!get_oid(argc ? argv[0] : "HEAD", &head_oid)) {
		if (argc) {
			argv++;
			argc--;
		}
	} else if (!argc || !strcmp(argv[0], "HEAD")) {
		/* before the first commit: compare with an empty tree */
		oidcpy(&head_oid, the_hash_algo->empty_tree);
		if (argc) {
			argv++;
			argc--;
		}
	} else {
		get_oid("HEAD", &head_oid);
	}
....
....
ret = compute_summary_module_list((diff_cmd == DIFF_FILES) ? NULL : &head_oid,
					   &info, diff_cmd);
	return ret;
}

compute_summary_module_list() {
....
....
if (head_oid)
		argv_array_push(&diff_args, oid_to_hex(head_oid));
....
....
}
--------------->8--------------

Obviousy I was making the grave mistake of declaring the aforementioned
struct as 'struct object_id *' which caused all these weird errors.

All tests pass now.

BTW in your review of my patch,
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2007031712160.50@tvgsbejvaqbjf.bet/
you said this:
--------------->8--------------
> +			 */
> +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> +			struct strbuf sb_rev_parse = STRBUF_INIT;
> +
> +			cp_rev_parse.git_cmd = 1;
> +			cp_rev_parse.no_stderr = 1;
> +			cp_rev_parse.dir = p->sm_path;
> +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
> +
> +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> +					 "HEAD", NULL);
> +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> +				strbuf_strip_suffix(&sb_rev_parse, "\n");
> +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> +			}
> +			strbuf_release(&sb_rev_parse);
> +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> +			struct strbuf sb_hash_object = STRBUF_INIT;
> +
> +			cp_hash_object.git_cmd = 1;
> +			argv_array_pushl(&cp_hash_object.args, "hash-object",
> +					 p->sm_path, NULL);
> +			if (!capture_command(&cp_hash_object,
> +					     &sb_hash_object, 0)) {
> +				strbuf_strip_suffix(&sb_hash_object, "\n");
> +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> +			}
> +			strbuf_release(&sb_hash_object);

It would probably be shorter, less error-prone, and quicker to use
`index_fd()` directly.
--------------->8--------------

What exactly did you mean here and where should the index_fd() be used?

Regards,
Shourya Shukla

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C
  2020-07-15 18:41               ` Shourya Shukla
@ 2020-08-10 13:24                 ` Johannes Schindelin
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Schindelin @ 2020-08-10 13:24 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: christian.couder, git, gitster, kaartic.sivaraam, liu.denton,
	pc44800, stefanbeller

Hi Shourya,

On Thu, 16 Jul 2020, Shourya Shukla wrote:

> BTW in your review of my patch,
> https://lore.kernel.org/git/nycvar.QRO.7.76.6.2007031712160.50@tvgsbejvaqbjf.bet/
> you said this:
> --------------->8--------------
> > +			 */
> > +			struct child_process cp_rev_parse = CHILD_PROCESS_INIT;
> > +			struct strbuf sb_rev_parse = STRBUF_INIT;
> > +
> > +			cp_rev_parse.git_cmd = 1;
> > +			cp_rev_parse.no_stderr = 1;
> > +			cp_rev_parse.dir = p->sm_path;
> > +			prepare_submodule_repo_env(&cp_rev_parse.env_array);
> > +
> > +			argv_array_pushl(&cp_rev_parse.args, "rev-parse",
> > +					 "HEAD", NULL);
> > +			if (!capture_command(&cp_rev_parse, &sb_rev_parse, 0)) {
> > +				strbuf_strip_suffix(&sb_rev_parse, "\n");
> > +				get_oid_hex(sb_rev_parse.buf, &p->oid_dst);
> > +			}
> > +			strbuf_release(&sb_rev_parse);
> > +		} else if (S_ISLNK(p->mod_dst) || S_ISREG(p->mod_dst)) {
> > +			struct child_process cp_hash_object = CHILD_PROCESS_INIT;
> > +			struct strbuf sb_hash_object = STRBUF_INIT;
> > +
> > +			cp_hash_object.git_cmd = 1;
> > +			argv_array_pushl(&cp_hash_object.args, "hash-object",
> > +					 p->sm_path, NULL);
> > +			if (!capture_command(&cp_hash_object,
> > +					     &sb_hash_object, 0)) {
> > +				strbuf_strip_suffix(&sb_hash_object, "\n");
> > +				get_oid_hex(sb_hash_object.buf, &p->oid_dst);
> > +			}
> > +			strbuf_release(&sb_hash_object);
>
> It would probably be shorter, less error-prone, and quicker to use
> `index_fd()` directly.
> --------------->8--------------
>
> What exactly did you mean here and where should the index_fd() be used?

Well, `index_fd()` is the function that `git hash-object` uses to
calculate the hash of a given file (in this case, specified via the path
`p->sm_path`).

You could open an fd to that file directly, use `index_fd()` to calculate
the hash, and thereby avoid spawning another (totally unnecessary)
process.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-08-10 13:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-02 19:24 [GSoC][PATCH 0/4] submodule: port 'summary' from Shell to C Shourya Shukla
2020-07-02 19:24 ` [PATCH 1/4] submodule: amend extra line feed between callback struct and macro Shourya Shukla
2020-07-03 14:57   ` Johannes Schindelin
2020-07-03 15:37     ` Philip Oakley
2020-07-04 15:21       ` Shourya Shukla
2020-07-04 15:39         ` Đoàn Trần Công Danh
2020-07-05 10:52           ` Philip Oakley
2020-07-02 19:24 ` [PATCH 2/4] submodule: rename helper functions to avoid ambiguity Shourya Shukla
2020-07-02 19:24 ` [PATCH 3/4] diff: change scope of the function count_lines() Shourya Shukla
2020-07-03 15:07   ` Johannes Schindelin
2020-07-04 15:28     ` Shourya Shukla
2020-07-04 21:46       ` Christian Couder
2020-07-02 19:24 ` [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C Shourya Shukla
2020-07-03 20:46   ` Johannes Schindelin
2020-07-05 17:34     ` Shourya Shukla
2020-07-06  9:16       ` Kaartic Sivaraam
2020-07-06 11:15         ` Shourya Shukla
2020-07-12  0:46           ` Johannes Schindelin
2020-07-15 14:53             ` Shourya Shukla
2020-07-15 18:41               ` Shourya Shukla
2020-08-10 13:24                 ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).